| Title: | Imputation with 'mice' and Distributional Random Forests |
|---|---|
| Description: | Provides a custom imputation method for the 'mice' package based on distributional random forests. The package implements the 'mice.impute.DRF' method, which can be used within the standard 'mice' workflow. Missing values are imputed by estimating conditional distributions with distributional random forests and sampling observed responses using forest weights. |
| Authors: | Krystyna Grzesiak [aut, cre] (ORCID: <https://orcid.org/0000-0003-2581-7722>), Jeffrey Näf [aut] (ORCID: <https://orcid.org/0000-0003-0920-1899>) |
| Maintainer: | Krystyna Grzesiak <[email protected]> |
| License: | GPL-3 |
| Version: | 0.1.0 |
| Built: | 2026-06-08 20:23:03 UTC |
| Source: | https://github.com/krystynagrzesiak/micedrf |
Imputes missing values using distributional random forests within the multiple imputation by chained equations framework implemented in the mice package.
mice.impute.DRF(y, ry, x, wy = NULL, min.node.size = 1, num.features = 10, num.trees = 10, ...)
y |
Vector to be imputed. |
ry |
Logical vector indicating which elements of |
x |
Numeric design matrix with |
wy |
Logical vector indicating elements of |
min.node.size |
Target minimum number of observations in each tree leaf in the distributional random forest. Default is 1. |
num.features |
Number of random features to sample at each split. Default is 10. |
num.trees |
Number of trees in the distributional random forest. Default is 10. |
... |
Additional arguments passed by |
This function is called internally by mice when the imputation
method is set to "DRF". For each variable with missing values, a
distributional random forest is fitted to the observed values using the
remaining variables as predictors. Missing values are then imputed by
sampling observed responses according to the forest weights.
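The sampling step described above can be sketched in base R. This is only an illustration under stated assumptions: `toy_weights()` and `sample_from_weights()` are hypothetical helpers, and the Gaussian-kernel weights are a stand-in for the forest weights (they are not computed by a distributional random forest). What the sketch is meant to show is the shape of the procedure: one row of non-negative weights per missing entry, summing to one over the observed responses, from which a single observed value is drawn.

```r
# Stand-in for forest weights (assumption): a Gaussian-kernel weight matrix
# with one row per missing case and one column per observed case.
# This is NOT how a distributional random forest computes its weights.
toy_weights <- function(x_obs, x_mis, bandwidth = 1) {
  n_mis <- nrow(x_mis)
  d <- as.matrix(dist(rbind(x_mis, x_obs)))
  # Keep only distances between missing rows and observed rows.
  d_block <- d[seq_len(n_mis), n_mis + seq_len(nrow(x_obs)), drop = FALSE]
  w <- exp(-d_block^2 / (2 * bandwidth^2))
  w / rowSums(w)  # normalise each row to sum to one
}

# Draw one observed response per missing case, using that case's weights.
sample_from_weights <- function(y_obs, weights) {
  idx <- apply(weights, 1, function(w) {
    sample.int(length(y_obs), size = 1, prob = w)
  })
  y_obs[idx]
}

set.seed(1)
x <- matrix(rnorm(40), ncol = 2)
y <- x[, 1] + rnorm(20, sd = 0.1)
ry <- rep(c(TRUE, FALSE), times = c(15, 5))  # last 5 entries treated as missing

w <- toy_weights(x[ry, , drop = FALSE], x[!ry, , drop = FALSE])
imputed <- sample_from_weights(y[ry], w)
```

Because every imputed value is an observed response, the draws always stay within the range of the observed data.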
A numeric vector of imputed values for the entries of y
indicated by wy. The vector has length sum(wy) and is
returned to mice to replace the missing values in the current
variable.
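The return contract can be illustrated with a minimal custom method that follows the same argument convention. The name `mice.impute.toy` is hypothetical and is not part of this package or of mice; it simply draws donors uniformly from the observed values, so it is a sketch of the interface rather than of the DRF method.

```r
# Hypothetical imputation method (assumption): uniform draws from observed values.
# It mirrors the (y, ry, x, wy, ...) signature and returns sum(wy) values.
mice.impute.toy <- function(y, ry, x, wy = NULL, ...) {
  # When wy is not supplied, impute exactly the entries that are not observed.
  if (is.null(wy)) wy <- !ry
  donors <- y[ry]
  donors[sample.int(length(donors), size = sum(wy), replace = TRUE)]
}

set.seed(42)
y <- c(1.2, NA, 3.4, NA, 5.6, 7.8)
ry <- !is.na(y)
x <- matrix(rnorm(12), ncol = 2)  # predictors are unused by this toy method

draws <- mice.impute.toy(y, ry, x)

# What the caller does with the result: fill the flagged positions in order.
y_filled <- y
y_filled[!ry] <- draws
```

The returned vector carries no positions of its own, which is why its length must equal `sum(wy)`.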
Näf, J., Scornet, E., and Josse, J. (2024). "What is a good imputation under MAR missingness?" https://arxiv.org/abs/2403.19196.
Cevid, D., Michel, L., Näf, J., Meinshausen, N., and Buehlmann, P. (2022). "Distributional random forests: Heterogeneity adjustment and multivariate distributional regression." Journal of Machine Learning Research, 23(333), 1–79.
library(mice)
set.seed(123)
X <- matrix(rnorm(1000), nrow = 100)
X[runif(length(X)) < 0.3] <- NA
imp <- mice(X, method = "DRF", m = 1, maxit = 1, printFlag = FALSE)
complete(imp)