| Title: | Proper Scoring Rules for Missing Value Imputation |
|---|---|
| Description: | Provides tools for evaluating and ranking missing value imputation methods using proper scoring rules. Implements the Energy-I-Score and the DR-I-Score for the assessment of deterministic, stochastic and multiple imputation methods for numerical and mixed datasets. |
| Authors: | Krystyna Grzesiak [aut, cre] (ORCID: <https://orcid.org/0000-0003-2581-7722>), Loris Michel [aut, ctb], Meta-Lina Spohn [aut, ctb], Jeffrey Näf [aut, ctb] (ORCID: <https://orcid.org/0000-0003-0920-1899>) |
| Maintainer: | Krystyna Grzesiak <[email protected]> |
| License: | GPL-3 |
| Version: | 1.2.0 |
| Built: | 2026-06-09 11:02:34 UTC |
| Source: | https://github.com/krystynagrzesiak/iscores |
Calculates IScores for multiple imputation functions
compare_Iscores(X, methods_list, score = c("energy_IScore", "DR_IScore"), ...)
| Argument | Description |
|---|---|
| X | data containing missing values denoted with NA's. |
| methods_list | a named list of imputing functions. |
| score | a vector of names of scores to calculate. It can be "energy_IScore" and/or "DR_IScore". |
| ... | other arguments to be passed to energy_IScore or DR_IScore. |
a vector of IScores for provided methods
set.seed(111)
X <- Iscores:::random_mcar_data(100, 3, 0.2)
methods_list <- list(exp = Iscores:::exp_imputation, norm = Iscores:::norm_imputation)
compare_Iscores(X, methods_list = methods_list, m = 2, n_proj = 10, n_trees_per_proj = 2)
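The example above relies on internal helpers accessed with `:::`. As a rough illustration of what a `methods_list` could look like without them, the base-R sketch below builds a small dataset with values missing completely at random and a named list of two imputing functions. The names `make_mcar_data`, `mean_imputation` and `normal_draw_imputation` are hypothetical and not part of the package, and the assumed interface (a function that takes data with NA's and returns the completed data) is an assumption rather than a documented contract.

```r
set.seed(1)

# Hypothetical helper (not part of the package): numeric data in which each
# entry is removed completely at random (MCAR) with probability `ratio`.
make_mcar_data <- function(n, p, ratio = 0.2) {
  X <- matrix(rnorm(n * p), nrow = n, ncol = p)
  X[runif(n * p) < ratio] <- NA
  X
}

# Deterministic imputer: replace each NA with the mean of its column.
mean_imputation <- function(X) {
  for (j in seq_len(ncol(X))) {
    miss <- is.na(X[, j])
    X[miss, j] <- mean(X[, j], na.rm = TRUE)
  }
  X
}

# Stochastic imputer: draw each NA from a normal distribution fitted to the
# observed values of its column.
normal_draw_imputation <- function(X) {
  for (j in seq_len(ncol(X))) {
    miss <- is.na(X[, j])
    mu <- mean(X[, j], na.rm = TRUE)
    sigma <- sd(X[, j], na.rm = TRUE)
    X[miss, j] <- rnorm(sum(miss), mean = mu, sd = sigma)
  }
  X
}

X <- make_mcar_data(100, 3, 0.2)
methods_list <- list(mean = mean_imputation, norm = normal_draw_imputation)

# Apply every method once and check that no NA is left behind.
imputed <- lapply(methods_list, function(f) f(X))
vapply(imputed, anyNA, logical(1))
```

The names of the list are what identify each method, so descriptive names are worth choosing.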
Compute the imputation KL-based scoring rules
DR_IScore(X, imputation_func = NULL, X_imp = NULL, m = 5, n_proj = 100, n_trees_per_proj = 5, min_node_size = 10, n_cores = 1, projection_function = NULL, ...)
| Argument | Description |
|---|---|
| X | data containing missing values denoted with NA's. |
| imputation_func | an imputing function. If |
| X_imp | a list of imputed datasets. If |
| m | the number of multiple imputations to consider. Defaults to 5. |
| n_proj | an integer specifying the number of projections to consider for the score. |
| n_trees_per_proj | an integer, the number of trees per projection. |
| min_node_size | the minimum node size of a tree, i.e. the minimum number of observations in a terminal node. |
| n_cores | an integer, the number of cores to use. |
| projection_function | a function providing the user-specific projections. |
| ... | used for compatibility. |
a vector made of the scores for each imputation method.
This method is described in detail in:
Näf, Jeffrey, Meta-Lina Spohn, Loris Michel, and Nicolai Meinshausen. 2022. “Imputation Scores.” https://arxiv.org/abs/2106.03742.
set.seed(111)
X <- Iscores:::random_mcar_data(100, 3, 0.2)
imputation_func <- Iscores:::exp_imputation
DR_IScore(X, imputation_func, m = 2, n_proj = 10, n_trees_per_proj = 2)
Calculating energy distance/statistic.
edistance(X, X_imp, rescale = FALSE)
| Argument | Description |
|---|---|
| X | a complete original dataset. |
| X_imp | an imputed dataset. |
| rescale | a logical indicating whether the returned value should be rescaled. Defaults to FALSE. |
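This function uses the eqdist.e function from the energy package. Following that implementation, the function returns by default the energy statistic, which is given by

$$\frac{n_1 n_2}{n_1 + n_2} \, E,$$

where $E$ is the raw energy distance and $n_1$, $n_2$ are the numbers of observations in the two samples. To obtain the raw energy distance, use rescale = TRUE.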
X <- matrix(rnorm(100), nrow = 25)
X_imp <- matrix(rnorm(100), nrow = 25)
edistance(X, X_imp)
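To make the relation between the energy statistic and the raw energy distance concrete, the following base-R sketch computes both by hand from pairwise Euclidean distances. It is an illustration only, assuming the usual two-sample definition in which the raw distance is twice the mean between-sample distance minus the two mean within-sample distances; the helper names `energy_distance_raw` and `energy_statistic` are hypothetical and this is not the package's implementation.

```r
# Raw energy distance between the rows of A and the rows of B:
# 2 * mean between-sample distance - mean within-A - mean within-B.
# Within-sample means are taken over all n^2 pairs, including the zero diagonal.
energy_distance_raw <- function(A, B) {
  A <- as.matrix(A)
  B <- as.matrix(B)
  n1 <- nrow(A)
  n2 <- nrow(B)
  D <- as.matrix(dist(rbind(A, B)))  # pairwise Euclidean distances
  idx_a <- seq_len(n1)
  idx_b <- n1 + seq_len(n2)
  2 * mean(D[idx_a, idx_b]) - mean(D[idx_a, idx_a]) - mean(D[idx_b, idx_b])
}

# Energy statistic: the raw distance scaled by n1 * n2 / (n1 + n2).
energy_statistic <- function(A, B) {
  n1 <- nrow(as.matrix(A))
  n2 <- nrow(as.matrix(B))
  n1 * n2 / (n1 + n2) * energy_distance_raw(A, B)
}

set.seed(3)
A <- matrix(rnorm(100), nrow = 25)
B <- matrix(rnorm(60, mean = 1), nrow = 15)

e_raw <- energy_distance_raw(A, B)
e_stat <- energy_statistic(A, B)
c(raw = e_raw, statistic = e_stat)
```

Because the scaling factor grows with the sample sizes, the statistic is not directly comparable across datasets of different sizes, whereas the raw distance is.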
Calculates Imputation Score for imputation function
energy_IScore(X, imputation_func, X_imp = NULL, multiple = TRUE, N = 50, max_length = NULL, skip_if_needed = TRUE, scale = FALSE, n_cores = 1, silent = TRUE)
| Argument | Description |
|---|---|
| X | data containing missing values denoted with NA's. |
| imputation_func | a function that imputes data. |
| X_imp | imputed dataset of the same size as X. |
| multiple | a logical indicating whether the provided imputation method is a multiple imputation approach (i.e. it generates different values to impute for each call). Defaults to TRUE. Note that if multiple is FALSE, N is automatically set to 1. |
| N | a numeric value, the number of samples from the imputation distribution H. Defaults to 50. |
| max_length | Maximum number of variables |
| skip_if_needed | a logical indicating whether some observations should be skipped to obtain complete columns for scoring. If FALSE, NA will be returned for columns with no observed variable for training. |
| scale | a logical value. If TRUE, each variable is scaled in the score. |
| n_cores | the number of cores for parallelization. |
| silent | a logical indicating whether warnings and messages should be printed. |
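The `multiple` argument distinguishes methods that generate different imputed values on each call from those that always return the same completion. One informal way to check which kind a given function is, before choosing `multiple`, is to call it twice on the same data and compare the results. The sketch below does this in base R; `varies_between_calls`, `median_fill` and `sample_fill` are hypothetical names used only for illustration.

```r
# Hypothetical helper: TRUE if two calls on the same data give different results.
varies_between_calls <- function(impute_fun, X) {
  !identical(impute_fun(X), impute_fun(X))
}

# Deterministic imputer: fill each NA with the column median.
median_fill <- function(X) {
  for (j in seq_len(ncol(X))) {
    miss <- is.na(X[, j])
    X[miss, j] <- median(X[, j], na.rm = TRUE)
  }
  X
}

# Stochastic imputer: fill each NA with a randomly chosen observed value
# from the same column.
sample_fill <- function(X) {
  for (j in seq_len(ncol(X))) {
    miss <- is.na(X[, j])
    obs <- X[!miss, j]
    X[miss, j] <- obs[sample.int(length(obs), sum(miss), replace = TRUE)]
  }
  X
}

set.seed(2)
X <- matrix(rnorm(100), nrow = 50)
X[sample(length(X), 20)] <- NA

varies_between_calls(median_fill, X)  # deterministic: multiple = FALSE fits
varies_between_calls(sample_fill, X)  # stochastic: multiple = TRUE fits
```

A single comparison is only a heuristic: a stochastic method could in principle return the same values twice, although this is very unlikely for continuous data with many missing entries.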
This function relies on functions energy_Iscore_num and energy_Iscore_cat. Depending on the presence of factor-type data, these functions compute a score either for purely numerical data or for mixed data types.
If you want to compute the score for numerical data, make sure that the dataset does not contain any factor-type variables.
If you want to compute the score for categorical data, ensure that all categorical variables are preserved as factors.
If your imputation method does not support categorical variables represented as factors, implement a wrapper function that handles the appropriate data type conversions before and after imputation.
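A wrapper of the kind described above might look like the following sketch. It assumes a hypothetical imputer, `impute_numeric_only`, that accepts only numeric columns; the wrapper replaces factors with their integer codes before imputation and restores them as factors with the original levels afterwards. Treating category codes as numbers is a crude choice made purely to show the conversion pattern, not a recommended way to impute categorical data.

```r
# Hypothetical imputer that only accepts numeric columns (an assumption for
# illustration): fills each NA with the column median.
impute_numeric_only <- function(X) {
  stopifnot(all(vapply(X, is.numeric, logical(1))))
  for (j in seq_along(X)) {
    X[[j]][is.na(X[[j]])] <- median(X[[j]], na.rm = TRUE)
  }
  X
}

# Wrap an imputer so that factor columns go in and come out as factors.
wrap_for_factors <- function(impute_fun) {
  function(X) {
    X <- as.data.frame(X)
    is_fac <- vapply(X, is.factor, logical(1))
    lvls <- lapply(X[is_fac], levels)
    if (any(is_fac)) {
      # Before imputation: replace factors with their integer codes.
      X[is_fac] <- lapply(X[is_fac], as.integer)
    }
    X_imp <- impute_fun(X)
    # After imputation: clamp to a valid code and restore the factor.
    for (nm in names(lvls)) {
      k <- length(lvls[[nm]])
      codes <- pmin(pmax(round(X_imp[[nm]]), 1), k)
      X_imp[[nm]] <- factor(lvls[[nm]][codes], levels = lvls[[nm]])
    }
    X_imp
  }
}

X_mixed <- data.frame(a = c(1.5, NA, 3), b = factor(c("u", NA, "v")))
imputation_func <- wrap_for_factors(impute_numeric_only)
X_done <- imputation_func(X_mixed)
str(X_done)
```

The wrapped function keeps the same one-argument interface as the original imputer, so it can be used wherever an imputing function is expected.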
a numerical value denoting weighted Imputation Score obtained for provided imputation function and a table with scores and weights calculated for particular columns.
Näf, J., Grzesiak, K., and Scornet, E. (2025). How to rank imputation methods? arXiv preprint. doi:10.48550/arXiv.2507.11297.
set.seed(111)
X <- Iscores:::random_mcar_data(100, 4)
imputation_func <- Iscores:::exp_imputation
energy_IScore(X, imputation_func)
X <- Iscores:::random_mcar_mixed_data(100, 4, 2)
imputation_func <- Iscores:::median_mode_imputation
energy_IScore(X, imputation_func)