This function creates a Relative Weights Analysis (RWA) and
returns a list of outputs. RWA provides a heuristic method for estimating
the relative weight of predictor variables in multiple regression, which
involves creating a multiple regression with on a set of transformed
predictors which are orthogonal to each other but maximally related to the
original set of predictors.
rwa()
is optimised for dplyr pipes and shows positive / negative signs for weights.
Usage
rwa(
df,
outcome,
predictors,
applysigns = FALSE,
sort = TRUE,
bootstrap = FALSE,
n_bootstrap = 1000,
conf_level = 0.95,
focal = NULL,
comprehensive = FALSE,
include_rescaled_ci = FALSE
)
Arguments
- df
Data frame or tibble to be passed through.
- outcome
Outcome variable, to be specified as a string or bare input. Must be a numeric variable.
- predictors
Predictor variable(s), to be specified as a vector of string(s) or bare input(s). All variables must be numeric.
- applysigns
Logical value specifying whether to show an estimate that applies the sign. Defaults to
FALSE
.- sort
Logical value specifying whether to sort results by rescaled relative weights in descending order. Defaults to
TRUE
.- bootstrap
Logical value specifying whether to calculate bootstrap confidence intervals. Defaults to
FALSE
.- n_bootstrap
Number of bootstrap samples to use when bootstrap = TRUE. Defaults to 1000.
- conf_level
Confidence level for bootstrap intervals. Defaults to 0.95.
- focal
Focal variable for bootstrap comparisons (optional).
- comprehensive
Whether to run comprehensive bootstrap analysis including random variable and focal comparisons.
- include_rescaled_ci
Logical value specifying whether to include confidence intervals for rescaled weights. Defaults to
FALSE
due to compositional data constraints. Use with caution.
Value
rwa()
returns a list of outputs, as follows:
predictors
: character vector of names of the predictor variables used.rsquare
: the rsquare value of the regression model.result
: the final output of the importance metrics (sorted by Rescaled.RelWeight in descending order by default).The
Rescaled.RelWeight
column sums up to 100.The
Sign
column indicates whether a predictor is positively or negatively correlated with the outcome.When bootstrap = TRUE, includes confidence interval columns for raw weights.
Rescaled weight CIs are available via include_rescaled_ci = TRUE but not recommended for inference.
n
: indicates the number of observations used in the analysis.bootstrap
: bootstrap results (only present when bootstrap = TRUE), containing:ci_results
: confidence intervals for weightsboot_object
: raw bootstrap object for advanced analysisn_bootstrap
: number of bootstrap samples used
lambda
:RXX
: Correlation matrix of all the predictor variables against each other.RXY
: Correlation values of the predictor variables against the outcome variable.
Details
rwa()
produces raw relative weight values (epsilons) as well as rescaled
weights (scaled as a percentage of predictable variance) for every predictor
in the model. Signs are added to the weights when the applysigns
argument
is set to TRUE
.
See https://www.scotttonidandel.com/rwa-web for the
original implementation that inspired this package.
Examples
library(ggplot2)
# Basic RWA (results sorted by default)
rwa(diamonds,"price",c("depth","carat"))
#> $predictors
#> [1] "depth" "carat"
#>
#> $rsquare
#> [1] 0.8506755
#>
#> $result
#> Variables Raw.RelWeight Rescaled.RelWeight Sign
#> 1 carat 0.849946308 99.91428588 +
#> 2 depth 0.000729149 0.08571412 -
#>
#> $n
#> [1] 53940
#>
#> $lambda
#> [,1] [,2]
#> [1,] 0.99990040 0.01411356
#> [2,] 0.01411356 0.99990040
#>
#> $RXX
#> depth carat
#> depth 1.00000000 0.02822431
#> carat 0.02822431 1.00000000
#>
#> $RXY
#> depth carat
#> -0.0106474 0.9215913
#>
# RWA without sorting (preserves original predictor order)
rwa(diamonds,"price",c("depth","carat"), sort = FALSE)
#> $predictors
#> [1] "depth" "carat"
#>
#> $rsquare
#> [1] 0.8506755
#>
#> $result
#> Variables Raw.RelWeight Rescaled.RelWeight Sign
#> 1 depth 0.000729149 0.08571412 -
#> 2 carat 0.849946308 99.91428588 +
#>
#> $n
#> [1] 53940
#>
#> $lambda
#> [,1] [,2]
#> [1,] 0.99990040 0.01411356
#> [2,] 0.01411356 0.99990040
#>
#> $RXX
#> depth carat
#> depth 1.00000000 0.02822431
#> carat 0.02822431 1.00000000
#>
#> $RXY
#> depth carat
#> -0.0106474 0.9215913
#>
# \donttest{
# For faster examples, use a subset of data for bootstrap
diamonds_small <- diamonds[sample(nrow(diamonds), 1000), ]
# RWA with bootstrap confidence intervals (raw weights only)
rwa(diamonds_small,"price",c("depth","carat"), bootstrap = TRUE, n_bootstrap = 100)
#> Running bootstrap analysis with 100 samples...
#> $predictors
#> [1] "depth" "carat"
#>
#> $rsquare
#> [1] 0.8499131
#>
#> $result
#> Variables Raw.RelWeight Rescaled.RelWeight Sign Raw.RelWeight.CI.Lower
#> 1 carat 0.8489637953 99.8883007 + 0.821172909
#> 2 depth 0.0009493473 0.1116993 - -0.002577354
#> Raw.RelWeight.CI.Upper Raw.Significant
#> 1 0.878932350 TRUE
#> 2 0.001598172 FALSE
#>
#> $n
#> [1] 1000
#>
#> $bootstrap
#> $bootstrap$boot_object
#>
#> ORDINARY NONPARAMETRIC BOOTSTRAP
#>
#>
#> Call:
#> boot::boot(data = bootstrap_data, statistic = rwa_boot_statistic,
#> R = n_bootstrap, outcome = outcome, predictors = predictors)
#>
#>
#> Bootstrap Statistics :
#> original bias std. error
#> t1* 0.0009493473 0.0006083043 0.001005676
#> t2* 0.8489637953 -0.0009349687 0.014284910
#>
#> $bootstrap$ci_results
#> $bootstrap$ci_results$raw_weights
#> # A tibble: 2 × 6
#> variable weight_index ci_lower ci_upper ci_method ci_type
#> <chr> <int> <dbl> <dbl> <chr> <chr>
#> 1 depth 1 -0.00258 0.00160 basic raw
#> 2 carat 2 0.821 0.879 basic raw
#>
#>
#> $bootstrap$n_bootstrap
#> [1] 100
#>
#> $bootstrap$conf_level
#> [1] 0.95
#>
#> $bootstrap$comprehensive
#> [1] FALSE
#>
#> $bootstrap$focal
#> NULL
#>
#>
#> $lambda
#> [,1] [,2]
#> [1,] 0.9998984 0.0142579
#> [2,] 0.0142579 0.9998984
#>
#> $RXX
#> depth carat
#> depth 1.0000000 0.0285129
#> carat 0.0285129 1.0000000
#>
#> $RXY
#> depth carat
#> -0.01473139 0.92099482
#>
# Include rescaled weight CIs (use with caution for inference)
rwa(diamonds_small,"price",c("depth","carat"), bootstrap = TRUE,
include_rescaled_ci = TRUE, n_bootstrap = 100)
#> Running bootstrap analysis with 100 samples...
#> Warning: Rescaled weight confidence intervals should be interpreted with caution due to compositional data constraints. Use for descriptive purposes only, not formal statistical inference.
#> $predictors
#> [1] "depth" "carat"
#>
#> $rsquare
#> [1] 0.8499131
#>
#> $result
#> Variables Raw.RelWeight Rescaled.RelWeight Sign Raw.RelWeight.CI.Lower
#> 1 carat 0.8489637953 99.8883007 + 0.820116111
#> 2 depth 0.0009493473 0.1116993 - -0.001621699
#> Raw.RelWeight.CI.Upper Raw.Significant Rescaled.RelWeight.CI.Lower
#> 1 0.874260240 TRUE 99.8007012
#> 2 0.001587324 FALSE -0.2253907
#> Rescaled.RelWeight.CI.Upper
#> 1 100.2253907
#> 2 0.1992988
#>
#> $n
#> [1] 1000
#>
#> $bootstrap
#> $bootstrap$boot_object
#>
#> ORDINARY NONPARAMETRIC BOOTSTRAP
#>
#>
#> Call:
#> boot::boot(data = bootstrap_data, statistic = rwa_boot_statistic,
#> R = n_bootstrap, outcome = outcome, predictors = predictors)
#>
#>
#> Bootstrap Statistics :
#> original bias std. error
#> t1* 0.0009493473 0.0006033209 0.0009143211
#> t2* 0.8489637953 0.0037448894 0.0127520755
#>
#> $bootstrap$boot_object_rescaled
#>
#> ORDINARY NONPARAMETRIC BOOTSTRAP
#>
#>
#> Call:
#> boot::boot(data = bootstrap_data, statistic = rwa_boot_statistic_rescaled,
#> R = n_bootstrap, outcome = outcome, predictors = predictors)
#>
#>
#> Bootstrap Statistics :
#> original bias std. error
#> t1* 0.1116993 0.05818349 0.1031148
#> t2* 99.8883007 -0.05818349 0.1031148
#>
#> $bootstrap$ci_results
#> $bootstrap$ci_results$raw_weights
#> # A tibble: 2 × 6
#> variable weight_index ci_lower ci_upper ci_method ci_type
#> <chr> <int> <dbl> <dbl> <chr> <chr>
#> 1 depth 1 -0.00162 0.00159 basic raw
#> 2 carat 2 0.820 0.874 basic raw
#>
#> $bootstrap$ci_results$rescaled_weights
#> # A tibble: 2 × 6
#> variable weight_index ci_lower ci_upper ci_method ci_type
#> <chr> <int> <dbl> <dbl> <chr> <chr>
#> 1 depth 1 -0.225 0.199 basic rescaled
#> 2 carat 2 99.8 100. basic rescaled
#>
#>
#> $bootstrap$n_bootstrap
#> [1] 100
#>
#> $bootstrap$conf_level
#> [1] 0.95
#>
#> $bootstrap$comprehensive
#> [1] FALSE
#>
#> $bootstrap$focal
#> NULL
#>
#>
#> $lambda
#> [,1] [,2]
#> [1,] 0.9998984 0.0142579
#> [2,] 0.0142579 0.9998984
#>
#> $RXX
#> depth carat
#> depth 1.0000000 0.0285129
#> carat 0.0285129 1.0000000
#>
#> $RXY
#> depth carat
#> -0.01473139 0.92099482
#>
# Comprehensive bootstrap analysis with focal variable
result <- rwa(diamonds_small,"price",c("depth","carat","table"),
bootstrap = TRUE, comprehensive = TRUE, focal = "carat",
n_bootstrap = 100)
#> Running bootstrap analysis with 100 samples...
# View confidence intervals
result$bootstrap$ci_results
#> $raw_weights
#> # A tibble: 3 × 6
#> variable weight_index ci_lower ci_upper ci_method ci_type
#> <chr> <int> <dbl> <dbl> <chr> <chr>
#> 1 depth 1 -0.00147 0.00180 basic raw
#> 2 carat 2 0.817 0.872 basic raw
#> 3 table 3 -0.00381 0.0148 basic raw
#>
#> $random_comparison
#> # A tibble: 3 × 6
#> variable weight_index ci_lower ci_upper ci_method ci_type
#> <chr> <int> <dbl> <dbl> <chr> <chr>
#> 1 Var4 1 -0.00134 0.00416 basic rand_diff
#> 2 Var5 2 0.816 0.870 basic rand_diff
#> 3 Var6 3 -0.00120 0.0156 basic rand_diff
#>
#> $focal_comparison
#> # A tibble: 2 × 6
#> variable weight_index ci_lower ci_upper ci_method ci_type
#> <chr> <int> <dbl> <dbl> <chr> <chr>
#> 1 Var7 1 -0.869 -0.813 basic focal_diff
#> 2 Var8 2 -0.865 -0.805 basic focal_diff
#>
# }