Title: | Tidy Standardized Mean Differences |
---|---|
Description: | Tidy standardized mean differences ('SMDs'). 'tidysmd' uses the 'smd' package to calculate standardized mean differences for variables in a data frame, returning the results in a tidy format. |
Authors: | Malcolm Barrett [aut, cre] |
Maintainer: | Malcolm Barrett <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.0.9000 |
Built: | 2024-11-10 05:00:23 UTC |
Source: | https://github.com/r-causal/tidysmd |
Given a data frame .df, the function bind_matches() creates binary indicator variables for each match returned by the MatchIt package and binds the resulting columns to .df. In other words, the result is the original data frame plus one indicator column for each match you bind.
bind_matches(.df, ...)
.df |
A data frame. |
... |
|
.df with additional columns for every element of ...
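To make the shape of the result concrete, here is a minimal base R sketch of the idea described above. It does not call bind_matches() or MatchIt. The toy data, the matched row positions, and the column names match_a and match_b are all hypothetical and chosen only for illustration.

```r
# Toy data standing in for `.df` (hypothetical values)
df <- data.frame(
  id = 1:6,
  treated = c(1, 1, 0, 0, 0, 0),
  age = c(34, 51, 33, 50, 70, 22)
)

# Hypothetical row positions retained by two different matching specifications
matched_rows <- list(
  match_a = c(1, 2, 3, 4),
  match_b = c(1, 3)
)

# Build one 0/1 indicator per specification: 1 if the row was matched, else 0
indicators <- lapply(
  matched_rows,
  function(rows) as.integer(seq_len(nrow(df)) %in% rows)
)

# Bind the indicator columns to the original data frame
result <- cbind(df, as.data.frame(indicators))
result
```

The result keeps every original row and column and gains one indicator column per matching specification, which is the layout the description above refers to.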
geom_love() and love_plot() are helper functions to create Love plots in ggplot2. Love plots are a diagnostic approach to assessing balance before and after weighting. Many researchers use 0.1 on the absolute SMD scale to evaluate if a variable is well-balanced between groups, although this is just a rule of thumb. geom_love() is a simple wrapper around ggplot2::geom_point(), ggplot2::geom_line(), and ggplot2::geom_vline(). It also adds default aesthetics via ggplot2::aes(). love_plot() is a quick plotting function that further wraps geom_love(). For more complex Love plots, we recommend using ggplot2 directly.
geom_love(
  linewidth = 0.8,
  line_size = NULL,
  point_size = 1.85,
  vline_xintercept = 0.1,
  vline_color = "grey70",
  vlinewidth = 0.6,
  vline_size = NULL
)

love_plot(
  .df,
  linewidth = 0.8,
  line_size = NULL,
  point_size = 1.85,
  vline_xintercept = 0.1,
  vline_color = "grey70",
  vlinewidth = 0.6,
  vline_size = NULL
)
linewidth |
The line size, passed to |
line_size |
Deprecated. Please use |
point_size |
The point size, passed to |
vline_xintercept |
The X intercept, passed to |
vline_color |
The vertical line color, passed to
|
vlinewidth |
The vertical line size, passed to
|
vline_size |
Deprecated. Please use |
.df |
a data frame produced by |
a list of geoms or a ggplot
plot_df <- tidy_smd(
  nhefs_weights,
  race:active,
  .group = qsmk,
  .wts = starts_with("w_")
)

love_plot(plot_df)

# or use ggplot2 directly
library(ggplot2)
ggplot(
  plot_df,
  aes(
    x = abs(smd),
    y = variable,
    group = method,
    color = method,
    fill = method
  )
) +
  geom_love()
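The 0.1 rule of thumb that a Love plot marks with its vertical line can also be checked numerically. The sketch below uses base R only and a small hypothetical data frame laid out like the plotting data (columns variable, method, and smd). The SMD values and method labels are invented for illustration and are not output from tidy_smd().

```r
# Hypothetical SMDs in a long layout: one row per variable and method
smds <- data.frame(
  variable = c("age", "age", "race", "race"),
  method = c("observed", "w_ate", "observed", "w_ate"),
  smd = c(-0.28, 0.01, 0.18, -0.02)
)

# Rule-of-thumb cutoff on the absolute SMD scale
threshold <- 0.1

# Flag rows whose absolute SMD falls below the cutoff
smds$balanced <- abs(smds$smd) < threshold

# Show the variable/method combinations that exceed the cutoff
smds[!smds$balanced, c("variable", "method", "smd")]
```

Because the cutoff is only a convention, treating it as a parameter (as threshold does here) makes it easy to try a stricter or looser value.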
A dataset containing various propensity score weights for causaldata::nhefs_complete.
nhefs_weights
A data frame with 1566 rows and 14 variables:
Quit smoking
Race
Age
Education level
Smoking intensity
Number of smoke-years
Exercise level
Daily activity level
Participant weight in 1971 (baseline)
ATE weight
ATT weight
ATC weight
ATM weight
ATO weight
tidy_smd() calculates the standardized mean difference (SMD) for variables in a dataset between groups. Optionally, you may also calculate weighted SMDs. tidy_smd() wraps smd::smd(), returning a tidy data frame with the columns variable, method, and smd, as well as a fourth column that contains the level of .group the SMD represents. You may also supply multiple weights to calculate multiple weighted SMDs, which is useful when comparing different types of weights. Additionally, the .wts argument supports matched datasets where the variable supplied to .wts is a binary variable indicating whether the row was included in the match. If you're using MatchIt, the helper function bind_matches() will bind these indicators to the original dataset, making it easier to compare across matching specifications.
tidy_smd(
  .df,
  .vars,
  .group,
  .wts = NULL,
  include_observed = TRUE,
  include_unweighted = NULL,
  na.rm = FALSE,
  gref = 1L,
  std.error = FALSE,
  make_dummy_vars = FALSE
)
.df |
A data frame |
.vars |
Variables for which to calculate SMD |
.group |
Grouping variable |
.wts |
Variables to use for weighting the SMD calculation. These can be, for instance, propensity score weights or a binary indicator signaling whether or not a participant was included in a matching algorithm. |
include_observed |
Logical. If using |
include_unweighted |
Deprecated. Please use |
na.rm |
Remove |
gref |
an integer indicating which level of |
std.error |
Logical indicator for computing standard errors using
|
make_dummy_vars |
Logical. Transform categorical variables to dummy
variables using |
a tibble
tidy_smd(nhefs_weights, c(age, education, race), .group = qsmk)

tidy_smd(nhefs_weights, c(age, education), .group = qsmk, std.error = TRUE)

tidy_smd(
  nhefs_weights,
  c(age, race, education),
  .group = qsmk,
  .wts = c(w_ate, w_att, w_atm)
)
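For intuition about what an SMD measures, the following base R sketch computes one by hand for a numeric variable, using the common definition of the difference in group means divided by the square root of the average of the two group variances. It is an illustration under stated assumptions, not the implementation used by smd::smd(): the helper smd_by_hand(), the sign convention (group 1 minus group 0), and the toy data are all hypothetical. It also shows, conceptually, how a binary match indicator restricts the comparison to matched rows.

```r
# Hypothetical helper: SMD for a numeric variable between two groups coded 1/0.
# Assumed convention: (mean of group 1 - mean of group 0) / pooled SD, where the
# pooled SD is the square root of the average of the two sample variances.
smd_by_hand <- function(x, g) {
  x1 <- x[g == 1]
  x0 <- x[g == 0]
  (mean(x1) - mean(x0)) / sqrt((var(x1) + var(x0)) / 2)
}

# Toy data (invented values): four exposed and four unexposed participants
age <- c(30, 40, 50, 60, 41, 49, 70, 80)
group <- c(1, 1, 1, 1, 0, 0, 0, 0)

# Hypothetical binary indicator: 1 if the row was kept by a matching procedure
matched <- c(0, 1, 1, 0, 1, 1, 0, 0)

# SMD in the full sample
smd_full <- smd_by_hand(age, group)

# SMD among matched rows only
keep <- matched == 1
smd_matched <- smd_by_hand(age[keep], group[keep])

c(full = smd_full, matched = smd_matched)
```

In this toy example the matched rows have identical group means, so the matched SMD is zero while the full-sample SMD is far from the 0.1 rule of thumb.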