Horvitz-Thompson estimator of treatment effects

horvitz_thompson(formula, data, condition_prs, blocks, clusters,
  condition_pr_mat = NULL, declaration = NULL, subset,
  se_type = c("youngs", "constant"), collapsed = FALSE, alpha = 0.05,
  condition1 = NULL, condition2 = NULL)

Arguments

formula

An object of class "formula", such as Y ~ Z.

data

A data.frame.

condition_prs

An optional bare (unquoted) name of the variable with the condition 2 (treatment) probabilities.

blocks

An optional bare (unquoted) name of the block variable. Use for blocked designs only.

clusters

An optional bare (unquoted) name of the variable that corresponds to the clusters in the data; used for cluster randomized designs. For blocked designs, clusters must be within blocks.

condition_pr_mat

An optional 2n * 2n matrix of marginal and joint probabilities of all units in condition1 and condition2. Can be used in place of condition_prs. See details.

declaration

An object of class "ra_declaration" from the randomizr package, which is an alternative way of specifying the design. Cannot be used along with any of condition_prs, blocks, clusters, or condition_pr_mat. See details.

subset

An optional bare (unquoted) expression specifying a subset of observations to be used.

se_type

Can be one of c("youngs", "constant"), corresponding to estimating the standard errors using Young's inequality (default, conservative) or under the constant effects assumption.

collapsed

A logical; if TRUE, clusters are collapsed to their cluster totals for variance estimation. FALSE by default.

alpha

The significance level, 0.05 by default.

condition1

Name of the condition to be treated as control. Effects are estimated with condition1 as control and condition2 as treatment. If unspecified, condition1 is the "first" condition and condition2 is the "second" according to R defaults.

condition2

Name of the condition to be treated as treatment. Effects are estimated with condition1 as control and condition2 as treatment. If unspecified, condition1 is the "first" condition and condition2 is the "second" according to R defaults.

Details

This function implements the Horvitz-Thompson estimator for treatment effects.
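As a sketch of what is being estimated in a two-arm design, the estimator can be written as below. The notation is assumed here for illustration and is not taken from the package documentation: N is the number of units, Y_i is the observed outcome, Z_i equals 1 if unit i is in condition2 and 0 if it is in condition1, and \pi_{1i} and \pi_{0i} are the probabilities that unit i is assigned to condition2 and condition1 respectively.

```latex
\hat{\tau}_{HT}
  = \frac{1}{N} \left(
      \sum_{i=1}^{N} \frac{Z_i \, Y_i}{\pi_{1i}}
      - \sum_{i=1}^{N} \frac{(1 - Z_i) \, Y_i}{\pi_{0i}}
    \right)
```

Each observed outcome is weighted by the inverse of the probability of the condition in which it was observed. For example, if every unit has \pi_{1i} = \pi_{0i} = 0.5, each observed outcome receives a weight of 2.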

Examples

# Set seed
set.seed(42)

# Simulate data
n <- 10
dat <- data.frame(y = rnorm(n))

#----------
# Simple random assignment
#----------
dat$p <- 0.5
dat$z <- rbinom(n, size = 1, prob = dat$p)

# If you only pass condition_prs, we assume simple random sampling
horvitz_thompson(y ~ z, data = dat, condition_prs = p)
#>   coefficient_name        est       se         p  ci_lower ci_upper df
#> 1                z -0.2532128 0.609167 0.6885769 -1.657954 1.151529  8

# Assume constant effects instead
horvitz_thompson(y ~ z, data = dat, condition_prs = p, se_type = "constant")
#>   coefficient_name        est        se         p  ci_lower ci_upper df
#> 1                z -0.2532128 0.6038814 0.6860232 -1.645766  1.13934  8

# Also can use randomizr to pass a declaration
srs_declaration <- randomizr::declare_ra(N = nrow(dat), prob = 0.5, simple = TRUE)
horvitz_thompson(y ~ z, data = dat, declaration = srs_declaration)
#>   coefficient_name        est       se         p  ci_lower ci_upper df
#> 1                z -0.2532128 0.609167 0.6885769 -1.657954 1.151529  8

#----------
# Complete random assignment
#----------
dat$z <- sample(rep(0:1, each = n/2))

# Can use a declaration
crs_declaration <- randomizr::declare_ra(N = nrow(dat), prob = 0.5, simple = FALSE)
horvitz_thompson(y ~ z, data = dat, declaration = crs_declaration)
#>   coefficient_name       est        se         p  ci_lower ci_upper df
#> 1                z -0.247794 0.5729701 0.6768192 -1.569065 1.073477  8

# Can precompute condition_pr_mat and pass it
# (faster for multiple runs with same condition probability matrix)
crs_pr_mat <- declaration_to_condition_pr_mat(crs_declaration)
horvitz_thompson(y ~ z, data = dat, condition_pr_mat = crs_pr_mat)
#>   coefficient_name       est        se         p  ci_lower ci_upper df
#> 1                z -0.247794 0.5729701 0.6768192 -1.569065 1.073477  8

#----------
# More complicated assignment
#----------

# Arbitrary permutation matrix
possible_treats <- cbind(
  c(1, 1, 0, 1, 0, 0, 0, 1, 1, 0),
  c(0, 1, 1, 0, 1, 1, 0, 1, 0, 1),
  c(1, 0, 1, 1, 1, 1, 1, 0, 0, 0)
)
arb_pr_mat <- permutations_to_condition_pr_mat(possible_treats)

# Simulating a column to be realized treatment
dat$z <- possible_treats[, sample(ncol(possible_treats), size = 1)]
horvitz_thompson(y ~ z, data = dat, condition_pr_mat = arb_pr_mat)
#>   coefficient_name       est       se         p  ci_lower ci_upper df
#> 1                z -1.368389 1.033353 0.2220105 -3.751306 1.014527  8

# Clustered treatment, complete random assignment
# Simulating data
dat$cl <- rep(1:4, times = c(2, 2, 3, 3))
clust_crs_decl <- randomizr::declare_ra(N = nrow(dat), clust_var = dat$cl, prob = 0.5)
dat$z <- randomizr::conduct_ra(clust_crs_decl)

# Regular SE using Young's inequality
horvitz_thompson(y ~ z, data = dat, declaration = clust_crs_decl)
#>   coefficient_name        est        se         p   ci_lower  ci_upper df
#> 1                z 0.02766919 0.2671404 0.9200557 -0.5883577 0.6436961  8

# SE using collapsed cluster totals and Young's inequality
horvitz_thompson(y ~ z, data = dat, declaration = clust_crs_decl, collapsed = TRUE)
#>   coefficient_name        est        se         p   ci_lower  ci_upper df
#> 1                z 0.02766919 0.2311512 0.9076709 -0.5053664 0.5607048  8