This function estimates an instrumental variables regression using two-stage least squares, with a variety of options for robust standard errors.

iv_robust(formula, data, weights, subset, clusters, se_type = NULL,
ci = TRUE, alpha = 0.05, return_vcov = TRUE, try_cholesky = FALSE)

## Arguments

formula

an object of class formula specifying the regression and the instruments. For example, the formula y ~ x1 + x2 | z1 + z2 specifies x1 and x2 as endogenous regressors and z1 and z2 as their respective instruments.

data

A data.frame.

weights

the bare (unquoted) name of the weights variable in the supplied data.

subset

An optional bare (unquoted) expression specifying a subset of observations to be used.

clusters

An optional bare (unquoted) name of the variable that corresponds to the clusters in the data.

se_type

The sort of standard error sought. If clusters is not specified the options are "HC0", "HC1" (or "stata", the equivalent), "HC2" (default), "HC3", or "classical". If clusters is specified the options are "CR0", "CR2" (default), or "stata". Can also specify "none", which may speed up estimation of the coefficients.

ci

logical. Whether to compute and return p-values and confidence intervals, TRUE by default.

alpha

The significance level, 0.05 by default.

return_vcov

logical. Whether to return the variance-covariance matrix for later usage, TRUE by default.

try_cholesky

logical. Whether to try using a Cholesky decomposition to solve least squares instead of a QR decomposition, FALSE by default. Using a Cholesky decomposition may result in speed gains, but should only be used if users are sure their model is full-rank (i.e., there is no perfect multi-collinearity).

## Value

An object of class "iv_robust".

The post-estimation functions summary and tidy return results in a data.frame. To get useful data out of the return, you can use these data frames, use the resulting list directly, or use the generic accessor functions coef, vcov, confint, and predict.
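As a brief illustration of the accessor functions mentioned above, the sketch below fits a small model on simulated data and pulls results out in a few ways. The data-generating code and the names `dat` and `fit` are assumptions made for this example only, and the snippet assumes the estimatr package is installed.

```r
library(estimatr)

# Simulated data (illustrative only): Z is the instrument, D the treatment.
set.seed(42)
n <- 100
dat <- data.frame(Z = rbinom(n, 1, 0.5))
dat$D <- dat$Z * rbinom(n, 1, 0.8)
dat$Y <- 1 + 0.5 * dat$D + rnorm(n)

fit <- iv_robust(Y ~ D | Z, data = dat)

coef(fit)       # named vector of second-stage coefficients
vcov(fit)       # variance-covariance matrix (requires return_vcov = TRUE)
confint(fit)    # confidence intervals at the fitted alpha
fit$std.error   # components can also be read directly from the list
```

Reading components directly from the list is convenient for quick checks, while the generic accessors keep code portable across model classes.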

An object of class "iv_robust" is a list containing at least the following components:

coefficients

the estimated coefficients

std.error

the estimated standard errors

df

the estimated degrees of freedom

p.value

the p-values from a two-sided t-test using coefficients, std.error, and df

ci.lower

the lower bound of the 1 - alpha percent confidence interval

ci.upper

the upper bound of the 1 - alpha percent confidence interval

term

a character vector of coefficient names

alpha

the significance level specified by the user

se_type

the standard error type specified by the user

res_var

the residual variance

N

the number of observations used

k

the number of columns in the design matrix (includes linearly dependent columns!)

rank

the rank of the fitted model

vcov

the fitted variance covariance matrix

r.squared

the $$R^2$$ of the second stage regression

adj.r.squared

the $$R^2$$ of the second stage regression, but penalized for having more parameters, rank

fstatistic

a vector with the value of the second stage F-statistic with the numerator and denominator degrees of freedom

weighted

whether or not weights were applied

call

the original function call

We also return terms with the second stage terms and terms_regressors with the first stage terms, both of which are used by predict.
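To make the relationship between p.value, ci.lower, ci.upper and the components coefficients, std.error, df, and alpha concrete, the following base R sketch recomputes them for the D coefficient in the first call of the Examples section. The input numbers are copied from that printed output; the variable names are illustrative, not part of the package.

```r
# Values for the D coefficient, copied from the printed tidy() output.
estimate  <- -0.7100564
std_error <- 0.7858244
df        <- 37
alpha     <- 0.05

# Two-sided t-test of the null hypothesis that the coefficient is zero.
t_stat  <- estimate / std_error
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# Confidence interval at level 1 - alpha.
crit     <- qt(1 - alpha / 2, df = df)
ci_lower <- estimate - crit * std_error
ci_upper <- estimate + crit * std_error

round(c(p.value = p_value, ci.lower = ci_lower, ci.upper = ci_upper), 4)
```

The results should agree with the printed p.value, ci.lower, and ci.upper for D up to rounding of the inputs.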

## Details

This function performs two-stage least squares estimation to fit instrumental variables regression. The syntax is similar to that in ivreg from the AER package. Regressors and instruments should be specified in a two-part formula, such as y ~ x1 + x2 | z1 + z2 + z3, where x1 and x2 are regressors and z1, z2, and z3 are instruments. Unlike ivreg, you must explicitly specify all exogenous regressors on both sides of the bar.

The default variance estimators are the same as in lm_robust. Without clusters, we default to HC2 standard errors, and with clusters we default to CR2 standard errors. 2SLS variance estimates are computed using the same estimators as in lm_robust; however, the design matrix used is the matrix of second-stage regressors, which includes the estimated endogenous regressors, and the residuals used are the difference between the outcome and a fit produced by the second-stage coefficients and the first-stage (endogenous) regressors. More notes on this can be found in the mathematical appendix.
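The description above can be sketched by hand in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the simulated data and every variable name are made up for the example, the residuals are taken to use the observed (not fitted) endogenous regressor, which is the usual 2SLS convention, and the HC2 formula is the textbook sandwich applied to the second-stage design matrix.

```r
# Simulated data; all names and parameter values are illustrative assumptions.
set.seed(1)
n <- 200
z <- rbinom(n, 1, 0.5)              # instrument
x <- rnorm(n)                       # exogenous covariate
u <- rnorm(n)                       # unobserved confounder
d <- 0.8 * z + 0.5 * u + rnorm(n)   # endogenous regressor
y <- 1 + 2 * d + 0.5 * x + u + rnorm(n)

# First stage: fit the endogenous regressor on the instrument and covariate.
d_hat <- fitted(lm(d ~ z + x))

# Second-stage design matrix (fitted d) and the observed-regressor matrix.
X_hat <- cbind(`(Intercept)` = 1, d = d_hat, x = x)
X     <- cbind(`(Intercept)` = 1, d = d,     x = x)

# Second-stage coefficients.
XtX_inv <- solve(crossprod(X_hat))
beta    <- drop(XtX_inv %*% crossprod(X_hat, y))

# Residuals: second-stage coefficients applied to the observed regressors.
e <- drop(y - X %*% beta)

# HC2-style sandwich using leverage from the second-stage design matrix.
h        <- rowSums((X_hat %*% XtX_inv) * X_hat)
meat     <- crossprod(X_hat * (e / sqrt(1 - h)))
vcov_hc2 <- XtX_inv %*% meat %*% XtX_inv
se_hc2   <- sqrt(diag(vcov_hc2))

cbind(estimate = beta, std.error = se_hc2)
```

Because this model is just-identified, the hand-computed coefficients coincide with the closed-form IV estimator $$(Z'X)^{-1}Z'y$$, which gives a quick sanity check on the two-stage procedure.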

## Examples

library(fabricatr)
library(estimatr)
dat <- fabricate(
  N = 40,
  Y = rpois(N, lambda = 4),
  Z = rbinom(N, 1, prob = 0.4),
  D = Z * rbinom(N, 1, prob = 0.8),
  X = rnorm(N)
)

# Instrument for treatment D with encouragement Z
tidy(iv_robust(Y ~ D + X | Z + X, data = dat))
#>          term   estimate std.error      p.value  ci.lower  ci.upper df outcome
#> 1 (Intercept)  3.7273864 0.4776022 2.471825e-09  2.759673 4.6951003 37       Y
#> 2           D -0.7100564 0.7858244 3.720629e-01 -2.302288 0.8821750 37       Y
#> 3           X  0.1560677 0.3621421 6.690000e-01 -0.577702 0.8898373 37       Y

# Instrument with Stata's `ivregress 2sls, small rob` HC1 variance
tidy(iv_robust(Y ~ D | Z, data = dat, se_type = "stata"))
#>          term   estimate std.error      p.value  ci.lower  ci.upper df outcome
#> 1 (Intercept)  3.6666667 0.4700241 2.083704e-09  2.715153 4.6181808 38       Y
#> 2           D -0.6140351 0.7436379 4.141183e-01 -2.119451 0.8913811 38       Y

# With clusters, we use CR2 errors by default
dat$cl <- rep(letters[1:5], length.out = nrow(dat))
tidy(iv_robust(Y ~ D | Z, data = dat, clusters = cl))
#>          term   estimate std.error      p.value  ci.lower  ci.upper       df
#> 1 (Intercept)  3.6666667 0.2317241 0.0001712102  2.997917 4.3354161 3.646251
#> 2           D -0.6140351 0.4874346 0.2764953738 -1.969373 0.7413026 3.985068
#>   outcome
#> 1       Y
#> 2       Y

# Again, easy to replicate Stata (again with small correction in Stata)
tidy(iv_robust(Y ~ D | Z, data = dat, clusters = cl, se_type = "stata"))
#>          term   estimate std.error      p.value  ci.lower  ci.upper df outcome
#> 1 (Intercept)  3.6666667 0.2391517 0.0001055703  3.002675 4.3306582  4       Y
#> 2           D -0.6140351 0.5047569 0.2906673250 -2.015465 0.7873946  4       Y