Causal Inference
Econ 570 Big Data Econometrics - 2nd TA session
The official website for Causalinference is
https://causalinferenceinpython.org
For an overview of the main features and uses of Causalinference, please refer to
https://github.com/laurencium/causalinference/blob/master/docs/tex/vignette.pdf
A blog dedicated to providing a more detailed walkthrough of Causalinference and the econometric theory behind it can be found at
https://laurencewong.com/software/
Guide document
from https://github.com/laurencium/causalinference/blob/master/docs/tex/vignette.pdf
- Setting and Notation
Let $Y(0)$ denote the potential outcome of a subject in the absence of treatment, and let $Y(1)$ denote the unit's potential outcome when it is treated. Let $D$ denote treatment status, with $D = 1$ indicating treatment and $D = 0$ indicating control, and let $X$ be a $K$-dimensional column vector of covariates or individual characteristics.
For unit $i, i = 1, 2, … ,N$, the observed outcome can be written as
\[Y_{i} = (1 - D_{i})Y_{i}(0) + D_{i}Y_{i}(1).\]
The set of observables $(Y_{i}, D_{i}, X_{i}), i = 1, 2, \ldots, N$, forms the basic input data set for Causalinference.
Causalinference is appropriate for settings in which treatment can be said to be strongly ignorable, as defined in Rosenbaum and Rubin (1983). That is, for all $x$ in the support of $X$, we have
(i) Unconfoundedness: $D$ is independent of $(Y(0), Y(1))$ conditional on $X = x$;
(ii) Overlap: $c < P(D = 1 | X = x) < 1 - c$, for some $c > 0$.
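The switching equation above can be illustrated with a small standalone simulation (a sketch with NumPy; the data-generating values are arbitrary and hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6

# Hypothetical potential outcomes and treatment indicators
Y0 = rng.normal(size=N)          # outcome if untreated
Y1 = Y0 + 3.0                    # outcome if treated (constant effect of 3)
D = rng.integers(0, 2, size=N)   # treatment status

# Observed outcome: Y_i = (1 - D_i) * Y_i(0) + D_i * Y_i(1)
Y = (1 - D) * Y0 + D * Y1

# Only one potential outcome is ever observed for each unit:
# treated units reveal Y(1), control units reveal Y(0)
```

This is the fundamental problem of causal inference: the unit-level effect $Y_i(1) - Y_i(0)$ is never directly observed, which is why the estimators below target averages of it.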
- Initialization
from causalinference import CausalModel
from causalinference.utils import random_data
Y, D, X = random_data()
causal = CausalModel(Y, D, X)
help(random_data)
Help on function random_data in module causalinference.utils.tools:
random_data(N=5000, K=3, unobservables=False, **kwargs)
Function that generates data according to one of two simple models that
satisfies the unconfoundedness assumption.
The covariates and error terms are generated according to
X ~ N(mu, Sigma), epsilon ~ N(0, Gamma).
The counterfactual outcomes are generated by
Y0 = X*beta + epsilon_0,
Y1 = delta + X*(beta+theta) + epsilon_1.
Selection is done according to the following propensity score function:
P(D=1|X) = Lambda(X*beta).
Here Lambda is the standard logistic CDF.
Parameters
----------
N: int
Number of units to draw. Defaults to 5000.
K: int
Number of covariates. Defaults to 3.
unobservables: bool
Returns potential outcomes and true propensity score
in addition to observed outcome and covariates if True.
Defaults to False.
mu, Sigma, Gamma, beta, delta, theta: NumPy ndarrays, optional
Parameter values appearing in data generating process.
Returns
-------
tuple
A tuple in the form of (Y, D, X) or (Y, D, X, Y0, Y1, pscore) of
observed outcomes, treatment indicators, covariate matrix,
potential outcomes, and true propensity scores.
import numpy as np
from scipy.stats import logistic  # provides the logistic CDF used below
def random_data(N=5000, K=3, unobservables=False, **kwargs):
"""
Function that generates data according to one of two simple models that
satisfies the unconfoundedness assumption.
The covariates and error terms are generated according to
X ~ N(mu, Sigma), epsilon ~ N(0, Gamma).
The counterfactual outcomes are generated by
Y0 = X*beta + epsilon_0,
Y1 = delta + X*(beta+theta) + epsilon_1.
Selection is done according to the following propensity score function:
P(D=1|X) = Lambda(X*beta).
Here Lambda is the standard logistic CDF.
Parameters
----------
N: int
Number of units to draw. Defaults to 5000.
K: int
Number of covariates. Defaults to 3.
unobservables: bool
Returns potential outcomes and true propensity score
in addition to observed outcome and covariates if True.
Defaults to False.
mu, Sigma, Gamma, beta, delta, theta: NumPy ndarrays, optional
Parameter values appearing in data generating process.
Returns
-------
tuple
A tuple in the form of (Y, D, X) or (Y, D, X, Y0, Y1, pscore) of
observed outcomes, treatment indicators, covariate matrix,
potential outcomes, and true propensity scores.
"""
mu = kwargs.get('mu', np.zeros(K))
beta = kwargs.get('beta', np.ones(K))
theta = kwargs.get('theta', np.ones(K))
delta = kwargs.get('delta', 3)
Sigma = kwargs.get('Sigma', np.identity(K))
Gamma = kwargs.get('Gamma', np.identity(2))
X = np.random.multivariate_normal(mean=mu, cov=Sigma, size=N)
Xbeta = X.dot(beta)
pscore = logistic.cdf(Xbeta)
D = np.random.binomial(1, pscore)  # vectorized: one Bernoulli draw per unit
epsilon = np.random.multivariate_normal(mean=np.zeros(2), cov=Gamma, size=N)
Y0 = Xbeta + epsilon[:,0]
Y1 = delta + X.dot(beta+theta) + epsilon[:,1]
Y = (1-D)*Y0 + D*Y1
if unobservables:
return Y, D, X, Y0, Y1, pscore
else:
return Y, D, X
help(CausalModel)
Help on class CausalModel in module causalinference.causal:
class CausalModel(builtins.object)
| CausalModel(Y, D, X)
|
| Class that provides the main tools of Causal Inference.
|
| Methods defined here:
|
| __init__(self, Y, D, X)
| Initialize self. See help(type(self)) for accurate signature.
|
| est_propensity(self, lin='all', qua=None)
| Estimates the propensity scores given list of covariates to
| include linearly or quadratically.
|
| The propensity score is the conditional probability of
| receiving the treatment given the observed covariates.
| Estimation is done via a logistic regression.
|
| Parameters
| ----------
| lin: string or list, optional
| Column numbers (zero-based) of variables of
| the original covariate matrix X to include
| linearly. Defaults to the string 'all', which
| uses whole covariate matrix.
| qua: list, optional
| Tuples indicating which columns of the original
| covariate matrix to multiply and include. E.g.,
| [(1,1), (2,3)] indicates squaring the 2nd column
| and including the product of the 3rd and 4th
| columns. Default is to not include any
| quadratic terms.
|
| est_propensity_s(self, lin_B=None, C_lin=1, C_qua=2.71)
| Estimates the propensity score with covariates selected using
| the algorithm suggested by [1]_.
|
| The propensity score is the conditional probability of
| receiving the treatment given the observed covariates.
| Estimation is done via a logistic regression.
|
| The covariate selection algorithm is based on a sequence
| of likelihood ratio tests.
|
| Parameters
| ----------
| lin_B: list, optional
| Column numbers (zero-based) of variables of
| the original covariate matrix X to include
| linearly. Defaults to empty list, meaning
| every column of X is subjected to the
| selection algorithm.
| C_lin: scalar, optional
| Critical value used in likelihood ratio tests
| to decide whether candidate linear terms should
| be included. Defaults to 1 as in [1]_.
| C_qua: scalar, optional
| Critical value used in likelihood ratio tests
| to decide whether candidate quadratic terms
| should be included. Defaults to 2.71 as in
| [1]_.
|
| References
| ----------
| .. [1] Imbens, G. & Rubin, D. (2015). Causal Inference in
| Statistics, Social, and Biomedical Sciences: An
| Introduction.
|
| est_via_blocking(self, adj=1)
| Estimates average treatment effects using regression within
| blocks.
|
| This method should only be executed after the sample has been
| stratified.
|
| Parameters
| ----------
| adj: int (0, 1, or 2)
| Indicates how covariate adjustments are to be
| performed for each within-bin regression.
| Set adj = 0 to not include any covariates.
| Set adj = 1 to include treatment indicator D
| and covariates X separately. Set adj = 2 to
| additionally include interaction terms between
| D and X. Defaults to 1.
|
| est_via_matching(self, weights='inv', matches=1, bias_adj=False)
 |      Estimates average treatment effects using nearest-
 |      neighbor matching.
|
 |      Matching is done with replacement. The method supports multiple
 |      matches per unit. Correcting the bias that arises due to imperfect
 |      matches is also supported. For details on the methodology, see [1]_.
|
| Parameters
| ----------
| weights: str or positive definite square matrix
| Specifies weighting matrix used in computing
| distance measures. Defaults to string 'inv',
| which does inverse variance weighting. String
| 'maha' gives the weighting matrix used in the
| Mahalanobis metric.
| matches: int
| Number of matches to use for each subject.
| bias_adj: bool
| Specifies whether bias adjustments should be
| attempted.
|
| References
| ----------
| .. [1] Imbens, G. & Rubin, D. (2015). Causal Inference in
| Statistics, Social, and Biomedical Sciences: An
| Introduction.
|
| est_via_ols(self, adj=2)
| Estimates average treatment effects using least squares.
|
| Parameters
| ----------
| adj: int (0, 1, or 2)
| Indicates how covariate adjustments are to be
| performed. Set adj = 0 to not include any
| covariates. Set adj = 1 to include treatment
| indicator D and covariates X separately. Set
| adj = 2 to additionally include interaction
| terms between D and X. Defaults to 2.
|
| est_via_weighting(self)
| Estimates average treatment effects using doubly-robust
| version of the Horvitz-Thompson weighting estimator.
|
| reset(self)
| Reinitializes data to original inputs, and drops any estimated
| results.
|
| stratify(self)
| Stratifies the sample based on propensity score.
|
| By default the sample is divided into five equal-sized bins.
| The number of bins can be set by modifying the object
| attribute named blocks. Alternatively, custom-sized bins can
| be created by setting blocks equal to a sorted list of numbers
| between 0 and 1 indicating the bin boundaries.
|
| This method should only be executed after the propensity score
| has been estimated.
|
| stratify_s(self)
| Stratifies the sample based on propensity score using the
| bin selection procedure suggested by [1]_.
|
| The bin selection algorithm is based on a sequence of
| two-sample t tests performed on the log-odds ratio.
|
| This method should only be executed after the propensity score
| has been estimated.
|
| References
| ----------
| .. [1] Imbens, G. & Rubin, D. (2015). Causal Inference in
| Statistics, Social, and Biomedical Sciences: An
| Introduction.
|
| trim(self)
| Trims data based on propensity score to create a subsample with
| better covariate balance.
|
| The default cutoff value is set to 0.1. To set a custom cutoff
| value, modify the object attribute named cutoff directly.
|
| This method should only be executed after the propensity score
| has been estimated.
|
| trim_s(self)
| Trims data based on propensity score using the cutoff
| selection algorithm suggested by [1]_.
|
| This method should only be executed after the propensity score
| has been estimated.
|
| References
| ----------
| .. [1] Crump, R., Hotz, V., Imbens, G., & Mitnik, O. (2009).
| Dealing with Limited Overlap in Estimation of
| Average Treatment Effects. Biometrika, 96, 187-199.
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __dict__
| dictionary for instance variables (if defined)
|
| __weakref__
| list of weak references to the object (if defined)
- Summary Statistics
print(causal.summary_stats)
Summary Statistics
Controls (N_c=2476) Treated (N_t=2524)
Variable Mean S.d. Mean S.d. Raw-diff
--------------------------------------------------------------------------------
Y -0.981 1.758 4.907 2.969 5.888
Controls (N_c=2476) Treated (N_t=2524)
Variable Mean S.d. Mean S.d. Nor-diff
--------------------------------------------------------------------------------
X0 -0.370 0.945 0.353 0.925 0.773
X1 -0.305 0.932 0.298 0.951 0.640
X2 -0.316 0.937 0.299 0.965 0.646
causal.summary_stats.keys()
dict_keys(['N', 'K', 'N_c', 'N_t', 'Y_c_mean', 'Y_t_mean', 'Y_c_sd', 'Y_t_sd', 'rdiff', 'X_c_mean', 'X_t_mean', 'X_c_sd', 'X_t_sd', 'ndiff'])
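The normalized difference (ndiff) reported for the covariates is conventionally defined as the mean difference scaled by the pooled standard deviation, i.e. $(\bar{X}_t - \bar{X}_c)/\sqrt{(s_t^2 + s_c^2)/2}$. A standalone sketch of that computation on simulated covariate draws (this is the common Imbens-Rubin definition, not the package's internal code):

```python
import numpy as np

def normalized_diff(x_c, x_t):
    """Normalized difference: (mean_t - mean_c) / sqrt((s_t^2 + s_c^2) / 2)."""
    return (x_t.mean() - x_c.mean()) / np.sqrt(
        (x_t.var(ddof=1) + x_c.var(ddof=1)) / 2
    )

rng = np.random.default_rng(1)
x_c = rng.normal(-0.3, 1.0, size=2000)  # simulated control covariate values
x_t = rng.normal(0.3, 1.0, size=2000)   # simulated treated covariate values

nd = normalized_diff(x_c, x_t)
# with a mean gap of 0.6 and unit variances, nd should land near 0.6
```

Unlike a t-statistic, this measure does not grow with the sample size, which makes it a better diagnostic of covariate imbalance.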
- Least Squares Estimation
causal.est_via_ols()
print(causal.estimates)
Treatment Effect Estimates: OLS
Est. S.e. z P>|z| [95% Conf. int.]
--------------------------------------------------------------------------------
ATE 2.960 0.035 84.511 0.000 2.891 3.029
ATC 1.992 0.041 48.857 0.000 1.912 2.072
ATT 3.909 0.039 99.022 0.000 3.832 3.987
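Under the default adj=2, the regression includes the treatment indicator, the covariates, and interactions of the indicator with the demeaned covariates, so the coefficient on $D$ can be read as the ATE. A minimal sketch of that specification using plain NumPy least squares on simulated data (an illustration of the idea, not the package's implementation):

```python
import numpy as np

rng = np.random.default_rng(2)
N, true_ate = 5000, 2.0
X = rng.normal(size=(N, 1))
D = rng.integers(0, 2, size=N)
# Outcome with a heterogeneous effect: 2 + 0.5*X for treated units
Y = true_ate * D + X[:, 0] + 0.5 * D * X[:, 0] + rng.normal(size=N)

# Design matrix: intercept, D, X, and D * (X - mean(X))
Xc = X - X.mean(axis=0)
Z = np.column_stack([np.ones(N), D, X, D[:, None] * Xc])
coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)

ate_hat = coef[1]  # coefficient on D estimates the ATE (here, about 2.0)
```

Demeaning the covariates in the interaction term is what makes the $D$ coefficient equal to the average, rather than the at-$X=0$, treatment effect.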
- Propensity Score Estimation
help(causal.est_propensity_s) # lin_B=[5,6]
Help on method est_propensity_s in module causalinference.causal:
est_propensity_s(lin_B=None, C_lin=1, C_qua=2.71) method of causalinference.causal.CausalModel instance
Estimates the propensity score with covariates selected using
the algorithm suggested by [1]_.
The propensity score is the conditional probability of
receiving the treatment given the observed covariates.
Estimation is done via a logistic regression.
The covariate selection algorithm is based on a sequence
of likelihood ratio tests.
Parameters
----------
lin_B: list, optional
Column numbers (zero-based) of variables of
the original covariate matrix X to include
linearly. Defaults to empty list, meaning
every column of X is subjected to the
selection algorithm.
C_lin: scalar, optional
Critical value used in likelihood ratio tests
to decide whether candidate linear terms should
be included. Defaults to 1 as in [1]_.
C_qua: scalar, optional
Critical value used in likelihood ratio tests
to decide whether candidate quadratic terms
should be included. Defaults to 2.71 as in
[1]_.
References
----------
.. [1] Imbens, G. & Rubin, D. (2015). Causal Inference in
Statistics, Social, and Biomedical Sciences: An
Introduction.
causal.est_propensity_s()
print(causal.propensity)
Estimated Parameters of Propensity Score
Coef. S.e. z P>|z| [95% Conf. int.]
--------------------------------------------------------------------------------
Intercept -0.011 0.043 -0.269 0.788 -0.095 0.072
X0 1.112 0.042 26.228 0.000 1.029 1.195
X1 0.978 0.041 23.666 0.000 0.897 1.059
X2 0.962 0.041 23.549 0.000 0.882 1.042
X2*X2 0.052 0.029 1.785 0.074 -0.005 0.110
causal.propensity.keys()
dict_keys(['lin', 'qua', 'coef', 'loglike', 'fitted', 'se'])
- Improving Covariate Balance
causal.cutoff #default: 0.1
0.1
causal.trim_s() #optimal cutoff
causal.cutoff
0.09833721388327077
print(causal.summary_stats)
Summary Statistics
Controls (N_c=1986) Treated (N_t=1989)
Variable Mean S.d. Mean S.d. Raw-diff
--------------------------------------------------------------------------------
Y -0.533 1.442 4.032 2.203 4.565
Controls (N_c=1986) Treated (N_t=1989)
Variable Mean S.d. Mean S.d. Nor-diff
--------------------------------------------------------------------------------
X0 -0.205 0.861 0.205 0.851 0.478
X1 -0.172 0.886 0.160 0.898 0.372
X2 -0.180 0.894 0.148 0.898 0.367
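The trimming rule itself is simple: drop units whose estimated propensity score lies outside (cutoff, 1 - cutoff). A standalone sketch of that rule (uniform draws stand in for estimated propensity scores; trim_s additionally chooses the cutoff via the Crump et al. algorithm):

```python
import numpy as np

rng = np.random.default_rng(3)
pscore = rng.uniform(0, 1, size=1000)  # stand-in for estimated propensity scores
cutoff = 0.1

# Keep units with cutoff < pscore < 1 - cutoff
keep = (pscore > cutoff) & (pscore < 1 - cutoff)
# roughly 80% of a uniform sample survives a 0.1 cutoff
```

Units with scores near 0 or 1 have essentially no comparable counterparts in the other group, so dropping them improves balance at the cost of redefining the estimand to the trimmed subpopulation.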
- Stratifying the Sample
causal.stratify_s()
print(causal.strata)
Stratification Summary
Propensity Score Sample Size Ave. Propensity Outcome
Stratum Min. Max. Controls Treated Controls Treated Raw-diff
--------------------------------------------------------------------------------
1 0.098 0.141 228 22 0.118 0.120 1.198
2 0.141 0.191 205 43 0.164 0.167 1.427
3 0.192 0.290 385 112 0.240 0.238 1.670
4 0.290 0.402 324 173 0.343 0.348 2.338
5 0.402 0.510 275 221 0.454 0.456 2.754
6 0.511 0.612 204 293 0.559 0.563 3.175
7 0.612 0.714 173 324 0.664 0.665 3.553
8 0.714 0.808 119 378 0.759 0.761 4.080
9 0.809 0.902 73 423 0.852 0.858 4.891
# Estimate the within-bin treatment effect for each stratum via OLS
for stratum in causal.strata:
stratum.est_via_ols(adj=1)
[stratum.estimates['ols']['ate'] for stratum in causal.strata]
[1.1621855864527622,
1.397114235686305,
1.6744682989028234,
2.318619817003348,
2.7066270031865116,
3.143493196600958,
3.564068849326938,
4.079812390028236,
4.8196992785822]
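The blocking estimator combines these within-stratum estimates into an overall ATE by weighting each stratum by its share of the sample. A sketch using the per-stratum ATEs (rounded from the list above) and the bin sizes (controls + treated) from the stratification summary table:

```python
# Per-stratum ATEs and stratum sizes, copied from the output above
ates = [1.1622, 1.3971, 1.6745, 2.3186, 2.7066, 3.1435, 3.5641, 4.0798, 4.8197]
sizes = [250, 248, 497, 497, 496, 497, 497, 497, 496]

# Sample-size-weighted average of the within-stratum ATEs
ate_blocking = sum(a * n for a, n in zip(ates, sizes)) / sum(sizes)
```

This weighted average is very close to the Blocking ATE of 2.947 that est_via_blocking reports in the next section.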
- Treatment Effect Estimation
causal.est_via_ols()
causal.est_via_weighting()
causal.est_via_blocking()
causal.est_via_matching(bias_adj=True)
print(causal.estimates)
Treatment Effect Estimates: OLS
Est. S.e. z P>|z| [95% Conf. int.]
--------------------------------------------------------------------------------
ATE 2.938 0.037 79.193 0.000 2.866 3.011
ATC 2.391 0.042 56.812 0.000 2.309 2.474
ATT 3.485 0.040 87.042 0.000 3.406 3.563
Treatment Effect Estimates: Weighting
Est. S.e. z P>|z| [95% Conf. int.]
--------------------------------------------------------------------------------
ATE 2.953 0.043 68.165 0.000 2.868 3.038
Treatment Effect Estimates: Blocking
Est. S.e. z P>|z| [95% Conf. int.]
--------------------------------------------------------------------------------
ATE 2.947 0.039 74.884 0.000 2.870 3.025
ATC 2.410 0.048 50.531 0.000 2.317 2.504
ATT 3.484 0.042 82.390 0.000 3.401 3.567
Treatment Effect Estimates: Matching
Est. S.e. z P>|z| [95% Conf. int.]
--------------------------------------------------------------------------------
ATE 2.996 0.075 40.175 0.000 2.849 3.142
ATC 2.455 0.091 27.104 0.000 2.277 2.632
ATT 3.536 0.087 40.549 0.000 3.365 3.707
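The weighting estimator used above is a doubly robust variant; the basic Horvitz-Thompson inverse-probability-weighting idea it builds on can be sketched on simulated data where the true propensity score is known (a standalone illustration, not the package's implementation):

```python
import numpy as np

rng = np.random.default_rng(4)
N, true_ate = 20000, 3.0
X = rng.normal(size=N)
pscore = 1 / (1 + np.exp(-X))   # true propensity: logistic in X
D = rng.binomial(1, pscore)     # selection into treatment depends on X
Y = true_ate * D + X + rng.normal(size=N)

# Horvitz-Thompson IPW estimate of the ATE:
# weight treated outcomes by 1/p and control outcomes by 1/(1-p)
ate_ipw = np.mean(D * Y / pscore - (1 - D) * Y / (1 - pscore))
```

In practice the propensity score must itself be estimated, and the weights can blow up when scores are near 0 or 1, which is why the package pairs weighting with regression adjustment (the doubly robust version) and offers trimming beforehand.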
help(causal.est_via_matching)
Help on method est_via_matching in module causalinference.causal:
est_via_matching(weights='inv', matches=1, bias_adj=False) method of causalinference.causal.CausalModel instance
    Estimates average treatment effects using nearest-neighbor matching.
    Matching is done with replacement. The method supports multiple
    matches per unit. Correcting the bias that arises due to imperfect
    matches is also supported. For details on the methodology, see [1]_.
Parameters
----------
weights: str or positive definite square matrix
Specifies weighting matrix used in computing
distance measures. Defaults to string 'inv',
which does inverse variance weighting. String
'maha' gives the weighting matrix used in the
Mahalanobis metric.
matches: int
Number of matches to use for each subject.
bias_adj: bool
Specifies whether bias adjustments should be
attempted.
References
----------
.. [1] Imbens, G. & Rubin, D. (2015). Causal Inference in
Statistics, Social, and Biomedical Sciences: An
Introduction.