Causal Inference

12 minute read

Econ 570 Big Data Econometrics - 2nd TA session

Causal Inference

The official website for Causalinference is

https://causalinferenceinpython.org

For an overview of the main features and uses of Causalinference, please refer to

https://github.com/laurencium/causalinference/blob/master/docs/tex/vignette.pdf

A blog dedicated to providing a more detailed walkthrough of Causalinference and the econometric theory behind it can be found at

https://laurencewong.com/software/


Guide document

from https://github.com/laurencium/causalinference/blob/master/docs/tex/vignette.pdf


  1. Setting and Notation

Let $Y(0)$ denote the potential outcome of a unit in the absence of treatment, and let $Y(1)$ denote its potential outcome when treated. Let $D$ denote treatment status, with $D = 1$ indicating treatment and $D = 0$ indicating control, and let $X$ be a $K$-dimensional vector of covariates or individual characteristics.

For unit $i, i = 1, 2, … ,N$, the observed outcome can be written as

\[Y_{i} = (1 - D_{i})Y_{i}(0) + D_{i}Y_{i}(1).\]

The set of observables $(Y_{i},D_{i},X_{i}), i = 1, 2, … ,N$, forms the basic input data set for Causalinference.
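As a concrete illustration of the switching equation, consider three hypothetical units (numbers invented purely for exposition):

import numpy as np

# Toy illustration of Y = (1 - D)*Y(0) + D*Y(1) with three units
Y0 = np.array([1.0, 2.0, 3.0])   # potential outcomes without treatment
Y1 = np.array([2.5, 2.0, 5.0])   # potential outcomes under treatment
D = np.array([0, 1, 1])          # observed treatment status
Y = (1 - D)*Y0 + D*Y1            # observed outcomes: [1.0, 2.0, 5.0]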

Causalinference is appropriate for settings in which treatment can be said to be strongly ignorable, as defined in Rosenbaum and Rubin (1983). That is, for all $x$ in the support of $X$, we have

(i) Unconfoundedness: $D$ is independent of $(Y(0), Y(1))$ conditional on $X = x$;

(ii) Overlap: $c < P(D = 1 \mid X = x) < 1 - c$, for some $c > 0$.
  2. Initialization
from causalinference import CausalModel
from causalinference.utils import random_data
Y, D, X = random_data()
causal = CausalModel(Y, D, X)
help(random_data)
Help on function random_data in module causalinference.utils.tools:

random_data(N=5000, K=3, unobservables=False, **kwargs)
    Function that generates data according to one of two simple models that
    satisfies the unconfoundedness assumption.
    
    The covariates and error terms are generated according to
            X ~ N(mu, Sigma), epsilon ~ N(0, Gamma).
    
    The counterfactual outcomes are generated by
            Y0 = X*beta + epsilon_0,
            Y1 = delta + X*(beta+theta) + epsilon_1.
    
    Selection is done according to the following propensity score function:
            P(D=1|X) = Lambda(X*beta).
    
    Here Lambda is the standard logistic CDF.
    
    Parameters
    ----------
    N: int
            Number of units to draw. Defaults to 5000.
    K: int
            Number of covariates. Defaults to 3.
    unobservables: bool
            Returns potential outcomes and true propensity score
            in addition to observed outcome and covariates if True.
            Defaults to False.
    mu, Sigma, Gamma, beta, delta, theta: NumPy ndarrays, optional
            Parameter values appearing in data generating process.
    
    Returns
    -------
    tuple
            A tuple in the form of (Y, D, X) or (Y, D, X, Y0, Y1) of
            observed outcomes, treatment indicators, covariate matrix,
            and potential outomces.
For reference, here is the source of random_data as reproduced from the package. Note that it relies on SciPy's logistic CDF, so the corresponding import is needed alongside NumPy.

import numpy as np
from scipy.stats import logistic

def random_data(N=5000, K=3, unobservables=False, **kwargs):

	"""
	Function that generates data according to one of two simple models that
	satisfies the unconfoundedness assumption.
	The covariates and error terms are generated according to
		X ~ N(mu, Sigma), epsilon ~ N(0, Gamma).
	The counterfactual outcomes are generated by
		Y0 = X*beta + epsilon_0,
		Y1 = delta + X*(beta+theta) + epsilon_1.
	Selection is done according to the following propensity score function:
		P(D=1|X) = Lambda(X*beta).
	Here Lambda is the standard logistic CDF.
	Parameters
	----------
	N: int
		Number of units to draw. Defaults to 5000.
	K: int
		Number of covariates. Defaults to 3.
	unobservables: bool
		Returns potential outcomes and true propensity score
		in addition to observed outcome and covariates if True.
		Defaults to False.
	mu, Sigma, Gamma, beta, delta, theta: NumPy ndarrays, optional
		Parameter values appearing in data generating process.
	Returns
	-------
	tuple
		A tuple in the form of (Y, D, X) or
		(Y, D, X, Y0, Y1, pscore) of observed outcomes,
		treatment indicators, covariate matrix, potential
		outcomes, and true propensity scores.
	"""

	mu = kwargs.get('mu', np.zeros(K))
	beta = kwargs.get('beta', np.ones(K))
	theta = kwargs.get('theta', np.ones(K))
	delta = kwargs.get('delta', 3)
	Sigma = kwargs.get('Sigma', np.identity(K))
	Gamma = kwargs.get('Gamma', np.identity(2))

	X = np.random.multivariate_normal(mean=mu, cov=Sigma, size=N)
	Xbeta = X.dot(beta)
	pscore = logistic.cdf(Xbeta)
	D = np.array([np.random.binomial(1, p, size=1) for p in pscore]).flatten()

	epsilon = np.random.multivariate_normal(mean=np.zeros(2), cov=Gamma, size=N)
	Y0 = Xbeta + epsilon[:,0]
	Y1 = delta + X.dot(beta+theta) + epsilon[:,1]
	Y = (1-D)*Y0 + D*Y1

	if unobservables:
		return Y, D, X, Y0, Y1, pscore
	else:
		return Y, D, X
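Because random_data can also return the potential outcomes (with unobservables=True), the true treatment effect in the simulated data can be computed directly and used as a benchmark for the estimators below. A minimal sketch, following the return signature of the source reproduced above:

# Draw a fresh sample that also exposes the potential outcomes and true propensity scores
Y2, D2, X2, Y0, Y1, ps = random_data(unobservables=True)

# With the default parameters (mu = 0, delta = 3) the population ATE is 3
print((Y1 - Y0).mean())  # sample analogue, close to 3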
help(CausalModel)
Help on class CausalModel in module causalinference.causal:

class CausalModel(builtins.object)
 |  CausalModel(Y, D, X)
 |  
 |  Class that provides the main tools of Causal Inference.
 |  
 |  Methods defined here:
 |  
 |  __init__(self, Y, D, X)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  est_propensity(self, lin='all', qua=None)
 |      Estimates the propensity scores given list of covariates to
 |      include linearly or quadratically.
 |      
 |      The propensity score is the conditional probability of
 |      receiving the treatment given the observed covariates.
 |      Estimation is done via a logistic regression.
 |      
 |      Parameters
 |      ----------
 |      lin: string or list, optional
 |              Column numbers (zero-based) of variables of
 |              the original covariate matrix X to include
 |              linearly. Defaults to the string 'all', which
 |              uses whole covariate matrix.
 |      qua: list, optional
 |              Tuples indicating which columns of the original
 |              covariate matrix to multiply and include. E.g.,
 |              [(1,1), (2,3)] indicates squaring the 2nd column
 |              and including the product of the 3rd and 4th
 |              columns. Default is to not include any
 |              quadratic terms.
 |  
 |  est_propensity_s(self, lin_B=None, C_lin=1, C_qua=2.71)
 |      Estimates the propensity score with covariates selected using
 |      the algorithm suggested by [1]_.
 |      
 |      The propensity score is the conditional probability of
 |      receiving the treatment given the observed covariates.
 |      Estimation is done via a logistic regression.
 |      
 |      The covariate selection algorithm is based on a sequence
 |      of likelihood ratio tests.
 |      
 |      Parameters
 |      ----------
 |      lin_B: list, optional
 |              Column numbers (zero-based) of variables of
 |              the original covariate matrix X to include
 |              linearly. Defaults to empty list, meaning
 |              every column of X is subjected to the
 |              selection algorithm.
 |      C_lin: scalar, optional
 |              Critical value used in likelihood ratio tests
 |              to decide whether candidate linear terms should
 |              be included. Defaults to 1 as in [1]_.
 |      C_qua: scalar, optional
 |              Critical value used in likelihood ratio tests
 |              to decide whether candidate quadratic terms
 |              should be included. Defaults to 2.71 as in
 |              [1]_.
 |      
 |      References
 |      ----------
 |      .. [1] Imbens, G. & Rubin, D. (2015). Causal Inference in
 |              Statistics, Social, and Biomedical Sciences: An
 |              Introduction.
 |  
 |  est_via_blocking(self, adj=1)
 |      Estimates average treatment effects using regression within
 |      blocks.
 |      
 |      This method should only be executed after the sample has been
 |      stratified.
 |      
 |      Parameters
 |      ----------
 |      adj: int (0, 1, or 2)
 |              Indicates how covariate adjustments are to be
 |              performed for each within-bin regression.
 |              Set adj = 0 to not include any covariates.
 |              Set adj = 1 to include treatment indicator D
 |              and covariates X separately. Set adj = 2 to
 |              additionally include interaction terms between
 |              D and X. Defaults to 1.
 |  
 |  est_via_matching(self, weights='inv', matches=1, bias_adj=False)
 |      Estimates average treatment effects using nearest-
 |      neighborhood matching.
 |      
 |      Matching is done with replacement. Method supports multiple
 |      matching. Correcting bias that arise due to imperfect matches
 |      is also supported. For details on methodology, see [1]_.
 |      
 |      Parameters
 |      ----------
 |      weights: str or positive definite square matrix
 |              Specifies weighting matrix used in computing
 |              distance measures. Defaults to string 'inv',
 |              which does inverse variance weighting. String
 |              'maha' gives the weighting matrix used in the
 |              Mahalanobis metric.
 |      matches: int
 |              Number of matches to use for each subject.
 |      bias_adj: bool
 |              Specifies whether bias adjustments should be
 |              attempted.
 |      
 |      References
 |      ----------
 |      .. [1] Imbens, G. & Rubin, D. (2015). Causal Inference in
 |              Statistics, Social, and Biomedical Sciences: An
 |              Introduction.
 |  
 |  est_via_ols(self, adj=2)
 |      Estimates average treatment effects using least squares.
 |      
 |      Parameters
 |      ----------
 |      adj: int (0, 1, or 2)
 |              Indicates how covariate adjustments are to be
 |              performed. Set adj = 0 to not include any
 |              covariates.  Set adj = 1 to include treatment
 |              indicator D and covariates X separately. Set
 |              adj = 2 to additionally include interaction
 |              terms between D and X. Defaults to 2.
 |  
 |  est_via_weighting(self)
 |      Estimates average treatment effects using doubly-robust
 |      version of the Horvitz-Thompson weighting estimator.
 |  
 |  reset(self)
 |      Reinitializes data to original inputs, and drops any estimated
 |      results.
 |  
 |  stratify(self)
 |      Stratifies the sample based on propensity score.
 |      
 |      By default the sample is divided into five equal-sized bins.
 |      The number of bins can be set by modifying the object
 |      attribute named blocks. Alternatively, custom-sized bins can
 |      be created by setting blocks equal to a sorted list of numbers
 |      between 0 and 1 indicating the bin boundaries.
 |      
 |      This method should only be executed after the propensity score
 |      has been estimated.
 |  
 |  stratify_s(self)
 |      Stratifies the sample based on propensity score using the
 |      bin selection procedure suggested by [1]_.
 |      
 |      The bin selection algorithm is based on a sequence of
 |      two-sample t tests performed on the log-odds ratio.
 |      
 |      This method should only be executed after the propensity score
 |      has been estimated.
 |      
 |      References
 |      ----------
 |      .. [1] Imbens, G. & Rubin, D. (2015). Causal Inference in
 |              Statistics, Social, and Biomedical Sciences: An
 |              Introduction.
 |  
 |  trim(self)
 |      Trims data based on propensity score to create a subsample with
 |      better covariate balance.
 |      
 |      The default cutoff value is set to 0.1. To set a custom cutoff
 |      value, modify the object attribute named cutoff directly.
 |      
 |      This method should only be executed after the propensity score
 |      has been estimated.
 |  
 |  trim_s(self)
 |      Trims data based on propensity score using the cutoff
 |      selection algorithm suggested by [1]_.
 |      
 |      This method should only be executed after the propensity score
 |      has been estimated.
 |      
 |      References
 |      ----------
 |      .. [1] Crump, R., Hotz, V., Imbens, G., & Mitnik, O. (2009).
 |              Dealing with Limited Overlap in Estimation of
 |              Average Treatment Effects. Biometrika, 96, 187-199.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
  3. Summary Statistics
print(causal.summary_stats)
Summary Statistics

                      Controls (N_c=2476)        Treated (N_t=2524)             
       Variable         Mean         S.d.         Mean         S.d.     Raw-diff
--------------------------------------------------------------------------------
              Y       -0.981        1.758        4.907        2.969        5.888

                      Controls (N_c=2476)        Treated (N_t=2524)             
       Variable         Mean         S.d.         Mean         S.d.     Nor-diff
--------------------------------------------------------------------------------
             X0       -0.370        0.945        0.353        0.925        0.773
             X1       -0.305        0.932        0.298        0.951        0.640
             X2       -0.316        0.937        0.299        0.965        0.646
causal.summary_stats.keys()
dict_keys(['N', 'K', 'N_c', 'N_t', 'Y_c_mean', 'Y_t_mean', 'Y_c_sd', 'Y_t_sd', 'rdiff', 'X_c_mean', 'X_t_mean', 'X_c_sd', 'X_t_sd', 'ndiff'])
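The Nor-diff column is the normalized difference, i.e. the difference in covariate means scaled by the pooled standard deviation $\sqrt{(s_t^2 + s_c^2)/2}$. It can be recomputed directly from the summary_stats dictionary:

import numpy as np

# Recompute the normalized differences shown in the Nor-diff column
ss = causal.summary_stats
ndiff = (ss['X_t_mean'] - ss['X_c_mean']) / np.sqrt((ss['X_t_sd']**2 + ss['X_c_sd']**2) / 2)
print(ndiff)  # roughly [0.77, 0.64, 0.65] for this draw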
  4. Least Squares Estimation
causal.est_via_ols()
print(causal.estimates)
Treatment Effect Estimates: OLS

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      2.960      0.035     84.511      0.000      2.891      3.029
           ATC      1.992      0.041     48.857      0.000      1.912      2.072
           ATT      3.909      0.039     99.022      0.000      3.832      3.987
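The printed table can also be accessed programmatically: causal.estimates is dictionary-like and keyed by estimator name, the same interface the per-stratum loop later in this guide relies on. A minimal sketch, assuming the conventional key names:

# Point estimate and standard error from the OLS results
print(causal.estimates['ols']['ate'], causal.estimates['ols']['ate_se'])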
  5. Propensity Score Estimation
help(causal.est_propensity_s)  # lin_B forces the listed columns of X to enter linearly, e.g. lin_B=[5,6] when X has that many columns
Help on method est_propensity_s in module causalinference.causal:

est_propensity_s(lin_B=None, C_lin=1, C_qua=2.71) method of causalinference.causal.CausalModel instance
    Estimates the propensity score with covariates selected using
    the algorithm suggested by [1]_.
    
    The propensity score is the conditional probability of
    receiving the treatment given the observed covariates.
    Estimation is done via a logistic regression.
    
    The covariate selection algorithm is based on a sequence
    of likelihood ratio tests.
    
    Parameters
    ----------
    lin_B: list, optional
            Column numbers (zero-based) of variables of
            the original covariate matrix X to include
            linearly. Defaults to empty list, meaning
            every column of X is subjected to the
            selection algorithm.
    C_lin: scalar, optional
            Critical value used in likelihood ratio tests
            to decide whether candidate linear terms should
            be included. Defaults to 1 as in [1]_.
    C_qua: scalar, optional
            Critical value used in likelihood ratio tests
            to decide whether candidate quadratic terms
            should be included. Defaults to 2.71 as in
            [1]_.
    
    References
    ----------
    .. [1] Imbens, G. & Rubin, D. (2015). Causal Inference in
            Statistics, Social, and Biomedical Sciences: An
            Introduction.
causal.est_propensity_s()
print(causal.propensity)
Estimated Parameters of Propensity Score

                    Coef.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
     Intercept     -0.011      0.043     -0.269      0.788     -0.095      0.072
            X0      1.112      0.042     26.228      0.000      1.029      1.195
            X1      0.978      0.041     23.666      0.000      0.897      1.059
            X2      0.962      0.041     23.549      0.000      0.882      1.042
         X2*X2      0.052      0.029      1.785      0.074     -0.005      0.110
causal.propensity.keys()
dict_keys(['lin', 'qua', 'coef', 'loglike', 'fitted', 'se'])
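The 'fitted' entry holds each unit's estimated propensity score, which is handy for eyeballing the overlap condition from Section 1 before trimming. A minimal sketch, assuming matplotlib is available:

import matplotlib.pyplot as plt

# Overlay the estimated propensity score distributions for treated and control units
pscore = causal.propensity['fitted']
plt.hist(pscore[D == 0], bins=30, alpha=0.5, label='controls')
plt.hist(pscore[D == 1], bins=30, alpha=0.5, label='treated')
plt.legend()
plt.xlabel('estimated propensity score')
plt.show()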
  6. Improving Covariate Balance
causal.cutoff #default: 0.1
0.1
causal.trim_s()  # data-driven cutoff chosen by the Crump et al. (2009) rule
causal.cutoff
0.09833721388327077
print(causal.summary_stats)
Summary Statistics

                      Controls (N_c=1986)        Treated (N_t=1989)             
       Variable         Mean         S.d.         Mean         S.d.     Raw-diff
--------------------------------------------------------------------------------
              Y       -0.533        1.442        4.032        2.203        4.565

                      Controls (N_c=1986)        Treated (N_t=1989)             
       Variable         Mean         S.d.         Mean         S.d.     Nor-diff
--------------------------------------------------------------------------------
             X0       -0.205        0.861        0.205        0.851        0.478
             X1       -0.172        0.886        0.160        0.898        0.372
             X2       -0.180        0.894        0.148        0.898        0.367
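Trimming shrinks the sample but markedly improves balance: the normalized differences drop from about 0.77/0.64/0.65 to 0.48/0.37/0.37. If you prefer the conventional fixed rule of dropping units with estimated propensity scores outside [0.1, 0.9], the trim docstring says the cutoff attribute can be set directly; a sketch on a fresh model (the name alt is introduced here for illustration, leaving the trimmed causal object untouched for the steps below):

# Alternative: impose a fixed cutoff of 0.1 instead of the data-driven one
alt = CausalModel(Y, D, X)
alt.est_propensity_s()
alt.cutoff = 0.1
alt.trim()
print(alt.summary_stats)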
  7. Stratifying the Sample
causal.stratify_s()
print(causal.strata)
Stratification Summary

              Propensity Score         Sample Size     Ave. Propensity   Outcome
   Stratum      Min.      Max.  Controls   Treated  Controls   Treated  Raw-diff
--------------------------------------------------------------------------------
         1     0.098     0.141       228        22     0.118     0.120     1.198
         2     0.141     0.191       205        43     0.164     0.167     1.427
         3     0.192     0.290       385       112     0.240     0.238     1.670
         4     0.290     0.402       324       173     0.343     0.348     2.338
         5     0.402     0.510       275       221     0.454     0.456     2.754
         6     0.511     0.612       204       293     0.559     0.563     3.175
         7     0.612     0.714       173       324     0.664     0.665     3.553
         8     0.714     0.808       119       378     0.759     0.761     4.080
         9     0.809     0.902        73       423     0.852     0.858     4.891
# Each stratum behaves like a small causal model of its own (e.g. causal.strata[0].summary_stats
# reports within-bin summary statistics). Here we estimate a within-bin treatment effect by OLS
# for every stratum:
for stratum in causal.strata:
    stratum.est_via_ols(adj=1)
[stratum.estimates['ols']['ate'] for stratum in causal.strata]
[1.1621855864527622,
 1.397114235686305,
 1.6744682989028234,
 2.318619817003348,
 2.7066270031865116,
 3.143493196600958,
 3.564068849326938,
 4.079812390028236,
 4.8196992785822]
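The within-bin estimates rise steadily with the propensity score, reflecting the treatment-effect heterogeneity built into the data generating process. The blocking estimator reported in the next section essentially aggregates these per-stratum estimates, weighting each bin by its share of the trimmed sample. A rough sketch of that aggregation, assuming each stratum exposes the same summary_stats dictionary as the full model:

import numpy as np

# Weight each within-bin OLS estimate by the bin's share of the trimmed sample
ates = np.array([s.estimates['ols']['ate'] for s in causal.strata])
sizes = np.array([s.summary_stats['N'] for s in causal.strata])
print(np.average(ates, weights=sizes))  # close to the blocking ATE reported in the next section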
  8. Treatment Effect Estimation
causal.est_via_ols()
causal.est_via_weighting()
causal.est_via_blocking()
causal.est_via_matching(bias_adj=True)
print(causal.estimates)
Treatment Effect Estimates: OLS

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      2.938      0.037     79.193      0.000      2.866      3.011
           ATC      2.391      0.042     56.812      0.000      2.309      2.474
           ATT      3.485      0.040     87.042      0.000      3.406      3.563

Treatment Effect Estimates: Weighting

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      2.953      0.043     68.165      0.000      2.868      3.038

Treatment Effect Estimates: Blocking

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      2.947      0.039     74.884      0.000      2.870      3.025
           ATC      2.410      0.048     50.531      0.000      2.317      2.504
           ATT      3.484      0.042     82.390      0.000      3.401      3.567

Treatment Effect Estimates: Matching

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      2.996      0.075     40.175      0.000      2.849      3.142
           ATC      2.455      0.091     27.104      0.000      2.277      2.632
           ATT      3.536      0.087     40.549      0.000      3.365      3.707
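With the default parameters of random_data the population ATE is 3, so all four estimators land close to the truth. The estimates can also be compared side by side programmatically (a minimal sketch, assuming the dictionary-like interface used earlier):

# Compare the ATE point estimates across all estimators that have been run
for method in causal.estimates:
    print(method, causal.estimates[method]['ate'])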
help(causal.est_via_matching)
Help on method est_via_matching in module causalinference.causal:

est_via_matching(weights='inv', matches=1, bias_adj=False) method of causalinference.causal.CausalModel instance
    Estimates average treatment effects using nearest-
    neighborhood matching.
    
    Matching is done with replacement. Method supports multiple
    matching. Correcting bias that arise due to imperfect matches
    is also supported. For details on methodology, see [1]_.
    
    Parameters
    ----------
    weights: str or positive definite square matrix
            Specifies weighting matrix used in computing
            distance measures. Defaults to string 'inv',
            which does inverse variance weighting. String
            'maha' gives the weighting matrix used in the
            Mahalanobis metric.
    matches: int
            Number of matches to use for each subject.
    bias_adj: bool
            Specifies whether bias adjustments should be
            attempted.
    
    References
    ----------
    .. [1] Imbens, G. & Rubin, D. (2015). Causal Inference in
            Statistics, Social, and Biomedical Sciences: An
            Introduction.
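The documented options can be combined; for instance, Mahalanobis-metric matching with several matches per unit and bias adjustment (a sketch using only the parameters listed in the docstring; rerunning this overwrites the earlier matching estimates):

# Mahalanobis-distance matching with 4 matches per unit and bias correction
causal.est_via_matching(weights='maha', matches=4, bias_adj=True)
print(causal.estimates['matching'])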