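Stata Operations
The hard part of the instrumental variable (IV) method is finding a suitable instrument and justifying it. The Stata side is comparatively simple: a single command does the job. The two commands most commonly used for IV estimation are ivregress and ivreg2.
The ivregress command
ivregress is built into Stata and supports three IV estimators: two-stage least squares (2SLS), the generalized method of moments (GMM), and limited-information maximum likelihood (LIML). 2SLS is the one we use most often, because it shows the logic of instrumental variables most directly and, when the errors are spherical (homoskedastic and uncorrelated), it is the most efficient IV estimator.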
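As the name suggests, 2SLS runs two regressions:
(1) First stage: regress the endogenous explanatory variable on the instrument and the control variables, and obtain the fitted values.
(2) Second stage: regress the dependent variable on the first-stage fitted values and the control variables.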
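The two stages can be run by hand to see what the estimator is doing. The sketch below is illustrative only: it uses simulated data, and the variable names y, x, z, w, u, and xhat are hypothetical, not taken from the article's dataset.

```stata
* Simulated data (hypothetical): x is endogenous because it shares u with y
clear
set seed 12345
set obs 1000
generate z = rnormal()                      // instrument
generate w = rnormal()                      // exogenous control
generate u = rnormal()                      // unobserved confounder
generate x = 0.8*z + 0.5*w + u + rnormal()
generate y = 1.0*x + 0.3*w + 2*u + rnormal()

* Stage 1: regress the endogenous regressor on the instrument and the control
regress x z w
predict double xhat, xb

* Stage 2: regress the outcome on the fitted values and the control
regress y xhat w

* One-step equivalent: same point estimates as stage 2
ivregress 2sls y w (x = z)
```

The manual second stage reproduces the 2SLS coefficients, but its standard errors are not valid because they are computed from residuals based on xhat rather than x. In practice the one-step command should be used for inference.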
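To use 2SLS, add 2sls after ivregress, then place the endogenous explanatory variable lnjinshipop and the instrument bprvdist inside parentheses, joined by an equals sign. The first option reports the first-stage regression, and cluster() requests cluster-robust standard errors.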
ivregress 2sls lneduyear (lnjinshipop=bprvdist) lnnightlight lncoastdist tri suitability lnpopdensity urbanrates i.provid , first cluster(provid)
First-stage regression results
First-stage regressions
-----------------------

                                                Number of obs     =        274
                                                No. of clusters   =         28
                                                F(   7,    239)   =      85.27
                                                Prob > F          =     0.0000
                                                R-squared         =     0.6487
                                                Adj R-squared     =     0.5988
                                                Root MSE          =     0.4442

------------------------------------------------------------------------------
             |               Robust
 lnjinshipop |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnnightlight |    .183385   .0682506     2.69   0.008     .0489354    .3178346
 lncoastdist |   .0350333    .077158     0.45   0.650    -.1169634    .1870299
         tri |    1.06676   .5637082     1.89   0.060    -.0437105    2.177231
 suitability |  -.0769726   .0549697    -1.40   0.163    -.1852596    .0313144
lnpopdensity |    .196144   .0843727     2.32   0.021     .0299349    .3623532
  urbanrates |   3.352916   1.687109     1.99   0.048      .029414    6.676419
             |
      provid |
          12 |   .2051006   .0551604     3.72   0.000      .096438    .3137632
          13 |  -1.890425   .0951146   -19.88   0.000    -2.077795   -1.703055
         ......
          64 |  -1.301895   .1581021    -8.23   0.000    -1.613346   -.9904433
             |
    bprvdist |  -.0846917   .0107859    -7.85   0.000    -.1059393   -.0634441
       _cons |   2.126233   .9791046     2.17   0.031     .1974567     4.05501
------------------------------------------------------------------------------
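The table shows that the coefficient on the instrument bprvdist is -0.085 with a standard error of 0.011, significant at the 1% level.
Second-stage regression results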
Instrumental variables (2SLS) regression          Number of obs   =        274
                                                  Wald chi2(34)   =      13.62
                                                  Prob > chi2     =     0.9993
                                                  R-squared       =     0.7874
                                                  Root MSE        =     .05434

                                (Std. Err. adjusted for 28 clusters in provid)
------------------------------------------------------------------------------
             |               Robust
   lneduyear |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 lnjinshipop |   .0834221   .0116773     7.14   0.000      .060535    .1063092
lnnightlight |   .0592777   .0102138     5.80   0.000     .0392589    .0792964
 lncoastdist |   .0093399   .0118791     0.79   0.432    -.0139426    .0326224
         tri |  -.1044346    .063917    -1.63   0.102    -.2297097    .0208404
 suitability |  -.0008944   .0125824    -0.07   0.943    -.0255555    .0237666
lnpopdensity |  -.0472648   .0115102    -4.11   0.000    -.0698245   -.0247051
  urbanrates |  -.0353689   .1784687    -0.20   0.843    -.3851612    .3144233
             |
      provid |
          12 |    -.10732   .0088755   -12.09   0.000    -.1247157   -.0899243
          13 |   .0244675   .0276005     0.89   0.375    -.0296284    .0785634
         ......
          64 |  -.0740796   .0350354    -2.11   0.034    -.1427477   -.0054116
             |
       _cons |   2.014146   .1490819    13.51   0.000     1.721951    2.306341
------------------------------------------------------------------------------
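The second-stage results are the estimates obtained after using the instrument to "treat" the endogeneity problem, and many papers report only this stage. The table shows that the coefficient on jinshi density lnjinshipop is 0.083 with a standard error of 0.012, significant at the 1% level.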
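To see how much the instrument changes the estimate, the OLS and 2SLS coefficients can be placed side by side. This is a minimal sketch that assumes the article's dataset is already in memory; the local macro controls and the stored names ols and iv are illustrative choices, not part of the original workflow.

```stata
* Assumes the article's dataset is loaded; macro and stored names are illustrative
local controls lnnightlight lncoastdist tri suitability lnpopdensity urbanrates

* Naive OLS, ignoring the endogeneity of lnjinshipop
regress lneduyear lnjinshipop `controls' i.provid, cluster(provid)
estimates store ols

* 2SLS with bprvdist as the instrument
ivregress 2sls lneduyear (lnjinshipop = bprvdist) `controls' i.provid, cluster(provid)
estimates store iv

* Compare the coefficient of interest and its standard error
estimates table ols iv, keep(lnjinshipop) b(%9.4f) se(%9.4f)
```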
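Weak instrument test
Checking the validity of an instrument starts with its relevance. When the instrument is weak, the 2SLS estimate not only struggles to correct the bias of OLS but is also less efficient because of its larger standard errors, so that "the cure is worse than the disease." The estat firststage command analyzes the first-stage results and helps judge whether a weak-instrument problem is present.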
. estat firststage,forcenonrobust
First-stage regression summary statistics
  --------------------------------------------------------------------------
               |            Adjusted      Partial       Robust
      Variable |   R-sq.       R-sq.        R-sq.      F(1,27)   Prob > F
  -------------+------------------------------------------------------------
   lnjinshipop |  0.6487      0.5988       0.3276       61.914    0.0000
  --------------------------------------------------------------------------
  (F statistic adjusted for 28 clusters in provid)

  Minimum eigenvalue statistic = 116.419

  Critical Values                      # of endogenous regressors:    1
  Ho: Instruments are weak             # of excluded instruments:     1
  ---------------------------------------------------------------------
                                     |    5%     10%     20%     30%
  2SLS relative bias                 |         (not available)
  -----------------------------------+---------------------------------
                                     |   10%     15%     20%     25%
  2SLS Size of nominal 5% Wald test  |  16.38    8.96    6.66    5.53
  LIML Size of nominal 5% Wald test  |  16.38    8.96    6.66    5.53
  ---------------------------------------------------------------------
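The results show a partial R-squared of 0.3276 (the partial R-squared measures the explanatory power of the instrument after the other exogenous variables are partialled out), which indicates that bprvdist explains a substantial share of the variation in the endogenous variable lnjinshipop. The F statistic is 61.914, above the rule-of-thumb threshold of 10, so the instrument is not judged to be weak. estat firststage also reports the minimum eigenvalue statistic; a value above the 10% critical value in the "2SLS Size of nominal 5% Wald test" row (here 116.419 > 16.38) is usually taken to mean there is no weak-instrument problem.
Note: the Hausman test (the endogeneity test) and the overidentification test (the exogeneity test) are not covered here. Both are of limited practical value, and with only one instrument the overidentification test cannot be run anyway. I will cover these two tests in detail in a separate post.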
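For reference, when there are more instruments than endogenous regressors, the overidentification test takes this general form:
ivregress 2sls Y X2 X3 X4 (X1 = Z1 Z2), robust
estat overid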
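Returning to the weak-instrument check, the cluster-robust first-stage F statistic can also be obtained by running the first stage directly and testing the excluded instrument. A minimal sketch, assuming the article's dataset is in memory; the local macro controls is an illustrative name.

```stata
* Assumes the article's dataset is loaded; the macro name is illustrative
local controls lnnightlight lncoastdist tri suitability lnpopdensity urbanrates

* First stage by hand, with the same clustering as the IV regression
regress lnjinshipop bprvdist `controls' i.provid, cluster(provid)

* Wald test that the excluded instrument has a zero coefficient
test bprvdist
display "First-stage F = " r(F) "   (rule of thumb: above 10)"
```

With a single instrument this F statistic is just the square of the instrument's first-stage t statistic, so it should be close to the value reported by estat firststage, up to small-sample adjustments.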
The ivreg2 command
ivreg2 improves on and extends ivregress: it is more powerful, supports more estimators (2SLS is the default), and reports several instrument diagnostics directly.
ivreg2 is a user-written command, so it must be installed before use (ssc install ivreg2). Its syntax is essentially the same as that of ivregress:
ivreg2 lneduyear (lnjinshipop=bprvdist) lnnightlight lncoastdist tri suitability lnpopdensity urbanrates i.provid , first cluster(provid)
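Installation only has to happen once. One way to make a do-file safe to rerun is to install the package only when it is missing. The sketch below also checks for ranktest, a companion package that ivreg2 relies on for its rank-based identification tests.

```stata
* Install ivreg2 and its companion package ranktest only if they are missing
capture which ivreg2
if _rc ssc install ivreg2

capture which ranktest
if _rc ssc install ranktest
```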
First-stage regression results
First-stage regression of lnjinshipop:
Statistics robust to heteroskedasticity and clustering on provid
Number of obs =                    274
Number of clusters (provid) =       28
------------------------------------------------------------------------------
             |               Robust
 lnjinshipop |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    bprvdist |  -.0846917   .0107633    -7.87   0.000    -.1058948   -.0634886
lnnightlight |    .183385   .0681077     2.69   0.008     .0492169    .3175531
 lncoastdist |   .0350333   .0769964     0.45   0.650     -.116645    .1867116
         tri |    1.06676   .5625276     1.90   0.059    -.0413849    2.174906
 suitability |  -.0769726   .0548546    -1.40   0.162    -.1850328    .0310876
lnpopdensity |    .196144    .084196     2.33   0.021      .030283    .3620051
  urbanrates |   3.352916   1.683576     1.99   0.048     .0363742    6.669459
             |
      provid |
          12 |   .2051006   .0550449     3.73   0.000     .0966656    .3135357
          13 |  -1.890425   .0949154   -19.92   0.000    -2.077402   -1.703447
         ......
          64 |  -1.301895    .157771    -8.25   0.000    -1.612694   -.9910956
             |
       _cons |   2.126233   .9770541     2.18   0.031      .201496    4.050971
------------------------------------------------------------------------------
Second-stage regression results
IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on provid

Number of clusters (provid) =       28                Number of obs =      274
                                                      F( 34,    27) =     0.34
                                                      Prob > F      =   0.9984
Total (centered) SS     =  3.805186838                Centered R2   =   0.7874
Total (uncentered) SS   =  1282.736705                Uncentered R2 =   0.9994
Residual SS             =  .8089841945                Root MSE      =   .05434

------------------------------------------------------------------------------
             |               Robust
   lneduyear |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 lnjinshipop |   .0834221   .0116773     7.14   0.000      .060535    .1063092
lnnightlight |   .0592777   .0102138     5.80   0.000     .0392589    .0792964
 lncoastdist |   .0093399   .0118791     0.79   0.432    -.0139426    .0326224
         tri |  -.1044346    .063917    -1.63   0.102    -.2297097    .0208404
 suitability |  -.0008944   .0125824    -0.07   0.943    -.0255555    .0237666
lnpopdensity |  -.0472648   .0115102    -4.11   0.000    -.0698245   -.0247051
  urbanrates |  -.0353689   .1784687    -0.20   0.843    -.3851612    .3144233
             |
      provid |
          12 |    -.10732   .0088755   -12.09   0.000    -.1247157   -.0899243
          13 |   .0244675   .0276005     0.89   0.375    -.0296284    .0785634
         ......
          64 |  -.0740796   .0350354    -2.11   0.034    -.1427477   -.0054116
             |
       _cons |   2.014146   .1490819    13.51   0.000     1.721951    2.306341
------------------------------------------------------------------------------
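Weak instrument test results
ivreg2 directly reports two statistics for the weak-instrument test: the Cragg-Donald Wald F statistic, which assumes i.i.d. errors, and the Kleibergen-Paap rk Wald F statistic, which does not. Note that the null hypothesis here is that the instruments are weak. When the F statistics exceed the Stock-Yogo critical value for 10% maximal IV size (16.38 here), the null can be rejected and the weak-instrument problem ruled out. As the output itself notes, the Stock-Yogo critical values are derived for the Cragg-Donald statistic under i.i.d. errors, so comparing the Kleibergen-Paap statistic against them is only approximate.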
Weak identification test
Ho: equation is weakly identified
Cragg-Donald Wald F statistic                                     116.42
Kleibergen-Paap Wald rk F statistic                                61.91
Stock-Yogo weak ID test critical values for K1=1 and L1=1:
                                   10% maximal IV size             16.38
                                   15% maximal IV size              8.96
                                   20% maximal IV size              6.66
                                   25% maximal IV size              5.53
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
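These statistics can also be read back from ivreg2's stored results, which is handy when building tables. The sketch below assumes the article's dataset is in memory and that the stored-result names e(cdf) and e(rkf) match the installed ivreg2 version. Those names are written from memory of the help file, so ereturn list is called first to confirm them. The threshold 16.38 is the 10% maximal IV size value from the output above.

```stata
* Assumes ivreg2 is installed and the article's dataset is loaded
local controls lnnightlight lncoastdist tri suitability lnpopdensity urbanrates
ivreg2 lneduyear (lnjinshipop = bprvdist) `controls' i.provid, cluster(provid)

* List every stored result to confirm the names used below
ereturn list

* Assumed names: e(cdf) = Cragg-Donald F, e(rkf) = Kleibergen-Paap rk F
display "Cragg-Donald Wald F      = " e(cdf)
display "Kleibergen-Paap rk Wald F = " e(rkf)
display "Exceeds 16.38 (1 = yes)   = " (e(rkf) > 16.38)
```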
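Supplement: the xtivreg2 command
The difference between ivreg2 and xtivreg2 is broadly the same as that between reg and xtreg. Recall that OLS with unit dummy variables (LSDV) is equivalent to the FE model.
ivreg2 allows time effects to be included (so it also suits pooled panel data), whereas xtivreg2 only estimates fixed-effects models and cannot take time fixed effects written as i.year. An xtivreg2 run looks like this: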
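xtset id year
* savefp(first) stores the first stage under the prefix plus the endogenous variable name (here firstn); confirm with estimates dir
xtivreg2 ys k (n=l2.n l3.n), fe first savefp(first)
estimates store second
outreg2 [firstn second] using xxx.doc, tstat bdec(3) tdec(2) replace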
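The LSDV and FE equivalence mentioned above can be checked on a small simulated panel. Everything in this sketch is hypothetical (the variables id, year, a, z, u, x, y and the data-generating process), and it assumes ivreg2 and xtivreg2 are both installed from SSC.

```stata
* Simulated panel (hypothetical): 200 units observed for 5 years
clear
set seed 2024
set obs 200
generate id = _n
generate a = rnormal()                      // unit fixed effect
expand 5
bysort id: generate year = 2000 + _n
generate z = rnormal()                      // instrument
generate u = rnormal()                      // unobserved confounder
generate x = 0.7*z + a + u + rnormal()
generate y = 0.5*x + a + 2*u + rnormal()
xtset id year

* LSDV: unit dummies entered explicitly
ivreg2 y (x = z) i.id

* FE: within transformation; the coefficient on x matches the LSDV one
xtivreg2 y (x = z), fe

* Unit and year effects together, written with factor variables in ivreg2
ivreg2 y (x = z) i.id i.year
```

The first two commands give the same coefficient on x. The reported standard errors may differ slightly because the two commands handle degrees of freedom differently.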
xtivreg2 reports overidentification test results, just as ivreg2 does. To add time fixed effects, xtivreg is recommended instead:
xtset stkcd year
xtreg Ln_geodistance_ew rdls ewdistance_mean $control i.year, fe
est store first
outreg2 [first] using first.doc, tstat bdec(3) tdec(3) addstat(Adjusted R2, `e(r2_a)') replace
xtivreg ln_Cash_ratio1 (Ln_geodistance_ew=rdls ewdistance_mean) $control i.year, fe // xtivreg2 cannot take i.year
outreg2 using xxx1.doc, cttop(second) tstat bdec(3) tdec(2)
To run the overidentification test after xtivreg, use:
xtoverid
To add fixed effects for a string-type (categorical) variable, use the xi: prefix:
xtset stkcd year
xi: xtreg Ln_geodistance_ew rdls ewdistance_mean $control i.industry2, fe
est store first
outreg2 [first] using first.doc, tstat bdec(3) tdec(3) addstat(Adjusted R2, `e(r2_a)') replace
xi: xtivreg ln_Cash_ratio1 (Ln_geodistance_ew=rdls ewdistance_mean) $control i.industry2, fe // xtivreg2 cannot take i.year
outreg2 using xxx1.doc, cttop(second) tstat bdec(3) tdec(2)