These are my slides for the graduate course Intermediate Econometrics. The whole project is based on "Threshold effects in non-dynamic panels: Estimation, testing, and inference" (Hansen, 1999).
Part One: Slides
First of all, we need to generate the data for y, x, and n.
set.seed(123) # I set the seed for the sake of repeatability
e=rnorm(100,mean=0,sd=1)
x=rnorm(100,mean=0,sd=3^2)
n=1:100
y=rep(0,times=100)
y[1:50]=1+2*x[1:50]+e[1:50]
y[51:100]=1-2*x[51:100]+e[51:100]
data=data.frame(n,x,y,e)
Then we can plot out the relationship between y and n, as well as y and x.
library(ggplot2)
p1=ggplot(data,aes(x=n,y=y))+geom_point()
p2=ggplot(data,aes(x=x,y=y))+geom_point()
p1
p2
Regression results
Suppose we are not aware of the existence of the threshold effect.
reg1=lm(y~x,data)
summary(reg1)
If we know the threshold, we can separate the data frame into two groups and then run the regressions separately.
data1=subset(data, n<=50)
data2=subset(data,n>50)
reg2=lm(y~x,data1)
reg3=lm(y~x,data2)
summary(reg2)
summary(reg3)
If we do not know where the threshold is, what should we do?
reg=list()
rss=array()
for (i in 1:99)
{
dum=x # slope dummy: equals x for observations up to the candidate threshold i, 0 afterwards
dum[(i+1):100]=0
reg[[i]]=lm(y~x+dum) # the coefficient on dum captures the slope change before the threshold
rss[i]=sum(residuals(reg[[i]])^2)
}
which.min(rss) # the candidate threshold with the smallest RSS
Rss=data.frame(n=1:99,rss)
ggplot(Rss,aes(x=n,y=rss))+geom_point()
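The grid search above can also be wrapped into a reusable helper. This is a sketch of the same idea; the function name threshold_rss is mine, not from Hansen's paper:

```R
# Sketch: for each candidate threshold i, fit y ~ x + dum, where dum equals x
# for observations up to i and 0 afterwards, and record the residual sum of squares.
threshold_rss=function(y,x){
  n=length(y)
  rss=sapply(1:(n-1),function(i){
    dum=x
    dum[(i+1):n]=0
    sum(residuals(lm(y~x+dum))^2)
  })
  list(rss=rss,threshold=which.min(rss))
}
```

Then threshold_rss(data$y,data$x)$threshold reproduces the minimizer found by the loop above.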
Now if we have two thresholds, we can still use nested loops to discern them. However, Hansen proposed a more elegant solution.
Prepare the data:
set.seed(456) # I set the seed for the sake of repeatability
e=rnorm(100,mean=0,sd=1)
x=rnorm(100,mean=0,sd=3^2)
n=1:100
y=rep(0,times=100)
y[1:30]=1+2*x[1:30]+e[1:30]
y[31:60]=1-2*x[31:60]+e[31:60]
y[61:100]=1+4*x[61:100]+e[61:100]
data=data.frame(n,x,y,e)
Then we can plot out the relationship between y and n, as well as y and x.
library(ggplot2)
p1=ggplot(data,aes(x=n,y=y))+geom_point()
p2=ggplot(data,aes(x=x,y=y))+geom_point()
p1
p2
Now we use the method proposed by Hansen to discern the thresholds.
Pinpoint the first threshold:
reg=list()
rss1=array()
for (i in 1:99)
{
dum=x
dum[(i+1):100]=0
reg[[i]]=lm(y~x+dum)
rss1[i]=sum(residuals(reg[[i]])^2)
}
which.min(rss1)
#plot the figure
Rss1=data.frame(n=1:99,rss1)
ggplot(Rss1,aes(x=n,y=rss1))+geom_point()
Pinpoint the second threshold:
reg2=list()
rss2=array()
for(i in 1:99)
{
dum1=x
dum2=x
left=min(i,which.min(rss1))
right=max(i,which.min(rss1))
dum1[(left+1):100]=0
dum2[1:right]=0
reg2[[i]]=lm(y~x+dum1+dum2)
rss2[i]=sum(residuals(reg2[[i]])^2)
}
which.min(rss2)
#plot the figure
Rss2=data.frame(n=1:99,rss2)
ggplot(Rss2,aes(x=n,y=rss2))+geom_point()
Pinpoint the first threshold again:
reg3=list()
rss3=array()
for(i in 1:99)
{
dum1=x
dum2=x
left=min(i,which.min(rss2))
right=max(i,which.min(rss2))
dum1[(left+1):100]=0
dum2[1:right]=0
reg3[[i]]=lm(y~x+dum1+dum2)
rss3[i]=sum(residuals(reg3[[i]])^2)
}
which.min(rss3)
#plot the figure
Rss3=data.frame(n=1:99,rss3)
ggplot(Rss3,aes(x=n,y=rss3))+geom_point()
Put the above figures into one picture.
all=rbind(data.frame(n=1:99,rss=rss1,t="loop 1"),data.frame(n=1:99,rss=rss2,t="loop 2"),data.frame(n=1:99,rss=rss3,t="loop 3"))
p=ggplot(all, aes(x=n,y=rss))
p+geom_path(aes(group=t,color=t))+geom_point(aes(group=t,color=t))
Bootstrap. In this case, we only consider one threshold.
set.seed(123) # I set the seed for the sake of repeatability
e=rnorm(100,mean=0,sd=1)
x=rnorm(100,mean=0,sd=3^2)
n=1:100
y=rep(0,times=100)
y[1:50]=1+2*x[1:50]+e[1:50]
y[51:100]=1-2*x[51:100]+e[51:100]
data=data.frame(n,x,y,e)
breg1=lm(y~x,data)
s0=sum(residuals(breg1)^2)
library(boot)
fvalue=function(data,indices){
d=data[indices,]
s0b=sum(residuals(lm(y~x,data=d))^2) # restricted RSS, recomputed on the bootstrap sample
bregloop=list()
brss=array()
for (i in 1:99)
{
dum=d$x
dum[(i+1):100]=0
bregloop[[i]]=lm(y~x+dum,data=d)
brss[i]=sum(residuals(bregloop[[i]])^2)
}
a=which.min(brss)
dum=d$x
dum[(a+1):100]=0
breg2=lm(y~x+dum,data=d)
s1=sum(residuals(breg2)^2) # unrestricted RSS at the best threshold
f=(s0b/s1-1)*(100-1)
return(f)
}
result=boot(data=data,statistic=fvalue,R=99)
f=result$t
Now compute the real f0 from the original data:
dum=data$x
dum[51:100]=0
reg4=lm(y~x+dum,data)
rss4=sum(residuals(reg4)^2)
f0=(s0/rss4-1)*(100-1)
table(f>f0) # no bootstrap statistic exceeds f0, so the bootstrap p-value is 0 and we reject H0
Last question: get the confidence interval for gamma.
LR=(rss-rss4)/(rss4/(100-1)) # likelihood ratio statistic for each candidate threshold
c=-2*log(1-sqrt(1-0.01)) # asymptotic critical value; we set the level alpha at 1%
which(LR<c) # the candidate thresholds inside the confidence region
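Under Hansen's asymptotics, the LR statistic for the threshold has distribution function P(LR ≤ z) = (1-e^{-z/2})², which is where the critical value formula comes from. A quick numerical check:

```R
# Critical values for the threshold LR statistic (Hansen, 1999):
crit=function(alpha) -2*log(1-sqrt(1-alpha))
crit(0.10) # about 5.94
crit(0.05) # about 7.35
crit(0.01) # about 10.59
# Plugging a critical value back into the asymptotic CDF recovers the confidence level:
cdf=function(z) (1-exp(-z/2))^2
cdf(crit(0.01)) # 0.99
```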
In statistics, nonlinear regression is a form of regression analysis in which observational data are modeled by a function which is a nonlinear combination of the model parameters and depends on one or more independent variables.
1.Linear Regression
\(Y\_i=\beta\_1+\beta\_2X\_i+\mu\_i\)
2.Nonlinear Regression
\(Y\_i=\beta\_1e^{\beta\_2X\_i}+\mu\_i\)
1.Delta Method
Example
Suppose we have the following regression:
\(Y\_i=\theta\_0+\frac{\alpha}{1-\alpha}X\_1+\frac{1}{2}\frac{\sigma-1}{\sigma}\frac{\alpha}{(1-\alpha)^2}X\_2+\mu\_i\)
So, in this case we can get $\theta=\begin{pmatrix} \alpha \\ \sigma \end{pmatrix}$ $\gamma=\begin{pmatrix} \frac{\alpha}{1-\alpha} \\ \frac{1}{2}\frac{\sigma-1}{\sigma}\frac{\alpha}{(1-\alpha)^2} \end{pmatrix}$
\(\gamma=g(\theta) \Rightarrow \begin{cases}\gamma\_1=g\_1(\alpha,\sigma)\\\ \gamma\_2=g\_2(\alpha,\sigma) \end{cases}\)
Construction of the Jacobian matrix
\(\hat{G} = \begin{bmatrix}
\frac{\partial\gamma\_1}{\partial\theta\_1} & \frac{\partial\gamma\_1}{\partial\theta\_2} \\\
\frac{\partial\gamma\_2}{\partial\theta\_1} & \frac{\partial\gamma\_2}{\partial\theta\_2}
\end{bmatrix}\)
So,
\(\hat{G} = \begin{bmatrix}
\frac{\partial(\frac{\alpha}{1-\alpha})}{\partial\alpha} & \frac{\partial(\frac{\alpha}{1-\alpha})}{\partial\sigma} \\\
\frac{\partial(\frac{1}{2}\frac{\sigma-1}{\sigma}\frac{\alpha}{(1-\alpha)^2})}{\partial\alpha} & \frac{\partial(\frac{1}{2}\frac{\sigma-1}{\sigma}\frac{\alpha}{(1-\alpha)^2})}{\partial\sigma}
\end{bmatrix}\)
By conducting OLS regression, we can get $\hat{\gamma_1}$, $\hat{\gamma_2}$ and the error covariance matrix $\Omega$.
Following equation (2), we can get:
\(\widehat{Var}(\hat{\gamma})=(X^TX)^{-1}X^T{\Omega}X(X^TX)^{-1}\)
Then, according to equation (1), we can get:
\(\widehat{Var}(\hat{\theta})\equiv(\hat{G})^{-1}\widehat{Var}(\hat{\gamma})(\hat{G}^T)^{-1}\)
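As a sketch in R (the point estimates and the covariance matrix of $\hat{\gamma}$ below are made-up numbers for illustration, not results from a real fit):

```R
# Delta method sketch for theta=(alpha,sigma) recovered from gamma=g(theta).
# Jacobian G with entries d(gamma_j)/d(theta_k), derived from the formulas above:
G_matrix=function(alpha,sigma){
  matrix(c(1/(1-alpha)^2,                             0,
           (sigma-1)/(2*sigma)*(1+alpha)/(1-alpha)^3, alpha/(2*sigma^2*(1-alpha)^2)),
         nrow=2,byrow=TRUE)
}
alpha=0.3; sigma=1.5              # hypothetical point estimates solved from gamma-hat
Vgamma=diag(c(0.04,0.01))         # hypothetical Var(gamma-hat) from the OLS step
G=G_matrix(alpha,sigma)
Vtheta=solve(G)%*%Vgamma%*%solve(t(G))  # Var(theta-hat) by the delta method
sqrt(diag(Vtheta))                      # standard errors for alpha-hat and sigma-hat
```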
2.nls function
nls is a built-in function in R that computes the nonlinear (weighted) least-squares estimates of the parameters of a nonlinear model.
Usage of nls:
nls(formula, data, start, control, algorithm, trace, subset, weights, na.action, model, lower, upper, ...)
Note: You can find out more explanation on parameter settings by browsing R help documentation.
Example
Suppose we still have the following regression:
\(Y\_i=\theta\_0+\frac{\alpha}{1-\alpha}X\_1+\frac{1}{2}\frac{\sigma-1}{\sigma}\frac{\alpha}{(1-\alpha)^2}X\_2+\mu\_i\)
We can use nls to estimate the parameters $\alpha$ and $\sigma$ directly.
#In this method, we need to give R approximate initial values of alpha and sigma in the first place.
nlsreg=nls(y~constant+alpha/(1-alpha)*x1+1/2*(sigma-1)/sigma*alpha/(1-alpha)^2*x2,start=list(constant=5,alpha=0.3,sigma=1.5),trace=FALSE)
summary(nlsreg)
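A self-contained check may help here: simulate data from the model and let nls recover the parameters. The true values (5, 0.3, 1.5) and sample size are chosen by me for illustration:

```R
# Simulate data consistent with the CES-style regression, then fit it with nls.
set.seed(789)
x1=rnorm(200)
x2=rnorm(200)
alpha0=0.3; sigma0=1.5   # true parameter values (illustrative)
y=5+alpha0/(1-alpha0)*x1+1/2*(sigma0-1)/sigma0*alpha0/(1-alpha0)^2*x2+rnorm(200,sd=0.1)
fit=nls(y~constant+alpha/(1-alpha)*x1+1/2*(sigma-1)/sigma*alpha/(1-alpha)^2*x2,
        start=list(constant=4,alpha=0.25,sigma=1.2))
coef(fit) # should be close to constant=5, alpha=0.3, sigma=1.5
```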
Winford H. Masanjala and Chris Papageorgiou, “The Solow Model with CES
Technology: Nonlinearities and Parameter Heterogeneity”, Journal of Applied
Econometrics, Vol. 19, No. 2, 2004, pp. 171-201.
CODE=Country number in Summers-Heston dataset.
NONOIL=1 for nonoil producing countries.
INTER=1 for countries with better quality data.
OECD=1 for OECD countries.
GDP60=Per capita GDP in 1960.
GDP85=Per capita GDP in 1985.
GDPGRO=Average growth rate of per capita GDP (1960-1985).
POPGRO=Average growth rate of working-age population (1960-1985).
IONY=Average ratio of investment (including government investment) to GDP (1960-1985).
SCHOOL=Average fraction of working-age population enrolled in secondary school (1960-1985).
LIT60=Fraction of the population over 15 years old that is able to read and write in 1960.
NA indicates that the observation is missing. This dataset has also been used in Durlauf and Johnson (JAE 1995).
There are 121 observations for each variable. All of the data with the exception of LIT60 are from Mankiw, Romer and Weil (QJE 1992), who in turn constructed the data from Penn World Tables 4.0. LIT60 is from the World Bank’s World Development Report.
CODE=Country number in Summers-Heston dataset.
GDP60=Per capita GDP in 1960.
GDP85=Per capita GDP in 1985.
POPGRO=Average growth rate of working-age population (1960-1985).
IONY=Average ratio of investment to GDP (1960-1985).
SCHOOL=Average fraction of working-age population enrolled in secondary school (1960-1985).
LIT60=Fraction of the population over 15 years old that is able to read and write in 1960.
There are 96 observations for each variable. All of the data with the exception of LIT60 are from Mankiw, Romer and Weil (QJE 1992) who in turn constructed the data from Penn World Tables 4.0. LIT60 is from the World Bank’s World Development Report.
CODE=Country number in Summers-Heston dataset.
GDP60=Per capita GDP in 1960.
GDP85=Per capita GDP in 1985.
IONY=Average ratio of investment to GDP (1960-1995).
SCHOOL=Average fraction of working-age population enrolled in secondary school (1960-1995).
POPGRO=Average growth rate of working-age population (1960-1995).
There are 90 observations for each variable. All of the data are from Bernanke and Gurkaynak (NBER Macroeconomics Annual 2001) who constructed the data from Penn World Tables 6.0.
As you can see in the following table, the term regression can be confusing because there are so many specialized varieties. In this class we will only focus on OLS, Nonparametric Regression and Robust Regression.
\(\hat{Y\_i}=\hat{\beta\_0}+\hat{\beta\_1}X\_{1,i}+...+\hat{\beta\_k}X\_{k,i} \qquad i=1...n\)
where $n$ is the number of observations and $k$ is the number of predictor variables. In this equation:
$\hat{Y_i}$: is the predicted value of the dependent variable for observation $i$ (specifically, it is the estimated mean of the $Y$ distribution, conditional on the set of predictor values).
$X_{j,i}$: is the $j^{th}$ predictor value for the $i^{th}$ observation.
$\hat{\beta_0}$: is the intercept (the predicted value of $Y$ when all the predictor variables equal 0).
$\hat{\beta_j}$: is the regression coefficient for the $j^{th}$ predictor (slope representing the change in $Y$ for a unit change in $X_j$).
To properly interpret the coefficients of the OLS model, you must satisfy a number of statistical assumptions:
lm is a built-in function in R for fitting linear models. Usage of lm:
lm(formula, data, subset, weights, na.action, method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL, offset, ...)
Note: You can find out more explanation on parameter settings by browsing R help documentation.
reg=lm(y~x1+x2+...+xn, mydata)
Then you use the following command to tell R to report the result:
summary(reg)
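Since mydata above is only a placeholder, here is a runnable sketch with simulated data (the variable names and coefficient values are mine):

```R
# Fit an OLS model on simulated data and inspect the pieces of the output.
set.seed(1)
x1=rnorm(100)
x2=rnorm(100)
y=1+2*x1-0.5*x2+rnorm(100)
reg=lm(y~x1+x2)
summary(reg)      # coefficients, standard errors, t-statistics, R-squared
coef(reg)         # the estimated beta0, beta1, beta2
head(fitted(reg)) # the predicted values Y-hat_i
```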
There are some symbols listed in the following table, which are commonly used in regression.
library(car)
lht(reg, "the constraints you want to test")
1.Use anova function
anova(reg1, reg2)
2.Use waldtest function
library(lmtest)
waldtest(reg1,reg2,vcov=vcovHC(reg2,type="HC0"))
Q: What is vcovHC? (Hint: Please find the answer in R help documentation or Google.)
What is heteroskedasticity?
One of our assumptions for linear regression is that the random error terms have the same variance. However, there are times when a regression exhibits heteroskedasticity, which means the random error terms have different variances.
What may happen if heteroskedasticity does exist in our regression?
In this situation, the t-test and F-test may become unreliable and misleading. Coefficients that are supposed to be significant may no longer be significant.
How to discern heteroskedasticity?
```R
library(lmtest)
reg=lm(y~x1+x2,mydata)
#This is one solution: the Breusch-Pagan test.
bptest(reg)
#This is another one: the White test, run through bptest with an auxiliary regression.
bptest(reg, ~x1+x2+I(x1^2)+I(x2^2)+x1:x2, data=mydata)
#A third option, from the bstats package:
library(bstats)
white.test(reg)
```
How to get robust standard errors when heteroskedasticity appears?
```R
# The first method (recommended)
library(sandwich)
library(lmtest)
coeftest(reg, vcov=vcovHC(reg,"HC0"))
# The second method
library(sandwich)
sqrt(diag(vcovHC(reg,type="HC0")))
# The third method
library(car)
sqrt(diag(hccm(reg,type="hc0")))
```
R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team, of which Chambers is a member. The R language is widely used among statisticians and data miners for developing statistical software and data analysis. For more information, please see Wiki: R(programming language)
**Most important:** They are all commercial software, which means they are not free for individual users.
One feature of R is that most functionality is provided through the installation of different packages. There are 6523 packages on the website of CRAN and the number is still growing.
You can run install.packages() and then choose the package you need. For example, if you run install.packages("lmtest"), R will install the "lmtest" package and those supporting packages automatically.
Use library(lmtest) to load the "lmtest" package before you use it, and use update.packages() to update the packages you have installed.
= and <- can both be used as the assignment symbol; in these slides I use = instead of <-. For more information on the subtle difference between these two symbols, please see Stack Overflow and 知乎.
Use # in R to indicate that the rest of the line is a comment and should not be evaluated.
Note that setwd("c:\myprogram") will not work; you should use setwd("c:/myprogram") or setwd("c:\\myprogram") instead.
Use help() or ?the_name_of_function to read the help documentation. The website RSeek and Google are also really helpful.
A dataset is usually a rectangular array of data with rows representing observations and columns representing variables. The following table provides an example of a hypothetical patient dataset.
How to construct a dataset?
1.Choose a type of data structure to store the data.
2.Import or type the data into that data structure.
Data input
From the following figure, we can see that R can cope with different data formats.
read.table() #This can be used to read a txt file; the argument na.strings="." should be used to convert the missing values ("." in the txt file) to NA values.
read.csv() #This can be used to read a CSV file, highly recommended.
load() #This can be used to load an Rdata file.
Note: You can find out more information on how to input other different types of data in the book R in Action (R语言实战).
Data structures
R has a wide variety of objects for holding data, including scalars, vectors, matrices,
arrays, data frames, and lists. They differ in terms of the type of data they can hold,
how they are created, their structural complexity, and the notation used to identify and
access individual elements. The figure shows a diagram of these data structures.
1.Vectors
Vectors are one-dimensional arrays that can hold numeric data, character data, or logical
data. The combine function c()
is used to form the vector.
a=c(1,2,3,4) #This is a numeric vector.
b=c("one","two","three") #This is a character vector.
c=c(TRUE,FALSE,TRUE) #This is a logical vector.
2.Matrices
A matrix is a two-dimensional array where each element has the same mode (numeric,
character, or logical). Matrices are created with the matrix
function.
x=matrix(1:20, nrow=5, ncol=4) #This creates a 5*4 matrix from the numbers 1~20, filled column by column.
The common operation symbols for matrices:
Dimensions of matrix x: dim(x)
Rows of matrix x: nrow(x)
Columns of matrix x: ncol(x)
Transpose of matrix x: t(x)
The value of the determinant of matrix x: det(x)
If the determinant is not 0, then we can get the inverse of matrix x: solve(x)
Eigenvalue and eigenvector: y=eigen(x)
, then y$val
is eigenvalue, y$vec
is eigenvector.
Multiplication of matrices: a %*% b
Arithmetic operations and power operation on every element of matrix: + - * / ^
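The operations above can be tried on a small invertible matrix:

```R
x=matrix(c(2,1,1,3),nrow=2)  # a 2x2 symmetric matrix
dim(x)                        # 2 2
t(x)                          # transpose (equal to x here, since x is symmetric)
det(x)                        # 5, nonzero, so x is invertible
solve(x)                      # the inverse of x
y=eigen(x)
y$val                         # eigenvalues
y$vec                         # eigenvectors
x %*% solve(x)                # matrix multiplication: recovers the identity matrix
```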
3.Data frames
Personally, I think data frame is the most important and common data structure you will deal with in R. A data frame is more general than a matrix in that different columns can contain different modes of data (numeric, character, etc.).
patientID= c(1, 2, 3, 4)
age = c(25, 34, 28, 52)
diabetes = c("Type1", "Type2", "Type1", "Type1")
status = c("Poor", "Improved", "Excellent", "Poor")
patientdata= data.frame(patientID, age, diabetes, status)
patientdata #patientdata is a data frame.
The different ways to read data from data frame:
patientdata[1:2]
patientdata[c("diabetes", "status")]
patientdata$age
1.summary
summary is a generic function used to produce summaries of the results of various model-fitting functions.
summary(data$example)
or summary(data)
2.sum
sum returns the sum of all the values present in its arguments.
sum(data$example, na.rm=TRUE)
3.nrow and ncol
nrow
and ncol
return the number of rows or columns present in x.
4.cbind and rbind
Take a sequence of vector, matrix or data-frame arguments and combine by columns or rows, respectively.
5.replace
replace
replaces the values in x with indices given in list by those given in values.
data$V1=replace(data$V1,data$V1<5,0)
In this example, if a value in column V1 is less than 5, it is converted to 0. This function is really useful when you want to convert a numerical variable into a dummy variable.
6.subset
Return subsets of vectors, matrices or data frames which meet conditions.
subset(data, subset, select)
7.aggregate
Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form.
aggregate(data, by=list(data$variable), mean)
In this example, based on the column variable, R splits the data into subsets and computes mean of those subsets.
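A minimal sketch exercising replace, subset, and aggregate together (the data frame here is made up):

```R
# Made-up data: two groups, four observations.
df=data.frame(group=c("a","a","b","b"), V1=c(2,7,3,9))
df$dummy=replace(df$V1, df$V1<5, 0)                 # values below 5 become 0: 0 7 0 9
subset(df, V1>=5, select=c(group,V1))               # only the rows with V1 >= 5
aggregate(df["V1"], by=list(group=df$group), mean)  # group means: a=4.5, b=6
```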