Ch 5: Simple Regression
The number of people living on American farms has declined steadily during the last century. Here are data on the farm population (millions of persons) from 1935 to 1980.
Farm population (millions of persons)
Year | Population (millions) |
---|---|
1935 | 32.1 |
1940 | 30.5 |
1945 | 24.4 |
1950 | 23.0 |
1955 | 19.1 |
1960 | 15.6 |
1965 | 12.4 |
1970 | 9.7 |
1975 | 8.9 |
1980 | 7.2 |
Input Data
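Enter the data using the "seq" command.
# Note: the table lists 32.1 for 1935; the output below was produced with 31.1 as entered here.
year <- seq(1935, 1980, 5)
pop <- c(31.1, 30.5, 24.4, 23.0, 19.1, 15.6, 12.4, 9.7, 8.9, 7.2)
Note that the seq(start value, end value, interval) command produces a regular sequence.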
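As a small illustration of how seq() builds the year vector, the sketch below writes the same call with named arguments. The argument names `by` and `length.out` are standard seq() arguments; the variable names `year_by` and `year_alt` are made up for this example.

```r
# Years from 1935 to 1980 in steps of 5 (positional arguments, as above)
year <- seq(1935, 1980, 5)

# The same sequence with the step named explicitly
year_by <- seq(1935, 1980, by = 5)

# Alternative: give the start, the step, and the number of values
year_alt <- seq(1935, by = 5, length.out = 10)

length(year)                # number of values generated
all.equal(year, year_by)    # checks that the two calls agree
all.equal(year, year_alt)
```

Naming the step with `by` makes the call easier to read when the start and end values are not obvious from context.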
Regression
Estimate a regression model with the "lm(dependent variable ~ independent variable)" command.
> lm(pop~year)

Call:
lm(formula = pop ~ year)

Coefficients:
(Intercept)         year
  1145.4727      -0.5759

Store the model in an object and use "summary" to see standard errors, t values, and R-squared.
> regmodel <- lm(pop~year)
> summary(regmodel)

Call:
lm(formula = pop ~ year)

Residuals:
    Min      1Q  Median      3Q     Max
-1.4709 -1.1098 -0.2885  0.7136  2.2321

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 1145.47273   61.20329   18.72 6.86e-08 ***
year          -0.57588    0.03127  -18.42 7.77e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05

Residual standard error: 1.42 on 8 degrees of freedom
Multiple R-squared:  0.977,	Adjusted R-squared:  0.9741
F-statistic: 339.3 on 1 and 8 DF,  p-value: 7.773e-08
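To see where the two coefficients come from, the following sketch recomputes them with the usual least-squares formulas (slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)) and compares them with what lm() reports. The data are re-entered as in the Input Data section; the names `slope` and `intercept` are chosen for this illustration only.

```r
# Data as entered in the Input Data section
year <- seq(1935, 1980, 5)
pop  <- c(31.1, 30.5, 24.4, 23.0, 19.1, 15.6, 12.4, 9.7, 8.9, 7.2)

# Least-squares slope and intercept computed by hand
slope     <- cov(year, pop) / var(year)
intercept <- mean(pop) - slope * mean(year)

# Coefficients estimated by lm(), for comparison
regmodel <- lm(pop ~ year)
coef(regmodel)

# Both approaches should agree up to rounding error
all.equal(unname(coef(regmodel)), c(intercept, slope))
```

The slope is the estimated change in farm population (in millions) per year, so a value near -0.58 corresponds to a decline of a little under 0.6 million people per year over this period.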
Residuals and Predicted values
Compute predicted values (y-hat) after estimating a regression model.
> fitted(lm(pop~year))
        1         2         3         4         5         6         7         8
31.147273 28.267879 25.388485 22.509091 19.629697 16.750303 13.870909 10.991515
        9        10
 8.112121  5.232727
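The fitted values are simply intercept + slope * year evaluated at each observed year. The sketch below reproduces them from the coefficients and, as an assumption-laden extra, uses predict() to get a prediction for a year that is not in the data. The year 1962 and the names `b`, `yhat_manual`, and `pred_1962` are chosen only for illustration.

```r
# Data as entered in the Input Data section
year <- seq(1935, 1980, 5)
pop  <- c(31.1, 30.5, 24.4, 23.0, 19.1, 15.6, 12.4, 9.7, 8.9, 7.2)

regmodel <- lm(pop ~ year)
b <- coef(regmodel)   # b[1] is the intercept, b[2] is the slope

# Predicted values computed directly from the regression equation
yhat_manual <- b[1] + b[2] * year

# They should match fitted() up to rounding error
all.equal(unname(fitted(regmodel)), unname(yhat_manual))

# Prediction for a year inside the observed range but not in the data;
# the column name in newdata must match the predictor name, "year"
pred_1962 <- predict(regmodel, newdata = data.frame(year = 1962))
pred_1962
```

Predictions far outside 1935 to 1980 should be treated with caution, because a straight line fitted to this period would eventually predict a negative population.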
Compute residuals (y - y-hat).
> resid(lm(pop~year))
          1           2           3           4           5           6
-0.04727273  2.23212121 -0.98848485  0.49090909 -0.52969697 -1.15030303
          7           8           9          10
-1.47090909 -1.29151515  0.78787879  1.96727273
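As a check on the definition, the sketch below computes the residuals as observed minus fitted values and uses them to recover the residual standard error reported by summary(). The data are re-entered as in the Input Data section; the names `res_manual` and `rse_manual` are illustrative.

```r
# Data as entered in the Input Data section
year <- seq(1935, 1980, 5)
pop  <- c(31.1, 30.5, 24.4, 23.0, 19.1, 15.6, 12.4, 9.7, 8.9, 7.2)

regmodel <- lm(pop ~ year)

# Residuals by definition: observed y minus predicted y-hat
res_manual <- pop - fitted(regmodel)
all.equal(unname(res_manual), unname(resid(regmodel)))

# Least-squares residuals sum to zero (up to rounding error)
sum(res_manual)

# Residual standard error: sqrt(SSE / (n - 2)), with n - 2 = 8 degrees of freedom
rse_manual <- sqrt(sum(res_manual^2) / (length(pop) - 2))
rse_manual
summary(regmodel)$sigma   # the value summary() reports
```

The divisor is n - 2 rather than n because two parameters, the intercept and the slope, were estimated from the data.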
Draw the line of best fit
Draw a scatterplot first, then add the line of best fit.
plot(year, pop, col="red", pch=16)
abline(lm(pop~year))
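A possible extension, sketched here under the assumption that axis labels and a residual plot are wanted: the first plot repeats the scatterplot with labels, and the second plots the residuals against year with a horizontal reference line at zero. The label text and the name `res` are choices made for this example.

```r
# Data as entered in the Input Data section
year <- seq(1935, 1980, 5)
pop  <- c(31.1, 30.5, 24.4, 23.0, 19.1, 15.6, 12.4, 9.7, 8.9, 7.2)

regmodel <- lm(pop ~ year)

# Scatterplot with axis labels and the fitted regression line
plot(year, pop, col = "red", pch = 16,
     xlab = "Year", ylab = "Farm population (millions)")
abline(regmodel)

# Residuals against year; points scattered around zero with no
# clear pattern suggest the straight line is a reasonable summary
res <- resid(regmodel)
plot(year, res, pch = 16, xlab = "Year", ylab = "Residual")
abline(h = 0, lty = 2)   # dashed reference line at zero
```

Passing the fitted model object to abline() draws the same line as abline(lm(pop~year)) without refitting the model.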