# R for Beginners: A Regression Analysis Example with Male and Female Height and Weight

Reading data into R

Note: if the .txt file contains Chinese characters, add encoding = "UTF-8" to the read call.
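As a minimal sketch of the note above, the snippet below writes a small UTF-8 text file and reads it back with `read.table`. The file name `c1.txt` and the `age` values are hypothetical. The heights, weights and sex labels are back-calculated from the fitted values, residuals and test output printed later in this post, so they are a reconstruction, not the author's original file.

```r
# Hypothetical file contents: height, weight and sex are back-calculated from
# the output printed later in this post; the ages are placeholders.
lines <- c(
  "name height weight sex age",
  "王 150 50 F 20",
  "李 160 55 M 21",
  "张 170 60 F 22",
  "陈 175 75 M 23",
  "赵 160 56 F 24"
)
path <- file.path(tempdir(), "c1.txt")  # hypothetical file name
writeLines(lines, path, useBytes = TRUE)

# header = TRUE: the first line holds the column names
# row.names = 1: use the first column (name) as row names
# encoding = "UTF-8": mark the character data that is read as UTF-8
c1 <- read.table(path, header = TRUE, row.names = 1,
                 encoding = "UTF-8", stringsAsFactors = TRUE)
str(c1)
```

If the file was saved in a different encoding than the one R is running in, `fileEncoding = "UTF-8"` is the `read.table` argument that re-encodes the file while reading.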

1. Data input

2. Examining the height distribution

a. Distribution of height
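The post does not show the commands used for this step. One possible way to look at the distribution, assuming the five heights back-calculated from the regression output later in this post, is:

```r
# Heights reconstructed from the fitted values printed later in this post
# (an assumption, not the author's original input).
height <- c(150, 160, 170, 175, 160)

summary(height)  # minimum, quartiles, mean and maximum
sd(height)       # sample standard deviation
stem(height)     # text-based stem-and-leaf display of the distribution
```

With a larger sample, `hist(height)` would show the same information as a histogram.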

b. Relationship between height and the other variables

pairs(c1[, c("height", "weight", "age")])  # scatterplot matrix; works without attach(c1)

oldpar=par(mfcol=c(1,3))
boxplot(weight~sex,data=c1,ylab="weight")
boxplot(height~sex,data=c1,ylab="height")
boxplot(age~sex,data=c1,ylab="age")
par(oldpar)

3. Two-sample test

attach(c1)

t.test(height~sex,conf.level=0.99)

	Welch Two Sample t-test

data:  height by sex
t = -0.79241, df = 2.1575, p-value = 0.5059
alternative hypothesis: true difference in means is not equal to 0
99 percent confidence interval:
 -90.68703  75.68703
sample estimates:
mean in group F mean in group M
          160.0           167.5

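The p-value is 0.5059, which is greater than 0.01 (the significance level), so the difference in height between men and women is not statistically significant. The 99 percent confidence interval (-90.69, 75.69) contains 0, which leads to the same conclusion.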
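To see where the t statistic, the degrees of freedom and the p-value in the output above come from, here is a rough sketch that recomputes the Welch test by hand. The heights and sex labels are an assumption: they are back-calculated from the output printed in this post, not taken from the original data file.

```r
# Reconstructed data (assumption): three women, two men.
height <- c(150, 160, 170, 175, 160)
sex    <- factor(c("F", "M", "F", "M", "F"))

f <- height[sex == "F"]
m <- height[sex == "M"]

vf <- var(f) / length(f)  # squared standard error of the F mean
vm <- var(m) / length(m)  # squared standard error of the M mean

# Welch t statistic: difference in means over its standard error
t_stat <- (mean(f) - mean(m)) / sqrt(vf + vm)

# Welch-Satterthwaite approximation for the degrees of freedom
df <- (vf + vm)^2 / (vf^2 / (length(f) - 1) + vm^2 / (length(m) - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df)

c(t = t_stat, df = df, p = p_value)
```

Comparing `p_value` with the chosen level (0.01 here) is the whole decision rule: the null hypothesis of equal means is rejected only when the p-value is smaller than the level.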

4. Regression analysis

lm.fit1=lm(weight~height,data=c1)
lm.fit1

Call:
lm(formula = weight ~ height, data = c1)

Coefficients:
(Intercept)       height
-85.3553       0.8868

summary(lm.fit1)

Call:
lm(formula = weight ~ height, data = c1)

Residuals:
王      李      张      陈      赵
2.3289 -1.5395 -5.4079  5.1579 -0.5395

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -85.3553    38.6565  -2.208   0.1143
height        0.8868     0.2368   3.745   0.0332 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.616 on 3 degrees of freedom
Multiple R-squared:  0.8238, Adjusted R-squared:  0.765
F-statistic: 14.02 on 1 and 3 DF,  p-value: 0.03323
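The coefficients and R-squared in this summary can be reproduced with a few lines of arithmetic. The sketch below assumes the heights and weights back-calculated from the fitted values and residuals shown in this post; they are a reconstruction, not the original input.

```r
# Reconstructed data (assumption): fitted value + residual gives each weight.
height <- c(150, 160, 170, 175, 160)
weight <- c(50, 55, 60, 75, 56)

# Least-squares slope and intercept for weight ~ height
slope     <- cov(height, weight) / var(height)
intercept <- mean(weight) - slope * mean(height)

# R-squared: share of the variation in weight explained by height
fitted_w  <- intercept + slope * height
rss       <- sum((weight - fitted_w)^2)       # residual sum of squares
tss       <- sum((weight - mean(weight))^2)   # total sum of squares
r_squared <- 1 - rss / tss

round(c(intercept = intercept, slope = slope, r_squared = r_squared), 4)
```

The slope reads as roughly 0.89 kg of additional weight per extra centimetre of height, assuming the units are kg and cm, which the post does not state.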

#Inspect the quality of the regression fit

oldpar=par(mfrow=c(2,2),mar=c(2.5,2,1.5,0.2),mgp=c(1.2,0.2,0))
plot(lm.fit1)  # four diagnostic plots in a 2 x 2 grid
par(oldpar)    # restore the previous graphics settings

#Does adding other variables improve the model's predictive ability?

Model:
weight ~ height
       Df Sum of Sq    RSS    AIC
<none>              63.934 16.742
age     1    11.624 52.311 17.739
sex     1    13.268 50.667 17.579

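The smaller the AIC, the better. Here the current model (AIC 16.742) already has a lower AIC than the model with age added (17.739) or with sex added (17.579), so by this criterion neither variable improves it.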
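The command that produced the table above is not shown in the post. A table of this shape is what `add1` prints, so the following is a guess at the call. It uses data back-calculated from the output in this post. Only `sex` is offered as a candidate term, because the ages cannot be recovered from the printed results.

```r
# Reconstructed data (assumption, not the author's file).
c1 <- data.frame(
  height = c(150, 160, 170, 175, 160),
  weight = c(50, 55, 60, 75, 56),
  sex    = factor(c("F", "M", "F", "M", "F"))
)
lm.fit1 <- lm(weight ~ height, data = c1)

# Try adding each term in the scope, one at a time, to the current model
a1 <- add1(lm.fit1, scope = ~ height + sex)
print(a1)

# The AIC column follows extractAIC(): n * log(RSS / n) + 2 * (number of coefficients)
n        <- nrow(c1)
rss      <- sum(resid(lm.fit1)^2)
aic_none <- n * log(rss / n) + 2 * 2  # intercept and slope
aic_none
```

Adding a term always lowers the RSS, but the penalty of 2 per extra coefficient can outweigh that gain, which is what happens for `sex` here.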

predict(lm.fit1)
王       李       张       陈       赵
47.67105 56.53947 65.40789 69.84211 56.53947
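plot(weight)  # observed weights against observation index
plot(lm.fit1,col="blue",pch=8)  # diagnostic plots again, drawn with blue star symbols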

new.data=data.frame(height=c(150,160),sex=c("M","F"))
predict(lm.fit1,new.data)
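Because the model only contains `height`, the `sex` column in `new.data` is ignored by `predict`. As a hedged illustration, the sketch below refits the model on data back-calculated from the output in this post (an assumption) and also asks for prediction intervals, which show how uncertain a forecast from five observations is.

```r
# Reconstructed data (assumption, not the author's file).
c1 <- data.frame(
  height = c(150, 160, 170, 175, 160),
  weight = c(50, 55, 60, 75, 56)
)
lm.fit1 <- lm(weight ~ height, data = c1)

new.data <- data.frame(height = c(150, 160))

# fit = point prediction; lwr/upr = bounds of the 95% prediction interval
pred <- predict(lm.fit1, new.data, interval = "prediction", level = 0.95)
pred
```

The `fit` column should agree with the fitted values already printed for the people who are 150 and 160 tall (47.67105 and 56.53947).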
