For this section we will use the bwdf dataset to examine the relationship between age and race. Age is a numeric variable, while race is a factor variable with 3 levels. Hence, this is a comparison of a numeric variable across 3 groups:

**ANOVA (Analysis of variance)**

R provides the aov function to perform analysis of variance:

**code:**

> res = aov(age~race, data=bwdf)

> res

Call:

aov(formula = age ~ race, data = bwdf)

Terms:

race Residuals

Sum of Squares 230.080 5048.205

Deg. of Freedom 2 186

Residual standard error: 5.209692

Estimated effects may be unbalanced

> summary(res)

Df Sum Sq Mean Sq F value Pr(>F)

race 2 230 115.04 4.239 0.0158 *

Residuals 186 5048 27.14

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
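The F value in the table above is the ratio of the mean square for race to the residual mean square (115.04 / 27.14 ≈ 4.239). The following is a minimal sketch of that calculation done by hand. The bwdf data frame is not defined in this section, so the sketch assumes a stand-in built from MASS::birthwt with race converted to a factor; that stand-in and all helper variable names are assumptions made for illustration.

```r
# Assumption: bwdf is not defined in this section. MASS::birthwt with race
# converted to a factor is used only as a stand-in with the same columns.
library(MASS)
bwdf <- birthwt
bwdf$race <- factor(bwdf$race)

# Between-group sum of squares: group size times the squared distance of
# each group mean from the grand mean.
grand_mean  <- mean(bwdf$age)
group_means <- tapply(bwdf$age, bwdf$race, mean)
group_sizes <- tapply(bwdf$age, bwdf$race, length)
ss_between  <- sum(group_sizes * (group_means - grand_mean)^2)

# Within-group (residual) sum of squares: squared distance of each
# observation from its own group mean (ave() returns that mean per row).
ss_within <- sum((bwdf$age - ave(bwdf$age, bwdf$race))^2)

df_between <- nlevels(bwdf$race) - 1
df_within  <- nrow(bwdf) - nlevels(bwdf$race)

# F is the ratio of the two mean squares; the p value is the upper tail
# of the F distribution with those degrees of freedom.
f_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# Compare with the values reported by summary(aov(...)).
aov_table <- summary(aov(age ~ race, data = bwdf))[[1]]
print(c(F_by_hand = f_stat, F_from_aov = aov_table[["F value"]][1]))
print(c(p_by_hand = p_value, p_from_aov = aov_table[["Pr(>F)"]][1]))
```

The hand calculation and the aov table should agree, which shows that aov is doing nothing more than partitioning the total variation in age into a between-group and a within-group part.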

The p value (0.0158) is below 0.05, which indicates that mean age differs significantly between at least two of the race groups. To determine which groups differ from each other, TukeyHSD and pairwise t tests can be used:

**code:**

> TukeyHSD(res)

Tukey multiple comparisons of means

95% family-wise confidence level

Fit: aov(formula = age ~ race, data = bwdf)

$race

diff lwr upr p adj

2-1 -2.7532051 -5.474434 -0.03197581 0.0466676

3-1 -1.9036070 -3.863030 0.05581649 0.0589215

3-2 0.8495982 -1.994371 3.69356754 0.7603654

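At the 0.05 level, the adjusted p values show a significant difference in age only between race groups 1 and 2 (p = 0.047). The difference between groups 1 and 3 is borderline (p = 0.059, with a confidence interval that just includes 0), and there is no significant difference between groups 2 and 3.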
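Reading the TukeyHSD table by eye is error-prone when values sit close to 0.05, so it can help to filter it programmatically. The sketch below is one way to do that. It assumes a stand-in for bwdf built from MASS::birthwt with race as a factor (bwdf itself is not defined in this section), and the variable names tukey, significant and excludes_zero are illustrative.

```r
# Assumption: MASS::birthwt with race as a factor stands in for bwdf.
library(MASS)
bwdf <- birthwt
bwdf$race <- factor(bwdf$race)

res <- aov(age ~ race, data = bwdf)

# TukeyHSD returns a list with one matrix per factor; the columns are
# diff, lwr, upr and "p adj".
tukey <- TukeyHSD(res)$race

# Keep only the comparisons whose adjusted p value is below 0.05.
significant <- tukey[tukey[, "p adj"] < 0.05, , drop = FALSE]
print(significant)

# Equivalent check: a pair differs at the 5% family-wise level exactly
# when its 95% confidence interval excludes 0.
excludes_zero <- tukey[, "lwr"] > 0 | tukey[, "upr"] < 0
print(excludes_zero)
```

The drop = FALSE argument keeps the result a matrix even when only one row passes the filter, so the row name (the pair being compared) is preserved.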

**code:**

> with(bwdf, pairwise.t.test(age, race))

Pairwise comparisons using t tests with pooled SD

data: age and race

1 2

2 0.053 -

3 0.053 0.481

P value adjustment method: holm

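With the Holm adjustment, none of the pairwise differences reaches significance at the 0.05 level: groups 1 vs 2 and 1 vs 3 are both borderline (p = 0.053), and groups 2 and 3 clearly do not differ (p = 0.481).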
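The p values reported by pairwise.t.test depend on the adjustment method, which defaults to Holm. The sketch below compares unadjusted, Holm and Bonferroni p values and reproduces the Holm step by hand with p.adjust. It assumes a stand-in for bwdf built from MASS::birthwt with race as a factor, since bwdf is not defined in this section; the variable names are illustrative.

```r
# Assumption: MASS::birthwt with race as a factor stands in for bwdf.
library(MASS)
bwdf <- birthwt
bwdf$race <- factor(bwdf$race)

# The $p.value component is a lower-triangular matrix (NA elsewhere).
raw  <- with(bwdf, pairwise.t.test(age, race, p.adjust.method = "none"))$p.value
holm <- with(bwdf, pairwise.t.test(age, race, p.adjust.method = "holm"))$p.value
bonf <- with(bwdf, pairwise.t.test(age, race, p.adjust.method = "bonferroni"))$p.value

print(raw)
print(holm)
print(bonf)

# Holm by hand: sort the raw p values, multiply the smallest by the number
# of comparisons, the next by one fewer, and so on, keeping the sequence
# non-decreasing. p.adjust() implements exactly this.
raw_vec      <- raw[!is.na(raw)]
holm_by_hand <- p.adjust(raw_vec, method = "holm")
print(holm_by_hand)
```

Adjusted p values are never smaller than the raw ones, which is why a comparison can look significant in an unadjusted test and still miss the 0.05 threshold after correction.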

**Non-parametric test**

When the data are not normally distributed, particularly with small sample sizes, the Kruskal-Wallis test can be used as a non-parametric alternative to one-way analysis of variance:

**code:**

> res = kruskal.test(age~race, data=bwdf)

> res

Kruskal-Wallis rank sum test

data: age by race

Kruskal-Wallis chi-squared = 7.2515, df = 2, p-value = 0.02663
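The p value reported above (0.027) is below 0.05, so the Kruskal-Wallis test also points to a difference in age between the race groups. Like ANOVA, it does not say which groups differ. One possible follow-up, sketched below, is pairwise.wilcox.test, the rank-based counterpart of pairwise.t.test. The sketch assumes a stand-in for bwdf built from MASS::birthwt with race as a factor (bwdf is not defined in this section), and the names kw and pw are illustrative.

```r
# Assumption: MASS::birthwt with race as a factor stands in for bwdf.
library(MASS)
bwdf <- birthwt
bwdf$race <- factor(bwdf$race)

kw <- kruskal.test(age ~ race, data = bwdf)

# The result is a list; the pieces can be extracted individually.
print(kw$statistic)   # Kruskal-Wallis chi-squared
print(kw$parameter)   # degrees of freedom (number of groups minus 1)
print(kw$p.value)

# Pairwise Wilcoxon rank sum tests with Holm adjustment.
# exact = FALSE requests the normal approximation, which avoids the
# warnings that exact p values would raise when ages are tied.
pw <- with(bwdf, pairwise.wilcox.test(age, race,
                                      p.adjust.method = "holm",
                                      exact = FALSE))
print(pw$p.value)
```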

**Using regression**

Linear regression can also be used to compare a numeric variable across multiple groups:

**code:**

> summary(lm(age~race, data=bwdf))

Call:

lm(formula = age ~ race, data = bwdf)

Residuals:

Min 1Q Median 3Q Max

-10.2917 -4.2917 -0.5385 3.6119 20.7083

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 24.2917 0.5317 45.686 <0.0000000000000002 ***

race2 -2.7532 1.1518 -2.390 0.0178 *

race3 -1.9036 0.8293 -2.295 0.0228 *

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.21 on 186 degrees of freedom

Multiple R-squared: 0.04359, Adjusted R-squared: 0.03331

F-statistic: 4.239 on 2 and 186 DF, p-value: 0.01585

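The intercept is the mean age of race group 1, the reference level, and the race2 and race3 coefficients are the differences in mean age from that group. Their p values (0.018 and 0.023) show that groups 2 and 3 each differ significantly from group 1. These p values are not adjusted for multiple comparisons, and groups 2 and 3 are not compared with each other. The overall F-statistic (4.239, p = 0.01585) is the same as in the ANOVA above.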
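The regression output compares every group only with the reference level. The sketch below illustrates how the coefficients relate to the group means and how relevel can change the reference level so that groups 2 and 3 are compared directly. It assumes a stand-in for bwdf built from MASS::birthwt with race as a factor, since bwdf is not defined in this section; the column race_ref2 and the other variable names are hypothetical.

```r
# Assumption: MASS::birthwt with race as a factor stands in for bwdf.
library(MASS)
bwdf <- birthwt
bwdf$race <- factor(bwdf$race)

fit <- lm(age ~ race, data = bwdf)
group_means <- tapply(bwdf$age, bwdf$race, mean)

# The intercept is the mean of the reference group ("1"); the other
# coefficients are differences from that mean.
print(coef(fit))
print(group_means)

# Make group "2" the reference level to compare group 3 with group 2.
bwdf$race_ref2 <- relevel(bwdf$race, ref = "2")
fit2 <- lm(age ~ race_ref2, data = bwdf)
print(summary(fit2)$coefficients)

# The overall F test does not depend on the choice of reference level.
print(summary(fit)$fstatistic)
print(summary(fit2)$fstatistic)
```

Changing the reference level only re-expresses the same model, so the fitted values, R-squared and overall F test stay the same while the individual coefficient tests answer a different pairwise question.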