## Comparing 2 groups: unpaired data

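Student's t test:
This is the classical test used to compare the means of 2 groups of numbers. Note that R's t.test() runs the Welch variant by default, which does not assume equal variances in the 2 groups; pass var.equal=TRUE to get the classical Student's version. For example, if we wish to compare the ages of mothers of newborns with low birth weight versus normal birth weight: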

Code:

> t.test(age~low, data=bwdf)

Welch Two Sample t-test

data:  age by low
t = 1.7737, df = 136.94, p-value = 0.07834
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.1558349  2.8687423
sample estimates:
mean in group 0 mean in group 1
       23.66154        22.30508

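The output above shows a P value of 0.08. By convention, values < 0.05 are considered significant, values between 0.05 and 0.1 are often described as showing a trend towards significance, and values > 0.1 are considered non-significant. These cut-offs are conventions rather than strict rules. The 95% confidence interval for the difference in mean age (-0.16 to 2.87 years) includes 0, which is consistent with a P value above 0.05.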
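As a rough sketch of how the Welch and classical Student's versions differ, the snippet below runs both on the same data. The bwdf data frame is not defined in this section, so the sketch assumes it resembles the birthwt data from the MASS package with low converted to a factor. That is an assumption, not something stated above.

```r
# ASSUMPTION: bwdf resembles MASS::birthwt with 'low' stored as a factor.
library(MASS)
bwdf <- birthwt
bwdf$low <- factor(bwdf$low)

# Default call: Welch test, no equal-variance assumption
welch <- t.test(age ~ low, data = bwdf)

# Classical Student's t test: pooled variance estimate
student <- t.test(age ~ low, data = bwdf, var.equal = TRUE)

# Degrees of freedom and P values side by side
data.frame(
  test    = c("Welch", "Student"),
  df      = c(unname(welch$parameter), unname(student$parameter)),
  p.value = c(welch$p.value, student$p.value)
)
```

The Student's version always uses n1 + n2 - 2 degrees of freedom, while the Welch version adjusts the degrees of freedom downwards when the group variances or sizes differ.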

This test can also be used with 2 separate vectors (series of numbers) that are not part of the same data frame:

Code:

> xx
[1]  6  1  3  9 10  7  8  4  5  2
> yy
[1]  3  6  9  1  4  8  5 10  7  2

> t.test(xx,yy)

Welch Two Sample t-test

data:  xx and yy
t = 0, df = 18, p-value = 1
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-2.844662  2.844662
sample estimates:
mean of x mean of y
      5.5       5.5
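To see where the numbers in this output come from, the following sketch recomputes the Welch t statistic, degrees of freedom and confidence interval by hand for xx and yy, using the standard Welch formulas. The variable names (se, t_manual, df_manual, ci_manual) are only illustrative.

```r
xx <- c(6, 1, 3, 9, 10, 7, 8, 4, 5, 2)
yy <- c(3, 6, 9, 1, 4, 8, 5, 10, 7, 2)

n <- length(xx)
m <- length(yy)

# Standard error of the difference in means (variances not pooled)
se <- sqrt(var(xx) / n + var(yy) / m)

# Welch t statistic
t_manual <- (mean(xx) - mean(yy)) / se

# Welch-Satterthwaite approximation for the degrees of freedom
df_manual <- se^4 / ((var(xx) / n)^2 / (n - 1) + (var(yy) / m)^2 / (m - 1))

# 95% confidence interval for the difference in means
ci_manual <- (mean(xx) - mean(yy)) + c(-1, 1) * qt(0.975, df_manual) * se

# Compare with the built-in test
res <- t.test(xx, yy)
c(t_manual, df_manual, ci_manual)
c(unname(res$statistic), unname(res$parameter), as.numeric(res$conf.int))
```

The two printed vectors should agree, since t.test() applies the same formulas internally when var.equal is left at its default.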

Non-parametric test:
For non-normally distributed data, especially with small sample sizes, it is better to use the Mann-Whitney U test, also known as the Wilcoxon rank sum test:

Code:

> wilcox.test(age~low, data=bwdf)

Wilcoxon rank sum test with continuity correction

data:  age by low
W = 4238, p-value = 0.2471
alternative hypothesis: true location shift is not equal to 0

> wilcox.test(xx,yy)

Wilcoxon rank sum test with continuity correction

data:  xx and yy
W = 50, p-value = 1
alternative hypothesis: true location shift is not equal to 0

Warning message:
In wilcox.test.default(xx, yy) : cannot compute exact p-value with ties
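The W statistic and the warning above can be illustrated with a short sketch. It recomputes W from the ranks of the combined sample and then asks for the normal approximation explicitly with exact = FALSE, which is what R falls back to when ties are present. The names r and W_manual are only illustrative.

```r
xx <- c(6, 1, 3, 9, 10, 7, 8, 4, 5, 2)
yy <- c(3, 6, 9, 1, 4, 8, 5, 10, 7, 2)

# Rank the combined sample; tied values receive the average of their ranks
r <- rank(c(xx, yy))

# W = sum of the ranks of the first sample minus its minimum possible value
n <- length(xx)
W_manual <- sum(r[seq_len(n)]) - n * (n + 1) / 2

# Request the normal approximation directly, so no exact P value is attempted
res <- wilcox.test(xx, yy, exact = FALSE)

W_manual
res$statistic
res$p.value
```

Every value in xx also occurs in yy, so the combined sample is full of ties, which is why the exact P value cannot be computed here.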

Using regression:
Linear regression can also be used to compare 2 groups, with the outcome (age) as the response and the grouping variable (low) as the predictor:

Code:

> summary(lm(age~low, data=bwdf))

Call:
lm(formula = age ~ low, data = bwdf)

Residuals:
    Min      1Q  Median      3Q     Max
-9.6615 -4.3051 -0.6615  3.6949 21.3385

Coefficients:
            Estimate Std. Error t value            Pr(>|t|)
(Intercept)  23.6615     0.4627  51.143 <0.0000000000000002 ***
low1         -1.3565     0.8281  -1.638               0.103
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.275 on 187 degrees of freedom
Multiple R-squared:  0.01415,   Adjusted R-squared:  0.008875
F-statistic: 2.683 on 1 and 187 DF,  p-value: 0.1031

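The P value for low1 (0.103) shows that there is no significant difference in mean age between the 2 groups. The low1 estimate (-1.3565) is simply the difference between the group means (22.30508 - 23.66154). This regression is equivalent to a t test that assumes equal variances, which is why its P value differs slightly from the Welch result above (0.078).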
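The link between the regression and the t test can be checked with a minimal sketch. As before, it assumes that bwdf resembles the birthwt data from the MASS package with low converted to a factor; this is an assumption, and the names fit_sum, p_lm and t_lm are only illustrative.

```r
# ASSUMPTION: bwdf resembles MASS::birthwt with 'low' stored as a factor.
library(MASS)
bwdf <- birthwt
bwdf$low <- factor(bwdf$low)

# Regression of age on the 2-level grouping factor
fit_sum <- summary(lm(age ~ low, data = bwdf))
p_lm <- fit_sum$coefficients["low1", "Pr(>|t|)"]
t_lm <- fit_sum$coefficients["low1", "t value"]

# t test with pooled variance (classical Student's version)
student <- t.test(age ~ low, data = bwdf, var.equal = TRUE)

# Same P value; the t statistics differ only in sign, because t.test()
# reports group 0 minus group 1 while low1 is group 1 minus group 0
all.equal(student$p.value, p_lm)
all.equal(unname(student$statistic), -t_lm)
```

With only 2 groups, the F statistic of the regression is the square of this t value, so the overall model P value matches as well.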