R language

Contingency tables

Contingency tables are commonly used to analyze two categorical variables. For example, in our dataset bwdf, race and smoke are both categorical variables. A contingency table can be created easily with the table function:

code:

> tt = with(bwdf, table(smoke, race))
> tt
     race
smoke  1  2  3
    0 44 16 55
    1 52 10 12
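
Since bwdf itself is not shown here, the following is a minimal sketch that rebuilds the same table directly from the counts printed above, so the examples can be tried without the original data. The matrix layout and the use of as.table are choices made for this illustration, not part of the original analysis:

```r
# Rebuild the smoke-by-race table from the counts printed above
# (illustrative reconstruction; the original data frame bwdf is not needed).
tt <- as.table(matrix(c(44, 52, 16, 10, 55, 12), nrow = 2,
                      dimnames = list(smoke = c("0", "1"),
                                      race  = c("1", "2", "3"))))

tt                    # same layout as the table above
margin.table(tt, 1)   # row totals (by smoke)
margin.table(tt, 2)   # column totals (by race)
sum(tt)               # total number of observations
```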

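Cell, row, and column proportions can be calculated with the prop.table function. Without a margin argument, each cell is divided by the grand total; with margin 1 each cell is divided by its row total (row proportions), and with margin 2 by its column total (column proportions).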

code:

> prop.table(tt)
     race
smoke          1          2          3
    0 0.23280423 0.08465608 0.29100529
    1 0.27513228 0.05291005 0.06349206

> prop.table(tt, 1)
     race
smoke         1         2         3
    0 0.3826087 0.1391304 0.4782609
    1 0.7027027 0.1351351 0.1621622

> prop.table(tt, 2)
     race
smoke         1         2         3
    0 0.4583333 0.6153846 0.8208955
    1 0.5416667 0.3846154 0.1791045
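
To make explicit what the margin argument does, here is a small sketch that reproduces the row and column proportions by dividing by the row and column totals by hand. The table is rebuilt from the counts shown earlier, and the variable names row_prop and col_prop are only illustrative:

```r
# Table rebuilt from the counts shown earlier (illustrative reconstruction).
tt <- as.table(matrix(c(44, 52, 16, 10, 55, 12), nrow = 2,
                      dimnames = list(smoke = c("0", "1"),
                                      race  = c("1", "2", "3"))))

# Row proportions: divide each cell by its row total.
row_prop <- sweep(tt, 1, rowSums(tt), "/")

# Column proportions: divide each cell by its column total.
col_prop <- sweep(tt, 2, colSums(tt), "/")

# Both should agree with prop.table() using the matching margin.
all.equal(as.vector(row_prop), as.vector(prop.table(tt, 1)))
all.equal(as.vector(col_prop), as.vector(prop.table(tt, 2)))
```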

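The addmargins function can be used to show row and column sums. Note that only the sums along the normalized margin are meaningful: with row proportions each row sums to 1, so the Sum row at the bottom (and the overall total of 2) is not a proportion, and the same applies to the Sum column with column proportions.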

code:

> addmargins(prop.table(tt))
     race
smoke          1          2          3        Sum
  0   0.23280423 0.08465608 0.29100529 0.60846561
  1   0.27513228 0.05291005 0.06349206 0.39153439
  Sum 0.50793651 0.13756614 0.35449735 1.00000000

> addmargins(prop.table(tt,1))
     race
smoke         1         2         3       Sum
  0   0.3826087 0.1391304 0.4782609 1.0000000
  1   0.7027027 0.1351351 0.1621622 1.0000000
  Sum 1.0853114 0.2742656 0.6404230 2.0000000

> addmargins(prop.table(tt,2))
     race
smoke         1         2         3       Sum
  0   0.4583333 0.6153846 0.8208955 1.8946135
  1   0.5416667 0.3846154 0.1791045 1.1053865
  Sum 1.0000000 1.0000000 1.0000000 3.0000000
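
If only the meaningful margin is wanted, addmargins accepts a margin argument that restricts which sums are added. The following is a sketch on the table rebuilt from the counts shown earlier; the names row_p and col_p are illustrative:

```r
# Table rebuilt from the counts shown earlier (illustrative reconstruction).
tt <- as.table(matrix(c(44, 52, 16, 10, 55, 12), nrow = 2,
                      dimnames = list(smoke = c("0", "1"),
                                      race  = c("1", "2", "3"))))

# Row proportions: add only a Sum column (each row sums to 1).
row_p <- addmargins(prop.table(tt, 1), margin = 2)
row_p

# Column proportions: add only a Sum row (each column sums to 1).
col_p <- addmargins(prop.table(tt, 2), margin = 1)
col_p
```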

The round function can be used to round the values:

code:

> round(addmargins(prop.table(tt,2)),2)
     race
smoke    1    2    3  Sum
  0   0.46 0.62 0.82 1.89
  1   0.54 0.38 0.18 1.11
  Sum 1.00 1.00 1.00 3.00

Chi-square test

The chisq.test function tests whether the two variables (here smoke and race) are independent:

code:

> chisq.test(tt)

        Pearson's Chi-squared test

data:  tt
X-squared = 21.779, df = 2, p-value = 0.00001865
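
To show where the statistic comes from, here is a minimal sketch that recomputes it from the observed and expected counts. It uses the table rebuilt from the counts shown earlier; the names res and manual are illustrative:

```r
# Table rebuilt from the counts shown earlier (illustrative reconstruction).
tt <- as.table(matrix(c(44, 52, 16, 10, 55, 12), nrow = 2,
                      dimnames = list(smoke = c("0", "1"),
                                      race  = c("1", "2", "3"))))

res <- chisq.test(tt)

# Expected counts under independence: row total * column total / grand total.
res$expected

# Pearson statistic: sum over cells of (observed - expected)^2 / expected.
manual <- sum((tt - res$expected)^2 / res$expected)
manual

# Should match the statistic reported by chisq.test().
res$statistic
```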

 

Non-parametric test

Fisher's exact test can be used as an exact alternative to the chi-square test for contingency tables. It is often preferred when expected cell counts are small:

code:

> fisher.test(tt)

        Fisher's Exact Test for Count Data

data:  tt
p-value = 0.000009799
alternative hypothesis: two.sided
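
A common rule of thumb is to prefer Fisher's test when some expected counts fall below about 5. The sketch below checks the smallest expected count and extracts the Fisher p-value, again on the table rebuilt from the counts shown earlier (the names min_expected and p_fisher are illustrative):

```r
# Table rebuilt from the counts shown earlier (illustrative reconstruction).
tt <- as.table(matrix(c(44, 52, 16, 10, 55, 12), nrow = 2,
                      dimnames = list(smoke = c("0", "1"),
                                      race  = c("1", "2", "3"))))

# Smallest expected count under independence.
min_expected <- min(chisq.test(tt)$expected)
min_expected

# p-value from Fisher's exact test, extracted from the result object.
p_fisher <- fisher.test(tt)$p.value
p_fisher
```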

Using regression: 
Logistic regression can also be used to examine the relationship between two categorical variables, here with smoke as the binary outcome and race as a factor predictor:

code:

> summary(glm(smoke~race, data=bwdf, family=binomial))

Call:
glm(formula = smoke ~ race, family = binomial, data = bwdf)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.2491  -0.9854  -0.6283   1.1073   1.8546  

Coefficients:
            Estimate Std. Error z value   Pr(>|z|)    
(Intercept)   0.1671     0.2048   0.816      0.415    
race2        -0.6371     0.4522  -1.409      0.159    
race3        -1.6895     0.3788  -4.460 0.00000818 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 253.04  on 188  degrees of freedom
Residual deviance: 230.05  on 186  degrees of freedom
AIC: 236.05

Number of Fisher Scoring iterations: 4 

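As can be seen here, the coefficient for race 3 is negative and significant, while the coefficient for race 2 is not significant. Compared with race 1 (the reference level), race 3 therefore has significantly lower odds of smoking (odds ratio exp(-1.6895), about 0.18), whereas race 2 does not differ significantly from race 1.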
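
The coefficients are on the log-odds scale, so exponentiating them gives odds ratios relative to race 1. The following sketch refits the model from the cell counts of the table shown earlier, using frequency weights instead of the original bwdf rows; the names counts, fit and or_direct are illustrative:

```r
# Table rebuilt from the counts shown earlier (illustrative reconstruction).
tt <- as.table(matrix(c(44, 52, 16, 10, 55, 12), nrow = 2,
                      dimnames = list(smoke = c("0", "1"),
                                      race  = c("1", "2", "3"))))

# One row per cell, with factor columns smoke and race and the count in Freq.
counts <- as.data.frame(tt)

# Same model as above, fitted on cell counts via frequency weights.
fit <- glm(smoke ~ race, data = counts, family = binomial, weights = Freq)

# Odds ratios relative to race 1 (the intercept is the odds in race 1).
exp(coef(fit))

# Direct check for race 3: odds of smoking in race 3 / odds in race 1.
or_direct <- (tt["1", "3"] / tt["0", "3"]) / (tt["1", "1"] / tt["0", "1"])
or_direct
```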

Genotype allele table analysis: 

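For a genotype table, the Cochran-Armitage trend test is often recommended. Note that the call below passes the case and control counts as two numeric variables with three observations each, so independence_test assesses the association between these two count vectors rather than a trend across the 3 x 2 table of counts: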

code:
> mydf = data.frame(case=c(410,10,6), control=c(129,26,12)) 
> mydf
  case control
1  410     129
2   10      26
3    6      12

> library(coin)
> independence_test(control ~ case, data=mydf)

        Asymptotic General Independence Test

data:  control by case
Z = 1.407, p-value = 0.1594
alternative hypothesis: two.sided
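
As an alternative sketch, base R's prop.trend.test performs a chi-squared test for trend in proportions, which is the usual way to run a Cochran-Armitage style analysis on grouped counts. The scores 0, 1, 2 are an assumption here (an additive coding of the three genotype rows, whose labels are not given above), and the names cases, controls, totals and res are illustrative:

```r
# Case and control counts per genotype row, as in mydf above.
cases    <- c(410, 10, 6)
controls <- c(129, 26, 12)
totals   <- cases + controls

# Proportion of cases in each genotype group.
cases / totals

# Chi-squared test for trend in proportions across the ordered groups.
# The scores are assumed (additive coding); other codings are possible.
res <- prop.trend.test(x = cases, n = totals, score = c(0, 1, 2))
res
```

Changing the score vector changes the genetic model being tested, so the choice should be made before looking at the data.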

