Correlation matrix

A correlation matrix correlates every variable of a data frame with every other variable, so that one can see at a glance which variables are related. All variables must be numeric for this to work. The cor() function of base R computes the matrix, using Pearson's coefficient by default:

code:

> round(cor(birthwt), 3)
         low    age    lwt   race  smoke    ptl     ht     ui    ftv    bwt
low    1.000 -0.119 -0.170  0.138  0.161  0.196  0.152  0.169 -0.063 -0.785
age   -0.119  1.000  0.180 -0.173 -0.044  0.072 -0.016 -0.075  0.215  0.090
lwt   -0.170  0.180  1.000 -0.165 -0.044 -0.140  0.236 -0.153  0.141  0.186
race   0.138 -0.173 -0.165  1.000 -0.339  0.008  0.020  0.054 -0.098 -0.195
smoke  0.161 -0.044 -0.044 -0.339  1.000  0.188  0.013  0.062 -0.028 -0.190
ptl    0.196  0.072 -0.140  0.008  0.188  1.000 -0.015  0.228 -0.044 -0.155
ht     0.152 -0.016  0.236  0.020  0.013 -0.015  1.000 -0.109 -0.072 -0.146
ui     0.169 -0.075 -0.153  0.054  0.062  0.228 -0.109  1.000 -0.060 -0.284
ftv   -0.063  0.215  0.141 -0.098 -0.028 -0.044 -0.072 -0.060  1.000  0.058
bwt   -0.785  0.090  0.186 -0.195 -0.190 -0.155 -0.146 -0.284  0.058  1.000
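
With ten variables the matrix already holds 45 distinct pairs, so it can help to list them in order of strength. The following is a minimal sketch, assuming that birthwt is the data set shipped with the MASS package; the names num, cm and pairs_df are made up for this illustration.

```r
library(MASS)  # assumption: birthwt here is MASS::birthwt

# Keep only numeric columns; cor() stops with an error on factors or characters.
num <- birthwt[sapply(birthwt, is.numeric)]
cm <- cor(num)

# Flatten the upper triangle into one row per pair of variables.
idx <- which(upper.tri(cm), arr.ind = TRUE)
pairs_df <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]
)

# Strongest correlations first, regardless of sign.
pairs_df <- pairs_df[order(-abs(pairs_df$r)), ]
head(pairs_df, 5)
```

The sapply() filter changes nothing for birthwt, whose columns are all numeric, but it guards against data frames that also contain factors or text.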

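The corr.test() function of the psych package also reports P values and confidence intervals. In the matrix of P values, entries below the diagonal are unadjusted and entries above it are adjusted for multiple tests. The confidence intervals are printed as a table with one row per pair of variables when short=FALSE is passed to print():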

code:

> library(psych)
> cc = corr.test(birthwt)
> print(cc, short=FALSE)
Call:corr.test(x = birthwt)
Correlation matrix 
        low   age   lwt  race smoke   ptl    ht    ui   ftv   bwt
low    1.00 -0.12 -0.17  0.14  0.16  0.20  0.15  0.17 -0.06 -0.78
age   -0.12  1.00  0.18 -0.17 -0.04  0.07 -0.02 -0.08  0.22  0.09
lwt   -0.17  0.18  1.00 -0.17 -0.04 -0.14  0.24 -0.15  0.14  0.19
race   0.14 -0.17 -0.17  1.00 -0.34  0.01  0.02  0.05 -0.10 -0.19
smoke  0.16 -0.04 -0.04 -0.34  1.00  0.19  0.01  0.06 -0.03 -0.19
ptl    0.20  0.07 -0.14  0.01  0.19  1.00 -0.02  0.23 -0.04 -0.15
ht     0.15 -0.02  0.24  0.02  0.01 -0.02  1.00 -0.11 -0.07 -0.15
ui     0.17 -0.08 -0.15  0.05  0.06  0.23 -0.11  1.00 -0.06 -0.28
ftv   -0.06  0.22  0.14 -0.10 -0.03 -0.04 -0.07 -0.06  1.00  0.06
bwt   -0.78  0.09  0.19 -0.19 -0.19 -0.15 -0.15 -0.28  0.06  1.00
Sample Size 
[1] 189

Probability values (Entries above the diagonal are adjusted for multiple tests.) 
       low  age  lwt race smoke  ptl   ht   ui  ftv  bwt
low   0.00 1.00 0.63 1.00  0.77 0.27 0.97 0.63 1.00 0.00
age   0.10 0.00 0.45 0.57  1.00 1.00 1.00 1.00 0.12 1.00
lwt   0.02 0.01 0.00 0.70  1.00 1.00 0.04 0.97 1.00 0.37
race  0.06 0.02 0.02 0.00  0.00 1.00 1.00 1.00 1.00 0.28
smoke 0.03 0.54 0.55 0.00  0.00 0.35 1.00 1.00 1.00 0.32
ptl   0.01 0.33 0.05 0.91  0.01 0.00 1.00 0.07 1.00 0.94
ht    0.04 0.83 0.00 0.79  0.85 0.83 0.00 1.00 1.00 1.00
ui    0.02 0.30 0.04 0.46  0.40 0.00 0.14 0.00 1.00 0.00
ftv   0.39 0.00 0.05 0.18  0.70 0.54 0.32 0.42 0.00 1.00
bwt   0.00 0.22 0.01 0.01  0.01 0.03 0.05 0.00 0.43 0.00

 To see confidence intervals of the correlations, print with the short=FALSE option

 Confidence intervals based upon normal theory.  To get bootstrapped values, try cor.ci
           lower     r upper    p
low-age    -0.26 -0.12  0.02 0.10
low-lwt    -0.30 -0.17 -0.03 0.02
low-race   -0.01  0.14  0.28 0.06
low-smoke   0.02  0.16  0.30 0.03
low-ptl     0.05  0.20  0.33 0.01
low-ht      0.01  0.15  0.29 0.04
low-ui      0.03  0.17  0.30 0.02
low-ftv    -0.20 -0.06  0.08 0.39
low-bwt    -0.83 -0.78 -0.72 0.00
age-lwt     0.04  0.18  0.31 0.01
age-race   -0.31 -0.17 -0.03 0.02
age-smoke  -0.19 -0.04  0.10 0.54
age-ptl    -0.07  0.07  0.21 0.33
age-ht     -0.16 -0.02  0.13 0.83
age-ui     -0.22 -0.08  0.07 0.30
age-ftv     0.07  0.22  0.35 0.00
age-bwt    -0.05  0.09  0.23 0.22
lwt-race   -0.30 -0.17 -0.02 0.02
lwt-smoke  -0.19 -0.04  0.10 0.55
lwt-ptl    -0.28 -0.14  0.00 0.05
lwt-ht      0.10  0.24  0.37 0.00
lwt-ui     -0.29 -0.15 -0.01 0.04
lwt-ftv     0.00  0.14  0.28 0.05
lwt-bwt     0.04  0.19  0.32 0.01
race-smoke -0.46 -0.34 -0.21 0.00
race-ptl   -0.13  0.01  0.15 0.91
race-ht    -0.12  0.02  0.16 0.79
race-ui    -0.09  0.05  0.19 0.46
race-ftv   -0.24 -0.10  0.05 0.18
race-bwt   -0.33 -0.19 -0.05 0.01
smoke-ptl   0.05  0.19  0.32 0.01
smoke-ht   -0.13  0.01  0.16 0.85
smoke-ui   -0.08  0.06  0.20 0.40
smoke-ftv  -0.17 -0.03  0.12 0.70
smoke-bwt  -0.32 -0.19 -0.05 0.01
ptl-ht     -0.16 -0.02  0.13 0.83
ptl-ui      0.09  0.23  0.36 0.00
ptl-ftv    -0.19 -0.04  0.10 0.54
ptl-bwt    -0.29 -0.15 -0.01 0.03
ht-ui      -0.25 -0.11  0.03 0.14
ht-ftv     -0.21 -0.07  0.07 0.32
ht-bwt     -0.28 -0.15  0.00 0.05
ui-ftv     -0.20 -0.06  0.08 0.42
ui-bwt     -0.41 -0.28 -0.15 0.00
ftv-bwt    -0.09  0.06  0.20 0.43
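
The printed tables can also be read from the returned object. The sketch below assumes that birthwt comes from the MASS package and that the object has the components r, p and ci, as the printout above suggests; raw_p, adj_p and sig are names invented here. The adjust argument is written out to show where the multiple-testing correction is chosen.

```r
library(MASS)   # assumption: birthwt here is MASS::birthwt
library(psych)

cc <- corr.test(birthwt, method = "pearson", adjust = "holm")

# cc$p is not symmetric: unadjusted P values sit below the diagonal,
# adjusted P values above it.
raw_p <- cc$p[lower.tri(cc$p)]
adj_p <- t(cc$p)[lower.tri(cc$p)]   # transposing keeps the same pair order as raw_p

# cc$ci has one row per pair with columns lower, r, upper and p.
sig <- cc$ci[cc$ci$p < 0.05, ]
head(sig[order(sig$p), ])
```

Comparing raw_p with adj_p shows how many pairs remain below a chosen threshold once the correction is applied.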

 

A correlation matrix can also be shown as a plot, here with the corrplot package:

code:

    > library(corrplot)
    > M = cor(birthwt)
    > corrplot.mixed(M, lower = "ellipse", upper = "number", order='hclust', addrect=2)
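
The plot can also be limited to correlations that pass a significance test. This is a sketch only, assuming that birthwt is MASS::birthwt and that the installed corrplot version exports cor.mtest(); the 0.05 threshold and the name pm are illustrative choices.

```r
library(MASS)      # assumption: birthwt here is MASS::birthwt
library(corrplot)

M <- cor(birthwt)

# Matrix of unadjusted P values, one cor.test() per pair of columns.
pm <- cor.mtest(birthwt, conf.level = 0.95)$p

# Draw the upper triangle only and leave cells with P >= 0.05 blank.
corrplot(M, method = "ellipse", type = "upper", order = "hclust",
         p.mat = pm, sig.level = 0.05, insig = "blank", diag = FALSE)
```

The P values used here are not corrected for multiple testing, so the blanked cells will not necessarily match the adjusted values reported by corr.test().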

                   

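Another way to display the correlation matrix is the corrplot() function of the arm package, which takes the data frame itself rather than a correlation matrix. The corrplot package exports a function of the same name, so when both packages are attached it is safer to name the package explicitly:

code:

    > library(arm)
    > arm::corrplot(birthwt, color=TRUE, abs=FALSE)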

                   

The scatterplots behind the correlation matrix can be drawn with the base pairs() function. Here the first column of bwdf sets the point color and is left out of the plotted variables:

code:

    > pairs(bwdf[-1],  pch = 21, col = as.numeric(bwdf[,1]))
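
Scatterplots and coefficients can be combined in one figure by giving pairs() a custom panel function. The sketch below assumes that birthwt is MASS::birthwt; panel_cor is a hypothetical helper written for this example (modeled on the pattern shown in the pairs() help page), and the choice of three continuous variables and of the two colors is arbitrary.

```r
library(MASS)  # assumption: birthwt here is MASS::birthwt

# Hypothetical helper: write the Pearson r of each pair into its panel.
panel_cor <- function(x, y, digits = 2, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))        # restore the panel's coordinates afterwards
  par(usr = c(0, 1, 0, 1))       # unit square, so the text can be centred
  r <- cor(x, y, use = "pairwise.complete.obs")
  text(0.5, 0.5, format(round(r, digits), nsmall = digits))
}

cont <- birthwt[c("age", "lwt", "bwt")]   # continuous variables only

# Scatterplots below the diagonal, coefficients above it;
# points are colored by the low-birth-weight indicator.
pairs(cont, pch = 21,
      col = ifelse(birthwt$low == 1, "red", "grey40"),
      upper.panel = panel_cor)
```

Naming the colors explicitly avoids a pitfall of numeric color codes: a 0/1 indicator passed straight to col would map the value 0 to the background color.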

                   

