Tests to assess distribution

The data may, however, be skewed to one side, be excessively peaked in the center, or have other shapes. Apart from visual inspection of the histogram and density curve mentioned above, normality can be assessed with skewness, kurtosis and formal tests such as the Shapiro-Wilk, Kolmogorov-Smirnov and Anderson-Darling tests.

Skewness
Skewness can be computed with the following function:

code: 
myskewness <- function(x) {
    m3 <- mean((x - mean(x))^3)   # third central moment
    skew <- m3 / sd(x)^3          # divide by the cube of the sample SD
    round(skew, 2)
}
Or with the skewness() function from the e1071 package:
> library(e1071)
> x <- rnorm(100)
> skewness(x)
[1] -0.121306
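For reference, the quantity computed by myskewness() can be written as follows, where m₃ is the third central moment and s is the sample standard deviation returned by sd() (denominator n − 1). To my understanding this matches the default (type = 3) of skewness() in e1071, so the two should agree up to rounding.

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
m_3 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^3, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad
\text{skewness} = \frac{m_3}{s^3}
$$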

A normal distribution has a skewness of 0. Positive values indicate a longer right tail and negative values a longer left tail. As a rule of thumb, if skewness is less than −1 or greater than +1, the distribution is highly skewed; if it is between −1 and −½ or between +½ and +1, the distribution is moderately skewed; and if it is between −½ and +½, the distribution is approximately symmetric.
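The rule of thumb above can be turned into a small helper. The sketch below is illustrative only: classify_skewness() and sample_skewness() are hypothetical names, the data are simulated, and values lying exactly on a boundary (0.5 or 1) are assigned to the milder class, which is my own choice since the rule does not say.

```r
# Moment-based skewness, same formula as myskewness() but without rounding
sample_skewness <- function(x) {
  mean((x - mean(x))^3) / sd(x)^3
}

# Hypothetical helper applying the rule of thumb.
# Boundary values (exactly 0.5 or 1) go to the milder class (an assumption).
classify_skewness <- function(skew) {
  a <- abs(skew)
  if (a <= 0.5) {
    "approximately symmetric"
  } else if (a <= 1) {
    "moderately skewed"
  } else {
    "highly skewed"
  }
}

set.seed(1)
x_sym  <- rnorm(1000)  # population skewness 0
x_skew <- rexp(1000)   # population skewness 2 (long right tail)

label_sym  <- classify_skewness(sample_skewness(x_sym))
label_skew <- classify_skewness(sample_skewness(x_skew))
c(label_sym, label_skew)
```

With 1000 observations the normal sample should land in the symmetric class and the exponential sample in the highly skewed class.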

Kurtosis
Kurtosis can be computed with the following function:

code: 
mykurtosis <- function(x) {
    m4 <- mean((x - mean(x))^4)   # fourth central moment
    kurt <- m4 / sd(x)^4 - 3      # subtract 3 to get excess kurtosis
    round(kurt, 2)
}
Or with the kurtosis() function from the e1071 package:
> library(e1071)
> x <- rnorm(100)
> kurtosis(x)
[1] 0.6027943
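Similarly, mykurtosis() divides the fourth central moment m₄ by the fourth power of the sample standard deviation s (from sd(), denominator n − 1) and subtracts 3. As far as I can tell this is also the default (type = 3) of kurtosis() in e1071, so both report excess kurtosis:

$$
m_4 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^4, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad
\text{excess kurtosis} = \frac{m_4}{s^4} - 3
$$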

A normal distribution has an excess kurtosis of 0 (i.e., a kurtosis of exactly 3); note that both mykurtosis() above and kurtosis() from e1071 report excess kurtosis. A distribution with kurtosis ≈3 (excess ≈0) is called mesokurtic. A distribution with kurtosis <3 (excess kurtosis <0) is called platykurtic: compared to a normal distribution, its central peak is lower and broader, and its tails are shorter and thinner. A distribution with kurtosis >3 (excess kurtosis >0) is called leptokurtic: compared to a normal distribution, its central peak is higher and sharper, and its tails are longer and fatter.
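To see the three categories in practice, the sketch below computes excess kurtosis for large simulated samples from distributions with known population values: uniform (−1.2, platykurtic), normal (0, mesokurtic) and Student's t with 5 degrees of freedom (6, leptokurtic). The choice of distributions and the name excess_kurtosis() are illustrative assumptions, not part of the original text.

```r
# Excess kurtosis, same formula as mykurtosis() but without rounding
excess_kurtosis <- function(x) {
  mean((x - mean(x))^4) / sd(x)^4 - 3
}

set.seed(42)
n <- 1e5
k <- c(
  uniform = excess_kurtosis(runif(n)),      # population value -1.2 (platykurtic)
  normal  = excess_kurtosis(rnorm(n)),      # population value 0 (mesokurtic)
  t_df5   = excess_kurtosis(rt(n, df = 5))  # population value 6 (leptokurtic)
)
round(k, 2)
```

The t estimate is the noisiest of the three because heavy-tailed samples make the fourth moment unstable, so its sample value can fall noticeably below 6.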

Shapiro-Wilk test
This is a commonly used test of normality. Its null hypothesis (H0) is that the data are normally distributed. A P value below 0.05 means H0 is rejected and the data are considered not normally distributed. A P value above 0.05 means there is no significant evidence against normality, so normality is usually accepted as a working assumption. This does not prove that the data are normal, and the P value is not the probability that the data are normal; in small samples even clearly non-normal data can give P > 0.05. The P value can be obtained with the following command:

code: 

> shapiro.test(xx)$p.value    # P > 0.05: no significant evidence that xx departs from normality
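As a minimal illustration, the sketch below compares a simulated normal sample with a simulated exponential (right-skewed) sample. Because the data are random, the exact P values depend on the seed: the exponential sample should give a P value far below 0.05, while a normal sample usually, but not always, gives P > 0.05 (about 5% of normal samples are rejected by chance).

```r
set.seed(123)
x_norm <- rnorm(100)  # sample drawn from a normal distribution
x_exp  <- rexp(100)   # sample drawn from a right-skewed distribution

p_norm <- shapiro.test(x_norm)$p.value
p_exp  <- shapiro.test(x_exp)$p.value

# Decision at the 5% level: TRUE means "reject normality"
c(normal = p_norm < 0.05, exponential = p_exp < 0.05)
```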

shapiro.test() requires a sample size between 3 and 5000. With large samples, even a small, practically unimportant deviation from normality can make the P value fall below 0.05, indicating a non-normal distribution.
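Both points can be checked with a short sketch. shapiro_p() is a hypothetical wrapper (not part of base R) that returns NA instead of raising an error when the sample size is outside 3 to 5000. The large sample is drawn from a t distribution with 10 degrees of freedom, which looks very close to normal on a histogram yet is typically rejected at n = 5000.

```r
# Hypothetical wrapper: NA instead of an error outside the 3..5000 range
shapiro_p <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 3 || length(x) > 5000) return(NA_real_)
  shapiro.test(x)$p.value
}

set.seed(2024)
big <- rt(5000, df = 10)  # close to normal, only slightly heavy-tailed

shapiro_p(c(1.2, 3.4))    # NA: fewer than 3 values
shapiro_p(rnorm(6000))    # NA: more than 5000 values
shapiro_p(big)            # typically far below 0.05 despite the mild departure
```

For large samples it is therefore worth judging the size of the departure (for example with skewness, kurtosis or a Q-Q plot) rather than relying on the P value alone.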

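Kolmogorov-Smirnov test

The one-sample Kolmogorov-Smirnov (KS) test compares the empirical distribution of the data with a specified theoretical distribution, here a normal distribution with the sample mean and SD of age from the birthwt data set (MASS package). Because the mean and SD are estimated from the same data, the standard KS P value is conservative (too large); the Lilliefors version of the test, lillie.test() in the nortest package, corrects for this.

code: 

> library(MASS)
> with(birthwt, ks.test(age, "pnorm", mean(age), sd(age)))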

        One-sample Kolmogorov-Smirnov test

data:  age
D = 0.094517, p-value = 0.06831
alternative hypothesis: two-sided
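The idea behind the Lilliefors correction can be sketched in base R with a Monte Carlo simulation: simulate many normal samples of the same size, re-estimate the mean and SD for each one exactly as was done for the data, and compare the observed KS statistic with the simulated ones. This is an illustrative sketch, not the nortest implementation; ks_normal_mc() and the default of 500 simulations are my own assumptions.

```r
# Sketch: Monte Carlo P value for the KS statistic when mean and SD
# are estimated from the data (the idea behind the Lilliefors test)
ks_normal_mc <- function(x, B = 500) {
  x <- x[!is.na(x)]
  n <- length(x)
  # KS distance between v and a normal with v's own mean and SD
  ks_stat <- function(v) {
    unname(suppressWarnings(ks.test(v, "pnorm", mean(v), sd(v))$statistic))
  }
  d_obs <- ks_stat(x)
  # Null distribution: normal samples of the same size, with mean and SD
  # re-estimated for every simulated sample
  d_null <- replicate(B, ks_stat(rnorm(n, mean(x), sd(x))))
  list(D = d_obs, p.value = (1 + sum(d_null >= d_obs)) / (B + 1))
}

set.seed(7)
x_exp <- rexp(200)
res_exp <- ks_normal_mc(x_exp)
res_exp
```

It could be applied to the example above with ks_normal_mc(birthwt$age), but since age is recorded in whole years and contains ties, the result would only be approximate.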

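Anderson-Darling test for normality

The Anderson-Darling test is also based on the distance between the empirical and the fitted normal distribution, but it gives more weight to the tails. It is available as ad.test() in the nortest package. For age it gives a much smaller P value than the KS test above:

code: 

> library(MASS)
> library(nortest)
> ad.test(birthwt$age)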

        Anderson-Darling normality test

data:  birthwt$age
A = 1.9229, p-value = 0.00006384
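To show what the reported A value measures, the sketch below computes the Anderson-Darling statistic for normality directly, standardizing with the sample mean and SD. It is meant to mirror the A printed by ad.test(), assuming nortest standardizes the data the same way; to my understanding ad.test() also applies a small-sample adjustment to A before converting it to a P value, which this sketch does not attempt. ad_statistic() is a hypothetical name and the data are simulated.

```r
ad_statistic <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  z <- (x - mean(x)) / sd(x)           # standardize with sample mean and SD
  log_F     <- pnorm(z, log.p = TRUE)  # log F(z_i)
  log_1minF <- pnorm(-z, log.p = TRUE) # log(1 - F(z_i))
  i <- seq_len(n)
  # A^2 = -n - (1/n) * sum((2i - 1) * [log F(z_i) + log(1 - F(z_(n+1-i)))])
  -n - mean((2 * i - 1) * (log_F + rev(log_1minF)))
}

set.seed(99)
a_norm <- ad_statistic(rnorm(200))  # small for normal data
a_exp  <- ad_statistic(rexp(200))   # much larger for skewed data
c(normal = a_norm, exponential = a_exp)
```

Larger values of A mean a larger discrepancy from the fitted normal distribution, with discrepancies in the tails weighted most heavily.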

