Introduction, Installation and Basic Principles

R is a software environment and programming language for performing simple to complex statistical operations on data. It also has extensive graphics features. Together these make it well suited for scientific and research computing. Although it may seem complex compared to many commercially available GUI programs, its approach is relatively simple, and it offers far more options and possibilities once its basic principles are understood. The initial learning effort pays off greatly in the long run.

This website aims to get beginners started with R so that they can perform common statistical tasks related to the analysis and graphical depiction of scientific and research data. There are numerous other resources on R as well, and one is encouraged to search the internet for further information.

R is an open source program that is free to download from the internet (http://www.R-project.org/). One can download the executable file (options available at https://cran.r-project.org/) and run it to install R onto the system. Although some graphical user interface programs are available to work with R, entering commands on the terminal is the easiest and most flexible way to work with R. All options are possible when using the terminal. One can open the R terminal by double-clicking on the R icon that appears on the desktop after installation. One can then enter commands and get results on the R terminal. Try simple operations to get started:

Code:
2 + 3
Output:
[1] 5

Simple values can be put in variables as follows:
Code:
xx = 10
yy = 5

Arithmetical operations (+, -, *, /, and ^ for power) can be performed on these variables:
Code:
xx * yy
Output:
[1] 50
Similarly, characters, text and TRUE/FALSE can also be assigned. Series of numbers (called vectors) can be assigned to these variables:
Code:
xx = c(1,2,3)
yy = c(2,5,10)

R is vectorized: an operation on xx and yy is applied element by element, pairing each value of xx with the corresponding value of yy:
Code:
xx + yy
Output:
[1] 3 7 13
If yy has fewer values than xx, the values of yy are recycled:
Code:
xx = c(3,14,51,156)
yy = 2
xx * yy

Output:
[1] 6 28 102 312
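
Recycling also applies when the shorter vector has more than one value. As a small sketch (the values of yy here are made up for illustration), a two-element yy is reused in turn across a four-element xx:

```r
xx <- c(3, 14, 51, 156)
yy <- c(1, 10)            # hypothetical values, shorter than xx

# yy is recycled as 1, 10, 1, 10 to match the length of xx
res <- xx * yy
res
# [1]    3  140   51 1560
```

If the length of the longer vector is not a multiple of the shorter one, R still recycles the values but issues a warning.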

Functions:
Functions are key elements of R which perform operations on the data. There are many built-in functions in R. For example, to get the mean and standard deviation of xx, one can use the mean and sd functions:
Code:
mean(xx)
Output:
[1] 56
Code:
sd(xx)
Output:
[1] 69.75672
Help and detailed information on these functions can easily be seen using the '?' operator. For example, to see help on the mean function, type:
Code:
?mean
Some commonly used built-in functions that can be applied to a value or vector v are (note that text after # is taken as a comment in R):
Code:
abs(v) # absolute value
min(v) # minimum
max(v) # maximum
range(v) # minimum and maximum
sum(v) # sum
mean(v) # mean
median(v) # median
sd(v) # standard deviation
sqrt(v) # square root
round(v) # rounded value
floor(v) # largest integer not above this number
ceiling(v) # smallest integer not below this number
trunc(v) # truncate the decimal part
log10(v) # common base 10 log
log(v) # natural log
exp(v) # exponential

Note that 'v' can be a single value or a vector (series of numbers). Also note that the output of these commands is not shown here. Similarly, there are many built-in functions available for strings (text), such as substr() for substrings, grep() for pattern search, strsplit() for splitting strings, paste() and paste0() for joining multiple strings, and toupper() and tolower() to change the case of text. Most of these need multiple arguments, and one should see the built-in help using ?fnname in the R terminal. Similarly, t.test(), chisq.test(), wilcox.test() and anova() are built-in functions that provide statistical analysis of data. The simplest function to draw graphs is the plot() function. These are discussed further in other pages.
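
As a brief sketch of the string functions named above (the example text and file names are made up for illustration), each call below uses the basic arguments only:

```r
txt <- "Statistical computing"        # hypothetical example text

first4 <- substr(txt, 1, 4)           # characters 1 to 4: "Stat"
upper  <- toupper(txt)                # "STATISTICAL COMPUTING"
lower  <- tolower(txt)                # "statistical computing"
words  <- strsplit(txt, " ")[[1]]     # "Statistical" "computing"
joined <- paste("R", "is", "free")    # joined with spaces: "R is free"
files  <- paste0("file", 1:3, ".csv") # "file1.csv" "file2.csv" "file3.csv"

# grep() returns the positions of the elements that match the pattern
hits <- grep("comp", c("compute", "plot", "recompute"))   # 1 and 3
```

Note that strsplit() returns a list, so [[1]] is used here to extract the character vector for the single input string.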

Extending by own functions:
A major advantage of R is that it can easily be extended with one's own functions. These can be simple functions that just rename a command to make life easier, or they can be complex functions available on the internet. A simple function that wraps the command for installing packages could be the following:

Code:
myinstall = function(packageNameString) {
    install.packages(packageNameString)
}

Note that in computer languages, any text is commonly referred to as a 'string' (since text can be seen as a string of characters). The function is defined as above: first comes the function name ('myinstall' here) followed by the '=' sign. Generally '<-' is recommended for assignment, but '=' also works just fine. This is followed by the 'function' keyword and then, in parentheses, the arguments that can be sent to this function. The curly brackets enclose the commands that will be performed when this function is called. Now, instead of typing install.packages('packageName'), one can just type:

Code:
myinstall('packageName')

Note that the name has to be in single or double quotes since it is a 'string' (of characters) and not the name of any object in R. Default values can be given for function arguments; these are used if the arguments are not sent. For example:

Code:
mySquare = function(sentnum = 5) {
    sentnum * sentnum
}
mySquare(10)

Output:
[1] 100
Code:
mySquare()
Output:
[1] 25
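
Defaults can be mixed with required arguments, and arguments can also be passed by name. A minimal sketch, where myPower is a hypothetical function introduced only for illustration:

```r
# Hypothetical example: 'base' is required, 'exponent' defaults to 2
myPower = function(base, exponent = 2) {
    base ^ exponent
}

a <- myPower(3)                       # default exponent is used: 9
b <- myPower(2, exponent = 3)         # default overridden by name: 8
c3 <- myPower(exponent = 3, base = 2) # named arguments in any order: 8
```

Passing arguments by name makes calls easier to read when a function has many arguments.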

Find possible arguments of a function:
For this one can either get built-in help for the function by typing:

Code:
?nameOfFunction

Or one can also know about possible arguments using following useful function:

Code:

For example, to know possible arguments of prism.plots function:

Code:
Output:
[1] "prism.plots=function (formula, data, centerfunc = mean,
def.axis = TRUE, jitter.y = FALSE, add = FALSE, start = 0,
...)"
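
Base R also provides args() and formals() for inspecting the arguments of a function. A minimal sketch, shown on the built-in sd function rather than prism.plots, and not necessarily the approach used for the output above:

```r
args(sd)                         # prints the argument list of sd

# formals() returns the arguments as a list; names() gives just their names
argnames <- names(formals(sd))
argnames
# [1] "x"     "na.rm"

# Default values can be read from the same list
default_na_rm <- formals(sd)$na.rm   # FALSE
```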

The above function can be written (defined) in a function file, say myfns.r. This file can be loaded from the R terminal by giving the following command:

Code:
source('myfns.r')

The above command can also be placed in the .Rprofile file to load the 'myfns.r' file automatically at the start of R.
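
The round trip of writing a function file and loading it can be sketched as follows. To keep the example self-contained, the file is written to a temporary directory; the file.exists() check is an added precaution, not something source() requires:

```r
# Write a one-function file to a temporary location (illustration only)
fnfile <- file.path(tempdir(), "myfns.r")
writeLines("mySquare = function(sentnum = 5) { sentnum * sentnum }", fnfile)

# Load the file only if it exists, then use the function it defines
if (file.exists(fnfile)) source(fnfile)
result <- mySquare(4)    # 16
```

The same guarded source() line, pointing at the real location of the function file, is the kind of line that could go into .Rprofile.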

Extending R by installing useful packages:
R can be extended by installing many useful dedicated packages, which are also freely available on the internet. The command (from the R terminal) to install a package is:

Code:
install.packages('packageName')

If there are any errors while installing new packages, one should quit R (with the q() command) and restart R in vanilla mode from the system command line before trying again:

Code:
R --vanilla

Once installed, these packages have to be loaded with the library function before use:

Code:
library(packageName) # without quotes

Note: part of commands that come after '#' sign are taken to be comments and not executed by R.
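
Installing and loading can be combined in one's own function so that a package is installed only when it is missing. A minimal sketch, where myLibrary is a hypothetical helper name and MASS is used only as an example of a package that normally ships with R:

```r
# Hypothetical helper: install a package only if it is missing, then load it
myLibrary = function(packageNameString) {
    if (!requireNamespace(packageNameString, quietly = TRUE)) {
        install.packages(packageNameString)
    }
    # character.only = TRUE lets library() accept the name as a string
    library(packageNameString, character.only = TRUE)
}

myLibrary("MASS")
```

Here the package name is passed in quotes, because the helper receives it as a string.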

By combining multiple statements, functions can be used to create complex analyses and graphs from data. The following single plot shows graphs of the 'low' variable analyzed against all other variables of the bwdf dataset:

Complex tables of analytical information can also be printed out using custom functions:

----------------------------------------------------------------
Showing relation between {low} and all other variables:
----------------------------------------------------------------
NULL
  Variable  Pvalue_EffectSize_Test          OddsRatio_or_Mean_SD_of_Groups
1      age               NS(tTest)           23.66(5.58); 22.31(4.51)(0;1)
2      lwt P=0.0131;cd=0.37(tTest)        133.3(31.72); 122.14(26.56)(0;1)
3     race               NS(Chisq) 2.31; 0.91; 5.79 (OR; lowerCI; upperCI)
4    smoke P=0.0396;cv=0.15(Chisq) 2.01; 1.07; 3.79 (OR; lowerCI; upperCI)
5      ptl P=0.0121;cd=0.43(tTest)             0.13(0.46); 0.34(0.54)(0;1)
6       ht               NS(Chisq)   3.31; 0.99; 12 (OR; lowerCI; upperCI)
7       ui P=0.0355;cv=0.15(Chisq) 2.56; 1.12; 5.89 (OR; lowerCI; upperCI)
8      ftv               NS(tTest)             0.84(1.07); 0.69(1.04)(0;1)
----------------------------------------------------------------
P value < 0.05 are shown here.
Note effect sizes used:
cd= Cohen's d with t-test/Wilcoxon test
(0.2=small, 0.5=moderate, 0.8=large effect size)
cv= Cramer's V with Chi-squared/Fisher's test
(0-1 like correlation coefficient R)
----------------------------------------------------------------
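
The individual tests behind such a table can be run with built-in functions. The sketch below is not the custom function that produced the table; it assumes, purely for illustration, that bwdf resembles the birthwt data set from the MASS package, which has variables with the same names:

```r
library(MASS)   # provides the birthwt data set (an assumption about bwdf)

# t-test: compare a numeric variable (lwt) between the two groups of 'low'
tt <- t.test(lwt ~ low, data = birthwt)
tt$p.value

# Chi-squared test: relation between two categorical variables
ct <- chisq.test(table(birthwt$low, birthwt$smoke))
ct$p.value
```

Both functions return a list, so individual parts such as the p-value can be extracted with $ and assembled into a table by a custom function.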

Cut function to group a number series (vector)

Often it is required to divide a number series into groups, and the cut() function can be used for this. In the following custom function called myquantiles, the cut function is used to divide a series (vector) of numbers (called vect here) into the specified number of quantiles (the default value is 3):

> myquantiles = function(vect, N=3){
    cut(vect, breaks=quantile(vect, probs=seq(0,1, by=1/N)), include.lowest=TRUE)
}

For testing, one can draw a random sample of 20 values from the range 1 to 100:

> xx = sample(1:100, 20)
> xx
[1] 19 89 56 62 92 28 35 95 14  1 41 78 69 74 12 33 71 84  8 21
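
Since sample() draws at random, the values will differ on each run. If repeatable values are wanted, the random seed can be fixed first with set.seed(). A small sketch, where the seed 123 is an arbitrary choice:

```r
set.seed(123)                 # 123 is an arbitrary seed value
first <- sample(1:100, 20)

set.seed(123)                 # resetting the same seed...
second <- sample(1:100, 20)   # ...reproduces the same sample

identical(first, second)      # TRUE
```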

This vector can be cut into 3 quantiles with following command:

> qq = myquantiles(xx, 3)
> qq
[1] [1,29.7]    (70.3,95]   (29.7,70.3] (29.7,70.3] (70.3,95]   [1,29.7]    (29.7,70.3] (70.3,95]   [1,29.7]    [1,29.7]
[11] (29.7,70.3] (70.3,95]   (29.7,70.3] (70.3,95]   [1,29.7]    (29.7,70.3] (70.3,95]   (70.3,95]   [1,29.7]    [1,29.7]
Levels: [1,29.7] (29.7,70.3] (70.3,95]

Three quantiles range from 1 to 29.7, 29.7 to 70.3 and 70.3 to 95 for this series of numbers.  The 3 levels can be renamed with following command:

> levels(qq) = c("low", "mid", "high")
> qq
[1] low  high mid  mid  high low  mid  high low  low  mid  high mid  high low  mid  high high low  low
Levels: low mid high

These can be put together in a data.frame, and the first rows viewed with head():

> dd = data.frame(Num=xx, Grp=qq)
> head(dd)
  Num  Grp
1  19  low
2  89 high
3  56  mid
4  62  mid
5  92 high
6  28  low
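
The cut() function can also be given fixed break points and labels directly, instead of quantiles. A minimal sketch with made-up values and break points:

```r
scores <- c(5, 35, 80, 30, 70)     # hypothetical values

# Intervals are (0,30], (30,70] and (70,100]; the upper limit is included
grp <- cut(scores, breaks = c(0, 30, 70, 100),
           labels = c("low", "mid", "high"))
as.character(grp)                  # low, mid, high, low, mid

# table() counts how many values fall in each group
counts <- table(grp)
counts
```

By default each interval is closed on the right, which is why 30 falls in "low" and 70 in "mid".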

**************************