R is a software environment and programming language for statistical operations on data, from simple to complex. It also has extensive graphics features. Together these make it well suited to scientific and research computing. Although it may seem complex compared with many commercially available GUI programs, its approach is relatively simple, and it offers far more options once its basic principles are understood. The initial learning effort pays off in the long run.

This website aims to get beginners started with R for common statistical tasks: analysing scientific and research data and depicting it graphically. Numerous other resources on R exist, and one is encouraged to search the internet for further information.

R is an open source program that is free to download from the internet (http://www.R-project.org/). One can download the executable file (options available at https://cran.r-project.org/) and run it to install R on the system. Although some graphical user interface programs are available for R, entering commands on the terminal is the easiest and most flexible way to work with it, and all options are available there. One can open the R terminal by double-clicking the R icon that appears on the desktop after installation, then enter commands and get results. Try simple operations to get started:

**Code:**

2 + 3

**Output:**

[1] 5

Adding data:

Simple values can be put in variables as follows:

**Code:**

xx = 10

yy = 5

Arithmetical operations (+, -, *, /, and ^ for power) can be performed on these variables:

**Code:**

xx * yy

**Output:**

[1] 50
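The other operators behave the same way. The short sketch below uses its own variable names (aa and bb, chosen here only for illustration) and also shows the integer-division and remainder operators that base R provides:

```r
aa = 10
bb = 5

aa + bb        # 15
aa - bb        # 5
aa / bb        # 2
aa ^ 2         # 100 (power)
aa %/% 3       # 3   (integer division)
aa %% 3        # 1   (remainder)
(aa + bb) * 2  # 30  (parentheses control the order of evaluation)
```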

Similarly, characters, text and TRUE/FALSE can also be assigned. Series of numbers (called vectors) can be assigned to these variables:

**Code:**

xx = c(1,2,3)

yy = c(2,5,10)
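Text and TRUE/FALSE values are assigned in the same way as numbers. A minimal sketch, with variable names made up for this illustration:

```r
myName  = "R user"                 # character (text) value
isReady = TRUE                     # logical value
words   = c("low", "mid", "high")  # character vector
flags   = c(TRUE, FALSE, TRUE)     # logical vector

class(myName)    # "character"
class(isReady)   # "logical"
length(words)    # 3
sum(flags)       # 2 (TRUE counts as 1, FALSE as 0)
```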

R is vectorized: an operation on xx and yy is applied to each pair of corresponding values:

**Code:**

xx + yy

**Output:**

[1] 3 7 13

If yy has fewer values than xx, the values of yy are recycled:

**Code:**

xx = c(3,14,51,156)

yy = 2

xx * yy

**Output:**

[1] 6 28 102 312
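Recycling also applies when the shorter vector has more than one value. The sketch below uses its own names (aa, bb, cc) so that xx and yy from the text stay unchanged:

```r
aa = c(1, 2, 3, 4)
bb = c(10, 20)

# bb is reused as 10, 20, 10, 20
aa + bb    # 11 22 13 24

# When the longer length is not a multiple of the shorter one,
# R still recycles but gives a warning about the mismatch
cc = c(1, 2, 3)
cc + bb    # 11 22 13, with a warning
```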

Functions:

Functions are key elements of R; they perform operations on the data. There are many built-in functions in R. For example, to get the mean and standard deviation of xx, one can use the mean and sd functions:

**Code:**

mean(xx)

**Output:**

[1] 56

**Code:**

sd(xx)

**Output:**

[1] 69.75672
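To see what these two functions compute, the same results can be worked out step by step with vector arithmetic. This is only an illustrative sketch; the names n, m, devs and manualSd are introduced here. Note that sd() uses the sample formula, dividing by n - 1:

```r
xx = c(3, 14, 51, 156)

n = length(xx)            # number of values: 4
m = sum(xx) / n           # mean: 56
devs = xx - m             # deviation of each value from the mean
manualSd = sqrt(sum(devs^2) / (n - 1))

manualSd                      # 69.75672
all.equal(manualSd, sd(xx))   # TRUE
```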

Help and detailed information on these functions can be seen using the '?' operator. For example, to see help on the mean function, type:

**Code:**

?mean

Some commonly used built-in functions that can be applied to a value or vector v are listed below (note that text after # is treated as a comment in R):

**Code:**

abs(v) # absolute

min(v) # minimum

max(v) # maximum

range(v) # minimum and maximum

sum(v) # sum

mean(v) # mean

median(v) # median

sd(v) # standard deviation

sqrt(v) # square root

round(v) # rounded value

floor(v) # largest integer not above this number

ceiling(v) # smallest integer not below this number

trunc(v) # truncate the decimal part

log10(v) # common base 10 log

log(v) # natural log

exp(v) # exponential

Note that 'v' can be a single value or a vector (series of numbers), and that the output of the commands is not shown here. Similarly, there are many built-in functions for strings (text), such as substr() for substrings, grep() for pattern search, strsplit() for splitting strings, paste() and paste0() for joining multiple strings, and toupper() and tolower() for changing the case of text. Most of these need multiple arguments, and one should consult the built-in help using ?fnname in the R terminal. Likewise, t.test(), chisq.test(), wilcox.test() and anova() are built-in functions for statistical analysis of data. The simplest function for drawing graphs is plot(). These are discussed further in other pages.
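A few of these functions are shown with their results below, as a quick illustration. The values of v and txt are made up for this sketch:

```r
v = c(-2.7, 1.2, 3.6)

range(v)     # -2.7  3.6
floor(v)     # -3  1  3
ceiling(v)   # -2  2  4
trunc(v)     # -2  1  3
round(v)     # -3  1  4

txt = "Statistics with R"

substr(txt, 1, 4)          # "Stat"
toupper(txt)               # "STATISTICS WITH R"
strsplit(txt, " ")[[1]]    # "Statistics" "with" "R"
paste("Group", 1:3)        # "Group 1" "Group 2" "Group 3"
paste0("grp", 1:3)         # "grp1" "grp2" "grp3"

# grep() returns the positions of the elements that contain the pattern
grep("an", c("banana", "cherry", "mango"))   # 1 3
```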

Extending R with your own functions:

A major advantage of R is that it can easily be extended with one's own functions. These can be simple functions that just shorten a command name to make life easier, or they can be complex functions available on the internet. A simple function that shortens the command for installing packages is as follows:

**Code:**

myinstall = function(packageNameString) {

install.packages(packageNameString)

}

Note that in computer languages, any text is commonly referred to as a 'string' (since text can be seen as a string of characters). The function is defined as above. First comes the function name ('myinstall' here), followed by the '=' sign. Generally '<-' is recommended, but '=' also works just fine. This is followed by the 'function' keyword and then, in parentheses, the arguments that can be sent to this function. The curly brackets enclose the commands that will be performed when this function is called. Now, instead of typing install.packages('packageName'), one can just type:

**Code:**

myinstall('packageName')

Note that the name has to be in single or double quotes since it is a 'string' (of characters) and not the name of an object in R. Default values can be given for function arguments; these are used if the arguments are not sent. For example:

**Code:**

mySquare = function(sentnum = 5) {

sentnum * sentnum

}

mySquare(10)

**Output:**

[1] 100

**Code:**

mySquare()

**Output:**

[1] 25
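Functions can take several arguments, some with defaults and some without, and the body can hold several statements inside the curly brackets. The sketch below uses a hypothetical function, myPower, to show this along with calling by argument name:

```r
# 'base' has no default and must be supplied; 'exponent' defaults to 2
myPower = function(base, exponent = 2) {
  result = base ^ exponent
  result   # the value of the last expression is returned
}

myPower(3)                        # 9  (exponent takes its default, 2)
myPower(2, 3)                     # 8  (arguments matched by position)
myPower(exponent = 3, base = 2)   # 8  (arguments matched by name, in any order)
```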

Find possible arguments of a function:

For this one can either get built-in help for the function by typing:

**Code:**

?nameOfFunction

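Or one can find out the possible arguments using the following helper function. Note that UseMethod() only declares a generic function: a matching method (for example header.function or header.default, not shown here) must also be defined for the call to work: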

**Code:**

header <- function(x) UseMethod('header', x)

For example, to know possible arguments of prism.plots function:

**Code:**

header(prism.plots)

**Output:**

[1] "prism.plots=function (formula, data, centerfunc = mean,

spreadfunc = function(x) return(sd(x)/sqrt(length(x))),

def.axis = TRUE, jitter.y = FALSE, add = FALSE, start = 0,

...)"
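Base R also provides args() and formals(), which list the arguments of any function without extra definitions. A small sketch using the built-in sd function:

```r
# args() displays the argument list of a function
args(sd)             # function (x, na.rm = FALSE)

# formals() returns the arguments as a named list, with defaults as values
names(formals(sd))   # "x" "na.rm"
formals(sd)$na.rm    # FALSE
```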

Loading own custom function at startup:

The above function can be written (defined) in a function file, say myfns.r. This file can be loaded from the R terminal with the following command:

**Code:**

source('myfns.r')

The above command can also be placed in the .Rprofile file to load the 'myfns.r' file automatically at the start of R.
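The sketch below shows the whole cycle in one place: it writes a small function file, checks that the file exists, and loads it. The temporary folder and the function myDouble are assumptions made only for this illustration; in practice the file would sit in a folder of one's choice, and the same if/source lines could be placed in .Rprofile with that path.

```r
# Write a one-function file into R's temporary folder (illustrative location)
fnFile = file.path(tempdir(), "myfns.r")
writeLines("myDouble = function(x) { x * 2 }", fnFile)

# Load the file only if it exists, so a missing file does not stop R with an error
if (file.exists(fnFile)) {
  source(fnFile)
}

myDouble(21)   # 42
```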

Extending R by installing useful packages:

R can be extended by installing many useful dedicated packages, which are also freely available on the internet. The command (from the R terminal) to install a package is:

**Code:**

install.packages('packageName')

If there are any errors while installing new packages, one should quit R (with the q() command) and restart it in vanilla mode from the system command line before trying again:

**Code:**

R --vanilla

Once installed, these packages have to be loaded with the library function before use:

**Code:**

library(packageName) # without quotes

Note: part of commands that come after '#' sign are taken to be comments and not executed by R.
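The two steps can be combined so that a package is installed only when it is missing. The helper below, loadOrInstall, is a hypothetical name for this sketch; requireNamespace(), install.packages() and library() are standard R functions.

```r
# Hypothetical helper: install a package only if it is missing, then load it
loadOrInstall = function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE tells library() that pkg holds the name as a string
  library(pkg, character.only = TRUE)
}

# 'stats' ships with R, so nothing is downloaded in this call
loadOrInstall("stats")
```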

By combining multiple statements, functions can be used to create complex analyses and graphs from data. The following single plot has graphs of the 'low' variable analyzed against all other variables of the bwdf dataset:

**Output graph:**

Complex tables of analytical information can also be printed out using custom functions:

**Output:**

----------------------------------------------------------------

Showing relation between {low} and all other variables:

----------------------------------------------------------------

**NULL**

** Variable Pvalue_EffectSize_Test OddsRatio_or_Mean_SD_of_Groups**

**1 age NS(tTest) 23.66(5.58); 22.31(4.51)(0;1)**

**2 lwt P=0.0131;cd=0.37(tTest) 133.3(31.72); 122.14(26.56)(0;1)**

**3 race NS(Chisq) 2.31; 0.91; 5.79 (OR; lowerCI; upperCI)**

**4 smoke P=0.0396;cv=0.15(Chisq) 2.01; 1.07; 3.79 (OR; lowerCI; upperCI)**

**5 ptl P=0.0121;cd=0.43(tTest) 0.13(0.46); 0.34(0.54)(0;1)**

**6 ht NS(Chisq) 3.31; 0.99; 12 (OR; lowerCI; upperCI)**

**7 ui P=0.0355;cv=0.15(Chisq) 2.56; 1.12; 5.89 (OR; lowerCI; upperCI)**

**8 ftv NS(tTest) 0.84(1.07); 0.69(1.04)(0;1)**

----------------------------------------------------------------

P value < 0.05 are shown here.

Note effect sizes used:

cd= Cohen's d with t-test/Wilcoxan test

(0.2=small, 0.5=moderate, 0.8=large effect size)

cv= Cramer's V with Chi-squred/Fisher's test

(0-1 like correlation coefficient R)

----------------------------------------------------------------
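As a rough sketch of how several statements can be combined into one function that returns a table, the example below runs a t-test of every variable against a 0/1 grouping column. The data are simulated, and the names dat, relateAll, grp, aa and bb are hypothetical; this is not the function that produced the output above.

```r
# Simulated stand-in data (not the bwdf dataset from the text)
set.seed(1)
dat = data.frame(
  grp = rep(c(0, 1), each = 30),
  aa  = rnorm(60, mean = 50, sd = 10),
  bb  = rnorm(60, mean = 20, sd = 5)
)

# Hypothetical helper: t-test of every other column against a two-level grouping column
relateAll = function(df, groupName) {
  others = setdiff(names(df), groupName)
  pvals = sapply(others, function(v) {
    groups = split(df[[v]], df[[groupName]])   # values of v in each group
    t.test(groups[[1]], groups[[2]])$p.value
  })
  data.frame(Variable = others, Pvalue = round(pvals, 4), row.names = NULL)
}

res = relateAll(dat, "grp")
res
```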

**Cut function to group a number series (vector)**

It is often necessary to divide a number series into groups, and the cut() function can be used for this. In the following custom function, myquantiles, cut() divides a vector of numbers (called vect here) into the specified number of quantile groups (the default is 3):

**Code:**

myquantiles = function(vect, N=3){

# break points are the quantile boundaries of vect itself

cut(vect, breaks=quantile(vect, probs=seq(0,1, by=1/N)), include.lowest=TRUE) }

For testing, one can draw a random sample of 20 values from the range 1 to 100 (the values will differ on each run):

**Code:**

xx = sample(1:100, 20)

xx

**Output:**

[1] 19 89 56 62 92 28 35 95 14 1 41 78 69 74 12 33 71 84 8 21

This vector can be cut into 3 quantile groups with the following commands:

**Code:**

qq = myquantiles(xx, 3)

qq

**Output:**

[1] [1,29.7] (70.3,95] (29.7,70.3] (29.7,70.3] (70.3,95] [1,29.7] (29.7,70.3] (70.3,95] [1,29.7] [1,29.7]

[11] (29.7,70.3] (70.3,95] (29.7,70.3] (70.3,95] [1,29.7] (29.7,70.3] (70.3,95] (70.3,95] [1,29.7] [1,29.7]

Levels: [1,29.7] (29.7,70.3] (70.3,95]

The three quantile groups range from 1 to 29.7, 29.7 to 70.3 and 70.3 to 95 for this series of numbers. The 3 levels can be renamed with the following commands:

**Code:**

levels(qq) = c("low", "mid", "high")

qq

**Output:**

[1] low high mid mid high low mid high low low mid high mid high low mid high high low low

Levels: low mid high

These can be put together in a data.frame:

**Code:**

dd = data.frame(Num=xx, Grp=qq)

head(dd)

**Output:**

Num Grp

1 19 low

2 89 high

3 56 mid

4 62 mid

5 92 high

6 28 low
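Once the groups are in a data.frame, they can be counted and summarised. The sketch below enters the 20 sample values shown above by hand so that the result is reproducible, and passes the labels directly to cut(). It then shows cut() with break points chosen by hand; the ages vector and its labels are made up for illustration.

```r
# The sample values printed above, typed in so the result can be reproduced
xx = c(19, 89, 56, 62, 92, 28, 35, 95, 14, 1, 41, 78, 69, 74, 12, 33, 71, 84, 8, 21)

qq = cut(xx, breaks = quantile(xx, probs = seq(0, 1, by = 1/3)),
         include.lowest = TRUE, labels = c("low", "mid", "high"))
dd = data.frame(Num = xx, Grp = qq)

table(dd$Grp)                  # counts per group: low 7, mid 6, high 7
tapply(dd$Num, dd$Grp, mean)   # mean of Num within each group

# cut() with explicit break points: intervals are (0,18], (18,40], (40,100]
ages = c(5, 17, 18, 40, 65)
ageGrp = cut(ages, breaks = c(0, 18, 40, 100), labels = c("young", "middle", "older"))
ageGrp                         # young young young middle older
```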

**************************