Interactive Apps

idataio - input and output of data into R
isummary - graphs and numerical summaries, with or without groups
ihist - histogram
isplot - scatterplot, with or without groups
isubset - data subsetting

Routines

barchart - Bar Chart
bplot - Boxplot
changeOrder - Change the Order of a Categorical Variable
chi.gof.test - Chisquare Goodness-of-fit Test
chi.ind.test - Chisquare Test for Independence
ci.mean.sim - Simulation of confidence intervals for one mean
dlr - Least Squares Regression with one Dummy Variable
dlr.predict - Prediction for SLR with Dummy Variable
fivenumber - five number summary
flplot - Fitted Line Graph
hplot - Histogram
intplot - Interaction Plot
kruskalwallis - nonparametric alternative to ANOVA
mallows - Best Subset Regression
mannwhitney - nonparametric alternative to the 2 sample t test
mlr - Multiple Regression
mlr.predict - Prediction for Multiple Regression
mplot - Marginal Plot
multGraph - Combine several graphs into one
nplot - Normal Probability Plot
one.sample.prop - One Proportion
one.sample.t - One Mean
oneway - One way ANOVA
pearson.test - Test for Correlation
prop.ps - Power and Sample Size for one Proportion
slr - Regression for One Predictor
slr.predict - Prediction for Regression with one Predictor
splot - Scatterplot, also with groups
stat.table - Summary Statistics
t.ps - Power and Sample Size for one Mean
test.mean.sim - Simulation of hypothesis testing for one mean
tukey - Tukey Multiple Comparison
twoway - two way ANOVA
wilcoxon - nonparametric alternative to one.sample.t

Interactive Apps

These are apps that open a new window and then allow the user to do all the work using (mostly) point and click.

Most of these apps are called with data sets as arguments. They accept any number of arguments, each of which can be a vector, a matrix or a data frame. If any of the later arguments does not match the first one in length it is ignored. Some apps also return a data set.

Most of the apps also show the commands that could be used in R directly to produce the same results, either with the Resma3 commands or without them.

idataio

Routine to read data into R and export data to a file.

It allows for

• data entered from the keyboard into a spreadsheet  
• data read from a file
• data downloaded from the internet
• data copied from another program such as a browser or an Excel spreadsheet

Almost all standard file formats are supported, such as csv, Excel, html, etc. For a complete list see rio - R Input/Output.

Examples:

> dta <- idataio()

> idataio(mtcars)

isummary

graphical and numerical summaries of one numerical vector, optionally grouped by a categorical variable

Examples:

> attach(mtcars)
> isummary(mtcars)
> isummary(mpg)
> isummary(mpg, gear)

ihist

draws histograms

Examples:

> attach(mtcars)
> ihist(mtcars)
> x <- rnorm(1000)
> ihist(x)

isplot

scatterplots

Examples:

> attach(mtcars)
> isplot(mtcars)
> isplot(mpg, disp, gear, cyl)

isubset

subsetting a data frame or vector

Examples:

> attach(mtcars)
> y <- isubset(mtcars)
> x <- rnorm(1000)
> y <- isubset(x)

R Routines

The routines I wrote for this course all use the following standard (where it makes sense)

first argument y is a numeric vector ("Response")

second argument x is either a numeric or categorical vector or matrix ("Predictor" or "Factor")

If both y and x are numeric sometimes there is a third argument z, a categorical vector ("Group")

Obvious exceptions: routines for categorical data analysis (barchart, chi.ind.test, chi.gof.test)

Many of the routines have the following arguments:

returnResult=FALSE (Optional): if TRUE returns results as vector for further use. This allows storing the results, for example to do simulation.

returnGraph=FALSE (Optional): if TRUE returns graph object for further use. This allows storing a graph object. Because all graphs are ggplot2 objects it also allows modifying graphs in ways not included in the routines.

Example: add a title to a graph:

> attach(mothers)
> plt<-bplot(Length,Status,returnGraph=TRUE)
> plt+ggtitle("My Boxplot")


You can get all the routines by either downloading and opening Resma3.RData or by downloading routines.r, saving it in some folder and then in R typing

> source("c:/folder/routines.r")

You can also copy all of it to the clipboard and then in R type

> source("clipboard")

The data sets used in the examples below are available at exampledata.r. Again, save the file and use source.

Printing and Saving Graphs

To print a graph, select the graph window and click on File > Print...

Saving Graphs: all of the routines that generate graphs have an argument graphname. There is also an object called myinfo with an element localfolder in the RData file. These together allow you to generate png's of any graphs as follows:

First set myinfo$localfolder to the folder on your computer where you wish to save the graph. For example

> myinfo$localfolder <- "C:/esma/graphs/"

As long as you don't want to change the folder and you save the RData file you only need to do this once.

Now say you want to do the boxplot of the Weight variable in the euros data set, and save it with the name coinweights.png. Simply execute

> attach(euros)
> bplot(Weight,graphname="coinweights")

Inputting Data

The routine getData can be used to enter or import data into R. Values can be entered directly from the keyboard, taken from a table that was copied to the clipboard, or read from an Excel worksheet saved as either an xlsx or a csv file. Those files can be stored locally on the computer or reside on some web site.

Arguments:

file: either leave empty, "copy" or the name of an Excel worksheet (with either a csv or an xlsx extension). If the file resides on the hard drive either the full path name has to be given or it is assumed that the file is in the folder from where R was started.
col_names=TRUE: first line has column names
row_names=FALSE: first column has row names

Example: say we have the following info on some people:

Age    First  Second
Old       10      16
Young     15      12
Old       20      21
Old       23      21
Young     12      25
Old       12      17
Young     15      14

We want to get the table above into R. Here are three ways to do this using getData:

1) type

> people <- getData()

this will bring up a window with a spreadsheet and you can just enter the values. For the column names click in the respective cells. When you are done entering the data click on the little red x.

2) use the mouse to highlight the whole table, switch to R and run

> people <- getData("copy")

Note: sometimes this results in a warning:
incomplete final line found by readTableHeader on 'clipboard'

but it generally works anyway.

If the first column of the table holds row names, use

> people <- getData("copy",row_names=TRUE)

this also works if the data is already in an Excel worksheet. Just highlight the data there and do as above.

3) Open Microsoft Excel and enter the info as usual. Save the file as an Excel spreadsheet (with the xlsx extension) in the folder c:/tmp. Now in R type

> people <- getData(file="c:/tmp/filename.xlsx")

If you save the Excel worksheet in the same folder where you have (and started) Resma3.RData you don't have to give the folder path! Alternatively you can save the file in Excel as a CSV file and then use

> people <- getData(file="c:/tmp/filename.csv")

Suggestion: try option 2 first. If it works you are done! If not try option 1. If the data is already in an Excel worksheet of course use option 3. This might also be a good idea if the data set is a bit larger because at least you have already saved it.
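
For comparison, here is a minimal sketch of how the same small table could be read with plain R, without getData. It uses the base routine read.table on text typed into the script; the object names txt and people are just for illustration, and this is not necessarily what getData does internally.

```r
# The example table, typed in as text (values copied from the table above)
txt <- "Age First Second
Old 10 16
Young 15 12
Old 20 21
Old 23 21
Young 12 25
Old 12 17
Young 15 14"

# header = TRUE plays the role of col_names = TRUE in getData
people <- read.table(text = txt, header = TRUE)

# Check what was read: 7 rows, one text column and two numeric columns
str(people)
```

For a csv file on disk the same idea works with read.csv and the file path instead of the text argument.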

If the file is located somewhere on the internet you can get it by using the url. This data set is located on my website at http://academic.uprm.edu/wrolke/Resma3/exampletable.xlsx and so you can get it with

> people <- getData("http://academic.uprm.edu/wrolke/Resma3/exampletable.xlsx")


Sometimes you might make a mistake entering the data, or you want to change a few values. In that case use

> people <- edit(people)

This brings up the spreadsheet and you can make the changes there!


> attach(mothers)

This data set is used in many of the examples below.

Standard R Routines

attach

Arguments:

x: a data frame

makes column names "visible" to R

Examples:

> mean(Length)
Error in mean(Length) : object 'Length' not found

> attach(mothers)
> mean(Length)
[1] 49.54894 

Note: you need to do this only once in any R session, it will stay until you close R.
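
Here is a small sketch of what attach does, using the mtcars data set that ships with R (so it runs without the course data). It also shows two ways to get at a column without attaching; the names m1, m2 and m3 are just for illustration.

```r
# mtcars ships with R; mpg is one of its columns
attach(mtcars)
m1 <- mean(mpg)                 # works because mtcars is attached
detach(mtcars)                  # undo the attach

m2 <- mean(mtcars$mpg)          # same value, using the $ notation
m3 <- with(mtcars, mean(mpg))   # same value, using with()

c(m1, m2, m3)
```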

mean, median, sd, IQR, quantile

Summary statistics for quantitative data

Arguments:

x: a numeric vector
na.rm = FALSE

Examples:

> mean(Length)
[1] 49.54894

> median(Length)
[1] 49.6

> sd(Length)
[1] 3.387128

> IQR(Length)
[1] 4.25

> quantile(Length, c(0.25,0.75))

  25%   75%
47.45 51.70

Note: all these routines have an argument na.rm = FALSE, so if the data set has missing values (NA) the result is NA. Simply use na.rm = TRUE
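
A short illustration of the na.rm argument, with a made-up vector that has one missing value:

```r
# Made-up data with one missing value
x <- c(4.2, 5.1, NA, 6.3)

mean(x)                  # NA, because of the missing value
mean(x, na.rm = TRUE)    # mean of the three observed values
sd(x, na.rm = TRUE)      # the other routines work the same way
```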

table

Tables and cross-tabulation for categorical data

Arguments:

x: either a categorical vector or a data frame with two categorical columns
y: a second categorical vector (if x is a vector as well)

Examples:

> head(rogaine,3)

   Growth     Group
No Growth Treatment
No Growth Treatment
No Growth Treatment
 

> table(rogaine)

            Control Treatment
No Growth       423       301
New Vellus      150       172
Min Growth      114       178
Mod Growth       29        58
Den Growth        1         5
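
The following sketch shows the different ways of calling table on a tiny made-up data frame (the data and the names dta, tbl1, tbl2 and tbl3 are hypothetical, chosen to look a bit like rogaine):

```r
# Hypothetical data, made up for illustration
dta <- data.frame(
  Growth = c("No", "No", "Yes", "Yes", "No", "Yes"),
  Group  = c("Control", "Treatment", "Treatment",
             "Control", "Control", "Treatment")
)

tbl1 <- table(dta$Growth)              # one vector: counts per category
tbl2 <- table(dta$Growth, dta$Group)   # two vectors: cross-tabulation
tbl3 <- table(dta)                     # data frame with two columns: same counts

tbl3
```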
 

cor

Pearson's correlation coefficient

Arguments:

x: either a numeric vector or a data frame with two or more numeric columns
y: a second numeric vector (if x is a vector as well)
use = "everything", set to use="complete.obs" if NA's in the data

Examples:

> x<-rnorm(50)
> y<-rnorm(50)
> cor(x,y)

[1] 0.2388644

> cor(cbind(x,y))

          x         y
x 1.0000000 0.2388644
y 0.2388644 1.0000000
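
A quick sketch of the use argument, with simulated data in which two values are set to NA on purpose (the seed and the positions of the missing values are arbitrary):

```r
set.seed(1)              # arbitrary seed so the run is reproducible
x <- rnorm(50)
y <- rnorm(50)
x[c(3, 17)] <- NA        # introduce two missing values

r_all  <- cor(x, y)                         # NA with the default use = "everything"
r_comp <- cor(x, y, use = "complete.obs")   # based on the 48 complete pairs

c(r_all, r_comp)
```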

subset

find a subset of a data set based on some condition(s)

Arguments:

x: a data frame
cond: some logical condition
select (Optional): which columns should be returned, default is all of them
drop=FALSE, if just one column is selected as output use drop=TRUE

Examples:

> head(subset(wrinccensus, Satisfaction>=4, select=Income),3)
> head(subset(wrinccensus, Satisfaction>=4 & Gender=="Male"),3)
> head(subset(wrinccensus, Satisfaction>=4 & Gender=="Male", select=c(Income,Job.Level)),3)
> head(subset(wrinccensus, Satisfaction>=4 & Gender=="Male", select=Income),3)

Note that the last one results in a data frame with one column. You might want it as a numeric vector:

> head(subset(wrinccensus, Satisfaction>=4 & Gender=="Male", select=Income, drop=TRUE),3)

NOTE: see also interactive app isubset
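
To see the effect of drop without the course data, here is the same kind of call on mtcars, which ships with R (the condition and the names df_out and vec_out are just for illustration):

```r
# One selected column, returned as a data frame (default drop = FALSE)
df_out <- subset(mtcars, mpg >= 25 & am == 1, select = hp)

# Same rows and column, returned as a plain numeric vector
vec_out <- subset(mtcars, mpg >= 25 & am == 1, select = hp, drop = TRUE)

class(df_out)
class(vec_out)
```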

Routines for Summary Statistics

stat.table

tables of summary statistics, with or without groups

Arguments

y: numeric vector (Required)
x: categorical variable (Optional)
Mean=TRUE: if set to FALSE table finds medians and IQRs

Examples:

> stat.table(Length)
> stat.table(Length,Status)
> stat.table(Length,Status,Mean=FALSE)

Routines for One Variable

fivenumber

five number summary and IQR, with or without groups

Arguments:

y: quantitative vector
x: (optional) categorical vector

Example:

> fivenumber( y = Length)
Minimum    Q1 Median    Q3 Maximum
   40.2 47.45   49.6  51.7    56.5

IQR = 4.25

> fivenumber( y = Length, x = Status)

                Minimum    Q1 Median    Q3 Maximum
Drug Free          44.3 49.85   51.3 52.75    56.5
First Trimester    45.1  47.9   48.9    51    53.9
Throughout         40.2 46.52  48.15 50.28      55

IQR = 4.25

one.sample.t

Confidence interval or hypothesis test for one mean

Arguments:

y: either a vector with numbers or the sample mean of the data
shat, n: standard deviation and sample size (only needed if y is sample mean)
muNull: mean in null hypothesis (if missing confidence interval is found)
alternative = "equal": alternative hypothesis
conf.level = 95
ndigit = 1 (number of digits for rounding)

Examples:

> one.sample.t(Length,conf.level=90)
> one.sample.t(49.55,3.38,94,conf.level=90,ndigit=2)
> one.sample.t(Length,muNull=50,alternative="less")
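
For those who want to do this without the Resma3 routines, here is a rough equivalent with the standard R routine t.test. The data are simulated as a stand-in for Length (seed, mean and standard deviation are made up). Note that t.test wants the confidence level as a proportion (0.90), not as a percentage (90). The last lines find the interval from summary statistics by hand with the usual t interval formula; whether one.sample.t computes it exactly this way is an assumption.

```r
set.seed(2)                               # arbitrary seed
y <- rnorm(94, mean = 49.5, sd = 3.4)     # simulated stand-in for Length

# 90% confidence interval for the mean
ci <- t.test(y, conf.level = 0.90)$conf.int

# Test H0: mu = 50 vs Ha: mu < 50
tst <- t.test(y, mu = 50, alternative = "less")
tst$p.value

# Interval from summary statistics: mean 49.55, sd 3.38, n = 94
ci_summary <- 49.55 + c(-1, 1) * qt(0.95, df = 93) * 3.38 / sqrt(94)
round(ci_summary, 2)
```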

t.ps

power and sample size calculations for one mean

Arguments:

n: sample size
diff: difference in means
sigma: standard deviation
power: power of test
E (optional): error of confidence interval (for sample size calculation only)
conf.level=90: confidence level of confidence interval (for sample size calculation only)
alpha = 0.05: type I error probability
alternative = "equal": alternative hypothesis

routine finds whatever argument is left out (n, diff or power)

Examples:

> t.ps(n=100,diff=1.23,sigma=5,alpha=0.1,alternative="greater")
> t.ps(power=90,diff=1,sigma=13,alpha=0.1,alternative="greater")
> t.ps(sigma= 0.5, E=0.125, conf.level=99)
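
The first two examples can be approximated with the standard R routine power.t.test, sketched below. This is a comparison, not a description of how t.ps works internally. In power.t.test the power is a proportion (0.90 rather than 90) and a one-sided alternative is requested with alternative = "one.sided".

```r
# Power for n = 100, difference 1.23, sigma 5, alpha 0.1, one-sided test
pw <- power.t.test(n = 100, delta = 1.23, sd = 5, sig.level = 0.1,
                   type = "one.sample", alternative = "one.sided")
pw$power

# Sample size for 90% power, difference 1, sigma 13, alpha 0.1, one-sided test
ss <- power.t.test(power = 0.90, delta = 1, sd = 13, sig.level = 0.1,
                   type = "one.sample", alternative = "one.sided")
ceiling(ss$n)    # round up to the next whole observation
```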

wilcoxon

Wilcoxon signed rank test for one quantitative variable - nonparametric alternative to one.sample.t

Arguments:

y: quantitative vector
muNull: mean in null hypothesis (if missing confidence interval is found)
alternative = "equal": alternative hypothesis
conf.level = 95

Examples:

> wilcoxon(Length, conf.level=90)
> wilcoxon(Length, muNull=50, alternative="greater")
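
A rough equivalent in standard R is wilcox.test, which for a single vector carries out the Wilcoxon signed rank test. The sketch below uses simulated stand-in data; as with t.test the confidence level is given as a proportion.

```r
set.seed(3)                               # arbitrary seed
y <- rnorm(40, mean = 49.5, sd = 3.4)     # simulated stand-in data

# 90% confidence interval
w_ci <- wilcox.test(y, conf.int = TRUE, conf.level = 0.90)
w_ci$conf.int

# Test with null value 50 against the alternative "greater"
w_tst <- wilcox.test(y, mu = 50, alternative = "greater")
w_tst$p.value
```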

one.sample.prop

Confidence interval or hypothesis test for one proportion (percentage, probability)

Arguments:

x: number of successes
n: number of trials
piNull: proportion in null hypothesis (if missing confidence interval is found)
alternative = "equal": alternative hypothesis
conf.level = 95

Examples:

> one.sample.prop(40,100,conf.level=90)
> one.sample.prop(40,100,piNull=0.5,alternative="less")
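
For comparison, the standard R routine prop.test handles the same two problems. It may use a somewhat different method than one.sample.prop (for example a continuity correction), so small differences in the numbers are to be expected.

```r
# 90% confidence interval for a proportion, 40 successes in 100 trials
ci <- prop.test(40, 100, conf.level = 0.90)$conf.int
ci

# Test H0: pi = 0.5 vs Ha: pi < 0.5
tst <- prop.test(40, 100, p = 0.5, alternative = "less")
tst$p.value
```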

prop.ps

Power and sample size calculations for one proportion

Arguments:

n: sample size
phat: alternative proportion
piNull: proportion under null hypothesis
power: power of test
E (optional): error of confidence interval (for sample size calculation only)
conf.level=90: confidence level of confidence interval (for sample size calculation only)
alpha = 0.05: type I error probability
alternative = "two.sided": alternative hypothesis

routine finds whatever argument is left out (n, phat or power)

Examples:

> prop.ps(n=100,phat=0.65,piNull=0.5)
> prop.ps(power=90,phat=0.65,piNull=0.5)

chi.gof.test

Chisquare test for multinomial proportions

Arguments:

x: observed counts
p: hypothesized proportions

Example:

> chi.gof.test(c(12,17,20,15,10,26),rep(1,6)/6)
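
The same test can be run with the standard R routine chisq.test, as sketched here with the counts from the example (the names counts and fit are just for illustration):

```r
counts <- c(12, 17, 20, 15, 10, 26)           # observed counts
fit <- chisq.test(counts, p = rep(1, 6) / 6)  # equal hypothesized proportions

fit$statistic   # chisquare statistic
fit$parameter   # degrees of freedom
fit$p.value
fit$expected    # expected counts under the null hypothesis
```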

Routines for Two Variables

pearson.test

Confidence interval and hypothesis test for Pearson's correlation coefficient

Arguments:

y: quantitative vector
x: quantitative vector
doTest = TRUE (if FALSE confidence interval is found)
conf.level = 95 confidence level of interval

Note: when the routine is run R sometimes gives a

Warning message:
Continuous x aesthetic -- did you forget aes(group=...)?

Just ignore this.

Example:

> attach(Draft)

> pearson.test( Draft.Number , Day.of.Year )
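
In standard R the test and the confidence interval both come from cor.test. Here is a sketch with simulated data (the Draft data set is not needed; seed and coefficients are made up):

```r
set.seed(4)                      # arbitrary seed
x <- rnorm(60)
y <- 0.5 * x + rnorm(60)         # y is correlated with x by construction

ct <- cor.test(y, x, conf.level = 0.95)
ct$estimate    # sample correlation
ct$p.value     # test of H0: correlation is 0
ct$conf.int    # 95% confidence interval
```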

Routines for Simulations

ci.mean.sim

does a simulation for coverage of the t test confidence intervals

Arguments:

n : sample size
mu: mean
sigma: standard deviation
conf.level: nominal coverage

Example:

> ci.mean.sim(n=500,mu=75,sigma=30,conf.level=99)
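
The idea behind such a simulation can be sketched in a few lines of standard R: generate many samples from a normal distribution with known mean, find the interval for each, and count how often it contains the true mean. The number of runs B and the seed are my choices for this illustration, and ci.mean.sim itself may be written differently.

```r
set.seed(5)        # arbitrary seed
B     <- 2000      # number of simulated samples (assumed value)
n     <- 500
mu    <- 75
sigma <- 30
level <- 0.99      # nominal coverage as a proportion

covered <- replicate(B, {
  y  <- rnorm(n, mu, sigma)                    # new sample
  ci <- t.test(y, conf.level = level)$conf.int # its confidence interval
  ci[1] <= mu && mu <= ci[2]                   # does it contain the true mean?
})

coverage <- mean(covered)   # proportion of intervals that cover mu
coverage
```

The result should be close to the nominal 0.99, up to simulation error.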

test.mean.sim

does a simulation of the p value of the t test. If muNull=mu it finds the true type I error α, otherwise the power of the test. In either case it draws the histogram of p values.

Arguments:

n : sample size
mu: mean
muNull=mu: value of mean under null hypothesis
sigma: standard deviation
alpha: nominal alpha

Examples:

> test.mean.sim(n=20,mu=5,sigma=1,alpha=0.1)
> test.mean.sim(n=20,mu=5,muNull=5.5,sigma=1,alpha=0.1)
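
Here is a sketch of the same idea in standard R, using a two-sided t test. The helper sim_p, the number of runs B and the seed are made up for this illustration and need not match what test.mean.sim does.

```r
set.seed(6)      # arbitrary seed
B     <- 2000    # number of simulated samples (assumed value)
n     <- 20
sigma <- 1
alpha <- 0.1

# Simulate B p values of the t test of H0: mean = muNull when the true mean is mu
sim_p <- function(mu, muNull) {
  replicate(B, t.test(rnorm(n, mu, sigma), mu = muNull)$p.value)
}

p_null <- sim_p(mu = 5, muNull = 5)      # null hypothesis is true
p_alt  <- sim_p(mu = 5, muNull = 5.5)    # null hypothesis is false

type1 <- mean(p_null < alpha)   # estimated type I error, should be near alpha
power <- mean(p_alt < alpha)    # estimated power
c(type1, power)
```

Running hist(p_null) afterwards draws the histogram of the p values, which should look roughly flat when the null hypothesis is true.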

Routines for Graphs

barchart

bar charts

Arguments:

y: a table (often from a call to the table routine)
Percent: if missing graph uses counts. Other values are "Grand", "Row" or "Column" for respective percentages
newOrder: for changing the order of the bars
Polygon = FALSE: if TRUE adds polygon

Examples:

> attach(rogaine)

> barchart(table(Growth))
> barchart(table(Growth),Percent="Grand")
> barchart(table(Growth),Percent="Grand",Polygon=TRUE)

> barchart(table(rogaine))
> barchart(table(rogaine),Percent="Row")

hplot

Histogram, if desired with fitted density

Arguments:

x: numerical data
f: name of distribution (Optional)
par: parameters of distribution (Optional)
n: number of bins (Optional)
label_x, main_title: x axis label and graph title (Optional)

Examples:

> hplot(Length)
> hplot(Length, label_x = "Length of Babies (cm)", main_title = "Mothers, Babies and Cocaine Use")
> hplot(Length, f = "norm", par = c(mean(Length), sd(Length)))

bplot

Boxplot / Violinplot

Arguments:

y: numeric vector or matrix or data frame
x: categorical vector (Optional)
Violin = FALSE: if TRUE does violin plot
orientation = "vertical": if orientation="horizontal" boxplot is drawn horizontally
label_x, label_y, main_title: axes labels and graph title (Optional)

Examples:

> bplot(Length)
> bplot(Length, Status)
> bplot(Length, Status, label_y = "Length of Babies (cm)", label_x = "Drug Status", main_title = "Mothers, Babies and Cocaine Use")

splot

Scatterplot, possibly with groups and fits

Arguments:

y: numeric vector, y axis
x: numeric vector, x axis
z: categorical variable (Optional)
w: second categorical variable (Optional)
plotPoints = TRUE: if FALSE dots are not plotted
addLine = 0: adds lines, if addLine=1 least squares regression line, if addLine=2 LOESS, if addLine=3 it does the line graph
jitter = FALSE: if TRUE jitters dots
useFacets = FALSE: if TRUE uses facets instead of colors for z
errorbars = FALSE: if TRUE adds error band to fit
label_x, label_y, label_z, main_title: axes labels and graph title (Optional)
addText, addText_x, addText_y: add text to graph (Optional)
psize = 1: size of plotting symbols
psymbols: change plotting symbols, can use either symbols typed on the keyboard or numbers corresponding to the R symbols key (Optional)
pcolors: change colors, can use either numbers corresponding to the R color key or explicit text: pcolors="red" (Optional)
ref_x, ref_y: add reference lines (Optional)
log_x = FALSE, log_y = FALSE: change to log scale
noLegends = FALSE: if TRUE removes all legends

Examples:

> attach(salaries)
> splot(Salary,Years)
> splot(Salary,Years,addLine=1)
> splot(Salary,Years, Level,addLine=1)
> splot(Salary,Years, addLine=3)

> attach(upr)
> splot(y = Freshmen.GPA, x = IGS, z = Gender, useFacets = TRUE, addLine = 1, label_y = "Freshmen GPA", label_x = "Indice de Ingreso", main_title = "UPR Admissions", jitter=TRUE, psymbols = ".", pcolors = "blue", ref_x = 300, ref_y=3.5) 

NOTE: see also interactive app isplot

mplot

Marginal plot with scatterplot and boxplots

Arguments:

y: numeric vector, y axis
x: numeric vector, x axis
z: categorical variable (Optional)
addLine = 0: adds lines, if addLine=1 least squares regression line, if addLine=2 LOESS, if addLine=3 it does the line graph

Examples:

> mplot(Salary,Years)

Note: when the routine is run R sometimes gives a

Warning message:
Continuous x aesthetic -- did you forget aes(group=...)?

Just ignore that

flplot

Fitted line plot, allows for log transforms or polynomial fitting

Arguments:

y: numeric vector, y axis
x: numeric vector, x axis
z: categorical variable (Optional)
additive = FALSE: if TRUE fits parallel lines
logx = FALSE, logy = FALSE: if TRUE applies log transforms
polydeg = 1: degree of polynomial to be fit
jitter = FALSE: if TRUE jitters dots

Examples:

> attach(longjump)
> flplot(LongJump,Year)
> flplot(LongJump,Year,polydeg=2)
> flplot(elusage[,3],elusage[,4],logx=T,logy=T)

nplot

Normal probability plot

Arguments:

y: numerical vector
x: categorical vector (Optional)

Examples:

> nplot(euros[,1])

intplot

Interaction plot

Arguments:

y: numerical vector
x and z: categorical vectors

Examples:

> attach(fermentation)
> intplot(Ethanol , Sugar, Oxygen)

multGraph

combine up to four graphs into one

Arguments:

ggplot objects, likely generated using other graph functions with the argument returnGraph=TRUE
titles (Optional): titles for each graph

Examples:

> attach(gasoline)
> plt1<-bplot(MPG,Gasoline,returnGraph=TRUE)
> plt2<-bplot(MPG,Automobile,returnGraph=TRUE)
> multGraph(plt1,plt2)

> x<-rnorm(1000)
> multGraph(hplot(x,n=10,returnGraph=TRUE),
    hplot(x,n=25,returnGraph=TRUE),
    hplot(x,n=50,returnGraph=TRUE),
    hplot(x,n=100,returnGraph=TRUE),
    titles=paste(c(10,25,50,100),"bins"))

Routines for Fitting

chi.ind.test

Chisquare test of independence

Arguments:

x: a table of counts

Examples:

> chi.ind.test(table(rogaine))

oneway

ANOVA with one factor

Arguments:

y: numeric vector
x: categorical vector
ndigit = 1: rounding answer to 1 digit
var.equal = TRUE: assume equal variance
conf.level = 95: in the case of a categorical variable with 2 levels finds a 95% confidence interval for the difference in means

Examples:

> oneway(Length, Status)
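
Without the Resma3 routines a one way ANOVA can be done with the standard R routines aov or oneway.test. The sketch below uses made-up data with three groups (group names, means and seed are hypothetical); with var.equal = TRUE both routines give the same F statistic.

```r
set.seed(7)                                        # arbitrary seed
x <- factor(rep(c("A", "B", "C"), each = 20))      # hypothetical groups
y <- rnorm(60, mean = c(50, 49, 48)[as.integer(x)], sd = 3)

# Classical ANOVA table
fit <- aov(y ~ x)
tab <- summary(fit)[[1]]
tab

# Same test via oneway.test, assuming equal variances
ow <- oneway.test(y ~ x, var.equal = TRUE)
ow$p.value
```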

kruskalwallis

Non-parametric ANOVA

Arguments:

y: numeric vector
x: categorical vector

Examples:

> kruskalwallis(Length, Status)

twoway

ANOVA with two factors

Arguments:

y: numeric vector
x, z: categorical vectors
with.interaction = TRUE: assume interaction is present (defaults to FALSE if there are no repeated measurements)

Examples:

> attach(gasoline)
> twoway( MPG , Gasoline , Automobile)
> twoway( MPG , Gasoline , Automobile, with.interaction=FALSE)

tukey

Tukey's Multiple Comparison in ANOVA

Arguments:

y: numeric vector
x : categorical vector
z : second categorical vector (Optional)
with.interaction = TRUE: assume interaction is present (defaults to FALSE if there are no repeated measurements)
which="first": do comparison for first categorical variable (x), or change to which="second" or which="interaction"

Examples:

> tukey(mothers[,2],mothers[,1])
> tukey( MPG , Gasoline , Automobile, which="first")
> tukey( MPG , Gasoline , Automobile, which="interaction")
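
In standard R the corresponding routine is TukeyHSD, applied to a model fit with aov. The sketch uses made-up one-factor data (groups, means and seed are hypothetical) and is meant as a comparison, not as a description of how tukey is implemented.

```r
set.seed(9)                                        # arbitrary seed
x <- factor(rep(c("A", "B", "C"), each = 20))      # hypothetical groups
y <- rnorm(60, mean = c(50, 49, 46)[as.integer(x)], sd = 3)

fit <- aov(y ~ x)
tk  <- TukeyHSD(fit, conf.level = 0.95)

# One row per pair of groups: difference, interval limits, adjusted p value
tk$x
```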

slr

Linear Regression with one predictor, including polynomial regression

Arguments:

y, x: numerical vectors
no.intercept = FALSE: fit intercept?
polydeg = 1: fit polynomial of higher degree?
show.tests=FALSE: if TRUE t tests for coefficients are shown

Examples:

> slr(wine[,3],wine[,2])
> slr(wine[,3],wine[,2],polydeg=2)
> slr(log(wine[,3]),wine[,2],polydeg=2)

slr.predict

Prediction for simple linear regression

Arguments:

same as slr
newx = x: predict for data values, or give other values for x
interval: either "PI" for prediction intervals or "CI" for confidence intervals
conf.level = 95

Examples:

> slr.predict(wine[,3],wine[,2],newx=c(2,2.5,3),interval="PI",conf.level=90)
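
The standard R version of this uses lm to fit the model and predict to find the intervals. In the sketch below the data are simulated (seed and coefficients are made up); predict wants interval = "prediction" or "confidence" and the level as a proportion.

```r
set.seed(8)                                  # arbitrary seed
x <- runif(30, 1, 4)
y <- 2 + 1.5 * x + rnorm(30, sd = 0.5)       # simulated linear relationship

fit <- lm(y ~ x)

# 90% prediction intervals at x = 2, 2.5 and 3
pred <- predict(fit, newdata = data.frame(x = c(2, 2.5, 3)),
                interval = "prediction", level = 0.90)
pred
```

The column fit holds the predicted values, lwr and upr the lower and upper limits of the intervals.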

mlr

Linear Regression with more than one predictor

Arguments:

y: numerical vector
x: numeric matrix with predictors in columns
show.tests=FALSE: if TRUE t tests for coefficients are shown
returnModel=FALSE, if TRUE fit object is returned (and can be used in other routines)

Examples:

> mlr( houseprice[,1] , houseprice[, -1] )

mlr.predict

Prediction for regression with more than one predictor

Arguments:

same as slr.predict but here x and newx are matrices

Examples:

> newx <- cbind(c(2000,2100,2200),rep(1,3),rep(2,3),rep(2,3))
> mlr.predict( houseprice[,1] , houseprice[, -1] ,newx=newx ,interval="PI")

mallows

Best subset regression with Mallows' Cp

Arguments:

same as mlr

Examples:

> mallows( houseprice[,1] , houseprice[, -1] )

dlr

Linear regression with one dummy variable

Arguments:

y: numerical vector
x: numeric vector
z: categorical vector
additive = FALSE: if parallel lines set to TRUE
show.tests=FALSE: if TRUE t tests for coefficients are shown

Examples:

> dlr(salaries[,1],salaries[,2],salaries[,3])
> dlr(salaries[,1],salaries[,2],salaries[,3],additive=T)

dlr.predict

Prediction for regression with a dummy variable

Arguments:

same as slr.predict but also needs
newz: values of categorical variable for prediction

Examples:

> dlr.predict( salaries[, 1] , salaries[, 2] , salaries[, 3], newx=5 ,newz="Low", interval="PI")

Miscellaneous

changeOrder

Change the order of a categorical variable

Arguments:

z: categorical variable
NewOrder: can be a numeric vector specifying a certain order or a categorical vector with ordered values of z

Examples:

> attach(mothers)
> bplot(Length,Status)
> bplot(Length,changeOrder(Status,c(2,1,3)))
> bplot(Length,changeOrder(Status,c("Throughout","First Trimester","Drug Free")))
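
What happens here can be sketched in standard R with factor and its levels argument. The vector status is a made-up stand-in for Status, and this is only one plausible way to get the same effect, not necessarily how changeOrder is written.

```r
# Made-up stand-in for the Status variable
status <- factor(c("Drug Free", "Throughout", "First Trimester", "Drug Free"))
levels(status)    # default order is alphabetical

# New order given by position, like changeOrder(Status, c(2,1,3))
by_index <- factor(status, levels = levels(status)[c(2, 1, 3)])
levels(by_index)

# New order given by name
by_name <- factor(status, levels = c("Throughout", "First Trimester", "Drug Free"))
levels(by_name)
```

The data values themselves do not change, only the order in which the groups appear in tables and graphs.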