Interactive Apps | |
---|---|
idataio - input and output of data into R | isummary - graphs and numerical summaries, with or without groups |
ihist - histogram | isplot - scatterplot, with or without groups |
isubset - data subsetting |
Routines | |
---|---|
barchart | bplot - Boxplot |
changeOrder | chi.gof.test - Chisquare Goodness-of-fit Test |
chi.ind.test - Chisquare Test for Independence | ci.mean.sim - Simulation of confidence intervals for one mean |
dlr - Least Squares Regression with one Dummy Variable | dlr.predict - Prediction for SLR with Dummy Variable |
fivenumber - five number summary | flplot - Fitted Line Graph |
hplot - Histogram | |
intplot - Interaction Plot | kruskalwallis - nonparametric alternative to ANOVA |
mallows - Best Subset Regression | mannwhitney - nonparametric alternative to the 2 sample t test |
mlr - Multiple Regression | mlr.predict - Prediction for Multiple Regression |
mplot - Marginal Plot | multGraph - Combine several graphs into one |
nplot - Normal Probability Plot | one.sample.prop - One Proportion |
one.sample.t - One Mean | oneway - One way ANOVA |
pearson.test - Test for Correlation | prop.ps - Power and Sample Size for one Proportion |
slr - Regression for One Predictor | slr.predict - Prediction for Regression with one Predictor |
splot - Scatterplot, also with groups | stat.table - Summary Statistics |
t.ps - Power and Sample Size for one Mean | test.mean.sim - Simulation of hypothesis testing for one mean |
tukey - Tukey Multiple Comparison | twoway - two way ANOVA |
wilcoxon - nonparametric alternative to one.sample.t |
These are apps that open a new window and then allow the user to do all the work using (mostly) point and click.
Most of these apps are called with data sets as arguments. They will accept any number of arguments, which can be either vectors, matrices or data frames. If any of the later arguments do not match the first one in length they are ignored. Some apps also return a data set.
Most of the apps also show the commands that could be used in R directly to produce the same results, either with the Resma3 commands or without them.
Routine to read data into R and export data to a file.
It allows for
• data entered from the keyboard into a spreadsheet
• data read from a file
• data downloaded from the internet
• data copied from another program such as a browser or an Excel spreadsheet
Almost all standard file formats are supported, such as csv, Excel, html, etc. For a complete list see rio - R Input/Output
Examples:
> dta <- idataio()
> idataio(mtcars)
graphical and numerical summaries of one numerical vector, optionally grouped by a categorical variable
Examples:
> attach(mtcars)
> isummary(mtcars)
> isummary(mpg)
> isummary(mpg, gear)
draws histograms
Examples:
> attach(mtcars)
> ihist(mtcars)
> x <- rnorm(1000)
> ihist(x)
scatterplots
Examples:
> attach(mtcars)
> isplot(mtcars)
> isplot(mpg, disp, gear, cyl)
subsetting a data frame or vector
Examples:
> attach(mtcars)
> y <- isubset(mtcars)
> x <- rnorm(1000)
> y <- isubset(x)
first argument y is a numeric vector ("Response")
second argument x is either a numeric or categorical vector or matrix ("Predictor" or "Factor")
If both y and x are numeric sometimes there is a third argument z, a categorical vector ("Group")
Obvious exceptions: routines for categorical data analysis (barchart, chi.ind.test, chi.gof.test)
returnResult=FALSE (Optional): if TRUE returns results as a vector for further use. This allows storing the results, for example to run a simulation.
returnGraph=FALSE (Optional): if TRUE returns graph object for further use. This allows storing a graph object. Because all graphs are ggplot2 objects it also allows modifying graphs in ways not included in the routines.
Example: add a title to a graph:
> attach(mothers)
> plt<-bplot(Length,Status,returnGraph=TRUE)
> plt+ggtitle("My Boxplot")
> source("c:/folder/routines.r")
You can also copy all of it to the clipboard and then in R type
> source("clipboard")
The data sets used in the examples below are available at exampledata.r, again save the file and use source.
Saving Graphs: all of the routines that generate graphs have an argument graphname. There is also an object called myinfo with an element localfolder in the RData file. These together allow you to generate png's of any graphs as follows:
First set myinfo$localfolder to the folder on your computer where you wish to save the graph. For example
> myinfo$localfolder <- "C:/esma/graphs/"
As long as you don't want to change the folder and you save the RData file you only need to do this once.
Now say you want to do the boxplot of the Weight variable in the euros data set, and save it with the name coinweights.png. Simply execute
> attach(euros)
> bplot(Weight,graphname="coinweights")
The routine getData can be used to enter or import data into R. Values can be entered directly from the keyboard, taken from a table that was copied to the clipboard, or read from an Excel worksheet, either an xlsx or a csv file. Those files can be stored locally on the computer or reside on some web site.
Arguments:
file: either leave empty, "copy" or the name of an Excel worksheet (with either a csv or an xlsx extension). If the file resides on the hard drive either the full path name has to be given or it is assumed that the file is in the folder from where R was started.
col_names=TRUE: first line has column names
row_names=FALSE: first column has row names
Example: say we have the following info on some people
Age | First | Second |
---|---|---|
Old | 10 | 16 |
Young | 15 | 12 |
Old | 20 | 21 |
Old | 23 | 21 |
Young | 12 | 25 |
Old | 12 | 17 |
Young | 15 | 14 |
We want to get the table above into R. Here are three ways to do this using getData:
1) type
> people <- getData()
this will bring up a window with a spreadsheet and you can just enter the values. For the column names click in the respective cells. When you are done entering the data click on the little red x.
2) use the mouse to highlight the whole table, switch to R and run
> people <- getData("copy")
Note: sometimes this results in a warning:
incomplete final line found by readTableHeader on 'clipboard'
but it generally works anyway.
> people <- getData("copy",row_names=TRUE)
this also works if the data is already in an Excel worksheet. Just highlight the data there and do as above.
3) Open Microsoft Excel and enter the info as usual. Save the file as an Excel spreadsheet (with the xlsx extension) in the folder c:\tmp. Now in R type
> people <- getData(file="c:/tmp/filename.xlsx")
If you save the Excel worksheet in the same folder where you have (and started) Resma3.RData you don't have to give the folder path! Alternatively you can save the file in Excel as a CSV file and then use
> people <- getData(file="c:/tmp/filename.csv")
Suggestion: try option 2 first. If it works you are done! If not try option 1. If the data is already in an Excel worksheet of course use option 3. This might also be a good idea if the data set is a bit larger because at least you have already saved it.
If the file is located somewhere on the internet you can get it by using the url. This data set is located on my website at http://academic.uprm.edu/wrolke/Resma3/exampletable.xlsx and so you can get it with
> people <- getData("http://academic.uprm.edu/wrolke/Resma3/exampletable.xlsx")
Sometimes you might make a mistake entering the data, or you want to change a few values. In that case use
> students<-edit(students)
This brings up the spreadsheet and you can do the changes there!
> attach(mothers)
data set is used in many of the examples below
Arguments:
x: a data frame
makes column names "visible" to R
Examples:
> mean(Length)
Error in mean(Length) : object 'Length' not found
> attach(mothers)
> mean(Length)
[1] 49.54894
Note: you need to do this only once in any R session; it will stay until you close R.
Arguments:
x: a numeric vector
na.rm = FALSE
Examples:
> mean(Length)
[1] 49.54894
> median(Length)
[1] 49.6
> sd(Length)
[1] 3.387128
> IQR(Length)
[1] 4.25
> quantile(Length, c(0.25,0.75))
25% 75%
47.45 51.70
Note: all these routines have an argument na.rm = FALSE, so if the data set has missing values (NA) the result is NA (quantile and IQR stop with an error instead). To drop the missing values before computing, use na.rm = TRUE.
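As a small illustration of the na.rm argument, here is a sketch with a made-up vector (the name len and its values are hypothetical, not taken from the mothers data set):

```r
# Hypothetical data with one missing value
len <- c(48.2, NA, 51.0, 49.6)

mean(len)                  # NA, because na.rm defaults to FALSE
mean(len, na.rm = TRUE)    # mean of the three observed values
median(len, na.rm = TRUE)
sd(len, na.rm = TRUE)

# quantile() and IQR() stop with an error if NAs are present and na.rm = FALSE
quantile(len, c(0.25, 0.75), na.rm = TRUE)
IQR(len, na.rm = TRUE)
```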
Tables and cross-tabulation for categorical data
Arguments:
x: either a categorical vector or a data frame with two categorical columns
y: a second categorical vector (if x is a vector as well)
Examples:
> head(rogaine,3)
Growth | Group |
---|---|
No Growth | Treatment |
No Growth | Treatment |
No Growth | Treatment |
> table(rogaine)
 | Control | Treatment |
---|---|---|
No Growth | 423 | 301 |
New Vellus | 150 | 172 |
Min Growth | 114 | 178 |
Mod Growth | 29 | 58 |
Den Growth | 1 | 5 |
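The second calling form, with two separate categorical vectors x and y, can be sketched as follows. The vectors growth and group are made-up toy data, not the rogaine data set:

```r
# Hypothetical toy data: two categorical vectors of equal length
growth <- c("No", "Yes", "No", "Yes", "Yes")
group  <- c("Control", "Control", "Treatment", "Treatment", "Treatment")

tab <- table(growth, group)   # rows: growth, columns: group
tab
addmargins(tab)               # same table with row and column totals added
```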
Pearson's correlation coefficient
Arguments:
x: either a numeric vector or a data frame with two or more numeric columns
y: a second numeric vector (if x is a vector as well)
use = "everything", set to use="complete.obs" if NA's in the data
Examples:
> x<-rnorm(50)
> y<-rnorm(50)
> cor(x,y)
[1] 0.2388644
> cor(cbind(x,y))
 | x | y |
---|---|---|
x | 1 | 0.2388 |
y | 0.2388 | 1 |
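The effect of the use argument when the data contain NA's can be sketched with two short made-up vectors (the values are hypothetical):

```r
# Hypothetical vectors, each with one missing value
x <- c(1.2, 2.5, NA, 4.1, 5.0)
y <- c(2.0, 2.9, 3.7, NA, 6.1)

cor(x, y)                        # NA with the default use = "everything"
cor(x, y, use = "complete.obs")  # uses only the pairs where both values are present
```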
find a subset of a data set based on some condition(s)
Arguments:
x: a data frame
cond: some logical condition
select (Optional): which columns should be returned, default is all of them
drop=FALSE, if just one column is selected as output use drop=TRUE
Examples:
> head(subset(wrinccensus, Satisfaction>=4, select=Income),3)
> head(subset(wrinccensus, Satisfaction>=4 & Gender=="Male"),3)
> head(subset(wrinccensus, Satisfaction>=4 & Gender=="Male", select=c(Income,Job.Level)),3)
> head(subset(wrinccensus, Satisfaction>=4 & Gender=="Male", select=Income),3)
Note that the last one results in a data frame with one column. You might want it as a numeric vector:
> head(subset(wrinccensus, Satisfaction>=4 & Gender=="Male", select=Income, drop=TRUE),3)
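The difference that drop=TRUE makes can be checked on R's built-in mtcars data set. This is only a sketch; mtcars and the condition mpg > 30 are used here for illustration and are not part of the examples above:

```r
# One selected column, default drop = FALSE: result is a data frame
df <- subset(mtcars, mpg > 30, select = mpg)
is.data.frame(df)

# Same selection with drop = TRUE: result is a plain numeric vector
v <- subset(mtcars, mpg > 30, select = mpg, drop = TRUE)
is.numeric(v)
mean(v)   # vector results can be passed directly to routines like mean
```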
NOTE: see also interactive app isubset
Arguments:
y: numeric vector (Required)
x: categorical variable (Optional)
Mean=TRUE: if set to FALSE table finds medians and IQRs
Examples:
> stat.table(Length)
> stat.table(Length,Status)
> stat.table(Length,Status,Mean=FALSE)
Arguments:
y: quantitative vector
x: (optional) categorical vector
Example:
> fivenumber( y = Length)
Minimum | Q1 | Median | Q3 | Maximum | |
---|---|---|---|---|---|
40.2 | 47.45 | 49.6 | 51.7 | 56.5 |
> fivenumber( y = Length, x = Status)
 | Minimum | Q1 | Median | Q3 | Maximum |
---|---|---|---|---|---|
Drug Free | 44.3 | 49.85 | 51.3 | 52.75 | 56.5 |
First Trimester | 45.1 | 47.9 | 48.9 | 51 | 53.9 |
Throughout | 40.2 | 46.52 | 48.15 | 50.28 | 55 |
Arguments:
y: either a vector with numbers or the sample mean of the data
shat, n: standard deviation and sample size (only needed if y is sample mean)
muNull: mean in null hypothesis (if missing confidence interval is found)
alternative = "equal": alternative hypothesis
conf.level = 95
ndigit = 1 (number of digits for rounding)
Examples:
> one.sample.t(Length,conf.level=90)
> one.sample.t(49.55,3.38,94,conf.level=90,ndigit=2)
> one.sample.t(Length,muNull=50,alternative="less")
power and sample size calculations for one mean
Arguments:
n: sample size
diff: difference in means
sigma: standard deviation
power: power of test
E (optional): error of confidence interval (for sample size calculation only)
conf.level=90: confidence level of confidence interval (for sample size calculation only)
alpha = 0.05: type I error probability
alternative = "equal": alternative hypothesis
routine finds whatever argument is left out (n, diff or power)
Examples:
> t.ps(n=100,diff=1.23,sigma=5,alpha=0.1,alternative="greater")
> t.ps(power=90,d=1,sigma=13,alpha=0.1,alternative="greater")
> t.ps(sigma= 0.5, E=0.125, conf.level=99)
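For comparison, base R's power.t.test works on the same "leave one argument out" principle. The sketch below is a base-R analogue, not necessarily what t.ps does internally. It assumes that alternative="greater" in t.ps corresponds to a one-sided, one-sample test, and note that power.t.test expects the power as a probability (0.9) rather than a percentage (90):

```r
# power is left out, so it is computed from n, delta, sd and sig.level
pw <- power.t.test(n = 100, delta = 1.23, sd = 5, sig.level = 0.1,
                   type = "one.sample", alternative = "one.sided")
pw$power

# n is left out, so the required sample size is computed
ss <- power.t.test(power = 0.9, delta = 1, sd = 13, sig.level = 0.1,
                   type = "one.sample", alternative = "one.sided")
ceiling(ss$n)   # round up to a whole number of observations
```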
Wilcoxon signed rank test for one quantitative variable - nonparametric alternative to one.sample.t
Arguments:
y: quantitative vector
muNull: mean in null hypothesis (if missing confidence interval is found)
alternative = "equal": alternative hypothesis
conf.level = 95
Examples:
> wilcoxon(Length, conf.level=90)
> wilcoxon(Length, muNull=50, alternative="greater")
Arguments:
x: number of successes
n: number of trials
piNull: proportion in null hypothesis (if missing confidence interval is found)
alternative = "equal": alternative hypothesis
conf.level = 95
Examples:
> one.sample.prop(40,100,conf.level=90)
> one.sample.prop(40,100,piNull=0.5,alternative="less")
Arguments:
n: sample size
phat: alternative proportion
piNull: proportion under null hypothesis
power: power of test
E (optional): error of confidence interval (for sample size calculation only)
conf.level=90: confidence level of confidence interval (for sample size calculation only)
alpha = 0.05: type I error probability
alternative = "two.sided": alternative hypothesis
routine finds whatever argument is left out (n, phat or power)
Examples:
> prop.ps(n=100,phat=0.65,piNull=0.5)
> prop.ps(power=90,phat=0.65,piNull=0.5)
Arguments:
x: observed counts
p: hypothesized proportions
Example
> chi.gof.test(c(12,17,20,15,10,26),rep(1,6)/6)
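For comparison, base R's chisq.test runs a chi-square goodness-of-fit test on the same counts. This is a sketch of a base-R analogue, not necessarily what chi.gof.test does internally, and its output format will differ:

```r
# Observed counts and hypothesized proportions from the example above
x <- c(12, 17, 20, 15, 10, 26)
p <- rep(1, 6) / 6

res <- chisq.test(x, p = p)
res$statistic   # chi-square statistic
res$parameter   # degrees of freedom (number of categories - 1)
res$p.value
```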
Arguments:
y: quantitative vector
x: quantitative vector
doTest = TRUE (if FALSE confidence interval is found)
conf.level = 95 confidence level of interval
Note: when the routine is run R sometimes gives a
Warning message:
Continuous x aesthetic -- did you forget aes(group=...)?
Just ignore this.
Example:
> attach(Draft)
> pearson.test( Draft.Number , Day.of.Year )
Arguments:
n : sample size
mu: mean
sigma: standard deviation
conf.level: nominal coverage
Example:
> ci.mean.sim(n=500,mu=75,sigma=30,conf.level=99)
Arguments:
n : sample size
mu: mean
muNull=mu: value of mean under null hypothesis
sigma: standard deviation
alpha: nominal alpha
Examples:
> test.mean.sim(n=20,mu=5,sigma=1,alpha=0.1)
> test.mean.sim(n=20,mu=5,muNull=5.5,sigma=1,alpha=0.1)
Arguments:
y: a table (often from a call to the table routine)
Percent: if missing graph uses counts. Other values are "Grand", "Row" or "Column" for respective percentages
newOrder: for changing the order of the bars
Polygon = FALSE: if TRUE adds polygon
Examples:
> attach(rogaine)
> barchart(table(Growth))
> barchart(table(Growth),Percent="Grand")
> barchart(table(Growth),Percent="Grand",Polygon=TRUE)
> barchart(table(rogaine))
> barchart(table(rogaine),Percent="Row")
Arguments:
x: numerical data
f: name of distribution (Optional)
par: parameters of distribution (Optional)
n: number of bins (Optional)
label_x, main_title: x axis label and graph title (Optional)
Examples:
> hplot(Length)
> hplot(Length, label_x = "Length of Babies (cm)", main_title = "Mothers, Babies and Cocaine Use")
> hplot(Length, f = "norm", par = c(mean(Length), sd(Length)))
Arguments:
y: numeric vector or matrix or data frame
x: categorical vector (Optional)
Violin = FALSE: if TRUE does violin plot
orientation="vertical", if orientation="horizontal" boxplot is drawn horizontally
label_x, label_y, main_title: axes labels and graph title (Optional)
Examples:
> bplot(Length)
> bplot(Length, Status)
> bplot(Length, Status, label_y = "Length of Babies (cm)", label_x = "Drug Status", main_title = "Mothers, Babies and Cocaine Use")
Arguments:
y: numeric vector, y axis
x: numeric vector, x axis
z: categorical variable (Optional)
w: second categorical variable (Optional)
plotPoints=TRUE: if FALSE dots are not plotted
addLine = 0: adds lines, if addLine=1 least squares regression line, if addLine=2 LOESS, if addLine=3 it does the line graph
jitter = FALSE: if true jitters dots
useFacets = FALSE: if TRUE uses facets instead of colors for z
errorbars = FALSE: if TRUE adds error band to fit
label_x, label_y, label_z, main_title: axes labels and graph title (Optional)
addText, addText_x, addText_y: add text to graph (Optional)
psize = 1: size of plotting symbols
psymbols: change plotting symbols, can use either symbols entered from the keyboard or numbers corresponding to R symbols key (Optional)
pcolors: change colors, can use either numbers corresponding to R color key or explicit text: pcolors="red" (Optional)
ref_x, ref_y: add reference lines (Optional)
log_x = FALSE, log_y = FALSE: change to log scale
noLegends = FALSE: remove all legends
Examples:
> attach(salaries)
> splot(Salary,Years)
> splot(Salary,Years,addLine=1)
> splot(Salary,Years, Level,addLine=1)
> splot(Salary,Years, addLine=3)
> attach(upr)
> splot(y = Freshmen.GPA, x = IGS, z = Gender, useFacets = TRUE, addLine = 1, label_y = "Freshmen GPA", label_x = "Indice de Ingreso", main_title = "UPR Admissions", jitter=TRUE, psymbols = ".", pcolors = "blue", ref_x = 300, ref_y=3.5)
Arguments:
y: numeric vector, y axis
x: numeric vector, x axis
z: categorical variable (Optional)
addLine = 0: adds lines, if addLine=1 least squares regression line, if addLine=2 LOESS, if addLine=3 it does the line graph
Examples:
> mplot(Salary,Years)
Note: when the routine is run R sometimes gives a
Warning message:
Continuous x aesthetic -- did you forget aes(group=...)?
Just ignore that
Arguments:
y: numeric vector, y axis
x: numeric vector, x axis
z: categorical variable (Optional)
additive = FALSE: if true fits parallel lines
logx = FALSE, logy = FALSE: if true applies log transforms
polydeg = 1: degree of polynomial to be fit
jitter = FALSE: if true jitters dots
Examples:
> attach(longjump)
> flplot(LongJump,Year)
> flplot(LongJump,Year,polydeg=2)
> flplot(elusage[,3],elusage[,4],logx=T,logy=T)
Arguments:
y: numerical vector
x: categorical vector (Optional)
Examples:
> nplot(euros[,1])
Arguments:
y: numerical vector
x and z: categorical vectors
Examples:
> attach(fermentation)
> iplot(Ethanol , Sugar, Oxygen)
ggplot2 objects, likely generated using other graph functions with the argument returnGraph=TRUE
titles (Optional): titles for each graph
Examples:
> attach(gasoline)
> plt1<-bplot(MPG,Gasoline,returnGraph=TRUE)
> plt2<-bplot(MPG,Automobile,returnGraph=TRUE)
> multGraph(plt1,plt2)
> x<-rnorm(1000)
> multGraph(hplot(x,n=10,returnGraph=TRUE), hplot(x,n=25,returnGraph=TRUE),hplot(x,n=50,returnGraph=TRUE),hplot(x,n=100,returnGraph=TRUE),titles=paste(c(10,25,50,100),"bins"))
Arguments:
x: a table of counts
Examples:
> chi.ind.test(table(rogaine))
Arguments:
y: numeric vector
x: categorical vector
ndigit = 1: rounding answer to 1 digit
var.equal = TRUE: assume equal variance
conf.level = 95: in the case of a categorical variable with 2 levels finds a 95% confidence interval for the difference in means
Examples:
> oneway(Length, Status)
Arguments:
y: numeric vector
x: categorical vector
Examples:
> kruskalwallis(Length, Status)
Arguments:
y: numeric vector
x, z: categorical vectors
with.interaction = TRUE: assume interaction is present (defaults to FALSE if there are no repeated measurements)
Examples:
> attach(gasoline)
> twoway( MPG , Gasoline , Automobile)
> twoway( MPG , Gasoline , Automobile, with.interaction=FALSE)
Arguments:
y: numeric vector
x : categorical vector
z : second categorical vector (Optional)
with.interaction = TRUE: assume interaction is present (defaults to FALSE if there are no repeated measurements)
which="first": do comparison for first categorical variable (x), or change to which="second" or which="interaction"
Examples:
> tukey(mothers[,2],mothers[,1])
> tukey( MPG , Gasoline , Automobile, which="first")
> tukey( MPG , Gasoline , Automobile, which="interaction")
Arguments:
y, x: numerical vectors
no.intercept = FALSE: fit intercept?
polydeg = 1: fit polynomial of higher degree?
show.tests=FALSE: if TRUE t tests for coefficients are shown
Examples:
> slr(wine[,3],wine[,2])
> slr(wine[,3],wine[,2],polydeg=2)
> slr(log(wine[,3]),wine[,2],polydeg=2)
Arguments:
same as slr
newx = x: predict for data values, or give other values for x
interval: either "PI" for prediction intervals or "CI" for confidence intervals
conf.level = 95
Examples:
> slr.predict(wine[,3],wine[,2],newx=c(2,2.5,3),interval="PI",conf.level=90)
Arguments:
y: numerical vector
x: numeric matrix with predictors in columns
show.tests=FALSE: if TRUE t tests for coefficients are shown
returnModel=FALSE, if TRUE fit object is returned (and can be used in other routines)
Examples:
> mlr( houseprice[,1] , houseprice[, -1] )
Arguments:
same as slr.predict but here x and newx are matrices
Examples:
> newx <- cbind(c(2000,2100,2200),rep(1,3),rep(2,3),rep(2,3))
> mlr.predict( houseprice[,1] , houseprice[, -1] ,newx=newx ,interval="PI")
Arguments:
same as mlr
Examples:
> mallows( houseprice[,1] , houseprice[, -1] )
Arguments:
y: numerical vector
x: numeric vector
z: categorical vector
additive = FALSE: set to TRUE to fit parallel lines
show.tests=FALSE: if TRUE t tests for coefficients are shown
Examples:
> dlr(salaries[,1],salaries[,2],salaries[,3])
> dlr(salaries[,1],salaries[,2],salaries[,3],additive=T)
Arguments:
same as slr.predict but also needs
newz: values of categorical variable for prediction
Examples:
> dlr.predict( salaries[, 1] , salaries[, 2] , salaries[, 3], newx=5 ,newz="Low", interval="PI")
Arguments:
z: categorical variable
NewOrder: can be a numeric vector specifying a certain order or a categorical vector with ordered values of z
Examples:
> attach(mothers)
> bplot(Length,Status)
> bplot(Length,changeOrder(Status,c(2,1,3)))
> bplot(Length,changeOrder(Status,c("Throughout","First Trimester","Drug Free")))
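In base R the display order of a categorical variable is controlled by the levels of a factor, so a similar reordering can be sketched without Resma3. This is an analogue for illustration, not necessarily how changeOrder is implemented, and the short vector status is made up rather than taken from the mothers data set:

```r
# Hypothetical categorical vector using the same labels as Status
status <- c("Drug Free", "Throughout", "First Trimester", "Drug Free")

f <- factor(status)
levels(f)    # default order is alphabetical

# Explicitly ordered levels; the data values themselves are unchanged
f2 <- factor(status, levels = c("Throughout", "First Trimester", "Drug Free"))
levels(f2)
table(f2)    # counts are now reported in the new order
```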