This page discusses some general concepts of ESMA 3102.
In ESMA 3101 (3015) we were mainly concerned with answering questions about one variable at a time. We considered problems like these:
What is the average height of men in Puerto Rico? (Find the mean or median, or draw a histogram or boxplot, or find a confidence interval)
Are men in Puerto Rico on average taller than 5’10’’? (do a hypothesis test)
Has the average income in Puerto Rico gone up in the last ten years? (hypothesis test)
In ESMA 3102 we are going to study two (or more) variables simultaneously, and we are really interested in their relationships:
Is the average height of men in Puerto Rico different from men in the USA and from men in Europe?
How does the average height of men relate to things like their economic status (income), their race, their diet, etc.?
How does the average income in Puerto Rico depend on the economic policies of the Government?
We categorize variables as follows:
Quantitative: data is numeric, and arithmetic makes sense (adding, multiplying, etc.)
Examples:
Yearly income of a family in Puerto Rico
Temperature in Mayaguez at 12 Noon
Amount paid for the phone bill
Categorical: everything else
Examples:
A student's major
In an experiment to grow wheat, three different fertilizers were labeled 1, 2 and 3
Your student ID number
Note: often whether a variable is categorical or quantitative depends on how (and how precisely) it is measured.
Example: Our variable is “rain yesterday”
Did it rain at all yesterday? “Yes” or “No” → categorical
We put a cup outside. The cup has marks for each cubic inch of rain. Our data is the number of cubic inches. Values will be 0, 0.1, 0.2 etc. → quantitative
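The rain example can be sketched in a few lines of Python. This is only an illustration: the readings in `rain_amounts` are made up, and the helper name `rained` is hypothetical. It shows that the same variable supports arithmetic when it is recorded as an amount, but only counting when it is recorded as Yes/No.

```python
# Made-up rainfall readings for five days, in the units marked on the cup.
rain_amounts = [0.0, 0.1, 0.0, 0.2, 1.3]


def rained(amount):
    """Collapse a quantitative reading into a categorical Yes/No answer."""
    return "Yes" if amount > 0 else "No"


# Categorical version of the same variable.
rain_yes_no = [rained(a) for a in rain_amounts]

# Arithmetic makes sense for the quantitative version ...
mean_rain = sum(rain_amounts) / len(rain_amounts)

# ... while for the categorical version we can only count each value.
counts = {label: rain_yes_no.count(label) for label in ("Yes", "No")}

print(rain_yes_no)
print(round(mean_rain, 2))
print(counts)
```

Note that going from the amount to Yes/No loses information: the Yes/No data cannot be turned back into amounts.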
Categorical data comes in one of two versions - ordered or unordered:
Examples:
grades in a course: A, B, C, D, W - ordered
gender: Male, Female - unordered
Treatments in a clinical trial: A, B, C - unordered
Treatments in a clinical trial: 1, 2, 3 - unordered
blood pressure: low, medium, high - ordered
directions: north, east, south, west - unordered
One consequence of having an ordering is that it should be used in graphs, tables etc.
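As a small illustration of why the ordering matters in a table, here is a sketch in Python using the blood pressure example. The readings are made up, and `LEVEL_ORDER` and `frequency_table` are hypothetical names chosen for this sketch. Sorting the categories alphabetically puts "high" first, so the natural order has to be stated explicitly.

```python
from collections import Counter

# Made-up blood pressure readings for six patients.
readings = ["medium", "low", "high", "medium", "medium", "low"]

# The natural ordering of the categories, stated explicitly.
LEVEL_ORDER = ["low", "medium", "high"]

counts = Counter(readings)

# Alphabetical order is what we get by default, and it scrambles the levels.
alphabetical = sorted(counts)


def frequency_table(counts, order):
    """Return (category, count) pairs in the given category order."""
    return [(level, counts.get(level, 0)) for level in order]


table = frequency_table(counts, LEVEL_ORDER)

print("alphabetical order:", alphabetical)
for level, n in table:
    print(f"{level:<8}{n}")
```

The same explicit ordering would be passed to whatever software draws the bar chart, so that the bars appear as low, medium, high.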
Identifying the type of your data has to be the first thing you do with any data set. It determines everything you do later, and getting it wrong likely means that everything that follows is wrong as well.
For more on data types see page 32 of the textbook.
It is often useful to think of the problems we discuss in this class as trying to use one (or more) variables to predict another:
Predictor | Response |
---|---|
Gender | Grade in Course |
Gender | Income |
GPA in high school, points on college | GPA after the freshman year in college |
Whether fertilizer was used or not | Yield of crop |
Size of lot, size of house, number of bedrooms, quality of neighborhood | Price of House |
Depending on the type of data we need to use different methods of analysis. Here is a table to help with this:
Predictor(s) | Response | Method |
---|---|---|
Categorical | Categorical | Categorical Data Analysis |
All Categorical | Quantitative | ANOVA |
At least one quantitative | Quantitative | Correlation and Regression |
This table may be the most important item for you to learn - understand - memorize - use. Without it you cannot pass this class, or do Statistics in real life!
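The decision rule can also be written as a small function. This is a minimal sketch, not part of the course material: the name `choose_method` is hypothetical, and it assumes the usual convention that ANOVA is used when all predictors are categorical and the response is quantitative. Combinations that the table does not list (for example a quantitative predictor with a categorical response) return `None`.

```python
VALID_TYPES = {"categorical", "quantitative"}


def choose_method(predictor_types, response_type):
    """Pick an analysis method from the variable types.

    predictor_types: list of "categorical" / "quantitative", one per predictor.
    response_type:   "categorical" or "quantitative".
    Returns the method name, or None if the combination is not in the table.
    """
    if not predictor_types:
        raise ValueError("need at least one predictor")
    for t in list(predictor_types) + [response_type]:
        if t not in VALID_TYPES:
            raise ValueError(f"unknown variable type: {t!r}")

    all_categorical = all(t == "categorical" for t in predictor_types)

    if response_type == "categorical":
        # Only the all-categorical case is listed in the table.
        return "Categorical Data Analysis" if all_categorical else None
    # Quantitative response from here on.
    if all_categorical:
        return "ANOVA"
    return "Correlation and Regression"


# Gender -> Grade in Course
print(choose_method(["categorical"], "categorical"))
# Gender -> Income
print(choose_method(["categorical"], "quantitative"))
# Size of lot, size of house, quality of neighborhood -> Price of House
print(choose_method(["quantitative", "quantitative", "categorical"], "quantitative"))
```

The examples at the bottom reuse predictor and response pairs from the first table, with each variable typed by hand.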