Introduction to ESMA 3102

This page discusses some general concepts of ESMA 3102.

3101 vs. 3102

In ESMA 3101 (3015) we were mainly concerned with answering questions about one variable at a time. We considered problems like these:

  • What is the average height of men in Puerto Rico? (Find the mean or median, or draw a histogram or boxplot, or find a confidence interval)

  • Are men in Puerto Rico on average taller than 5’10’’? (do a hypothesis test)

  • Has the average income in Puerto Rico gone up in the last ten years? (hypothesis test)

In ESMA 3102 we are going to study two (or more) variables simultaneously, and we are really interested in their relationships:

  • Is the average height of men in Puerto Rico different from men in the USA and from men in Europe?

  • How does the average height of men relate to things like their economic status (income), their race, their diet, etc.?

  • How does the average income in Puerto Rico depend on the economic policies of the Government?

Categorical vs. Quantitative Variables

We categorize variables as follows:

Quantitative

data is numeric, and arithmetic makes sense (adding, multiplying etc.)

Examples:

  1. Yearly income of a family in Puerto Rico

  2. Temperature in Mayaguez at 12 Noon

  3. Amount paid for the phone bill

Categorical

everything else

Examples:

  1. A student's major

  2. In an experiment to grow wheat, three different fertilizers were labeled 1, 2 and 3

  3. Your student id number
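
The fertilizer labels and id numbers above look numeric, yet arithmetic on them is meaningless. A minimal Python sketch of that point follows; the list of labels is made up purely for illustration.

```python
from collections import Counter

# Hypothetical fertilizer labels for six plots (made-up data).
# The labels 1, 2 and 3 are names, not amounts.
fertilizer = [1, 3, 2, 1, 3, 3]

# Python will happily compute an "average label", but the result
# does not describe anything real: there is no fertilizer 2.17.
mean_label = sum(fertilizer) / len(fertilizer)

# What does make sense for categorical data is counting.
counts = Counter(fertilizer)

print(round(mean_label, 2))
print(counts[1], counts[2], counts[3])
```

The counts (how many plots received each fertilizer) are the meaningful summary here, which is why these labels are treated as categorical.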

Note: often whether a variable is categorical or quantitative depends on how (and how precisely) it is measured.

Example Our variable is “rain yesterday”

  • Did it rain at all yesterday? “Yes” or “No” → categorical

  • We put a cup outside. The cup has marks for each cubic inch of rain. Our data is the number of cubic inches. Values will be 0, 0.1, 0.2 etc. → quantitative
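
The two ways of measuring "rain yesterday" can be sketched in Python as follows. The rainfall amounts are made-up values, and the variable names are chosen only for this illustration.

```python
# Hypothetical rainfall amounts for five days (made-up values).
rain_amounts = [0.0, 0.3, 0.0, 1.2, 0.1]

# Quantitative version: arithmetic is meaningful, so we can average.
mean_rain = sum(rain_amounts) / len(rain_amounts)

# Categorical version: only record whether it rained at all.
rained = ["Yes" if amount > 0 else "No" for amount in rain_amounts]

# For categorical data we count how often each value occurs.
counts = {label: rained.count(label) for label in ("Yes", "No")}

print(mean_rain)
print(rained)
print(counts)
```

The same five days give a quantitative variable in one case and a categorical one in the other; only the way of measuring changed.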

Categorical data comes in one of two versions - ordered or unordered:

Examples

  1. grades in a course: A, B, C, D, W - ordered

  2. gender: Male, Female - unordered

  3. Treatments in a clinical trial: A, B, C - unordered

  4. Treatments in a clinical trial: 1, 2, 3 - unordered

  5. blood pressure: low medium high - ordered

  6. directions: north east south west - unordered

One consequence of having an ordering is that it should be used in graphs, tables etc.
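
As a small illustration of using the ordering in a table, here is a Python sketch with made-up blood pressure readings. The data and the names `readings` and `levels` are assumptions for this example only.

```python
from collections import Counter

# Hypothetical blood pressure readings (made-up data).
readings = ["high", "low", "medium", "low", "high", "low"]

# The natural ordering of the categories, stated explicitly.
levels = ["low", "medium", "high"]

counts = Counter(readings)

# Alphabetical order puts "high" first, which hides the ordering.
alphabetical = sorted(counts)

# Listing the counts in the natural order gives a more readable table.
ordered_table = [(level, counts[level]) for level in levels]

for level, n in ordered_table:
    print(f"{level:<8}{n}")
```

Most software sorts categories alphabetically by default, so the ordering usually has to be supplied by hand, as is done with `levels` here.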

Recognizing the type of your data has to be the first thing you do with any data set. It determines everything you do later, and getting it wrong likely means that everything you do afterwards is wrong.

For more on data types see page 32 of the textbook.

Predictor - Response Paradigm

It is often useful to think of the problems we discuss in this class as trying to use one (or more) variables to predict another:

Predictor | Response
Gender | Grade in course
Gender | Income
GPA in high school, points on college | GPA after the freshman year in college
Whether fertilizer was used or not | Yield of crop
Size of lot, size of house, number of bedrooms, quality of neighborhood | Price of house

Types of Problems in 3102

Depending on the type of data we need to use different methods of analysis. Here is a table to help with this:

Predictor(s) | Response | Method
Categorical | Categorical | Categorical Data Analysis
All categorical | Quantitative | ANOVA
At least one quantitative | Quantitative | Correlation and Regression
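
The decision rule can also be written as a short Python function. This is only a sketch: the function name `choose_method` and the type strings are invented for this illustration, and it assumes the usual convention that ANOVA applies to a quantitative response with all-categorical predictors, while regression applies to a quantitative response with at least one quantitative predictor.

```python
def choose_method(predictor_types, response_type):
    """Suggest an analysis method from the variable types.

    predictor_types: list of "categorical" / "quantitative" strings
    response_type:   "categorical" or "quantitative"
    """
    all_categorical = all(t == "categorical" for t in predictor_types)
    if response_type == "categorical" and all_categorical:
        return "Categorical Data Analysis"
    if response_type == "quantitative" and all_categorical:
        return "ANOVA"
    if response_type == "quantitative":
        # At least one predictor is quantitative.
        return "Correlation and Regression"
    return "not covered in this table"


# Gender -> Grade in course
print(choose_method(["categorical"], "categorical"))
# Gender -> Income
print(choose_method(["categorical"], "quantitative"))
# Size of lot, quality of neighborhood -> Price of house
print(choose_method(["quantitative", "categorical"], "quantitative"))
```

The three calls correspond to predictor and response pairs of the kind discussed earlier on this page: gender and grade, gender and income, and house features and price.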

Warning

This table may be the most important item for you to learn, understand, memorize, and use. Without it you cannot pass this class, or do Statistics in real life!