Comparing Means: t-Tests
The t-test is one of the most commonly used statistical tests. It compares means to determine whether an observed difference is statistically significant. There are three types of t-tests, each suited to a different research situation.
Types of t-Tests
One-Sample t-Test
Purpose: Compare a sample mean to a known or hypothesized population value
Research Question: Is the average IQ of students in our program different from 100?
H₀: μ = 100 (population mean equals 100)
H₁: μ ≠ 100 (population mean differs from 100)
When to Use:
- One group measured once
- Comparing to a known standard or benchmark
- Comparing to a theoretical value
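A one-sample t-test is a single function call in Python with `scipy.stats.ttest_1samp`; the IQ scores below are hypothetical, made up for illustration.

```python
from scipy import stats

# Hypothetical IQ scores from ten students in the program
scores = [105, 110, 98, 112, 107, 95, 115, 103, 108, 101]

# Test H0: population mean = 100 (two-tailed by default)
result = stats.ttest_1samp(scores, popmean=100)
print(f"t({len(scores) - 1}) = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```

Here the sample mean (105.4) differs from 100 by enough, relative to its variability, to reject H₀ at the .05 level.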
Independent Samples t-Test
Purpose: Compare means between two different (unrelated) groups
Research Question: Do males and females differ in math anxiety?
H₀: μ₁ = μ₂ (no difference between groups)
H₁: μ₁ ≠ μ₂ (groups differ)
When to Use:
- Two separate groups
- Different people in each group
- Examples: treatment vs. control, male vs. female, experimental vs. comparison
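A sketch of an independent-samples t-test with `scipy.stats.ttest_ind`, using hypothetical scores for two unrelated groups:

```python
from scipy import stats

# Hypothetical test scores: different people in each group
treatment = [82, 75, 90, 68, 85, 77, 88, 73, 80, 84]
control   = [70, 65, 74, 60, 72, 68, 66, 75, 63, 69]

# Two-tailed test of H0: mu1 = mu2 (equal variances assumed by default)
result = stats.ttest_ind(treatment, control)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```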
Paired Samples t-Test
Purpose: Compare means from the same group at two different times or under two conditions
Research Question: Did anxiety decrease after the intervention?
H₀: μ_diff = 0 (no change from pre to post)
H₁: μ_diff ≠ 0 (significant change)
When to Use:
- Same participants measured twice
- Pre-test/post-test designs
- Matched pairs (twins, matched controls)
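For paired data, `scipy.stats.ttest_rel` tests the mean of the within-person differences; the pre/post anxiety scores below are hypothetical.

```python
from scipy import stats

# Hypothetical anxiety scores for the same eight people, before and after
pre  = [42, 38, 45, 50, 36, 41, 48, 39]
post = [35, 36, 40, 44, 33, 38, 41, 37]

# Tests H0: mean of the (pre - post) differences = 0
result = stats.ttest_rel(pre, post)
print(f"t({len(pre) - 1}) = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```

Note that using `ttest_ind` on these same numbers would be a mistake: the pairing is what gives the design its power.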
Assumptions of t-Tests
Continuous DV
The dependent variable must be measured at interval or ratio level
Check: Nature of your variable
Independence
Observations are independent (one person's score doesn't affect another's)
Check: Study design—how were data collected?
Normality
Data should be approximately normally distributed
Check: Histogram, Q-Q plot, Shapiro-Wilk test
Less important with large samples (n > 30), because the Central Limit Theorem makes the sampling distribution of the mean approximately normal
Homogeneity of Variance
Groups should have similar variances (for independent t-test)
Check: Levene's test
If violated, use Welch's t-test
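The checks above can be scripted. A minimal sketch with hypothetical data: Shapiro-Wilk for normality, Levene's test for equal variances, and Welch's t-test (`equal_var=False`) as the fallback.

```python
from scipy import stats

# Hypothetical scores; group2 is deliberately more spread out
group1 = [78, 85, 72, 90, 81, 76, 88, 79, 83, 74]
group2 = [70, 95, 52, 88, 61, 99, 55, 92, 66, 84]

# Normality: Shapiro-Wilk (p > .05 suggests no detectable departure)
shapiro = stats.shapiro(group1)
print(f"Shapiro-Wilk: p = {shapiro.pvalue:.3f}")

# Homogeneity of variance: Levene's test (p < .05 suggests unequal variances)
levene = stats.levene(group1, group2)
print(f"Levene: p = {levene.pvalue:.3f}")

# Welch's t-test does not assume equal variances
welch = stats.ttest_ind(group1, group2, equal_var=False)
print(f"Welch t = {welch.statistic:.2f}, p = {welch.pvalue:.3f}")
```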
Interpreting t-Test Output
Sample SPSS Output (Independent Samples t-Test)
| Group | N | Mean | SD |
|---|---|---|---|
| Treatment | 45 | 78.4 | 12.3 |
| Control | 42 | 71.2 | 11.8 |
| t | df | p (2-tailed) | Mean Difference | 95% CI |
|---|---|---|---|---|
| 2.78 | 85 | .007 | 7.2 | [2.1, 12.3] |
How to Interpret:
- t-value (2.78): The test statistic; larger = bigger difference relative to variability
- df (85): Degrees of freedom (n₁ + n₂ - 2 for independent t-test)
- p-value (.007): Probability of a result at least this extreme if H₀ were true; p < .05 is conventionally treated as significant
- Mean difference (7.2): Treatment group scored 7.2 points higher
- 95% CI [2.1, 12.3]: We're 95% confident the true difference is between 2.1 and 12.3
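The table's test statistic and p-value can be recomputed from the summary statistics alone; `scipy.stats.ttest_ind_from_stats` does exactly this, with no raw data needed.

```python
from scipy import stats

# Summary statistics from the output table above
t, p = stats.ttest_ind_from_stats(
    mean1=78.4, std1=12.3, nobs1=45,   # treatment group
    mean2=71.2, std2=11.8, nobs2=42,   # control group
    equal_var=True,
)
print(f"t(85) = {t:.2f}, p = {p:.3f}")  # matches t = 2.78, p = .007
```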
Effect Size: Cohen's d
Cohen's d measures the magnitude of the difference in standard deviation units. It tells you HOW BIG the effect is, not just whether it's significant.
d = (M₁ - M₂) / SD_pooled
Interpreting Cohen's d:
| d | Label | Interpretation |
|---|---|---|
| 0.2 | Small effect | Noticeable but not dramatic |
| 0.5 | Medium effect | Moderate, meaningful difference |
| 0.8 | Large effect | Substantial difference |
Example Calculation:
M₁ = 78.4, M₂ = 71.2, SD_pooled ≈ 12.0
d = (78.4 - 71.2) / 12.0 = 7.2 / 12.0 = 0.60
Interpretation: Medium effect size
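The same calculation in Python, computing SD_pooled from the two group SDs and sample sizes in the output table above:

```python
import math

# Group sizes and SDs from the SPSS output table
n1, sd1 = 45, 12.3   # treatment
n2, sd2 = 42, 11.8   # control

# Pooled SD: square root of the weighted average of the two variances
sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

d = (78.4 - 71.2) / sd_pooled
print(f"SD_pooled = {sd_pooled:.2f}, d = {d:.2f}")  # d = 0.60, a medium effect
```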
Reporting t-Test Results
APA Format:
"An independent-samples t-test was conducted to compare test scores between treatment and control groups. There was a significant difference in scores between the treatment group (M = 78.4, SD = 12.3) and the control group (M = 71.2, SD = 11.8); t(85) = 2.78, p = .007, d = 0.60. The treatment group scored significantly higher than the control group, with a medium effect size."
Always Include:
- ✓ Type of t-test used
- ✓ Means and SDs for each group
- ✓ t-value and degrees of freedom: t(df)
- ✓ Exact p-value (or < .001 if very small)
- ✓ Effect size (Cohen's d)
- ✓ Direction of the difference
Common Mistakes with t-Tests
- Using independent t-test for paired data: If same people measured twice, use paired t-test
- Running multiple t-tests: Comparing 3+ groups? Use ANOVA instead to avoid inflated Type I error
- Ignoring assumptions: Check normality and equal variances
- Forgetting effect size: Significance doesn't tell you how large the effect is
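On the multiple-comparisons point above: with three or more groups, a single one-way ANOVA replaces a series of pairwise t-tests. A minimal sketch with `scipy.stats.f_oneway` and hypothetical data:

```python
from scipy import stats

# Hypothetical scores for three groups: one omnibus test, not three t-tests
group_a = [78, 82, 75, 80, 79]
group_b = [71, 69, 74, 70, 72]
group_c = [85, 88, 83, 90, 86]

# H0: all three population means are equal
f_stat, p = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p:.4f}")
```

A significant F only says the means are not all equal; follow-up (post hoc) comparisons are needed to say which groups differ.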