Module 05: Sampling Methods - Research for Everybody

Topic 1

Introduction to Sampling

Sampling is the process of selecting a subset of individuals from a larger population to study. Since it's usually impossible to study entire populations, researchers use carefully selected samples to make inferences about populations. Understanding sampling is essential for conducting valid research and interpreting findings appropriately.

Key Terminology

Population

Definition: The entire group you want to study and draw conclusions about

Examples:

All university students in Thailand
All patients with diabetes worldwide
All small businesses in Bangkok
All high school teachers in the United States

Note: Populations can be finite (countable) or infinite (theoretical)

Sample

Definition: A subset of the population that you actually study

Examples:

500 university students from 5 Bangkok universities
200 diabetes patients from two hospitals
50 small businesses in the Sukhumvit area
300 high school teachers from California

Goal: Sample should be representative of the population

Sampling Frame

Definition: A list of all elements in the population from which you'll draw your sample

Examples:

Student enrollment database
Hospital patient registry
Business registration list
Telephone directory
Email list of organization members

Important: Sampling frame may not perfectly match the target population

Sampling Unit

Definition: The element or set of elements considered for selection at each stage

Examples:

Individual persons
Households
Schools or classrooms
Organizations
Geographic areas

Parameter

Definition: A numerical characteristic of the population (what you want to know)

Examples:

Population mean (μ)
Population proportion (p)
Population standard deviation (σ)

Note: Usually unknown—that's why we sample!

Statistic

Definition: A numerical characteristic of the sample (what you calculate from data)

Examples:

Sample mean (x̄)
Sample proportion (p̂)
Sample standard deviation (s)

Goal: Use statistics to estimate parameters

Why We Sample

Cost-Effective

Studying entire populations is usually prohibitively expensive. Sampling dramatically reduces costs while still providing accurate estimates.

Example: Rather than surveying all 5 million university students in a country, survey 1,000 students for reliable estimates at a fraction of the cost.

Time-Efficient

Collecting data from entire populations takes too long. Sampling allows timely completion of research projects.

Example: Election polls sample voters to predict outcomes quickly rather than waiting for everyone to vote.

Practical Feasibility

Some populations are impossible to access completely or don't have complete lists.

Example: No complete list exists of all people with depression, so sampling from accessible sources is necessary.

Sometimes More Accurate

Well-designed samples with careful data collection can be more accurate than careless population censuses.

Example: A careful sample survey with high response rates may be more accurate than a census with many non-responses.

Destructive Testing

When testing destroys the item, you must sample rather than test everything.

Example: Testing battery life requires using batteries until they die—can't test every battery produced!

Infinite Populations

Some populations are theoretical and infinite, making sampling the only option.

Example: All possible measurements of a physical constant or all potential outcomes of a process.

When NOT to Sample: Census

Use a Census (Study Everyone) When:

Population is small: With only 30 people in your department, just survey everyone
Resources allow: You have sufficient time and money for complete enumeration
Precision required: Need exact counts, not estimates (e.g., organizational records)
Political/ethical reasons: Everyone should have opportunity to participate (e.g., employee satisfaction surveys)
High variability: Population so diverse that large sample would be needed anyway

Representative Samples

What Makes a Sample Representative?

A representative sample accurately reflects the characteristics of the population it's drawn from. Key characteristics should be present in the sample in the same proportions as in the population.

Example: University Student Population

Characteristic	Population	Representative Sample	Biased Sample
Gender	55% Female, 45% Male	54% Female, 46% Male ✓	70% Female, 30% Male ✗
Year Level	30% 1st, 25% 2nd, 25% 3rd, 20% 4th	29% 1st, 26% 2nd, 24% 3rd, 21% 4th ✓	50% 1st, 30% 2nd, 15% 3rd, 5% 4th ✗
Major	40% STEM, 35% Social Sci, 25% Humanities	39% STEM, 36% Social Sci, 25% Humanities ✓	60% STEM, 25% Social Sci, 15% Humanities ✗

Warning: Convenience Samples Are Rarely Representative

Just because a sample is large doesn't mean it's representative. A sample of 10,000 people recruited from social media may be less representative than a carefully selected random sample of 400 people.

Two Main Sampling Approaches

Probability Sampling

Every member of the population has a known, non-zero chance of being selected

Characteristics:

Random selection
Known probability of selection
Allows generalization to population
Can calculate sampling error
More rigorous and defensible

Use when: Generalizability is important and you have access to sampling frame

Gold standard for quantitative research

Non-Probability Sampling

Selection is not random, so members' chances of being selected are unknown and some may have no chance at all

Characteristics:

Non-random selection
Unknown probability of selection
Limited generalizability
Cannot calculate sampling error statistically
More practical and accessible

Use when: Exploratory research, hard-to-reach populations, qualitative studies, or practical constraints prevent probability sampling

Very common in practice, especially qualitative research

Choosing Between Probability and Non-Probability

The choice depends on:

Research goals: Need to generalize to population or explore in-depth?
Resources: Time, money, and access to sampling frame
Population characteristics: Accessible or hard-to-reach?
Research design: Quantitative hypothesis-testing or qualitative exploration?

Remember: Non-probability sampling doesn't mean bad sampling. It's appropriate for many research purposes—just be clear about limitations.

Topic 2

Probability Sampling Methods

Probability sampling methods use random selection to ensure every member of the population has a known chance of being included. These methods allow you to calculate sampling error and make statistical inferences about the population. Understanding when and how to use each method is essential for rigorous quantitative research.

1. Simple Random Sampling (SRS)

Most Basic

Definition: Every member of the population has an equal and independent chance of being selected.

How to Do It:

Obtain complete sampling frame (list of all population members)
Assign each member a unique number
Use random number generator or table to select sample
Contact and recruit selected individuals

Example:

Population: 5,000 students at a university

Process:

Obtain list of all 5,000 students from registrar
Number them 0001 to 5000
Use random number generator to select 400 numbers
Survey the 400 selected students
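
A minimal Python sketch of the process above, assuming the registrar list has already been reduced to ID numbers 1 to 5,000. The `student_ids` list, the function name, and the seed are illustrative and not part of any real registry system:

```python
import random

def simple_random_sample(population_ids, n, seed=None):
    """Draw n distinct IDs so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(population_ids, n)

# Hypothetical sampling frame: students numbered 1 to 5000.
student_ids = list(range(1, 5001))
selected = simple_random_sample(student_ids, 400, seed=42)

# Sampling is without replacement, so no student appears twice.
print(len(selected), len(set(selected)))
```

Fixing the seed is optional, but it lets you document and reproduce exactly which students were drawn.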

Advantages:

Simple to understand and implement
No bias in selection process
Allows calculation of sampling error
Results generalizable to population

Disadvantages:

Requires complete sampling frame
May be geographically dispersed (expensive to reach)
May not capture rare subgroups
Can be inefficient for heterogeneous populations

When to Use:

Complete sampling frame available
Population relatively homogeneous
Resources allow reaching dispersed sample
No important subgroups that need guaranteed representation

2. Systematic Random Sampling

Efficient

Definition: Select every kth element from a list after a random start.

How to Do It:

Calculate sampling interval: k = N/n (population size / desired sample size)
Randomly select starting point between 1 and k
Select every kth element from that point

Example:

Population: 5,000 students; Desired sample: 500

Process:

k = 5,000 / 500 = 10
Randomly select start between 1-10, say 7
Select students #7, 17, 27, 37, 47... until 500 selected
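
The same procedure can be sketched in Python. This version assumes the population size divides evenly by the sample size, as in the example; when it does not, a fractional interval or a circular pass through the list is usually needed. The function name and the frame of ID numbers are illustrative:

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick every k-th element after a random start between 1 and k."""
    k = len(frame) // n                 # sampling interval, k = N / n
    rng = random.Random(seed)
    start = rng.randint(1, k)           # 1-based position of the first pick
    picks = frame[start - 1::k][:n]     # every k-th element from the start
    return start, k, picks

# Hypothetical frame: students numbered 1 to 5000.
frame = list(range(1, 5001))
start, k, picks = systematic_sample(frame, 500, seed=3)
print(start, k, picks[:5])
```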

Advantages:

Simpler than simple random sampling
More efficient, especially with physical lists
Ensures spread across population
Works well when list is random order

Disadvantages:

Risk of periodicity in list
Less random than SRS if list has patterns
Elements are not selected independently (the random start determines the whole sample)
Requires list in random or neutral order

Watch Out for Periodicity!

Problem: If the list has periodic patterns that align with your sampling interval, bias occurs.

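Example: A list of apartments is organized by building, with 10 units per building, and the first unit in each building is a corner unit. With k = 10 and a random start of 1, you'd sample only corner units; with any other start, you'd sample none.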

Solution: Examine list for patterns before selecting k
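
A small sketch can make the periodicity problem concrete. It builds a hypothetical frame of 50 buildings with 10 units each, marks unit 1 of every building as a corner unit, and checks what share of each possible systematic sample (k = 10) is corner units. The building count and layout are assumptions for illustration:

```python
# Hypothetical frame: (building, unit, is_corner); unit 1 is always a corner unit.
frame = [(b, u, u == 1) for b in range(1, 51) for u in range(1, 11)]

def corner_share(start, k=10):
    """Share of corner units in the systematic sample that begins at `start`."""
    picks = frame[start - 1::k]
    return sum(is_corner for _, _, is_corner in picks) / len(picks)

true_share = sum(is_corner for _, _, is_corner in frame) / len(frame)
shares = {start: corner_share(start) for start in range(1, 11)}
print(true_share, shares)
```

In this setup the true share of corner units is 10%, yet every possible sample contains either only corner units or none at all, so no single sample comes close to the population value.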

3. Stratified Random Sampling

Precise

Definition: Divide population into homogeneous subgroups (strata), then randomly sample from each stratum.

How to Do It:

Identify stratification variable (e.g., gender, age group, region)
Divide population into mutually exclusive strata
Determine sample size for each stratum
Randomly sample within each stratum

Two Types:

Proportionate Stratified Sampling

Sample from each stratum proportional to its size in population

Example: University with 60% undergrads, 40% graduates

Sample 500 students: 300 undergrads (60%), 200 graduates (40%)

Disproportionate Stratified Sampling

Oversample small but important subgroups

Example: University with 95% domestic, 5% international students

Sample 400 students: 200 domestic, 200 international (equal groups for comparison)

Note: Use weights in analysis to adjust for oversampling
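
One common way to build such weights is to give each respondent a weight equal to the number of population members they stand for: stratum population size divided by stratum sample size. The sketch below assumes a hypothetical university of 10,000 students (the example gives only percentages) and invented stratum means, purely to show how weighting changes an overall estimate:

```python
def design_weights(stratum_pop, stratum_sample):
    """Weight per respondent = population members represented (N_h / n_h)."""
    return {s: stratum_pop[s] / stratum_sample[s] for s in stratum_pop}

def weighted_mean(stratum_means, stratum_sample, weights):
    """Overall mean with each respondent counted according to their weight."""
    total = sum(weights[s] * stratum_sample[s] for s in stratum_means)
    weighted_sum = sum(weights[s] * stratum_sample[s] * stratum_means[s]
                       for s in stratum_means)
    return weighted_sum / total

# Assumed population of 10,000: 95% domestic, 5% international.
population = {"domestic": 9500, "international": 500}
sample = {"domestic": 200, "international": 200}
weights = design_weights(population, sample)

# Invented stratum means on some outcome scale, for illustration only.
means = {"domestic": 3.0, "international": 4.0}
unweighted = sum(sample[s] * means[s] for s in means) / sum(sample.values())
weighted = weighted_mean(means, sample, weights)
print(weights, unweighted, round(weighted, 2))
```

Without weights, the oversampled international students pull the overall estimate toward their own mean; with weights, each stratum contributes in proportion to its share of the population.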

Detailed Example:

Research: Student stress levels across different majors

Stratum (Major)	Population N	% of Total	Sample n
Engineering	2,000	40%	200
Business	1,500	30%	150
Arts	1,000	20%	100
Sciences	500	10%	50
Total	5,000	100%	500

Randomly select students within each major to reach target numbers
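
The allocation in the table can be reproduced with a short Python sketch. The per-major frames here are placeholder ID lists, and the function names are illustrative; with other stratum sizes, rounding may leave the allocations a participant or two off the target total, which would need a manual adjustment:

```python
import random

def proportionate_allocation(stratum_sizes, total_n):
    """Allocate total_n across strata in proportion to stratum size."""
    population_n = sum(stratum_sizes.values())
    return {name: round(total_n * size / population_n)
            for name, size in stratum_sizes.items()}

def stratified_sample(frames, allocation, seed=None):
    """Draw a simple random sample of the allocated size within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(frames[name], allocation[name]) for name in frames}

majors = {"Engineering": 2000, "Business": 1500, "Arts": 1000, "Sciences": 500}
allocation = proportionate_allocation(majors, 500)

# Hypothetical per-major frames with placeholder IDs such as "Arts-17".
frames = {name: [f"{name}-{i}" for i in range(1, size + 1)]
          for name, size in majors.items()}
sample = stratified_sample(frames, allocation, seed=7)
print(allocation)
```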

Advantages:

More precise than simple random sampling
Ensures representation of important subgroups
Allows comparison between strata
Can improve efficiency and reduce sampling error
Particularly useful for heterogeneous populations

Disadvantages:

Requires information about population to stratify
More complex to administer
Need separate sampling frame for each stratum
Must choose stratification variables carefully

When to Use:

Important subgroups exist that must be represented
Population is heterogeneous
Want to make comparisons between groups
Information about strata is available
Need more precision than simple random sampling

4. Cluster Sampling

Practical

Definition: Divide population into clusters (groups), randomly select clusters, then study all members within selected clusters or sample within clusters.

How to Do It:

Divide population into naturally occurring clusters
Randomly select a sample of clusters
Either:
- One-stage: Study all members in selected clusters
- Two-stage: Randomly sample members within selected clusters

Cluster vs. Stratified Sampling

Stratified Sampling

Strata are homogeneous within
Sample from ALL strata
Increases precision
Goal: represent diversity

Cluster Sampling

Clusters are heterogeneous within
Sample only SOME clusters
Increases efficiency
Goal: reduce costs

Example: One-Stage Cluster Sampling

Population: Students in 50 schools across a country

Process:

Schools are clusters
Randomly select 10 of the 50 schools
Survey ALL students in those 10 schools

Example: Two-Stage Cluster Sampling

Population: Households in a city

Process:

Stage 1: Randomly select 30 neighborhoods (clusters)
Stage 2: Randomly select 20 households within each selected neighborhood
Total sample: 30 × 20 = 600 households
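
A minimal sketch of the two-stage design, assuming a hypothetical city of 100 neighborhoods with 150 households each (the example does not give these figures). Neighborhood and household labels are invented:

```python
import random

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: sample units within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}

# Hypothetical frame: only a list of neighborhoods is needed up front;
# household lists are needed only for the neighborhoods that get selected.
city = {f"nbhd_{i:03d}": [f"nbhd_{i:03d}-hh_{j:03d}" for j in range(150)]
        for i in range(100)}

sample = two_stage_cluster_sample(city, n_clusters=30, n_per_cluster=20, seed=1)
total_households = sum(len(households) for households in sample.values())
print(len(sample), total_households)
```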

Advantages:

Very cost-effective and practical
Doesn't require complete sampling frame for population
Concentrates data collection geographically
Useful when population naturally grouped
Easier to administer than dispersed samples

Disadvantages:

Less precise than SRS (higher sampling error)
Clusters may not represent full population diversity
Design effect reduces effective sample size
Requires larger sample for same precision
Complex statistical analysis needed
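
The design effect can be approximated with a widely used formula for equal-sized clusters: deff = 1 + (m - 1) × ρ, where m is the number of units sampled per cluster and ρ is the intraclass correlation (how alike members of the same cluster are). The sketch below applies it to a sample of 30 clusters × 20 households; the value ρ = 0.05 is an assumption chosen only for illustration:

```python
def design_effect(cluster_size, icc):
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n, cluster_size, icc):
    """Size of a simple random sample that would give the same precision."""
    return n / design_effect(cluster_size, icc)

deff = design_effect(cluster_size=20, icc=0.05)          # rho is assumed
n_eff = effective_sample_size(600, cluster_size=20, icc=0.05)
print(round(deff, 2), round(n_eff))
```

Under these assumptions, 600 clustered households carry roughly the information of about 308 households drawn by simple random sampling.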

When to Use:

Population spread over wide geographic area
Complete sampling frame unavailable but cluster list available
Cost or time constraints significant
Natural clusters exist (schools, organizations, neighborhoods)
Efficiency more important than precision

5. Multistage Sampling

Complex

Definition: Combines multiple sampling methods across several stages, typically starting with large units and progressively selecting smaller units.

Example: National Survey

Goal: Survey households across Thailand

Stage 1: Regions

Stratify by region (North, Northeast, Central, South)

Select proportional number of provinces from each region

↓

Stage 2: Districts

Randomly select districts within chosen provinces

↓

Stage 3: Sub-districts

Randomly select sub-districts within chosen districts

↓

Stage 4: Households

Randomly select households within chosen sub-districts

↓

Stage 5: Individuals

Select one adult per household using Kish grid

Advantages:

Highly practical for large populations
Can combine strengths of different methods
Flexible and adaptable
Need detailed sampling frames only within the units selected at each stage

Disadvantages:

Complex design and administration
Complex statistical analysis
Sampling error accumulates across stages
Requires careful documentation

Choosing the Right Probability Method

Use Simple Random Sampling when: You have complete frame, homogeneous population, resources for dispersed sample

Use Stratified Sampling when: Important subgroups exist, population heterogeneous, want comparisons between groups

Use Cluster Sampling when: Wide geographic spread, limited resources, natural clusters exist

Use Multistage Sampling when: Very large population, no complete frame, need to balance precision and cost

Topic 3

Non-Probability Sampling Methods

Non-probability sampling methods don't involve random selection, meaning not everyone in the population has a known chance of being included. While these methods don't allow statistical generalization to populations, they're often more practical, less expensive, and appropriate for many research purposes—especially exploratory studies and qualitative research.

When Non-Probability Sampling is Appropriate

Qualitative research focused on depth rather than generalization
Exploratory studies generating hypotheses
Hard-to-reach or hidden populations
Pilot studies or pretesting instruments
Time or resource constraints prevent probability sampling
Complete sampling frame unavailable

1. Convenience Sampling

Most Common

Definition: Select participants who are easiest to reach or most readily available.

Examples:

Surveying students in your class
Recruiting participants from social media followers
Interviewing patients at one clinic
Stopping people at shopping mall
Using volunteers who respond to advertisement
Studying your own organization or workplace

Advantages:

Quick and easy
Inexpensive
Requires minimal planning
Useful for pilot studies
Good for initial exploration

Disadvantages:

High risk of bias
Least credible method
Cannot generalize results
Sample likely unrepresentative
Unknown sampling error

Critical Limitation

Convenience samples are almost always biased. People who are easy to reach often differ systematically from those who aren't. Use only when generalization isn't important or as preliminary research. Never claim your convenience sample represents the population.

2. Purposive (Judgmental) Sampling

Strategic

Definition: Researcher deliberately selects participants based on specific characteristics relevant to the research question.

Types of Purposive Sampling:

1. Expert Sampling

Select individuals with specific expertise or experience

Example: Interviewing experienced surgeons about a new surgical technique

2. Extreme/Deviant Case Sampling

Focus on unusual or special cases

Example: Studying highly successful startups to identify success factors

3. Typical Case Sampling

Select average or normal cases

Example: Studying middle-performing schools to understand typical challenges

4. Maximum Variation Sampling

Capture wide range of perspectives and experiences

Example: Including teachers from urban, suburban, and rural schools of varying sizes

5. Homogeneous Sampling

Select similar cases to reduce variation

Example: Only first-generation college students for focused study

6. Critical Case Sampling

Select cases that make a point dramatically

Example: "If it works here, it will work anywhere" or vice versa

Advantages:

Targets information-rich cases
Efficient use of limited resources
Appropriate for qualitative research
Flexible and adaptable
Can focus on specific characteristics

Disadvantages:

Researcher bias in selection
Cannot generalize statistically
Difficult to defend selection
May miss important perspectives

When to Use:

Qualitative research seeking deep understanding
Studying specialized or rare populations
Need participants with specific characteristics
Exploratory research
Resource constraints limit sample size

3. Quota Sampling

Structured

Definition: Divide population into categories and set quotas for each, then use convenience sampling within each category until quotas filled.

How to Do It:

Identify important characteristics for representation
Determine proportions in population
Set quotas matching these proportions
Recruit participants non-randomly until quotas met

Example:

Population: University with 60% female, 40% male students

Target sample: 200 students

Quotas: 120 females, 80 males

Process: Recruit students on campus (convenience) until reaching 120 females and 80 males
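
The quota logic can be sketched as a simple filter over whoever happens to come along. The stream of passers-by below is invented (alternating female and male), and the function name is illustrative. Note that nothing in it is random selection: people are accepted in arrival order, which is why quota sampling remains a non-probability method.

```python
def fill_quotas(arrivals, quotas):
    """Accept people in arrival order until every category's quota is full."""
    remaining = dict(quotas)
    accepted = []
    for person_id, category in arrivals:
        if remaining.get(category, 0) > 0:
            accepted.append((person_id, category))
            remaining[category] -= 1
        if not any(remaining.values()):   # all quotas filled
            break
    return accepted

# Hypothetical stream of students passing a recruitment table.
arrivals = [(i, "female" if i % 2 == 0 else "male") for i in range(1000)]
accepted = fill_quotas(arrivals, {"female": 120, "male": 80})
counts = {c: sum(1 for _, cat in accepted if cat == c) for c in ("female", "male")}
print(counts)
```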

Quota vs. Stratified Sampling

Similarities:

Both divide population into categories
Both sample from each category
Both ensure representation

Key Difference:

Stratified: Random selection within strata (probability sampling)

Quota: Non-random selection within quotas (non-probability sampling)

Advantages:

Ensures subgroup representation
More representative than simple convenience
Easier and cheaper than stratified random
No sampling frame needed
Quick data collection

Disadvantages:

Still biased within quotas
Cannot calculate sampling error
Selection bias remains
Not as rigorous as stratified random

4. Snowball Sampling

Network-Based

Definition: Existing participants recruit future participants from their networks. Sample "snowballs" as each participant refers others.

How to Do It:

Identify few initial participants (seeds)
Interview/survey initial participants
Ask them to refer others who meet criteria
Contact referred individuals
Repeat process until desired sample size reached

Example:

Research: Experiences of undocumented immigrants

Challenge: No list exists, population hidden due to legal concerns

Solution:

Find initial participant through community organization
After interview, ask if they know others willing to participate
Each participant refers 2-3 others
Continue until 30 interviews completed

Advantages:

Reaches hidden or hard-to-access populations
Builds trust through referrals
Low cost
Sometimes the only feasible method
Leverages social networks

Disadvantages:

High sampling bias (network homogeneity)
Sample lacks independence
May miss isolated individuals
Cannot generalize
Slow sample accumulation
Ethical concerns about privacy

When to Use:

Studying hidden or stigmatized populations
No sampling frame available
Population characterized by strong networks
Gaining access requires insider trust

Common with: Drug users, homeless populations, rare disease patients, elite groups, illegal activities

5. Volunteer/Self-Selection Sampling

Participant-Driven

Definition: Participants volunteer themselves in response to a general invitation.

Examples:

Online surveys shared on social media
Email invitations to organization members
Recruitment flyers posted on campus
Call for participants on websites
Amazon MTurk, Prolific platforms

Advantages:

Very easy to implement
Low cost, especially online
Can reach large numbers quickly
Participants motivated
Ethical (no coercion)

Disadvantages:

Extreme self-selection bias
Volunteers differ from non-volunteers
Cannot control who responds
May attract people with strong opinions
Least generalizable method

Volunteer Bias

Research shows volunteers systematically differ from non-volunteers: they tend to be more educated, higher socioeconomic status, more social, more interested in the topic, and often have stronger opinions. Never assume volunteer samples represent the general population.

6. Theoretical Sampling (for Qualitative Research)

Grounded Theory

Definition: Data collection driven by emerging theory. Researcher decides who to sample next based on what's been learned from previous participants.

Process:

Collect initial data
Analyze for emerging concepts
Identify gaps or questions
Sample next participant to address these gaps
Continue until theoretical saturation (no new insights)

Example:

Research: How people cope with chronic illness

Interview 3 recently diagnosed patients → identify "denial" theme
Sample long-term patients → discover "acceptance" process
Sample patients who struggled vs. adjusted well → identify coping strategies
Sample family members → understand support systems
Continue until no new themes emerge

When to Use:

Grounded theory studies
Theory development from data
Iterative qualitative research
Unknown what's important initially

Choosing Non-Probability Methods

Use Convenience when: Pilot testing, quick exploration, severe constraints

Use Purposive when: Specific expertise needed, information-rich cases, qualitative research

Use Quota when: Need representation of subgroups but can't do random sampling

Use Snowball when: Hidden populations, network-based communities, trust needed

Use Volunteer when: Online surveys, broad recruitment, motivated participants acceptable

Use Theoretical when: Building theory from data, iterative qualitative analysis

Reporting Non-Probability Samples

Always be transparent about:

How participants were selected
Why this method was chosen
Limitations for generalizability
How sample characteristics may affect findings

Never imply that non-probability samples represent the population statistically. Use terms like "participants," "respondents," or "this sample" rather than "the population."

Topic 4

Sample Size Determination

Determining appropriate sample size is crucial for research validity and efficiency. Too small a sample lacks statistical power to detect real effects, while too large wastes resources. This topic covers factors affecting sample size decisions and methods for calculating required sample sizes for different research designs.

Why Sample Size Matters

Too Small

Insufficient statistical power
May miss real effects (Type II error)
Wide confidence intervals
Results unreliable
Unethical waste of participants' time

Just Right

Adequate statistical power
Can detect meaningful effects
Reasonable confidence intervals
Reliable results
Efficient use of resources

Too Large

Wastes time and money
May find trivial effects significant
Unnecessary participant burden
Delayed completion
Ethical concerns about efficiency

Factors Affecting Required Sample Size

1. Effect Size

Definition: The magnitude of the difference or relationship you expect to find

Cohen's conventions:

Small effect: d = 0.2 (subtle, requires large sample)
Medium effect: d = 0.5 (moderate, typical in social sciences)
Large effect: d = 0.8 (substantial, requires smaller sample)

Principle: Smaller effects require larger samples to detect

Example: With α = .05 and power = .80, detecting a small effect (d = 0.2) takes about 400 participants per group (800 total), while a large effect (d = 0.8) takes about 25 per group (50 total).

2. Statistical Power

Definition: Probability of detecting an effect when it truly exists (1 - β)

Minimum acceptable: .80 (80% chance of detecting effect)
Preferred: .90 (90% chance)
Very high: .95 (95% chance)

Principle: Higher desired power requires larger samples

Trade-off: Power of .80 is convention, balancing rigor with practicality. Higher power is better but requires more resources.

3. Significance Level (α)

Definition: Probability of Type I error (false positive)

Standard: α = .05 (5% false positive rate)
Conservative: α = .01 (1% false positive rate)
Liberal: α = .10 (10% false positive rate)

Principle: More stringent alpha (lower value) requires larger samples

4. Population Variability

Definition: How much individuals differ from each other on your variable

Principle: More variable populations require larger samples for precise estimates

Example: Age of elementary students (narrow range, less variance) needs smaller sample than age of all university students (wide range, more variance)

5. Number of Variables/Groups

Definition: Complexity of your analysis

Simple comparison: 2 groups → ~100 total
Multiple groups: 4 groups → ~200 total
Multiple regression: 10 predictors → ~150+ cases
Factor analysis: Need 5-10 cases per variable

Rule of thumb: More complex analyses need larger samples

6. Expected Response/Attrition Rate

Definition: Proportion who will complete the study

Principle: Account for non-response and dropout

Calculation:

If you need 200 completers and expect 20% attrition:

Recruit: 200 / 0.80 = 250 participants
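
The same adjustment as a small helper, rounding up because you cannot recruit a fraction of a person. The function name is illustrative:

```python
import math

def recruits_needed(completers, attrition_rate):
    """Number to recruit so that `completers` remain after expected dropout."""
    raw = completers / (1 - attrition_rate)
    return math.ceil(round(raw, 9))   # round first to absorb floating-point noise

print(recruits_needed(200, 0.20))
print(recruits_needed(200, 0.35))
```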

Sample Size Formulas

1. Sample Size for Estimating a Population Mean

Formula:

n = (Z² × σ²) / E²

Where:

n = required sample size
Z = Z-score for desired confidence level (1.96 for 95%)
σ = population standard deviation (estimate from pilot or literature)
E = desired margin of error

Example:

Goal: Estimate average GPA within ±0.1 points, 95% confidence

Known: σ ≈ 0.5 (from previous research)

Calculation:

n = (1.96² × 0.5²) / 0.1² = (3.8416 × 0.25) / 0.01 = 0.9604 / 0.01 = 96.04, rounded up to 97 students
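
The formula is easy to wrap in a function. This sketch rounds up to the next whole participant, since the unrounded result (96.04 in the GPA example) is the minimum needed to reach the stated margin of error; the function name is illustrative:

```python
import math

def n_for_mean(sigma, margin, z=1.96):
    """Sample size to estimate a mean within +/- margin: n = (z * sigma / margin)^2."""
    return math.ceil((z * sigma / margin) ** 2)

print(n_for_mean(sigma=0.5, margin=0.1))    # GPA example
print(n_for_mean(sigma=0.5, margin=0.05))   # halving the margin roughly quadruples n
```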

2. Sample Size for Estimating a Population Proportion

Formula:

n = (Z² × p × (1-p)) / E²

Where:

n = required sample size
Z = Z-score for desired confidence level
p = estimated proportion (use 0.5 if unknown, gives largest sample)
E = desired margin of error

Example:

Goal: Estimate % of students who smoke within ±3%, 95% confidence

Estimated: p ≈ 0.15 (15% from literature)

Calculation:

n = (1.96² × 0.15 × 0.85) / 0.03² = (3.8416 × 0.1275) / 0.0009 = 0.4898 / 0.0009 ≈ 544.2, rounded up to 545 students
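
A short sketch of the proportion formula, again rounding up to a whole participant. It also shows why p = 0.5 is the conservative choice when the proportion is unknown; the function name is illustrative:

```python
import math

def n_for_proportion(p, margin, z=1.96):
    """Sample size to estimate a proportion within +/- margin."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

n_smoking = n_for_proportion(p=0.15, margin=0.03)   # smoking example
n_unknown = n_for_proportion(p=0.5, margin=0.03)    # worst case: p unknown
print(n_smoking, n_unknown)
```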

3. Sample Size for Comparing Two Means (t-test)

Simplified Formula:

n per group ≈ 16/d²

For: α = .05, power = .80, two-tailed test

d = effect size (Cohen's d)

Quick Reference Table:

Effect Size	n per Group	Total N
Small (d = 0.2)	400	800
Medium (d = 0.5)	64	128
Large (d = 0.8)	25	50
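
The table values follow directly from the rule of thumb. The sketch below reproduces them and, for comparison, computes the normal-approximation formula that the constant 16 rounds off: n per group ≈ 2 × (z for α/2 + z for power)² / d². This approximation typically comes out a little below exact t-test calculations from dedicated software, so treat both as planning estimates:

```python
import math
from statistics import NormalDist

def n_per_group_rule_of_thumb(d):
    """n per group ~ 16 / d^2 (alpha = .05 two-tailed, power = .80)."""
    return math.ceil(round(16 / d ** 2, 6))   # round first to absorb float noise

def n_per_group_normal_approx(d, alpha=0.05, power=0.80):
    """n per group ~ 2 * (z_{1-alpha/2} + z_{power})^2 / d^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

for d in (0.2, 0.5, 0.8):
    print(d, n_per_group_rule_of_thumb(d), n_per_group_normal_approx(d))
```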

Sample Size for Qualitative Research

Qualitative research doesn't use statistical formulas. Instead, sample size depends on:

1. Theoretical Saturation

Continue sampling until no new themes or insights emerge

Typical: 15-60 participants for interviews, 5-25 for focus groups

2. Research Design

Phenomenology: 5-25 participants
Grounded theory: 20-60 participants
Ethnography: Extended engagement with community
Case study: 1-4 cases

3. Depth vs. Breadth

Homogeneous sample: Fewer participants needed
Diverse sample: More participants to capture variety
In-depth interviews: Fewer participants
Brief surveys: More participants

Practical Guidelines and Rules of Thumb

Minimum Samples by Analysis Type:

Correlations: Minimum 30 for stable estimates
t-tests/ANOVA: Minimum 20-30 per group
Multiple regression: 10-15 cases per predictor
Factor analysis: 5-10 cases per variable, minimum 100
Structural equation modeling: 200-400 minimum
Chi-square: Expected frequency ≥5 in each cell

Survey Research Rules:

Small populations (<100): Census (everyone)
Medium populations (100-1000): 30-50%
Large populations (>1000): 10-20%, minimum 400
National surveys: 1000-1500 for ±3% margin of error
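
The national-survey figure can be checked by turning the proportion formula around and solving for the margin of error. This sketch assumes simple random sampling, 95% confidence, and the worst case p = 0.5; real national surveys use more complex designs, so their actual margins are usually somewhat wider:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (400, 1000, 1500):
    print(n, round(100 * margin_of_error(n), 1))
```

Note how the margin shrinks with the square root of n: going from 1,000 to 1,500 respondents buys only about half a percentage point.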

Use Online Sample Size Calculators

Rather than calculating by hand, use free tools (some run online, others are downloaded):

G*Power: Free software for power analysis (highly recommended)
Raosoft: Quick sample size calculator
SurveyMonkey: Sample size calculator
Sample Size Calculator (ClinCalc): For clinical studies

Always report the method and assumptions used for your sample size calculation!

When You Can't Achieve Ideal Sample Size

Reality often prevents reaching calculated sample sizes. If this happens:

Document the constraint honestly
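Report the smallest effect the achieved sample could detect with adequate power (a sensitivity analysis); post-hoc power computed from the observed effect adds little beyond the p-value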
Acknowledge as limitation
Be cautious interpreting null results
Consider pilot study or smaller scope
Focus on effect sizes, not just p-values

Remember: A well-executed small study is better than a poorly executed large study!

Topic 5

Sampling Errors and Bias

Understanding sampling errors and bias is crucial for conducting valid research and interpreting findings appropriately. This topic distinguishes between random sampling error (inevitable and quantifiable) and sampling bias (systematic and preventable), and provides strategies for minimizing both.

Two Types of Error

Sampling Error (Random)

Inevitable

Definition: Random variation between sample statistics and population parameters due to chance in selection process.

Characteristics:

Occurs in all samples
Due to chance, not researcher error
Random across samples
Can be quantified (confidence intervals, standard errors)
Decreases with larger sample sizes
Expected and acceptable

Example:

If true population mean GPA is 3.0, different random samples might yield means of 2.95, 3.02, 3.01, 2.98—close but not exact.
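
A small simulation illustrates this. It builds a hypothetical population of 5,000 GPAs centered near 3.0 (the distribution and its spread are assumptions), draws many random samples at two sample sizes, and compares how much the sample means scatter:

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: GPAs around 3.0, clipped to the 0.0-4.0 scale.
population = [min(4.0, max(0.0, rng.gauss(3.0, 0.5))) for _ in range(5000)]
true_mean = statistics.mean(population)

def sample_means(pop, n, draws):
    """Means of `draws` independent simple random samples of size n."""
    return [statistics.mean(rng.sample(pop, n)) for _ in range(draws)]

means_50 = sample_means(population, 50, 200)
means_400 = sample_means(population, 400, 200)
spread_50 = statistics.stdev(means_50)
spread_400 = statistics.stdev(means_400)
print(round(true_mean, 2), round(spread_50, 3), round(spread_400, 3))
```

The sample means cluster around the population mean in both cases, but the scatter is noticeably smaller with 400 students per sample than with 50.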

How to Manage:

Increase sample size
Use probability sampling
Report confidence intervals
Accept as inherent to sampling

Sampling Bias (Systematic)

Problematic

Definition: Systematic deviation from the true population parameter due to flawed sampling methods.

Characteristics:

Systematic, not random
Due to flawed methods
Consistently in one direction
Cannot be quantified easily
Doesn't decrease with larger samples
Threatens validity

Example:

If you survey only students in the library (who tend to be more studious), every sample will overestimate average GPA—not due to chance, but systematic selection.

How to Prevent:

Use probability sampling
Ensure complete sampling frame
Maximize response rates
Be aware of systematic exclusions

Key Distinction

Sampling Error: "My sample mean is 3.02 instead of exactly 3.00" → Acceptable

Sampling Bias: "My sampling method systematically excludes low-performers, so I'm always too high" → Problem!

Important: Increasing sample size reduces sampling error but does NOT fix sampling bias!
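
The following simulation sketches why a bigger sample does not rescue a biased method. It assumes a hypothetical population in which 30% of students are "studious" with higher average GPAs, then compares random samples of the whole population with samples drawn only from the studious group (standing in for a library-only survey). All numbers are invented for illustration:

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population of 20,000 students: (is_studious, gpa).
population = []
for _ in range(20000):
    studious = rng.random() < 0.3
    population.append((studious, rng.gauss(3.4 if studious else 2.8, 0.4)))

all_gpas = [gpa for _, gpa in population]
library_gpas = [gpa for studious, gpa in population if studious]
true_mean = statistics.mean(all_gpas)

results = {}
for n in (50, 500, 5000):
    random_est = statistics.mean(rng.sample(all_gpas, n))
    biased_est = statistics.mean(rng.sample(library_gpas, n))
    results[n] = (random_est - true_mean, biased_est - true_mean)
    print(n, round(results[n][0], 3), round(results[n][1], 3))
```

As n grows, the random-sample error shrinks toward zero, while the library-only estimate settles on a value that stays too high by about the same amount.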

Types of Sampling Bias

1. Selection Bias

Definition: Certain members of population systematically more likely to be selected than others

Examples:

Convenience sampling: Only studying accessible people
Volunteer bias: Only motivated people participate
Snowball sampling: Only networked individuals included
1936 Literary Digest poll: Drew its sample from telephone directories and car registration lists, which skewed toward wealthier households, and wrongly predicted the election outcome

Prevention:

Use probability sampling methods
Ensure all population members have chance of selection
Avoid self-selection

2. Non-Response Bias

Definition: Selected participants don't respond, and non-responders differ systematically from responders

Examples:

Mailed surveys: Only motivated people return (often 20-30% response rate)
Online surveys: Only frequent internet users respond
Phone surveys: People screen calls, busy people don't answer
Satisfaction surveys: Extremely satisfied or dissatisfied respond more

Impact by Response Rate:

80%+ response: Minimal concern
60-80%: Some concern, check for bias
40-60%: Significant concern, definitely check
<40%: Major concern, likely biased

Prevention/Mitigation:

Multiple contact attempts
Incentives for participation
Make participation convenient
Follow-up reminders
Compare early vs. late responders
Compare respondents to population data
Report response rates transparently

3. Sampling Frame Error

Definition: Sampling frame doesn't accurately represent target population

Examples:

Undercoverage: Phone directory misses cell-only users
Overcoverage: List includes people outside target population
Outdated list: Alumni database hasn't been updated in years
Incomplete list: Not all population members on list

Prevention:

Use most complete, current frame available
Update lists before sampling
Screen participants for eligibility
Use multiple overlapping frames
Document frame limitations

4. Survivorship Bias

Definition: Only studying "survivors" (those who made it through selection process), missing those who didn't

Examples:

College student studies: Miss those who dropped out
Business success studies: Only study successful companies, ignore failures
Treatment effectiveness: Only follow up with those who completed treatment
WWII planes: The intuitive plan was to reinforce the areas where returning planes had bullet holes, but the armor belonged where returning planes were not hit, because planes hit there did not make it back

Prevention:

Include dropouts/failures in analysis
Track attrition carefully
Use intent-to-treat analysis
Consider what's missing

5. Undercoverage Bias

Definition: Some population segments systematically excluded

Examples:

Online surveys: Exclude those without internet
Daytime surveys: Miss working people
English-only surveys: Exclude non-English speakers
Landline surveys: Miss young adults (cell-phone only)

Prevention:

Use multiple recruitment methods
Offer multiple response formats
Translate materials
Adjust sampling to reach underrepresented groups

Detecting Sampling Bias

Methods to Identify Bias in Your Sample:

1. Compare Sample to Known Population

Compare your sample demographics to known population statistics

Characteristic	Population	Your Sample	Assessment
Female	55%	53%	✓ Similar
Age 18-22	60%	45%	⚠ Underrepresented
STEM major	40%	65%	✗ Overrepresented
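
The comparison in the table above can be automated with a few lines of Python. This is a sketch only: the 5-percentage-point tolerance is a hypothetical cutoff chosen for illustration, not a standard threshold.

```python
# Percentages taken from the table above
population_pct = {"Female": 55, "Age 18-22": 60, "STEM major": 40}
sample_pct = {"Female": 53, "Age 18-22": 45, "STEM major": 65}


def compare_to_population(population_pct, sample_pct, tolerance=5):
    """Flag characteristics whose sample share differs from the population
    share by more than `tolerance` percentage points (hypothetical cutoff)."""
    report = {}
    for name, pop in population_pct.items():
        diff = sample_pct[name] - pop
        if abs(diff) <= tolerance:
            report[name] = (diff, "similar")
        elif diff < 0:
            report[name] = (diff, "underrepresented")
        else:
            report[name] = (diff, "overrepresented")
    return report


for name, (diff, label) in compare_to_population(population_pct, sample_pct).items():
    print(f"{name}: {diff:+d} points, {label}")
```

With larger samples, a formal test such as a chi-square goodness-of-fit test is a common alternative to a fixed tolerance.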

2. Early vs. Late Responders

Compare those who responded immediately to those who required follow-up

Logic: Late responders are assumed to resemble non-responders more closely than early responders do

If significant differences exist: Non-response bias likely present
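
One rough way to screen for such differences is Welch's t statistic, sketched below with the standard library. The satisfaction scores are hypothetical, and treating |t| > 2 as a warning sign is only a rule of thumb; a statistics package would give a proper p-value.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means."""
    var_a = statistics.variance(group_a) / len(group_a)
    var_b = statistics.variance(group_b) / len(group_b)
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / math.sqrt(var_a + var_b)


# Hypothetical satisfaction scores (1-5 scale)
early = [4.5, 4.2, 4.8, 4.1, 4.6, 4.4, 4.7, 4.3]  # responded immediately
late = [3.6, 3.9, 3.4, 4.0, 3.5, 3.8, 3.3, 3.7]   # responded after reminders

t = welch_t(early, late)
print(f"t = {t:.2f}")
if abs(t) > 2:
    print("Early and late responders differ; non-response bias is plausible.")
```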

3. Follow-Up with Non-Responders

Intensively pursue small sample of non-responders to compare characteristics

Methods: Higher incentives, phone calls, in-person visits

4. Check for Logical Inconsistencies

Unusual patterns may indicate bias:

All participants highly educated
No one in lowest income bracket
Everyone reports above-average performance
Missing entire subgroups

Addressing Sampling Bias

What to Do If Bias is Present:

1. Statistical Weighting

Adjust sample to match population proportions

Example: If 60% of the population is female but only 45% of the sample is, weight each female response by 0.60 / 0.45 ≈ 1.33 in the analysis

Limitation: Only works for known characteristics; can't fix unknown biases
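
The weighting example can be sketched in Python as follows. The 60% and 45% figures come from the example; the two-group split, the 200 respondents, and the yes/no answers are hypothetical values chosen so the arithmetic is easy to follow.

```python
# Shares from the example; the male shares assume only two groups
population_share = {"female": 0.60, "male": 0.40}
sample_share = {"female": 0.45, "male": 0.55}

# Weight = population share / sample share
weights = {g: population_share[g] / sample_share[g] for g in population_share}

# Hypothetical data: 200 respondents answering a yes (1) / no (0) question
records = (
    [("female", 1)] * 63 + [("female", 0)] * 27   # 90 women, 70% yes
    + [("male", 1)] * 55 + [("male", 0)] * 55     # 110 men, 50% yes
)


def weighted_mean(records, weights):
    """Weighted average of the answers, using each respondent's group weight."""
    total = sum(weights[group] * value for group, value in records)
    return total / sum(weights[group] for group, _ in records)


unweighted = sum(value for _, value in records) / len(records)
weighted = weighted_mean(records, weights)
print(f"unweighted: {unweighted:.2f}, weighted: {weighted:.2f}")
```

Here the weighted estimate is higher than the unweighted one because the underrepresented group answered "yes" more often.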

2. Post-Stratification

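After data collection, group respondents into strata with known population shares (for example, age group or region) and weight each stratum so the sample matches that distribution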

3. Sensitivity Analysis

Test how results would change under different assumptions about non-responders

Example: "If all non-responders had lowest values, our estimate would be..."
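
A worst-case and best-case bound of this kind takes only a few lines to compute. The sketch below assumes a hypothetical survey on a 1-5 scale with 600 respondents and 400 non-responders; the function name and all the numbers are illustrative.

```python
def nonresponse_bounds(respondent_mean, n_respondents, n_nonrespondents,
                       scale_min, scale_max):
    """Range of possible full-sample means if every non-responder had
    the lowest or the highest possible value."""
    total = n_respondents + n_nonrespondents
    observed_sum = respondent_mean * n_respondents
    low = (observed_sum + scale_min * n_nonrespondents) / total
    high = (observed_sum + scale_max * n_nonrespondents) / total
    return low, high


# Hypothetical survey: 600 of 1,000 responded, mean score 4.0 on a 1-5 scale
low, high = nonresponse_bounds(4.0, 600, 400, scale_min=1, scale_max=5)
print(f"Full-sample mean could range from {low:.1f} to {high:.1f}")
```

A wide range signals that the conclusions depend heavily on what is assumed about non-responders.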

4. Honest Reporting

Most important: Transparently report:

Sampling method used
Response rate achieved
Known differences from population
Potential impact on findings
Limitations of generalizability

Best Practices to Minimize Sampling Bias

Use probability sampling when possible
Obtain complete, current sampling frame
Maximize response rates (80%+ goal)
Use multiple recruitment strategies
Make participation convenient (online + paper, multiple languages)
Offer appropriate incentives
Follow up with non-responders multiple times
Monitor sample composition during data collection
Compare sample to population on key characteristics
Report honestly about limitations

Critical Reminder

Large sample ≠ Representative sample

A biased sample of 10,000 people is still biased. The famous 1936 Literary Digest poll collected about 2.4 million responses (from roughly 10 million mailed ballots) but made a completely wrong election prediction because of sampling bias.

Priority order:

Eliminate bias (most important)
Then increase sample size

A well-designed small sample usually gives more trustworthy results than a poorly designed large one!

Summary

Module 05 Key Takeaways

What You've Learned

Sampling is selecting a subset from a population to make inferences about the whole
Probability sampling allows generalization; non-probability sampling is practical but limited
Sample size depends on effect size, power, significance level, and population variability
Random sampling error is inevitable and quantifiable; sampling bias is systematic and problematic
Maximize response rates and compare sample to population to detect and minimize bias

Next Steps

In Module 06: Data Collection Methods, you'll explore various techniques for gathering data including surveys, interviews, observations, and experiments. Learn to design effective instruments, conduct valid measurements, and choose appropriate data collection methods for your research questions.

Practice

Sampling Practice Exercises

Applied Sampling Tasks

Method Selection: For each research scenario, identify the most appropriate sampling method and justify your choice:
- National survey of healthcare quality
- Study of rare disease patients
- Comparing teaching methods in one school
- Understanding experiences of refugees
Sample Size Calculation: Use G*Power or an online calculator to determine the sample size for:
- Comparing two groups, medium effect, power=.80
- Estimating population mean within ±2 points
- Correlation study to detect r=.30
Bias Detection: Read a published study and identify potential sources of sampling bias. How might these biases affect conclusions?
Design Your Sampling Plan: For your research question:
- Define your population
- Choose sampling method
- Calculate required sample size
- Identify potential biases
- Plan strategies to minimize bias
Response Rate Strategy: Develop a plan to achieve a 70%+ response rate for an online survey, including incentives, reminders, and follow-up procedures.