Topic 1

Introduction to Sampling

Sampling is the process of selecting a subset of individuals from a larger population to study. Since it's usually impossible to study entire populations, researchers use carefully selected samples to make inferences about populations. Understanding sampling is essential for conducting valid research and interpreting findings appropriately.

Key Terminology

Population

Definition: The entire group you want to study and draw conclusions about

Examples:

  • All university students in Thailand
  • All patients with diabetes worldwide
  • All small businesses in Bangkok
  • All high school teachers in the United States

Note: Populations can be finite (countable) or infinite (theoretical)

Sample

Definition: A subset of the population that you actually study

Examples:

  • 500 university students from 5 Bangkok universities
  • 200 diabetes patients from two hospitals
  • 50 small businesses in the Sukhumvit area
  • 300 high school teachers from California

Goal: Sample should be representative of the population

Sampling Frame

Definition: A list of all elements in the population from which you'll draw your sample

Examples:

  • Student enrollment database
  • Hospital patient registry
  • Business registration list
  • Telephone directory
  • Email list of organization members

Important: Sampling frame may not perfectly match the target population

Sampling Unit

Definition: The element or set of elements considered for selection at each stage

Examples:

  • Individual persons
  • Households
  • Schools or classrooms
  • Organizations
  • Geographic areas

Parameter

Definition: A numerical characteristic of the population (what you want to know)

Examples:

  • Population mean (μ)
  • Population proportion (p)
  • Population standard deviation (σ)

Note: Usually unknown—that's why we sample!

Statistic

Definition: A numerical characteristic of the sample (what you calculate from data)

Examples:

  • Sample mean (x̄)
  • Sample proportion (p̂)
  • Sample standard deviation (s)

Goal: Use statistics to estimate parameters

Why We Sample

Cost-Effective

Studying entire populations is usually prohibitively expensive. Sampling dramatically reduces costs while still providing accurate estimates.

Example: Rather than surveying all 5 million university students in a country, survey 1,000 students for reliable estimates at a fraction of the cost.

Time-Efficient

Collecting data from entire populations takes too long. Sampling allows timely completion of research projects.

Example: Election polls sample voters to predict outcomes quickly rather than waiting for everyone to vote.

Practical Feasibility

Some populations are impossible to access completely or don't have complete lists.

Example: No complete list exists of all people with depression, so sampling from accessible sources is necessary.

Sometimes More Accurate

Well-designed samples with careful data collection can be more accurate than careless population censuses.

Example: A careful sample survey with high response rates may be more accurate than a census with many non-responses.

Destructive Testing

When testing destroys the item, you must sample rather than test everything.

Example: Testing battery life requires using batteries until they die—can't test every battery produced!

Infinite Populations

Some populations are theoretical and infinite, making sampling the only option.

Example: All possible measurements of a physical constant or all potential outcomes of a process.

When NOT to Sample: Census

Use a Census (Study Everyone) When:

  • Population is small: With only 30 people in your department, just survey everyone
  • Resources allow: You have sufficient time and money for complete enumeration
  • Precision required: Need exact counts, not estimates (e.g., organizational records)
  • Political/ethical reasons: Everyone should have opportunity to participate (e.g., employee satisfaction surveys)
  • High variability: Population so diverse that large sample would be needed anyway

Representative Samples

What Makes a Sample Representative?

A representative sample accurately reflects the characteristics of the population it's drawn from. Key characteristics should be present in the sample in the same proportions as in the population.

Example: University Student Population

Characteristic | Population | Representative Sample | Biased Sample
Gender | 55% Female, 45% Male | 54% Female, 46% Male ✓ | 70% Female, 30% Male ✗
Year Level | 30% 1st, 25% 2nd, 25% 3rd, 20% 4th | 29% 1st, 26% 2nd, 24% 3rd, 21% 4th ✓ | 50% 1st, 30% 2nd, 15% 3rd, 5% 4th ✗
Major | 40% STEM, 35% Social Sci, 25% Humanities | 39% STEM, 36% Social Sci, 25% Humanities ✓ | 60% STEM, 25% Social Sci, 15% Humanities ✗

Warning: Convenience Samples Are Rarely Representative

Just because a sample is large doesn't mean it's representative. A sample of 10,000 people recruited from social media may be less representative than a carefully selected random sample of 400 people.

Two Main Sampling Approaches

Probability Sampling

Every member of the population has a known, non-zero chance of being selected

Characteristics:

  • Random selection
  • Known probability of selection
  • Allows generalization to population
  • Can calculate sampling error
  • More rigorous and defensible

Use when: Generalizability is important and you have access to sampling frame

Gold standard for quantitative research

Non-Probability Sampling

Some members of the population have an unknown or zero chance of being selected

Characteristics:

  • Non-random selection
  • Unknown probability of selection
  • Limited generalizability
  • Cannot calculate sampling error statistically
  • More practical and accessible

Use when: Exploratory research, hard-to-reach populations, qualitative studies, or practical constraints prevent probability sampling

Very common in practice, especially qualitative research

Choosing Between Probability and Non-Probability

The choice depends on:

  • Research goals: Need to generalize to population or explore in-depth?
  • Resources: Time, money, and access to sampling frame
  • Population characteristics: Accessible or hard-to-reach?
  • Research design: Quantitative hypothesis-testing or qualitative exploration?

Remember: Non-probability sampling doesn't mean bad sampling. It's appropriate for many research purposes—just be clear about limitations.

Topic 2

Probability Sampling Methods

Probability sampling methods use random selection to ensure every member of the population has a known chance of being included. These methods allow you to calculate sampling error and make statistical inferences about the population. Understanding when and how to use each method is essential for rigorous quantitative research.

1. Simple Random Sampling (SRS)

Most Basic

Definition: Every member of the population has an equal chance of being selected, and every possible sample of the chosen size is equally likely.

How to Do It:
  1. Obtain complete sampling frame (list of all population members)
  2. Assign each member a unique number
  3. Use random number generator or table to select sample
  4. Contact and recruit selected individuals
Example:

Population: 5,000 students at a university

Process:

  • Obtain list of all 5,000 students from registrar
  • Number them 0001 to 5000
  • Use random number generator to select 400 numbers
  • Survey the 400 selected students
Advantages:
  • Simple to understand and implement
  • No bias in selection process
  • Allows calculation of sampling error
  • Results generalizable to population
Disadvantages:
  • Requires complete sampling frame
  • May be geographically dispersed (expensive to reach)
  • May not capture rare subgroups
  • Can be inefficient for heterogeneous populations
When to Use:
  • Complete sampling frame available
  • Population relatively homogeneous
  • Resources allow reaching dispersed sample
  • No important subgroups that need guaranteed representation
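
The SRS example above (5,000 students, 400 selected) can be sketched in a few lines. This is a minimal illustration, assuming the sampling frame is simply the ID numbers 1 to 5000; the function name `simple_random_sample` and the seed value are hypothetical choices made for this example.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw ID numbers from 1..population_size without replacement."""
    rng = random.Random(seed)               # seeded so the draw can be reproduced
    ids = range(1, population_size + 1)     # stand-in for the registrar's list
    return sorted(rng.sample(ids, sample_size))

selected = simple_random_sample(5000, 400, seed=42)
print(len(selected), selected[:5])
```

Recording the seed alongside the frame is one way to make the selection auditable later.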

2. Systematic Random Sampling

Efficient

Definition: Select every kth element from a list after a random start.

How to Do It:
  1. Calculate sampling interval: k = N/n (population size / desired sample size)
  2. Randomly select starting point between 1 and k
  3. Select every kth element from that point
Example:

Population: 5,000 students; Desired sample: 500

Process:

  • k = 5,000 / 500 = 10
  • Randomly select start between 1-10, say 7
  • Select students #7, 17, 27, 37, 47... until 500 selected
Advantages:
  • Simpler than simple random sampling
  • More efficient, especially with physical lists
  • Ensures spread across population
  • Works well when list is random order
Disadvantages:
  • Risk of periodicity in list
  • Less random than SRS if list has patterns
  • Selections are not independent: once the start is chosen, the whole sample is fixed
  • Requires list in random or neutral order
Watch Out for Periodicity!

Problem: If the list has periodic patterns that align with your sampling interval, bias occurs.

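Example: A list of apartments is organized by building, with 10 units per building, and the first unit in each building is a corner unit. If k = 10 and the random start is 1, you'd sample only corner units; with any other start, you'd sample none.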

Solution: Examine list for patterns before selecting k
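
A rough sketch of systematic selection and of the periodicity problem follows. It assumes N is an exact multiple of n (as in the 5,000 / 500 example) and that apartment units are numbered so that units 1, 11, 21, ... are the corner units; both the numbering scheme and the helper names are assumptions made for illustration.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Select every k-th ID after a random start between 1 and k."""
    k = population_size // sample_size       # sampling interval
    start = random.Random(seed).randint(1, k)  # inclusive on both ends
    return [start + i * k for i in range(sample_size)]

students = systematic_sample(5000, 500, seed=1)

def is_corner(unit_id, units_per_building=10):
    """Assumed layout: the first unit of every building is a corner unit."""
    return (unit_id - 1) % units_per_building == 0

# k = 10 with start = 1 lines up exactly with the building layout
aligned = [1 + i * 10 for i in range(50)]
share_corner = sum(is_corner(u) for u in aligned) / len(aligned)
print(students[:5], share_corner)
```

Here `share_corner` comes out as 1.0: every selected unit is a corner unit, which is the bias the warning describes.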

3. Stratified Random Sampling

Precise

Definition: Divide population into homogeneous subgroups (strata), then randomly sample from each stratum.

How to Do It:
  1. Identify stratification variable (e.g., gender, age group, region)
  2. Divide population into mutually exclusive strata
  3. Determine sample size for each stratum
  4. Randomly sample within each stratum
Two Types:
Proportionate Stratified Sampling

Sample from each stratum proportional to its size in population

Example: University with 60% undergrads, 40% graduates

Sample 500 students: 300 undergrads (60%), 200 graduates (40%)

Disproportionate Stratified Sampling

Oversample small but important subgroups

Example: University with 95% domestic, 5% international students

Sample 400 students: 200 domestic, 200 international (equal groups for comparison)

Note: Use weights in analysis to adjust for oversampling
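
To see why the weights matter, here is a small sketch. The population size (10,000 students, so 9,500 domestic and 500 international) and the two stratum means are made-up numbers for illustration only; the weight for each stratum is its population count divided by its sample count.

```python
population = {"domestic": 9500, "international": 500}   # hypothetical N = 10,000
sample_n = {"domestic": 200, "international": 200}      # disproportionate design
sample_mean = {"domestic": 3.0, "international": 5.0}   # made-up stratum means

# Each sampled person "stands for" N_h / n_h people in their stratum
weights = {s: population[s] / sample_n[s] for s in population}

unweighted = (sum(sample_n[s] * sample_mean[s] for s in sample_n)
              / sum(sample_n.values()))
weighted = (sum(weights[s] * sample_n[s] * sample_mean[s] for s in sample_n)
            / sum(weights[s] * sample_n[s] for s in sample_n))

print(weights, unweighted, round(weighted, 2))
```

The unweighted mean (4.0) treats international students as half the population; the weighted mean (3.1) restores their true 5% share.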

Detailed Example:

Research: Student stress levels across different majors

Stratum (Major) | Population N | % of Total | Sample n
Engineering     | 2,000        | 40%        | 200
Business        | 1,500        | 30%        | 150
Arts            | 1,000        | 20%        | 100
Sciences        | 500          | 10%        | 50
Total           | 5,000        | 100%       | 500

Randomly select students within each major to reach target numbers
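
The allocation in the table can be reproduced with a short sketch. It assumes each stratum's frame is just the ID numbers 1 to N for that major, and the function name `proportional_allocation` is hypothetical. Simple rounding happens to sum to 500 here; with other numbers the rounded allocations may need a small manual adjustment.

```python
import random

strata_sizes = {"Engineering": 2000, "Business": 1500, "Arts": 1000, "Sciences": 500}
total_n = 500

def proportional_allocation(sizes, n):
    """Give each stratum a share of n equal to its share of the population."""
    total = sum(sizes.values())
    return {name: round(n * size / total) for name, size in sizes.items()}

allocation = proportional_allocation(strata_sizes, total_n)

# Independent simple random sample inside each stratum
rng = random.Random(7)
sample = {name: rng.sample(range(1, strata_sizes[name] + 1), allocation[name])
          for name in strata_sizes}

print(allocation)
```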

Advantages:
  • More precise than simple random sampling
  • Ensures representation of important subgroups
  • Allows comparison between strata
  • Can improve efficiency and reduce sampling error
  • Particularly useful for heterogeneous populations
Disadvantages:
  • Requires information about population to stratify
  • More complex to administer
  • Need separate sampling frame for each stratum
  • Must choose stratification variables carefully
When to Use:
  • Important subgroups exist that must be represented
  • Population is heterogeneous
  • Want to make comparisons between groups
  • Information about strata is available
  • Need more precision than simple random sampling

4. Cluster Sampling

Practical

Definition: Divide population into clusters (groups), randomly select clusters, then study all members within selected clusters or sample within clusters.

How to Do It:
  1. Divide population into naturally occurring clusters
  2. Randomly select a sample of clusters
  3. Either:
    • One-stage: Study all members in selected clusters
    • Two-stage: Randomly sample members within selected clusters
Cluster vs. Stratified Sampling
Stratified Sampling
  • Strata are homogeneous within
  • Sample from ALL strata
  • Increases precision
  • Goal: represent diversity
Cluster Sampling
  • Clusters are heterogeneous within
  • Sample only SOME clusters
  • Increases efficiency
  • Goal: reduce costs
Example: One-Stage Cluster Sampling

Population: Students in 50 schools across a country

Process:

  • Schools are clusters
  • Randomly select 10 of the 50 schools
  • Survey ALL students in those 10 schools
Example: Two-Stage Cluster Sampling

Population: Households in a city

Process:

  • Stage 1: Randomly select 30 neighborhoods (clusters)
  • Stage 2: Randomly select 20 households within each selected neighborhood
  • Total sample: 30 × 20 = 600 households
Advantages:
  • Very cost-effective and practical
  • Doesn't require complete sampling frame for population
  • Concentrates data collection geographically
  • Useful when population naturally grouped
  • Easier to administer than dispersed samples
Disadvantages:
  • Less precise than an SRS of the same size (higher sampling error)
  • Clusters may not represent full population diversity
  • Design effect reduces effective sample size
  • Requires larger sample for same precision
  • Complex statistical analysis needed
When to Use:
  • Population spread over wide geographic area
  • Complete sampling frame unavailable but cluster list available
  • Cost or time constraints significant
  • Natural clusters exist (schools, organizations, neighborhoods)
  • Efficiency more important than precision
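
The two-stage example and the design-effect point can be sketched together. The city layout (200 neighborhoods of 500 households each) and the intraclass correlation of 0.05 are assumptions chosen only for illustration; the design effect uses the common approximation deff = 1 + (m - 1) × ICC for equal-sized clusters of m sampled units.

```python
import random

def two_stage_sample(n_clusters_total, units_per_cluster, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick units within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(range(n_clusters_total), n_clusters)
    return {c: rng.sample(range(units_per_cluster), n_per_cluster) for c in chosen}

def design_effect(cluster_sample_size, icc):
    """Approximate variance inflation relative to an SRS of the same size."""
    return 1 + (cluster_sample_size - 1) * icc

households = two_stage_sample(200, 500, n_clusters=30, n_per_cluster=20, seed=3)
total = sum(len(h) for h in households.values())

deff = design_effect(20, icc=0.05)      # ICC value is assumed, not estimated
effective_n = total / deff
print(total, round(deff, 2), round(effective_n))
```

Under these assumptions the 600 sampled households carry roughly the information of 308 independently sampled ones, which is why cluster designs need larger samples for the same precision.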

5. Multistage Sampling

Complex

Definition: Combines multiple sampling methods across several stages, typically starting with large units and progressively selecting smaller units.

Example: National Survey

Goal: Survey households across Thailand

Stage 1: Regions

Stratify by region (North, Northeast, Central, South)

Select proportional number of provinces from each region

Stage 2: Districts

Randomly select districts within chosen provinces

Stage 3: Sub-districts

Randomly select sub-districts within chosen districts

Stage 4: Households

Randomly select households within chosen sub-districts

Stage 5: Individuals

Select one adult per household using Kish grid

Advantages:
  • Highly practical for large populations
  • Can combine strengths of different methods
  • Flexible and adaptable
  • Detailed sampling frames are needed only for the units selected at each stage (e.g., household lists only for the chosen sub-districts)
Disadvantages:
  • Complex design and administration
  • Complex statistical analysis
  • Sampling error accumulates across stages
  • Requires careful documentation

Choosing the Right Probability Method

Use Simple Random Sampling when: You have complete frame, homogeneous population, resources for dispersed sample

Use Stratified Sampling when: Important subgroups exist, population heterogeneous, want comparisons between groups

Use Cluster Sampling when: Wide geographic spread, limited resources, natural clusters exist

Use Multistage Sampling when: Very large population, no complete frame, need to balance precision and cost

Topic 3

Non-Probability Sampling Methods

Non-probability sampling methods don't involve random selection, meaning not everyone in the population has a known chance of being included. While these methods don't allow statistical generalization to populations, they're often more practical, less expensive, and appropriate for many research purposes—especially exploratory studies and qualitative research.

When Non-Probability Sampling is Appropriate

  • Qualitative research focused on depth rather than generalization
  • Exploratory studies generating hypotheses
  • Hard-to-reach or hidden populations
  • Pilot studies or pretesting instruments
  • Time or resource constraints prevent probability sampling
  • Complete sampling frame unavailable

1. Convenience Sampling

Most Common

Definition: Select participants who are easiest to reach or most readily available.

Examples:
  • Surveying students in your class
  • Recruiting participants from social media followers
  • Interviewing patients at one clinic
  • Stopping people at shopping mall
  • Using volunteers who respond to advertisement
  • Studying your own organization or workplace
Advantages:
  • Quick and easy
  • Inexpensive
  • Requires minimal planning
  • Useful for pilot studies
  • Good for initial exploration
Disadvantages:
  • High risk of bias
  • Least credible method
  • Cannot generalize results
  • Sample likely unrepresentative
  • Unknown sampling error
Critical Limitation

Convenience samples are almost always biased. People who are easy to reach often differ systematically from those who aren't. Use only when generalization isn't important or as preliminary research. Never claim your convenience sample represents the population.

2. Purposive (Judgmental) Sampling

Strategic

Definition: Researcher deliberately selects participants based on specific characteristics relevant to the research question.

Types of Purposive Sampling:
1. Expert Sampling

Select individuals with specific expertise or experience

Example: Interviewing experienced surgeons about a new surgical technique

2. Extreme/Deviant Case Sampling

Focus on unusual or special cases

Example: Studying highly successful startups to identify success factors

3. Typical Case Sampling

Select average or normal cases

Example: Studying middle-performing schools to understand typical challenges

4. Maximum Variation Sampling

Capture wide range of perspectives and experiences

Example: Including teachers from urban, suburban, and rural schools of varying sizes

5. Homogeneous Sampling

Select similar cases to reduce variation

Example: Only first-generation college students for focused study

6. Critical Case Sampling

Select cases that make a point dramatically

Example: "If it works here, it will work anywhere" or vice versa

Advantages:
  • Targets information-rich cases
  • Efficient use of limited resources
  • Appropriate for qualitative research
  • Flexible and adaptable
  • Can focus on specific characteristics
Disadvantages:
  • Researcher bias in selection
  • Cannot generalize statistically
  • Difficult to defend selection
  • May miss important perspectives
When to Use:
  • Qualitative research seeking deep understanding
  • Studying specialized or rare populations
  • Need participants with specific characteristics
  • Exploratory research
  • Resource constraints limit sample size

3. Quota Sampling

Structured

Definition: Divide population into categories and set quotas for each, then use convenience sampling within each category until quotas filled.

How to Do It:
  1. Identify important characteristics for representation
  2. Determine proportions in population
  3. Set quotas matching these proportions
  4. Recruit participants non-randomly until quotas met
Example:

Population: University with 60% female, 40% male students

Target sample: 200 students

Quotas: 120 females, 80 males

Process: Recruit students on campus (convenience) until reaching 120 females and 80 males
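
The quota-filling logic can be sketched as below. The stream of passers-by (alternating female and male) is entirely hypothetical, as is the function name `fill_quotas`; the point is only that people are accepted in arrival order, not at random, until every quota is full.

```python
def fill_quotas(recruits, quotas):
    """Accept recruits in arrival order until each category's quota is met."""
    counts = {category: 0 for category in quotas}
    accepted = []
    for person_id, category in recruits:
        # Skip people whose category is unknown or already full
        if counts.get(category, 0) < quotas.get(category, 0):
            counts[category] += 1
            accepted.append(person_id)
        if counts == quotas:
            break
    return accepted, counts

# Hypothetical stream of students walking past the recruiter
recruits = [(i, "female" if i % 2 == 0 else "male") for i in range(1000)]
accepted, counts = fill_quotas(recruits, {"female": 120, "male": 80})
print(len(accepted), counts)
```

The quotas guarantee the 60/40 split, but who ends up inside each quota still depends on who happened to walk by.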

Quota vs. Stratified Sampling
Similarities:
  • Both divide population into categories
  • Both sample from each category
  • Both ensure representation
Key Difference:

Stratified: Random selection within strata (probability sampling)

Quota: Non-random selection within quotas (non-probability sampling)

Advantages:
  • Ensures subgroup representation
  • More representative than simple convenience
  • Easier and cheaper than stratified random
  • No sampling frame needed
  • Quick data collection
Disadvantages:
  • Still biased within quotas
  • Cannot calculate sampling error
  • Selection bias remains
  • Not as rigorous as stratified random

4. Snowball Sampling

Network-Based

Definition: Existing participants recruit future participants from their networks. Sample "snowballs" as each participant refers others.

How to Do It:
  1. Identify few initial participants (seeds)
  2. Interview/survey initial participants
  3. Ask them to refer others who meet criteria
  4. Contact referred individuals
  5. Repeat process until desired sample size reached
Example:

Research: Experiences of undocumented immigrants

Challenge: No list exists, population hidden due to legal concerns

Solution:

  • Find initial participant through community organization
  • After interview, ask if they know others willing to participate
  • Each participant refers 2-3 others
  • Continue until 30 interviews completed
Advantages:
  • Reaches hidden or hard-to-access populations
  • Builds trust through referrals
  • Low cost
  • Sometimes the only feasible method
  • Leverages social networks
Disadvantages:
  • High sampling bias (network homogeneity)
  • Sample lacks independence
  • May miss isolated individuals
  • Cannot generalize
  • Slow sample accumulation
  • Ethical concerns about privacy
When to Use:
  • Studying hidden or stigmatized populations
  • No sampling frame available
  • Population characterized by strong networks
  • Gaining access requires insider trust

Common with: Drug users, homeless populations, rare disease patients, elite groups, people involved in illegal activities
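
The referral process can be pictured as recruitment in waves over a network. The sketch below uses a tiny invented referral network (letters stand for people) and a hypothetical `snowball` function; real referral chains are of course not known in advance.

```python
from collections import deque

def snowball(referrals, seeds, target_size):
    """Recruit in referral waves, starting from the seeds, until target_size is reached."""
    recruited = list(seeds)
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(recruited) < target_size:
        person = queue.popleft()
        for contact in referrals.get(person, []):
            if contact not in seen and len(recruited) < target_size:
                seen.add(contact)
                recruited.append(contact)
                queue.append(contact)
    return recruited

# Invented network: each key lists the people that participant would refer
referrals = {"A": ["B", "C"], "B": ["D", "E"], "C": ["E", "F"],
             "D": [], "E": ["G"], "Z": ["A"]}
sample = snowball(referrals, seeds=["A"], target_size=6)
print(sample)
```

Person "Z" knows the seed but is never named by anyone, so they never enter the sample: a small picture of how snowball sampling misses people outside the referring networks.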

5. Volunteer/Self-Selection Sampling

Participant-Driven

Definition: Participants volunteer themselves in response to a general invitation.

Examples:
  • Online surveys shared on social media
  • Email invitations to organization members
  • Recruitment flyers posted on campus
  • Call for participants on websites
  • Amazon MTurk, Prolific platforms
Advantages:
  • Very easy to implement
  • Low cost, especially online
  • Can reach large numbers quickly
  • Participants motivated
  • Ethical (no coercion)
Disadvantages:
  • Extreme self-selection bias
  • Volunteers differ from non-volunteers
  • Cannot control who responds
  • May attract people with strong opinions
  • Least generalizable method
Volunteer Bias

Research shows volunteers systematically differ from non-volunteers: they tend to be more educated, higher socioeconomic status, more social, more interested in the topic, and often have stronger opinions. Never assume volunteer samples represent the general population.

6. Theoretical Sampling (for Qualitative Research)

Grounded Theory

Definition: Data collection driven by emerging theory. Researcher decides who to sample next based on what's been learned from previous participants.

Process:
  1. Collect initial data
  2. Analyze for emerging concepts
  3. Identify gaps or questions
  4. Sample next participant to address these gaps
  5. Continue until theoretical saturation (no new insights)
Example:

Research: How people cope with chronic illness

  • Interview 3 recently diagnosed patients → identify "denial" theme
  • Sample long-term patients → discover "acceptance" process
  • Sample patients who struggled vs. adjusted well → identify coping strategies
  • Sample family members → understand support systems
  • Continue until no new themes emerge
When to Use:
  • Grounded theory studies
  • Theory development from data
  • Iterative qualitative research
  • Unknown what's important initially

Choosing Non-Probability Methods

Use Convenience when: Pilot testing, quick exploration, severe constraints

Use Purposive when: Specific expertise needed, information-rich cases, qualitative research

Use Quota when: Need representation of subgroups but can't do random sampling

Use Snowball when: Hidden populations, network-based communities, trust needed

Use Volunteer when: Online surveys, broad recruitment, motivated participants acceptable

Use Theoretical when: Building theory from data, iterative qualitative analysis

Reporting Non-Probability Samples

Always be transparent about:

  • How participants were selected
  • Why this method was chosen
  • Limitations for generalizability
  • How sample characteristics may affect findings

Never imply that non-probability samples represent the population statistically. Use terms like "participants," "respondents," or "this sample" rather than "the population."

Topic 4

Sample Size Determination

Determining appropriate sample size is crucial for research validity and efficiency. Too small a sample lacks statistical power to detect real effects, while too large wastes resources. This topic covers factors affecting sample size decisions and methods for calculating required sample sizes for different research designs.

Why Sample Size Matters

Too Small

  • Insufficient statistical power
  • May miss real effects (Type II error)
  • Wide confidence intervals
  • Results unreliable
  • Unethical waste of participants' time

Just Right

  • Adequate statistical power
  • Can detect meaningful effects
  • Reasonable confidence intervals
  • Reliable results
  • Efficient use of resources

Too Large

  • Wastes time and money
  • May find trivial effects significant
  • Unnecessary participant burden
  • Delayed completion
  • Ethical concerns about efficiency

Factors Affecting Required Sample Size

1. Effect Size

Definition: The magnitude of the difference or relationship you expect to find

Cohen's conventions:

  • Small effect: d = 0.2 (subtle, requires large sample)
  • Medium effect: d = 0.5 (moderate, typical in social sciences)
  • Large effect: d = 0.8 (substantial, requires smaller sample)

Principle: Smaller effects require larger samples to detect

Example: At α = .05 and power = .80 (two-tailed), detecting a small effect (d = 0.2) needs ~400 per group (~800 total). A large effect (d = 0.8) needs only ~25 per group (~50 total).

2. Statistical Power

Definition: Probability of detecting an effect when it truly exists (1 - β)

  • Minimum acceptable: .80 (80% chance of detecting effect)
  • Preferred: .90 (90% chance)
  • Very high: .95 (95% chance)

Principle: Higher desired power requires larger samples

Trade-off: Power of .80 is convention, balancing rigor with practicality. Higher power is better but requires more resources.

3. Significance Level (α)

Definition: Probability of Type I error (false positive)

  • Standard: α = .05 (5% false positive rate)
  • Conservative: α = .01 (1% false positive rate)
  • Liberal: α = .10 (10% false positive rate)

Principle: More stringent alpha (lower value) requires larger samples

4. Population Variability

Definition: How much individuals differ from each other on your variable

Principle: More variable populations require larger samples for precise estimates

Example: Age of elementary students (narrow range, less variance) needs smaller sample than age of all university students (wide range, more variance)

5. Number of Variables/Groups

Definition: Complexity of your analysis

  • Simple comparison: 2 groups → ~100 total
  • Multiple groups: 4 groups → ~200 total
  • Multiple regression: 10 predictors → ~150+ cases
  • Factor analysis: Need 5-10 cases per variable

Rule of thumb: More complex analyses need larger samples

6. Expected Response/Attrition Rate

Definition: Proportion who will complete the study

Principle: Account for non-response and dropout

Calculation:

If you need 200 completers and expect 20% attrition:

Recruit: 200 / 0.80 = 250 participants
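
The same adjustment as a small helper, rounding up because you cannot recruit a fraction of a person. The function name `recruits_needed` is a hypothetical choice for this sketch.

```python
import math

def recruits_needed(completers, attrition_rate):
    """Inflate the target so that enough participants remain after dropout."""
    # round() guards against tiny floating-point overshoot before ceil()
    return math.ceil(round(completers / (1 - attrition_rate), 6))

n_20 = recruits_needed(200, 0.20)   # 250
n_30 = recruits_needed(200, 0.30)   # 286
print(n_20, n_30)
```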

Sample Size Formulas

1. Sample Size for Estimating a Population Mean

Formula:

n = (Z² × σ²) / E²

Where:

  • n = required sample size
  • Z = Z-score for desired confidence level (1.96 for 95%)
  • σ = population standard deviation (estimate from pilot or literature)
  • E = desired margin of error
Example:

Goal: Estimate average GPA within ±0.1 points, 95% confidence

Known: σ ≈ 0.5 (from previous research)

Calculation:

n = (1.96² × 0.5²) / 0.1² = (3.8416 × 0.25) / 0.01 = 0.9604 / 0.01 ≈ 96.04, rounded up to 97 students
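
The formula can be checked with a short function. This sketch takes Z from the standard normal distribution instead of hard-coding 1.96 and always rounds up, since rounding down would give slightly less than the requested precision; the function name `n_for_mean` is hypothetical.

```python
import math
from statistics import NormalDist

def n_for_mean(sigma, margin, confidence=0.95):
    """n = (Z * sigma / E)^2, rounded up to the next whole participant."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    return math.ceil((z * sigma / margin) ** 2)

n_gpa = n_for_mean(sigma=0.5, margin=0.1)
print(n_gpa)   # 97: the raw value is about 96.04, which rounds up
```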

2. Sample Size for Estimating a Population Proportion

Formula:

n = (Z² × p × (1-p)) / E²

Where:

  • n = required sample size
  • Z = Z-score for desired confidence level
  • p = estimated proportion (use 0.5 if unknown, gives largest sample)
  • E = desired margin of error
Example:

Goal: Estimate % of students who smoke within ±3%, 95% confidence

Estimated: p ≈ 0.15 (15% from literature)

Calculation:

n = (1.96² × 0.15 × 0.85) / 0.03² = (3.8416 × 0.1275) / 0.0009 = 0.4898 / 0.0009 ≈ 544.2, rounded up to 545 students
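
A matching sketch for proportions, again rounding up. It also shows why p = 0.5 is the conservative default: it maximizes p × (1 - p) and therefore the required n. The function name `n_for_proportion` is hypothetical.

```python
import math
from statistics import NormalDist

def n_for_proportion(p, margin, confidence=0.95):
    """n = Z^2 * p * (1 - p) / E^2, rounded up to the next whole participant."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

n_smoking = n_for_proportion(p=0.15, margin=0.03)   # 545
n_unknown = n_for_proportion(p=0.50, margin=0.03)   # 1068
print(n_smoking, n_unknown)
```

The second figure is consistent with the familiar "about 1,000 respondents for ±3%" seen in national polls.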

3. Sample Size for Comparing Two Means (t-test)

Simplified Formula:

n per group ≈ 16/d²

For: α = .05, power = .80, two-tailed test

d = effect size (Cohen's d)

Quick Reference Table:
Effect Size      | n per Group | Total N
Small (d = 0.2)  | 400         | 800
Medium (d = 0.5) | 64          | 128
Large (d = 0.8)  | 25          | 50
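
The 16/d² shortcut is a rounded version of the normal-approximation formula n per group ≈ 2 × (z for α/2 + z for power)² / d². The sketch below compares the two; the function names are hypothetical, and exact t-based software results are typically slightly larger than the normal approximation.

```python
import math
from statistics import NormalDist

def n_rule_of_thumb(d):
    """n per group from the 16 / d^2 shortcut (alpha = .05, power = .80, two-tailed)."""
    return math.ceil(round(16 / d ** 2, 6))

def n_normal_approx(d, alpha=0.05, power=0.80):
    """n per group from 2 * (z_alpha/2 + z_power)^2 / d^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

results = {d: (n_rule_of_thumb(d), n_normal_approx(d)) for d in (0.2, 0.5, 0.8)}
print(results)   # {0.2: (400, 393), 0.5: (64, 63), 0.8: (25, 25)}
```

The shortcut errs slightly on the safe side because (1.96 + 0.84)² × 2 is about 15.7, which it rounds up to 16.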

Sample Size for Qualitative Research

Qualitative research doesn't use statistical formulas. Instead, sample size depends on:

1. Theoretical Saturation

Continue sampling until no new themes or insights emerge

Typical: 15-60 participants for interviews, 5-25 for focus groups

2. Research Design
  • Phenomenology: 5-25 participants
  • Grounded theory: 20-60 participants
  • Ethnography: Extended engagement with community
  • Case study: 1-4 cases
3. Depth vs. Breadth
  • Homogeneous sample: Fewer participants needed
  • Diverse sample: More participants to capture variety
  • In-depth interviews: Fewer participants
  • Brief surveys: More participants

Practical Guidelines and Rules of Thumb

Minimum Samples by Analysis Type:
  • Correlations: Minimum 30 for stable estimates
  • t-tests/ANOVA: Minimum 20-30 per group
  • Multiple regression: 10-15 cases per predictor
  • Factor analysis: 5-10 cases per variable, minimum 100
  • Structural equation modeling: 200-400 minimum
  • Chi-square: Expected frequency ≥5 in each cell
Survey Research Rules:
  • Small populations (<100): Census (everyone)
  • Medium populations (100-1000): 30-50%
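  • Large populations (>1000): 10-20% as a rough guide, minimum 400; for very large populations the required n depends on the target margin of error, not on a fixed percentage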
  • National surveys: 1000-1500 for ±3% margin of error

Use Sample Size Software and Calculators

Rather than calculating by hand, use free tools:

  • G*Power: Free desktop software for power analysis (highly recommended)
  • Raosoft: Quick sample size calculator
  • SurveyMonkey: Sample size calculator
  • Sample Size Calculator (ClinCalc): For clinical studies

Always report the method and assumptions used for your sample size calculation!

When You Can't Achieve Ideal Sample Size

Reality often prevents reaching calculated sample sizes. If this happens:

  • Document the constraint honestly
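  • Report a sensitivity analysis (the smallest effect the achieved sample could detect at the planned power) rather than post-hoc "observed" power, which adds nothing beyond the p-value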
  • Acknowledge as limitation
  • Be cautious interpreting null results
  • Consider pilot study or smaller scope
  • Focus on effect sizes, not just p-values

Remember: A well-executed small study is better than a poorly executed large study!

Topic 5

Sampling Errors and Bias

Understanding sampling errors and bias is crucial for conducting valid research and interpreting findings appropriately. This topic distinguishes between random sampling error (inevitable and quantifiable) and sampling bias (systematic and preventable), and provides strategies for minimizing both.

Two Types of Error

Sampling Error (Random)

Inevitable

Definition: Random variation between sample statistics and population parameters due to chance in selection process.

Characteristics:
  • Occurs in all samples
  • Due to chance, not researcher error
  • Random across samples
  • Can be quantified (confidence intervals, standard errors)
  • Decreases with larger sample sizes
  • Expected and acceptable
Example:

If true population mean GPA is 3.0, different random samples might yield means of 2.95, 3.02, 3.01, 2.98—close but not exact.

How to Manage:
  • Increase sample size
  • Use probability sampling
  • Report confidence intervals
  • Accept as inherent to sampling
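
Quantifying sampling error usually means reporting a standard error and a confidence interval. The sketch below uses a normal approximation, which is reasonable for larger samples (a t-based interval is more appropriate for small ones). The 100 GPA values are simulated, not real data, and the function name is hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for a sample mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    center = mean(values)
    standard_error = stdev(values) / math.sqrt(len(values))
    return center - z * standard_error, center + z * standard_error

# Simulated sample: 100 GPAs from a population with mean 3.0 and SD 0.5
rng = random.Random(0)
sample_gpas = [rng.gauss(3.0, 0.5) for _ in range(100)]

low, high = mean_confidence_interval(sample_gpas)
print(f"mean = {mean(sample_gpas):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Because the standard error shrinks with the square root of n, quadrupling the sample size roughly halves the width of the interval.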

Sampling Bias (Systematic)

Problematic

Definition: Systematic deviation from the true population parameter due to flawed sampling methods.

Characteristics:
  • Systematic, not random
  • Due to flawed methods
  • Consistently in one direction
  • Cannot be quantified easily
  • Doesn't decrease with larger samples
  • Threatens validity
Example:

If you survey only students in the library (who tend to be more studious), every sample will overestimate average GPA—not due to chance, but systematic selection.

How to Prevent:
  • Use probability sampling
  • Ensure complete sampling frame
  • Maximize response rates
  • Be aware of systematic exclusions

Key Distinction

Sampling Error: "My sample mean is 3.02 instead of exactly 3.00" → Acceptable

Sampling Bias: "My sampling method systematically excludes low-performers, so I'm always too high" → Problem!

Important: Increasing sample size reduces sampling error but does NOT fix sampling bias!
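
The point that a larger sample does not repair bias can be checked with a small simulation. This is a sketch under loud assumptions: the population is invented, and the "library" frame is modelled crudely as only students with a GPA of 2.8 or higher, which is a simplification of "tend to be more studious".

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical population of 50,000 GPAs with a true mean near 3.0
population = [min(4.0, max(0.0, rng.gauss(3.0, 0.4))) for _ in range(50_000)]
true_mean = statistics.fmean(population)

# Assumed biased frame: only students with GPA >= 2.8 can be reached
library_students = [g for g in population if g >= 2.8]

gaps = {}
for n in (50, 500, 5000):
    random_mean = statistics.fmean(rng.sample(population, n))
    biased_mean = statistics.fmean(rng.sample(library_students, n))
    # Store (random-sample gap, biased-sample gap) relative to the true mean
    gaps[n] = (random_mean - true_mean, biased_mean - true_mean)
    print(f"n={n:5d}  random gap={gaps[n][0]:+.3f}  biased gap={gaps[n][1]:+.3f}")
```

The random-sample gap should drift toward zero as n grows, while the biased-sample gap stays roughly constant and positive.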

Types of Sampling Bias

1. Selection Bias

Definition: Certain members of population systematically more likely to be selected than others

Examples:
  • Convenience sampling: Only studying accessible people
  • Volunteer bias: Only motivated people participate
  • Snowball sampling: Only networked individuals included
  • 1936 Literary Digest poll: Drew its sample from telephone directories, car registrations, and its own subscriber list (skewed toward wealthier voters); wrongly predicted the election
Prevention:
  • Use probability sampling methods
  • Ensure all population members have chance of selection
  • Avoid self-selection
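
As an illustration of giving every member of the frame an equal chance of selection, the sketch below draws a simple random sample with Python's standard `random` module. The frame of student IDs and the helper name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; each unit has probability n / len(frame)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: 12,000 student IDs from an enrollment database
frame = [f"S{i:05d}" for i in range(1, 12_001)]

chosen = simple_random_sample(frame, 500, seed=2024)
print(len(chosen), "students selected;", len(set(chosen)), "unique")
```

Because the computer makes the selection, neither the researcher's convenience nor the participants' motivation influences who is chosen. Bias can still enter later through non-response.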

2. Non-Response Bias

Definition: Selected participants don't respond, and non-responders differ systematically from responders

Examples:
  • Mailed surveys: Only motivated people return (often 20-30% response rate)
  • Online surveys: Only frequent internet users respond
  • Phone surveys: People screen calls, busy people don't answer
  • Satisfaction surveys: Extremely satisfied or dissatisfied respond more
Impact by Response Rate:
  • 80%+ response: Minimal concern
  • 60-80%: Some concern, check for bias
  • 40-60%: Significant concern, definitely check
  • <40%: Major concern, likely biased
Prevention/Mitigation:
  • Multiple contact attempts
  • Incentives for participation
  • Make participation convenient
  • Follow-up reminders
  • Compare early vs. late responders
  • Compare respondents to population data
  • Report response rates transparently
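
The response-rate bands listed above can be written as a small helper for reporting. This is only a sketch: the function names are hypothetical, and because the bands share their boundary values, it assumes that a rate exactly on a boundary (for example 80%) falls into the better band.

```python
def response_rate(completed, eligible_invited):
    """Proportion of eligible invited people who completed the survey."""
    if eligible_invited <= 0:
        raise ValueError("eligible_invited must be positive")
    return completed / eligible_invited

def concern_level(rate):
    """Map a response rate (0-1) to the rule-of-thumb bands used in this topic."""
    if rate >= 0.80:
        return "Minimal concern"
    if rate >= 0.60:
        return "Some concern, check for bias"
    if rate >= 0.40:
        return "Significant concern, definitely check"
    return "Major concern, likely biased"

# Hypothetical mailed survey: 1,000 eligible people invited, 230 returned
rate = response_rate(230, 1000)
print(f"{rate:.0%}: {concern_level(rate)}")
```

These bands are rules of thumb. A high response rate does not guarantee an unbiased sample, and a low one does not prove bias, so the comparison checks in the list above still apply.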

3. Sampling Frame Error

Definition: Sampling frame doesn't accurately represent target population

Examples:
  • Undercoverage: Phone directory misses cell-only users
  • Overcoverage: List includes people outside target population
  • Outdated list: Alumni database hasn't been updated in years
  • Incomplete list: Not all population members on list
Prevention:
  • Use most complete, current frame available
  • Update lists before sampling
  • Screen participants for eligibility
  • Use multiple overlapping frames
  • Document frame limitations
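
Undercoverage and overcoverage can be pictured as set differences between the target population and the frame. The sketch below uses tiny invented name lists. In real studies the full target population is rarely listed, so a check like this is usually run against a benchmark list or a screened subsample.

```python
# Hypothetical target population and sampling frame (invented names)
target_population = {"ana", "ben", "chai", "dao", "eve", "fon"}
sampling_frame = {"ana", "ben", "chai", "gus", "hal"}

# Undercoverage: in the target population but missing from the frame
undercovered = target_population - sampling_frame

# Overcoverage: on the frame but outside the target population
overcovered = sampling_frame - target_population

# Share of the target population that the frame can actually reach
coverage_rate = len(target_population & sampling_frame) / len(target_population)

print("Missing from frame:", sorted(undercovered))
print("Ineligible on frame:", sorted(overcovered))
print(f"Coverage rate: {coverage_rate:.0%}")
```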

4. Survivorship Bias

Definition: Only studying "survivors" (those who made it through selection process), missing those who didn't

Examples:
  • College student studies: Miss those who dropped out
  • Business success studies: Only study successful companies, ignore failures
  • Treatment effectiveness: Only follow up with those who completed treatment
  • WWII planes: Bullet holes on returning planes showed where a plane could be hit and still survive; armor belonged where returning planes had no holes, because planes hit there did not come back
Prevention:
  • Include dropouts/failures in analysis
  • Track attrition carefully
  • Use intent-to-treat analysis
  • Consider what's missing

5. Undercoverage Bias

Definition: Some population segments systematically excluded

Examples:
  • Online surveys: Exclude those without internet
  • Daytime surveys: Miss working people
  • English-only surveys: Exclude non-English speakers
  • Landline surveys: Miss young adults (cell-phone only)
Prevention:
  • Use multiple recruitment methods
  • Offer multiple response formats
  • Translate materials
  • Adjust sampling to reach underrepresented groups

Detecting Sampling Bias

Methods to Identify Bias in Your Sample:

1. Compare Sample to Known Population

Compare your sample demographics to known population statistics

Characteristic   Population   Your Sample   Assessment
Female           55%          53%           ✓ Similar
Age 18-22        60%          45%           ⚠ Underrepresented
STEM major       40%          65%           ✗ Overrepresented

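
The comparison in the table can be automated with a few lines. This sketch reuses the table's percentages and assumes a tolerance of 5 percentage points for "similar". That threshold is an illustrative choice, not a rule from this module.

```python
# Percentages taken from the comparison table
population_pct = {"Female": 55, "Age 18-22": 60, "STEM major": 40}
sample_pct = {"Female": 53, "Age 18-22": 45, "STEM major": 65}

def assess(pop, samp, tolerance=5):
    """Label a characteristic by how far the sample is from the population."""
    diff = samp - pop  # difference in percentage points
    if abs(diff) <= tolerance:  # assumed tolerance, adjust to your study
        return "Similar"
    return "Overrepresented" if diff > 0 else "Underrepresented"

assessment = {
    name: assess(population_pct[name], sample_pct[name])
    for name in population_pct
}

for name, label in assessment.items():
    print(f"{name:12s} {population_pct[name]:3d}% vs {sample_pct[name]:3d}%  {label}")
```
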
2. Early vs. Late Responders

Compare those who responded immediately to those who required follow-up

Logic: Late responders are assumed to resemble non-responders more closely than early responders do

If significant differences exist: Non-response bias likely present
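
One way to check for "significant differences" is Welch's t-test on a key outcome. The sketch below computes the statistic using only the standard library. The satisfaction scores are invented for illustration, and `welch_t` is a hypothetical helper; in practice a statistics package would also report the p-value.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two groups."""
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical satisfaction scores on a 1-5 scale (invented data)
early = [4, 5, 4, 4, 5, 3, 4, 5, 4, 4]  # responded immediately
late = [3, 3, 4, 2, 3, 4, 3, 2, 3, 3]   # responded only after follow-up

t, df = welch_t(early, late)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A t statistic far from zero suggests that early and late responders differ on this outcome, which in turn hints that non-responders may differ as well.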

3. Follow-Up with Non-Responders

Intensively pursue small sample of non-responders to compare characteristics

Methods: Higher incentives, phone calls, in-person visits

4. Check for Logical Inconsistencies

Unusual patterns may indicate bias:

  • All participants highly educated
  • No one in lowest income bracket
  • Everyone reports above-average performance
  • Missing entire subgroups

Addressing Sampling Bias

What to Do If Bias is Present:

1. Statistical Weighting

Adjust sample to match population proportions

Example: If 60% of population is female but only 45% of sample, weight female responses higher in analysis

Limitation: Only works for known characteristics; can't fix unknown biases
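
The weighting example works out as follows. The sketch uses the 60% and 45% female shares from the example and, as a simplification, treats the remainder as a single "male" group. The group means are invented to show the effect.

```python
# Shares from the example above; the two-group split is a simplifying assumption
population_share = {"female": 0.60, "male": 0.40}
sample_share = {"female": 0.45, "male": 0.55}

# Weight = population share / sample share for each group
weights = {g: population_share[g] / sample_share[g] for g in population_share}

# Hypothetical group means for some outcome (invented for illustration)
group_mean = {"female": 3.4, "male": 3.0}

# Unweighted estimate reflects the sample's skewed composition
unweighted = sum(sample_share[g] * group_mean[g] for g in group_mean)

# Weighted estimate rescales each group to its population share
weighted = sum(sample_share[g] * weights[g] * group_mean[g] for g in group_mean) / sum(
    sample_share[g] * weights[g] for g in group_mean
)

print({g: round(w, 3) for g, w in weights.items()})
print(f"unweighted={unweighted:.2f}  weighted={weighted:.2f}")
```

Female responses get a weight above 1 and male responses a weight below 1, so the weighted estimate moves toward the value expected if the sample had matched the population.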

2. Post-Stratification

After data collection, group respondents into strata (for example, by age or gender) and weight each stratum so that its share matches the known population distribution

3. Sensitivity Analysis

Test how results would change under different assumptions about non-responders

Example: "If all non-responders had lowest values, our estimate would be..."
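
A simple form of this analysis is to bound the estimate by assuming every non-responder had the lowest, then the highest, possible value. The sketch below does this for a mean on a bounded scale. The function name and the numbers are hypothetical.

```python
def nonresponse_bounds(responder_mean, response_rate, scale_min, scale_max):
    """Worst-case and best-case overall mean under extreme non-responder values."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    missing = 1 - response_rate  # share of the sample that did not respond
    low = response_rate * responder_mean + missing * scale_min
    high = response_rate * responder_mean + missing * scale_max
    return low, high

# Hypothetical survey: responders average 4.0 on a 1-5 scale, 60% response rate
low, high = nonresponse_bounds(4.0, 0.60, scale_min=1, scale_max=5)
print(f"Overall mean lies between {low:.1f} and {high:.1f}")
```

If the study's conclusion holds across the whole range, non-response is unlikely to change it. If it flips inside the range, the finding depends on what is assumed about non-responders.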

4. Honest Reporting

Most important: Transparently report:

  • Sampling method used
  • Response rate achieved
  • Known differences from population
  • Potential impact on findings
  • Limitations of generalizability

Best Practices to Minimize Sampling Bias

  • Use probability sampling when possible
  • Obtain complete, current sampling frame
  • Maximize response rates (80%+ goal)
  • Use multiple recruitment strategies
  • Make participation convenient (online + paper, multiple languages)
  • Offer appropriate incentives
  • Follow up with non-responders multiple times
  • Monitor sample composition during data collection
  • Compare sample to population on key characteristics
  • Report honestly about limitations

Critical Reminder

Large sample ≠ Representative sample

A biased sample of 10,000 people is still biased. The famous 1936 Literary Digest poll collected about 2.4 million responses but still predicted the wrong election winner because of sampling bias.

Priority order:

  1. Eliminate bias (most important)
  2. Then increase sample size

A well-designed small sample beats a poorly designed large sample every time!

Summary

Module 05 Key Takeaways

What You've Learned

  • Sampling is selecting a subset from a population to make inferences about the whole
  • Probability sampling allows generalization; non-probability sampling is practical but limited
  • Sample size depends on effect size, power, significance level, and population variability
  • Random sampling error is inevitable and quantifiable; sampling bias is systematic and problematic
  • Maximize response rates and compare sample to population to detect and minimize bias

Next Steps

In Module 06: Data Collection Methods, you'll explore various techniques for gathering data including surveys, interviews, observations, and experiments. Learn to design effective instruments, conduct valid measurements, and choose appropriate data collection methods for your research questions.

Practice

Sampling Practice Exercises

Applied Sampling Tasks

  1. Method Selection: For each research scenario, identify the most appropriate sampling method and justify your choice:
    • National survey of healthcare quality
    • Study of rare disease patients
    • Comparing teaching methods in one school
    • Understanding experiences of refugees
  2. Sample Size Calculation: Use G*Power or an online calculator to determine sample size for:
    • Comparing two groups, medium effect, power=.80
    • Estimating population mean within ±2 points
    • Correlation study to detect r=.30
  3. Bias Detection: Read a published study and identify potential sources of sampling bias. How might these biases affect conclusions?
  4. Design Your Sampling Plan: For your research question:
    • Define your population
    • Choose sampling method
    • Calculate required sample size
    • Identify potential biases
    • Plan strategies to minimize bias
  5. Response Rate Strategy: Develop a plan to achieve a 70%+ response rate for an online survey, including incentives, reminders, and follow-up procedures.