Overview of Data Collection
Data collection is the systematic process of gathering information to answer research questions and test hypotheses. The quality of your research depends heavily on the quality of your data, making careful planning and execution of data collection essential for valid, reliable findings.
What is Data?
Quantitative Data
Numerical information that can be measured and analyzed statistically
Examples:
- Test scores (85, 92, 78)
- Age in years (25, 34, 41)
- Income levels ($45,000)
- Likert scale ratings (1-5)
- Counts and frequencies
- Physiological measurements
Collected via: Surveys, experiments, tests, physiological instruments
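As a small illustration of what "analyzed statistically" can mean, the sketch below summarizes the example test scores listed above using Python's standard library. The variable names are illustrative only, and a real analysis would use far more than three values.

```python
import statistics

# Test scores taken from the examples above.
test_scores = [85, 92, 78]

# Arithmetic mean: the typical score in the sample.
mean_score = statistics.mean(test_scores)

# Sample standard deviation: how far scores spread around the mean.
score_sd = statistics.stdev(test_scores)

print(f"mean = {mean_score}, sample SD = {score_sd:.2f}")
```

For these three scores the mean is 85 and the sample standard deviation is 7.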
Qualitative Data
Non-numerical information describing qualities, characteristics, or meanings
Examples:
- Interview transcripts
- Open-ended responses
- Field observation notes
- Documents and artifacts
- Video/audio recordings
- Photographs and images
Collected via: Interviews, observations, documents, focus groups
Primary vs. Secondary Data
Primary Data
Definition: Original data collected firsthand by the researcher for the specific study
Characteristics:
- Collected specifically for your research
- You control the process
- Tailored to your research questions
- More time-consuming and expensive
- Current and relevant
Methods:
- Surveys and questionnaires
- Interviews
- Observations
- Experiments
- Focus groups
Secondary Data
Definition: Existing data, originally collected by others for a different purpose, that you analyze for your study
Characteristics:
- Already exists
- Less control over quality
- May not perfectly match your needs
- Faster and cheaper to obtain
- May be dated
Sources:
- Government statistics
- Published research datasets
- Organizational records
- Historical documents
- Census data
Key Data Collection Methods
Surveys/Questionnaires
Structured instruments with predetermined questions administered to many respondents
Interviews
Directed conversations to explore perspectives, experiences, and meanings in depth
Observations
Systematic watching and recording of behaviors, events, or phenomena
Focus Groups
Moderated group discussions to explore shared perspectives and group dynamics
Experiments
Controlled manipulation of variables to test cause-and-effect relationships
Document Analysis
Systematic examination of existing documents, records, and artifacts
Choosing Data Collection Methods
Consider These Factors:
Research Questions
What type of data do your questions require?
- "How many?" → Surveys, quantitative
- "How do people experience?" → Interviews, qualitative
- "What happens when?" → Observations
- "Does X cause Y?" → Experiments
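The pairings above can be sketched as a simple lookup table. This is only a memory aid under the assumption that a question fits exactly one of the four stems; the names `QUESTION_TO_METHOD` and `suggest_method` are hypothetical, and real method selection also weighs the other factors in this section.

```python
# Hypothetical lookup table mirroring the question stems listed above.
QUESTION_TO_METHOD = {
    "how many": "surveys (quantitative)",
    "how do people experience": "interviews (qualitative)",
    "what happens when": "observations",
    "does x cause y": "experiments",
}


def suggest_method(question_stem: str) -> str:
    """Return the method paired with a question stem, if one is listed."""
    # Normalize case, surrounding whitespace, and a trailing question mark.
    key = question_stem.strip().lower().rstrip("?")
    return QUESTION_TO_METHOD.get(key, "no direct match; revisit the research question")


print(suggest_method("How many?"))
print(suggest_method("Does X cause Y?"))
```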
Research Design
Match the method to your overall research design
- Experimental: Controlled measurements
- Survey: Questionnaires
- Phenomenology: In-depth interviews
- Ethnography: Observations, field notes
- Mixed methods: Multiple approaches
Available Resources
What can you realistically accomplish?
- Time: Interviews typically take more time per participant than surveys
- Budget: Costs vary by method (e.g., travel, transcription, participant incentives)
- Personnel: Do you need trained interviewers?
- Technology: Online vs. paper-based
Population Characteristics
What methods will work with your participants?
- Literacy levels: Written surveys may not work for participants with limited literacy
- Access: Can you reach them in person?
- Sensitivity: Private topics may call for one-on-one interviews or anonymous surveys rather than group settings
- Location: Geographic dispersion
Validity Needs
What level of rigor is required?
- Exploratory: Flexible methods are acceptable
- Confirmatory: Standardized methods needed
- Sensitive topics: Multiple methods for triangulation
Mixed Methods Approach
Consider combining multiple data collection methods:
- Triangulation: Use multiple methods to verify findings
- Complementarity: Use one method to elaborate on another
- Sequential: Collect one type first to inform the next (e.g., qualitative first to develop survey items)
- Concurrent: Collect both types simultaneously
Example: Conduct interviews to understand participants' experiences, then administer a survey to measure how prevalent the resulting themes are.
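The survey step of the example above might be summarized as follows. This is a minimal sketch: the theme names and responses are invented for illustration, and it assumes each respondent simply marks which interview-derived themes apply to them.

```python
from collections import Counter

# Hypothetical themes identified during the interview phase.
themes = ["workload", "isolation", "flexibility"]

# Hypothetical survey data: each inner list holds the themes one respondent endorsed.
survey_responses = [
    ["workload", "flexibility"],
    ["workload"],
    ["isolation", "workload"],
    ["flexibility"],
]

# Count each theme at most once per respondent.
counts = Counter(
    theme for response in survey_responses for theme in set(response)
)

# Prevalence = share of respondents who endorsed the theme.
n_respondents = len(survey_responses)
prevalence = {theme: counts[theme] / n_respondents for theme in themes}

for theme, share in prevalence.items():
    print(f"{theme}: {share:.0%}")
```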
Data Quality Considerations
Validity
Are you measuring what you intend to measure?
- Clear, unambiguous questions
- Appropriate for your constructs
- Pilot testing before use
Reliability
Would you get consistent results if the measurement were repeated?
- Standardized procedures
- Training for data collectors
- Clear coding schemes
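One common way to check whether a coding scheme is applied consistently is to have two coders label the same items and compute Cohen's kappa, which corrects raw agreement for agreement expected by chance. The text above does not prescribe this statistic; it is offered as one possible check, and the function name and sample labels below are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of items.")

    n = len(coder_a)

    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement: based on how often each coder used each label.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels assigned by two coders to six interview excerpts.
coder_a = ["pos", "pos", "neg", "neg", "pos", "neg"]
coder_b = ["pos", "neg", "neg", "neg", "pos", "neg"]

kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.3f}")
```

Here the coders agree on 5 of 6 items, chance agreement is 0.5, and kappa works out to 2/3. Low values would suggest the coding scheme or coder training needs revision.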
Accuracy
Are participants providing truthful information?
- Confidentiality assurances
- Non-leading questions
- Cross-checking responses
Completeness
Are you getting all the data you need?
- Minimize missing data
- Follow up on incomplete responses
- Account for non-response
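The completeness checks above can be made concrete with a few simple rates. The sketch below uses invented data and assumes skipped items are recorded as `None`; the field names are hypothetical.

```python
# Hypothetical survey: 5 people were invited and 4 responded.
# None marks an item the respondent skipped.
invited = 5
responses = [
    {"age": 25, "income": 45000, "rating": 4},
    {"age": 34, "income": None, "rating": 5},
    {"age": None, "income": None, "rating": 3},
    {"age": 41, "income": 52000, "rating": 2},
]
items = ["age", "income", "rating"]

# Response rate: share of invited people who returned the survey.
response_rate = len(responses) / invited

# Item-level missing rate: share of respondents who skipped each item.
missing_rate = {
    item: sum(r.get(item) is None for r in responses) / len(responses)
    for item in items
}

# Complete cases: respondents who answered every item.
complete_cases = [
    r for r in responses if all(r.get(item) is not None for item in items)
]

print(f"response rate: {response_rate:.0%}")
print(f"missing by item: {missing_rate}")
print(f"complete cases: {len(complete_cases)} of {len(responses)}")
```

Items with high missing rates (income, in this invented example) are natural targets for follow-up or for rewording in a later pilot.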