Data Types and Sampling Methods

This lesson introduces the fundamental building blocks of statistics — understanding the different types of data and the methods used to collect them. A clear grasp of data classification and sampling is essential for the AQA GCSE Mathematics Statistics topic and frequently appears in exam questions worth 2–4 marks.

Types of Data

Data can be classified in several ways. The first distinction is between qualitative and quantitative data.

Type	Definition	Examples
Qualitative	Data that describes qualities or characteristics (non-numerical)	Eye colour, favourite subject, type of transport
Quantitative	Data that can be measured or counted (numerical)	Height, number of siblings, temperature

Quantitative Data: Discrete vs Continuous

Quantitative data is further divided into two types:

Type	Definition	Examples
Discrete	Data that can only take specific values (usually whole numbers from counting)	Number of pets, shoe size, dice score
Continuous	Data that can take any value within a range (usually from measuring)	Height (1.65 m), weight (72.3 kg), time (14.7 seconds)

Exam Tip: A common exam question asks you to classify data. Remember — if you count it, it is discrete; if you measure it, it is continuous. Shoe size is a classic trick question: although it has half sizes (5.5, 6, 6.5), it is still discrete because it can only take specific values, not any value in a range.

Primary and Secondary Data

Data can also be classified by how it was collected.

Type	Definition	Advantages	Disadvantages
Primary data	Data you collect yourself for a specific purpose	Tailored to your needs; you know how it was collected	Time-consuming and expensive to collect
Secondary data	Data collected by someone else, often for a different purpose	Quick and cheap to obtain	May not exactly match your needs; may be out of date or biased

Examples of primary data: surveys, experiments, questionnaires, observations.

Examples of secondary data: government statistics, newspaper reports, internet databases, school records.

Populations and Samples

In statistics:

The population is the entire group you are interested in studying.
A sample is a smaller subset of the population that you actually collect data from.

We use samples because it is usually impractical (too expensive, too time-consuming) to survey an entire population.

What Makes a Good Sample?

A good sample should be:

Representative — it should reflect the characteristics of the whole population.
Large enough — bigger samples give more reliable results.
Unbiased — every member of the population should have a fair chance of being selected.

Exam Tip: If a question asks you to criticise a sampling method, check whether the sample is biased (certain groups are excluded or over-represented), too small, or unrepresentative of the population.

Sampling Methods

There are several methods for selecting a sample. You need to know the following five:

1. Random Sampling

Every member of the population has an equal chance of being selected. Names or numbers are drawn at random (e.g. using a random number generator, pulling names from a hat).

Advantage: No bias in selection; every member has the same chance.
Disadvantage: Requires a complete list of the population; may not represent all subgroups.
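A simple random sample can be sketched in Python with the standard `random` module. This is a minimal sketch, not a prescribed method. The function name `simple_random_sample` and the register of 800 numbered students are hypothetical, chosen only for illustration. `sample` draws without replacement, so every member has the same chance of selection and nobody is picked twice.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n members, each member having an equal chance of being chosen.

    Random.sample draws without replacement, so no member appears twice.
    """
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(population, n)

# Hypothetical register: students numbered 1 to 800
register = list(range(1, 801))
sample = simple_random_sample(register, 80, seed=1)
print(len(sample), len(set(sample)))  # 80 80
```

This approach needs the full list `register`, which matches the disadvantage above: without a complete list of the population, a simple random sample cannot be drawn.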

2. Systematic Sampling

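Members are selected at regular intervals from an ordered list (e.g. every 10th person on a register). The interval is found by dividing the population size by the sample size (e.g. a sample of 80 from 800 people gives an interval of 10), and the starting point within the first interval is usually chosen at random.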

Advantage: Simple and easy to carry out once you have an ordered list.
Disadvantage: Can introduce bias if there is a hidden pattern in the list.
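The following minimal sketch shows systematic sampling in Python. It assumes the common practice of taking the interval as population size ÷ sample size (rounded down) and choosing a random start within the first interval. The function name `systematic_sample` is hypothetical, and so are both registers. The second register alternates boys and girls, which illustrates how a hidden pattern in the list can bias the sample.

```python
import random

def systematic_sample(ordered_list, n, seed=None):
    """Pick every k-th member of an ordered list, starting at a random point.

    Assumes n is no larger than the length of the list.
    """
    k = len(ordered_list) // n                # sampling interval, e.g. 800 // 80 = 10
    start = random.Random(seed).randrange(k)  # random start within the first interval
    return ordered_list[start::k][:n]         # every k-th member, trimmed to exactly n

# Hypothetical register of 800 student numbers: consecutive picks are 10 apart
register = list(range(1, 801))
picked = systematic_sample(register, 80, seed=2)

# Hypothetical register with a hidden pattern: Boy, Girl, Boy, Girl, ...
genders = ["Boy", "Girl"] * 400
biased = systematic_sample(genders, 80, seed=2)
print(set(biased))  # a single gender only, because the interval (10) is even
```

Whatever the starting point, every pick in the alternating register lands on the same gender. This is the hidden-pattern bias described above.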

3. Stratified Sampling

The population is divided into groups (strata) based on a characteristic (e.g. age, gender, year group). A random sample is then taken from each group, in proportion to the size of that group in the population.

The number to sample from each stratum is calculated using:

Number from stratum = (number in stratum / total population) x sample size

Worked Example

A school has the following students:

Year Group	Number of Students
Year 7	180
Year 8	160
Year 9	200
Year 10	150
Year 11	110
Total	800

A stratified sample of 80 students is needed.

Year 7: (180 / 800) x 80 = 18 students

Year 8: (160 / 800) x 80 = 16 students

Year 9: (200 / 800) x 80 = 20 students

Year 10: (150 / 800) x 80 = 15 students

Year 11: (110 / 800) x 80 = 11 students

Check: 18 + 16 + 20 + 15 + 11 = 80 (correct)

Advantage: Proportionally representative of each subgroup.
Disadvantage: Requires prior knowledge of the population's characteristics.
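The worked example above can be reproduced with a short Python sketch. `stratified_allocation` applies the formula (number in stratum / total population) × sample size. `stratified_sample` then draws a random sample of that size from inside each stratum. Both function names are hypothetical, and so are the student ID lists. Exact fractions are used to avoid floating-point error. Halves are rounded up, following the usual rounding convention, because Python's built-in `round` rounds halves to the nearest even number.

```python
import math
import random
from fractions import Fraction

def stratified_allocation(strata_sizes, sample_size):
    """(number in stratum / total population) x sample size, rounded to the nearest whole number."""
    total = sum(strata_sizes.values())
    allocation = {}
    for name, size in strata_sizes.items():
        exact = Fraction(size, total) * sample_size            # exact value, no floating-point error
        allocation[name] = math.floor(exact + Fraction(1, 2))  # round halves up
    return allocation

def stratified_sample(groups, sample_size, seed=None):
    """Randomly choose the allocated number of members from inside each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in groups.items()}
    allocation = stratified_allocation(sizes, sample_size)
    return {name: rng.sample(members, allocation[name]) for name, members in groups.items()}

# Year-group sizes from the worked example
school = {"Year 7": 180, "Year 8": 160, "Year 9": 200, "Year 10": 150, "Year 11": 110}
allocation = stratified_allocation(school, 80)
print(allocation)
# {'Year 7': 18, 'Year 8': 16, 'Year 9': 20, 'Year 10': 15, 'Year 11': 11}
print(sum(allocation.values()) == 80)  # True: the same check as in the worked example

# Hypothetical student IDs 1 to 800, split into the year groups above
ids = iter(range(1, 801))
groups = {year: [next(ids) for _ in range(size)] for year, size in school.items()}
chosen = stratified_sample(groups, 80, seed=0)
```

In some questions the rounded values will not add up to the required sample size. The final total check is therefore worth keeping.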

4. Quota Sampling

The researcher decides how many people from each group to include (sets a quota) and then selects people until each quota is filled. Unlike stratified sampling, the selection within each group is not random.

Advantage: Quick and cheap; no complete list of population needed.
Disadvantage: The researcher chooses who to include, which introduces bias.
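Quota sampling can be sketched as follows. People are accepted in the order the researcher meets them until every quota is full, so the selection within each group depends on who happens to turn up first, not on chance. The function name `quota_sample`, the passers-by and the quotas are all hypothetical.

```python
def quota_sample(people_in_order_met, quotas):
    """Accept people in the order they are met until every quota is filled.

    Selection within each group is NOT random: whoever turns up first is taken.
    """
    counts = {group: 0 for group in quotas}
    chosen = []
    for name, group in people_in_order_met:
        if group in quotas and counts[group] < quotas[group]:
            chosen.append(name)
            counts[group] += 1
        if counts == quotas:  # stop as soon as every quota is full
            break
    return chosen

# Hypothetical passers-by, listed in the order the researcher meets them
passers_by = [("Ann", "Adult"), ("Ben", "Adult"), ("Cal", "Teen"),
              ("Dee", "Adult"), ("Eve", "Teen"), ("Fay", "Teen")]
result = quota_sample(passers_by, {"Adult": 2, "Teen": 2})
print(result)  # ['Ann', 'Ben', 'Cal', 'Eve']
```

Dee is turned away only because the adult quota is already full, and Fay is never reached. This is why quota samples can be biased towards whoever is easiest to find.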

5. Convenience (Opportunity) Sampling

The researcher simply surveys whoever is easiest to reach or most readily available.

Advantage: Quick, easy, and cheap.
Disadvantage: Very likely to be biased and unrepresentative.

Exam Tip: In AQA exams, stratified sampling calculation questions are very common. Always show the fraction (stratum size / total population) multiplied by the sample size. Round to the nearest whole number if necessary, and always check that your values add up to the required sample size.

Bias in Data Collection

Bias occurs when a sample does not fairly represent the population, leading to misleading results.

Common sources of bias include:

Selection bias — certain groups are excluded (e.g. surveying only students who stay after school).
Question bias — leading or confusing questions push respondents towards a particular answer.
Response bias — people may lie or exaggerate (e.g. about exercise habits).
Non-response bias — certain types of people may not respond to a survey, skewing the results.
Timing bias — collecting data at a particular time may miss certain groups (e.g. surveying a high street at 10 am on a weekday misses people who work).

```mermaid
graph TD
    A[Sources of Bias] --> B[Selection Bias]
    A --> C[Question Bias]
    A --> D[Response Bias]
    A --> E[Non-response Bias]
    A --> F[Timing Bias]
    B --> B1[Certain groups excluded from sample]
    C --> C1[Leading or confusing questions]
    D --> D1[People lie or exaggerate answers]
    E --> E1[Some groups do not respond]
    F --> F1[Data collected at unrepresentative time]
```

Summary

Data can be qualitative (descriptive) or quantitative (numerical).
Quantitative data is either discrete (counted) or continuous (measured).
Primary data is collected first-hand; secondary data comes from existing sources.
A sample is a subset of the population; it should be representative and unbiased.
Key sampling methods: random, systematic, stratified, quota, and convenience.
Stratified sampling requires a proportional calculation for each stratum.
Bias can arise from poor sampling, leading questions, or unrepresentative timing.

Exam Tip: When asked to suggest improvements to a data collection method, always consider whether the sample is large enough, whether it is representative, and whether any groups have been excluded. Mentioning specific sources of bias will gain you marks.
