What Is Statistics?

Statistics is the science of collecting, organising, analysing, interpreting, and presenting data. It provides the tools to transform raw observations into meaningful conclusions, enabling evidence-based decision-making in virtually every field — from medicine and engineering to business and social policy.

A Brief History

~3000 BC — Ancient civilisations in Egypt, Babylon, and China conduct census-like counts for taxation and military planning
1662 — John Graunt publishes Natural and Political Observations Made upon the Bills of Mortality, founding modern demography
1713 — Jacob Bernoulli publishes Ars Conjectandi, formalising the law of large numbers
1805 — Adrien-Marie Legendre introduces the method of least squares for curve fitting
1812 — Pierre-Simon Laplace publishes Théorie analytique des probabilités
1900 — Karl Pearson develops the chi-squared test
1908 — William Sealy Gosset (pen name "Student") publishes "The Probable Error of a Mean", introducing the t-distribution
1925 — Ronald Fisher publishes Statistical Methods for Research Workers, shaping modern experimental design
1933 — Jerzy Neyman and Egon Pearson formalise hypothesis testing with Type I and Type II errors
1979 — Bradley Efron introduces the bootstrap, and computer-intensive statistical methods spread as computing power grows
Today — Statistics underpins machine learning, data science, clinical trials, and public policy worldwide

Why Learn Statistics?

1. Data-Driven Decision Making

Statistics turns raw numbers into actionable insights. Whether you are evaluating a new medical treatment, optimising a marketing campaign, or assessing economic policy, statistical reasoning provides the framework for sound decisions.

2. Critical Thinking

Understanding statistics makes you a better consumer of information. You learn to question sample sizes, identify bias, distinguish correlation from causation, and spot misleading charts.

3. Foundation for Data Science and Machine Learning

Modern AI and machine learning algorithms are built on statistical principles — linear regression, Bayesian inference, probability distributions, and hypothesis testing are all core topics.

4. Universal Applicability

Statistics is used in:

Field	Example Application
Medicine	Clinical trials and drug efficacy testing
Business	Market research, A/B testing, demand forecasting
Engineering	Quality control and reliability analysis
Social sciences	Survey analysis, opinion polling
Sports	Player performance analytics (e.g., sabermetrics in baseball)
Government	Census data, economic indicators

Branches of Statistics

Statistics is broadly divided into two major branches:

Descriptive Statistics

Descriptive statistics summarise and organise data so it can be understood at a glance. Common tools include:

Measures of central tendency (mean, median, mode)
Measures of spread (range, variance, standard deviation)
Visual displays (histograms, box plots, bar charts)
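
As a minimal sketch of the numerical tools listed above, the following uses Python's standard `statistics` module. The `scores` list is hypothetical data invented for illustration, and the sample (n - 1) forms of variance and standard deviation are an assumption about which definition is wanted.

```python
import statistics

# Hypothetical exam scores, invented purely for illustration.
scores = [62, 70, 70, 75, 81, 88, 94]

# Measures of central tendency
mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

# Measures of spread
data_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # sample standard deviation

print(f"mean={mean:.2f}, median={median}, mode={mode}")
print(f"range={data_range}, variance={variance:.2f}, std dev={std_dev:.2f}")
```

If the data were a complete population rather than a sample, `statistics.pvariance` and `statistics.pstdev` (which divide by n) would be the matching choices.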

Inferential Statistics

Inferential statistics use sample data to make generalisations about a larger population. Key techniques include:

Estimation (confidence intervals)
Hypothesis testing (t-tests, chi-squared tests)
Regression and prediction

Population  →  Sample  →  Analyse  →  Infer back to Population
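
The population-to-sample-to-inference loop can be sketched with a simulation. Everything here is an assumption made for illustration: the population is simulated rather than real, the sample size of 200 is arbitrary, and the interval uses the large-sample normal approximation rather than a t-based interval.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 simulated heights in cm (invented values).
population = [random.gauss(170, 10) for _ in range(100_000)]

# Draw a simple random sample without replacement.
sample = random.sample(population, 200)

# Point estimate of the population mean and its standard error.
x_bar = statistics.mean(sample)
std_err = statistics.stdev(sample) / math.sqrt(len(sample))

# 95% confidence interval using the normal approximation (z is about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
ci_low, ci_high = x_bar - z * std_err, x_bar + z * std_err

print(f"sample mean = {x_bar:.2f}")
print(f"95% CI for the population mean: ({ci_low:.2f}, {ci_high:.2f})")
print(f"true population mean = {statistics.mean(population):.2f}")
```

In practice the true population mean is unknown, which is the reason for building the interval. Any single 95% interval may still miss the true value; the 95% refers to how often the procedure succeeds over many repeated samples.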

Key Terminology

Term	Definition
Population	The complete set of all items of interest
Sample	A subset of the population selected for analysis
Parameter	A numerical measure describing a characteristic of a population (e.g., population mean μ)
Statistic	A numerical measure describing a characteristic of a sample (e.g., sample mean x̄)
Variable	A characteristic or attribute that can take different values
Data	The values collected through observation or measurement
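
The distinction between a parameter and a statistic can be illustrated with a short simulation. The population of ages below is hypothetical, and the sample size of 50 and the five repetitions are arbitrary choices for the sketch.

```python
import random
import statistics

random.seed(0)  # fixed seed for reproducibility

# Hypothetical population: the ages of 10,000 people (simulated values).
population = [random.randint(18, 90) for _ in range(10_000)]

# Parameter: computed from the whole population, so it is one fixed number.
mu = statistics.mean(population)

# Statistic: computed from a sample, so it changes from sample to sample.
sample_means = [
    statistics.mean(random.sample(population, 50)) for _ in range(5)
]

print(f"population mean (parameter): {mu:.2f}")
for i, x_bar in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic): {x_bar:.2f}")
```

Each sample mean lands near the population mean but not exactly on it, which is why a statistic is treated as an estimate of the corresponding parameter.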

Types of Data

By Nature

Type	Description	Examples
Quantitative	Numerical values that can be measured or counted	Height, weight, temperature, income
Qualitative (Categorical)	Labels or categories	Gender, colour, nationality, satisfaction rating

By Measurement Scale

Scale	Properties	Examples
Nominal	Categories with no natural order	Blood type (A, B, AB, O), eye colour
Ordinal	Categories with a meaningful order but no guarantee of equal spacing between them	Survey ratings (poor, fair, good, excellent)
Interval	Numerical with equal intervals but no true zero	Temperature in °C, calendar years
Ratio	Numerical with equal intervals and a true zero	Weight, height, income, age
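
The practical difference between the interval and ratio scales is whether ratios of values make sense. A small sketch, using made-up temperatures and weights and a hypothetical `to_kelvin` helper, shows why "twice as much" works for weight but not for degrees Celsius:

```python
# Interval scale: Celsius has an arbitrary zero, so ratios are not meaningful.
temp_a_c, temp_b_c = 10.0, 20.0
naive_ratio = temp_b_c / temp_a_c  # 2.0, but 20 °C is not "twice as hot"


def to_kelvin(celsius: float) -> float:
    """Convert Celsius to kelvin, a ratio scale with a true zero."""
    return celsius + 273.15


# On the kelvin scale the ratio is only slightly above 1.
kelvin_ratio = to_kelvin(temp_b_c) / to_kelvin(temp_a_c)

# Differences, however, are meaningful on both scales and agree.
diff_c = temp_b_c - temp_a_c
diff_k = to_kelvin(temp_b_c) - to_kelvin(temp_a_c)

# Ratio scale: weight has a true zero, so "twice as heavy" is a valid claim.
weight_ratio = 80.0 / 40.0

print(f"Celsius ratio: {naive_ratio:.2f}, kelvin ratio: {kelvin_ratio:.3f}")
print(f"difference in °C: {diff_c:.1f}, difference in K: {diff_k:.1f}")
print(f"weight ratio: {weight_ratio:.1f}")
```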

The Statistical Process

A typical statistical investigation follows these steps:

1. Define the question — What do you want to learn?
2. Design the study — How will you collect data? (Experiment vs. observational study)
3. Collect data — Gather observations using surveys, experiments, or existing records
4. Explore and describe — Summarise data with descriptive statistics and visualisations
5. Analyse and infer — Apply inferential methods to draw conclusions
6. Interpret and communicate — Report findings clearly, noting limitations and uncertainties

Common Pitfalls

Warning: Statistics can be misused — intentionally or accidentally. Watch out for these:

Selection bias — The sample does not represent the population
Confounding variables — A third variable, often unmeasured, influences both the explanatory and response variables, creating a misleading association between them
Correlation ≠ Causation — Two variables moving together does not mean one causes the other
Misleading graphs — Truncated axes, cherry-picked scales, or 3D effects that distort perception
Small sample sizes — Drawing broad conclusions from too few observations
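
The confounding and correlation-versus-causation pitfalls can be made concrete with simulated data. In this sketch, all variable names and numbers are invented: temperature drives both ice cream sales and swimming accidents, while neither affects the other. The `pearson` and `residuals` helpers are hypothetical functions written for this example, not part of any library.

```python
import random
import statistics

random.seed(1)  # fixed seed for reproducibility
n = 5_000


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """Residuals of ys after a least-squares straight-line fit on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


# Simulated confounder: daily temperature (standardised, invented data).
temperature = [random.gauss(0, 1) for _ in range(n)]

# Both variables depend on temperature but not on each other.
ice_cream_sales = [t + random.gauss(0, 0.5) for t in temperature]
swimming_accidents = [t + random.gauss(0, 0.5) for t in temperature]

raw_r = pearson(ice_cream_sales, swimming_accidents)

# Remove the part of each variable explained by temperature, then correlate
# what is left (a partial correlation).
adjusted_r = pearson(
    residuals(ice_cream_sales, temperature),
    residuals(swimming_accidents, temperature),
)

print(f"raw correlation: {raw_r:.2f}")
print(f"correlation after adjusting for temperature: {adjusted_r:.2f}")
```

The raw correlation is strong even though neither variable influences the other, and it largely disappears once temperature is accounted for. With real data the confounder is often unknown or unmeasured, so this kind of adjustment is not always available.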

Summary

Statistics is the science of learning from data. It comprises descriptive methods (summarising data) and inferential methods (drawing conclusions about populations from samples). Understanding key terminology — population, sample, parameter, statistic — and the different types of data is essential groundwork for every topic that follows in this course.