What is Machine Learning
Machine learning (ML) is a branch of artificial intelligence that enables computers to learn from data and make decisions or predictions without being explicitly programmed for every scenario. Instead of writing rules by hand, you provide the machine with examples and let it discover the underlying patterns.
A Brief History
- 1943 — Warren McCulloch and Walter Pitts create a mathematical model of a biological neuron
- 1950 — Alan Turing publishes Computing Machinery and Intelligence and proposes the Turing Test
- 1957 — Frank Rosenblatt builds the Perceptron, the first trainable neural network
- 1959 — Arthur Samuel coins the term "machine learning" while working on a checkers-playing programme at IBM
- 1967 — The nearest-neighbour algorithm is developed, enabling basic pattern recognition
- 1979 — Students at Stanford develop the Stanford Cart, a vehicle that can navigate around obstacles in a room on its own
- 1997 — IBM Deep Blue defeats Garry Kasparov at chess
- 2006 — Geoffrey Hinton and colleagues demonstrate effective training of deep neural networks, popularising the term "deep learning"
- 2012 — AlexNet wins the ImageNet competition, sparking the deep learning revolution
- 2014 — Generative Adversarial Networks (GANs) are introduced by Ian Goodfellow
- 2016 — Google DeepMind's AlphaGo defeats the world champion Go player Lee Sedol
- 2017 — The Transformer architecture is published in Attention Is All You Need
- 2020 — GPT-3 demonstrates remarkable language generation capabilities
- Today — Machine learning is used across many industries, from healthcare and finance to entertainment and transport
What is Machine Learning?
At its core, machine learning is about learning from data. A traditional programme follows explicit instructions written by a developer. In machine learning, by contrast, a learning algorithm is given data. It finds patterns in that data and encodes them in a model that can make predictions on new, unseen data.
Traditional Programming vs Machine Learning
| Traditional Programming | Machine Learning |
|---|---|
| Input: Data + Rules | Input: Data + Expected Output |
| Output: Results | Output: Rules (a model) |
| Developer writes the logic | Algorithm discovers the logic |
| Works well for well-defined problems | Works well for complex, pattern-rich problems |
A Simple Example
Suppose you want to classify emails as spam or not-spam:
- Traditional approach: Write rules like "if the email contains 'free money', mark as spam"
- ML approach: Give the algorithm thousands of labelled emails (spam / not-spam), and it learns the patterns that distinguish them
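The contrast between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not a production spam filter: the helper `rule_based_is_spam`, the eight example emails, and the choice of a bag-of-words Naive Bayes classifier from Scikit-Learn are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


# Traditional approach: a hand-written rule (hypothetical helper).
def rule_based_is_spam(email: str) -> bool:
    return "free money" in email.lower()


# ML approach: learn from labelled examples (made-up toy data, 1 = spam).
emails = [
    "Claim your free money now",
    "You have won a cash prize, click here",
    "Cheap loans, free money, act today",
    "Win a prize now, claim your cash",
    "Meeting moved to 3pm tomorrow",
    "Please review the attached project report",
    "Lunch tomorrow with the project team?",
    "Here are the notes from the meeting",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Turn each email into word counts, then fit a Naive Bayes classifier on them.
spam_model = make_pipeline(CountVectorizer(), MultinomialNB())
spam_model.fit(emails, labels)

new_email = "Click here to claim your cash prize"
print("Rule says spam: ", rule_based_is_spam(new_email))
print("Model says spam:", bool(spam_model.predict([new_email])[0]))
```

The rule misses the new email because it does not contain the exact phrase "free money", while the learned model flags it based on words it saw in the labelled spam examples. With only eight training emails this is purely illustrative.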
Types of Machine Learning
Machine learning is broadly divided into three categories:
1. Supervised Learning
The algorithm learns from labelled data — each training example has an input and a known correct output. The goal is to learn a mapping from inputs to outputs.
| Task | Description | Example |
|---|---|---|
| Classification | Predict a discrete category | Email spam detection, image recognition |
| Regression | Predict a continuous value | House price prediction, temperature forecasting |
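As a rough sketch of a regression task, the snippet below fits a straight line to synthetic house-price data. The relationship between floor area and price (3000 per square metre plus 50000, with random noise) is invented for the example, and `LinearRegression` is just one of many possible algorithms.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic labelled data: floor area in square metres -> price (made-up relationship).
area = rng.uniform(40, 200, size=200).reshape(-1, 1)
price = 3000 * area[:, 0] + 50000 + rng.normal(0, 10000, size=200)

# Learn the mapping from input (area) to output (price).
reg = LinearRegression()
reg.fit(area, price)

predicted = reg.predict([[100.0]])[0]
print(f"Learned slope: {reg.coef_[0]:.0f} per square metre")
print(f"Learned intercept: {reg.intercept_:.0f}")
print(f"Predicted price for 100 square metres: {predicted:.0f}")
```

The learned slope and intercept should land close to the values used to generate the data, which is what "learning a mapping from inputs to outputs" means in practice.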
2. Unsupervised Learning
The algorithm learns from unlabelled data — there are no correct answers provided. The goal is to discover hidden structure in the data.
| Task | Description | Example |
|---|---|---|
| Clustering | Group similar data points together | Customer segmentation, document grouping |
| Dimensionality Reduction | Reduce the number of features while preserving structure | Data visualisation, noise removal |
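Both tasks in the table can be tried on synthetic data. The sketch below assumes two well-separated groups of points in five dimensions. `KMeans` and `PCA` from Scikit-Learn are used as representative algorithms, and neither is given any labels.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Two synthetic groups in 5 dimensions; the algorithms never see which is which.
group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 5))
group_b = rng.normal(loc=8.0, scale=1.0, size=(50, 5))
points = np.vstack([group_a, group_b])

# Clustering: assign each point to one of two clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(points)

# Dimensionality reduction: project 5 features down to 2 for plotting.
pca = PCA(n_components=2)
points_2d = pca.fit_transform(points)

print("Cluster sizes:", np.bincount(cluster_ids))
print("Reduced shape:", points_2d.shape)
print(f"Variance kept by 2 components: {pca.explained_variance_ratio_.sum():.3f}")
```

Real data is rarely this cleanly separated, so the number of clusters and components usually needs experimentation.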
3. Reinforcement Learning
The algorithm learns by interacting with an environment — it takes actions, receives rewards or penalties, and learns to maximise cumulative reward over time.
| Concept | Description |
|---|---|
| Agent | The learner that takes actions |
| Environment | The world the agent interacts with |
| Reward | Feedback signal after each action |
| Policy | Strategy the agent learns to follow |
Examples: game-playing AI (AlphaGo), robotics, autonomous driving, recommendation systems.
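The four concepts in the table can be seen in a multi-armed bandit, a deliberately simplified reinforcement learning setting with no changing state. The sketch below uses an epsilon-greedy policy. The win probabilities, the `pull` helper, and the value of `EPSILON` are all assumptions chosen for illustration.

```python
import random

random.seed(0)

# Environment: three slot machines with hidden win probabilities (made-up values).
WIN_PROBABILITIES = [0.2, 0.5, 0.8]


def pull(arm: int) -> float:
    """Reward signal: 1.0 on a win, 0.0 otherwise."""
    return 1.0 if random.random() < WIN_PROBABILITIES[arm] else 0.0


# Agent: keeps a running estimate of each arm's average reward.
estimates = [0.0, 0.0, 0.0]
pull_counts = [0, 0, 0]
EPSILON = 0.1  # fraction of steps spent exploring at random
total_reward = 0.0

for step in range(5000):
    # Policy: usually pick the arm with the best estimate, occasionally explore.
    if random.random() < EPSILON:
        arm = random.randrange(3)
    else:
        arm = max(range(3), key=lambda a: estimates[a])
    reward = pull(arm)
    total_reward += reward
    pull_counts[arm] += 1
    # Incremental mean update of the chosen arm's estimate.
    estimates[arm] += (reward - estimates[arm]) / pull_counts[arm]

best_arm = max(range(3), key=lambda a: estimates[a])
print("Estimated win rates:", [round(e, 2) for e in estimates])
print("Pulls per arm:", pull_counts)
print("Best arm according to the agent:", best_arm)
```

The agent is never told the win probabilities. It discovers the most rewarding arm purely from the feedback it receives. Systems such as AlphaGo add states, long-term planning, and neural networks on top of this basic loop.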
Key Terminology
| Term | Definition |
|---|---|
| Feature | An individual measurable property of the data (e.g., age, height, income) |
| Label / Target | The variable you want to predict (e.g., price, category) |
| Training Set | Data used to train the model |
| Test Set | Data held out to evaluate the model's performance on unseen data |
| Model | The mathematical representation learned from data |
| Hyperparameter | A setting configured before training (e.g., learning rate, number of trees) |
| Overfitting | Model memorises training data and performs poorly on new data |
| Underfitting | Model is too simple to capture the patterns in the data |
| Generalisation | The model's ability to perform well on new, unseen data |
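Several of these terms can be observed together by varying a single hyperparameter. The sketch below assumes a noisy synthetic dataset from `make_classification` and compares decision trees of different `max_depth`. The specific depths and noise level are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic features and labels; flip_y=0.2 assigns random labels to about 20% of samples.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

scores = {}
for max_depth in (1, 4, None):  # hyperparameter; None lets the tree grow fully
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(X_train, y_train)
    train_acc = tree.score(X_train, y_train)
    test_acc = tree.score(X_test, y_test)
    scores[max_depth] = (train_acc, test_acc)
    print(f"max_depth={max_depth}: train={train_acc:.2f}, test={test_acc:.2f}")
```

A depth of 1 is typically too simple to fit even the training set well (underfitting). The fully grown tree memorises the training set, noise included, and scores lower on the test set (overfitting). The gap between training and test accuracy is a practical measure of how well a model generalises.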
The Machine Learning Pipeline
Most machine learning projects follow a similar workflow:
- Define the problem — What are you trying to predict or discover?
- Collect data — Gather relevant data from databases, APIs, files, or web scraping
- Prepare data — Clean, transform, and engineer features
- Choose a model — Select an appropriate algorithm for the task
- Train the model — Fit the model to the training data
- Evaluate the model — Measure performance on the test set
- Tune and improve — Adjust hyperparameters, add features, try different algorithms
- Deploy — Put the model into production to make predictions on new data
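The workflow above can be sketched end to end with Scikit-Learn. This is one possible arrangement under stated assumptions: the built-in wine dataset stands in for collected data, logistic regression is an arbitrary model choice, the grid of `C` values is illustrative, and saving the model with `joblib` to a temporary file is only a stand-in for real deployment.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Define the problem and collect data: classify wines into three classes.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Prepare data and choose a model: feature scaling plus logistic regression.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Train and tune: cross-validated search over the regularisation strength C.
search = GridSearchCV(pipeline, param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

# Evaluate: score the best pipeline once on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print(f"Best C: {search.best_params_['clf__C']}, test accuracy: {test_accuracy:.2f}")

# Deploy (stand-in): save the fitted pipeline to disk and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "wine_model.joblib"
    joblib.dump(search.best_estimator_, model_path)
    reloaded = joblib.load(model_path)
    same_predictions = bool((reloaded.predict(X_test) == search.predict(X_test)).all())
print("Reloaded model gives identical predictions:", same_predictions)
```

Keeping preprocessing inside the pipeline means the scaler is fitted only on training folds during tuning, so no information from validation or test data leaks into training.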
Python Libraries for Machine Learning
| Library | Purpose |
|---|---|
| NumPy | Numerical computing and array operations |
| Pandas | Data manipulation and analysis |
| Matplotlib / Seaborn | Data visualisation |
| Scikit-Learn | Classical machine learning algorithms, preprocessing, evaluation |
| TensorFlow | Deep learning framework (Google) |
| PyTorch | Deep learning framework (Meta) |
| XGBoost / LightGBM | Gradient boosting libraries for tabular data |
A First ML Example in Python
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load a built-in dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Create and train a model
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Make predictions and evaluate
predictions = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, predictions):.2f}")
```
Real-World Applications
Healthcare
- Disease diagnosis from medical images
- Drug discovery and molecular property prediction
- Patient readmission risk prediction
Finance
- Credit scoring and loan default prediction
- Fraud detection in transactions
- Algorithmic trading strategies
Technology
- Recommendation engines (Netflix, Spotify, Amazon)
- Natural language processing (chatbots, translation)
- Computer vision (facial recognition, autonomous vehicles)
Science
- Protein structure prediction (AlphaFold)
- Climate modelling and weather forecasting
- Particle physics and astronomical surveys
When to Use Machine Learning
Machine learning is not always the right tool. Consider ML when:
- The problem involves complex patterns that are hard to define with rules
- You have sufficient labelled or unlabelled data
- The pattern you want to learn is relatively stable over time
- A small improvement in accuracy has significant value
Avoid ML when:
- A simple rule-based system would suffice
- You do not have enough data
- The problem requires perfect explainability (some models are "black boxes")
- The cost of errors is extremely high and there is no margin for mistakes
Summary
Machine learning enables computers to learn from data rather than following explicit instructions. It is broadly divided into supervised learning (learning from labelled data), unsupervised learning (discovering hidden structure), and reinforcement learning (learning through interaction). The ML pipeline involves defining a problem, collecting and preparing data, training and evaluating a model, and deploying it to production. Python and its rich ecosystem — Scikit-Learn, TensorFlow, PyTorch — make machine learning accessible to beginners and experts alike.