Show understanding of back propagation of errors and regression methods in machine learning
18.1 Artificial Intelligence (AI)
Back Propagation of Errors 🤖
Back propagation is the heart of training a neural network. Think of it like a game of “hot‑and‑cold” for a robot learning to throw a ball into a basket.
- Forward pass: Each neuron (the robot deciding its throw) multiplies its inputs by weights, sums them, adds a bias, and applies an activation function. The result is passed to the next layer.
- Compute loss: After the final layer, we compare the network’s output \( \hat{y} \) with the true value \( y \) using a loss function, e.g. mean squared error \( L = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2 \).
- Backward pass: We calculate how much each weight contributed to the error. This is done using the chain rule of calculus, propagating the error gradient from the output layer back to the input layer.
- Update weights: Each weight \( w \) is adjusted in the opposite direction of its gradient: \( w \leftarrow w - \eta \frac{\partial L}{\partial w} \), where \( \eta \) is the learning rate. All four steps are walked through in the sketch below.
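As a concrete illustration, here is a minimal NumPy sketch of one complete cycle of the four steps for a tiny network (two inputs, two sigmoid hidden neurons, one linear output). The network size, seed, and single training example are illustrative assumptions, not part of the syllabus, and this is one update step rather than a full training loop:

```python
import numpy as np

# Tiny illustrative network: 2 inputs -> 2 hidden (sigmoid) -> 1 linear output.
rng = np.random.default_rng(0)
x = np.array([0.5, -0.2])                        # input
y = np.array([1.0])                              # true target
W1 = rng.normal(size=(2, 2)); b1 = np.zeros(2)   # hidden-layer weights and bias
W2 = rng.normal(size=(1, 2)); b2 = np.zeros(1)   # output-layer weights and bias
eta = 0.1                                        # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 1. Forward pass: weights * inputs + bias, then activation, layer by layer.
z1 = W1 @ x + b1
h = sigmoid(z1)
y_hat = W2 @ h + b2                              # linear output for regression

# 2. Compute loss: mean squared error (here n = 1, so L = (y - y_hat)^2).
L = np.mean((y - y_hat) ** 2)

# 3. Backward pass: chain rule, from the output layer back to the input layer.
dL_dyhat = 2 * (y_hat - y)                       # dL / d y_hat
dL_dW2 = np.outer(dL_dyhat, h)                   # gradient for output weights
dL_db2 = dL_dyhat
dL_dh = W2.T @ dL_dyhat                          # propagate error to hidden layer
dL_dz1 = dL_dh * h * (1 - h)                     # sigmoid'(z1) = h * (1 - h)
dL_dW1 = np.outer(dL_dz1, x)                     # gradient for hidden weights
dL_db1 = dL_dz1

# 4. Update weights: step each parameter against its gradient.
W2 -= eta * dL_dW2; b2 -= eta * dL_db2
W1 -= eta * dL_dW1; b1 -= eta * dL_db1
```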
🔍 Analogy: Imagine you’re learning to play a piano. After each practice session, you listen to the music (output) and compare it with the sheet music (target). The difference (error) tells you which keys to adjust (weights) for the next practice.
Exam Tip: Remember the four main steps: forward pass, loss calculation, backward pass (gradient calculation), and weight update. Practice writing the gradient formulas for a simple two‑layer network.
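For practice, here is one possible set of gradient formulas, assuming a network with a single hidden layer \( h = \sigma(W_1 x + b_1) \), a linear output \( \hat{y} = W_2 h + b_2 \), and squared-error loss \( L = (\hat{y} - y)^2 \) (this specific architecture is an assumption for illustration):

$$
\frac{\partial L}{\partial W_2} = 2(\hat{y} - y)\, h^{T},
\qquad
\frac{\partial L}{\partial W_1} = \Big( W_2^{T}\, 2(\hat{y} - y) \odot \sigma'(z_1) \Big)\, x^{T},
$$

where \( z_1 = W_1 x + b_1 \) and \( \odot \) denotes element-wise multiplication. Each factor corresponds to one link in the chain rule, which is exactly what the backward pass computes.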
Regression Methods in Machine Learning 📈
Regression is all about predicting a continuous value. Think of it as trying to guess the price of a house based on its size, location, and age.
| Method | Formula | Example | Strengths |
|---|---|---|---|
| Linear Regression | \( y = \beta_0 + \beta_1 x \) | Predicting exam scores from study hours. | Simple, interpretable, fast. |
| Polynomial Regression | \( y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_n x^n \) | Modeling growth curves (e.g., population over time). | Captures non‑linear trends. |
| Ridge Regression | \( \min_{\beta} \sum_i (y_i - \beta^T x_i)^2 + \lambda \|\beta\|_2^2 \) | Predicting house prices with many correlated features. | Reduces overfitting, handles multicollinearity. |
| Lasso Regression | \( \min_{\beta} \sum_i (y_i - \beta^T x_i)^2 + \lambda \|\beta\|_1 \) | Feature selection in medical diagnosis. | Sparsity, interpretable models. |
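The sketch below fits all four methods to the same synthetic data using scikit-learn. The library choice, the synthetic data, and the regularisation strengths are illustrative assumptions; note that the \( \lambda \) in the table corresponds to scikit-learn’s `alpha` parameter:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Synthetic "house price" style data: one feature, mild curvature, some noise.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 + 2.0 * X[:, 0] + 0.3 * X[:, 0] ** 2 + rng.normal(0, 2, 100)

models = {
    "linear": LinearRegression(),
    # Polynomial regression = polynomial features + ordinary least squares.
    "polynomial (deg 2)": make_pipeline(PolynomialFeatures(degree=2),
                                        LinearRegression()),
    # alpha plays the role of lambda in the table; these values are illustrative.
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}

for name, model in models.items():
    model.fit(X, y)
    print(f"{name:>18}: R^2 = {model.score(X, y):.3f}")
```

Comparing the printed \( R^2 \) scores makes the trade-off visible: the polynomial model tracks the curvature best here, while ridge and lasso trade a little training fit for coefficients that generalise better when features are many or correlated.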
Exam Tip: When asked to explain a regression method, state the objective function, mention any regularisation term, give a real‑world example, and list one key advantage. Practice sketching the loss surface for linear vs. ridge regression.