Lecture 02 - Vanilla implementation of gradient descent

MachineLearningCourse.Lecture02 (Module)
Lecture02

Vanilla implementation of gradient descent.

Available Functions

  • demo(): Gradient descent demo for 5x5 pixel digit recognition

Usage

using MachineLearningCourse
Lecture02.demo()
source
MachineLearningCourse.Lecture02.compute_average_gradients (Method)
compute_average_gradients(W, b, X, Y)

Compute average gradients across all training samples for batch gradient descent.

Performs gradient computation for each sample and averages the results:

  • For each sample (xᵢ, yᵢ): compute ∇Wᵢ, ∇bᵢ
  • Return the averages: (1/N) * Σᵢ ∇Wᵢ and (1/N) * Σᵢ ∇bᵢ
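
The averaging described above can be sketched as follows. This is a minimal sketch, not the course's source: the per-sample gradient rule (∇W = δ xᵀ, ∇b = δ with δ = 2(Wx + b − y)) is inlined here as an assumption about what compute_gradients returns.

```julia
# Assumed per-sample gradient for ℒ = ‖Wx + b − y‖² (sketch, not the course code)
compute_gradients(W, b, x, y) = (δ = 2f0 .* (W * x .+ b .- y); (δ * x', δ))

function compute_average_gradients(W, b, X, Y)
    N = length(X)
    avg_∇W = zeros(Float32, size(W))
    avg_∇b = zeros(Float32, size(b))
    for (x, y) in zip(X, Y)
        ∇W, ∇b = compute_gradients(W, b, x, y)
        avg_∇W .+= ∇W ./ N   # accumulate the running average
        avg_∇b .+= ∇b ./ N
    end
    return avg_∇W, avg_∇b
end
```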

Arguments

  • W::Matrix{Float32}: Weight matrix (n_outputs × n_inputs)
  • b::Vector{Float32}: Bias vector (n_outputs,)
  • X::Vector{Vector{Float32}}: Training input data (N samples)
  • Y::Vector{Vector{Float32}}: Training target data (N samples)

Returns

  • Tuple{Matrix{Float32}, Vector{Float32}}: (avg_∇W, avg_∇b)
    • avg_∇W: Average weight gradients
    • avg_∇b: Average bias gradients
source
MachineLearningCourse.Lecture02.compute_gradients (Method)
compute_gradients(W, b, x, y)

Compute gradients ∂ℒ/∂W and ∂ℒ/∂b for a single sample using backpropagation.

Calculates gradients using the chain rule:

  • ∂ℒ/∂W = δ * xᵀ, where δ = ∂ℒ/∂â
  • ∂ℒ/∂b = δ
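
For a single linear layer â = W * x + b with loss ℒ = ‖â − y‖², the chain rule above reduces to a few lines. A sketch under those stated assumptions:

```julia
function compute_gradients(W::Matrix{Float32}, b::Vector{Float32},
                           x::Vector{Float32}, y::Vector{Float32})
    â = W * x .+ b        # forward pass
    δ = 2f0 .* (â .- y)   # δ = ∂ℒ/∂â for ℒ = ‖â − y‖²
    return δ * x', δ      # ∇W = δ xᵀ (outer product), ∇b = δ
end
```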

Arguments

  • W::Matrix{Float32}: Weight matrix (n_outputs × n_inputs)
  • b::Vector{Float32}: Bias vector (n_outputs,)
  • x::Vector{Float32}: Input vector for sample
  • y::Vector{Float32}: Target output vector for sample

Returns

  • Tuple{Matrix{Float32}, Vector{Float32}}: (∇W, ∇b)
    • ∇W: Weight gradients (same size as W)
    • ∇b: Bias gradients (same size as b)
source
MachineLearningCourse.Lecture02.demo (Function)
demo()

Demonstration of gradient descent on 5x5 digit recognition.

Loads training data, randomly initializes weights and biases, and uses gradient descent to minimize total loss. Prints initial and final loss values.

Example

demo()  # Uses sample data file
source
MachineLearningCourse.Lecture02.gradient_descent! (Method)
gradient_descent!(W, b, X, Y)

Optimize neural network parameters using batch gradient descent.

Implements batch gradient descent with:

  • Average gradients (∇W, ∇b) computed across all training samples each iteration
  • Parameter updates W ← W - η * ∇W, b ← b - η * ∇b with learning rate η = 0.1
  • Convergence test ‖(∇W, ∇b)‖ < tolerance 1.0e-3
  • A cap of 10,000 iterations

For each iteration:

  1. Compute average gradients across all training samples
  2. Calculate gradient norm for convergence checking
  3. Print progress information
  4. Check convergence criterion
  5. Update parameters using gradient descent rule
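
The iteration above can be sketched as follows. The keyword defaults η, tol, and max_iter are assumptions matching the stated values, the per-sample gradient is inlined for brevity, and the progress printing (step 3) is omitted:

```julia
function gradient_descent!(W, b, X, Y; η = 0.1f0, tol = 1f-3, max_iter = 10_000)
    N = length(X)
    for iter in 1:max_iter
        ∇W = zeros(Float32, size(W))                 # 1. average gradients
        ∇b = zeros(Float32, size(b))
        for (x, y) in zip(X, Y)
            δ = 2f0 .* (W * x .+ b .- y)
            ∇W .+= (δ * x') ./ N
            ∇b .+= δ ./ N
        end
        gnorm = sqrt(sum(abs2, ∇W) + sum(abs2, ∇b))  # 2. gradient norm
        if gnorm < tol                               # 4. convergence check
            break
        end
        W .-= η .* ∇W                                # 5. parameter update
        b .-= η .* ∇b
    end
    return W, b
end
```

On a tiny linear-regression problem this drives W and b to the least-squares fit within the iteration cap.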

Arguments

  • W::Matrix{Float32}: Weight matrix (n_outputs × n_inputs), modified in-place
  • b::Vector{Float32}: Bias vector (n_outputs,), modified in-place
  • X::Vector{Vector{Float32}}: Training input data
  • Y::Vector{Vector{Float32}}: Training target data (one-hot encoded)
source
MachineLearningCourse.Lecture02.gradient_norm (Method)
gradient_norm(∇W, ∇b)

Compute the Euclidean norm of the combined gradient vector.

Flattens and concatenates weight and bias gradients into a single vector, then computes ‖∇‖ = √(‖∇W‖² + ‖∇b‖²) for convergence monitoring.
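
Since ‖∇‖² is just the sum of all squared entries, the norm can be computed without materializing the concatenated vector. A one-line sketch:

```julia
# ‖∇‖ = √(‖∇W‖² + ‖∇b‖²); sum(abs2, A) is the squared Frobenius/Euclidean norm
gradient_norm(∇W, ∇b) = sqrt(sum(abs2, ∇W) + sum(abs2, ∇b))
```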

Arguments

  • ∇W::Matrix{Float32}: Weight gradients
  • ∇b::Vector{Float32}: Bias gradients

Returns

  • Float32: Euclidean norm of the combined gradient vector
source
MachineLearningCourse.Lecture02.one_hot_encode (Method)
one_hot_encode(label, num_classes)

Convert class labels to one-hot vectors for classification.

Arguments

  • label::Int: Class label (1-indexed)
  • num_classes::Int: Total number of classes

Returns

  • Vector{Float32}: One-hot encoded vector

Example

one_hot_encode(3, 10)  # Returns [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
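
A minimal sketch of this encoding (labels are 1-indexed, per the Arguments above):

```julia
function one_hot_encode(label::Int, num_classes::Int)
    v = zeros(Float32, num_classes)
    v[label] = 1f0   # set the position of the 1-indexed class label
    return v
end
```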
source
MachineLearningCourse.Lecture02.read_data (Method)
read_data(file_path)

Read 5x5 digit training data from a text file.

File format: Each digit consists of 6 lines:

  • 5 lines of 5 space-separated Float32 values (5x5 pixel grid)
  • 1 line with the digit label (0-9)
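
A parser for this 6-lines-per-digit format can be sketched as below. This is an illustration, not the course's source; in particular, mapping label 0–9 to one-hot class index label + 1 is an assumption based on the "1-indexed" note in Returns.

```julia
function read_data(file_path::String)
    X, Y = Vector{Vector{Float32}}(), Vector{Vector{Float32}}()
    lines = readlines(file_path)
    for i in 1:6:length(lines)
        pixels = Float32[]
        for row in lines[i:i+4]                      # 5 rows of 5 pixel values
            append!(pixels, parse.(Float32, split(row)))
        end
        label = parse(Int, strip(lines[i+5]))        # digit label 0–9
        push!(X, pixels)
        y = zeros(Float32, 10)
        y[label + 1] = 1f0                           # assumed class index = label + 1
        push!(Y, y)
    end
    return X, Y
end
```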

Arguments

  • file_path::String: Path to the data file

Returns

  • Tuple{Vector{Vector{Float32}}, Vector{Vector{Float32}}}: (X, Y)
    • X: Input vectors (each vector has 25 elements from 5x5 grid)
    • Y: One-hot encoded target vectors (10 classes, 1-indexed)

Example

X, Y = read_data("5x5digits.txt")
# X[1] contains 25 pixel values for first digit
# Y[1] contains one-hot vector for first digit's class
source
MachineLearningCourse.Lecture02.total_loss (Method)
total_loss(W, b, X, Y)

Compute total Mean Squared Error loss across all training samples.

For each sample, performs forward pass and computes loss:

  • Forward pass: â = W * x + b
  • Sample loss: ℒ(y, â) = ‖â - y‖²
  • Total loss: Σᵢ ℒ(yᵢ, âᵢ) over all samples
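
The forward pass and summed loss above fit in one line. A sketch:

```julia
# Σᵢ ‖W xᵢ + b − yᵢ‖² over all training pairs
total_loss(W, b, X, Y) = sum(sum(abs2, (W * x .+ b) .- y) for (x, y) in zip(X, Y))
```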

Arguments

  • W::Matrix{Float32}: Weight matrix (noutputs × ninputs)
  • b::Vector{Float32}: Bias vector (n_outputs,)
  • X::Vector{Vector{Float32}}: Training input data
  • Y::Vector{Vector{Float32}}: Training target data (one-hot encoded)

Returns

  • Float32: Total loss across all training samples
source
MachineLearningCourse.Lecture02.ℒ (Method)
ℒ(y, â)

Mean Squared Error loss function: ℒ = ‖â - y‖².

Arguments

  • y::Vector{Float32}: True target values
  • â::Vector{Float32}: Computed values

Returns

  • Float32: MSE loss value
source
MachineLearningCourse.Lecture02.∂ℒ_∂â (Method)
∂ℒ_∂â(y, â)

Gradient of MSE loss with respect to computed activations: ∂ℒ/∂â = 2(â - y).
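
The analytic gradient 2(â − y) can be sanity-checked against a central finite difference of ℒ. A sketch (the test point and step ϵ are arbitrary choices for illustration):

```julia
ℒ(y, â) = sum(abs2, â .- y)        # loss as defined above
∂ℒ_∂â(y, â) = 2f0 .* (â .- y)      # its analytic gradient

# Central-difference estimate of ∂ℒ/∂â₁ at an arbitrary point
y, â = Float32[1, 0], Float32[0.5, 0.3]
ϵ = 1f-3
numeric = (ℒ(y, â .+ Float32[ϵ, 0]) - ℒ(y, â .- Float32[ϵ, 0])) / (2ϵ)
```

For this point the analytic value is 2(0.5 − 1) = −1, which the finite difference reproduces to within the step's truncation error.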

Arguments

  • y::Vector{Float32}: True target values
  • â::Vector{Float32}: Computed values

Returns

  • Vector{Float32}: Gradient vector ∂ℒ/∂â
source