chezburger - Stardance

@chezburger on CS50P Final Project: NeuroLab · about 1 month ago

2h 9m 39s logged

Devlog 4 — Phase 1 Complete!
Picking up where I left off, I added softmax to the output layer instead of ReLU, which converts the 10 outputs into actual probabilities that sum to 1. I also switched from MSE loss to cross-entropy loss, which is the standard for classification problems because it heavily penalizes confident wrong predictions.
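As a rough illustration of the softmax and cross-entropy pairing described above, here is a minimal NumPy sketch. The function names match the ones listed in the tests further down, but the signatures, the `eps` clipping, and the max-subtraction trick are assumptions for illustration, not NeuroLab's actual code.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max before exponentiating for numerical stability;
    # this shift does not change the resulting probabilities.
    shifted = z - np.max(z, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

def cross_entropy_loss(probs, one_hot_labels, eps=1e-12):
    # Mean negative log-probability assigned to the correct class.
    # Clipping (an assumption here) avoids log(0).
    clipped = np.clip(probs, eps, 1.0)
    return -np.mean(np.sum(one_hot_labels * np.log(clipped), axis=1))

logits = np.array([[2.0, 1.0, 0.1]])   # raw outputs for one sample, 3 classes
target = np.array([[1.0, 0.0, 0.0]])   # correct class is index 0

probs = softmax(logits)
loss_right = cross_entropy_loss(probs, target)

# A prediction that is 98% sure of the wrong class gets a much larger loss.
confident_wrong = np.array([[0.01, 0.98, 0.01]])
loss_confident_wrong = cross_entropy_loss(confident_wrong, target)
```

The second loss is far larger than the first, which is the "heavily penalizes confident wrong predictions" behavior in practice.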

I implemented batch training, splitting the 60,000-image dataset into chunks of 256 instead of feeding all of it through at once. This made training way faster and way more effective, since the weights now get updated after every batch instead of once per cycle. Accuracy jumped to over 90% in just a couple of cycles instead of taking 100 cycles like before.
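One way the batching could look is sketched below. The helper name `iterate_batches` and the per-cycle shuffle are assumptions (the devlog only says the data is split into chunks of 256), and the demo uses a small dummy array rather than MNIST.

```python
import numpy as np

def iterate_batches(X, Y, batch_size=256, rng=None):
    # Hypothetical helper: yields (inputs, labels) chunks of at most batch_size rows.
    # Shuffling the order each cycle is an assumption, not something the devlog states.
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], Y[idx]

# Dummy stand-in data: 1,000 samples with 784 pixels and 10 output classes.
X = np.zeros((1000, 784))
Y = np.zeros((1000, 10))

batch_sizes = [len(xb) for xb, _ in iterate_batches(X, Y, rng=np.random.default_rng(0))]
```

With 60,000 images and a batch size of 256, the same loop would produce 235 batches per cycle (234 full ones plus a final batch of 96).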

I also added training accuracy printed alongside the loss each cycle so I could watch both improve together in real time. Then I built save and load functionality so NeuroLab doesn’t have to retrain from scratch every time, plus auto-load logic that checks if a saved model already exists and loads it instead of retraining.
Final results from the full 100-cycle run: training accuracy hit 100% and stayed there, and test accuracy on 10,000 images NeuroLab had never seen came out to 98.02%.
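The save, load, and auto-load logic mentioned above could be sketched roughly like this. The `.npz` file format, the function names, and the `W0`/`b0` key scheme are all assumptions for illustration; the devlog does not say how NeuroLab actually stores its model.

```python
import os
import tempfile
import numpy as np

def save_model(path, weights, biases):
    # Store each layer's arrays under predictable keys (W0, b0, W1, b1, ...).
    arrays = {}
    for i, (W, b) in enumerate(zip(weights, biases)):
        arrays[f"W{i}"] = W
        arrays[f"b{i}"] = b
    np.savez(path, **arrays)

def load_model(path):
    with np.load(path) as data:
        n_layers = len(data.files) // 2
        weights = [data[f"W{i}"] for i in range(n_layers)]
        biases = [data[f"b{i}"] for i in range(n_layers)]
    return weights, biases

def load_or_train(path, train_fn):
    # Auto-load: reuse a saved model if one exists, otherwise train and save.
    if os.path.exists(path):
        return load_model(path)
    weights, biases = train_fn()
    save_model(path, weights, biases)
    return weights, biases

calls = []

def fake_train():
    # Stand-in for a real training run; records how often it is called.
    calls.append(1)
    return [np.ones((3, 2))], [np.zeros((1, 2))]

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "neurolab_model.npz")  # hypothetical file name
    first_w, first_b = load_or_train(model_path, fake_train)    # trains and saves
    second_w, second_b = load_or_train(model_path, fake_train)  # loads from disk
```

The second call never touches `fake_train`, which is the whole point of the auto-load check.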

Last thing today was writing test_project.py with full pytest coverage. I wrote 9 test functions covering sigmoid, relu, relu_derivative, preactivation, init_weights, mse_loss, simple_loss, softmax, and cross_entropy_loss, with multiple test cases each. All 9 passed.

That officially wraps up Phase 1 of NeuroLab. Next up is Phase 2: a second dataset, more optimizers, and regularization with dropout and L2.


@chezburger on CS50P Final Project: NeuroLab · about 1 month ago

1h 40m logged

Devlog 3 — 93.69% Accuracy
Today was actually insane for NeuroLab. Accuracy jumped from 10.9% to 93.69%!
What I changed: I switched from the small 10k image dataset to the full 60k MNIST training dataset, which made a huge difference because the project had way more examples to learn from. I also added softmax to the output layer instead of ReLU, which converts the 10 outputs into actual probabilities that sum to 1, so now NeuroLab can say something like “97% confident this is a 1” instead of just outputting random numbers.
Also added a preactivation function, switched to pandas for loading CSV files because np.loadtxt was taking way too long for my patience, and wrote an accuracy function to measure how well it learned.
The training took so long. 100 cycles on 60,000 images is too much for my Ryzen 3 to handle. But the results were worth it. Final loss was 0.00981, and accuracy on 10,000 test images was 93.69%.
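For reference, an accuracy function like the one mentioned above is often written as below. This is a sketch that assumes one-hot labels and a row of class probabilities per image; the actual NeuroLab signature is not shown in the devlog.

```python
import numpy as np

def accuracy(probs, one_hot_labels):
    # The predicted class is the index with the highest probability in each row.
    predicted = np.argmax(probs, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    # Fraction of rows where prediction and label agree.
    return np.mean(predicted == actual)

# Tiny made-up example with 2 classes and 4 samples.
probs = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
labels = np.array([[0, 1], [1, 0], [1, 0], [1, 0]])

acc = accuracy(probs, labels)  # three of the four predictions match
```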


@chezburger on CS50P Final Project: NeuroLab · about 2 months ago

1h 53m 16s logged

Devlog 2 — Training on MNIST
Today I finally got NeuroLab training on real data. I loaded a 10,000-image MNIST file of handwritten digits and split it into 8,000 images for training and 2,000 for testing.
I wrote a load_mnist function that reads the CSV file, splits the labels from the pixels, and normalizes the pixel values by dividing by 255.0 so they are between 0 and 1. I also wrote a one_hot function that converts digit labels like 5 into arrays like [0,0,0,0,0,1,0,0,0,0] so the network can compare its 10 outputs to the correct answer.
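The label splitting, normalization, and one-hot steps above can be sketched as follows. The helper `prepare_mnist` is hypothetical and works on an already-loaded array; it assumes the label sits in the first column of each row, which is a common MNIST CSV layout but is not confirmed by the devlog.

```python
import numpy as np

def prepare_mnist(raw):
    # Assumed layout: column 0 is the digit label, the remaining columns are pixels 0..255.
    labels = raw[:, 0].astype(int)
    pixels = raw[:, 1:].astype(np.float64) / 255.0  # scale pixels into [0, 1]
    return pixels, labels

def one_hot(labels, num_classes=10):
    # One row per label, with a single 1.0 at the label's index.
    encoded = np.zeros((len(labels), num_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# Two fake "images" with only three pixels each, labeled 5 and 0.
raw = np.array([[5, 0, 128, 255],
                [0, 255, 0, 51]])

pixels, labels = prepare_mnist(raw)
targets = one_hot(labels)
```

Here `targets[0]` is the same `[0,0,0,0,0,1,0,0,0,0]` pattern described for the digit 5.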
The first training run was a disaster. The loss started at over 1000, which means the network was completely wrong. Turns out the weight initialization was the problem. Switched to He initialization by multiplying the random weights by np.sqrt(2.0 / input_size), and the starting loss dropped all the way to 0.14.
After 100 training cycles, the loss got down to 0.03!
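To see why the He scaling matters, here is a small sketch comparing unscaled and scaled weights. It assumes the "random weights" are drawn from a standard normal distribution and that `init_weights` takes input and output sizes; both are guesses about NeuroLab's code, and the layer sizes (784 inputs, 128 hidden units) are made up for the demo.

```python
import numpy as np

def init_weights(input_size, output_size, rng=None):
    # He initialization: standard-normal weights scaled by sqrt(2 / input_size).
    rng = np.random.default_rng() if rng is None else rng
    return rng.standard_normal((input_size, output_size)) * np.sqrt(2.0 / input_size)

rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 784))      # fake inputs, one row per sample
plain = rng.standard_normal((784, 128))   # unscaled random weights
he = init_weights(784, 128, rng)          # He-scaled weights

# Spread of the layer's pre-activation values under each scheme.
plain_std = (x @ plain).std()
he_std = (x @ he).std()
```

With unscaled weights the pre-activations have a spread on the order of sqrt(784) = 28, while the He-scaled version stays near sqrt(2). Huge values in the first layer compound through later layers, which is consistent with the enormous starting loss described above.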


@chezburger on CS50P Final Project: NeuroLab · about 2 months ago

3h 59m 33s logged

Today I started building NeuroLab, a neural network completely from scratch in Python using only NumPy. No TensorFlow, no PyTorch, just pure math and matrices. This is also my CS50P final project and my main Stardance project for the whole summer!
What I built today: I started with the core building blocks. First was sigmoid, which squashes any number into the range between 0 and 1. Then ReLU, which returns 0 for negative numbers and the input itself for positive numbers. I also wrote relu_derivative for backpropagation, init_weights using random NumPy arrays, and mse_loss, which measures how wrong the network is by averaging squared differences.
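These building blocks are short enough to sketch in full. The versions below are typical NumPy implementations using the same function names; the exact signatures in NeuroLab may differ.

```python
import numpy as np

def sigmoid(x):
    # Maps any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # 0 for negative inputs, the input itself otherwise.
    return np.maximum(0, x)

def relu_derivative(x):
    # 1.0 where the input is positive, 0.0 elsewhere.
    # The boolean mask is converted to floats with .astype so it works on arrays.
    return (x > 0).astype(float)

def mse_loss(pred, target):
    # Average of the squared differences between prediction and target.
    return np.mean((pred - target) ** 2)

z = np.array([-2.0, 0.0, 3.0])
activated = relu(z)
slopes = relu_derivative(z)
loss = mse_loss(np.array([1.0, 2.0]), np.array([1.0, 4.0]))
```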
Then I built the actual NeuralNetwork class with __init__ to set up weights and biases for each layer, forward_pass to run data through the network, forward_cache, which does the same thing but saves intermediate values so backprop can use them, and backpropagation, which adjusts the weights based on how wrong the prediction was.
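The difference between a plain forward pass and one that caches intermediates can be sketched as a standalone function. This is an illustration only: it assumes ReLU on every layer and a row-per-sample layout, and it is written as a free function rather than a method of the actual NeuralNetwork class.

```python
import numpy as np

def forward_cache(x, weights, biases):
    # Runs x through every layer and keeps what backprop needs later:
    # the input to each layer (activations) and each layer's
    # pre-activation values z = a @ W + b.
    activations = [x]
    pre_activations = []
    a = x
    # Each layer's weights are paired with that same layer's biases.
    for W, b in zip(weights, biases):
        z = a @ W + b
        a = np.maximum(0, z)  # ReLU, assumed for all layers in this sketch
        pre_activations.append(z)
        activations.append(a)
    return activations, pre_activations

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))  # 4 samples, 3 features
weights = [rng.standard_normal((3, 5)), rng.standard_normal((5, 2))]
biases = [np.zeros((1, 5)), np.zeros((1, 2))]

activations, pre_activations = forward_cache(x, weights, biases)
output = activations[-1]  # same result a plain forward pass would return
```

The cache always holds one more activation than pre-activation, because the raw input counts as the first activation.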
Problems I ran into: relu_derivative was crashing on arrays, so I fixed it by using .astype on NumPy boolean arrays. My forward_cache loop was also zipping the wrong lists. The worst bug was the gradient direction being completely flipped; the network was doing gradient ascent instead of descent, so the loss kept getting bigger instead of smaller. Fixed it by switching the subtraction order in the loss function. Gradients were also exploding with large batches, so I divided all gradients by m, which is the number of samples. Here is some of the code I wrote (It might be hard to see):