Neural Network from Scratch
- 5 Devlogs
- 10 Total hours
A neural network built from scratch with pure NumPy and maths. It can correctly classify MNIST digits with >90% accuracy.
The network is fully working, but it’s difficult to see it in action. That’s why (with the help of Claude) I created a Streamlit app, so that you can try different parameters for the network in the browser without having to install anything locally.
I also spent some time making the readme prettier and more informative.
Cleaned up a lot of the boilerplate inside main.py. The training loop now lives inside the network itself, and a single fit() call handles batching, epochs, and logging:
network.fit(X_train, y_train, learning_rate=0.1, batch_size=0.1)
I also renamed a few things that were always named imprecisely. run() is now forward(), and the old fit() (which only did a single gradient step) became epoch(). This made the code much easier to read.
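As a rough illustration of the batching that fit() takes care of, here is a minimal sketch of a mini-batch iterator. The helper name iterate_minibatches and the reading of batch_size=0.1 as "10% of the training set per batch" are assumptions made for this example, not the project's actual code.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size=0.1, rng=None):
    """Yield shuffled (X_batch, y_batch) pairs that cover the data once.

    Assumption: a float batch_size is a fraction of the dataset,
    an int is an absolute number of samples per batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[0]
    if isinstance(batch_size, float):
        size = max(1, int(round(batch_size * n)))
    else:
        size = int(batch_size)
    # Shuffle once per pass so every batch is a random subset.
    order = rng.permutation(n)
    for start in range(0, n, size):
        idx = order[start:start + size]
        yield X[idx], y[idx]
```

A fit() loop could then call a single-gradient-step method such as epoch() once per yielded batch, and repeat that for the requested number of passes over the data.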
The one-hot encoding is now its own class instead of a one-liner scattered in main.py. This was necessary because you need to decode predictions back to the original labels afterwards:
encoder = OneHotEncoder()
y = encoder.encode(digits.target)
encoder.decode(network.forward(X_test))
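A minimal sketch of what such an encoder could look like. Only the encode() and decode() method names come from the snippet above; the internals, including the classes_ attribute, are assumptions for illustration.

```python
import numpy as np

class OneHotEncoder:
    """Illustrative sketch, not the project's actual implementation."""

    def encode(self, labels):
        labels = np.asarray(labels).ravel()
        # Remember the sorted unique labels so decode() can map back later.
        self.classes_, indices = np.unique(labels, return_inverse=True)
        one_hot = np.zeros((labels.size, self.classes_.size))
        one_hot[np.arange(labels.size), indices] = 1.0
        return one_hot

    def decode(self, outputs):
        # The column with the highest score is taken as the predicted class.
        return self.classes_[np.argmax(np.asarray(outputs), axis=1)]
```

Keeping the label mapping as state is what makes a class more convenient than a one-liner: decode() needs to know which column belongs to which original label.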
I also added a train_test_split utility so I don’t have to manually slice arrays anymore.
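For reference, a split utility of this kind can be sketched in a few lines of NumPy. The signature below (test_size as a fraction, an optional seed) is an assumption for this example and may differ from the project's version.

```python
import numpy as np

def train_test_split(X, y, test_size=0.2, seed=None):
    """Shuffle the samples and split them into a train and a test part."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    order = rng.permutation(n)
    n_test = int(round(test_size * n))
    # The same shuffled indices are applied to X and y so pairs stay aligned.
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]
```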
I modified the API so that cost and activation functions can easily be changed.
After opting for softmax activation at the output layer and changing the cost function to categorical cross-entropy, the model classified the numbers with 85.59% accuracy after 10,000 epochs. I’m surprised it works this well despite the small network and no regularization or tuning at all.
While interesting, the math is getting really complicated now. Deriving the softmax function in particular is still a mystery to me; I had to look up its derivative online.
I want to spend some more time understanding how the cost and the activation function “influence” each other before implementing an API to more easily train the network.
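One well-known way the two pieces interact: when softmax is paired with categorical cross-entropy, the gradient of the cost with respect to the pre-activation outputs (logits) collapses to "predicted probabilities minus targets". The sketch below checks that numerically with finite differences. All function names here are made up for the example, and the cost is assumed to be averaged over the batch.

```python
import numpy as np

def softmax(z):
    # Subtracting the row maximum avoids overflow and leaves the result unchanged.
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, targets, eps=1e-12):
    # Mean categorical cross-entropy over the batch; eps guards against log(0).
    return -np.mean(np.sum(targets * np.log(probs + eps), axis=1))

def grad_wrt_logits(z, targets):
    # Combined derivative of cross_entropy(softmax(z)) with respect to z.
    return (softmax(z) - targets) / z.shape[0]

def numerical_grad(z, targets, h=1e-6):
    # Central finite differences, one logit at a time.
    grad = np.zeros_like(z)
    for i in np.ndindex(*z.shape):
        z_plus, z_minus = z.copy(), z.copy()
        z_plus[i] += h
        z_minus[i] -= h
        grad[i] = (cross_entropy(softmax(z_plus), targets)
                   - cross_entropy(softmax(z_minus), targets)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 3))
targets = np.eye(3)[[0, 2, 1, 2]]  # one-hot rows
print(np.allclose(grad_wrt_logits(z, targets), numerical_grad(z, targets), atol=1e-6))
```

Because of this cancellation, the full softmax Jacobian never has to be built explicitly for this particular pairing; with a different cost function the simplification no longer applies.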
It’s learning! I tried the network with a small synthetic dataset and it learned the pattern flawlessly, at least when you don’t let it run for too long.
Below you can see the network’s performance (cost) at each epoch. It starts out strong, reducing the cost and getting better and better, until all of a sudden there is a massive spike and the cost goes way up again.
I have no idea why this is happening, but overall I’m extremely happy that the network is learning.
Next I’m going to fix this issue and then I want it to predict numbers from the MNIST dataset.
Wrote a basic API for modelling the network.
When untrained, it initializes its weights and biases randomly.
With these random values it is already capable of running a forward pass with some input data.
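To make that concrete, here is a minimal sketch of random initialization followed by a forward pass. The layer sizes, the 0.1 scaling of the random values, and the sigmoid activation are assumptions chosen for the example, not details taken from the project.

```python
import numpy as np

def init_layers(sizes, seed=None):
    """Create one (weights, biases) pair per layer with small random values."""
    rng = np.random.default_rng(seed)
    return [
        (rng.standard_normal((n_in, n_out)) * 0.1,
         rng.standard_normal(n_out) * 0.1)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(layers, X):
    # Each layer applies an affine transform followed by the activation.
    a = X
    for W, b in layers:
        a = sigmoid(a @ W + b)
    return a

layers = init_layers([4, 5, 3], seed=0)
print(forward(layers, np.ones((2, 4))).shape)  # one row of outputs per input sample
```

The outputs of an untrained network like this carry no meaning yet; they only show that the shapes line up from input to output.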
Next I want to implement backpropagation so that the network can actually learn patterns in data.