Neural Network from Scratch

A fully functional neural network built using nothing but NumPy, capable of classifying MNIST handwritten digits with 90% test accuracy.
My goal was actually understanding what happens under the hood.
My implementation includes backpropagation from scratch, sigmoid and softmax activations, binary and categorical cross-entropy loss functions, and mini-batch gradient descent. I wrapped all of this in an easy-to-use API:

network = Network(
    [
        Layer(64, activation="input"),
        Layer(50),
        Layer(30),
        Layer(10, activation="softmax"),
    ],
    loss="categorical_crossentropy",
)
network.fit(X_train, y_train, epochs=8000, learning_rate=1, batch_size=0.1)

The math and backpropagation were the hardest parts. I derived the gradients of the cost function by hand, and backpropagation didn’t really make sense to me at first.
I started with mean squared error, and the gradients exploded quickly. Switching to cross-entropy fixed it.
I’m really proud that I now understand why these networks are able to learn. Seeing your own network learn feels amazing.

You can try it out on MNIST yourself and play with the parameters in your browser without installing anything at https://neural-network-from-scratch-stockifab.streamlit.app.

5 devlogs
10h
13.41x multiplier
135 Stardust

@stockifab on Neural Network from Scratch · about 2 months ago

1h 19m 11s logged

Demo + Readme

The network is fully working; however, it’s difficult to see it in action. That’s why (with the help of Claude) I created a Streamlit app, so that you can try different parameters for the network in the browser without having to install anything locally.

I also spent some time making the readme prettier and more informative.

@stockifab on Neural Network from Scratch · about 2 months ago

1h 10m 12s logged

Training API + OneHotEncoder

Cleaned up a lot of the boilerplate inside main.py. The training loop is now inside the network itself and can be called with a single fit() call handling batching, epochs, and logging:

network.fit(X_train, y_train, learning_rate=0.1, batch_size=0.1)
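The batching step inside fit() might look roughly like the sketch below. It assumes that batch_size is a fraction of the training set (so 0.1 means ten batches per pass) and that the data is reshuffled on every pass; iterate_minibatches is a hypothetical helper name, not something taken from the project.

```python
import numpy as np


def iterate_minibatches(X, y, batch_size=0.1, rng=None):
    """Yield shuffled (X_batch, y_batch) pairs that cover the data once.

    Assumption: batch_size is a fraction of the number of samples.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_samples = X.shape[0]
    # Convert the fraction into a sample count, using at least one sample.
    samples_per_batch = max(1, int(round(n_samples * batch_size)))
    # Shuffle the indices so every pass sees the batches in a new order.
    order = rng.permutation(n_samples)
    for start in range(0, n_samples, samples_per_batch):
        idx = order[start:start + samples_per_batch]
        yield X[idx], y[idx]
```

A training loop would then call the single-gradient-step method once per yielded batch, for as many passes as requested.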

I also renamed a few things that had been named imprecisely from the start. run() is now forward(), and the old fit() (which only did a single gradient step) became epoch(). This made the code much easier to read.

The one-hot encoding is now its own class instead of a one-liner scattered in main.py. This was necessary because you need to decode predictions back to the original labels afterwards:

encoder = OneHotEncoder()
y = encoder.encode(digits.target)

encoder.decode(network.forward(X_test))
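As a rough illustration of why encoding and decoding belong together, here is a minimal sketch of such an encoder. Only the encode() and decode() method names come from the snippet above; the internals (np.unique for the class list, argmax for decoding) are assumptions and the real class may work differently.

```python
import numpy as np


class OneHotEncoder:
    """Minimal sketch; remembers the classes seen in encode() for decode()."""

    def encode(self, labels):
        labels = np.asarray(labels).ravel()
        # classes_ holds the sorted unique labels; indices maps each label
        # to the position of its class, i.e. to its one-hot column.
        self.classes_, indices = np.unique(labels, return_inverse=True)
        one_hot = np.zeros((labels.size, self.classes_.size))
        one_hot[np.arange(labels.size), indices] = 1.0
        return one_hot

    def decode(self, outputs):
        # Pick the most probable column per row and map it back to a label.
        return self.classes_[np.argmax(outputs, axis=1)]
```

Because decode() relies on the class list stored during encode(), the same encoder instance has to be used for both steps.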

I also added a train_test_split utility so I don’t have to manually slice arrays anymore.
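A utility like this can be quite small. The sketch below is one possible version; the test_size and seed parameters and the order of the return values are assumptions, not the project's actual signature.

```python
import numpy as np


def train_test_split(X, y, test_size=0.2, seed=None):
    """Shuffle the samples and split them into a training and a test part.

    Assumption: test_size is the fraction of samples held out for testing.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    order = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_size))
    test_idx, train_idx = order[:n_test], order[n_test:]
    # Index X and y with the same indices so rows and labels stay aligned.
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]
```

Shuffling before the split matters when the dataset is stored sorted by label, because a plain slice would otherwise put whole classes into only one of the two parts.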

@stockifab on Neural Network from Scratch · about 2 months ago

4h 17m 22s logged

Softmax Activation and Digits

I modified the API so that cost and activation functions can easily be changed.

After opting for softmax activation at the output layer and changing the cost function to categorical cross-entropy, the model classified the numbers with 85.59% accuracy after 10,000 epochs. I’m surprised it works this well despite the small network and no regularization or tuning at all.

While interesting, the math is getting really complicated now. Deriving the softmax function in particular is still a mystery to me; I had to look up its derivative online.
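For reference, the standard textbook result for the softmax derivative can be written out as follows. The symbols are notation chosen here, not names from the project: z is the output layer's pre-activation vector, s the softmax output, y the one-hot target, and L the categorical cross-entropy loss. The last step uses the fact that the entries of y sum to 1.

```latex
s_i = \frac{e^{z_i}}{\sum_k e^{z_k}}, \qquad
\frac{\partial s_i}{\partial z_j} = s_i \, (\delta_{ij} - s_j)

L = -\sum_i y_i \log s_i
\;\Longrightarrow\;
\frac{\partial L}{\partial z_j}
= \sum_i \left( -\frac{y_i}{s_i} \right) s_i \, (\delta_{ij} - s_j)
= s_j \sum_i y_i - y_j
= s_j - y_j
```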

I want to spend some more time understanding how the cost and the activation function “influence” each other before implementing an API to more easily train the network.
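One way to see how the two functions interact is to check numerically that softmax followed by categorical cross-entropy has a very simple gradient with respect to the pre-activations: the predicted probabilities minus the one-hot target. The sketch below does this with finite differences. All function names and the example values are chosen for illustration and are not taken from the project.

```python
import numpy as np


def softmax(z):
    # Subtract the maximum first so np.exp cannot overflow.
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def categorical_crossentropy(probs, targets, eps=1e-12):
    # eps guards against log(0) for probabilities that underflow to zero.
    return -np.sum(targets * np.log(probs + eps), axis=-1)


def numerical_gradient(z, targets, h=1e-6):
    """Central-difference estimate of d(loss)/dz for one sample."""
    grad = np.zeros_like(z)
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = h
        loss_plus = categorical_crossentropy(softmax(z + step), targets)
        loss_minus = categorical_crossentropy(softmax(z - step), targets)
        grad[i] = (loss_plus - loss_minus) / (2 * h)
    return grad


z = np.array([2.0, -1.0, 0.5])   # example pre-activations
y = np.array([0.0, 0.0, 1.0])    # one-hot target
analytic = softmax(z) - y        # the combined gradient in closed form
numeric = numerical_gradient(z, y)
print(np.allclose(analytic, numeric, atol=1e-6))
```

The practical consequence is that the output layer never needs the full softmax Jacobian during backpropagation, as long as softmax is paired with categorical cross-entropy.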

@stockifab on Neural Network from Scratch · about 2 months ago

1h 28m 13s logged

Backpropagation

It’s learning! I tried the network with a small synthetic dataset and it learned the pattern flawlessly, at least when you don’t let it run for too long.

Looking at the network’s performance (cost) at each epoch, it starts out strong, reducing the cost and getting better and better, until all of a sudden there is a massive spike and the cost goes way up again.

I have no idea why this is happening, but overall I’m extremely happy that the network is learning.

Next I’m going to fix this issue and then I want it to predict numbers from the MNIST dataset.

@stockifab on Neural Network from Scratch · 2 months ago

1h 48m 59s logged

Network Architecture + Forward Pass

Wrote a basic API for modelling the network.
When untrained, it initializes the weights and biases randomly.
With these random values it is already capable of running a forward pass with some input data.
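A forward pass with randomly initialized parameters can be sketched in a few lines of NumPy. The layer sizes are borrowed from the example at the top of the page; the function names, the scale of 0.1, and the use of sigmoid on every layer are assumptions made for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(layer_sizes, rng, scale=0.1):
    """Create one random (weights, biases) pair per pair of adjacent layers."""
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = rng.standard_normal((n_in, n_out)) * scale
        biases = rng.standard_normal(n_out) * scale
        params.append((weights, biases))
    return params


def forward(X, params):
    """Propagate a batch of inputs (one sample per row) through all layers."""
    activation = X
    for weights, biases in params:
        activation = sigmoid(activation @ weights + biases)
    return activation


rng = np.random.default_rng(0)
params = init_params([64, 50, 30, 10], rng)
outputs = forward(rng.random((5, 64)), params)
print(outputs.shape)  # (5, 10): one row of 10 outputs per input sample
```

Before training, these outputs carry no meaning; they only show that the shapes line up from the input layer to the output layer.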

Next I want to implement backpropagation so that the network can actually learn patterns in data.