Devlog by @28taidan

@28taidan on BadGPT · about 2 months ago

4h 47m 32s logged

I implemented an MLP, I watched the entire Micrograd explanation by Andrej Karpathy, which was really interesting. I followed along, and played around with tuning my model (that does absolutely nothing useful) to predict stuff better.
It has Value objects, which store the data, as well as gradients, and the operation/previous numbers used to create them, as well as all the operations. This can be used to calculate the gradient (or how much each Value object affected the last Value in the chain of operations), which is basically all you need for backpropogation (which is where it goes back and calculates how much each weight/bias affects the final output).
It also has classes for each neuron, which is a collection of Values that act as the inputs, the weights (which are randomly generated at the start), and the bias for that nueron.
These neurons make up layers, and the layers make up the MLP (multilayer perceptron). I created a couple arrays of inputs, and the desired output, which has the output for each input array. To train it, you have the model predict something, then calculate the loss (or the sum of how far away each prediction was squared), then do the backpropogation (go through each parameter, or each weight and bias), find out how much and in what direction it affected the loss function, and finally change each parameter slightly so it moves the loss in the right direction. As you can see in the screenshot, I got it down pretty low. This will be useful because when I am training my LLM, I will need to use a neural network like this, and get the loss down as low as possible so it predicts better words and produces better results.
Karpathy’s video is at https://www.youtube.com/watch?v=VMj-3S1tku0&list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ&index=2, it’s really quite amazing, I learned so much from it, and I haven’t even learned Calculus yet, which he states is a prerequisite in the description, I highly recommend if you’re interested. Next I will need to work on implementing transformers (which takes in the input (words), and derives meaning from them by encoding them into a vector).