28taidan

@28taidan

Joined June 13th, 2026

  • 5 Devlogs
  • 2 Projects
  • 0 Ships
  • 0 Votes
I like LLMs, and when stuff works instead of breaking like it usually does.

5h 20m 19s logged

Having completed the MLP I described in my last devlog, I set out to try to follow the next tutorial in the list I found. However, after I listened to the description of it, which is basically just a model that can take in a database of words and produce more words that sound similar, I foolishly believed I could try to build something similar without following the tutorial. I did take the database of names from the tutorial, since I will need those to train the model.

I started by transforming the inputs (the previous letter in the word) into numbers, and then the output (a number) back into a letter. My first attempt just fed in the letter’s ASCII code and rounded the output number down to get a code back. The first issue was that I trained on the entire database, which took forever and quickly hit the maximum recursion depth, so I switched to using only the first 500 examples. (Using the entire database, combined with super high differences between the output and the expected output, resulted in an initial loss of 1,934,198,245, compared to a loss of like 4 when just testing my MLP.)

After that, I got it to actually finish training without crashing. However, the loss didn’t go down, and it always outputted a null character. I realized I had been really stupid: the tanh function at the end of every neuron in the MLP squashes the result to between -1 and 1, so the rounded output could only ever be an ASCII code like 0 or 1, which are the null character and another non-printing control character, nowhere near a letter.
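
To make the tanh problem concrete, here is a minimal sketch of what that decoding step amounts to. The `decode_ascii` helper is a hypothetical name, not the code from the project; it only assumes the final output is passed through tanh and then rounded down.

```python
import math

def decode_ascii(raw_output: float) -> int:
    """Round a tanh-squashed output down to an integer character code."""
    return math.floor(math.tanh(raw_output))

# However large the value going into tanh gets, the result stays inside
# [-1, 1], so rounding it down can never reach a letter's code (97 for "a").
samples = [-1000.0, -2.0, 0.0, 0.5, 2.0, 1000.0]
codes = sorted({decode_ascii(x) for x in samples})
```

Every value in `codes` sits between -1 and 1, which is why no amount of training could make this setup print a letter.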

So I changed it so that instead of outputting the ASCII code directly, it outputs a 27-long array. Each index represents a letter (and the last represents end of name), and the value at each index is the probability of that letter being chosen. To decode the results, it just has to find the highest probability, check that it’s not the last element, and add 97 to its index to get the ASCII code (because the letter “a” is at 97, so index 0, which is meant to be “a”, has to be shifted up). This led it to actually output letters, but it always outputted the same letter, as shown in the screenshot. The loss dropped rapidly and then plateaued at around 2000.
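
The 27-slot format described above could be sketched roughly like this. The names `END`, `encode`, and `decode` are made up for illustration, and using `None` to mean end of name is an assumption.

```python
END = 26  # hypothetical index of the end-of-name slot (27 slots in total)

def encode(ch):
    """One-hot vector for a lowercase letter, or for end of name if ch is None."""
    vec = [0.0] * 27
    vec[END if ch is None else ord(ch) - 97] = 1.0
    return vec

def decode(probs):
    """Return the most likely letter, or None when the end-of-name slot wins."""
    best = max(range(27), key=lambda i: probs[i])
    return None if best == END else chr(best + 97)
```

For example, `decode(encode("a"))` gives back `"a"`, because index 0 plus 97 is the ASCII code for “a”.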

I found some bugs (gradients accumulating over time instead of resetting each epoch) and made some changes (instead of inputting an ASCII code, it takes input in the same format as the output), but nothing had any effect. After thinking, I realized that the model is just picking the most common letter and always outputting it. That’s why the loss can’t get any lower than 2000: that’s the limit of how good this strategy can get. However, I have no idea how to fix this issue and somehow teach the model the meaning behind the letters, so I will accept defeat and watch the tutorial, then implement what I learned.
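
Here is a rough sketch of why ignoring the input leads to a loss floor. The four names are a made-up stand-in for the real dataset, `"."` stands in for end of name, and the summed squared-error loss is an assumption, so the numbers will not match the 2000 above.

```python
from collections import Counter

names = ["emma", "olivia", "ava", "mia"]  # hypothetical stand-in for the names dataset

# Every "next character" target, using "." as a stand-in for end of name.
targets = [ch for name in names for ch in name[1:] + "."]
counts = Counter(targets)
total = len(targets)

# Mean of the one-hot targets: the best possible answer if the input is ignored.
freq = {ch: n / total for ch, n in counts.items()}

def squared_error(pred, target):
    """Summed squared error between a predicted distribution and a one-hot target."""
    return sum((p - (1.0 if ch == target else 0.0)) ** 2 for ch, p in pred.items())

# The lowest loss any input-ignoring model can reach on this data.
baseline_loss = sum(squared_error(freq, t) for t in targets)

# Decoding by highest probability then always yields the same character.
most_common = max(freq, key=freq.get)
```

A model stuck at `baseline_loss` has only learned how often each letter appears, not which letter tends to follow which.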

4h 47m 32s logged

I implemented an MLP! I watched the entire Micrograd explanation by Andrej Karpathy, which was really interesting. I followed along and played around with tuning my model (which does absolutely nothing useful) to predict stuff better.
It has Value objects, which store the data and its gradient, as well as the operation and the previous Values used to create them. This can be used to calculate the gradient (how much each Value object affected the last Value in the chain of operations), which is basically all you need for backpropagation (where it goes back and calculates how much each weight/bias affects the final output).
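
A stripped-down sketch of that idea, in the spirit of Micrograd but not copied from it: the attribute names and the choice to support only `+`, `*`, and `tanh` are simplifications made here.

```python
import math

class Value:
    """A scalar that remembers how it was made so gradients can flow back."""

    def __init__(self, data, prev=(), op=""):
        self.data = data
        self.grad = 0.0
        self._prev = prev              # the Values this one was computed from
        self._op = op                  # the operation that produced it
        self._backward = lambda: None  # pushes this Value's grad to its inputs

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other), "+")
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other), "*")
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def tanh(self):
        t = math.tanh(self.data)
        out = Value(t, (self,), "tanh")
        def _backward():
            self.grad += (1 - t * t) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Order the graph so each Value comes after everything it depends on,
        # then apply the chain rule from the final Value back to the inputs.
        order, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._prev:
                    build(p)
                order.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x, w, b = Value(2.0), Value(-3.0), Value(1.0)
y = (x * w + b).tanh()
y.backward()  # afterwards w.grad says how much nudging w would change y
```
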
It also has a Neuron class. A neuron works with a collection of Values: the inputs, the weights (randomly generated at the start), and the bias for that neuron.
These neurons make up layers, and the layers make up the MLP (multilayer perceptron). I created a few input arrays and a list of desired outputs, one for each input array. To train it, you have the model predict something, then calculate the loss (the sum of the squared distances between each prediction and its desired output), then do the backpropagation (go through each parameter, meaning each weight and bias, and find out how much and in what direction it affected the loss), and finally change each parameter slightly so it moves the loss in the right direction. As you can see in the screenshot, I got it down pretty low. This will be useful because when I am training my LLM, I will need to use a neural network like this and get the loss down as low as possible so it predicts better words and produces better results.
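
The predict, loss, backpropagate, update cycle can be sketched on the smallest possible case. This is an illustration only: it uses a single tanh neuron with gradients worked out by hand instead of the Value engine, and the data, learning rate of 0.05, and 200 steps are all made-up choices.

```python
import math
import random

random.seed(0)

# Hypothetical toy data: two inputs per example, one desired output each.
inputs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
targets = [1.0, 1.0, -1.0, -1.0]

weights = [random.uniform(-1, 1) for _ in range(2)]
bias = random.uniform(-1, 1)

def predict(x):
    return math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)

def total_loss():
    # Sum of squared distances between predictions and desired outputs.
    return sum((predict(x) - y) ** 2 for x, y in zip(inputs, targets))

history = [total_loss()]
for step in range(200):
    # Start each step from zero so gradients do not pile up across steps.
    grad_w, grad_b = [0.0, 0.0], 0.0
    for x, y in zip(inputs, targets):
        out = predict(x)
        # Chain rule: effect of the value going into tanh on this example's loss.
        d_pre = 2 * (out - y) * (1 - out * out)
        for i, xi in enumerate(x):
            grad_w[i] += d_pre * xi
        grad_b += d_pre
    # Nudge each parameter a little against its gradient.
    weights = [w - 0.05 * g for w, g in zip(weights, grad_w)]
    bias -= 0.05 * grad_b
    history.append(total_loss())
```

Comparing `history[0]` with `history[-1]` shows the loss going down as the parameters are nudged.
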
Karpathy’s video is at https://www.youtube.com/watch?v=VMj-3S1tku0&list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ&index=2. It’s really quite amazing: I learned so much from it even though I haven’t learned calculus yet, which he lists as a prerequisite in the description. I highly recommend it if you’re interested. Next I will need to work on implementing transformers (which take in the input words and derive meaning from them by encoding them into a vector).

2h 13m 32s logged

Working on UX improvements!

I fixed a bunch of bugs. For example, overlapping events saying “I’m available” are now combined into one, for simplicity. The idea is that each person goes in to enter their own availability, so I made it so that you choose a person and edit only their location/availability, and they don’t have to see stuff that’s not relevant to them (it gets greyed out).
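
Combining overlapping availability blocks could look something like the sketch below. The project’s own language and data model aren’t stated here, so this is Python purely for illustration, with each block assumed to be a `(start, end)` pair of hours and `merge_available` a made-up name.

```python
def merge_available(blocks):
    """Combine overlapping (start, end) availability blocks into single blocks."""
    merged = []
    for start, end in sorted(blocks):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous block, so extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With this, `merge_available([(9, 11), (10, 12), (14, 15)])` returns `[(9, 12), (14, 15)]`.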

Next, I am working on the core of the project, the planning logic. The idea is to build an algorithm that finds the lowest total number of hours for everyone. I think I’d probably need some kind of greedy algorithm, but I’m not sure yet.

1h 55m 28s logged

I made some UI improvements and bug fixes to entering a kid’s dropoff/pickup time, and also added location data to each dropoff/pickup point using Photon and OSRM, so I can estimate the travel time and when the kid has to leave to get there on time.
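
Once a travel time is known, working out when the kid has to leave is simple date arithmetic. A small sketch, assuming the travel time arrives as a number of seconds; the `leave_by` name and the 5-minute buffer are assumptions added for illustration, not part of the project as described.

```python
from datetime import datetime, timedelta

def leave_by(arrival: datetime, travel_seconds: float, buffer_minutes: int = 5) -> datetime:
    """Latest departure time that still reaches the dropoff point on time."""
    travel = timedelta(seconds=travel_seconds)
    buffer = timedelta(minutes=buffer_minutes)  # hypothetical safety margin
    return arrival - travel - buffer
```

For an 8:30 dropoff and a 900-second (15-minute) drive, `leave_by(datetime(2026, 6, 1, 8, 30), 900)` gives 8:10.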

51m 34s logged

Setting up a basic calendar view, as well as Vercel. I’m using Vercel’s Neon for storage because of its great free tier.
