Ship ✨ Blessed

@stockifab on Word Embeddings · 8 days ago

Word embeddings are a way to store the meaning of a word (such as “plane” or “dog”) in a large vector (a long list of numbers).

These vectors can then be used to find words similar to each other. But they can also be used to calculate with English words, e.g.:

dog - bark + meow -> cat

(in fact, you can try this very example on the demo website, or come up with your own calculation)
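
A minimal sketch of how such a calculation can work, assuming tiny hand-made 3-dimensional vectors and cosine similarity as the measure of closeness. The vectors, the extra word "car" and the `nearest` helper are made up for illustration and are not taken from the project; real embeddings are learned and much longer.

```python
import numpy as np

# Hypothetical toy embeddings; the dimensions loosely mean (dog-ness, cat-ness, sound-ness).
embeddings = {
    "dog":  np.array([1.0, 0.0, 0.0]),
    "cat":  np.array([0.0, 1.0, 0.0]),
    "bark": np.array([1.0, 0.0, 1.0]),
    "meow": np.array([0.0, 1.0, 1.0]),
    "car":  np.array([0.1, 0.1, 0.1]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point in the same direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest(vector, exclude=()):
    """Return the vocabulary word whose embedding is most similar to `vector`."""
    candidates = [word for word in embeddings if word not in exclude]
    return max(candidates, key=lambda word: cosine(vector, embeddings[word]))

# dog - bark + meow, then look up the closest remaining word.
query = embeddings["dog"] - embeddings["bark"] + embeddings["meow"]
result = nearest(query, exclude=("dog", "bark", "meow"))
print(result)
```

Excluding the input words from the lookup is a common choice, since the query vector often stays close to one of them.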

To obtain these embeddings, I created a model that reads through billions of words to learn how they are used. It looks at the words appearing close to each other and assumes that words with the same nearby words have similar meanings.

Obtaining the embeddings was the hardest part. My first model was okay-ish and could find related words, but I could not yet calculate with them as in the example above. For the final version I used a much larger dataset with carefully thought-out pre-processing steps, which was more complicated than I expected: you need to ensure the words you are grouping actually belong together, so you may only group within individual sentences, groups mustn’t overlap anything inside brackets, and you can’t form a group containing words you filtered out in pre-processing, yet at the same time you shouldn’t skip any words, …

An additional challenge with the larger dataset was that it didn’t fit into memory, so I had to read it in chunks and train the model incrementally. Training the final model took over 50 hours.

7 devlogs
28h
16.45x multiplier
556 Stardust

@stockifab on Word Embeddings · 8 days ago

1h 42m 12s logged

Finish Demo Website (Pages 4-8)

I completed the last part of the demo website, including the most exciting part: the word arithmetic.

@stockifab on Word Embeddings · 11 days ago

4h 20m 15s logged

Embedding Demo Website (Pages 1-3)

I really like the way the embeddings turned out. That’s why I want to make a website to make word embeddings as accessible and understandable as possible.

I’m assuming no prior knowledge from visitors and try to guide them interactively through the world of embeddings, with each slide either explaining something about embeddings or letting them try it themselves. This time I finished slides 1-3, with the last one being the interactive one.

Since it should feel engaging and alive, I focus a lot on subtle animations and micro-interactions. I’ve never built a website with heavy animations before so this is the perfect opportunity to try GSAP!

@stockifab on Word Embeddings · 20 days ago

3h 35m 22s logged

Maths with English words

With the new dataset, training took a lot longer. My computer worked for around 50 hours to run through 26GB of training data.

The long wait really paid off though: the model is so much better than the previous one, and it is finally capable of the embedding arithmetic I’ve been chasing the whole time.

Embedding arithmetic is so fun, here are some examples:

throw - throwing + running -> run (-ing form)
better - good + friendly -> friendlier (comparison)
woman - man + king -> queen (man compared to woman is like king compared to queen)
germany - berlin + paris (capitals)

The model learned a lot more syntactic and semantic meaning than I had expected.

Next I want to build a web app, so that you can play with the embeddings yourself.

@stockifab on Word Embeddings · 27 days ago

7h 4m 21s logged

New Dataset + Improved Pre-Processing

In an effort to increase the model’s performance, I switched from the previous model’s Simple English Wikipedia dataset to a subset of the FineWeb-Edu dataset. As an additional measure to improve the results, I decided to use a larger context of 4 words to the left and right of the root word.

I also completely rewrote all the preprocessing steps. Even though the dataset is high-quality, language is very nuanced and needs filtering. But since words next to each other (4 words before and after the root word) get grouped, filtering out an entire word means the group mustn’t be formed (as information is now missing). On the other hand, removing punctuation from the end of a word should still allow that word to appear in a group.
Overall I filter/handle:

quotes
brackets
non-English characters
apostrophes
other special characters (dashes, underscores, etc.)

Another challenge is ensuring that grouped words actually belong together. In the previous dataset I simply grouped the words next to each other in a giant blob of text. This time I only group words from the same text source (the same article, document, …) and only within the sentences of that source. This ensures groups don’t span multiple sentences.
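
The grouping rules described here could look roughly like the sketch below. It is only an illustration under several assumptions: sentences are split with a naive regular expression, windows are simply cut off at sentence edges, and the function names are invented. The bracket and quote handling from the list above is left out.

```python
import re

def split_sentences(text):
    """Naive sentence splitter (an assumption): break after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Lower-case the words and strip punctuation from their ends."""
    words = (w.strip(".,!?;:\"'") for w in sentence.lower().split())
    return [w for w in words if w]

def context_groups(text, vocabulary, window=4):
    """Yield (root, neighbours) pairs that never cross a sentence boundary.

    A group is dropped entirely if any word in it is outside `vocabulary`,
    because a filtered word would leave a hole in the context.
    """
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        for i, root in enumerate(tokens):
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            if all(word in vocabulary for word in left + [root] + right):
                yield root, left + right

vocabulary = {"the", "dog", "barks", "cat", "sleeps"}
text = "The dog barks. The cat sleeps zzz."
groups = list(context_groups(text, vocabulary, window=1))
for root, neighbours in groups:
    print(root, neighbours)
```

In this toy run, "barks" is never grouped with the "the" of the next sentence, and every group touching the out-of-vocabulary word "zzz" is skipped, while "barks." still takes part in groups after its full stop is stripped.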

This taught me a lot about working with larger amounts of data - I couldn’t load everything in memory as it would be too large, but had to read from disk in batches to process the data. After ~50 min of leaving my computer untouched I had the final dataset with ~26GB (~18GB compressed using word IDs) of filtered word groups.
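
Reading from disk in batches can be sketched with a small generator. The function name and batch size are illustrative, and an in-memory `io.StringIO` stands in for a real file so the example runs on its own.

```python
import io
from itertools import islice

def read_batches(lines, batch_size):
    """Yield lists of at most `batch_size` lines from any line iterator."""
    iterator = iter(lines)
    while True:
        batch = [line.rstrip("\n") for line in islice(iterator, batch_size)]
        if not batch:
            return
        yield batch

# A file opened with open(path) is iterated line by line in the same way,
# so only one batch has to be held in memory at a time.
fake_file = io.StringIO("doc 1\ndoc 2\ndoc 3\ndoc 4\ndoc 5\n")
batch_sizes = [len(batch) for batch in read_batches(fake_file, batch_size=2)]
print(batch_sizes)
```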

Next, I will have to find a way to determine unrelated words, so that I can start training the third iteration of the model.

@stockifab on Word Embeddings · 29 days ago

3h 11m 50s logged

New (worse) model

In an effort to improve on the previous model, I created a new one. It now uses separate weights for the context words instead of re-using the embedding weights for the context. The new model has 30,000,000 parameters, which is double the previous one.
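
One possible reading of these numbers, assuming the 50,000-word vocabulary from the data preprocessing devlog still applied and that each weight table holds one vector per word. The embedding size of 300 is inferred from the parameter count, not stated anywhere in the devlogs.

```python
vocabulary_size = 50_000   # from the data preprocessing devlog (assumed unchanged)
embedding_dim = 300        # inferred: 15,000,000 / 50,000

# One table shared between the root-word role and the context role.
shared_weights = vocabulary_size * embedding_dim

# One table for root words plus a separate table for context words.
separate_weights = 2 * vocabulary_size * embedding_dim

print(shared_weights, separate_weights)
```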

Unfortunately, the results were really disappointing: in contrast to the previous model, it doesn’t even group related words. According to the embeddings, the word “cat” is very similar in meaning to “motor” or “caution” 😂😭

I don’t know why it’s performing so poorly, but I’m curious to find out.

@stockifab on Word Embeddings · 30 days ago

4h 20m 59s logged

First Embedding Model

Using the data obtained previously I could get my first embeddings 🎉

The model (image) has 15,000,000 parameters and took roughly half an hour to train. Unfortunately, the performance is below my expectations: it’s not yet good enough for “uncle - aunt + mum” to output “dad”.

However, the embeddings definitely captured some meaning: For example, the vector of “android” is close to “ios” and to “smartphones”. The vector of “cat” is close to the words “goat”, “monkeys” and “snakes”.

I want to improve this model significantly until it’s good enough to meaningfully calculate with the embedding vectors.

@stockifab on Word Embeddings · about 1 month ago

3h 54m 25s logged

Data preprocessing

Before the model can be trained to produce the embeddings, it needs data. I found a dataset on Kaggle with the text of 249,396 Wikipedia articles.

Today I did some preprocessing. First I removed any non-English words and deleted special characters,
then I determined the 50,000 most frequent words and limited my vocabulary to these words.
To compute the embeddings, I took the left and right neighbours of each word in the data and put them into a pandas data frame along with two other unrelated words: (left word, center word, right word, unrelated word 1, unrelated word 2)

The final dataset has 24,780,670 rows, so I likely can’t train on the full dataset on my computer.

The goal is for the model, when presented with the center word, to predict which of the other words are similar to it. It can only do this by learning the meaning of each word, which is what we want.
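
The row layout described above can be sketched in plain Python. How the unrelated words were actually chosen is not stated, so drawing them at random from the vocabulary is an assumption, and `build_rows` is an invented name.

```python
import random
from collections import Counter

def build_rows(tokens, vocab_size, seed=0):
    """Build (left, center, right, unrelated_1, unrelated_2) tuples.

    "Unrelated" words are drawn at random from the vocabulary, which is an
    assumption; the devlog does not say how they were picked.
    """
    rng = random.Random(seed)
    vocabulary = [word for word, _ in Counter(tokens).most_common(vocab_size)]
    known = set(vocabulary)
    rows = []
    for left, center, right in zip(tokens, tokens[1:], tokens[2:]):
        if not {left, center, right} <= known:
            continue  # skip triples containing out-of-vocabulary words
        pool = [w for w in vocabulary if w not in (left, center, right)]
        if len(pool) < 2:
            continue  # not enough other words to sample from
        rows.append((left, center, right, *rng.sample(pool, 2)))
    return rows

tokens = "the cat sat on the mat while the dog sat on the rug".split()
rows = build_rows(tokens, vocab_size=50)
print(rows[0][:3])
```

The resulting list of tuples could then be handed to `pandas.DataFrame(rows, columns=[...])` to get a data frame like the one mentioned above.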

Ship ✨ Blessed

@stockifab on Garden Calendar · about 1 month ago

Garden Calendar is a website that shows you when to sow the plants for your garden.

A calendar shows you when to sow indoors and outdoors and when the plants can be harvested. Using the filtering options, you can quickly find the plant you are looking for. Furthermore, Garden Calendar gives you useful gardening tips for each plant in its database.

I built this project because I wanted to refresh my Angular skills. In hindsight, this project wasn’t ideal for this because I didn’t get to use a lot of Angular features.
Nevertheless, it wasn’t for nothing, and I learnt something along the way.

5 devlogs
10h
15.86x multiplier
197 Stardust

@stockifab on Garden Calendar · about 1 month ago

35m 49s logged

GitHub Pages Deployment

I deployed the app to GitHub Pages.

I encountered one issue: the images were not loading. GitHub Pages URLs end with the name of the repository, so the root of the app is not at / but at /<Repository-Name>. My image paths started with a /, which is incorrect when the root is not at /. Removing the leading / resolved the issue.
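
The effect of the leading slash can be reproduced with Python’s `urllib.parse.urljoin`, which resolves relative references against a base URL in the standard way. The user name, repository name and image path below are placeholders.

```python
from urllib.parse import urljoin

# Hypothetical GitHub Pages URL; user and repository names are placeholders.
page_url = "https://example-user.github.io/example-repo/"

# A leading slash resolves from the root of the domain, skipping the repository.
absolute = urljoin(page_url, "/images/tomato.png")

# Without the slash the path resolves relative to the page itself.
relative = urljoin(page_url, "images/tomato.png")

print(absolute)  # https://example-user.github.io/images/tomato.png
print(relative)  # https://example-user.github.io/example-repo/images/tomato.png
```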

@stockifab on Garden Calendar · about 1 month ago

2h 19m 6s logged

Plant Details & UI Tweaks

Plants can now be clicked on, showing a details menu on the right with details and a tip for growing the plant.

I had trouble with the CSS of the legend in the bottom right corner, which is positioned absolutely: the library used to make the panels resizable sets position: relative during dragging, which put the legend in the wrong position while dragging. I fixed the issue with some clever div wrapping.

@stockifab on Garden Calendar · about 1 month ago

3h 4m 34s logged

Bookmarking + Refactoring

Users can now bookmark the plants they want to see. Bookmarked plants are saved in localStorage.

Moreover, there is now an empty state when there are no plants to show due to filters.

I also did some refactoring and simplified the code here and there.

For my Angular learning goal: I learnt how to use effect(), which seems to do basically the same as the useEffect() hook in React.

@stockifab on Garden Calendar · about 1 month ago

2h 32m 10s logged

Calendar Rebuild

I rewrote the calendar from an SVG graphic to a table, and added some basic filtering.

Since my goal with this project is to learn Angular, this was a nice opportunity to try signal forms. I also created a service for plant-related utility functions.

Next I want to allow users to only show plants they are interested in.

@stockifab on Garden Calendar · about 1 month ago

1h 48m 30s logged

Gantt Diagram

I want to use a Gantt Diagram to show when to sow and when to harvest plants. I decided to make this chart with a generated SVG graphic but that was a mistake: Calculating the position of each element (text, rectangle, line) is really fragile and styling the diagram further is painful.

I want to redo the diagram with an HTML table, which can be styled a lot more easily.

Ship Changes requested

@stockifab on Neural Network from Scratch · about 2 months ago

Neural Network from Scratch

A fully functional neural network built using nothing but NumPy, capable of classifying MNIST handwritten digits with 90% test accuracy.
My goal was to actually understand what happens under the hood.
My implementation includes backpropagation from scratch, sigmoid and softmax activations, binary and categorical cross-entropy loss functions and mini-batch gradient descent. I wrapped all of this in an easy-to-use API:

network = Network(
    [
        Layer(64, activation="input"),
        Layer(50),
        Layer(30),
        Layer(10, activation="softmax"),
    ],
    loss="categorical_crossentropy",
)
network.fit(X_train, y_train, epochs=8000, learning_rate=1, batch_size=0.1)

The math and backpropagation were the hardest parts. I derived the gradients of the cost function by hand, and backpropagation didn’t really make sense to me at first.
I started with mean squared error, and the gradients exploded quickly. Switching to cross-entropy fixed it.
I’m really proud that I now understand why these networks are able to learn. Seeing your own network learn feels amazing.

You can try it out on MNIST yourself and play with the parameters in your browser without installing anything at https://neural-network-from-scratch-stockifab.streamlit.app.

5 devlogs
10h
13.41x multiplier
135 Stardust

@stockifab on Neural Network from Scratch · about 2 months ago

1h 19m 11s logged

Demo + Readme

The network is fully working, but it’s difficult to see it in action. That’s why (with the help of Claude) I created a Streamlit app, so that you can try different parameters for the network in the browser without having to install anything locally.

I also spent some time making the readme prettier and more informative.

@stockifab on Neural Network from Scratch · about 2 months ago

1h 10m 12s logged

Training API + OneHotEncoder

Cleaned up a lot of the boilerplate inside main.py. The training loop is now inside the network itself and can be called with a single fit() call handling batching, epochs, and logging:

network.fit(X_train, y_train, learning_rate=0.1, batch_size=0.1)
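
What the batching inside such a fit() call might look like is sketched below. It assumes that a `batch_size` of 0.1 means "10% of the training set per batch", which the devlog does not spell out, and `iterate_minibatches` is a hypothetical helper rather than part of the project.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs.

    `batch_size` is read as a fraction of the training set (assumption),
    so 0.1 gives batches holding 10% of the samples each.
    """
    n_samples = len(X)
    samples_per_batch = max(1, int(round(n_samples * batch_size)))
    order = rng.permutation(n_samples)  # new random order for this epoch
    for start in range(0, n_samples, samples_per_batch):
        idx = order[start:start + samples_per_batch]
        yield X[idx], y[idx]

rng = np.random.default_rng(0)
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.arange(20)
batches = list(iterate_minibatches(X, y, batch_size=0.1, rng=rng))
print(len(batches), batches[0][0].shape)
```

Indexing X and y with the same shuffled indices keeps every sample paired with its label.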

I also renamed a few things that had always been named imprecisely. run() is now forward(), and the old fit() (which only did a single gradient step) became epoch(). This made the code much easier to read.

The one-hot encoding is now its own class instead of a one-liner scattered in main.py. This was necessary because you need to decode predictions back to the original labels afterwards:

encoder = OneHotEncoder()
y = encoder.encode(digits.target)

encoder.decode(network.forward(X_test))
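
A rough idea of what such an encoder needs to do, written as a stand-in class. The project’s own OneHotEncoder may be implemented differently; only the encode/decode method names are taken from the usage above.

```python
import numpy as np

class SketchOneHotEncoder:
    """Illustrative stand-in; the project's OneHotEncoder may work differently."""

    def encode(self, labels):
        labels = np.asarray(labels).ravel()
        # Remember the sorted unique labels so decode() can map back to them.
        self.classes_, indices = np.unique(labels, return_inverse=True)
        one_hot = np.zeros((len(labels), len(self.classes_)))
        one_hot[np.arange(len(labels)), indices] = 1.0
        return one_hot

    def decode(self, outputs):
        # The column with the highest score is taken as the predicted class.
        return self.classes_[np.argmax(outputs, axis=1)]

encoder = SketchOneHotEncoder()
labels = np.array([3, 7, 3, 9])
encoded = encoder.encode(labels)
decoded = encoder.decode(encoded)
print(encoded)
print(decoded)
```

Keeping the label order inside the object is what makes decoding possible later, which matches the reason given above for turning the one-liner into a class.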

I also added a train_test_split utility so I don’t have to manually slice arrays anymore.

@stockifab on Neural Network from Scratch · about 2 months ago

4h 17m 22s logged

Softmax Activation and Digits

I modified the API so that cost and activation functions can easily be changed.

After opting for softmax activation at the output layer and changing the cost function to categorical cross-entropy, the model classified the numbers with 85.59% accuracy after 10,000 epochs. I’m surprised it works this well despite the small network and no regularization or tuning at all.

While interesting, the math is getting really complicated now. The derivative of the softmax function in particular is still a mystery to me; I had to look it up online.
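
One well-known way the two functions fit together: when softmax is followed by categorical cross-entropy, the gradient of the cost with respect to the output layer’s pre-activations z simplifies to softmax(z) - y for a one-hot target y. The sketch below checks this numerically on made-up values; it is independent of the project’s code.

```python
import numpy as np

def softmax(z):
    shifted = z - np.max(z)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

def cross_entropy(z, y):
    """Categorical cross-entropy of softmax(z) against a one-hot target y."""
    return -np.sum(y * np.log(softmax(z)))

z = np.array([2.0, -1.0, 0.5])  # arbitrary pre-activations of the output layer
y = np.array([0.0, 0.0, 1.0])   # one-hot target

analytic = softmax(z) - y       # closed-form gradient dL/dz

# Central finite differences as an independent check.
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(len(z)):
    step = np.zeros_like(z)
    step[i] = eps
    numeric[i] = (cross_entropy(z + step, y) - cross_entropy(z - step, y)) / (2 * eps)

max_difference = float(np.max(np.abs(analytic - numeric)))
print(max_difference)
```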

I want to spend some more time understanding how the cost and the activation function “influence” each other before implementing an API to more easily train the network.

@stockifab on Neural Network from Scratch · about 2 months ago

1h 28m 13s logged

Backpropagation

It’s learning! I tried the network with a small synthetic dataset and it learned the pattern flawlessly, at least when you don’t let it run for too long:

Below you can see the network’s performance (cost) at each epoch. It starts out strong, reducing the cost and getting better and better, until all of a sudden there is a massive spike and the cost goes way up again.

I have no idea why this is happening, but overall I’m extremely happy that the network is learning.

Next I’m going to fix this issue and then I want it to predict numbers from the MNIST dataset.

@stockifab on Neural Network from Scratch · 2 months ago

1h 48m 59s logged

Network Architecture + Forward Pass

Wrote a basic API for modelling the network.
When untrained, it initializes weights and biases randomly.
With these random values it is already capable of running a forward pass with some input data.
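
A forward pass with random parameters can be sketched in a few lines of NumPy. The layer sizes, the normal-distributed initialization and the sigmoid activation in every layer are assumptions for the example, and the function names are not the project’s API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_layers(sizes, rng):
    """Create one random (weights, biases) pair per connection between layers."""
    return [
        (rng.normal(size=(n_in, n_out)), rng.normal(size=n_out))
        for n_in, n_out in zip(sizes, sizes[1:])
    ]

def forward(layers, X):
    """Propagate a batch X of shape (samples, features) through every layer."""
    activation = X
    for weights, biases in layers:
        activation = sigmoid(activation @ weights + biases)
    return activation

rng = np.random.default_rng(42)
layers = init_layers([4, 5, 3], rng)               # 4 inputs, 5 hidden units, 3 outputs
output = forward(layers, rng.normal(size=(2, 4)))  # batch of 2 random samples
print(output.shape)  # (2, 3)
```

The outputs are meaningless before training, but the shapes already line up, which is all a forward pass needs.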

Next I want to implement backpropagation so that the network can actually learn patterns in data.