Pokepy

  • 1 Devlogs
  • 4 Total hours

A character-level language model, built from scratch using only NumPy, that generates Pokémon-sounding names. No PyTorch, no autograd, no frameworks: every forward pass, backward pass, and gradient update is implemented by hand. This project started as a way to understand how neural networks actually work under the hood. I first built a simple MLP, then improved it into a WaveNet-style architecture to understand how increasing context length changes what the model can learn.

Ship #1 Pending review

what did you make?

I built pokepy, a character-level language model that generates new Pokémon-sounding names, written from scratch using only NumPy. The project started as a simple MLP and was later upgraded into a WaveNet-style architecture to explore how increasing the amount of context a model sees can improve generation.

Instead of using PyTorch or other machine learning frameworks, I built the main parts of the neural network myself, including embeddings, linear layers, batch normalization, activations, loss functions, and backpropagation.
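To make "building the layers myself" concrete, here is a minimal sketch of a linear layer and a tanh activation with hand-written backward passes. It is an illustration under assumptions, not the project's actual code: the class names, the `forward`/`backward` method names, and the 1/sqrt(fan_in) initialization are all choices made for this example.

```python
import numpy as np

class Linear:
    """Fully connected layer: out = x @ W + b (hypothetical sketch)."""

    def __init__(self, fan_in, fan_out, rng):
        # Scale by 1/sqrt(fan_in) so activations keep a reasonable variance.
        self.W = rng.standard_normal((fan_in, fan_out)) / np.sqrt(fan_in)
        self.b = np.zeros(fan_out)

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return x @ self.W + self.b

    def backward(self, dout):
        # Gradients of the loss with respect to parameters and input.
        self.dW = self.x.T @ dout
        self.db = dout.sum(axis=0)
        return dout @ self.W.T


class Tanh:
    def forward(self, x):
        self.out = np.tanh(x)
        return self.out

    def backward(self, dout):
        # d/dx tanh(x) = 1 - tanh(x)^2
        return dout * (1.0 - self.out ** 2)


rng = np.random.default_rng(0)
layer, act = Linear(4, 3, rng), Tanh()
x = rng.standard_normal((5, 4))                      # batch of 5 inputs
h = act.forward(layer.forward(x))                    # forward pass
dx = layer.backward(act.backward(np.ones_like(h)))   # backward pass
print(h.shape, dx.shape, layer.dW.shape)
```

Each layer caches what it needs during the forward pass, so the backward pass can run the chain rule in reverse order without any autograd machinery.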

what was challenging?

The hardest part was understanding and implementing everything manually. Without autograd, I had to calculate the gradients for every layer and debug how information moved through the network during training.
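One common way to debug hand-derived gradients is to compare them against a finite-difference estimate. The sketch below does this for a softmax cross-entropy loss. The function names and the vocabulary size of 27 are assumptions made for the example; the write-up does not say how gradients were actually verified.

```python
import numpy as np

def cross_entropy(logits, targets):
    """Mean negative log-likelihood of the target classes."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def cross_entropy_grad(logits, targets):
    """Analytic gradient: (softmax(logits) - one_hot(targets)) / batch_size."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(len(targets)), targets] -= 1.0
    return probs / len(targets)

def numerical_grad(f, x, eps=1e-5):
    """Central-difference estimate of df/dx, one element at a time."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + eps
        plus = f(x)
        x[idx] = old - eps
        minus = f(x)
        x[idx] = old  # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 27))    # 27 is an assumed vocabulary size
targets = rng.integers(0, 27, size=4)

analytic = cross_entropy_grad(logits, targets)
numeric = numerical_grad(lambda z: cross_entropy(z, targets), logits)
max_diff = np.abs(analytic - numeric).max()
print(max_diff)  # should be tiny if the analytic gradient is right
```

The same `numerical_grad` helper can be pointed at any layer's parameters, which makes it a cheap sanity check whenever a new backward pass is written.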

Another challenge was improving the model architecture. The first MLP only looked at 3 characters at a time, which limited what it could learn. Building the WaveNet-style model and increasing the context window to 8 characters required changing how the model combined information.
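The change in how information is combined can be sketched with a layer that merges neighbouring character positions. This is an assumed implementation, modelled on the `FlattenConsecutive` idea from the makemore lectures rather than taken from the project's source, and it assumes the sequence length is divisible by `n`.

```python
import numpy as np

class FlattenConsecutive:
    """Merge every n neighbouring positions into one wider vector."""

    def __init__(self, n):
        self.n = n

    def forward(self, x):
        B, T, C = x.shape            # batch, time steps, channels
        self.in_shape = x.shape      # remembered for the backward pass
        out = x.reshape(B, T // self.n, C * self.n)
        if out.shape[1] == 1:
            # Only one group left: drop the time dimension.
            out = out.reshape(B, C * self.n)
        return out

    def backward(self, dout):
        # A reshape only moves values around, so the gradient is
        # simply reshaped back to the input's shape.
        return dout.reshape(self.in_shape)


# 2 examples, 8 characters of context, 3-dimensional embeddings
x = np.arange(2 * 8 * 3, dtype=float).reshape(2, 8, 3)
flat = FlattenConsecutive(2)
y = flat.forward(x)
print(x.shape, "->", y.shape)   # pairs of characters are now fused
dx = flat.backward(np.ones_like(y))
```

Applying this three times with `n=2`, with a linear layer after each step, shrinks the 8 positions to 4, then 2, then 1, so characters are combined in a tree instead of all at once in the first layer.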

what are you proud of?

I am most proud that the entire project works without relying on deep learning libraries. Building a working neural network with nothing but NumPy helped me understand what is actually happening inside models instead of treating them like a black box.

I am also proud that I was able to compare different architectures and see how changing the way a model processes context affects the results.

what should people know so they can test your project?

The project has a live demo hosted on Hugging Face Spaces where anyone can generate new pokémon names.

The demo runs the trained WaveNet model and loads the saved NumPy weights to generate names directly. No PyTorch or external ML frameworks are required.

Open the demo and generate a few names. Each name is sampled character by character from the model's learned probabilities, so the output will be different each time.
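For readers curious how generation works in principle, here is a rough sketch of a sampling loop. Everything in it is an assumption for illustration: the 27-character vocabulary with "." as the start/end marker, the `sample_name` helper, and the random `toy_logits` function that stands in for the trained network. The real demo's code may look quite different.

```python
import numpy as np

chars = ".abcdefghijklmnopqrstuvwxyz"   # assumed vocabulary; "." = start/end
block_size = 8                          # context length from the write-up

def softmax(z):
    e = np.exp(z - z.max())             # subtract max for stability
    return e / e.sum()

def sample_name(logits_fn, rng, max_len=20):
    """Sample one name, feeding each new character back in as context."""
    context = [0] * block_size          # start with all "." tokens
    out = []
    for _ in range(max_len):
        probs = softmax(logits_fn(np.array(context)))
        ix = int(rng.choice(len(chars), p=probs))
        if ix == 0:                     # end-of-name token
            break
        out.append(chars[ix])
        context = context[1:] + [ix]    # slide the window forward
    return "".join(out)

# Stand-in for the trained network: random weights, NOT a real model.
rng = np.random.default_rng(42)
W = rng.standard_normal((block_size * len(chars), len(chars)))

def toy_logits(context):
    one_hot = np.eye(len(chars))[context].ravel()
    return one_hot @ W

names = [sample_name(toy_logits, rng) for _ in range(5)]
print(names)
```

Because each character is drawn at random from the predicted distribution, two runs with different seeds produce different names even though the weights never change.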

I built this project while following and working through Andrej Karpathy's Neural Networks: Zero to Hero series, specifically the makemore lectures.


4h 2m 11s logged

I built pokepy, a character-level language model that generates Pokémon-sounding names, written from scratch using only NumPy. The main goal of this project was to understand how neural networks actually work instead of just using libraries that hide everything. I did not use PyTorch or autograd, so I had to manually build the important parts: embeddings, linear layers, batch normalization, activations, loss functions, backpropagation, and gradient updates.

I first built a simple MLP model. It used a 3-character context window, meaning it only looked at the previous 3 characters to predict the next one. The model was able to learn basic patterns from Pokémon names, but I noticed that it struggled when names had longer patterns because it could only see a small amount of information.

To improve this, I built a WaveNet-style model. Instead of just making the model bigger, I changed the architecture so it could use more context. Using FlattenConsecutive layers, the model gradually combined groups of characters, increasing the context window from 3 characters to 8 characters.

The hardest parts of this project were implementing everything manually and debugging how each part worked. Batch normalization was difficult because I had to keep track of running statistics for inference, and backpropagation required me to calculate the gradients for every layer instead of using automatic tools.

After comparing the MLP and WaveNet models, I learned that improving a model is not always about adding more parameters. Sometimes changing how the model processes information, especially context, can make a bigger difference.

I also deployed the final WaveNet model using Hugging Face Spaces and Gradio. The demo loads the trained NumPy weights and generates new Pokémon names without using any deep learning frameworks.

This project helped me understand the foundations of language models and gave me a better idea of what is actually happening inside neural networks when they learn.
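The batch normalization bookkeeping mentioned above can be sketched as follows. This is a simplified, assumed version for 2D inputs of shape (batch, features); the class name, the `training` flag, and the momentum of 0.1 are choices made for the example, and a WaveNet-style model with 3D activations would also need to average over the time dimension.

```python
import numpy as np

class BatchNorm1d:
    """Batch norm for (batch, features) inputs (hypothetical sketch)."""

    def __init__(self, dim, eps=1e-5, momentum=0.1):
        self.eps, self.momentum = eps, momentum
        self.training = True
        self.gamma = np.ones(dim)          # learnable scale
        self.beta = np.zeros(dim)          # learnable shift
        self.running_mean = np.zeros(dim)  # used only at inference
        self.running_var = np.ones(dim)

    def forward(self, x):
        if self.training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            m = self.momentum
            # Exponential moving averages, saved for inference time.
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            mean, var = self.running_mean, self.running_var
        self.inv_std = 1.0 / np.sqrt(var + self.eps)
        self.xhat = (x - mean) * self.inv_std
        return self.gamma * self.xhat + self.beta

    def backward(self, dout):
        # Valid for training mode, where mean and var depend on x.
        N = dout.shape[0]
        self.dgamma = (dout * self.xhat).sum(axis=0)
        self.dbeta = dout.sum(axis=0)
        dxhat = dout * self.gamma
        return self.inv_std / N * (
            N * dxhat
            - dxhat.sum(axis=0)
            - self.xhat * (dxhat * self.xhat).sum(axis=0)
        )


rng = np.random.default_rng(0)
bn = BatchNorm1d(4)
x = rng.standard_normal((32, 4)) * 3 + 5
y_train = bn.forward(x)          # normalizes with batch statistics
bn.training = False
y_single = bn.forward(x[:1])     # one sample: uses the running statistics
print(y_train.mean(axis=0).round(6), y_single.shape)
```

The running statistics matter because generation feeds the network one context at a time, and a batch of size one has no meaningful mean or variance of its own.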

