I built pokepy, a character-level language model that generates Pokémon-sounding names from scratch using only NumPy. The main goal of this project was to understand how neural networks actually work instead of relying on libraries that hide the details. I did not use PyTorch or autograd, so I had to build the important parts by hand: embeddings, linear layers, batch normalization, activations, loss functions, backpropagation, and gradient updates.
I first built a simple MLP model. It used a 3-character context window, meaning it only looked at the previous 3 characters to predict the next one. The model learned basic patterns from Pokémon names, but it struggled with names that had longer patterns because it could only see a small amount of information.
To improve this, I built a WaveNet-style model. Instead of just making the model bigger, I changed the architecture so it could use more context. Using FlattenConsecutive layers, the model gradually merged groups of adjacent characters, increasing the context window from 3 characters to 8 characters.
The hardest parts of this project were implementing everything manually and debugging how each part worked. Batch normalization was difficult because I had to keep track of running statistics for inference, and backpropagation required me to derive the gradients for every layer instead of using automatic tools.
After comparing the MLP and WaveNet models, I learned that improving a model is not always about adding more parameters. Sometimes changing how the model processes information, especially context, can make a bigger difference.
I also deployed the final WaveNet model using Hugging Face Spaces and Gradio. The demo loads the trained NumPy weights and generates new Pokémon names without using any deep learning frameworks.
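To make the FlattenConsecutive idea concrete, here is a minimal NumPy sketch. It is not the actual pokepy code: the class layout, the group size of 2, the toy embedding width of 3, and the omission of the linear and batch normalization layers that would normally sit between the flatten steps are all assumptions made for illustration.

```python
import numpy as np


class FlattenConsecutive:
    """Merge every n consecutive time steps into one wider feature vector.

    Hypothetical layout for illustration; the real pokepy layer may differ.
    """

    def __init__(self, n):
        self.n = n
        self.in_shape = None

    def forward(self, x):
        # x has shape (batch, time, channels)
        batch, time, channels = x.shape
        assert time % self.n == 0, "time steps must be divisible by n"
        self.in_shape = x.shape  # remembered for the backward pass
        out = x.reshape(batch, time // self.n, channels * self.n)
        if out.shape[1] == 1:
            # Only one group left: drop the time axis entirely
            out = out.reshape(batch, channels * self.n)
        return out

    def backward(self, grad_out):
        # The forward pass only rearranges values, so the gradient
        # is simply reshaped back to the input shape.
        return grad_out.reshape(self.in_shape)


# Toy input: 2 names, 8 characters of context, 3-dimensional embeddings
x = np.arange(2 * 8 * 3, dtype=float).reshape(2, 8, 3)

# Assumption: three layers with n=2 and nothing in between
layers = [FlattenConsecutive(2) for _ in range(3)]

h = x
shapes = [h.shape]
for layer in layers:
    h = layer.forward(h)
    shapes.append(h.shape)
print(shapes)

# Backward pass: push a dummy gradient through the layers in reverse
grad = np.ones_like(h)
for layer in reversed(layers):
    grad = layer.backward(grad)
print(grad.shape)
```

With these assumptions the 8 time steps shrink to 4, then 2, then 1, so the shapes go from (2, 8, 3) to (2, 4, 6), (2, 2, 12), and finally (2, 24). Each step pairs up neighbouring positions, which is how the network can see all 8 characters without flattening them in a single layer.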
This project helped me understand the foundations of language models and gave me a better idea of what is actually happening inside neural networks when they learn.