aero-deuce
- 4 Devlogs
- 10 Total hours
A 12-billion-parameter AI language model that runs locally on consumer hardware. It's an instruction-following LLM, fine-tuned and post-trained using a technique called QLoRA with the Muon optimizer.
finished landing page
adapter, GGUF, and MLX (Q4) are all on Hugging Face
I made a simple landing page
hosted a custom inference endpoint (might not work very well because I'm broke)
should be able to run locally through tools like Ollama and llama.cpp
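For the Ollama route, here's a rough sketch of calling Ollama's local REST API from Python. It assumes an Ollama server is already running on its default port (11434) and that the GGUF has been pulled; the model tag `hf.co/ZeZZm/aero-deuce-GGUF` is an assumption based on the repo name, and `build_request` / `generate` are just illustrative helper names.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; change it if yours differs.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumption: hypothetical model tag derived from the GGUF repo name.
MODEL_TAG = "hf.co/ZeZZm/aero-deuce-GGUF"


def build_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG, timeout: float = 120.0) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False, the full reply comes back in the "response" field.
    return body["response"]
```

With the server up, something like `generate("Explain QLoRA in one sentence.")` should return the reply as a string. Nothing here touches the network until `generate` is called.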
I made a custom AI model called Aero-Deuce. It’s based on Google’s Gemma 4 12B, but we fine-tuned/post-trained it using a technique called QLoRA with a custom optimizer called Muon. The idea was to make a small, efficient instruction-following model that runs locally on consumer hardware.
We trained it on 30,000 instruction-following examples for 2,000 steps. The training loss dropped from 3.82 to 0.57. It runs at about 14-16 tokens per second on a MacBook Air M5.
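To put those numbers in context, here's a quick back-of-the-envelope calculation. It assumes the loss is the usual mean cross-entropy in nats (so perplexity is exp(loss)) and that the 2,000 steps make a single pass over the 30,000 examples. Neither is stated above, and the 500-token reply length is just an illustrative figure, so treat this as a sketch.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss_nats)


def examples_per_step(num_examples: int, num_steps: int, epochs: float = 1.0) -> float:
    """Effective examples consumed per optimizer step (assumes `epochs` passes)."""
    return num_examples * epochs / num_steps


def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate `num_tokens` at a steady decode speed."""
    return num_tokens / tokens_per_second


start_ppl = perplexity(3.82)   # perplexity at the start of training
end_ppl = perplexity(0.57)     # perplexity at the end of training
batch = examples_per_step(30_000, 2_000)  # assumes exactly one epoch

# Hypothetical 500-token reply at the slow and fast ends of the 14-16 tok/s range.
slow = seconds_to_generate(500, 14)
fast = seconds_to_generate(500, 16)

print(f"perplexity: {start_ppl:.2f} -> {end_ppl:.2f}")
print(f"examples per step (1 epoch): {batch:.1f}")
print(f"500-token reply: {fast:.1f}s to {slow:.1f}s")
```

Under those assumptions the loss drop corresponds to perplexity going from about 45.6 to about 1.77, and each step would cover 15 examples. A drop in training loss alone doesn't say how the model does on held-out data.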
The model is exported in three formats: a raw LoRA adapter (for merging with the base model), a GGUF file (runs in llama.cpp, LM Studio, Ollama, etc.), and an MLX version (optimized for Apple Silicon). All three will be on Hugging Face soon. The code is on GitHub.
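Since the raw adapter is meant to be merged with the base model, here's a tiny pure-Python sketch of what merging does for a single weight matrix, using the standard LoRA formulation W' = W + (alpha / r) * B @ A. The matrices, rank, and alpha below are made-up toy values, not Aero-Deuce's actual config, and `merge_lora` is a hypothetical helper. In practice a library like PEFT does this across every adapted layer.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix multiply: (n x k) @ (k x m) -> (n x m)."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float) -> Matrix:
    """Fold a LoRA update into a base weight: W + (alpha / r) * B @ A."""
    rank = len(lora_a)               # A is (r x in), so rank = number of rows
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)   # (out x r) @ (r x in) -> (out x in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example: out=2, in=3, rank=1 (illustrative values only).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0]]      # r x in
B = [[0.5], [-1.0]]        # out x r

merged = merge_lora(W, A, B, alpha=2.0)
print(merged)
```

The merged matrix has the same shape as the base weight, which is why a merged model can be exported to formats like GGUF or MLX without carrying the adapter separately.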
Still need to benchmark it against the base model and submit it to the Hugging Face leaderboard.
code: github.com/Ryz3nPlayZ/aero-deuce
adapter: huggingface.co/ZeZZm/aero-deuce
gguf: huggingface.co/ZeZZm/aero-deuce-GGUF
mlx: huggingface.co/ZeZZm/aero-deuce-MLX
45% of the way through the test post-training run, but I'm already broke