
zemu

@zemu

Joined June 2nd, 2026

  • 6 Devlogs
  • 2 Projects
  • 1 Ship
  • 15 Votes
silly individual

2h 16m 36s logged

I made a desktop agent (still in development).
Now I'm making a browser extension that lets it attach to browser sessions the user is already signed in to.

Cool stuff:

Push-based DOM snapshots (MutationObserver auto-detects page changes)
Stable element IDs via WeakMap (same element always has same ID)
Works with Gmail, Google Forms, real sessions
Tab management: switch, open, close specific tabs
File upload/download support
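
A rough sketch of the stable-ID idea, not the extension's actual code. The real content script would use a JavaScript WeakMap; this uses Python's closest standard-library analogue, weakref.WeakKeyDictionary. The Element class and the "el-N" ID format are made up for illustration.

```python
import itertools
import weakref


class Element:
    """Hypothetical stand-in for a DOM node (hashed by identity, like a WeakMap key)."""

    def __init__(self, tag: str) -> None:
        self.tag = tag


class ElementIds:
    """Hands out one ID per element and keeps returning it while the element lives."""

    def __init__(self) -> None:
        # Weak keys: an entry disappears once its element is garbage collected,
        # so the table does not keep removed nodes alive.
        self._ids = weakref.WeakKeyDictionary()
        self._counter = itertools.count(1)

    def id_for(self, element: Element) -> str:
        if element not in self._ids:
            self._ids[element] = f"el-{next(self._counter)}"
        return self._ids[element]


ids = ElementIds()
button = Element("button")
field = Element("input")
print(ids.id_for(button), ids.id_for(field), ids.id_for(button))
```

Because the ID is keyed on the object itself rather than on its position in the tree, a re-walk of the DOM after a mutation gives the same ID to the same node.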

Technical stuff:

Python FastAPI daemon (port 8788)
Manifest V3 Chrome extension
Content script walks DOM → structured JSON
200ms debounce for settle detection, 3s hard cap
Biggest limitation: cross-origin iframes are blocked by browser security, so it can't interact with YouTube embeds, payment forms, etc. It still handles roughly 80% of normal browsing fine.
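
The 200ms debounce with a 3s hard cap can be sketched as a tiny state machine. This is only an illustration under assumptions: the class and method names are hypothetical, timestamps are passed in as integer milliseconds, and it is written in Python for readability even though the real observer lives in the extension's JavaScript.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SettleDetector:
    """Decides when a burst of DOM mutations is over (hypothetical helper)."""

    debounce_ms: int = 200    # quiet period that counts as "settled"
    hard_cap_ms: int = 3000   # never wait longer than this after the first mutation
    first_mutation: Optional[int] = None
    last_mutation: Optional[int] = None

    def on_mutation(self, now_ms: int) -> None:
        if self.first_mutation is None:
            self.first_mutation = now_ms
        self.last_mutation = now_ms

    def is_settled(self, now_ms: int) -> bool:
        if self.first_mutation is None:
            return False  # nothing changed, so there is nothing to snapshot
        quiet = now_ms - self.last_mutation >= self.debounce_ms
        capped = now_ms - self.first_mutation >= self.hard_cap_ms
        return quiet or capped

    def reset(self) -> None:
        """Call after sending a snapshot so the next burst starts fresh."""
        self.first_mutation = None
        self.last_mutation = None
```

The hard cap matters on pages that never go quiet (spinners, tickers, live feeds): without it, the debounce timer would keep resetting and no snapshot would ever be pushed.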

right now it does work


Ship

I made Aero-Deuce, a 12-billion-parameter AI language model that runs locally on consumer hardware. It's an instruction-following chatbot fine-tuned and post-trained from Google's Gemma 4 12B using QLoRA with the Muon optimizer, a relatively new optimizer that orthogonalizes its weight-update matrices to keep training efficient. The model was trained on 30,000 instruction-following examples over 2,000 steps, with the training loss going from 3.82 down to 0.57. The final model is exported in three formats: a GGUF file (~7GB) that works with anything built on llama.cpp (LM Studio, Ollama, GPT4All), an MLX version optimized for Apple Silicon, and the raw LoRA adapter for anyone who wants to merge it or fine-tune further. I also built a live API endpoint on Modal with streaming support, so anyone can chat with it through a web interface.
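
To give a feel for what "matrix orthogonalization" means here, below is a minimal pure-Python sketch. It is an assumption-heavy illustration, not the project's optimizer: it uses the classic cubic Newton-Schulz iteration, whereas Muon implementations typically run a tuned higher-order variant for only a few steps on GPU tensors. The function names are made up.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def orthogonalize(update: Matrix, steps: int = 15) -> Matrix:
    """Push all singular values of a full-rank matrix toward 1 (cubic Newton-Schulz)."""
    # Scale by the Frobenius norm so every singular value is at most 1,
    # which keeps the iteration inside its convergence region.
    norm = sum(v * v for row in update for v in row) ** 0.5
    x = [[v / (norm + 1e-12) for v in row] for row in update]
    for _ in range(steps):
        # X <- 1.5 * X - 0.5 * (X X^T) X
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * xv - 0.5 * yv for xv, yv in zip(xr, yr)]
             for xr, yr in zip(x, xxt_x)]
    return x


q = orthogonalize([[2.0, 0.0], [1.0, 1.0]])
print(matmul(q, transpose(q)))  # close to the 2x2 identity matrix
```

The result keeps the "direction" of the update but evens out its scale across directions, which is the property the optimizer relies on.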

It's named "Aero-Deuce" because deuce means two, and this is my second attempt at an LLM. My first was a custom 300M-parameter model with an unusual architecture, and I didn't have the funds to get that one off the ground.

The most challenging part was the training pipeline itself. I started on Modal with spot GPUs, which meant my training kept getting preempted, so I had to build robust checkpoint resumption just to survive it. Then I switched to Lightning AI for the second half. The base model uses a "unified" architecture (gemma4_unified) that most tooling didn't support: PEFT didn't recognize it and MLX couldn't run it, so I had to manually strip multimodal weights, rename parameter keys, and patch configs to make it work everywhere. Exporting to GGUF required building llama.cpp from source. The Hugging Face uploads stalled repeatedly. There were a lot of nights where nothing worked and I had to dig through error logs at 2am.
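
Checkpoint resumption on preemptible GPUs roughly comes down to two things: write checkpoints atomically, and always start by looking for the newest one. The sketch below shows that pattern with a toy JSON state. Everything in it (file naming, the `train` loop, the `preempt_at` switch used to simulate a preemption) is hypothetical; a real run would save model and optimizer tensors instead.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_checkpoint(ckpt_dir: Path, step: int, state: dict) -> Path:
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    final = ckpt_dir / f"step_{step:06d}.json"
    # Write to a temp file first, then rename: a preemption mid-write
    # can never leave a half-written checkpoint under the final name.
    fd, tmp = tempfile.mkstemp(dir=str(ckpt_dir), suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"step": step, "state": state}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, final)
    return final


def load_latest(ckpt_dir: Path) -> Optional[dict]:
    if not ckpt_dir.exists():
        return None
    files = sorted(ckpt_dir.glob("step_*.json"))  # zero-padded, so name order = step order
    return json.loads(files[-1].read_text()) if files else None


def train(ckpt_dir: Path, total_steps: int, save_every: int,
          preempt_at: Optional[int] = None) -> int:
    ckpt = load_latest(ckpt_dir)
    step = ckpt["step"] if ckpt else 0
    state = ckpt["state"] if ckpt else {"updates": 0}
    while step < total_steps:
        if preempt_at is not None and step == preempt_at:
            return step  # simulated preemption: work since the last save is lost
        step += 1
        state["updates"] += 1  # stand-in for one optimizer step
        if step % save_every == 0:
            save_checkpoint(ckpt_dir, step, state)
    return step
```

Calling `train` again after a preemption picks up from the last saved step, so the only cost is the work done since that save.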

I'm proud that it actually works end to end. A 12B model that you can download as a single 7GB file and run on a MacBook Air — that's real. The training converged well, the model follows instructions properly, it identifies itself as Aero-Deuce instead of claiming to be Gemma. I built the entire pipeline from scratch: data loading with loss masking, the Muon optimizer implementation, dual-optimizer parameter partitioning, checkpoint resumption, the export pipeline, and the inference API. It's not a notebook tutorial — it's a real system.
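
For readers unfamiliar with loss masking, here is a minimal sketch of the usual idea: only the response tokens contribute to the loss, and prompt tokens are marked with an ignore value. This is an assumed illustration, not the project's data loader. The function names are hypothetical, -100 is the default `ignore_index` of PyTorch's cross-entropy loss, and the next-token shift is left out to keep it short.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # positions with this label are skipped by the loss


def build_example(prompt_ids: List[int], response_ids: List[int],
                  max_len: int) -> Tuple[List[int], List[int]]:
    """Concatenate prompt and response; mask the prompt part of the labels."""
    input_ids = (prompt_ids + response_ids)[:max_len]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + response_ids)[:max_len]
    return input_ids, labels


def masked_mean_loss(per_token_loss: List[float], labels: List[int]) -> float:
    """Average the loss over unmasked positions only."""
    kept = [loss for loss, label in zip(per_token_loss, labels)
            if label != IGNORE_INDEX]
    return sum(kept) / len(kept) if kept else 0.0


input_ids, labels = build_example([5, 6, 7], [8, 9], max_len=10)
print(labels)  # [-100, -100, -100, 8, 9]
```

Without the mask, the model would also be trained to reproduce the user's instructions, which dilutes the signal from the answers.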

To test it: go to the live demo at https://aero-deuce-lander.vercel.app, or download the GGUF from huggingface.co/ZeZZm/aero-deuce-GGUF and open it in LM Studio or GPT4All. If you're on a Mac, you can run pip install mlx-lm and use python -m mlx_lm.generate --model ZeZZm/aero-deuce-MLX --prompt "anything". The code is at github.com/Ryz3nPlayZ/aero-deuce. Everything is Apache 2.0.

Thanks!

  • 4 devlogs
  • 10h

2h 3m 38s logged

adapter, GGUF, and MLX (q4) are all on Hugging Face
I made a simple landing page
hosted a custom inference endpoint (might not work very well because I'm broke)
should run locally through tools like Ollama and llama.cpp
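
A quick sanity check on the "q4" label, using the ~7GB GGUF size and 12B parameter count quoted in the Ship post. This is back-of-the-envelope arithmetic under assumptions: GB is taken as 10^9 bytes, "12B" is treated as exactly 12 billion weights, and the whole file is treated as weights even though it also holds metadata.

```python
def bits_per_weight(file_bytes: float, num_params: float) -> float:
    """Average storage cost of one parameter, in bits."""
    return file_bytes * 8 / num_params


print(round(bits_per_weight(7e9, 12e9), 2))  # 4.67
```

A bit over 4 bits per weight is what you would expect from a 4-bit quantization, since some tensors and the per-block scales are usually stored at higher precision.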



3h 32m 38s logged

I made a custom AI model called Aero-Deuce. it’s based on Google’s Gemma 4 12B, but we fine-tuned/post-trained it using a technique called QLoRA with a custom optimizer called Muon. the idea was to make a small, efficient instruction-following model that runs locally on consumer hardware.

we trained it on 30,000 instruction-following examples for 2,000 steps. the training loss dropped from 3.82 to 0.57. it runs at about 14-16 tokens per second on a MacBook Air M5.
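
To put those numbers in perspective, here is a small worked calculation. It rests on assumptions the post does not state: that the loss is mean cross-entropy in nats (so perplexity is exp(loss)), and that the 30,000 examples were seen once over the 2,000 steps. The helper names are made up.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean cross-entropy loss measured in nats."""
    return math.exp(loss_nats)


def examples_per_step(num_examples: int, num_steps: int, epochs: float = 1.0) -> float:
    """Effective batch size implied by the dataset size and step count."""
    return num_examples * epochs / num_steps


print(perplexity(3.82))   # about 45.6
print(perplexity(0.57))   # about 1.77
print(examples_per_step(30_000, 2_000))  # 15.0
```

Read that way, the model went from being about as unsure as a 45-way guess per token to under a 2-way guess on its training data. That says nothing about held-out quality, which is what the planned benchmark against the base model would show.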

the model is exported in three formats: a raw LoRA adapter (for merging with the base model), a GGUF file (runs in llama.cpp, LM Studio, Ollama, etc.), and an MLX version (optimized for Apple Silicon). all three will be on HuggingFace soon. the code is on GitHub.

still need to benchmark it against the base model and submit it to the HuggingFace leaderboard.

code: github.com/Ryz3nPlayZ/aero-deuce
adapter: huggingface.co/ZeZZm/aero-deuce
gguf: huggingface.co/ZeZZm/aero-deuce-GGUF
mlx: huggingface.co/ZeZZm/aero-deuce-MLX



3h 55m 45s logged

45% of the way through the test post-training run, but I'm already broke
