zemu - Stardance

Ship Pending review

@zemu on zWork Browser Control · 10 days ago

I made zWork

Everything was challenging, especially the harness engineering. The browser feature was relatively straightforward.

I’m proud of how comprehensive this project is, and of the scale at which it worked.

To fully test zWork, install both the desktop app and the extension. There is a limited web demo at app.tryzwork.app, which doesn't really have any features.

More info is available in the readmes.

4 devlogs
24h


@zemu on zWork Browser Control · 10 days ago

5h 13m 22s logged

I benchmarked zWork (my desktop AI agent) using zbctl (the browser extension) to test its new browser-use abilities. I used the prompt in the picture below. The model answered all but one question correctly. The Google Form used to “benchmark” the model was: https://docs.google.com/forms/d/e/1FAIpQLSfHyJDMoOuDENsrrTmFsGtoDfe-9RRhlB2SDkKgYQlAAXOvhg/viewform

The form included a variety of answer types, including radio buttons and dropdowns. This is by no means a final polished product, but it is certainly a good proof of concept. I will continue working on zWork and improving the harness, which should also improve the model's tool calling and accuracy. Thanks guys!


@zemu on zWork Browser Control · about 1 month ago

12h 55m 39s logged

I tested zbctl using zWork (the agent I'm working on) as well as Claude Code. Initially there wasn't an endpoint for Claude Code, but Claude Code made one by itself. To make things fair, both harnesses were wired up to use deepseek-v4-pro as the model, and after testing the same prompt on the same task, Claude Code basically won. My harness, zWork, needs a lot of work, but that's a bit beside the point: the zbctl bridge extension works.


@zemu on zWork Browser Control · about 2 months ago

2h 16m 36s logged

I made a desktop agent (still in development).
Now I'm making a browser extension to let it attach to browser sessions that users are already signed in to.

Cool stuff:

Push-based DOM snapshots (MutationObserver auto-detects page changes)
Stable element IDs via WeakMap (same element always has same ID)
Works with Gmail, Google Forms, real sessions
Tab management: switch, open, close specific tabs
File upload/download support
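
The stable-ID idea from the list above can be sketched in Python. The extension itself would do this in JavaScript with a `WeakMap`; the sketch below uses Python's `weakref.WeakKeyDictionary` as a stand-in for the same idea. `Element` and `StableIds` are hypothetical names for illustration, not taken from zbctl.

```python
import itertools
import weakref


class Element:
    """Hypothetical stand-in for a DOM node."""

    def __init__(self, tag: str) -> None:
        self.tag = tag


class StableIds:
    """Hands out one ID per element and remembers it while the element is alive."""

    def __init__(self) -> None:
        # Weak keys: an entry disappears once its element is garbage collected,
        # so the registry never keeps removed nodes alive.
        self._ids = weakref.WeakKeyDictionary()
        self._counter = itertools.count(1)

    def id_for(self, element: Element) -> str:
        if element not in self._ids:
            self._ids[element] = f"e{next(self._counter)}"
        return self._ids[element]

    def __len__(self) -> int:
        return len(self._ids)
```

Asking for the same element twice returns the same ID, and nothing has to be cleaned up by hand when an element goes away.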

Technical stuff:

Python FastAPI daemon (port 8788)
Manifest V3 Chrome extension
Content script walks DOM → structured JSON
200ms debounce for settle detection, 3s hard cap
Biggest limitation: Cross-origin iframes blocked by browser security. Can’t interact with YouTube embeds, payment forms, etc. But handles ~80% of normal browsing fine.
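
A rough sketch of how the 200ms debounce and 3s hard cap could interact, written as a pure Python function so it is easy to reason about. This is an assumption about the logic, not the extension's actual code: times are in milliseconds measured from the moment the timer is armed, and `settle_time` is a hypothetical name.

```python
from typing import Iterable

DEBOUNCE_MS = 200   # quiet period mentioned in the devlog
HARD_CAP_MS = 3000  # maximum wait mentioned in the devlog


def settle_time(mutation_times_ms: Iterable[float],
                debounce_ms: float = DEBOUNCE_MS,
                hard_cap_ms: float = HARD_CAP_MS) -> float:
    """Return when a snapshot would fire, in ms after the timer is armed at t = 0.

    Each mutation pushes the deadline to `debounce_ms` after it, but the
    deadline can never move past `hard_cap_ms`.
    """
    deadline = debounce_ms
    for t in sorted(mutation_times_ms):
        if t >= deadline:
            break  # the page was already quiet for long enough
        deadline = min(t + debounce_ms, hard_cap_ms)
    return deadline
```

With mutations at 50, 120, and 400 ms, the snapshot would fire at 320 ms (200 ms after the change at 120 ms); a page that mutates every 100 ms without stopping would be cut off at 3000 ms.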

Right now it does work.

Ship

@zemu on aero-deuce · about 2 months ago

I made Aero-Deuce, a 12-billion-parameter AI language model that runs locally on consumer hardware. It’s an instruction-following chatbot fine-tuned and post-trained from Google’s Gemma 4 12B using a technique called QLoRA with the Muon optimizer. Muon is a relatively new optimizer that orthogonalizes weight-matrix updates to keep training efficient. The model was trained on 30,000 instruction-following examples across 2,000 steps, with the training loss going from 3.82 down to 0.57. The final model is exported in three formats: a GGUF file (~7GB) that works with anything supporting llama.cpp (LM Studio, Ollama, GPT4All), an MLX version optimized for Apple Silicon, and the raw LoRA adapter for anyone who wants to merge it or fine-tune further. I also built a live API endpoint on Modal with streaming support, so anyone can chat with it through a web interface.
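
To make the matrix orthogonalization idea concrete, here is a minimal pure-Python sketch. It is not the Aero-Deuce optimizer code: published Muon implementations typically run a few steps of a tuned polynomial iteration on the momentum matrix, while this sketch swaps in the simpler classic cubic Newton-Schulz iteration and assumes a small, nonzero, full-rank matrix. The helper names are made up for illustration.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def orthogonalize(g: Matrix, steps: int = 30) -> Matrix:
    """Approximate the nearest orthogonal matrix to a full-rank `g`.

    Uses the classic cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X.
    """
    norm = math.sqrt(sum(v * v for row in g for v in row))
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # which is inside the iteration's convergence region.
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * xv - 0.5 * yv for xv, yv in zip(xr, yr)]
             for xr, yr in zip(x, xxt_x)]
    return x
```

The iteration pushes every singular value of the matrix toward 1 while leaving its singular vectors alone, so the update keeps its direction but loses its scale imbalance.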

It's named "Aero-Deuce" because deuce means two, and this is my second attempt at LLMs. My first was a 300M-parameter custom model with some weird architecture, and I didn't have the funds to get that one off the ground.

  The most challenging part was the training pipeline itself. I started on Modal with spot GPUs, which meant my training kept getting preempted. I had to build robust checkpoint resumption just to survive it. Then I switched to Lightning AI for the second half. The base model uses a "unified" architecture (gemma4_unified) that most tooling didn't support — PEFT didn't recognize it, MLX couldn't run it, so I had to manually strip multimodal weights, rename parameter keys, and patch configs to make it work everywhere. Exporting to GGUF required building llama.cpp from source. The HuggingFace uploads stalled repeatedly. There were a lot of nights where nothing worked and I had to dig through error logs at 2am.
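
Checkpoint resumption on preemptible GPUs usually comes down to two things: writing checkpoints atomically and always starting from the newest complete one. The sketch below shows that pattern with a toy loop and JSON files. The file layout, function names, and the `stop_after` parameter (which simulates a preemption) are all assumptions for illustration, not the project's actual pipeline.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_checkpoint(ckpt_dir: Path, step: int, state: dict) -> Path:
    """Write atomically so a preemption never leaves a half-written file."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    final = ckpt_dir / f"step_{step:06d}.json"
    fd, tmp = tempfile.mkstemp(dir=ckpt_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump({"step": step, "state": state}, f)
    os.replace(tmp, final)  # atomic rename within the same directory
    return final


def load_latest(ckpt_dir: Path) -> Optional[dict]:
    """Return the newest complete checkpoint, or None on a fresh start."""
    if not ckpt_dir.is_dir():
        return None
    files = sorted(ckpt_dir.glob("step_*.json"))  # zero-padded, so sorted by step
    return json.loads(files[-1].read_text()) if files else None


def train(ckpt_dir: Path, total_steps: int, save_every: int,
          stop_after: Optional[int] = None) -> int:
    """Toy loop that resumes from the last checkpoint and returns the step reached."""
    ckpt = load_latest(ckpt_dir)
    step = ckpt["step"] if ckpt else 0
    done_this_run = 0
    while step < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            break  # simulated spot-instance preemption
        step += 1  # a real loop would run forward/backward/optimizer here
        done_this_run += 1
        if step % save_every == 0:
            save_checkpoint(ckpt_dir, step, {"note": "model/optimizer state goes here"})
    return step
```

A run killed at step 5 with `save_every=2` restarts from step 4, so at most `save_every - 1` steps of work are lost per preemption.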

  I'm proud that it actually works end to end. A 12B model that you can download as a single 7GB file and run on a MacBook Air — that's real. The training converged well, the model follows instructions properly, it identifies itself as Aero-Deuce instead of claiming to be Gemma. I built the entire pipeline from scratch: data loading with loss masking, the Muon optimizer implementation, dual-optimizer parameter partitioning, checkpoint resumption, the export pipeline, and the inference API. It's not a notebook tutorial — it's a real system.
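
Two of the pipeline pieces named above, loss masking and dual-optimizer parameter partitioning, are small enough to sketch. This is a guess at the general shape, not the repository's code: the partitioning rule (2-D matrices to Muon, embeddings, output head, and everything else to a fallback optimizer such as AdamW) is a common convention that is assumed here, and the parameter names in the usage are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def build_labels(prompt_ids: list[int],
                 response_ids: list[int]) -> tuple[list[int], list[int]]:
    """Concatenate prompt and response; mask the prompt so only the response is trained on."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def partition_params(shapes: dict[str, tuple[int, ...]]) -> tuple[list[str], list[str]]:
    """Split parameter names into a Muon group and a fallback group.

    Assumed rule: 2-D weight matrices go to Muon, except embeddings and the
    output head; everything else goes to the fallback optimizer.
    """
    muon, fallback = [], []
    for name, shape in shapes.items():
        is_matrix = len(shape) == 2
        is_embed_or_head = "embed" in name or "lm_head" in name
        (muon if is_matrix and not is_embed_or_head else fallback).append(name)
    return muon, fallback
```

For example, `build_labels([1, 2, 3], [4, 5])` gives labels `[-100, -100, -100, 4, 5]`, so the three prompt tokens contribute nothing to the loss.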

  To test it: go to the live demo at https://aero-deuce-lander.vercel.app, or download the GGUF from huggingface.co/ZeZZm/aero-deuce-GGUF and open it in LM Studio or GPT4All. If you're on a Mac, you can run pip install mlx-lm and use python -m mlx_lm.generate --model ZeZZm/aero-deuce-MLX --prompt "anything". The code is at github.com/Ryz3nPlayZ/aero-deuce. Everything is Apache 2.0.

Thanks!

4 devlogs
10h
17.74x multiplier
174 Stardust


@zemu on aero-deuce · about 2 months ago

18m 6s logged

Finished the landing page.


@zemu on aero-deuce · about 2 months ago

2h 3m 38s logged

Adapter, GGUF, and MLX (q4) are all on HuggingFace
I made a simple landing page
Hosted a custom inference endpoint (might not work very well because I'm broke)
Should be able to run locally through tools like Ollama and llama.cpp


@zemu on aero-deuce · about 2 months ago

3h 32m 38s logged

I made a custom AI model called Aero-Deuce. It’s based on Google’s Gemma 4 12B, but we fine-tuned/post-trained it using a technique called QLoRA with a custom optimizer called Muon. The idea was to make a small, efficient instruction-following model that runs locally on consumer hardware.

We trained it on 30,000 instruction-following examples for 2,000 steps. The training loss dropped from 3.82 to 0.57. It runs at about 14-16 tokens per second on a MacBook Air M5.

The model is exported in three formats: a raw LoRA adapter (for merging with the base model), a GGUF file (runs in llama.cpp, LM Studio, Ollama, etc.), and an MLX version (optimized for Apple Silicon). All three will be on HuggingFace soon. The code is on GitHub.
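
For anyone wondering what merging the adapter with the base model means numerically, here is a tiny sketch of the standard LoRA merge formula on plain Python lists. It is illustrative only: real merges are normally done per layer with a library, on a full-precision copy of the base weights, and `merge_lora` is a hypothetical helper name.

```python
Matrix = list[list[float]]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, where r is the adapter rank.

    Shapes: W is (out, in), B is (out, r), A is (r, in).
    """
    r = len(a)
    scale = alpha / r
    delta = [[sum(b_row[k] * a[k][j] for k in range(r)) for j in range(len(a[0]))]
             for b_row in b]
    return [[w_v + scale * d_v for w_v, d_v in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]
```

After merging, the adapter is no longer needed at inference time, which is what makes single-file exports like GGUF possible.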

I still need to benchmark it against the base model and submit it to the HuggingFace leaderboard.

code: github.com/Ryz3nPlayZ/aero-deuce
adapter: huggingface.co/ZeZZm/aero-deuce
gguf: huggingface.co/ZeZZm/aero-deuce-GGUF
mlx: huggingface.co/ZeZZm/aero-deuce-MLX


@zemu on aero-deuce · about 2 months ago

3h 55m 45s logged

45% of the way through the test post-training run, but I'm already broke.


@zemu on zWork Browser Control · about 2 months ago

3h 58m 43s logged

magic