qb - Stardance

@qb on [Scale Replica] DIY Server Rack · 2 days ago

4h 56m 28s logged

DRAM prices are so bad that I’m building a server rack (in Fusion)

The goal is to build a full 4U server rack, scale it down and 3D print it. Then, print a full shelf of racks. Keep going and you get infinite compute.

What I’ve done:

Fully modelled GPU (design adapted from 5090/RTX Pro 6000), M.2 SSD, SFF and Network Adapter Cards, Power Supply
Partially modelled Dual-CPU EATX Motherboard.

Every part has been modelled by hand from reference materials. As the end result will be scaled down, many parts have been widened so that they remain printable at the smaller size.
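
A rough sketch of the scale-down arithmetic, to show why thin parts need widening. The 1:10 scale, the 0.8 mm minimum printable wall and the helper names are all assumptions for illustration and do not come from the actual Fusion model; only the rack-unit height (1U = 44.45 mm) is a standard figure.

```python
# Hypothetical helpers for checking which features survive a scale-down.
RACK_UNIT_MM = 44.45   # 1U = 1.75 in, the standard rack-unit height
MIN_WALL_MM = 0.8      # assumption: about two passes of a 0.4 mm nozzle


def scaled(dimension_mm: float, scale: float) -> float:
    """Return a full-size dimension after uniform scaling."""
    return dimension_mm * scale


def widened_source_size(dimension_mm: float, scale: float,
                        min_printed_mm: float = MIN_WALL_MM) -> float:
    """Return the full-size dimension to model so the print stays printable.

    Features that already scale to at least min_printed_mm are left unchanged.
    """
    return max(dimension_mm, min_printed_mm / scale)


scale = 1 / 10                                     # assumed scale, not from the post
chassis_height = scaled(4 * RACK_UNIT_MM, scale)   # 4U of height after scaling
pcb_model = widened_source_size(1.6, scale)        # 1.6 mm is a common PCB thickness

print(f"4U at 1:10 is {chassis_height:.2f} mm tall")
print(f"Model the PCB {pcb_model:.1f} mm thick so it prints at {MIN_WALL_MM} mm")
```

At this assumed scale a 1.6 mm board would shrink to 0.16 mm, well below what most FDM printers can lay down, so it has to be modelled several times thicker than the real part.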

Thank you for your support!


@qb on waifmark-2 · 12 days ago

5h 5m 9s logged

Dev2

It’s been so long but the UI looks a little better now (?) The process has been basically fully streamlined:

HF model download
-> pick quant/download type
Benchmarking
-> start server -> run benchmark, score/100 evaluation, ETA and live console monitoring
Result auditing (Will continue to work on this)
-> human review responses flagged by LLM judge
Results showcase/leaderboard
-> WIP
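
The score/100 and ETA readouts in the benchmarking step could be tracked along these lines. This is a minimal sketch under assumptions, not the app's actual code: the `ProgressTracker` class, the 0.0 to 1.0 per-question points and the average-pace ETA are all hypothetical.

```python
import time
from typing import Callable, Optional


class ProgressTracker:
    """Track scored benchmark questions and estimate the time remaining."""

    def __init__(self, total: int, clock: Callable[[], float] = time.monotonic):
        self.total = total
        self.done = 0
        self.points = 0.0
        self._clock = clock
        self._start = clock()

    def record(self, points: float) -> None:
        """Record one scored question; points is expected in 0.0 to 1.0."""
        self.done += 1
        self.points += points

    def score_out_of_100(self) -> float:
        """Mean points over the questions scored so far, scaled to 0-100."""
        return 100.0 * self.points / self.done if self.done else 0.0

    def eta_seconds(self) -> Optional[float]:
        """Remaining time, assuming the average pace so far continues."""
        if self.done == 0:
            return None  # no pace to extrapolate from yet
        elapsed = self._clock() - self._start
        return elapsed / self.done * (self.total - self.done)
```

The clock is injectable so the estimate can be checked without waiting on a real run; in a live app the default `time.monotonic` would be used.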

Progress related to benchmarking:

3x question count compared to Waifmark 1
~2x faster parallel automated scoring
Memory system (short/long context) and agentic tool-calling functions (basic file reading, shell, research) are fully implemented as part of local agentic benchmarking.
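
For illustration, parallel scoring of this kind can be sketched with a thread pool. The keyword-match scorer and the item fields below are placeholders I am assuming, not Waifmark's real scoring logic; a real scorer would call the judge model instead.

```python
from concurrent.futures import ThreadPoolExecutor


def score_response(item: dict) -> float:
    """Placeholder scorer: 1.0 if the expected keyword appears, else 0.0.

    A real scorer would send the response to a judge model here, which is
    I/O-bound work and therefore a reasonable fit for threads.
    """
    return 1.0 if item["expected"].lower() in item["response"].lower() else 0.0


def score_all(items: list, max_workers: int = 4) -> list:
    """Score every item concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score_response, items))


sample = [
    {"response": "The file contains 3 lines.", "expected": "3 lines"},
    {"response": "I could not open the file.", "expected": "3 lines"},
]
print(score_all(sample))
```

`pool.map` returns results in input order, so scores line up with their questions even when the workers finish out of order.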


@qb on waifmark-2 · 15 days ago

2h 35m 35s logged

Devlog 0.1

(For a future project.)
I’m trying out LLM benchmarking on my M1 MacBook Pro, which probably needs a break!

Prior to Stardance -
Waifmark 1 was a benchmark testing local agentic capabilities and speech persona of small locally hosted (V)LLMs.
However, my benchmarking process for Waifmark 1 was unstandardised and troublesome, and I kept all the data in Excel, of all places.
Current stage -
Waifmark 2 is in the works. The benchmark is now evaluated by an LLM-as-a-Judge that can flag low-confidence outputs and pass them on for human review (as is industry standard).
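
A minimal sketch of that routing step, assuming the judge returns a confidence value alongside each score. The `Verdict` fields, the 0.7 threshold and the function name are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    question_id: str
    score: float        # judge's score, 0-100
    confidence: float   # judge's self-reported confidence, 0.0-1.0


def split_for_review(verdicts, threshold=0.7):
    """Separate verdicts the judge is confident in from those needing a human.

    Returns (accepted, flagged); anything below the threshold is flagged.
    """
    accepted = [v for v in verdicts if v.confidence >= threshold]
    flagged = [v for v in verdicts if v.confidence < threshold]
    return accepted, flagged


verdicts = [
    Verdict("q1", 90.0, 0.95),
    Verdict("q2", 40.0, 0.55),   # low confidence, goes to human review
    Verdict("q3", 70.0, 0.70),
]
accepted, flagged = split_for_review(verdicts)
print([v.question_id for v in flagged])
```

Only the flagged items reach the reviewer, which keeps the human workload proportional to how often the judge is unsure rather than to the size of the benchmark.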

Using the wonderful Streamlit library, I built a basic app that can download a model from Hugging Face (HF), serve it and benchmark it in 3 steps. Unfortunately I cannot show any more of the process as of now, but I’m very excited to join Stardance and to see what changes can be observed moving from Waifmark 1 -> 2!