subhansh

@subhansh

Joined June 4th, 2026

  • 15 Devlogs
  • 6 Projects
  • 1 Ships
  • 0 Votes
meow
Independent AI researcher and developer
js a 17 y/o claudemaxxing the whole day
and yes WE are 1000x agentic ai engineer 💔
how do i change my pfp ? 🥀

19h 5m 53s logged

rumi - Devlog #7 🍓

This was probably one of the most frustrating and important RUMI development sessions so far.

Most people only see the final discovery reports. They don’t see the chaos that happens before RUMI can generate a single theory. And trust me, there was a lot of chaos.

💔 Bug Fixes, Bug Fixes, More Bug Fixes

I started this session thinking I’d make a few improvements and continue testing.

That did not happen.

RUMI’s architecture has reached a point where changing one thing breaks three others. Every time I fixed 3 bugs, another 5 appeared from somewhere else. At one point I genuinely crashed out because every run ended with a new issue that didn’t even exist before.

It felt like:

fix_bug()
spawn_more_bugs(count=5)

Turns out there’s a reason people say:

“If it’s working, don’t touch it.”

Unfortunately I touched it. A lot. 🥀

The Problem

Even though RUMI was producing interesting discoveries, I still wasn’t satisfied. The reports were often creative and sometimes novel, but they still felt too generic.

The pipeline would generate a theory, evaluate it once, pick a winner, and move on. I wanted RUMI to spend more time refining ideas instead of treating every discovery as a one-shot attempt.

Falling Into The OpenMythos Rabbit Hole

While researching I came across OpenMythos, a community attempt to reverse engineer Anthropic’s rumored Mythos architecture.

Nobody actually knows how Mythos works, but the community has proposed several reasoning patterns that might explain Claude-style outputs.

I spent hours reading through it asking myself:

“What parts of this could actually work inside RUMI?”

Not copying.

Adapting.

The Biggest Change I’ve Ever Made To RUMI 🥀

One idea stood out immediately: recurrent reasoning loops.

Before:

  • Mechanisms
  • Predictions
  • Theory Competition
  • Winner

One pass. One shot.

If mechanism generation was weak, everything downstream became weak too.

So I rebuilt a huge section of the pipeline.

Now RUMI performs a 3-stage recurrent refinement process:

Loop 1 — Exploration

Maximum creativity
Maximum diversity
Broad theory generation

Loop 2 — Refinement

Survivors re-enter the pipeline
Focus on evidence, consistency, and mechanism quality

Loop 3 — Convergence

Lowest creativity
Highest rigor
Final selection of the strongest explanation

Instead of living or dying from a single generation pass, theories now get multiple opportunities to evolve.
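
A minimal sketch of what a three-loop schedule like this could look like. Everything here is an assumption for illustration: the `Theory` class, the `SCHEDULE` values, and the `refine` stand-in (which just nudges a score at random instead of calling a model) are hypothetical and not RUMI's actual code.

```python
import random
from dataclasses import dataclass


@dataclass
class Theory:
    name: str
    score: float


# Hypothetical schedule: (loop name, creativity, number of survivors kept).
SCHEDULE = [
    ("exploration", 1.0, 6),   # broad and diverse
    ("refinement", 0.6, 3),    # survivors re-enter the pipeline
    ("convergence", 0.2, 1),   # lowest creativity, final selection
]


def refine(theory, creativity, rng):
    # Stand-in for a model call: higher creativity allows larger changes.
    delta = rng.uniform(-creativity, creativity)
    return Theory(theory.name, theory.score + delta)


def run_loops(seeds, rng):
    pool = list(seeds)
    for _loop_name, creativity, keep in SCHEDULE:
        pool = [refine(t, creativity, rng) for t in pool]
        # Only the top-scoring theories survive into the next loop.
        pool = sorted(pool, key=lambda t: t.score, reverse=True)[:keep]
    return pool[0]


seeds = [Theory(f"theory-{i}", 0.0) for i in range(10)]
winner = run_loops(seeds, random.Random(0))
print(winner.name)
```

The point of the shrinking `keep` and `creativity` values is that a theory with a weak first pass can still climb in later loops, as long as it survives the cut.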

Preventing Theory Drift

Another idea I adapted was evidence grounding injection.

Recursive systems often suffer from theory drift, where ideas slowly become disconnected from the evidence that originally generated them.

To prevent this, every refinement loop continuously receives:

Literature evidence
Contradictions
Knowledge gaps
Observed anomalies

This keeps discoveries grounded instead of drifting into fantasy.
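
One way to picture the injection step, as a rough sketch: the same evidence bundle is rebuilt into the prompt on every loop, so no loop ever sees a theory without its grounding. The `Evidence` class and `build_prompt` function are hypothetical names, and the prompt layout is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    literature: list = field(default_factory=list)
    contradictions: list = field(default_factory=list)
    knowledge_gaps: list = field(default_factory=list)
    anomalies: list = field(default_factory=list)


def build_prompt(theory_text, evidence, loop_name):
    # Every loop gets all four evidence sections, not just the first one.
    sections = [
        ("Literature evidence", evidence.literature),
        ("Contradictions", evidence.contradictions),
        ("Knowledge gaps", evidence.knowledge_gaps),
        ("Observed anomalies", evidence.anomalies),
    ]
    lines = [f"Loop: {loop_name}", f"Theory: {theory_text}"]
    for title, items in sections:
        lines.append(f"{title}:")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


evidence = Evidence(
    literature=["placeholder finding"],
    contradictions=["placeholder contradiction"],
)
prompts = [
    build_prompt("placeholder theory", evidence, loop)
    for loop in ("exploration", "refinement", "convergence")
]
print(prompts[1])
```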

Smarter Computation

I also added convergence-aware halting.

Not every scientific problem deserves the same amount of computation. Some topics converge quickly while others require deeper exploration.

RUMI can now monitor improvements between loops and eventually stop refining once discoveries begin stabilizing.
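
A small sketch of a halting rule along these lines, assuming each loop reports one best score. The threshold `min_gain`, the `patience` setting, and both function names are hypothetical, and the post doesn't say what RUMI actually measures between loops.

```python
def should_halt(best_scores, min_gain=0.01, patience=1):
    """Return True once the last `patience` loops each improved by less than min_gain."""
    if len(best_scores) < patience + 1:
        return False
    recent = best_scores[-(patience + 1):]
    gains = [later - earlier for earlier, later in zip(recent, recent[1:])]
    return all(gain < min_gain for gain in gains)


def refine_until_stable(step, max_loops=5):
    # `step(i)` runs loop i and returns the best score it reached.
    scores = []
    for _ in range(max_loops):
        scores.append(step(len(scores)))
        if should_halt(scores):
            break
    return scores


# Fake per-loop scores: big jump first, then almost no improvement.
fake_scores = [0.5, 0.7, 0.705, 0.9]
history = refine_until_stable(lambda i: fake_scores[i])
print(history)
```

With these fake scores the run stops after the third loop, because the gain from 0.7 to 0.705 is below `min_gain`. A topic that keeps improving would use all `max_loops`.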

lwk so much happened in these last 17hrs that i cant even include everything in here cause of the word limit 😭🥀

forgive me for the horrible writing and formatting of the devlog 🥀 im so fried rn 💔✌

15h 14m 19s logged

rumi - Devlog #6
June 11, 2026

so yea… pulled an all nighter for this one 🥀

most of the work this time wasn’t adding entirely new systems.

it was stress testing RUMI hard enough to find where the architecture starts breaking.

and honestly i found a lot more than i expected.

  • Track B Architecture Improvements

spent a lot of time refining Track B after the initial curiosity pipeline integration.

instead of just generating curiosity questions, it now runs through much more of the actual discovery stack.

lots of internal changes here that don’t really show up visually but massively affect the quality of the final reports.

💔 Constraint Pipeline Fix

found a pretty nasty issue where parts of the generated curiosity constraints weren’t making it all the way through the pipeline.

after fixing it, the results became immediately obvious.

latest dual-track run:

Track A Unique Theories: 9
Track B Unique Theories: 8
Shared Theories: 0

which is honestly one of the strongest signals i’ve seen so far that the dual-track system is actually working as intended.

both tracks are now exploring completely different hypothesis spaces instead of converging on the same ideas.
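
a rough sketch of how counts like these could be computed, assuming theories are compared by normalized name. that matching rule is an assumption (the post doesn't say how RUMI decides two theories are the same), and exact-name matching won't catch paraphrased duplicates. `normalize` and `compare_tracks` are hypothetical names.

```python
def normalize(name):
    # Lowercase and collapse whitespace so trivial differences don't count.
    return " ".join(name.lower().split())


def compare_tracks(track_a, track_b):
    a = {normalize(t) for t in track_a}
    b = {normalize(t) for t in track_b}
    shared = a & b
    return {
        "a_unique": len(a - shared),
        "b_unique": len(b - shared),
        "shared": len(shared),
    }


# Placeholder theory names, not real RUMI output.
stats = compare_tracks(["Alpha", "Beta "], ["beta", "Gamma"])
print(stats)
```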

  • Claude Fable 5 Experiment

managed to get temporary access to Claude Fable 5 and immediately decided to throw RUMI at it.

before i could even test it though…

the integration broke 💔

spent a while fixing provider compatibility issues and getting everything working again.

eventually got a discovery run started on:

What happens to information when it crosses a black hole event horizon?

Fable managed to reach roughly Phase 8.5 before the free trial looked at RUMI’s request count and basically said:

aight imma head out 😭

  • What Fable Revealed

this ended up being way more valuable than i expected.

while comparing Fable’s outputs with the models i normally run, i noticed a massive difference.

most models tend to generate:

Hidden Variable

Mechanism Description

Prediction

Fable was generating:

Hidden Variable

Mechanism

Equation

Parameter Extraction

Derivation

Numerical Validation

Prediction

actual equations.

actual variables.

actual derivation chains.

actual numerical checks.

🥀 Switching Back To MiMo

after the Fable credits died, i moved everything back to MiMo and started comparing outputs.

and that’s when another weakness became obvious.

the pipeline itself wasn’t failing.

MiMo simply wasn’t producing the mathematical depth needed for the mechanism stage.

which means the current bottleneck isn’t discovery anymore.

it’s mathematical formalization.

Mathematical Formalization Architecture

one of the most interesting things i noticed while testing Claude Fable 5 was how differently it handled mechanism generation.

most models would generate something like:

Hidden Variable

Mechanism Description

Prediction

while Fable was generating:

Hidden Variable

Mechanism

Equation

Parameter Extraction

Derivation

Numerical Validation

Prediction

and honestly that immediately exposed one of RUMI’s biggest weaknesses.

theories and mechanisms were already being generated.

the math wasn’t.

so instead of just complaining about it, i started redesigning the mechanism pipeline around that pattern.

RUMI now has the foundations for a much more quantitative discovery process where mechanisms aren’t just descriptions anymore (which is lwk tuff ig 🥀✌). they now come with:

  • equations
  • variables
  • derivations
  • parameter extraction
  • numerical validation
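
a minimal sketch of what a mechanism record with those pieces could look like. the `Mechanism` fields and the `numerically_validate` helper are hypothetical, and the toy linear law in the example is only there to exercise the check.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Mechanism:
    hidden_variable: str
    description: str
    equation: str                                    # human-readable form
    parameters: dict = field(default_factory=dict)   # extracted values
    derivation: list = field(default_factory=list)   # ordered steps
    prediction: str = ""


def numerically_validate(model, parameters, observations, rel_tol=0.05):
    """Check that model(x, **parameters) matches every observed (x, y) within rel_tol."""
    return all(
        math.isclose(model(x, **parameters), y, rel_tol=rel_tol)
        for x, y in observations
    )


# Toy example: y = k * x with k = 2 (illustrative only).
toy = Mechanism(
    hidden_variable="k",
    description="output scales linearly with input",
    equation="y = k * x",
    parameters={"k": 2.0},
    derivation=["assume linear response", "fit k from data"],
    prediction="doubling x doubles y",
)
good = numerically_validate(lambda x, k: k * x, toy.parameters, [(1, 2.0), (2, 4.1)])
bad = numerically_validate(lambda x, k: k * x, toy.parameters, [(3, 9.0)])
print(good, bad)
```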

Switching Back To MiMo 🥀

after Fable ran out of credits i moved everything back to MiMo and started testing the updated architecture. still doing it cause MiMo is like hella slow bruh. normally RUMI takes up to 30-40 mins but now with MiMo shes taking around 2hr 30mins 💔

3h 13m 29s logged

devlog#6

  • added a guide app
  • and 3 new live wallpapers with parallax effect
  • restyled dock and app icons a bit
  • new songs and covers
  • feature to resize window’s length and breadth

Go check it out!! :3 its done now (hopefully, i dont want more bugs 💔✌)

Here’s the link! go check it out

4h 38m 3s logged

devlog #5
waawaw so i did a complete ui overhaul and polished everything and then added that analog watch and calendar too. also added a parallax effect for the wallpapers, then added 2 more songs and updated the song covers too. then fixed a lot of bugs and animation issues. added 3 more static wallpapers and 3 new live wallpapers (ima show that in the next devlog). and then i also added a cute pet which follows ur cursor and if u tap on it u can make her sit!! tap again and she will start following ur cursor again.
should i add more reels tho ? 🥀

39m 52s logged

the most app is finally here 😭😭🥀…. drumrolls DOOMSCROLL! 🥀… yea u can now watch 38 exclusive reels on MewoOS (gonna add more later 😺). then i added 3 static interactive backgrounds and now im working on live wallpapers too. andd thennn aaaa ye i added those good looking app icons created from figma and again fixed allat of bugs ofc 🥀…. then improved the animations and glass morphism effect and restyled all of the panels and basically the whole ui lol 😭…. and added more gud pics in the gallery, and the song count in the music player app went from 6 to 20!! and yes they are real songs and not js some random ahh chimes lol anddddd aaa idk more… maybe that was it for this devlog
holdup now lemme cook the livewallpapers and polish things up

3h 49m 37s logged

devlog #3 - MewoOS

Added a booting animation, fixed allat of bugs, and improved the dock, animations, and ui (will show these in the next devlog)

59m 9s logged

Devlog 2 — MewoOS

yayyayayay its finally coming together… basic version is done now. ive added some more features and apps and polished the whole ui a bit too. should i add an app for doomscrolling? 🥀

anyways heres the details of what it has rn :-

  • added an iframe browser because the jam said “be creative” and my creativity said “put the entire internet inside a window inside a fake OS inside a browser.” it loads Wikipedia and GitHub reliably. most other sites block iframes so its basically a very expensive Wikipedia viewer. but it works and im counting it as a feature

  • also the basic version of music player was alr done but my first thought was “the music player needs REAL music”: not some placeholder beeps, actual songs. so i set up yt-dlp and went on a downloading spree at unholy hours. we got Cruel Angel’s Thesis, YOASOBI’s Idol, Ado’s Usseewa, Night Dancer, Fly Me to the Moon, Feeling Good by Nina Simone, Chopin Ballade No.4 in G minor, Rachmaninoff Piano Concerto No.2 (the full 37MB concerto because i have no concept of file size restraint), Paganini Caprice No.24, and some jazz tracks for the cultured folks. HTML5 audio API handles all of it: play, pause, skip, progress bar, the whole thing. no npm packages, no dependencies, just vibes and <audio> tags

  • the UI went through approximately 47 iterations of “is this too pink” vs “is this dark enough” — settled on a deep black (#0A0709) with sakura pink accents and floating ambient glow orbs that drift around the screen like little ghosts of my questionable design choices. windows have glass panels with backdrop blur, the dock has custom SVG icons (not emojis, im not a monster), and the whole thing radiates “what if a gothic cathedral had a kawaii phase” energy 💔💔

37m 11s logged

wawawa basic frontend almost done (js have to polish it now)

10h 39m logged

Devlog #5 — RUMI Dual-Track Architecture

so as yall know yesterday i finished adding the curiosity pipeline
(wait… you dont know about this? naaa💔go read yesterday’s devlog >:3…WUT ? GO RN)

but yea that was mostly just the foundation

today was about actually refining it and wiring it into the rest of RUMI

basically i ended up bringing a lot of the Track A discovery pipeline into Track B too because like… why wouldnt i 😭

not just straight up copy-pasting it though because then both tracks would end up doing the same thing

instead i reused parts of the architecture while keeping the curiosity systems separate

the biggest addition was making the generated curiosity constraints actually flow into the discovery process itself

which sounds simple until you spend half the day finding out they werent actually being propagated through the pipeline properly 💔

and then came the debugging arc 😭

DNS resolution failures causing discovery runs to randomly die 😾
(like wdym rumi requested for more papers and data and dns forgetting that internet exists)

curiosity constraints not propagating into the pipeline😫
Semantic Scholar 429 rate limits 🥀
Gemini 503 errors 😢
free-tier rate limits everywhere (she makes like 6-7k requests in one run )😺🥀

and somehow RUMI has become so LLM hungry that i had to run her on 13 API keys across 3 different providers just to keep discovery sessions alive 😭 (lwk running on hope and sum duct tape 🥀🥀)
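
a rough sketch of one common way to survive 429s and 503s with several keys: rotate to the next key on a retryable error and back off exponentially between attempts. this is a generic pattern, not RUMI's actual client. `ProviderError`, `call_with_rotation`, and the fake sender are hypothetical, and a real client would plug its own request function in as `send`.

```python
import itertools
import time

RETRYABLE = {429, 503}  # rate limited, service unavailable


class ProviderError(Exception):
    def __init__(self, status):
        super().__init__(f"provider returned {status}")
        self.status = status


def call_with_rotation(send, keys, max_attempts=6, base_delay=1.0, sleep=time.sleep):
    key_cycle = itertools.cycle(keys)
    for attempt in range(max_attempts):
        key = next(key_cycle)
        try:
            return send(key)
        except ProviderError as err:
            # Give up on non-retryable errors or when attempts run out.
            if err.status not in RETRYABLE or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...


# Fake sender: the first two calls are rate limited, the third succeeds.
calls = []


def fake_send(key):
    calls.append(key)
    if len(calls) < 3:
        raise ProviderError(429)
    return f"ok via {key}"


delays = []
result = call_with_rotation(fake_send, ["k1", "k2", "k3"], sleep=delays.append)
print(result, delays)
```

passing `sleep` in as a parameter keeps the example testable without actually waiting.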

there was genuinely a point where i was about to crash out because providers kept rate limiting, APIs were dying, runs were hanging, and everything was fighting me at once 💔🥀😭

but after all that RUMI finally completed a full dual-track discovery run

topic:

“Why does the universe expand faster than expected?”

results:

Track A Winner
→ Scale-Dependent Effective Gravity

Track B Winner
→ Temporal Vacuum Shear Framework

and this is where things got interesting

Track A and Track B finished with:

0 shared theories <–(🙀)
different winners
different hidden variables
different contradiction sets
different mechanisms
0 runtime errors

RUMI generated:

17 theories
30 mechanisms
22 hidden variables
12 contradictions
12 predictions

from a single discovery session 😭

and honestly the most important number isnt the score

its this:

Shared Theories: 0

because it means Track B isnt just making a slightly remixed version of Track A anymore

both tracks started from the same question, same literature, same knowledge base, and still ended up exploring completely different hypothesis spaces

which is pretty much the entire reason i built the curiosity system in the first place 😭🥀

still more work to do, but today feels like one of the biggest architecture milestones for RUMI so far💔🥀
lwk gonna try to improve GFlowNets tmrw

here’s the updated pipeline diagram :-

Devlog #5 — RUMI Dual-Track Architecture

so as yall know yesterday i finished adding the curiosity pipeline
(wait… you dont know about this? naaa💔go read yesterday’s devlog >:3…WUT ? GO RN)

but yea that was mostly just the foundation

today was about actually refining it and wiring it into the rest of RUMI

basically i ended up bringing a lot of the Track A discovery pipeline into Track B too because like… why wouldnt i 😭

not just straight up copy-pasting it though because then both tracks would end up doing the same thing

instead i reused parts of the architecture while keeping the curiosity systems separate

the biggest addition was making the generated curiosity constraints actually flow into the discovery process itself

which sounds simple until you spend half the day finding out they werent actually being propagated through the pipeline properly 💔

and then came the debugging arc 😭

DNS resolution failures causing discovery runs to randomly die 😾
(like wdym rumi requested for more papers and data and dns forgetting that internet exists)

curiosity constraints not propagating into the pipeline😫
Semantic Scholar 429 rate limits 🥀
Gemini 503 errors 😢
free-tier rate limits everywhere (she makes like 6-7k requests in one run )😺🥀

and somehow RUMI has become so LLM hungry that i had to run her on 13 API keys across 3 different providers just to keep discovery sessions alive 😭 (lwk running on hope and sum duct tape 🥀🥀)

there was genuinely a point where i was about to crash out because providers kept rate limiting, APIs were dying, runs were hanging, and everything was fighting me at once 💔🥀😭

but after all that RUMI finally completed a full dual-track discovery run

topic:

“Why does the universe expand faster than expected?”

results:

Track A Winner
→ Scale-Dependent Effective Gravity

Track B Winner
→ Temporal Vacuum Shear Framework

and this is where things got interesting

Track A and Track B finished with:

0 shared theories <–(🙀)
different winners
different hidden variables
different contradiction sets
different mechanisms
0 runtime errors

RUMI generated:

17 theories
30 mechanisms
22 hidden variables
12 contradictions
12 predictions

from a single discovery session 😭

and honestly the most important number isnt the score

its this:

Shared Theories: 0

because it means Track B isnt just making a slightly remixed version of Track A anymore

both tracks started from the same question, same literature, same knowledge base, and still ended up exploring completely different hypothesis spaces

which is pretty much the entire reason i built the curiosity system in the first place 😭🥀
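a tiny sketch of how a "shared theories" count could be computed, assuming theories are compared by normalized name. thats an assumption: the post doesnt say how RUMI decides two theories are the same, and `normalize` and `shared_theories` are hypothetical helpers. only the two winners named above are used as data here.

```python
import re


def normalize(name: str) -> str:
    """Lowercase and collapse punctuation/whitespace so near-identical names match."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def shared_theories(track_a, track_b):
    """Return the normalized theory names that appear in both tracks."""
    a = {normalize(t) for t in track_a}
    b = {normalize(t) for t in track_b}
    return a & b


track_a = ["Scale-Dependent Effective Gravity"]
track_b = ["Temporal Vacuum Shear Framework"]
print(len(shared_theories(track_a, track_b)))  # prints 0
```

name matching only catches literal overlap. two theories with different names but the same mechanism would still count as different, so a stricter check would need to compare the mechanisms themselves.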

still more work to do, but today feels like one of the biggest architecture milestones for RUMI so far💔🥀
lwk gonna try to improve GFlowNets tmrw

here’s the updated pipeline diagram :-

mb this was devlog #4 🥀

9h 22m 35s logged

Stardance Devlog #3

June 7, 2026

Today I realized RUMI’s biggest limitation wasn’t math, evidence, reviewers, or the 17-stage pipeline.

It was curiosity.

For days I’ve been making RUMI better at reading papers, finding gaps, generating hypotheses, scoring theories, and validating ideas, which actually improved her a lot…
but something still felt missing.

Then it hit me.

RUMI could analyze research.

RUMI could synthesize research.

RUMI could critique research.

But RUMI never stopped and asked:

“Why does this happen at all?”

And that’s where most discoveries begin🥀

So today I built Phase 0 — The Curiosity Engine.

Before reading papers.

Before building knowledge graphs.

Before generating mechanisms.

RUMI now starts by questioning the problem itself.

What could be causing this?
What assumptions are we making?
What if the accepted explanation is incomplete?
What are we not looking at?

Basically, RUMI now wonders before she researches😭

The first implementation was a disaster.

I merged curiosity directly into the main pipeline and immediately turned the architecture into spaghetti.

So I rebuilt it.

Now there are two separate paths:

Conventional RUMI

Reads
Analyzes
Verifies

Curious RUMI

Questions
Explores
Challenges assumptions

Both run independently and merge into the final report💔
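A minimal sketch of the two-path idea, assuming each path is a plain function that takes the topic and returns its own result. The function names and the report shape are hypothetical placeholders, not RUMI's actual modules. The point is only that neither path sees the other's state until the merge step:

```python
from concurrent.futures import ThreadPoolExecutor


def conventional_track(topic):
    # placeholder for the conventional path: reads, analyzes, verifies
    return {"track": "conventional", "steps": ["read", "analyze", "verify"]}


def curious_track(topic):
    # placeholder for the curious path: questions, explores, challenges assumptions
    return {"track": "curious", "steps": ["question", "explore", "challenge assumptions"]}


def run_dual(topic):
    """Run both paths independently, then merge them into one report."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(conventional_track, topic)
        future_b = pool.submit(curious_track, topic)
        results = [future_a.result(), future_b.result()]
    # merge step: the only place where the two paths meet
    return {"topic": topic, "sections": {r["track"]: r["steps"] for r in results}}


report = run_dual("Why does this happen at all?")
print(sorted(report["sections"]))  # prints ['conventional', 'curious']
```

Keeping the merge as a separate final step is what stops the curiosity logic from tangling with the main pipeline again.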

The funny part is that after building theory tournaments, Bayesian reasoning, evidence systems, mathematical verification, observability checks, and dozens of other modules…

The biggest upgrade might have been teaching RUMI something humans learn as children:

Curiosity🥀😭

Today’s Progress

Phase 0 Curiosity Engine ✅
Dual-Pipeline Architecture ✅
Evidence Extraction Improvements ✅
Observability Checks ✅
Mechanism Completeness Scoring ✅
My Sanity ❌

See you tomorrow.

Hopefully before RUMI starts asking questions that create more problems than answers.
🥀💔

2h 17m 35s logged

spent basically the whole day inside RUMI’s discovery engine and ngl it was pain 💔

fixed 50+ bugs across the pipeline. some of the biggest ones were:

empty hypotheses
theory tournament crashing
unicode errors on windows
graph contamination
novelty scoring issues
mechanism validation deleting everything 😭
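on the unicode one: the post doesnt say what the actual errors were, so this is a guess at a common cause. on Windows the default console and file encoding is often not UTF-8, so emoji or math symbols in a report can raise `UnicodeEncodeError`. a minimal sketch of the usual workaround, with hypothetical helper names (`force_utf8_stdio`, `write_report`, `read_report`):

```python
import sys
from pathlib import Path


def force_utf8_stdio():
    """Switch stdout/stderr to UTF-8 where the stream supports it."""
    for stream in (sys.stdout, sys.stderr):
        # reconfigure() exists on standard text streams since Python 3.7
        if hasattr(stream, "reconfigure"):
            stream.reconfigure(encoding="utf-8", errors="replace")


def write_report(path, text):
    # always pass encoding explicitly instead of relying on the locale default
    Path(path).write_text(text, encoding="utf-8")


def read_report(path):
    return Path(path).read_text(encoding="utf-8")
```

calling `force_utf8_stdio()` once at startup and passing `encoding="utf-8"` on every file read or write tends to cover most of these crashes.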

also built a few new systems:

math verification engine
counterfactual/“what if” hypothesis generation
entity enrichment (NASA, PubChem, UniProt, PDB, etc.)
better novelty scoring
tournament winner override

before the fixes, some discoveries were scoring:

0/100
40/100
45/100

after everything:

FRB Magnetar: 66/100 (B)
KRAS G12D: 65/100 (B)

the coolest run today was on Fast Radio Bursts and magnetars.

RUMI went through 69 papers, generated new hidden variables, built mechanisms around them, made testable predictions, then tried to destroy its own ideas through adversarial testing.

finally it actually felt like the pipeline was working end-to-end instead of randomly exploding lol 😭

still got a long way to go though. biggest weaknesses right now are novelty and mathematical rigor. RUMI is getting better at explaining science, now I need to make her better at finding genuinely new stuff.

back to debugging tomorrow frfr 💔🥀😭

2h 8m 31s logged

yayaya finallyy finally the basic TUI is working and the frontend-backend integration is almost there

7h 42m 13s logged

12 hr (726 mins) in and still not finished debugging 💔🥀
