MonkeySpeak

  • 6 Devlogs
  • 6 Total hours

track how fast you can speak, since that is going to be the new way of communicating with tech

2h 54m 39s logged

monkeyspeak final devlog before ship ( hopefully )


tl;dr

  • so brave and edge broke everything (read "what broke" for more info)
  • deepgram mode now transcribes live text again (words dissolve, wpm moves, momentum still reacts to your voice)
  • brave and edge no longer hang for 25 seconds then fake a mic error
  • production uses a render websocket proxy + vercel env, same path that worked locally
  • github: nothariharan/monkeyspeak (commit 095b731 and earlier speech routing work on main)
  • live: monkeyspeak-delta.vercel.app

what i changed

speech routing (back to something that made sense)

mode / behavior

  • browser: web speech api only. no deepgram hijack on brave/edge mount.
  • deepgram: try deepgram first (proxy → bridge fallback on chrome only). if that fails, fall back to web speech with a clear error.
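
a minimal sketch of that routing, not the actual repo code. the names here (`providerOrder`, `startWithFallback`, `bridgeAllowed`, the provider labels) are made up for illustration; only the ordering rules come from the mode/behavior description above.

```typescript
type Mode = "browser" | "deepgram";
type Provider = "web-speech" | "deepgram-proxy" | "deepgram-bridge";

interface RoutingInput {
  mode: Mode;
  proxyUrl?: string;      // hypothetical: websocket proxy url from env, if configured
  bridgeAllowed: boolean; // hypothetical: true only where the http bridge works (chrome)
}

// Returns the providers to try, in order, for the selected mode.
function providerOrder(input: RoutingInput): Provider[] {
  // browser mode never touches deepgram, whatever the browser is
  if (input.mode === "browser") return ["web-speech"];

  const order: Provider[] = [];
  if (input.proxyUrl) order.push("deepgram-proxy");
  if (input.bridgeAllowed) order.push("deepgram-bridge");
  order.push("web-speech"); // last resort
  return order;
}

interface StartResult {
  provider: Provider;
  errors: string[]; // one entry per provider that failed before this one
}

// Tries each provider in turn and remembers which one failed and why,
// so the ui can report the real failure instead of a generic mic error.
async function startWithFallback(
  order: Provider[],
  start: (p: Provider) => Promise<void>,
): Promise<StartResult> {
  const errors: string[] = [];
  for (const provider of order) {
    try {
      await start(provider);
      return { provider, errors };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${provider}: ${message}`);
    }
  }
  throw new Error(`all speech providers failed: ${errors.join("; ")}`);
}
```

with this shape, brave/edge with no proxy configured go straight to web speech in deepgram mode; a real version would probably also surface a "no proxy reachable" message for that case.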

the ui now shows errors for the provider that actually failed, not a generic “mic blocked” when stt died for other reasons.

deepgram client hardening

  • prefer the websocket proxy (env var, name it whatever you want) over the vercel http bridge
  • brave/edge without a reachable proxy: fail fast with a useful message (no 25s timeout)
  • utterance_end_ms locked to 1000 everywhere (deepgram rejects live ws with 400 below that)
  • bridge watchdog clears on BridgeReady, not on first random chunk
  • server bridge waits for upstream deepgram before closing the socket
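
for the `utterance_end_ms` point, a rough sketch of keeping the value in one place. `buildListenQuery` and `MIN_UTTERANCE_END_MS` are hypothetical names, and clamping (instead of hard-coding 1000) is my assumption; the 1000 floor is from the list above. `interim_results` and `utterance_end_ms` are deepgram live query params as far as i know, so double check against their docs.

```typescript
// Lowest value that is safe to send, per the note above about 400 responses.
const MIN_UTTERANCE_END_MS = 1000;

interface ListenOptions {
  utteranceEndMs?: number; // caller preference, clamped below
}

// Builds the query string for a live listen connection.
// Anything missing, non-finite, or under the floor becomes the floor.
function buildListenQuery(opts: ListenOptions = {}): string {
  const requested = opts.utteranceEndMs;
  const safe =
    typeof requested === "number" && Number.isFinite(requested)
      ? Math.max(MIN_UTTERANCE_END_MS, Math.round(requested))
      : MIN_UTTERANCE_END_MS;

  const params = new URLSearchParams({
    interim_results: "true", // assumption: needed for utterance end events
    utterance_end_ms: String(safe),
  });
  return params.toString();
}
```

both the client and the proxy could call the same helper so the two sides never disagree on the value.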

the transcript fix (the big one)

  • client parses deepgram json whether it arrives as a string, blob, or arraybuffer
  • render proxy forwards deepgram replies as utf-8 text frames instead of opaque binary
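
a sketch of what the client-side half of that fix could look like. `payloadToText` and `extractTranscript` are hypothetical names, not the repo's. the `Results` / `channel.alternatives[0].transcript` / `is_final` fields are how i understand deepgram's live response, so treat the shape as an assumption.

```typescript
type WsPayload = string | ArrayBuffer | ArrayBufferView | Blob;

// Normalises a websocket message payload to text, whatever frame type it came in.
async function payloadToText(data: WsPayload): Promise<string> {
  if (typeof data === "string") return data;
  if (data instanceof Blob) return data.text();
  // ArrayBuffer or a typed array view: decode the bytes as utf-8
  return new TextDecoder("utf-8").decode(data);
}

interface TranscriptUpdate {
  text: string;
  isFinal: boolean;
}

// Pulls the transcript out of one message; returns null for anything else
// (metadata messages, empty transcripts, invalid json).
function extractTranscript(json: string): TranscriptUpdate | null {
  let msg: any;
  try {
    msg = JSON.parse(json);
  } catch {
    return null;
  }
  if (msg?.type !== "Results") return null;
  const text = msg.channel?.alternatives?.[0]?.transcript;
  if (typeof text !== "string" || text.length === 0) return null;
  return { text, isFinal: msg.is_final === true };
}
```

usage would be roughly `socket.onmessage = async (e) => { const update = extractTranscript(await payloadToText(e.data)); ... }`, so a binary frame is no longer silently dropped.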

infra

  • backend/ express + ws proxy deployable on render (render.yaml included)
  • set the env var (whatever name you picked) on vercel
  • redeployed frontend via vercel cli after env update

small polish

  • clearer vad fallback logs (worker load fail vs no voice detected vs timeout)
  • config bar hint on brave/edge when browser mode is selected
  • momentum sprites + gsap monkey states (from earlier in the sprint)

what broke (and why it looked cursed)

  1. brave and edge block or mishandle the vercel http audio bridge (duplex fetch upload), so deepgram connections hung ~25s and never returned transcripts.

  2. those browsers also cannot open an authenticated websocket straight to api.deepgram.com, so they need the render ws proxy (wss://…/api/deepgram/proxy) instead of the chrome-friendly paths.

  3. even after the proxy worked, deepgram sent json in binary ws frames and the client ignored anything that was not a string, so words never updated until we parsed blob/arraybuffer payloads.


what’s next (maybe)

  • vendor ort wasm for vad so the worker stops complaining
  • gsap scale split for clean console
  • render keep-alive or paid instance if cold starts hurt demos
  • preview env on vercel for pr deployments (production is wired today)

sorry if i yapped a lot. the fixes were actually slightly more techy stuff, so i didn't wanna yap about that as well. if u want to know more, lmk in replies :)


Ship #1 Changes requested

monkeyspeak v0 is live 🙊

been building this for a while. it's the spoken equivalent of monkeytype. you get a prompt, hit the mic, read it out loud, and the app tracks how fast and how accurately you speak.

what's in v0:

speed mode
- pick 15s, 30s, 60s, or 120s
- live net wpm (filler words like "um" and "uh" get stripped)
- sentences, numbers, or paste your own custom text
- words dissolve on screen as you nail them
- monkey mascot animates based on your speaking momentum
- personal bests per duration + prompt type
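
to make the net wpm bullet concrete, here is a small sketch. it assumes net wpm just means non-filler words per minute (the real app may also subtract mistakes), and `netWpm`, `tokenize`, and `FILLERS` are hypothetical names.

```typescript
// Filler words that should not count toward speed ("um" and "uh" from the list above).
const FILLERS = new Set(["um", "uh"]);

// Lowercases and splits text into words, dropping punctuation.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
}

// Words per minute after fillers are stripped, rounded to a whole number.
function netWpm(transcript: string, elapsedSeconds: number): number {
  if (elapsedSeconds <= 0) return 0;
  const counted = tokenize(transcript).filter((w) => !FILLERS.has(w));
  return Math.round(counted.length / (elapsedSeconds / 60));
}
```

for example, "um the quick uh brown fox" spoken over 30 seconds counts 4 words, so `netWpm` gives 8.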

clarity mode
- type or paste what you said
- word-level diff against the original prompt
- grades from s down to "needs work"
- practice mode that rebuilds a prompt from words you missed
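
one way a word-level diff like this can work is a longest-common-subsequence walk over the two word lists. this is only a sketch under that assumption, not how the repo necessarily does it; `wordDiff`, `missedWords`, and the `DiffOp` labels are hypothetical.

```typescript
type DiffOp = { kind: "match" | "missed" | "extra"; word: string };

function toWords(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
}

// Aligns prompt words against spoken words.
// "missed" = in the prompt but not said, "extra" = said but not in the prompt.
function wordDiff(prompt: string, spoken: string): DiffOp[] {
  const a = toWords(prompt);
  const b = toWords(spoken);

  // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array<number>(b.length + 1).fill(0),
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] =
        a[i] === b[j]
          ? lcs[i + 1][j + 1] + 1
          : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  const ops: DiffOp[] = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      ops.push({ kind: "match", word: a[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push({ kind: "missed", word: a[i] });
      i++;
    } else {
      ops.push({ kind: "extra", word: b[j] });
      j++;
    }
  }
  while (i < a.length) ops.push({ kind: "missed", word: a[i++] });
  while (j < b.length) ops.push({ kind: "extra", word: b[j++] });
  return ops;
}

// Words a practice prompt could be rebuilt from.
function missedWords(ops: DiffOp[]): string[] {
  return ops.filter((op) => op.kind === "missed").map((op) => op.word);
}
```

the missed list is the natural input for a practice prompt, and the share of "match" ops could feed a grade, though the grade cutoffs are not stated here so the sketch leaves them out.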

ui refresh
- new landing hero: "how fast⚡can you speak 🙊"
- the monkey mascot IS the start button now — click it to begin
- speech-themed doodles, clean flat design, no gradient bloat
- sprite animations via gsap

under the hood
- next.js 14 + typescript + zustand
- web speech api by default (no api key needed)
- optional deepgram integration with server-side proxy (key never hits the browser)
- silero vad for voice activity detection

try it: clone the repo, npm install, npm run dev, allow mic access, go.

github.com/nothariharan/monkeyspeak


26m 39s logged

reworked the ui and got monkeyspeak ready for v0 ship launch :))

it’s basically monkeytype but for your voice — read a prompt out loud, get scored on speed + clarity.

features:

  • speed mode: timed speaking tests (15s / 30s / 60s / 120s) with live wpm + word accuracy
  • clarity mode: paste your transcript, get a word-by-word diff + letter grade
  • click the monkey mascot to start (no separate mic button anymore lol)
  • animated monkey companion that reacts to your speaking energy mid-test
  • words dissolve on screen as you say them correctly
  • personal bests saved locally
  • browser speech api works out of the box, optional deepgram for better stt
  • themes, accent colors, custom fonts
  • keyboard shortcuts (enter to start, tab to reset, escape to stop)

built with next.js, typescript, gsap, zustand.

repo: github.com/nothariharan/monkeyspeak


16m 53s logged

made the ui cleaner, still sorting out the latency here and there

the new approach i am building around is to use the deepgram api at first, then later try to replicate what they do

open to questions or recommendations for the whole stt etc.


36m 14s logged

cooking up 🚀🔥

making it public soon. there are just lots of latency issues in the speech-to-text conversion, who knew it would be such a pain. let me know if u guys think of an alternative for it, or anything at all

thanks for reading :)


1h 52m 58s logged

going with a much better ui, launching soon for yall to try :) 🚀


16m 1s logged

built the first iteration of the website!

got lots of customizing options. now i am working on reducing the latency between what u speak and what gets transcribed

give me your opinions and recommendations :)
