https://github.com/ThisisShashwat/wisprflow-sdk/blob/main/demo/demo.md
Hey guys, I’m so excited to announce that I’ve completed everything! The SDK is done, the patch script is done, and now I can finally make it work. I’ve been coding for over 4 hours since the last devlog. I’m really frustrated that this took an insane amount of time, because my experience with JavaScript isn’t that extensive, but it was enough for me to know what Electron is and how Node works. Even though the code was heavily minified (it was Electron code), I was able to figure it out. I learned a lot while making this: I learned how to reverse engineer Electron apps, and I learned how to use patch scripts as a reverse-engineering tool. Obviously, I took help from ChatGPT and Claude with the reverse engineering because I didn’t know a lot, and they’re amazing tools when it comes to learning. So I finally managed to make it; everything is done, and now I’ll finally be pushing it out!
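For anyone curious about the Electron side: Electron apps usually ship their JavaScript inside an `app.asar` archive, so a common first step is unpacking it (the official asar CLI has an `extract` command for this). Below is a minimal standard-library sketch, assuming the standard asar layout: an 8-byte Pickle holding the header size, then a Pickle holding a length-prefixed JSON header, then the concatenated file contents. It is a generic illustration, not necessarily the approach used for this project, and `read_asar_header`, `list_files`, and `extract_asar` are hypothetical helpers, not part of wisprflow-sdk.

```python
import json
import struct
from pathlib import Path


def read_asar_header(data: bytes) -> tuple[dict, int]:
    """Parse the JSON header of an asar archive (assumed standard layout).

    Returns the parsed header and the byte offset where file contents begin.
    """
    # Bytes 4..8: size of the header Pickle that follows the first 8 bytes.
    (header_size,) = struct.unpack_from("<I", data, 4)
    # Bytes 12..16: length of the JSON string inside the header Pickle.
    (json_len,) = struct.unpack_from("<I", data, 12)
    header = json.loads(data[16 : 16 + json_len].decode("utf-8"))
    return header, 8 + header_size


def list_files(node: dict, prefix: str = ""):
    """Yield (path, entry) for every file entry in the header tree."""
    for name, entry in node.get("files", {}).items():
        path = f"{prefix}/{name}" if prefix else name
        if "files" in entry:  # a directory: recurse into it
            yield from list_files(entry, path)
        else:
            yield path, entry


def extract_asar(archive: Path, out_dir: Path) -> list[str]:
    """Write every packed file to out_dir; return the extracted paths."""
    data = archive.read_bytes()
    header, base = read_asar_header(data)
    written = []
    for path, entry in list_files(header):
        if entry.get("unpacked") or "offset" not in entry:
            # Unpacked files live in app.asar.unpacked/; symlinks have no offset.
            continue
        start = base + int(entry["offset"])  # offsets are stored as strings
        target = out_dir / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data[start : start + entry["size"]])
        written.append(path)
    return written
```

The extracted JavaScript will usually still be minified, so a formatter is typically the next step before reading it.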
Well, before I tell you the goal of this project, let me describe why I’m making it and what it’s trying to do. For that, I’ll need to share some context.
First of all, hi! I am Shashwat, and it’s really nice to have been given the opportunity to participate in Stardance Hack Club. I didn’t know how devlogs work, so my first devlog was just a dummy. This is the actual devlog.
I didn’t know that I was supposed to install Hackatime to track my coding time, which is really frustrating: it barely tracked one hour, even though I’ve been working on this for over 20 hours now, split across several days.
Anyway, that doesn’t matter. Coming back: Wispr Flow is a commercial, paid voice-transcription app made by a third party. What sets it apart from Siri dictation is that Siri doesn’t know context and doesn’t know your names; it has no personal dictionary for you. And the moment you start switching between languages, if you’re bilingual and like to talk that way, Siri has a really hard time and completely butchers everything. Wispr Flow, on the other hand, removes all the filler words and unnecessary stuff, so you end up with very professional text.
In the long term, I want to build a Jarvis-like home assistant for myself. I already have the whole home assistant setup running on my Raspberry Pi, and I want to control it with my voice. I haven’t found any model that can do what Wispr Flow does. As the name suggests, if you literally whisper into your microphone, it can catch that; even if you’re far away or mumbling half asleep, it still picks it up. I don’t know how they’ve accomplished it, but they’ve accomplished it really well.
Again, this is not an advertisement. I have purchased the Pro tier and paid for it, but they don’t offer an API. Well, they used to have one, but they’ve sunset it and no longer give out access. I spoke to the owners, and they said it was very complicated and not really worth the investment, which is why they stopped offering it. If I can reverse engineer it, I have full permission to do so and use it for my personal projects.
Over the past few days, I’ve spent a combined 25 to 30 hours reverse engineering the Wispr Flow app. So far, I’ve figured out how their model works, what their queries and calls look like, and how to structure them myself. What’s left is wrapping it into a Python SDK, because I want to be able to programmatically send a specific 10-second clip of my voice recording and get the text back in my own Python application. That is the long-term goal.
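The audio-prep half of "send a specific 10 seconds" can be sketched with Python’s standard library alone. This is a minimal sketch, assuming the recording is an uncompressed PCM WAV file; `clip_wav` is a hypothetical helper, not part of the wisprflow-sdk repo, and what format the Wispr Flow backend actually expects is not covered here.

```python
import io
import wave


def clip_wav(wav_bytes: bytes, start_s: float, duration_s: float = 10.0) -> bytes:
    """Return a new WAV (as bytes) with up to `duration_s` seconds from `start_s`.

    Hypothetical helper: keeps the input's channel count, sample width,
    and sample rate. If the clip runs past the end, it is truncated.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = params.framerate
        start_frame = int(start_s * rate)
        n_frames = int(duration_s * rate)
        # Clamp so seeking past the end yields an empty clip instead of an error.
        src.setpos(min(start_frame, params.nframes))
        frames = src.readframes(n_frames)

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(rate)
        dst.writeframes(frames)  # header frame count is fixed up on close
    return out.getvalue()
```

Working on bytes rather than file paths keeps the helper usable for both saved recordings and in-memory microphone captures.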
I’ll talk to you next time. Also, this entire post was dictated with Wispr Flow, and I’m literally not going to correct anything, because I’m pretty sure it caught everything; I’m just going to hit post. If you find any mistakes, please excuse me, since I was using voice dictation.
The end goal: reverse engineer the Wispr Flow desktop client and expose its transcription and command APIs through a clean Python interface. Send audio files directly from Python, stream live audio, customize transcription behavior, and receive structured results, with no UI interaction required.
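The SDK’s real interface isn’t shown in this post, so the following only sketches two pieces the goal implies: splitting raw PCM audio into fixed-duration chunks for streaming, and a typed result object. `TranscriptionResult`, `Transcriber`, and `pcm_chunks` are hypothetical names, not the wisprflow-sdk API, and the 100 ms chunk size is an assumed default, not something Wispr Flow is known to use.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Protocol


@dataclass
class TranscriptionResult:
    """Hypothetical structured result; field names are illustrative only."""
    text: str
    language: Optional[str] = None
    segments: List[str] = field(default_factory=list)


class Transcriber(Protocol):
    """Hypothetical interface an SDK client could satisfy."""
    def transcribe(self, audio: bytes) -> TranscriptionResult: ...


def pcm_chunks(pcm: bytes, sample_rate: int, sample_width: int,
               channels: int = 1, chunk_ms: int = 100) -> Iterator[bytes]:
    """Split raw PCM into chunks of `chunk_ms` milliseconds for streaming.

    Chunk sizes are whole frames, so no sample is split across chunks;
    the final chunk may be shorter.
    """
    frame_bytes = sample_width * channels
    frames_per_chunk = max(1, sample_rate * chunk_ms // 1000)
    step = frames_per_chunk * frame_bytes
    for i in range(0, len(pcm), step):
        yield pcm[i : i + step]
```

Using a `Protocol` keeps callers decoupled from whichever backend ends up doing the transcription, so a fake implementation can stand in during tests.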