https://github.com/ThisisShashwat/wisprflow-sdk/blob/main/demo/demo.md
Hey guys, I’m so excited to announce that I’ve completed everything! The SDK is done, the patch script is done, and now I can make it work 100%. I’ve been coding for over 4 hours since the last devlog. I’m really frustrated that this took an insane amount of time; my JavaScript experience isn’t that deep, but it was enough for me to know what Electron is and how Node works. Even though the code was heavily minified (it was an Electron app bundle), I was able to figure it out. I learned a lot while making this: how to reverse-engineer Electron apps, and how to use patch scripts to help with the reverse engineering. I obviously took help from ChatGPT and Claude here, because I didn’t know a lot, and they’re amazing tools when it comes to learning. So I finally managed to make it; everything is done, and now I’m finally pushing it out!
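Electron apps usually ship their JavaScript inside an `app.asar` archive, which is the typical starting point for this kind of reverse engineering. The post doesn’t say which tools were used; the official `@electron/asar` tool has an `extract` command for unpacking an archive. As a minimal sketch, assuming the standard asar layout (an 8-byte size header, then a length-prefixed JSON index, then the packed file data), the index can also be read directly from Python. The function names are hypothetical, and files marked `unpacked` (stored in `app.asar.unpacked` instead) are skipped.

```python
import json
import struct
from typing import Iterator


def read_asar_index(archive_path: str) -> tuple[dict, int]:
    """Return the parsed JSON index and the absolute offset where file data starts."""
    with open(archive_path, "rb") as f:
        # First pickle: payload size (always 4), then the size of the header pickle.
        _, header_size = struct.unpack("<II", f.read(8))
        header_buf = f.read(header_size)
    # Second pickle: payload size, then the JSON string length, then the UTF-8 JSON.
    _, json_len = struct.unpack("<Ii", header_buf[:8])
    index = json.loads(header_buf[8:8 + json_len].decode("utf-8"))
    return index, 8 + header_size


def iter_files(node: dict, prefix: str = "") -> Iterator[tuple[str, dict]]:
    """Walk the index recursively, yielding (path, entry) for every packed file."""
    for name, entry in node.get("files", {}).items():
        path = f"{prefix}/{name}" if prefix else name
        if "files" in entry:  # directory: recurse into its children
            yield from iter_files(entry, path)
        elif "offset" in entry:  # packed file; symlinks and unpacked files have no offset
            yield path, entry


def read_packed_file(archive_path: str, data_start: int, entry: dict) -> bytes:
    """Read one file's bytes; index offsets are strings relative to data_start."""
    with open(archive_path, "rb") as f:
        f.seek(data_start + int(entry["offset"]))
        return f.read(entry["size"])
```

Once the bundle is extracted, the minified JavaScript can be run through any formatter to make the network calls readable.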
Alright, I am finally ready to ship it. Everything is done, and it looks wonderful. I have added the taskbar and every single feature I wanted, and I also deployed it to Render, so it works flawlessly.
It’s been 1 hour since I started building. The task manager is still left: only its icon is there. I haven’t started on the taskbar at all, so that’s next. The Notepad app is ready, and the Settings app is still a dummy app. Let’s see how I build it.
This is my first devlog for today. So far, I have read the entire guide and the documentation. I am trying to learn how to replicate the structure of an operating system in pure JavaScript. Rather than just doing what the guide does, I am learning how to build a properly structured OS, because I want to make it as modular as possible so I can replicate every part of it in the future.
Adding apps should really feel like installing apps. Installing an app should work the way it does on a real OS, rather than being a bad attempt at hardcoding everything. That is my goal for this: make the foundation very easy so that everything later becomes easy too.
I’m learning through Claude and Antigravity because they teach me properly how to do it, and I can keep asking them questions, so right now I’m using AI to learn. This is my first hour of actual learning rather than coding.
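The "install, don’t hardcode" idea comes down to a central registry that the desktop and taskbar query instead of knowing about each app. The author’s OS is written in JavaScript; to keep all examples here in one language, the pattern is sketched in Python. `AppManifest`, `AppRegistry`, and their fields are hypothetical names, not taken from the guide or the project.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AppManifest:
    """Everything the shell needs to know about an app (hypothetical fields)."""
    app_id: str
    title: str
    icon: str
    launch: Callable[[], str]  # returns a window id in this toy model


class AppRegistry:
    """Single source of truth: the desktop and taskbar read from here."""

    def __init__(self) -> None:
        self._apps: dict[str, AppManifest] = {}

    def install(self, manifest: AppManifest) -> None:
        if manifest.app_id in self._apps:
            raise ValueError(f"app already installed: {manifest.app_id}")
        self._apps[manifest.app_id] = manifest

    def uninstall(self, app_id: str) -> None:
        self._apps.pop(app_id, None)

    def launcher_entries(self) -> list[tuple[str, str]]:
        """(title, icon) pairs for the desktop or taskbar, in install order."""
        return [(m.title, m.icon) for m in self._apps.values()]

    def launch(self, app_id: str) -> str:
        return self._apps[app_id].launch()


registry = AppRegistry()
registry.install(AppManifest("notepad", "Notepad", "notepad.svg", lambda: "window:notepad"))
registry.install(AppManifest("settings", "Settings", "gear.svg", lambda: "window:settings"))
```

Adding a new app then means writing one manifest and calling `install`, with no edits to the taskbar or desktop code.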
The end goal: reverse-engineer the Wispr Flow desktop client and expose its transcription and command APIs through a clean Python interface. Send audio files directly from Python, stream live audio, customize transcription behavior, and receive structured results, with no UI interaction required.
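That goal implies a small public surface: transcribe a file, transcribe a stream, pass options, and get a typed result back. Below is a minimal sketch of that shape, assuming a pluggable transport object. Every name (`WisprClient`, `TranscriptionOptions`, `Transport.send`, the `"text"`/`"language"` response keys) is hypothetical, since the post doesn’t describe the actual wire protocol, and `stream` is only a placeholder that transcribes chunk by chunk.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterable, Iterator, List, Optional, Protocol, Union


@dataclass
class TranscriptionOptions:
    """Hypothetical settings; the real client's options are not documented here."""
    language: Optional[str] = None  # None means "let the backend auto-detect"
    dictionary: List[str] = field(default_factory=list)  # custom names and terms


@dataclass
class TranscriptionResult:
    """Structured result returned to callers instead of a bare string."""
    text: str
    language: Optional[str] = None


class Transport(Protocol):
    """Whatever talks to the reverse-engineered backend; deliberately abstract."""

    def send(self, audio: bytes, options: TranscriptionOptions) -> dict: ...


class WisprClient:
    def __init__(self, transport: Transport, options: Optional[TranscriptionOptions] = None):
        self.transport = transport
        self.options = options or TranscriptionOptions()

    def transcribe_bytes(self, audio: bytes) -> TranscriptionResult:
        # Assumed response shape: {"text": ..., "language": ...}
        raw = self.transport.send(audio, self.options)
        return TranscriptionResult(text=raw["text"], language=raw.get("language"))

    def transcribe_file(self, path: Union[str, Path]) -> TranscriptionResult:
        return self.transcribe_bytes(Path(path).read_bytes())

    def stream(self, chunks: Iterable[bytes]) -> Iterator[TranscriptionResult]:
        # Placeholder: transcribes each chunk independently as it arrives.
        for chunk in chunks:
            yield self.transcribe_bytes(chunk)
```

Keeping the reverse-engineered details behind `Transport` means that if the desktop client changes its protocol, only that one class needs updating, and the rest of the SDK can be tested with a fake transport.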
Well, before I tell you the goal of this project, let me describe why I’m making it and what it is trying to do. For that, I’ll need to share some context.
First of all, hi! I am Shashwat, and it’s really nice to have been given the opportunity to participate in Stardance Hack Club. I did not know how devlogs work, so my first devlog was just a dummy. This is the actual devlog.
I did not know that I was supposed to install Hackatime to track my coding time, and it’s really frustrating that it barely tracked one hour when I’ve been working on this for over 20 hours now, split across days.
Anyway, that does not matter. Coming back to the point: Wispr Flow is a commercial, paid voice-transcription app. What makes it different from Siri’s dictation is that Siri does not know context and does not know your names; it has no personal dictionary. And the moment you start switching between languages, if you’re bilingual and like to speak that way, Siri completely butchers everything. Wispr Flow, on the other hand, removes all the filler words and unnecessary stuff, so you end up with very professional text.
In the long term, I want to build a Jarvis-like home assistant for myself. I already have my whole home assistant setup running on a Raspberry Pi, and I want to automate it with my voice. I haven’t found any model that can do what Wispr Flow can. As the name suggests, you can literally whisper into your microphone and it catches it, even if you’re far away or mumbling half asleep. I don’t know how they’ve accomplished it, but they’ve done it really well.
Again, this is not an advertisement: I have purchased the Pro tier and paid for it, but they don’t offer an API. They used to have one, but it has been sunsetted. I spoke to the owner, who said it was very complicated and not really worth the investment, so they stopped offering it. If I can reverse-engineer it, I have full permission to use it for my personal projects.
Over the past few days, I’ve spent a combined 25 to 30 hours reverse-engineering the Wispr Flow app. So far, I have figured out how their model works, how their queries and calls are structured, and how to structure my own. What’s left is wrapping it into a Python SDK, because I want to be able to programmatically send a specific 10-second voice recording and get the text back in my own Python application. That is the long-term goal.
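For the 10-second use case, the recording has to be read and bounded before it is sent. Here is a minimal sketch using only Python’s standard `wave` module, assuming the recording is an uncompressed PCM WAV file (the post doesn’t say which audio format the Wispr Flow backend expects). `load_clip` and `wav_duration` are hypothetical helper names.

```python
import io
import wave


def load_clip(path_or_file, max_seconds: float = 10.0) -> bytes:
    """Read a PCM WAV file and return at most `max_seconds` of it as new WAV bytes."""
    with wave.open(path_or_file, "rb") as src:
        params = src.getparams()
        max_frames = int(params.framerate * max_seconds)
        frames = src.readframes(min(params.nframes, max_frames))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as dst:
        # Keep the original format; only the length changes.
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(params.framerate)
        dst.writeframes(frames)
    return buf.getvalue()


def wav_duration(data: bytes) -> float:
    """Duration in seconds of a WAV file held in memory."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return w.getnframes() / w.getframerate()
```

The trimmed bytes could then be passed to whatever request-building code the SDK ends up using.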
I’ll talk to you next time. Also, this entire post was dictated with Wispr Flow, and I’m not going to correct anything, because I’m pretty sure it caught everything; I’m just going to hit post. If you find any mistakes, please excuse me, since I was using voice dictation.