Devlog by @shash - Stardance

@shash on WisprFlow SDK · about 2 months ago

19m 56s logged

Well, before I tell you the goal of this project, let me describe why I’m making it and what my project is trying to do. For that I’ll need to share some context.

First of all, hi. I am Shashwat, and it’s really nice that I’ve been given an opportunity to participate in Stardance Hack Club. I did not know how devlogs work, so my first devlog was just a dummy devlog. This is the actual devlog.

I did not know that I was supposed to install Hackatime to track my coding time. It’s really frustrating that it barely tracked one hour when I’ve been working on this for over 20 hours now, split across days.

Anyways, that does not matter. Coming back: Wispr Flow is basically a commercial, paid voice transcription app. The way it’s different from Siri dictation is that Siri does not know context and does not know your names. It does not have a personal dictionary for you, and the moment you start switching between languages, if you’re bilingual and like to mix languages when you talk, it really struggles and completely butchers everything. Wispr Flow, on the other hand, removes all the filler words and unnecessary stuff, so you get very professional text.

In the long term, I want to make a Jarvis-like home assistant for myself. I already have the whole home assistant setup running on my Raspberry Pi, and I want to control it with my voice. I haven’t really found any model that can do what Wispr Flow can. As the name suggests, if you literally whisper into your microphone, it can catch that. Even if you’re far away, even if you’re mumbling half asleep, it can still catch that. I don’t know how they’ve accomplished it, but they’ve accomplished it really well.

Again, this is not an advertisement. I have purchased the Pro tier and paid for it, but they don’t have an API. Well, they had an API, but they have sunsetted it and don’t offer it anymore. I spoke to their owners, and the owner said it was very complicated and not really worth the investment, so that’s why they stopped the API. If I can reverse engineer it, I have full permission to do so and use it for my personal projects.

Over the past few days, I’ve spent a combined 25 to 30 hours reverse engineering the Wispr Flow app. The progress so far is that I have figured out how their model works, how their queries and calls are structured, and how to structure my own. The part that’s left is to wrap it into a Python SDK, because I want to be able to send a specific 10 seconds of my voice recording and get the text back programmatically in my own Python application. That is the long-term goal.
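
A minimal sketch of what the outer shape of such a wrapper could look like, using only the Python standard library. Everything here is an assumption for illustration: `DictationClient`, `clip_wav`, and the `backend` callable are hypothetical names, and the backend is a placeholder for whatever function actually performs the request. None of Wispr Flow's real calls are shown or implied.

```python
import io
import wave
from typing import Callable

# Hypothetical type: any function that takes WAV bytes and returns text.
# The real transport (the reverse-engineered calls) is deliberately not shown.
TranscribeFn = Callable[[bytes], str]


def clip_wav(wav_bytes: bytes, seconds: float = 10.0) -> bytes:
    """Return a WAV holding at most the first `seconds` of the input audio."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        max_frames = int(src.getframerate() * seconds)
        frames = src.readframes(min(max_frames, src.getnframes()))
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        dst.setparams(params)    # same channels, sample width, and rate
        dst.writeframes(frames)  # header frame count is patched to match
    return out.getvalue()


def wav_duration(wav_bytes: bytes) -> float:
    """Length of a WAV in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() / w.getframerate()


def make_silence(seconds: float, rate: int = 16000) -> bytes:
    """Generate a mono 16-bit silent WAV, handy for trying the client out."""
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(rate * seconds))
    return out.getvalue()


class DictationClient:
    """Hypothetical SDK surface: clip the audio, hand it to a backend, return text."""

    def __init__(self, backend: TranscribeFn, max_seconds: float = 10.0):
        self._backend = backend
        self._max_seconds = max_seconds

    def transcribe(self, wav_bytes: bytes) -> str:
        clipped = clip_wav(wav_bytes, self._max_seconds)
        return self._backend(clipped).strip()


if __name__ == "__main__":
    # Stand-in backend so the sketch runs without any network access.
    client = DictationClient(backend=lambda audio: f"{len(audio)} bytes received")
    print(client.transcribe(make_silence(12.0)))
```

Keeping the backend as a plain callable means the clipping and cleanup logic can be tried with a fake function first, and the real request code can be swapped in later without touching the rest of the application.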

I’ll talk to you next time. Also, this entire thing was dictated with Wispr Flow, and I’m literally not going to correct anything because I’m pretty sure it caught everything, so I’m just going to hit post. In case you find any mistakes, just excuse me, because I was using voice dictation.