
Churu

@Churu

Joined June 4th, 2026

  • 2 Devlogs
  • 4 Projects
  • 2 Ships
  • 15 Votes
Ship Pending review

I built a Python voice assistant called Jarvis. It listens for a wake word (“Jarvis”), processes spoken input, and then responds using text-to-speech or executes basic system actions like opening apps, running commands, or answering queries through an AI/API layer.

How it works (in simple terms)

The flow is basically:

  1. Microphone input
    The program constantly listens through the mic until it detects the wake word.
  2. Wake word detection
    Once “Jarvis” is detected, it switches into active listening mode.
  3. Speech-to-text
    Your speech is converted into text using a speech recognition engine.
  4. Processing layer
    The text is either:
    matched to predefined commands (like opening apps, searching, etc.), or
    sent to an AI model/API for a response
  5. Response output
    The response is converted back to speech and played through speakers.
  6. State control
    It manages states like “listening”, “thinking”, and “speaking” so it doesn’t overlap audio input and output.
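
The processing layer described above (match a predefined command, otherwise fall back to the AI layer) could look roughly like the sketch below. The handler names, trigger phrases, and the `ask_ai` stand-in are all hypothetical and not taken from the project's source.

```python
from typing import Callable, Dict

# Hypothetical handlers; a real assistant would launch apps or run commands here.
def open_browser(text: str) -> str:
    return "opening browser"

def search_web(text: str) -> str:
    # Everything after the trigger phrase is treated as the query.
    query = text.split("search for", 1)[1].strip()
    return f"searching for {query}"

# Trigger phrase -> handler. Checked in insertion order; the first match wins.
COMMANDS: Dict[str, Callable[[str], str]] = {
    "open browser": open_browser,
    "search for": search_web,
}

def ask_ai(text: str) -> str:
    """Stand-in for the AI/API layer; replace with a real API call."""
    return f"(AI answer to: {text})"

def route(text: str) -> str:
    """Send recognized text to a predefined command, else to the AI fallback."""
    normalized = text.lower().strip()
    for trigger, handler in COMMANDS.items():
        if trigger in normalized:
            return handler(normalized)
    return ask_ai(normalized)
```

Keeping the commands in a table means adding a new one is a one-line change, and anything unmatched still gets an answer through the fallback.
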
devlog / what actually went wrong

At the start it looked simple, but most of the time was spent fixing edge cases and weird failures.

Mic input was the first issue. It either didn’t pick up sound or used the wrong device entirely. On my system it worked, but on another setup it silently failed because the default audio index was different. I had to explicitly handle device selection instead of relying on defaults.
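
A minimal sketch of explicit device selection, assuming the audio library in use can report devices as a list of dicts with a `name` and a `max_input_channels` field (that shape is an assumption, as is the function name). The point is to pick by capability and name instead of trusting a default index.

```python
from typing import Dict, List, Optional

def pick_input_device(devices: List[Dict], preferred: Optional[str] = None) -> int:
    """Return the index of an input-capable device, preferring a name match."""
    # Keep only devices that can actually record (assumed field name).
    inputs = [
        (index, device)
        for index, device in enumerate(devices)
        if device.get("max_input_channels", 0) > 0
    ]
    if not inputs:
        # Fail loudly instead of silently listening to nothing.
        raise RuntimeError("No input-capable audio device found")
    if preferred:
        for index, device in inputs:
            if preferred.lower() in device["name"].lower():
                return index
    # No preferred match: fall back to the first device that can record.
    return inputs[0][0]
```

Raising when no input device exists turns the "silent failure" case into an error that shows up immediately on a new machine.
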

Speech recognition was inconsistent too. It would misinterpret short commands or fail when there was background noise. I had to tweak sensitivity and improve how it handled pauses in speech so it wouldn’t cut off early.
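
To illustrate why the pause setting matters, here is a simplified sketch of end-of-utterance detection over per-frame loudness values. It is not the recognition engine's actual logic; `energy_threshold` and `pause_frames` are hypothetical names for the sensitivity and pause-length settings.

```python
from typing import Optional, Sequence

def find_utterance_end(
    energies: Sequence[float],
    energy_threshold: float,
    pause_frames: int,
) -> Optional[int]:
    """Return the index of the last speech frame once `pause_frames`
    consecutive quiet frames follow it, or None if speech never ends."""
    last_speech = None
    quiet_run = 0
    for index, energy in enumerate(energies):
        if energy >= energy_threshold:
            # Loud enough to count as speech: reset the pause counter.
            last_speech = index
            quiet_run = 0
        elif last_speech is not None:
            # Quiet frame after speech has started.
            quiet_run += 1
            if quiet_run >= pause_frames:
                return last_speech
    return None
```

With a pause length of one frame, a brief gap in the middle of a sentence already ends the utterance; a longer pause length lets the speaker breathe at the cost of a slower response.
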

One major mistake was not separating speaking and listening states. The assistant would sometimes hear its own response and trigger itself again, creating loops. I fixed this by adding a simple lock so it cannot listen while speaking.
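
One way to sketch the "cannot listen while speaking" rule is a shared flag that the speaking code sets and the listening code checks. This version uses `threading.Event` as the flag rather than a plain lock, which is an assumption; the class name and the `play`/`capture` callables are hypothetical.

```python
import threading
from typing import Callable, Optional

class AudioGate:
    """Blocks listening while the assistant is speaking."""

    def __init__(self) -> None:
        self._speaking = threading.Event()

    def speak(self, text: str, play: Callable[[str], None]) -> None:
        """Mark the assistant as speaking for the duration of playback."""
        self._speaking.set()
        try:
            play(text)
        finally:
            # Always clear the flag, even if playback raises.
            self._speaking.clear()

    def listen(self, capture: Callable[[], str]) -> Optional[str]:
        """Capture input only when the assistant is not speaking."""
        if self._speaking.is_set():
            return None
        return capture()
```

The `finally` clause matters: if playback fails and the flag is never cleared, the assistant would stay deaf until restarted.
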

The biggest structural issue came later with packaging. It worked perfectly when running python main.py, but broke when converted into an executable. The dist version failed due to missing modules, broken file paths, and hardcoded system-specific directories.

I initially assumed the environment would behave the same after packaging, but PyInstaller changed everything. I had to fix relative paths, remove hardcoded references to my machine, and manually ensure hidden dependencies were included.
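
A common pattern for the path problem, sketched under the assumption that data files are bundled with the executable: PyInstaller sets `sys.frozen` in a packaged app and, for bundled builds, exposes the unpacked data directory as `sys._MEIPASS`. The function names here are hypothetical.

```python
import sys
from pathlib import Path

def app_base_dir() -> Path:
    """Directory that bundled data files are resolved against."""
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        # Running from a PyInstaller bundle: data files are unpacked here.
        return Path(sys._MEIPASS)
    # Running as a plain script: resolve relative to this source file.
    return Path(__file__).resolve().parent

def resource_path(relative: str) -> Path:
    """Build an absolute path to a data file in both run modes."""
    return app_base_dir() / relative
```

Every file access then goes through `resource_path("...")` instead of a path typed out for one machine. Modules that PyInstaller's import analysis misses can be listed explicitly with its `--hidden-import` option.
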

API and environment variables also caused issues. The program would run fine in terminal but fail in packaged form because the .env file or API key wasn’t being loaded correctly in the new runtime context.
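
A minimal sketch of loading the `.env` file from an explicit path and failing early when a key is missing. It uses a small hand-written parser instead of any particular library, and the function names are hypothetical; how the project actually loads its keys is not shown in the post.

```python
import os
from pathlib import Path

def load_env_file(path: Path, override: bool = False) -> dict:
    """Parse KEY=VALUE lines from `path` into os.environ; return what was loaded."""
    loaded = {}
    if not path.is_file():
        # Missing file is not an error here; require_env reports missing keys.
        return loaded
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip("'\"")  # drop surrounding quotes
        if override or key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded

def require_env(name: str) -> str:
    """Return a required variable or raise with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value
```

Passing an absolute path (for example one built from the application's base directory) avoids depending on the working directory, which is often different when a packaged executable is launched.
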

what I learned from it

Most of the problems were not in “AI logic” but in system-level behavior: audio devices, file paths, packaging, and runtime differences.

The main takeaways were:

  • voice input systems are fragile and device-dependent
  • Python packaging is not equivalent to running a script
  • state management matters more than expected in real-time systems
  • most debugging time goes to environment mismatches, not the code logic itself

  • 1 devlog
  • 10h

10h 24m 46s logged

so i made this voice assistant “jarvis” in python
basically it listens for a wake word, then you talk to it, and it replies or runs stuff like opening apps / answering questions / whatever i wired into it
I built a Python voice assistant called Jarvis. It listens for a wake word (“Jarvis”), processes spoken input, and then responds using TTS or executes basic system actions like opening apps, running commands, or answering queries through an AI/API layer.

How it works (in simple terms)

The flow is basically:

  1. Microphone input
    The program constantly listens through the mic until it detects the wake word.
  2. Wake word detection
    Once “Jarvis” is detected, it switches into active listening mode.
  3. Speech-to-text
    Your speech is converted into text using a speech recognition engine.
  4. Processing layer
    The text is either:
    matched to predefined commands (like opening apps, searching, etc.), or
    sent to an AI model/API for a response
  5. Response output
    The response is converted back to speech and played through speakers.
  6. State control
    It manages states like “listening”, “thinking”, and “speaking” so it doesn’t overlap audio input and output.

at first it was kinda simple but then everything started breaking

mic input was the first pain
it either didn’t hear me or picked up the wrong device
spent way too long just figuring out why it was “silent” but actually listening to the wrong mic

then speech recognition started being annoying
like it would randomly misunderstand words or just freeze if i spoke too fast

also made the classic mistake of letting it listen while it was speaking
so it would hear its own voice and start looping responses, so i had to add a lock so it doesn’t listen while talking

biggest headache was the API / env stuff
worked fine when i ran python main.py
then i packaged it and suddenly everything was “missing key / file not found / module not found”

turned out i had hardcoded paths and assumed my machine layout is universal. fixed that by making paths dynamic + cleaning up env loading

also packaging with pyinstaller was pain
some imports just randomly didn’t show up in dist build and i had to manually force include them

now it mostly works but i still feel like it’s fragile,
one change and it might break again.

but yeah overall it taught me:

audio stuff is way harder than it looks
packaging python apps is cursed
and debugging voice stuff is basically just “try everything until it works”

Ship Changes requested

Jarvis 2.0 (Echo) is a desktop AI assistant/software built to automate tasks, launch apps, process commands, and act as a customizable personal assistant. The biggest challenge was connecting different systems together cleanly while keeping everything fast and organized. I’m most proud that it feels like a real product instead of separate scripts. To test it, install dependencies, run the app, and try the built-in commands/features listed in the README.

  • 1 devlog
  • 2h

2h 10m 3s logged

Echo is a simple but powerful Slack bot built to make communication inside a team feel smoother, faster, and less messy. It is designed with the idea that a good bot should not just sit there and reply, but should actually help people get things done without making them waste time on small repetitive tasks. Echo focuses on making everyday workspace interaction easier by responding quickly, keeping things organized, and acting like a helpful digital teammate inside Slack. The project reflects a practical and modern approach to automation, where the goal is not to make something flashy for no reason, but to build something genuinely useful, clean, and easy to work with. What makes Echo special is that it is meant to feel natural in a team environment: simple enough for anyone to use, but still smart enough to be helpful in real workflows. It shows a clear interest in building tools that save time, improve productivity, and make collaboration feel more effortless. Overall, Echo is a thoughtful first step into building real-world assistant software, and it has the foundation to grow into a very capable and reliable workspace companion.

It opens apps and websites on command. So far it is only configured for apps like browsers, games, the terminal, and useful apps like VS Code, but it is already a very useful assistant. It comes with two modes: GUI and CLI.
This is the GUI; the CLI is just a normal terminal where you can speak.

On my Arch system I have set it up as an automatic startup app that runs in the background, so it doesn’t show. Whenever I press the hotkey I assigned to it, i.e. “Ctrl + Space”, it turns on, and as soon as I speak the name, it recognises it and opens the app for me.
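
The "speak the name, open the app" step could be sketched as a lookup from a spoken name to the first installed executable. The app names and candidate binaries below are hypothetical examples, not Echo's actual configuration.

```python
import shutil
from typing import Callable, Dict, List, Optional

# Hypothetical spoken name -> candidate executables, tried in order.
APP_CANDIDATES: Dict[str, List[str]] = {
    "browser": ["firefox", "chromium", "google-chrome"],
    "terminal": ["kitty", "alacritty", "xterm"],
    "code": ["code", "codium"],
}

def resolve_app(
    spoken: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the path of the first installed candidate for a spoken app name."""
    text = spoken.lower()
    for name, candidates in APP_CANDIDATES.items():
        if name in text:
            for executable in candidates:
                path = which(executable)
                if path:
                    return path
    # Unknown app name, or none of its candidates are installed.
    return None
```

The returned path can then be handed to `subprocess.Popen([path])` to launch the app. Looking the executable up with `shutil.which` avoids hardcoding install locations that differ between machines.
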
