Okay, it’s been a while…
I’ve been struggling with this project for a long time. First, cleaning up a massive amount of data took ages. By the end, I had around 100k images in total! Then came the training… and honestly, ugh. Constant tweaks and trial-and-error almost drove me insane. Total training time is easily 100+ hours once you count all the restarts and reruns.
But finally, I’ve got something like a working model, and that’s what I’m sharing now. The frontend is done too, so everything is available for you to try. It’s open-source as well (though I couldn’t share the dataset due to uncertain sources and potential legal concerns).
Any feedback is greatly appreciated!
So I’m building an app that counts money from a photo, but honestly, local training has been a total nightmare. Spent 12 hours straight just cleaning and stitching datasets together, only for training on my 3070 to go horribly because VRAM limits forced me to downscale everything. Looking at the test image, you can see the model is detecting something, but it completely misses the fine details and fails to classify the coins because it has zero concept of relative sizes. I’m pivoting to a Transformer-based model (RF-DETR) so it can actually look at the whole picture and compare coin sizes better. Renting a beefy GPU in the cloud to do the training.🙄
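To put rough numbers on why downscaling hurts so much, here is a back-of-the-envelope sketch. Every value in it is a made-up assumption for illustration (two hypothetical coins of 21 mm and 23 mm, a flat top-down photo covering a 600 mm wide scene, a 4000 px original versus a 640 px training input), not the actual dataset or training settings. The helper names are hypothetical too.

```python
def coin_diameter_px(coin_mm, scene_width_mm, image_width_px):
    """Approximate on-image coin diameter, assuming a flat top-down shot."""
    return coin_mm / scene_width_mm * image_width_px


def size_gap_px(coin_a_mm, coin_b_mm, scene_width_mm, image_width_px):
    """Pixel difference between two coin diameters at a given image width."""
    a = coin_diameter_px(coin_a_mm, scene_width_mm, image_width_px)
    b = coin_diameter_px(coin_b_mm, scene_width_mm, image_width_px)
    return abs(b - a)


# Hypothetical values, chosen only to illustrate the effect.
COIN_A_MM = 21.0
COIN_B_MM = 23.0
SCENE_WIDTH_MM = 600.0

for width_px in (4000, 640):  # assumed original width vs. downscaled width
    gap = size_gap_px(COIN_A_MM, COIN_B_MM, SCENE_WIDTH_MM, width_px)
    print(f"{width_px} px wide: coins differ by about {gap:.1f} px in diameter")
```

Under these assumptions the two coins differ by roughly 13 px at full resolution but only about 2 px after downscaling, which is the kind of gap that is easy to lose to blur and annotation noise.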