Hey everyone!
I'm currently building an Android-based Pokédex app. The goal is simple: point your phone's camera at a Pokémon, and the app uses real-time object detection to identify it, then fetches its detailed stats from PokéAPI.
When I first thought of this project, I wanted to make a Pokédex for all Pokémon. However, some simple calculations gave me a reality check. To get decent accuracy, I aimed to train the object detection model on 100 images per Pokémon. But there are 1,025 Pokémon in total, which would mean more than 100,000 images (1,025 × 100 = 102,500) to collect and label. That would take months, if not years, to do alone, so I quickly narrowed the scope to Generation 1 only: 151 Pokémon, or 15,100 images, which is still a lot to collect and label manually.
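For reference, here's that back-of-the-envelope math as a tiny Python snippet (100 images per Pokémon is my own target, not an established requirement):

```python
IMAGES_PER_POKEMON = 100  # my per-Pokemon target
ALL_POKEMON = 1025        # total number of Pokemon
GEN1_POKEMON = 151        # Generation 1 only

full_dex = ALL_POKEMON * IMAGES_PER_POKEMON
gen1_dex = GEN1_POKEMON * IMAGES_PER_POKEMON

print(f"All Pokemon: {full_dex:,} images")  # All Pokemon: 102,500 images
print(f"Gen 1 only: {gen1_dex:,} images")   # Gen 1 only: 15,100 images
```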
Instead of collecting images manually, I wanted to automate the process using the icrawler Python framework. I initially tried the built-in Google image crawler, but it failed again and again: it returned no images because it couldn't detect the image tags. After some research I found out what was happening. Google frequently changes its CSS class names to prevent bots from scraping, and icrawler relied on hard-coded class names.
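To make that failure mode concrete, here's a small standard-library sketch of why a hard-coded class name is fragile. The markup and class names (thumb-a1, thumb-x9) are made up for illustration; they are not Google's real markup, and this is not icrawler's actual parser:

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collects <img> src values, optionally only those carrying a given CSS class."""

    def __init__(self, required_class=None):
        super().__init__()
        self.required_class = required_class
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.required_class and self.required_class not in classes:
            return  # hard-coded class not found, so the image is silently skipped
        if attrs.get("src"):
            self.srcs.append(attrs["src"])


def collect(html, required_class=None):
    parser = ImgCollector(required_class)
    parser.feed(html)
    parser.close()
    return parser.srcs


# Hypothetical markup: the same results page before and after a class rename.
OLD_PAGE = '<img class="thumb-a1" src="pikachu1.jpg"><img class="thumb-a1" src="pikachu2.jpg">'
NEW_PAGE = '<img class="thumb-x9" src="pikachu1.jpg"><img class="thumb-x9" src="pikachu2.jpg">'

print(collect(OLD_PAGE, "thumb-a1"))  # ['pikachu1.jpg', 'pikachu2.jpg']
print(collect(NEW_PAGE, "thumb-a1"))  # [] (the scraper "finds no images")
print(collect(NEW_PAGE))              # ['pikachu1.jpg', 'pikachu2.jpg']
```

Matching on the img tag alone survives the rename, but on a real page it would also pick up logos and icons, so it trades precision for robustness.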
So I switched to the built-in Bing image crawler and ran a test batch of 5 Pokémon. It worked, and I was happy to see Pokémon images being saved to my computer. But there was another roadblock: I had aimed for a total of 500 images (100 per Pokémon), but it only managed to scrape around 30 images per Pokémon, which is only 30% of my target.
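Here's roughly what a Bing test batch looks like with icrawler's BingImageCrawler. This is a sketch, not my exact script: the five names, the "<name> pokemon" search keyword, and the dataset/<name>/ folder layout are assumptions for illustration. icrawler is imported inside crawl_batch so the counting helpers work without it installed:

```python
from pathlib import Path

# Hypothetical test batch; the exact five Pokemon are just an example.
BATCH = ["Bulbasaur", "Charmander", "Squirtle", "Pikachu", "Jigglypuff"]
TARGET_PER_POKEMON = 100
DATASET_ROOT = Path("dataset")  # hypothetical output folder


def crawl_batch(names, root=DATASET_ROOT, max_num=TARGET_PER_POKEMON):
    """Download up to max_num Bing images per Pokemon into root/<name>/."""
    from icrawler.builtin import BingImageCrawler  # pip install icrawler

    for name in names:
        crawler = BingImageCrawler(storage={"root_dir": str(root / name.lower())})
        crawler.crawl(keyword=f"{name} pokemon", max_num=max_num)


def count_images(root=DATASET_ROOT):
    """Count downloaded files in each per-Pokemon folder."""
    return {
        d.name: sum(1 for f in d.iterdir() if f.is_file())
        for d in sorted(root.iterdir())
        if d.is_dir()
    }


def hit_rate(counts, target=TARGET_PER_POKEMON):
    """Fraction of the overall target actually collected."""
    if not counts:
        return 0.0
    return sum(counts.values()) / (target * len(counts))
```

Calling crawl_batch(BATCH) and then hit_rate(count_images()) gives a quick progress check; with 156 images across 5 Pokémon, that works out to 156 / 500 ≈ 31% of the target.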
To get past this, I'm planning to build a better scraper using Selenium and implement a dynamic scrolling loop to trigger lazy-loaded images. I have attached today's results: I got 156 images in total today, and I'm hoping to reach 500 tomorrow.
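Here's a minimal sketch of the scrolling loop I have in mind: keep scrolling to the bottom until the page height stops growing for a couple of rounds, then read the src of every img on the page. It only relies on Selenium's execute_script, find_elements, and get_attribute. The function names, the 1.5 s pause, and the stop conditions are my own assumptions, and depending on the page the collected URLs may be thumbnails rather than full-size images:

```python
import time


def scroll_until_stable(driver, pause=1.5, max_rounds=50, patience=2):
    """Scroll to the bottom until the page height stops growing.

    `driver` is expected to be a Selenium WebDriver; only execute_script is used.
    Stops after `patience` consecutive rounds with no new height, or after
    `max_rounds` scrolls. Returns the number of scroll rounds performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    stalled = 0
    rounds = 0
    while rounds < max_rounds and stalled < patience:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        time.sleep(pause)  # give lazy-loaded images time to appear
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            stalled += 1  # nothing new loaded this round
        else:
            stalled = 0
            last_height = new_height
    return rounds


def collect_image_urls(driver):
    """Return unique http(s) src values of all <img> elements, in page order."""
    from selenium.webdriver.common.by import By  # pip install selenium

    seen = set()
    urls = []
    for img in driver.find_elements(By.TAG_NAME, "img"):
        src = img.get_attribute("src")
        if src and src.startswith("http") and src not in seen:
            seen.add(src)
            urls.append(src)
    return urls
```

In use, it would look something like this: create webdriver.Chrome(), call driver.get() on a Bing image search URL such as https://www.bing.com/images/search?q=pikachu+pokemon, then call scroll_until_stable(driver) followed by collect_image_urls(driver). The patience counter is there because lazy loading can lag behind the scroll, so a single round with no new height doesn't necessarily mean the results are exhausted.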
Stay tuned for the next update