Devlog - 01
A few weeks ago I almost helped find $20K buried somewhere in the American Southwest. ATLAS is the image recognition pipeline I built to try.
It works by taking a feed of images (50+ for best results; I used 625 for my tests) and sending them through various local VLMs for inference (via LM Studio).
I’ve been working on this project for weeks and decided to share it with everyone here.
Currently I am working on Version 3.3.
Previous versions of ATLAS used only one VLM for inference, and I noticed the outputs were significantly off course. So in V3.3 I designed it to use an ensemble system.
Instead of using one model, it cross-checks data between 2 or more, then determines the winning output from all models.
This way, if one model hallucinates and the others don’t, your data won’t be skewed. I am still working out some bugs, though, as V3.3 has been off course by a wide margin.
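The cross-checking idea can be sketched as a simple majority vote. This is only an illustration, not ATLAS's actual code: the function name `consensus` and the `min_agree` parameter are hypothetical, and it assumes each model's answer has already been reduced to a comparable label (a region name, for example).

```python
from collections import Counter
from typing import Optional, Sequence


def consensus(outputs: Sequence[str], min_agree: int = 2) -> Optional[str]:
    """Return the answer most models agree on, or None if there is no clear winner.

    `outputs` holds one answer per model (hypothetical format).
    """
    if not outputs:
        return None
    # Normalize lightly so "Utah" and " utah " count as the same answer.
    counts = Counter(o.strip().lower() for o in outputs)
    ranked = counts.most_common(2)
    winner, votes = ranked[0]
    # Reject a winner that too few models agree on.
    if votes < min_agree:
        return None
    # Reject a tie for first place.
    if len(ranked) > 1 and ranked[1][1] == votes:
        return None
    return winner


# Hypothetical answers from three models for one frame.
print(consensus(["Utah", "utah ", "Nevada"]))
```

With three models, a single hallucinated answer gets outvoted. With two models that disagree, the function returns None, so the frame can be flagged instead of trusted.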
It runs against a user-defined boundary, so instead of having to manually check coordinates yourself, you’d know instantly if a result was incorrect.
Though V3.3 doesn’t really respect the boundary yet, so I need to get that fixed.
(My latest test ran 4,355 km off target. This is a known geometry bug in the coordinate derivation, not the inference pipeline itself.)
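A boundary check like the one described could look roughly like this. It is a sketch under assumptions: the boundary is modeled as a circle (a center point plus a radius in km), which may not be how ATLAS defines it, and `haversine_km` and `inside_boundary` are hypothetical names. The distance uses the standard haversine formula with a mean Earth radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def inside_boundary(lat: float, lon: float,
                    center_lat: float, center_lon: float,
                    radius_km: float) -> bool:
    """True if (lat, lon) lies within radius_km of the boundary center."""
    return haversine_km(lat, lon, center_lat, center_lon) <= radius_km


# Made-up coordinates, purely for illustration.
print(inside_boundary(36.1, -112.1, 36.0, -112.0, radius_km=50))
```

A derived coordinate that fails this check can be rejected right away, which is the "know instantly" behavior the boundary is meant to give.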
The speed at which ATLAS processes images depends entirely on your hardware and the number and type of models you are using.
Example (using Qwen 2.5 7B (Q4_K_M) and Moondream 2 (Q4)):
Estimated Times
-
Best tier GPU: RTX 4090, Dual RTX 3090s, Mac Studio M3 Ultra, etc.
Runtime:
Qwen 2.5 7B: ~0.25 seconds
Moondream 2: ~0.08 seconds
Total: 3 to 4 minutes
-
High tier GPU: RTX 4080, RTX 4070 Ti Super, M3 Max, etc.
Runtime:
Qwen 2.5 7B: ~0.60 seconds
Moondream 2: ~0.20 seconds
Total: 6 to 8 minutes
-
Mid tier GPU: RTX 4060 Ti (16GB), RTX 4070, M3 Pro, etc.
Runtime:
Qwen 2.5 7B: ~2.00 seconds
Moondream 2: ~0.50 seconds
Total: 22 to 30 minutes
-
Low tier GPU: RTX 3050, GTX 1660 Super, Base M1/M2 Mac, etc.
Runtime:
Qwen 2.5 7B: ~5.00 seconds
Moondream 2: ~1.20 seconds
Total: 1 to 1.5 hours
-
CPU Only: Core i9 / Ryzen 9 (32GB DDR5 RAM), etc.
Runtime:
Qwen 2.5 7B: ~35.00 seconds
Moondream 2: ~7.00 seconds
Total: 6 to 8 hours
-
(tested on RX 9070 XT 16GB)
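The totals above are consistent with a simple estimate: the number of images times the summed per-model time. The sketch below assumes the per-model figures are per-image latencies, that the models run one after another, and that the totals refer to the 625-image test set mentioned earlier. None of that is stated explicitly, and `estimate_minutes` is a hypothetical helper.

```python
from typing import Sequence


def estimate_minutes(n_images: int, per_image_seconds: Sequence[float]) -> float:
    """Rough total runtime in minutes, assuming models run sequentially per image."""
    return n_images * sum(per_image_seconds) / 60.0


# Per-image times taken from the tiers listed above (Qwen 2.5 7B, Moondream 2).
best_tier = estimate_minutes(625, [0.25, 0.08])
cpu_only = estimate_minutes(625, [35.0, 7.0])

print(f"Best tier: {best_tier:.1f} min")
print(f"CPU only:  {cpu_only / 60:.1f} h")
```

Swapping in your own image count and model list gives a ballpark before committing to a long run.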
Updated features:
Global Luminance Profiling: maps brightness across the entire dataset to skip night frames before inference, saves ~10% compute
Triple-Model Consensus Pass: runs 3 VLMs in parallel and cross-checks outputs to filter hallucinations
Orbital Mechanics Longitude Tracking: derives longitude from solar noon timing in image timestamps
Haversine Validator: checks derived coordinates against the search boundary using spherical earth math
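The longitude step rests on the fact that the Earth turns 15 degrees per hour, so the UTC time of local solar noon maps to a longitude. The sketch below is a simplified illustration, not the ATLAS implementation: `longitude_from_solar_noon` is a hypothetical name, it assumes the solar noon time has already been estimated from the images and is in true UTC, and it ignores the equation of time, which can shift solar noon by up to about 16 minutes (roughly 4 degrees of longitude).

```python
from datetime import datetime, timezone


def longitude_from_solar_noon(solar_noon_utc: datetime) -> float:
    """Approximate longitude in degrees (east positive) from the UTC time of solar noon."""
    if solar_noon_utc.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    t = solar_noon_utc.astimezone(timezone.utc)
    hours = t.hour + t.minute / 60 + t.second / 3600
    # 15 degrees per hour; solar noon after 12:00 UTC means west of Greenwich.
    lon = (12.0 - hours) * 15.0
    # Wrap into the range [-180, 180).
    return (lon + 180.0) % 360.0 - 180.0


# Made-up example: solar noon observed at 19:00 UTC.
noon = datetime(2024, 6, 1, 19, 0, 0, tzinfo=timezone.utc)
print(longitude_from_solar_noon(noon))
```

One thing this makes visible: every hour of error in the timestamp (a camera clock set to local time, say) moves the result by 15 degrees, so timestamp handling is worth double-checking before anything else.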
Currently working on fixing the shadow angle calculation (it keeps returning unknown for certain vectors) and the search area constraint bias.
(Image shows V3.3 running with a Tri-Qwen ensemble)