You are browsing as a guest. Sign up (or log in) to start making projects!

Open comments for this post

1h 21m 13s logged

Devlog - 01

A few weeks ago I almost helped find $20K buried somewhere in the American southwest. ATLAS is the image recognition pipeline I built to try.

It Works by taking a feed of images (50+ for best results, I used 625 for my tests) sending them through Various Local VLMs for inference. (Via LM Studio)

I’ve been working on this project for weeks, decided to share it with everyone here.

Currently I am working on Version 3.3.
Previous models of ATLAS only utilized one VLM for inference, I had noticed the outputs to be significantly off course. So In V3.3 I designed it to use an ensemble system.

Instead of using one model, it cross checks data between 2 or more. Then determines the winning output from all models.

This way if one model hallucinates and the others don’t, Your data wont be skewed… Though I am still working out some bugs, as V3.3 has been off course by a MARGIN.

It runs a user defined boundary. So instead of having to manually check coordinates yourself you’d know if it was incorrect instantly.

Though V3.3 doesn’t really respect the boundary, so I need to get that fixed.

-(My latest test ran 4,355Km off target. this is a known geometry bug in the coordinate derivation, not the inference pipeline itself)

The speed at which ATLAS Processes images depends entirely on your hardware and number/type of model(s) you are using.

Example:

(Using qwen 2.5 7B (Q4_K_M), and Moondream 2 (Q4)

  • 600 Frames

Estimated Times

  • Best tier GPU: RTX 4090, Dual RTX 3090s, Mac Studio M3 Ultra, etc…

  • Runtime:
    Qwen 2.5 7B ~0.25 seconds
    Moondream2 ~0.08 seconds
    Total: 3 to 4 minutes

  • High Tier GPU: RTX 4080, RTX 4070 Ti Super, M3 Max, etc…

  • Runtime
    Qwen 2.5 7B ~0.60 seconds
    Moondream2 ~0.20 seconds
    Total: 6 to 8 minutes

  • Mid tier GPU: RTX 4060 Ti (16GB), RTX 4070, M3 Pro, etc…

  • Runtime
    Qwen 2.5 7B ~2.00 seconds
    Moondream2 ~0.50 seconds
    Total: 22 to 30 minutes

  • Low tier GPU: RTX 3050, GTX 1660 Super, Base M1/M2 Mac, etc…

  • Runtime:
    Qwen 2.5 7B ~5.00 seconds
    Moondream2 ~1.20 seconds
    Total: 1 to 1.5 hours

  • CPU Only: Core i9 / Ryzen 9 (32GB DDR5 RAM), etc…

  • Runtime:
    Qwen 2.5 7B ~35.00 seconds
    Moondream2 ~7.00 seconds
    Total: 6 to 8 hours

(tested on RX 9070 XT 16GB)

Updated features:
Global Luminance Profiling: maps brightness across the entire dataset to skip night frames before inference, saves ~10% compute

Triple-Model Consensus Pass: runs 3 VLMs in parallel and cross-checks outputs to filter hallucinations

Orbital Mechanics Longitude Tracking: derives longitude from solar noon timing in image timestamps

Haversine Validator: checks derived coordinates against the search boundary using spherical earth math

Currently working on fixing shadow angle calculation (Keeps returning unknown for certain vectors), and Search area constraint bias.
(Image shows V3.3 running with a Tri-Qwen ensemble)

3

Comments 3

@Krshs90

Amazing project!

@eli_ozcan

one question: why are you using language models? Isn’t this a task better suited for vision-only models like YOLO?

@raviiiibro

yeah thats wright