wll do later
wll do later
Will update later
long day will update later
devlog will update later
devlog will update later
Devlog will update later
devpost will update later
did some work
refactoring some logic
I did a cleanup session today: moved hardcoded API keys from config.js into .env, added a startup check that exits immediately with a clear message if any required key is missing, and split a 650-line index.js into pipeline.js, cli.js, and utils.js. The tool runs cleanly on my machine and I have been using it on articles I read this past week, but it requires setting up two API keys to run so nobody else can actually use it yet, and figuring out distribution is probably the next real question.
This session was supposed to take two hours and took seven because I assumed fetching and parsing a news article from a URL would be straightforward. It is not: axios plus cheerio works fine for AP and Reuters, NYT and WaPo render their bodies client-side so the raw HTML comes back nearly empty, Puppeteer solved that but added 5 seconds of headless Chrome startup time to every run, and @mozilla/readability worked better across most sites but choked on paywalls. I ended up with a three-layer fallback that tries readability first, then the longest <article> tag, then the largest div by text length, which covers about 70% of sites. The session ended with a clean formatted terminal report using chalk for color-coded confidence labels, which was the first time the tool looked like something a person would actually use.
I rewrote the whole pipeline in Node.js and parallelized all the agents because the Python sequential version was timing a 7-claim article at 68 seconds, which is not usable. The rewrite took about 90 minutes and the rest of the session was debugging differences between the Python and Node versions, mostly around how I had structured the retry logic. Running 14 concurrent API requests at once got me rate-limited immediately, so I added a concurrency queue capped at 4 simultaneous calls, and the same article that took 68 seconds dropped to 14 seconds.
Two things kept crashing longer test runs so I fixed them both today. Claude was returning malformed JSON about one in fifteen runs despite explicit instructions, either wrapping output in backticks or adding a trailing comma, so I added a cleanup step before JSON.parse and a retry that sends the raw response back with “this was not valid JSON, return only the array with no other text.” The second issue was unverifiable claims burning three full search iterations and returning nothing, so I added a classification step upfront where subjective claims skip the loop entirely and return a subjective_claim flag instead.
I added the divergence layer today, which surfaces how outlets with different political leans frame the same underlying fact rather than just whether the fact is accurate. I first tried scraping lean ratings from AllSides at runtime but scrapped it after 45 minutes because it was fragile, probably against their ToS, and too slow for a per-request lookup. I replaced it with a hardcoded lookup table sourced from Ad Fontes Media data covering about 40 major outlets, where anything not in the table returns “Unknown” rather than a guess. Testing it on a border crossings story came back with CNN describing the encounter numbers as a humanitarian crisis, Fox describing the same numbers as a Biden policy failure, and the DHS primary source showing the raw figure all three were citing.
I started with SerpAPI for web search and spent the first two hours realizing every result it returned was months old because of aggressive free-tier caching, then switched to Brave Search API and got current results immediately. My first verification approach was a single prompt passing the claim and search results to Claude at once, but the outputs were shallow and the first search query was usually wrong anyway, so I built an iterative loop where Claude picks a query, I run the search, and Claude decides whether to go deeper or return a result. The loop ran forever in early testing until I added a hard stopping rule to the prompt: “If you have at least 2 independent sources that confirm or deny the claim, stop searching and return your assessment now.” By the end of the day, the pipeline surfaced CNN and Fox both citing the same BLS statistic with completely different framings alongside the actual BLS.gov primary source, which was exactly the output I had been trying to build toward.
I caught a bad bug where the extractor was hallucinating numbers: I ran the same AP unemployment article three times and the reported rate came out as 4.2%, 4.1%, and 3.9% across the three runs when the article clearly says 4.2%. The model was filling in from training data instead of reading the text in front of it, which is a subtle failure mode because the numbers look plausible enough that you would never notice without explicitly checking. The fix was adding a verbatim extraction instruction to the system prompt and setting temperature to 0.
I started this after my dad and I spent 45 minutes arguing over an immigration article where we were both citing different sources reporting the same statistic at different numbers, and neither of us could figure out who was right because we couldn’t agree on which source to trust. My first instinct was a binary fact-checker that returns TRUE or FALSE, but the first time I prompted Claude that way I got confident verdicts with no sources and realized immediately that a confidently wrong answer with no citations is worse than no answer at all. I tried GPT-4 for claim extraction and kept getting paragraph-length analysis instead of a JSON array, then switched to Claude and got clean structured output on the first attempt once I tightened the prompt.
Extract factual claims from the article below. Return ONLY a JSON array of strings.
No explanation before the array. No explanation after. No markdown fences.
I finished the session with a Python script that takes pasted article text and returns a JSON array of 5-8 discrete factual claims, all traceable to specific sentences in the article.