7h 6m 19s logged

This session was supposed to take two hours and took seven, because I assumed fetching and parsing a news article from a URL would be straightforward. It is not. Plain axios plus cheerio works fine for AP and Reuters, but NYT and WaPo render their article bodies client-side, so the raw HTML comes back nearly empty. Puppeteer solved that but added 5 seconds of headless Chrome startup time to every run. @mozilla/readability worked better across most sites but choked on paywalls. I ended up with a three-layer fallback that tries readability first, then the longest <article> tag, then the largest div by text length, which covers about 70% of sites. The session ended with a clean, formatted terminal report that uses chalk for color-coded confidence labels, and it was the first time the tool looked like something a person would actually use.
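For anyone curious what the fallback chain looks like, here is a minimal sketch of the idea, not the code from this project. Everything in it is an assumption made for illustration: the names `Layer`, `extractWithFallback`, `longestArticle` and `largestDiv` are hypothetical, the `minLength` threshold for "this layer found too little, try the next one" is a guess at a reasonable rule, and the two HTML layers use crude regexes instead of cheerio so the block runs with no dependencies. In a real tool the first layer would wrap @mozilla/readability (for example the `textContent` returned by `new Readability(doc).parse()`), so it is left as a function the caller passes in.

```typescript
// An extractor takes raw HTML and returns article text, or null if it finds nothing.
type Extractor = (html: string) => string | null;

interface Layer {
  name: string;
  extract: Extractor;
}

interface ExtractionResult {
  text: string;
  layer: string; // which layer produced the text
}

// Crude tag stripper: replaces tags with spaces and collapses whitespace.
// A real parser would also handle entities, <script>, <style>, and so on.
function stripTags(fragment: string): string {
  return fragment.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// Returns the longest stripped text among all matches of `pattern`
// (capture group 1 is taken as the element's inner HTML).
function longestMatch(html: string, pattern: RegExp): string | null {
  let best = "";
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    const text = stripTags(match[1]);
    if (text.length > best.length) {
      best = text;
    }
  }
  return best.length > 0 ? best : null;
}

// Layer 2 (hypothetical helper): the <article> element with the most text.
const longestArticle: Extractor = (html) =>
  longestMatch(html, /<article\b[^>]*>([\s\S]*?)<\/article>/gi);

// Layer 3 (hypothetical helper): the div with the most text.
// Simplification: the regex only sees innermost divs (divs with no div inside).
const largestDiv: Extractor = (html) =>
  longestMatch(html, /<div\b[^>]*>((?:(?!<\/?div\b)[\s\S])*)<\/div>/gi);

// Tries each layer in order and returns the first result that is long enough.
// A layer that throws (say, a parser choking on a paywall page) is treated
// the same as a layer that found nothing.
function extractWithFallback(
  html: string,
  layers: Layer[],
  minLength: number = 200 // assumed threshold, not taken from the project
): ExtractionResult | null {
  for (const layer of layers) {
    let text: string | null = null;
    try {
      text = layer.extract(html);
    } catch (err) {
      text = null;
    }
    if (text !== null && text.length >= minLength) {
      return { text, layer: layer.name };
    }
  }
  return null;
}
```

Calling it with layers in the order `[readability, longestArticle, largestDiv]` gives the behavior described above. Returning the layer name alongside the text is a design choice that makes it easy to report how the text was found, which is one plausible input for a confidence label.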
