This session was supposed to take two hours and took seven, because I assumed that fetching and parsing a news article from a URL would be straightforward. It is not: axios plus cheerio works fine for AP and Reuters, but NYT and WaPo render their article bodies client-side, so the raw HTML comes back nearly empty. Puppeteer solved that, but it added 5 seconds of headless Chrome startup time to every run. @mozilla/readability worked better across most sites but choked on paywalls.

I ended up with a three-layer fallback: try readability first, then the longest <article> tag, then the largest div by text length. That combination covers about 70% of sites.

The session ended with a clean, formatted terminal report that uses chalk for color-coded confidence labels, and it was the first time the tool looked like something a person would actually use.
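The three-layer fallback can be sketched roughly like this. It is a minimal, dependency-free TypeScript sketch, not the tool's actual code, and several pieces are assumptions: `HtmlNode` is a hypothetical stand-in for a cheerio or jsdom tree; the readability layer is passed in as a function (in a real build it might wrap `new Readability(document).parse()` from @mozilla/readability, which returns null when it finds no article); the 200-character acceptance threshold is invented; and the mapping from winning layer to confidence label (readability = HIGH, article = MEDIUM, largest div = LOW) is illustrative. Raw ANSI escapes stand in for chalk.

```typescript
// Hypothetical stand-in for a parsed DOM node (real code would use cheerio or jsdom).
interface HtmlNode {
  tag: string;
  text: string; // text directly inside this node
  children: HtmlNode[];
}

type Method = "readability" | "article" | "largest-div";

interface Extraction {
  method: Method;
  text: string;
}

// A node's own text plus all descendant text, space-joined.
function textOf(node: HtmlNode): string {
  return [node.text, ...node.children.map(textOf)]
    .filter((s) => s.length > 0)
    .join(" ")
    .trim();
}

// Depth-first collection of every node with the given tag name.
function findAll(root: HtmlNode, tag: string): HtmlNode[] {
  const found: HtmlNode[] = root.tag === tag ? [root] : [];
  for (const child of root.children) found.push(...findAll(child, tag));
  return found;
}

// Text of the longest node with this tag, or null if the tag never appears.
function longestByText(root: HtmlNode, tag: string): string | null {
  let best: string | null = null;
  for (const node of findAll(root, tag)) {
    const t = textOf(node);
    if (best === null || t.length > best.length) best = t;
  }
  return best;
}

// Try each layer in order and accept the first result of at least minChars
// (assumed threshold), so a paywall stub from readability falls through.
function extractWithFallback(
  root: HtmlNode,
  readability: (root: HtmlNode) => string | null,
  minChars = 200,
): Extraction | null {
  const layers: Array<[Method, (r: HtmlNode) => string | null]> = [
    ["readability", readability],
    ["article", (r) => longestByText(r, "article")],
    ["largest-div", (r) => longestByText(r, "div")],
  ];
  for (const [method, run] of layers) {
    const text = run(root);
    if (text !== null && text.length >= minChars) return { method, text };
  }
  return null;
}

// Hypothetical layer-to-confidence mapping; green/yellow/red ANSI codes
// play the role chalk.green / chalk.yellow / chalk.red would in the real tool.
const CONFIDENCE: Record<Method, { label: string; ansi: string }> = {
  readability: { label: "HIGH", ansi: "\x1b[32m" },
  article: { label: "MEDIUM", ansi: "\x1b[33m" },
  "largest-div": { label: "LOW", ansi: "\x1b[31m" },
};

function confidenceLabel(result: Extraction | null): string {
  if (result === null) return "\x1b[31mFAILED\x1b[0m";
  const { label, ansi } = CONFIDENCE[result.method];
  return `${ansi}${label}\x1b[0m`;
}

// Usage: readability "chokes" (returns null), but the page has an <article>.
const page: HtmlNode = {
  tag: "body",
  text: "",
  children: [
    { tag: "nav", text: "Home World Politics", children: [] },
    { tag: "article", text: Array(61).join("Lorem "), children: [] }, // 60 words
  ],
};
const result = extractWithFallback(page, () => null);
console.log(result ? result.method : "none", confidenceLabel(result));
```

One caveat with the last layer as written: because `textOf` counts descendant text, an outer wrapper div never loses to a div nested inside it, so the "largest div" tends to be the whole page body, navigation and footer included. Scoring divs by their direct text instead would be one way to tighten it.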