resea
- 5 Devlogs
- 16 Total hours
A feed that updates you with interesting new research based on your activity.
Added a new caching system that saves data locally in the browser, so each person gets an individualized feed on the demo website.
devlog-4
• Crawlers: OpenAlex and arXiv jobs pull open-access papers (min 5 citations) while offsets are persisted in crawler_offsets.json so each topic resumes where it left off.
• Canonical catalog: All crawled papers land in data/papers.db, ensuring every service references the same SQLite store even when multiple components run concurrently.
• Enrichment: Each new row undergoes OpenAlex metadata backfill, DOI/arXiv linking, and topic/authorship tagging so everything downstream sees complete content.
• Metrics + embeddings: classification.metrics computes trending/hybrid scores and embeddings.pipeline generates vector representations that let the feed rank novelty and relevance.
• Purifier: db_purifier.py runs after enrichment, removing paywalled, duplicate, or incomplete papers while keeping PDF/arXiv URLs up to date.
• Feed cache: Once the catalog is clean, FeedCache stores session rows and shown IDs so the API can quickly respond without re-running heavy scoring on every request.
• API: FastAPI’s /feed handler calls build_feed(db, refresh?, client_seen_ids) which builds a FeedContext containing seen IDs, soft/hard exclusions, and user-interest signals.
• Feed logic: The engine pulls candidates via _select_from_query, applies feedback-weighted scores, injects high jitter on refresh, and enforces seen-paper penalties so every refresh reshuffles without repeating the exact same order.
• Frontend: The YouTube-style carousel shows hero/trending/high-impact rows, pulls seen IDs from localStorage, calls api.getFeed(refresh, seenIds), and records events (click/save/dismiss) to teach the feed what must never reappear.
• Feedback loop: User events plus the refresh button feed back into feed.build_feed() via soft/hard exclusions, ensuring fresh papers are promoted while clicked/saved items stay hidden forever.
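To make the feed logic above more concrete, here is a rough sketch of how hard exclusions, seen-paper penalties, feedback weights, and refresh jitter could fit together. This is not the project's actual code: the FeedContext fields, the rank_candidates helper, the candidate tuple shape, and the jitter/penalty numbers are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FeedContext:
    """Hypothetical stand-in for the FeedContext described above."""
    seen_ids: set = field(default_factory=set)       # shown before: soft exclusion
    hard_excluded: set = field(default_factory=set)  # clicked/saved/dismissed: never shown
    interest: dict = field(default_factory=dict)     # topic -> feedback weight


def rank_candidates(candidates, ctx, refresh=False, limit=10, rng=None):
    """Rank (paper_id, topic, base_score) tuples for one feed request."""
    rng = rng or random.Random()
    jitter = 0.5 if refresh else 0.05  # assumed magnitudes: more shuffle on refresh
    seen_penalty = 1.0                 # assumed value: pushes seen papers down
    scored = []
    for paper_id, topic, base_score in candidates:
        if paper_id in ctx.hard_excluded:
            continue  # hard exclusion: drop the paper entirely
        # Feedback-weighted score: boost topics the user has interacted with.
        score = base_score * (1.0 + ctx.interest.get(topic, 0.0))
        if paper_id in ctx.seen_ids:
            score -= seen_penalty  # soft exclusion: still eligible, ranked lower
        score += rng.uniform(-jitter, jitter)  # random jitter reshuffles the order
        scored.append((score, paper_id))
    scored.sort(reverse=True)
    return [paper_id for _, paper_id in scored[:limit]]


candidates = [("a", "ml", 0.9), ("b", "ml", 0.8), ("c", "bio", 0.7)]
ctx = FeedContext(seen_ids={"a"}, hard_excluded={"b"}, interest={"bio": 0.5})
feed = rank_candidates(candidates, ctx, refresh=True, rng=random.Random(0))
```

In this sketch, paper "b" never comes back because it is hard-excluded, while "a" is only pushed down, so it can still show up if nothing better is available.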
Smarter Fallback: If you have seen absolutely every paper in the catalog, the algorithm will now smoothly fall back to showing you the best papers you’ve already seen rather than giving you a completely blank screen.
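The fallback could look roughly like the sketch below. The select_with_fallback name, its arguments, and the all_caught_up flag are hypothetical, and the real engine may handle this differently. It assumes ranked_ids is already sorted best-first.

```python
def select_with_fallback(ranked_ids, seen_ids, limit=10):
    """Return (paper_ids, all_caught_up) for one feed request.

    ranked_ids: paper IDs sorted best-first (assumed to be pre-ranked).
    seen_ids:   IDs the user has already been shown.
    """
    unseen = [paper_id for paper_id in ranked_ids if paper_id not in seen_ids]
    if unseen:
        return unseen[:limit], False
    # Everything in the catalog has been seen: reuse the best-ranked
    # seen papers instead of returning an empty feed.
    return ranked_ids[:limit], True


papers, caught_up = select_with_fallback(["p1", "p2", "p3"], {"p1", "p2", "p3"}, limit=2)
```

Returning a flag next to the papers is one way to let the frontend know it is showing already-seen papers.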
Clearer UI & Reset Button: I updated the frontend so if you ever somehow run out of papers again, it will proudly tell you “You’re all caught up!” and give you a button to Clear your read history so you can start over while the crawlers hunt for more.
I have updated the program and interface to look better; it now has custom themes, a better crawler, and more.
This is like Netflix, but for research papers. It crawls the internet to index research papers and classifies them by topic. Then, a feed-based ML algorithm filters and displays a personalized list based on the user’s activity.