resea - Stardance

@jithesh_sarvin on resea · about 1 month ago

21m 39s logged

Thank you for using this website. I updated the architecture to give better recommendations.

@jithesh_sarvin on resea · about 2 months ago

54m 18s logged

Added a new caching system that saves data locally in the browser so that each person gets an individualized feed on the demo website.

Ship #1

@jithesh_sarvin on resea · about 2 months ago

What did you make?
Research Feed — a personalized research paper discovery app. FastAPI backend + Next.js frontend, lightweight crawler pipeline (arXiv + OpenAlex) that populates a SQLite DB, a recommender/feed engine, and a demo deployment so people can try the site.

What was challenging?
Making the whole pipeline robust end-to-end: deduplication, citation backfill, handling OpenAlex/Semantic Scholar rate limits, purifier rules so we don’t delete useful papers, and getting Next.js + API routing right behind nginx on the Nest host. Deployment constraints (Nest environment, no Docker initially) added friction too.

What are you proud of?
A working interactive demo with a live crawler and ≈570 papers, a usable frontend (feed, search, paper pages), an automated demo deploy flow (Docker/demo scripts + non‑Docker bootstrap), and fixes for tricky issues (routing, citation backfill, offsets, and safer purifier logic).

What should people know so they can test your project?

Open the demo: https://jithesh.hackclub.app
Try these flows: browse Home feed, click Search, open a paper, open the PDF (external), save a paper to Library, and use the refresh button on the feed.
API checks: /api/stats (paper count), /api/feed, /api/papers/:id return JSON.
If the UI shows JSON, hard-refresh (Ctrl+Shift+R) — nginx was fixed so UI routes should render HTML.
To repro crawler/DB behavior (server access): tail logs at /root/research-feed/logs and manage services with systemctl (research-feed-api, research-feed-frontend).
Report any broken link, missing PDF, or pages that show raw JSON and I’ll fix it quickly.
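The API checks above can also be scripted. The following is a minimal sketch, assuming only the demo URL and the /api/stats and /api/feed paths listed in this post; the helper names endpoint_url and returns_json are hypothetical and not part of the project. It only tests whether a response body parses as JSON, which is the failure mode described above (a route returning the wrong content type).

```python
import json
from urllib.request import urlopen

# Demo host taken from the post; change it to test another deployment.
BASE_URL = "https://jithesh.hackclub.app"


def endpoint_url(path, base_url=BASE_URL):
    """Join the base URL and an API path without doubling slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def returns_json(path, base_url=BASE_URL, opener=urlopen, timeout=10):
    """Return True if the endpoint answers with a body that parses as JSON.

    `opener` is injectable so the check can be exercised without network
    access; by default it is urllib's urlopen.
    """
    try:
        with opener(endpoint_url(path, base_url), timeout=timeout) as resp:
            json.loads(resp.read().decode("utf-8"))
        return True
    except (OSError, ValueError):
        # OSError covers network and HTTP errors; ValueError covers bodies
        # that are not valid JSON (for example an HTML page).
        return False
```

Calling returns_json("/api/stats") and returns_json("/api/feed") should give True for the API routes. The response schema is not documented here, so the sketch does not inspect any fields.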

4 devlogs
15h
10.69x multiplier
165 Stardust

@jithesh_sarvin on resea · about 2 months ago

8h 29m 46s logged

devlog-4
• Crawlers: OpenAlex and arXiv jobs pull open-access papers (min 5 citations) while offsets are persisted in crawler_offsets.json so each topic resumes where it left off.
• Canonical catalog: All crawled papers land in data/papers.db, ensuring every service references the same SQLite store even when multiple components run concurrently.
• Enrichment: Each new row undergoes OpenAlex metadata backfill, DOI/arXiv linking, and topic/authorship tagging so everything downstream sees complete content.
• Metrics + embeddings: classification.metrics computes trending/hybrid scores and embeddings.pipeline generates vector representations that let the feed rank novelty and relevance.
• Purifier: db_purifier.py runs after enrichment, removing paywalled, duplicate, or incomplete papers while keeping PDF/arXiv URLs up to date.
• Feed cache: Once the catalog is clean, FeedCache stores session rows and shown IDs so the API can quickly respond without re-running heavy scoring on every request.
• API: FastAPI’s /feed handler calls build_feed(db, refresh?, client_seen_ids) which builds a FeedContext containing seen IDs, soft/hard exclusions, and user-interest signals.
• Feed logic: The engine pulls candidates via _select_from_query, applies feedback-weighted scores, injects high jitter on refresh, and enforces seen-paper penalties so every refresh reshuffles without repeating the exact same order.
• Frontend: The YouTube-style carousel shows hero/trending/high-impact rows, pulls seen IDs from localStorage, calls api.getFeed(refresh, seenIds), and records events (click/save/dismiss) to teach the feed what must never reappear.
• Feedback loop: User events plus the refresh button feed back into feed.build_feed() via soft/hard exclusions, ensuring fresh papers are promoted while clicked/saved items stay hidden forever.
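The feed logic described in the bullets above (hard exclusions, seen-paper penalties, and stronger jitter on refresh) can be sketched roughly as follows. This is an illustrative sketch, not the project's actual code: the FeedContext fields, the rank_candidates function, and the penalty and jitter values are all assumptions chosen for the example.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FeedContext:
    """Hypothetical stand-in for the context that build_feed assembles."""
    seen_ids: set = field(default_factory=set)       # soft exclusion: penalized
    hard_excluded: set = field(default_factory=set)  # clicked/saved/dismissed: dropped
    refresh: bool = False


def rank_candidates(candidates, ctx, rng=None, seen_penalty=0.5,
                    base_jitter=0.01, refresh_jitter=0.3):
    """Order paper IDs by adjusted score.

    candidates: list of (paper_id, base_score) pairs.
    The penalty and jitter magnitudes are made-up illustration values.
    """
    rng = rng or random.Random()
    # A refresh uses a much wider jitter so the order visibly reshuffles.
    jitter = refresh_jitter if ctx.refresh else base_jitter
    scored = []
    for paper_id, base_score in candidates:
        if paper_id in ctx.hard_excluded:
            continue  # hard exclusion: never shown again
        score = base_score
        if paper_id in ctx.seen_ids:
            score -= seen_penalty  # soft exclusion: pushed down, not removed
        score += rng.uniform(-jitter, jitter)
        scored.append((score, paper_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [paper_id for _, paper_id in scored]
```

Keeping soft and hard exclusions separate means a paper that was merely displayed can still resurface when the catalog is small, while a clicked or saved paper is filtered out before scoring.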

@jithesh_sarvin on resea · about 2 months ago

2h 6m 31s logged

Smarter Fallback: If you have seen absolutely every paper in the catalog, the algorithm will now smoothly fall back to showing you the best papers you’ve already seen rather than giving you a completely blank screen.
Clearer UI & Reset Button: I updated the frontend so if you ever somehow run out of papers again, it will proudly tell you “You’re all caught up!” and give you a button to Clear your read history so you can start over while the crawlers hunt for more.
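A rough sketch of the fallback behavior described above, under stated assumptions: select_with_fallback and clear_history are hypothetical names, and the scores are plain numbers rather than the project's real ranking output. The second return value is the flag a frontend could use to show the “You’re all caught up!” message.

```python
def select_with_fallback(scored_papers, seen_ids, limit=10):
    """Pick unseen papers first; fall back to the best seen ones.

    scored_papers: list of (paper_id, score) pairs.
    Returns (paper_ids, all_caught_up), where all_caught_up is True when
    no unseen paper was left and the fallback was used.
    """
    ranked = sorted(scored_papers, key=lambda pair: pair[1], reverse=True)
    unseen = [paper_id for paper_id, _ in ranked if paper_id not in seen_ids]
    if unseen:
        return unseen[:limit], False
    # Everything has been seen: show the highest-scoring papers again
    # instead of returning an empty feed.
    return [paper_id for paper_id, _ in ranked][:limit], True


def clear_history(seen_ids):
    """Reset the read history in place, as the reset button would."""
    seen_ids.clear()
```

After clear_history runs, the next call treats every paper as unseen again, so the flag returns to False.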

@jithesh_sarvin on resea · about 2 months ago

1h 53m 34s logged

I have updated the program and interface to look better. It now has custom themes, a better crawler, and more.

@jithesh_sarvin on resea · about 2 months ago

2h 55m 38s logged

This is like Netflix, but for research papers. It crawls the internet to index research papers and classifies them by topic. Then, a feed-based ML algorithm filters and displays a personalized list based on the user’s activity.