
resea

  • 5 Devlogs
  • 16 Total hours

A feed that surfaces new and interesting research based on your activity.


54m 18s logged

Added a new caching system that saves data locally in the browser, so each visitor gets an individualized feed on the demo website.

Replying to @jithesh_sarvin

Ship #1

What did you make?
Research Feed — a personalized research paper discovery app. FastAPI backend + Next.js frontend, lightweight crawler pipeline (arXiv + OpenAlex) that populates a SQLite DB, a recommender/feed engine, and a demo deployment so people can try the site.

What was challenging?
Making the whole pipeline robust end-to-end: deduplication, citation backfill, handling OpenAlex and Semantic Scholar rate limits, purifier rules that don’t delete useful papers, and getting Next.js + API routing right behind nginx on the Nest host. Deployment constraints (Nest environment, no Docker initially) added friction too.

What are you proud of?
A working interactive demo with a live crawler and ≈570 papers, a usable frontend (feed, search, paper pages), an automated demo deploy flow (Docker/demo scripts + non‑Docker bootstrap), and fixes for tricky issues (routing, citation backfill, offsets, and safer purifier logic).

What should people know so they can test your project?

Open the demo: https://jithesh.hackclub.app
Try these flows: browse Home feed, click Search, open a paper, open the PDF (external), save a paper to Library, and use the refresh button on the feed.
API checks: /api/stats (paper count), /api/feed, /api/papers/:id return JSON.
If the UI shows JSON, hard-refresh (Ctrl+Shift+R) — nginx was fixed so UI routes should render HTML.
To repro crawler/DB behavior (server access): tail logs at /root/research-feed/logs and manage services with systemctl (research-feed-api, research-feed-frontend).
Report any broken link, missing PDF, or pages that show raw JSON and I’ll fix it quickly.
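The API checks above can be scripted. This is a minimal sketch, assuming only that the listed endpoints return a JSON body; it does not assume any particular response fields. The `check_endpoints` helper and its `fetch` parameter are hypothetical names introduced here, not part of the project.

```python
import json
from typing import Callable
from urllib.request import urlopen

BASE_URL = "https://jithesh.hackclub.app"
# Paths from the list above; /api/papers/:id is left out because it needs a real paper id.
ENDPOINTS = ["/api/stats", "/api/feed"]


def http_get(url: str) -> str:
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def check_endpoints(base_url: str, paths: list[str],
                    fetch: Callable[[str], str] = http_get) -> dict[str, bool]:
    """Map each path to True if its body parses as JSON, else False."""
    results = {}
    for path in paths:
        try:
            json.loads(fetch(base_url + path))
            results[path] = True
        except (OSError, ValueError):
            # OSError covers network/HTTP failures; ValueError covers bad JSON.
            results[path] = False
    return results
```

Calling `check_endpoints(BASE_URL, ENDPOINTS)` against the live demo should report `True` for both paths if the API is up. A UI route such as `/` should report `False`, since it is expected to return HTML.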


8h 29m 46s logged

devlog-4
• Crawlers: OpenAlex and arXiv jobs pull open-access papers (min 5 citations) while offsets are persisted in crawler_offsets.json so each topic resumes where it left off.
• Canonical catalog: All crawled papers land in data/papers.db, ensuring every service references the same SQLite store even when multiple components run concurrently.
• Enrichment: Each new row undergoes OpenAlex metadata backfill, DOI/arXiv linking, and topic/authorship tagging so everything downstream sees complete content.
• Metrics + embeddings: classification.metrics computes trending/hybrid scores and embeddings.pipeline generates vector representations that let the feed rank novelty and relevance.
• Purifier: db_purifier.py runs after enrichment, removing paywalled, duplicate, or incomplete papers while keeping PDF/arXiv URLs up to date.
• Feed cache: Once the catalog is clean, FeedCache stores session rows and shown IDs so the API can quickly respond without re-running heavy scoring on every request.
• API: FastAPI’s /feed handler calls build_feed(db, refresh?, client_seen_ids) which builds a FeedContext containing seen IDs, soft/hard exclusions, and user-interest signals.
• Feed logic: The engine pulls candidates via _select_from_query, applies feedback-weighted scores, injects high jitter on refresh, and enforces seen-paper penalties so every refresh reshuffles without repeating the exact same order.
• Frontend: The YouTube-style carousel shows hero/trending/high-impact rows, pulls seen IDs from localStorage, calls api.getFeed(refresh, seenIds), and records events (click/save/dismiss) to teach the feed what must never reappear.
• Feedback loop: User events plus the refresh button feed back into feed.build_feed() via soft/hard exclusions, ensuring fresh papers are promoted while clicked/saved items stay hidden forever.
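The feed logic described in the bullets above (hard exclusions, seen-paper penalties, jitter on refresh) could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's actual `build_feed`: the `FeedContext` fields, the `rank_candidates` function, and the `seen_penalty` and `jitter` constants are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class FeedContext:
    # Hypothetical shape; the devlog only says the context holds seen IDs,
    # soft/hard exclusions, and user-interest signals.
    seen_ids: set[str] = field(default_factory=set)
    hard_excluded: set[str] = field(default_factory=set)


def rank_candidates(candidates, ctx, refresh=False, rng=None,
                    seen_penalty=0.5, jitter=0.3):
    """Order (paper_id, base_score) pairs for one feed response.

    seen_penalty and jitter are made-up constants for illustration.
    """
    rng = rng or random.Random()
    scored = []
    for paper_id, base_score in candidates:
        if paper_id in ctx.hard_excluded:
            continue  # hard exclusion: the paper is dropped entirely
        score = base_score
        if paper_id in ctx.seen_ids:
            score *= seen_penalty  # soft penalty: still eligible, ranked lower
        if refresh:
            score += rng.uniform(0.0, jitter)  # random nudge so refresh reshuffles
        scored.append((score, paper_id))
    scored.sort(reverse=True)  # highest score first
    return [paper_id for _, paper_id in scored]
```

Keeping the penalty multiplicative and the jitter additive means a refresh can reorder papers with similar scores, while a strongly relevant unseen paper still tends to stay near the top.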



2h 6m 31s logged

Smarter Fallback: If you have seen absolutely every paper in the catalog, the algorithm will now smoothly fall back to showing you the best papers you’ve already seen rather than giving you a completely blank screen.
Clearer UI & Reset Button: I updated the frontend so if you ever somehow run out of papers again, it will proudly tell you “You’re all caught up!” and give you a button to Clear your read history so you can start over while the crawlers hunt for more.
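A minimal sketch of that fallback, assuming the candidate list is already sorted best-first. The `select_feed` name, the `limit` parameter, and the `caught_up` flag are hypothetical; the real engine may structure this differently.

```python
def select_feed(ranked_ids, seen_ids, limit=20):
    """Return (paper_ids, caught_up).

    ranked_ids is assumed to be sorted best-first already.
    """
    unseen = [pid for pid in ranked_ids if pid not in seen_ids]
    if unseen:
        return unseen[:limit], False
    # Nothing new left: fall back to the best already-seen papers
    # and flag it so the UI can show "You're all caught up!".
    return ranked_ids[:limit], True
```

The frontend can use the second value to decide whether to show the caught-up message and the reset button.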


1h 53m 34s logged

I updated the program and interface to look better. It now has custom themes, a better crawler, and more.


2h 55m 38s logged

This is like Netflix, but for research papers. It crawls the internet to index research papers and classifies them by topic. Then, a feed-based ML algorithm filters and displays a personalized list based on the user’s activity.
