RAG App - Stardance

@revankotapati on RAG App · about 1 month ago

3h 51m 12s logged

For this update, I scrapped the UI and rebuilt it. The base layout is unchanged: the sidebar and the main content area are still present, but I removed the ability to index and clear history. In the new UI, the sidebar lets the user upload files and automatically shows how many sections were indexed. When a file is deleted, its history and related context are deleted as well. The QA part of the pipeline now answers more cleanly and suggests follow-up questions. The chat also shows its sources, the relevant document text, when answering a question.
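The upload and delete behavior described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the app's actual code: `DocumentStore`, its method names, and the blank-line chunking rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentStore:
    """Hypothetical in-memory store, keyed by uploaded file name."""

    sections: dict = field(default_factory=dict)  # file name -> list of text chunks
    history: dict = field(default_factory=dict)   # file name -> list of (question, answer)

    def upload(self, name: str, text: str) -> int:
        # Assumption: split on blank lines; a real app may chunk differently.
        chunks = [part.strip() for part in text.split("\n\n") if part.strip()]
        self.sections[name] = chunks
        self.history.setdefault(name, [])
        return len(chunks)  # the count the sidebar could display

    def record(self, name: str, question: str, answer: str) -> None:
        # Keep chat history attached to the file it was asked about.
        self.history[name].append((question, answer))

    def delete(self, name: str) -> None:
        # Deleting a file also drops its chunks and its chat history.
        self.sections.pop(name, None)
        self.history.pop(name, None)
```

Keeping history keyed by file name is one simple way to make deletion cascade; the real app may track this differently.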

On the backend, I rebuilt things as well. I first moved to an OpenRouter API, which took a long time to fix because the file was not being converted to text properly, so the answers generated by the model were not correct. I then switched back to a free Gemini model for answer generation and used a local embedding model to save tokens.
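To illustrate the idea of embedding locally and only sending the retrieved text to the hosted model, here is a small sketch. The post does not say which local embedding model is used, so `toy_embed` below is a stand-in (a hashed bag-of-words vector), and `top_k` and `DIM` are hypothetical names chosen for the example.

```python
import math
import zlib

DIM = 64  # arbitrary size for the toy vectors


def toy_embed(text: str, dim: int = DIM) -> list:
    """Stand-in for a local embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def top_k(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query as (score, chunk) pairs."""
    q = toy_embed(query)
    # Vectors are normalized, so the dot product is the cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, toy_embed(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Only the chunks returned by `top_k` would then be passed to the generation model, so no API tokens are spent on embedding.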

@revankotapati on RAG App · about 2 months ago

40m 8s logged

Added an OCR feature so that handwritten text can be used. I also updated the user-query feature so the user can select their own settings for the RAG pipeline. In addition, I added file uploads and improved the look of the UI.

Ship #1 Changes requested

@revankotapati on RAG App · about 2 months ago

For this project, I built a containerized RAG (Retrieval-Augmented Generation) assistant deployed on Hugging Face Spaces that lets users upload documentation and query it in real time through a Streamlit chat interface. Writing directly against the google-genai and Pinecone Python SDKs let me skip bulky orchestrator frameworks, resulting in a lightweight, zero-local-storage pipeline that uses text-embedding-004 and gemini-1.5-flash.

The build came with distinct engineering challenges, particularly resolving SDK endpoint routing mismatches and closing a vector dimensionality gap by using types.EmbedContentConfig to truncate embedding outputs down to 768 dimensions to match the Pinecone serverless index. I am proud of how fast, clean, and cost-effective this standalone architecture turned out, as well as the strict system prompt guardrails that prevent hallucinations by defaulting to a clean fallback response whenever a query falls outside the uploaded context.

To test the live application, users need to verify their API keys, paste a technical text block into the ingestion sidebar to index the vectors, and then run a mix of specific domain questions and out-of-domain prompts to observe the system’s retrieval performance and guardrails.
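As a rough sketch of the dimensionality fix mentioned above: the google-genai SDK accepts an `output_dimensionality` field on `types.EmbedContentConfig`, which asks the model for a shorter vector. The function names `embed_for_index` and `check_dimension` are hypothetical, and the exact call used in the app is not shown in the post, so treat this as one plausible shape rather than the project's code.

```python
EXPECTED_DIM = 768  # dimension of the Pinecone index, per the post


def check_dimension(vector, expected=EXPECTED_DIM):
    """Fail early if a vector would not fit the index (hypothetical helper)."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    return vector


def embed_for_index(client, text, model="text-embedding-004"):
    """Embed `text`, requesting a vector that matches the index dimension.

    Assumes `client` is a google.genai.Client. The import is deferred so this
    module still loads when the google-genai package is not installed.
    """
    from google.genai import types

    response = client.models.embed_content(
        model=model,
        contents=text,
        config=types.EmbedContentConfig(output_dimensionality=EXPECTED_DIM),
    )
    # One input string yields one embedding; its values are the float vector.
    return check_dimension(list(response.embeddings[0].values))
```

Checking the length before upserting surfaces a dimension mismatch as a clear local error instead of a rejected request from the vector store.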

2 devlogs
2h
2.98x multiplier
5 Stardust

@revankotapati on RAG App · about 2 months ago

44m 1s logged

Finished backend and frontend connection.

@revankotapati on RAG App · about 2 months ago

1h 0m 28s logged

I built the frontend and backend of this RAG pipeline using Python, Google GenAI, and Pinecone.