City budget documents are public, but in practice they are hard to use. They are long PDFs, full of department codes and financial terms, and most people do not have time to manually scan hundreds of pages just to answer one question.
I wanted to make this easier, so I built a public app that lets people ask plain-language questions about Chicago's FY2026 budget ordinances and get answers tied to exact source pages.
Code: github.com/devthakker/Chicago-budget
The Problem
The core problem was not "can we call an LLM?" It was:
- How do we retrieve the right evidence from large, messy PDFs?
- How do we show users exactly where the answer came from?
- How do we make this reliable enough for public use?
The two source documents are large enough that naive search quality drops quickly, especially with tables and codes:
- chicago_Annual_Appropriation_Ordinance_2026.pdf
- chicago_Grant_Details_Ordinance_2026.pdf
Architecture
The app follows a straightforward RAG pipeline:
- PDF text extraction (pdftotext -layout)
- Chunking with page metadata and section awareness
- Hybrid retrieval (BM25 + optional embeddings)
- Reranking (heuristic, with optional cross-encoder path)
- Answer generation with citations
- Source UX: open exact PDF page in tab or embedded viewer
Stack:
- Backend: FastAPI
- Retrieval/indexing: custom Python engine
- Frontend: server-rendered HTML/CSS templates
- Deployment: Docker
- Model providers: OpenAI, AWS Bedrock, or local Ollama
Retrieval Design Choices That Mattered
1) Hybrid retrieval over vectors-only
Budget queries are often code-heavy (GA00, 925S, ARPA, etc.), where lexical match is strong. I kept BM25 as the dominant signal and blended vectors as a secondary signal.
Default weighting:
RAG_BM25_WEIGHT=0.85
RAG_VECTOR_WEIGHT=0.15
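A minimal sketch of what that blending can look like, assuming each retriever returns per-chunk scores (the function and variable names here are illustrative, not the app's actual API):

```python
# Hypothetical hybrid-scoring sketch: BM25 is the dominant signal,
# vectors are blended in as a secondary signal.
BM25_WEIGHT = 0.85
VECTOR_WEIGHT = 0.15

def normalize(scores):
    """Min-max normalize scores to [0, 1] so the two signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def blend(bm25_scores, vector_scores):
    """Weighted sum over the union of candidates, BM25-dominant."""
    b, v = normalize(bm25_scores), normalize(vector_scores)
    ids = set(b) | set(v)
    return {i: BM25_WEIGHT * b.get(i, 0.0) + VECTOR_WEIGHT * v.get(i, 0.0)
            for i in ids}
```

Normalizing before blending matters: raw BM25 scores and cosine similarities live on different scales, so the weights are only meaningful after both are mapped to a common range.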
2) TOC suppression
One early failure mode was retrieval returning table-of-contents chunks. They look lexically relevant but are poor evidence. I added TOC detection with a score penalty and optional outright suppression.
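One way to detect TOC chunks is the dot-leader pattern: TOC lines tend to end in runs of dots followed by a page number. This is an illustrative heuristic, not the app's exact rule, and the thresholds are placeholders:

```python
import re

# TOC lines typically look like "Department of Finance .......... 12"
TOC_LINE = re.compile(r"\.{3,}\s*\d+\s*$")

def toc_ratio(chunk_text):
    """Fraction of non-empty lines that look like TOC entries."""
    lines = [l for l in chunk_text.splitlines() if l.strip()]
    if not lines:
        return 0.0
    return sum(bool(TOC_LINE.search(l)) for l in lines) / len(lines)

def apply_toc_penalty(score, chunk_text, penalty=0.5, suppress_at=0.8):
    """Penalize likely-TOC chunks; optionally suppress them entirely."""
    r = toc_ratio(chunk_text)
    if r >= suppress_at:
        return 0.0          # optional suppression
    if r >= 0.3:
        return score * penalty
    return score
```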
3) Smaller, section-aware chunking
Large generic chunks blurred unrelated sections. I moved to smaller windows with overlap and section boundaries, which improved precision and citation usefulness.
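The move to smaller windows can be sketched as a sliding window with overlap, where every chunk carries the page and section metadata needed for citation (the sizes below are illustrative, not the app's actual settings):

```python
def chunk_page(text, page_num, section, size=800, overlap=150):
    """Sliding-window chunking with overlap; each chunk keeps page and
    section metadata so answers can cite exact source pages."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        chunks.append({
            "text": text[start:end],
            "page": page_num,
            "section": section,
        })
        if end == len(text):
            break
        start = end - overlap  # overlap so evidence spanning a boundary survives
    return chunks
```

Because chunking never crosses a page boundary here, a chunk maps cleanly back to one page, which is what makes the "Open in Viewer at page N" UX possible.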
Evidence UX: Trust Through Citations
Good answers are not enough for civic data. Users need to verify.
So each result includes:
- source document name
- page range
- direct "Open in Viewer" action
- "Open in New Tab" action to the exact page anchor
I also added an embedded PDF panel so users can inspect evidence without leaving the page.
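The "exact page anchor" relies on the `#page=N` fragment, which browser PDF viewers honor. A minimal sketch of building such a link (the `/pdfs/` path is a placeholder, not the app's real route):

```python
from urllib.parse import quote

def page_anchor_url(base_url, pdf_name, page):
    """Build an 'Open in New Tab' link that jumps to the exact page.
    Browser PDF viewers honor the #page=N fragment."""
    return f"{base_url}/pdfs/{quote(pdf_name)}#page={page}"
```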
Making It Operable in Production
Public tools need controls, not just model output.
I added:
- Dockerized deployment
- Env-based provider switching (openai, bedrock, ollama)
- Rate limiting for public traffic
- Site on/off feature flag with a temporary disabled page linking to the open-source repo
- Export options for user queries (Markdown, JSON, CSV)
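Provider switching can be as simple as resolving an environment variable against a registry and failing fast when required credentials are missing. This is a hedged sketch: the env var name, model names, and required keys below are assumptions, not the app's actual config:

```python
import os

# Illustrative provider registry; model names and required env vars
# are placeholders, not the app's real settings.
PROVIDERS = {
    "openai":  {"needs": ["OPENAI_API_KEY"], "default_model": "gpt-4o-mini"},
    "bedrock": {"needs": ["AWS_REGION"],     "default_model": "claude-3-haiku"},
    "ollama":  {"needs": [],                 "default_model": "llama3"},
}

def resolve_provider(env=os.environ):
    """Pick a provider from LLM_PROVIDER and validate its prerequisites."""
    name = env.get("LLM_PROVIDER", "openai").lower()
    if name not in PROVIDERS:
        raise ValueError(f"unknown provider: {name}")
    cfg = PROVIDERS[name]
    missing = [k for k in cfg["needs"] if k not in env]
    if missing:
        raise RuntimeError(f"{name} requires env vars: {missing}")
    return name, cfg["default_model"]
```

Failing fast at startup beats discovering a missing API key on the first user request.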
Quality Loop: Evaluation and Tuning
I added an evaluation harness (eval_rag.py) with a sample benchmark file (eval/questions.sample.json) so retrieval changes can be measured, not guessed.
Metrics:
- Hit Rate@k
- MRR (mean reciprocal rank)
- first-hit rank per query
I also added a tuning mode to test BM25/vector blends and select the best setting for the current benchmark.
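The metrics above are small enough to sketch directly. Given the 1-based first-hit rank per query (None for a miss), Hit Rate@k and MRR fall out of a couple of lines (a sketch, not the harness's actual code):

```python
def hit_rate_at_k(ranks, k):
    """Fraction of queries whose first relevant result is in the top k.
    `ranks` holds the 1-based first-hit rank per query, or None if missed."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)

def mrr(ranks):
    """Mean reciprocal rank: 1/rank of the first hit, 0 when missed."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)
```

A tuning mode then just re-runs these metrics for each candidate BM25/vector weight pair and keeps the best-scoring blend.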
Deployment Notes
The app runs on AWS EC2 with Docker and Caddy for HTTPS. DNS is managed in Vercel and points a subdomain to the EC2 Elastic IP.
This setup kept deployment simple while still giving me:
- HTTPS
- controlled rollout
- easy environment-based config
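For context, the Caddy side of this setup can be a few lines. A minimal sketch, assuming the app container listens on port 8000 and the subdomain is a placeholder (Caddy provisions the HTTPS certificate automatically):

```text
# Illustrative Caddyfile; domain and port are placeholders.
budget.example.com {
    reverse_proxy localhost:8000
}
```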
What I Would Improve Next
If I were taking this to the next level, I would prioritize:
- richer structured extraction for budget tables and fields
- stronger evaluation coverage with a larger real-user query set
- Redis-backed shared rate limiting for multi-instance scale
- filterable UI facets (department, fund, grant code)
Closing
The biggest lesson was that civic RAG quality is mostly a retrieval and product design problem, not just a prompt problem. The model helps summarize, but trust comes from retrieval quality, transparent evidence, and operational discipline.
If you're building something similar and want to compare notes, feel free to reach out.