A complete, self-hosted pipeline from raw financial PDFs to a governed, conversational knowledge base.
Layout-aware OCR via Qwen3-VL-8B for scanned & mixed-orientation PDFs.
Support for nested, merged-cell, and page-spanning financial tables.
Page-level batching & resumable parsing for 10k-page files.
Retry failed batches, isolate corrupted pages, and ingest whatever parsed successfully.
Semantic vectors in PostgreSQL with sub-second retrieval.
Token streaming with inline source citations.
Query a single file, selected files, or the entire workspace.
Short-term conversational continuity plus semantic retrieval of earlier history.
Super-admin, admin, and user roles with strict isolation.
Live activity, ingestion, config and queue events.
Isolated workspaces with folder hierarchy & grouping.
One-command Docker Compose, air-gapped deployable.
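
The page-level batching, resumable parsing, and retry/isolation items above can be illustrated with a minimal Python sketch. It is not the project's actual implementation: the names `page_batches`, `ingest`, `parse_batch`, and the JSON checkpoint layout are hypothetical, and it assumes that re-parsing a page is safe (idempotent). A batch is retried a few times; if it keeps failing, its pages are parsed one at a time so that only the corrupted pages are set aside while the rest are ingested.

```python
import json
from pathlib import Path
from typing import Callable, Iterator, List, Tuple


def page_batches(total_pages: int, batch_size: int) -> Iterator[List[int]]:
    """Yield consecutive lists of zero-based page indices."""
    for start in range(0, total_pages, batch_size):
        yield list(range(start, min(start + batch_size, total_pages)))


def ingest(
    total_pages: int,
    batch_size: int,
    parse_batch: Callable[[List[int]], None],  # hypothetical parser hook
    checkpoint: Path,                          # hypothetical JSON state file
    max_retries: int = 2,
) -> Tuple[List[int], List[int]]:
    """Parse pages in batches and return (done_pages, failed_pages)."""
    done, failed = set(), set()
    if checkpoint.exists():
        # Resume: pages recorded by an earlier run are skipped.
        state = json.loads(checkpoint.read_text())
        done, failed = set(state["done"]), set(state["failed"])

    for batch in page_batches(total_pages, batch_size):
        pending = [p for p in batch if p not in done and p not in failed]
        if not pending:
            continue
        for _ in range(max_retries + 1):
            try:
                parse_batch(pending)
            except Exception:
                continue  # retry the whole batch
            done.update(pending)
            break
        else:
            # Every attempt failed: isolate the bad pages one by one.
            for page in pending:
                try:
                    parse_batch([page])
                    done.add(page)
                except Exception:
                    failed.add(page)
        # Persist progress after each batch so a crash loses at most one batch.
        checkpoint.write_text(
            json.dumps({"done": sorted(done), "failed": sorted(failed)})
        )
    return sorted(done), sorted(failed)
```

Calling `ingest` a second time with the same checkpoint file skips every page already recorded as done or failed, which is what makes an interrupted run resumable in this sketch.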