How are you dealing with documents that exceed the context window? Chunking and DB vectorization? One issue with some approaches is when Paragraph 40 of a document references Paragraph 6, but the LLM has no direct way to "remember" that reference across chunks.
I chunk the documents and use Elasticsearch to store the vectors. On a laptop with 8GB of GPU memory I can work with a pretty large effective context window and not hallucinate.
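A minimal sketch of that kind of pipeline, assuming Elasticsearch 8.x with its dense_vector/kNN support and sentence-transformers for embeddings. The model name, index name, and chunk sizes are all illustrative, not what the parent necessarily uses. Overlapping chunks are one cheap hedge against the cross-reference problem above, since text near a chunk boundary lands in two chunks:

```python
# pip install elasticsearch sentence-transformers
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

# Assumptions: a local Elasticsearch 8.x node, and a small embedding
# model (all-MiniLM-L6-v2 produces 384-dim vectors).
es = Elasticsearch("http://localhost:9200")
model = SentenceTransformer("all-MiniLM-L6-v2")

INDEX = "doc_chunks"  # hypothetical index name

# Create an index with a dense_vector field so kNN search works.
if not es.indices.exists(index=INDEX):
    es.indices.create(
        index=INDEX,
        mappings={
            "properties": {
                "text": {"type": "text"},
                "embedding": {
                    "type": "dense_vector",
                    "dims": 384,
                    "index": True,
                    "similarity": "cosine",
                },
            }
        },
    )

def chunk(text: str, size: int = 800, overlap: int = 200):
    """Naive fixed-size character chunking with overlap, so a reference
    near a chunk boundary survives in both neighboring chunks."""
    step = size - overlap
    return [text[i : i + size] for i in range(0, len(text), step)]

def index_document(doc_id: str, text: str):
    # Embed each chunk and store it next to its raw text.
    for i, piece in enumerate(chunk(text)):
        es.index(
            index=INDEX,
            id=f"{doc_id}-{i}",
            document={"text": piece, "embedding": model.encode(piece).tolist()},
        )

def retrieve(query: str, k: int = 5):
    # Approximate kNN over the stored chunk embeddings; the top hits
    # become the context you stuff into the LLM prompt.
    resp = es.search(
        index=INDEX,
        knn={
            "field": "embedding",
            "query_vector": model.encode(query).tolist(),
            "k": k,
            "num_candidates": 50,
        },
    )
    return [hit["_source"]["text"] for hit in resp["hits"]["hits"]]
```

This only retrieves by similarity; it won't resolve "see Paragraph 6" on its own. You'd need something extra for that, e.g. storing paragraph numbers as metadata and pulling in explicitly referenced chunks alongside the kNN hits.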