# RAG Connector Sync State and Document Loaders

Status: public
Confidence: medium (0.725) (verified)
Last verified: 2026-06-02
Generation: ai_structured

## TL;DR

RAG connectors need durable sync state so retrievers know which source documents were loaded, parsed, changed, deleted, or skipped.

## Core Explanation

Document loaders turn external content into records that can be chunked, embedded, indexed, and cited. The loader is only one part of production RAG. The system also needs checkpoints, source object IDs, modified timestamps, parser versions, ACL metadata, and deletion markers.

Without sync state, agents cannot tell whether a missing answer is a retrieval failure, a parser failure, a stale index, or an upstream permission issue. Source-mapped ingestion logs make the retrieval layer auditable.

## Source-Mapped Facts

- LlamaIndex documentation describes data connectors as a way to ingest data from APIs, PDFs, SQL, and other sources. ([source](https://developers.llamaindex.ai/python/framework/module_guides/loading/connector/))
- LangChain documentation describes document loaders as integrations that load data from a source as Document objects. ([source](https://docs.langchain.com/oss/python/integrations/document_loaders/))
- Unstructured documentation describes partitioning as converting raw documents into structured elements. ([source](https://docs.unstructured.io/open-source/core-functionality/partitioning))

## Further Reading

- [LlamaIndex Data Connectors](https://developers.llamaindex.ai/python/framework/module_guides/loading/connector/)
- [LangChain Document Loaders](https://docs.langchain.com/oss/python/integrations/document_loaders/)
- [Unstructured Partitioning](https://docs.unstructured.io/open-source/core-functionality/partitioning)
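
## Example: Sync State Sketch

The sync state fields named in the Core Explanation (source object IDs, modified timestamps, parser versions, ACL metadata, deletion markers) can be illustrated with a small, library-independent sketch. Everything here is hypothetical: `SourceObject`, `SyncRecord`, `plan_sync`, and the action labels are illustrative names, not part of LlamaIndex, LangChain, or Unstructured. It assumes the connector can list source objects with a stable ID and a modified timestamp.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class SourceObject:
    """One object as currently reported by the upstream source (hypothetical shape)."""
    source_id: str
    modified_at: str  # ISO 8601 timestamp as reported by the source
    acl: Tuple[str, ...] = ()


@dataclass
class SyncRecord:
    """What the connector persisted the last time it processed an object."""
    source_id: str
    modified_at: str
    parser_version: str
    acl: Tuple[str, ...] = ()
    deleted: bool = False  # deletion marker (tombstone)


def plan_sync(
    state: Dict[str, SyncRecord],
    listing: Iterable[SourceObject],
    parser_version: str,
) -> Dict[str, str]:
    """Compare stored sync state with the current source listing.

    Returns a mapping of source_id to one of the illustrative actions:
    "load", "changed", "reparse", "acl_update", "skip", "delete".
    """
    actions: Dict[str, str] = {}
    seen = set()
    for obj in listing:
        seen.add(obj.source_id)
        rec = state.get(obj.source_id)
        if rec is None or rec.deleted:
            actions[obj.source_id] = "load"        # never indexed, or restored upstream
        elif rec.modified_at != obj.modified_at:
            actions[obj.source_id] = "changed"     # content changed at the source
        elif rec.parser_version != parser_version:
            actions[obj.source_id] = "reparse"     # same content, newer parser
        elif rec.acl != obj.acl:
            actions[obj.source_id] = "acl_update"  # only permissions changed
        else:
            actions[obj.source_id] = "skip"        # nothing to do
    # Anything in state that the source no longer lists needs a deletion marker.
    for source_id, rec in state.items():
        if source_id not in seen and not rec.deleted:
            actions[source_id] = "delete"
    return actions


# Usage with made-up IDs and timestamps.
state = {
    "doc-1": SyncRecord("doc-1", "2026-01-01T00:00:00Z", "v1"),
    "doc-2": SyncRecord("doc-2", "2026-01-01T00:00:00Z", "v1"),
    "doc-3": SyncRecord("doc-3", "2026-01-01T00:00:00Z", "v1"),
}
listing = [
    SourceObject("doc-1", "2026-01-01T00:00:00Z"),  # unchanged
    SourceObject("doc-2", "2026-02-01T00:00:00Z"),  # modified upstream
    SourceObject("doc-4", "2026-02-02T00:00:00Z"),  # new
]
plan = plan_sync(state, listing, parser_version="v1")
```

Writing the resulting plan to an ingestion log, keyed by source ID, is one way to make it possible to distinguish a stale index (`changed` not yet applied) from a parser issue (`reparse` pending) or a permission change (`acl_update`). A production connector would also persist the state durably and record a checkpoint per sync run, which this sketch leaves out.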