Ingest Arbitrary Documents
Note: The Bearer auth token is generated on server startup in Danswer MIT. There is better API Key support as part of Danswer EE.

See below for a breakdown of the different fields provided:
id
: The unique ID of the document. If a document with this ID already exists, it will be updated/replaced.
If not provided, a document ID is generated from the semantic_identifier field instead and returned in the
response.
sections
: A list of sections, each containing textual content and an optional link. Document chunking
tries to avoid splitting sections internally and favors splitting at section borders. The link shown for the
document at query time is the link of the best-matched section.
source
: The source type. The full list can be found by searching for DocumentSource
in the Danswer codebase.
semantic_identifier
: The “Title” of the document as shown in the UI (see image below).
metadata
: Used for the “Tags” feature displayed in the UI. The values can be either strings
or lists of strings.
doc_updated_at
: The time at which the document was last considered updated. By default, a time-based
score decay is applied around this value when the document is considered during search.
cc_pair_id
: The “Connector” ID seen on the Connector Status pages. For example, if running
locally, the page might be http://localhost:3000/admin/connector/2, where the ID is 2.
This allows attaching the ingested document to an existing connector so that it can be
assigned to groups or deleted together with the connector. If not provided or explicitly
set to 1, it is considered part of the default catch-all connector.
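
To make the fields above concrete, here is a minimal Python sketch that assembles a request body from them. Several details are assumptions rather than anything stated on this page: the helper names (build_ingestion_payload, send_ingestion_request), the "text" and "link" keys inside each section, the nesting of the fields under a top-level "document" key with cc_pair_id beside it, and the ISO 8601 format for doc_updated_at. The endpoint URL is left as a parameter because it is not given here. Check all of these against your Danswer version before relying on them.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_ingestion_payload(title, sections, source="web", doc_id=None,
                            metadata=None, updated_at=None, cc_pair_id=None):
    """Assemble a request body from the fields described above.

    `sections` is a list of (text, link) tuples; link may be None.
    The "document" wrapper and the "text"/"link" keys are assumptions.
    """
    document = {
        "sections": [
            {"text": text, **({"link": link} if link else {})}
            for text, link in sections
        ],
        "source": source,
        "semantic_identifier": title,
        # Values may be strings or lists of strings.
        "metadata": metadata or {},
    }
    # Omitting "id" lets the server derive one from semantic_identifier.
    if doc_id is not None:
        document["id"] = doc_id
    # Expects a timezone-aware datetime; serialized as ISO 8601 (assumed format).
    if updated_at is not None:
        document["doc_updated_at"] = updated_at.astimezone(timezone.utc).isoformat()

    payload = {"document": document}
    # Omitting cc_pair_id leaves the document on the default catch-all connector.
    if cc_pair_id is not None:
        payload["cc_pair_id"] = cc_pair_id
    return payload


def send_ingestion_request(payload, url, token):
    """POST the payload with the Bearer token; `url` is supplied by the caller."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    example = build_ingestion_payload(
        title="Onboarding Guide",
        sections=[
            ("Welcome to the team.", "https://example.com/guide#welcome"),
            ("Internal notes without a link.", None),
        ],
        metadata={"team": "docs", "topics": ["onboarding", "hr"]},
        updated_at=datetime(2024, 4, 25, 8, 20, tzinfo=timezone.utc),
        cc_pair_id=2,
    )
    # Only prints the body; nothing is sent.
    print(json.dumps(example, indent=2))
```

Splitting payload construction from sending lets you inspect the body before any request is made. Giving each section its own link matters because the link returned at query time is the one from the best-matched section.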