GPT4All
Configure Danswer to use GPT4All models in memory
Refer to Model Configs for how to set the environment variables for your particular deployment.
Note: While we support local LLMs, you will get significantly better responses with a more powerful model like GPT-4.
What is GPT4All
GPT4All provides a way to run LLMs (both closed and open source) by calling APIs or running them in memory. For self-hosted models, GPT4All offers models that are quantized or that run with reduced floating-point precision. Both techniques compress a model so it can run on weaker hardware, at a slight cost in model capabilities.
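To make the trade-off concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is an illustration of the general idea only, not the scheme GPT4All's model files actually use; the function names and sample weights are made up for this example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


# Illustrative weights, not taken from any real model.
weights = [0.82, -1.27, 0.003, 0.45]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# The round trip is lossy: each weight can be off by up to half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(quantized, max_error)
```

Each weight now fits in one byte instead of four (for 32-bit floats), and the small rounding error is the "slight cost in model capabilities" mentioned above.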
GPT4All provides a Python wrapper, which Danswer uses to run the models in the same container as the Danswer API Server.
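As a rough sketch of what calling the Python wrapper looks like, the snippet below loads a model in-process and generates an answer from context documents. The GPT4All class and its generate method come from the gpt4all package; the helper names, the prompt layout, and the max_tokens value are assumptions made for illustration and are not Danswer's actual implementation. The model name is left for you to supply.

```python
def build_prompt(question: str, context_docs: list[str]) -> str:
    """Assemble a simple question-answering prompt (hypothetical layout)."""
    context = "\n\n".join(context_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def generate_answer(question: str, context_docs: list[str], model_name: str) -> str:
    """Run a GPT4All model in the current process and return its answer."""
    # Imported lazily so this module still loads where gpt4all is not installed.
    from gpt4all import GPT4All

    # By default the wrapper downloads the model file on first use if missing.
    model = GPT4All(model_name)
    return model.generate(build_prompt(question, context_docs), max_tokens=256)


# Example call; replace the placeholder with a model file name that
# your installed gpt4all version supports.
# print(generate_answer("What is Danswer?", ["Danswer is ..."], "<model file name>"))
```

Because the model runs inside the API server's process, its memory and CPU use add directly to that container's footprint.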
Because GPT4All is not compatible with certain architectures, Danswer does not package it by default.
You will have to install it in your deployment by uncommenting gpt4all==2.0.2 in danswer/backend/requirements/default.txt and rebuilding with GPT4All installed.
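If you prefer to script the change rather than edit the file by hand, a small helper along these lines could do it. This is a sketch that assumes the pin appears in the file as a line commented out with a leading "#"; the function name is hypothetical, and you should check the result before rebuilding.

```python
import re


def uncomment_requirement(text: str, package: str) -> str:
    """Return text with any commented-out line for the given package uncommented."""
    pattern = re.compile(r"^\s*#\s*(" + re.escape(package) + r"\b.*)$")
    lines = []
    for line in text.splitlines():
        match = pattern.match(line)
        # Keep every other line exactly as it was.
        lines.append(match.group(1) if match else line)
    trailing_newline = "\n" if text.endswith("\n") else ""
    return "\n".join(lines) + trailing_newline


# Illustrative file contents; "somepackage" is a placeholder.
sample = "somepackage==1.0.0\n# gpt4all==2.0.2\n"
print(uncomment_requirement(sample, "gpt4all"))

# To apply it to the real file:
# from pathlib import Path
# path = Path("danswer/backend/requirements/default.txt")
# path.write_text(uncomment_requirement(path.read_text(), "gpt4all"))
```

After the file is updated, rebuild your deployment so the image includes the package.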
Note: Although GPT4All offers quantized models, it is still significantly slower than models fully hosted on GPUs. If you’re running the models purely on CPU, there may be a significant delay in processing the context documents and in generating answers.