What is GPT4All
GPT4All provides a way to run LLMs (both closed and open-source) by calling APIs or running the models in memory. For self-hosted models, GPT4All offers models that are quantized or run with reduced float precision. Both are ways of compressing a model so it can run on weaker hardware, at a slight cost to model capabilities. GPT4All provides a Python wrapper, which Danswer uses to run the models in the same container as the Danswer API Server. Because GPT4All is not compatible with certain architectures, Danswer does not package it by default. To use it in your deployment, uncomment gpt4all==2.0.2 in danswer/backend/requirements/default.txt and rebuild with GPT4All installed.
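To build intuition for why quantization shrinks models, here is a minimal sketch of symmetric int8 quantization, using NumPy for illustration. This is not GPT4All's actual implementation, just the general idea: float32 weights are mapped to 8-bit integers plus a scale factor, cutting memory 4x at the cost of small rounding error.

```python
import numpy as np

# Hypothetical stand-in for a layer's float32 weights.
rng = np.random.default_rng(0)
weights = rng.standard_normal(1000).astype(np.float32)

# Symmetric int8 quantization: map [-max|w|, +max|w|] onto [-127, 127].
scale = np.abs(weights).max() / 127
quantized = np.round(weights / scale).astype(np.int8)

# To use the weights, dequantize back to float on the fly.
dequantized = quantized.astype(np.float32) * scale

print(f"original bytes:  {weights.nbytes}")    # 4000 (float32)
print(f"quantized bytes: {quantized.nbytes}")  # 1000 (int8)
print(f"max round-trip error: {np.abs(weights - dequantized).max():.5f}")
```

The round-trip error is bounded by half the scale factor, which is why well-quantized models lose only a little accuracy while fitting in a quarter of the memory.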
Note: Even though GPT4All offers quantized models, it is still significantly slower than models fully hosted on GPUs. If you are running the models purely on CPU, there may be significant delays both in processing the context documents and in generating answers.