[Feature Request] Support for GGUF models (llama.cpp compatible) #12
Thank you for submitting. If you're using txtai 6.2+, you can do the following.

```yaml
# Embeddings index
writable: false
cloud:
  provider: huggingface-hub
  container: neuml/txtai-wikipedia

# llama.cpp pipeline
llama_cpp.Llama:
  model_path: path to GGUF file

# Extractor pipeline
extractor:
  path: llama_cpp.Llama
  output: reference

txtchat.pipeline.wikisearch.Wikisearch:
  # Add application reference
  application:

workflow:
  wikisearch:
    tasks:
      - action: txtchat.pipeline.wikisearch.Wikisearch
```

You just need to make sure you also have https://github.com/abetlen/llama-cpp-python installed.
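For reference, here is a minimal sketch of how that configuration could be run with txtai's Application API. It assumes the YAML above is saved as `wikisearch.yml` (a hypothetical file name) and that `model_path` points to an actual GGUF file on disk.

```python
# Sketch: load the YAML config with txtai's Application and run the wikisearch workflow.
# Assumes the config is saved as "wikisearch.yml" and model_path is a real GGUF file.
from txtai.app import Application

# Builds the embeddings index reference, llama.cpp pipeline and workflow from the YAML
app = Application("wikisearch.yml")

# Run a query through the wikisearch workflow; one result is yielded per input element
for result in app.workflow("wikisearch", ["Tell me about the Roman Empire"]):
    print(result)
```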
Thanks for this. The GGUF model loads correctly, though I am now getting the following error:
Did you run the exact configuration provided above?
Just added a fix with #13 that should fix the error.

If you install txtai from source, there is now direct support for llama.cpp models. See this article for more.
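As a rough illustration of what that direct support looks like, something along these lines should work with a source install of txtai; the model path below is a placeholder, not a specific recommendation.

```python
# Sketch: txtai's LLM pipeline loading a local GGUF file directly,
# which routes generation through llama.cpp under the hood.
from txtai.pipeline import LLM

# Path to a GGUF file on disk (placeholder)
llm = LLM("/path/to/model.gguf")

print(llm("Answer in one sentence: what is a GGUF file?"))
```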
These run on both GPU and CPU. Much of the OSS community uses them, and the models are quite light on VRAM.
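For context, a small sketch of what that looks like with llama-cpp-python, where `n_gpu_layers` controls how much of the model is offloaded to the GPU; the model path is a placeholder.

```python
# Sketch of llama-cpp-python usage; model path is a placeholder.
from llama_cpp import Llama

# n_gpu_layers=-1 offloads all layers to the GPU when built with GPU support;
# n_gpu_layers=0 keeps everything on the CPU.
llm = Llama(model_path="/path/to/model.gguf", n_gpu_layers=-1)

output = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(output["choices"][0]["text"])
```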