Model Inference and Deployment
We mainly provide the following ways to run inference and deploy the models locally.
A tool for quantizing the model and deploying it on a local CPU or GPU.
Link: llama.cpp-Deployment
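Once a quantized model file has been produced with llama.cpp, it can also be loaded from Python through the llama-cpp-python bindings. The following is only a minimal sketch; the model path is a placeholder for your own quantized file:

```python
# Minimal sketch: run a llama.cpp-quantized model via llama-cpp-python.
from llama_cpp import Llama

# Placeholder path: substitute the quantized file produced by llama.cpp.
llm = Llama(model_path="./models/quantized-model-q4_0.bin")

output = llm(
    "Q: What is the capital of France? A:",
    max_tokens=64,
    stop=["Q:"],   # stop before the model starts a new question
    echo=False,    # do not repeat the prompt in the output
)
print(output["choices"][0]["text"])
```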
The native Transformers inference method, supporting both CPU and GPU.
Link: Inference-with-Transformers
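A minimal sketch of this approach is shown below, assuming a merged model directory (the path is a placeholder):

```python
# Minimal sketch: plain Transformers inference with a merged local model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "path/to/merged-model"  # placeholder: your merged model directory
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    torch_dtype=torch.float16,  # use torch.float32 when running on CPU
    device_map="auto",          # requires `accelerate`; places layers on GPU(s)
)

inputs = tokenizer("Tell me about alpacas.", return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```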
A tool for deploying the model as a web UI.
Link: text-generation-webui
LlamaChat is a macOS app that allows you to chat with LLaMA, Alpaca, and other models. It supports both GGML (.bin) and PyTorch (.pth) formats.
Link: Using-LlamaChat-Interface
LangChain is a framework for developing applications driven by large language models (LLMs), designed to help developers build end-to-end LLM applications.
With the components and interfaces LangChain provides, developers can easily design and build LLM-powered applications such as question-answering systems, summarization tools, chatbots, code-comprehension tools, and information extraction systems.
Link: Integrated-with-LangChain
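As an illustration of what such an integration can look like, here is a minimal sketch that wires a local llama.cpp-compatible model into a simple LangChain chain. The LlamaCpp wrapper and the model path are assumptions for this sketch, not the project's exact setup:

```python
# Minimal sketch: a local model behind a simple LangChain prompt chain.
from langchain.llms import LlamaCpp
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Placeholder path: a llama.cpp-compatible quantized model file.
llm = LlamaCpp(model_path="./models/quantized-model-q4_0.bin")

prompt = PromptTemplate(
    input_variables=["question"],
    template="Answer the following question concisely:\n{question}",
)
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(question="What is LangChain used for?"))
```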
privateGPT is an open-source project built on llama-cpp-python, LangChain, and other libraries. It provides an interface for fully local document analysis and interactive Q&A with large models: users can point privateGPT at local documents and, using GPT4All or llama.cpp-compatible model files, ask questions about their content while keeping all data on the local machine.
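privateGPT itself is driven by its own scripts; the following is only a minimal sketch of the underlying pattern it implements (embed documents locally, retrieve relevant chunks, answer with a local model). All paths and model names are placeholders:

```python
# Minimal sketch of the local document-Q&A pattern, not privateGPT's own code.
from langchain.llms import LlamaCpp
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import TextLoader
from langchain.chains import RetrievalQA

# 1. Load and split a local document into chunks.
docs = TextLoader("my_notes.txt").load()  # placeholder document
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
).split_documents(docs)

# 2. Embed and index the chunks locally (no data leaves the machine).
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
db = Chroma.from_documents(chunks, embeddings)

# 3. Answer questions with a local llama.cpp-compatible model.
llm = LlamaCpp(model_path="./models/quantized-model-q4_0.bin")  # placeholder
qa = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever())
print(qa.run("What deadlines are mentioned in the notes?"))
```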