feat: WIP: Adjust GPU Layers #3737
base: master
Conversation
Signed-off-by: Siddharth More <[email protected]>
✅ Deploy Preview for localai ready!
…6b61dc98b87a` (mudler#3718) ⬆️ Update ggerganov/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <[email protected]>
…dfbc9d51570c4e` (mudler#3719) ⬆️ Update ggerganov/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <[email protected]>
…ler#3721) Signed-off-by: Ettore Di Giacinto <[email protected]>
Updated some formatting in the doc. Signed-off-by: JJ Asghar <[email protected]>
Signed-off-by: Ettore Di Giacinto <[email protected]>
…f07d9d7a6077` (mudler#3725) ⬆️ Update ggerganov/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <[email protected]>
…ab770389bb442b` (mudler#3724) ⬆️ Update ggerganov/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <[email protected]>
feat(multimodal): allow to template image placeholders Signed-off-by: Ettore Di Giacinto <[email protected]>
Signed-off-by: Ettore Di Giacinto <[email protected]>
) * feat(vllm): add support for image-to-text Related to mudler#3670 Signed-off-by: Ettore Di Giacinto <[email protected]> * feat(vllm): add support for video-to-text Closes: mudler#2318 Signed-off-by: Ettore Di Giacinto <[email protected]> * feat(vllm): support CPU installations Signed-off-by: Ettore Di Giacinto <[email protected]> * feat(vllm): add bnb Signed-off-by: Ettore Di Giacinto <[email protected]> * chore: add docs reference Signed-off-by: Ettore Di Giacinto <[email protected]> * Apply suggestions from code review Signed-off-by: Ettore Di Giacinto <[email protected]> --------- Signed-off-by: Ettore Di Giacinto <[email protected]> Signed-off-by: Ettore Di Giacinto <[email protected]>
…b00e0223b6fa` (mudler#3731) ⬆️ Update ggerganov/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <[email protected]>
…5c9e2b2529ff2c` (mudler#3730) ⬆️ Update ggerganov/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <[email protected]>
We default to a soft kill; however, we might want to force-kill backends after a while to avoid hanging requests (which may hallucinate indefinitely) Signed-off-by: Ettore Di Giacinto <[email protected]>
If the LLM does not implement any logic for PredictStream, we close the channel immediately to not leave the process hanging. Signed-off-by: Ettore Di Giacinto <[email protected]>
…c8930d19f45773` (mudler#3735) ⬆️ Update ggerganov/whisper.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <[email protected]>
…1bd8811a9b44` (mudler#3736) ⬆️ Update ggerganov/llama.cpp Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <[email protected]>
Signed-off-by: Siddharth More <[email protected]>
Signed-off-by: Siddharth More <[email protected]>
Signed-off-by: Siddharth More <[email protected]>
Signed-off-by: Siddharth More <[email protected]>
Signed-off-by: Siddharth More <[email protected]>
@mudler can you kindly check the PR approach and give some high-level feedback when possible? The next step I will add is some sort of GPU_Layer estimator based on:
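As a rough illustration only (none of this is code from the PR, and all names are hypothetical), such an estimator could map the free VRAM reported by the GPU to an offloadable layer count:

// Hypothetical sketch: derive how many model layers can be offloaded to the
// GPU from the free VRAM and the model's per-layer memory footprint.
func estimateGPULayers(freeVRAMBytes, perLayerBytes uint64, totalLayers int) int {
	if perLayerBytes == 0 || totalLayers <= 0 {
		return 0
	}
	// Integer division: how many whole layers fit in the free VRAM.
	layers := int(freeVRAMBytes / perLayerBytes)
	if layers > totalLayers {
		layers = totalLayers
	}
	return layers
}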
Signed-off-by: Siddharth More <[email protected]>
Signed-off-by: Siddharth More <[email protected]>
@@ -70,6 +70,7 @@ type RunCMD struct {
	Federated bool `env:"LOCALAI_FEDERATED,FEDERATED" help:"Enable federated instance" group:"federated"`
	DisableGalleryEndpoint bool `env:"LOCALAI_DISABLE_GALLERY_ENDPOINT,DISABLE_GALLERY_ENDPOINT" help:"Disable the gallery endpoints" group:"api"`
	LoadToMemory []string `env:"LOCALAI_LOAD_TO_MEMORY,LOAD_TO_MEMORY" help:"A list of models to load into memory at startup" group:"models"`
	AdjustGPULayers bool `env:"LOCALAI_ADJUST_GPU_LAYERS,ADJUST_GPU_LAYERS" help:"Enable OffLoading of model layers to GPU" group:"models"`
Minor nit: I would call this something like AutomaticallyAdjustGPULayers:

- AdjustGPULayers bool `env:"LOCALAI_ADJUST_GPU_LAYERS,ADJUST_GPU_LAYERS" help:"Enable OffLoading of model layers to GPU" group:"models"`
+ AutomaticallyAdjustGPULayers bool `env:"LOCALAI_AUTO_ADJUST_GPU_LAYERS,ADJUST_GPU_LAYERS" help:"Enable Automatic OffLoading of model layers to GPU" group:"models"`
// GetNvidiaGpuInfo uses pkg nvml, a Go binding around the C API provided by
// libnvidia-ml.so, to fetch GPU stats.
func GetNvidiaGpuInfo() ([]GPUInfo, error) {
Minor nit, but it's good practice to keep acronyms uppercase, e.g. UnmarshalYAML, GetXYZ:

- func GetNvidiaGpuInfo() ([]GPUInfo, error) {
+ func GetNvidiaGPUInfo() ([]GPUInfo, error) {
}
}
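For reference, a minimal sketch of what fetching per-device stats through the go-nvml binding can look like. It assumes the github.com/NVIDIA/go-nvml package; the GPUInfo shape shown here is illustrative and not necessarily the one in this PR:

import (
	"fmt"

	"github.com/NVIDIA/go-nvml/pkg/nvml"
)

// GPUInfo is an illustrative shape for per-device stats.
type GPUInfo struct {
	Name      string
	TotalVRAM uint64
	FreeVRAM  uint64
}

// GetNvidiaGPUInfo queries libnvidia-ml via the go-nvml binding and returns
// basic memory stats for each visible NVIDIA device.
func GetNvidiaGPUInfo() ([]GPUInfo, error) {
	if ret := nvml.Init(); ret != nvml.SUCCESS {
		return nil, fmt.Errorf("nvml init failed: %v", nvml.ErrorString(ret))
	}
	defer nvml.Shutdown()

	count, ret := nvml.DeviceGetCount()
	if ret != nvml.SUCCESS {
		return nil, fmt.Errorf("device count failed: %v", nvml.ErrorString(ret))
	}

	infos := make([]GPUInfo, 0, count)
	for i := 0; i < count; i++ {
		device, ret := nvml.DeviceGetHandleByIndex(i)
		if ret != nvml.SUCCESS {
			continue // skip devices we cannot query
		}
		name, _ := device.GetName()
		mem, ret := device.GetMemoryInfo()
		if ret != nvml.SUCCESS {
			continue
		}
		infos = append(infos, GPUInfo{Name: name, TotalVRAM: mem.Total, FreeVRAM: mem.Free})
	}
	return infos, nil
}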
func TestGetModelGGufData_URL_WithMockedEstimateModelMemoryUsage(t *testing.T) { |
This is still a minor nit, but all the other tests use ginkgo. Would you consider using ginkgo here as well? Not a blocker in any case; at some point things can be refactored to be more consistent if needed.
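A short sketch of what the same test might look like under ginkgo/gomega; the mockEstimator helper is a hypothetical test double, not taken from this PR:

import (
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

var _ = Describe("GetModelGGufData", func() {
	Context("with a URL model path and a mocked memory estimator", func() {
		It("returns a model estimate without error", func() {
			// mockEstimator is a hypothetical stand-in for ModelMemoryEstimator.
			estimate, err := GetModelGGufData("https://example.com/model.gguf", mockEstimator{}, false)
			Expect(err).ToNot(HaveOccurred())
			Expect(estimate).ToNot(BeNil())
		})
	})
})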
// Interface for parsing different model formats
type LocalAIGGUFParser interface {
Small style nit (non-blocking): it would make the code more reusable if the interface here required only ParseGGUFFile, with different parsers implementing their own ParseGGUFFile logic, for instance an Ollama parser, a Hugging Face parser, etc.
The caller would then instantiate the needed parser down the line, for instance:
type GGUFParser interface {
	Parse() (*ggufparser.GGUFFile, error)
}

// GetModelGGufData returns the resources estimation needed to load the model.
func GetModelGGufData(modelPath string, estimator ModelMemoryEstimator, ollamaModel bool) (*ModelEstimate, error) {
	ctx := context.Background()
	var ggufParser GGUFParser
	// Pick a parser based on where the model lives.
	switch {
	case isURL(modelPath):
		ggufParser = &RemoteFileParser{ctx, modelPath}
	case ollamaModel:
		ggufParser = &OllamaParser{ctx, modelPath}
	// ... other parsers here
	}
	ggufData, err := ggufParser.Parse()
	if err != nil {
		return nil, err
	}
	return estimator.Estimate(ggufData)
}
Considering that we pass an estimator down the line, this could actually be part of ModelMemoryEstimator as well:
func (g GGUFEstimator) GetModelGGufData(modelPath string, ollamaModel bool) (*ModelEstimate, error) {
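To make the suggestion concrete, here is a sketch of one parser implementing that interface. It assumes gguf-parser-go exposes a ParseGGUFFileRemote helper (worth verifying against the vendored version), and the struct layout is illustrative:

// Hypothetical GGUFParser implementation for models served over HTTP(S).
type RemoteFileParser struct {
	ctx context.Context
	url string
}

func (p *RemoteFileParser) Parse() (*ggufparser.GGUFFile, error) {
	// Reads the GGUF metadata over the network instead of downloading
	// the whole model file first.
	return ggufparser.ParseGGUFFileRemote(p.ctx, p.url)
}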
@siddimore thanks for taking a stab at this, the direction looks good here. Just a few minor nits here and there, but definitely not blockers.
Thanks much @mudler, you are welcome!! I will improve the code and add some more testing. Appreciate the feedback and will address the comments.
Description
TODO
This PR fixes #3541
Notes for Reviewers
Signed commits