
[Feature] Will training of multimodal models such as llava be supported in the future? #2093

Closed
1 task done
Vincent131499 opened this issue Jan 4, 2024 · 2 comments · Fixed by #3454
Labels
solved This problem has been already solved

Comments

@Vincent131499

Reminder

  • I have read the README and searched the existing issues.

Reproduction

As stated in the title.

Expected behavior

No response

System Info

No response

Others

No response

@Vincent131499 Vincent131499 changed the title [Feature] Will training of multimodal models such as llama be supported in the future? [Feature] Will training of multimodal models such as llava be supported in the future? Jan 4, 2024
@hiyouga hiyouga added the pending This problem is yet to be addressed label Jan 4, 2024
@Katehuuh
Contributor

There are more state-of-the-art multimodal models that are more diverse (video, sound, 3D, all-in-one…), but I'm interested in using LLaVA-1.5 LoRA because of its 4-bit compatibility with the oobabooga UI.

@hiyouga hiyouga added the enhancement New feature or request label Feb 6, 2024
@hiyouga hiyouga added solved This problem has been already solved and removed enhancement New feature or request pending This problem is yet to be addressed labels Apr 25, 2024
@Katehuuh
Contributor

Katehuuh commented Apr 26, 2024

Are multiple images supported in the same conversation? Currently we have:

  },
  {
    "messages": [
      {
        "content": "Please describe this image<image>",
        "role": "user"
      },
      {
        "content": "…",
        "role": "assistant"
      }
    ],
    "images": [
      "images/3.jpg"
    ]
  }
]

For example, if the same conversation contained "content": "Is this the same person?<image><image>" (two image tags), and we set:

   "images": [ 
     "images/3.jpg",
     "images/4.jpg" 
   ] 

Edit 1: it seems the answer is no:

def _preprocess_visual_inputs(images: Sequence["ImageObject"], processor: "ProcessorMixin") -> "NDArray":
    # process visual inputs (currently only supports a single image)
    image_processor: "BaseImageProcessor" = getattr(processor, "image_processor")
    image = images[0] if len(images) != 0 else Image.new("RGB", (100, 100), (255, 255, 255))
    return image_processor(image, return_tensors="pt")["pixel_values"][0]
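
For reference, a minimal sketch of how the helper could accept several images, assuming the underlying image_processor (for example CLIPImageProcessor) accepts a list of PIL images. This is not the repository's implementation, and the model itself would also need to consume multiple image features per prompt.

# Hypothetical multi-image variant (a sketch, not the repository's code).
# Assumes image_processor accepts a list of PIL images and stacks the results.
from typing import Sequence
from PIL import Image

def _preprocess_visual_inputs_multi(images: Sequence["ImageObject"], processor: "ProcessorMixin") -> "NDArray":
    image_processor: "BaseImageProcessor" = getattr(processor, "image_processor")
    if len(images) == 0:
        # fall back to a blank placeholder image, mirroring the single-image helper
        images = [Image.new("RGB", (100, 100), (255, 255, 255))]
    # returns a tensor of shape (num_images, channels, height, width)
    # instead of a single image's pixel values
    return image_processor(list(images), return_tensors="pt")["pixel_values"]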


hiyouga: llava + qlora, it now requires ~5GB to fine-tune LLaVA-1.5-7B

It seems it cannot be trained on top of base Llama-2/3; however, there is liuhaotian/llava-v1.5-13b-lora, which can be applied to Llama-2.

Ignore this if it is out of scope, but I've seen llava-Phi-3/Llama-3 variants from here: InternLM/xtuner.
There is no LoRA for llava-v1.6, but is v1.6 itself supported?
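
Regarding the ~5GB QLoRA figure quoted above, here is a minimal sketch of loading LLaVA-1.5-7B in 4-bit with transformers + PEFT outside of LLaMA-Factory. The llava-hf/llava-1.5-7b-hf checkpoint name and the LoRA hyperparameters are illustrative assumptions, not the project's exact recipe.

# Sketch only: 4-bit (QLoRA-style) loading of LLaVA-1.5-7B with transformers + PEFT.
# Checkpoint name and LoRA settings below are assumptions for illustration.
import torch
from transformers import BitsAndBytesConfig, LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",  # assumed Hugging Face checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # language-model attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable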
