
Add support for Phi3V #2383

Open · wants to merge 12 commits into base: main

Conversation

ravi03071991 (Contributor)

PR to add support for Phi3V.

Fixes #1108

ravi03071991 marked this pull request as ready for review December 7, 2024 09:26
@ravi03071991 (Contributor, Author)

NOTE: Testing is still pending.

ravi03071991 marked this pull request as draft December 9, 2024 18:58
@merrymercy (Contributor) commented Dec 17, 2024

I added you as a co-author in #2500, so your future commits will trigger CI automatically.

@merrymercy (Contributor)

How did you test the correctness of this model locally? Did you compare the logits against the HF implementation, similar to this one? #2365 (comment)
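
For reference, a minimal sketch of such a logit comparison; the checkpoint id, prompt, tolerances, and the `sgl_logits` variable are placeholders, not the exact procedure from #2365:

```python
# Hedged sketch: compare last-token logits from a reference Hugging Face run
# against the implementation under test. The checkpoint name and prompt are
# assumptions; a full Phi-3-vision check would also feed image inputs through
# the processor.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-vision-128k-instruct"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
hf_model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, trust_remote_code=True
)

inputs = tokenizer("Describe the image.", return_tensors="pt")
with torch.no_grad():
    hf_logits = hf_model(**inputs).logits[:, -1, :]

# `sgl_logits` would be produced by the SGLang implementation for the same
# inputs; the check itself is an element-wise closeness comparison.
# torch.testing.assert_close(sgl_logits, hf_logits, rtol=1e-2, atol=1e-2)
```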

@ravi03071991 (Contributor, Author)

> How did you test the correctness of this model locally? Did you compare the logits against the HF implementation, similar to this one? #2365 (comment)

Thanks for the pointer @merrymercy. I'll test it out.

Currently, it seems that Phi3VConfig and Phi3VModel are not part of the transformers library. Instead, they reside in the model files. As a result, I need to add them to the transformers library first to make this PR functional.

In other words, the following imports currently throw an error:

from transformers import Phi3VConfig, Phi3VModel
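
Until those classes land in transformers, one possible workaround (a sketch, assuming the checkpoint ships its own remote-code model definitions) is to resolve them through the Auto classes:

```python
# Sketch of a workaround while Phi3VConfig / Phi3VModel are not part of
# transformers: the Auto classes can load the class definitions from the
# checkpoint's remote code instead of from the library itself.
from transformers import AutoConfig, AutoModelForCausalLM

model_id = "microsoft/Phi-3-vision-128k-instruct"  # assumed checkpoint
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
print(type(config).__name__, type(model).__name__)  # classes resolved remotely
```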

@merrymercy (Contributor)

@ravi03071991 Can you fix the error in CI? https://github.com/sgl-project/sglang/actions/runs/12452217969/job/34760948018?pr=2383#step:4:984 Please make sure you can run it locally.

merrymercy marked this pull request as ready for review December 26, 2024 16:02
merrymercy self-assigned this Dec 26, 2024
@ravi03071991 (Contributor, Author)

@merrymercy yeah sure. I am still working on it.

@ravi03071991 (Contributor, Author)

Updates:

  1. Added an image processing step prior to image embedding.
  2. Implemented image embedding.
  3. Integrated Phi3VForCausalLM functionalities:
    - Loading weights
    - Embedding text and images
    - Combining text and image embeddings and passing them to the LLM.

TODO:

The logic for combining text and image embeddings remains unclear. I have imported the logic from Qwen2-VL, but it differs from the approach suggested in the Hugging Face Phi3V code base. Specifically, I am struggling to understand the rationale behind combining the embeddings using image offsets and padding.

Can someone help me here?

@zhyncs (Member) commented Jan 8, 2025

> Can someone help me here?

@yizhang2077 may help take a look. Thanks.

@yizhang2077 (Collaborator)

> Can someone help me here?

In the function pad_input_ids, it pads the original input_ids with image tokens and records the image offsets here (where an image embedding starts from). I think you can make some modifications here, and in forward you can replace the embeddings using the image embeddings and image offsets.
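
A minimal sketch of that padding/offset pattern, with purely illustrative names (a single image placeholder and a made-up token id) rather than the actual sglang signatures:

```python
# Illustrative sketch of the padding/offset pattern described above; the
# token id, signature, and shapes are assumptions, not sglang's real API.
from typing import List, Tuple

IMAGE_TOKEN_ID = 32000  # hypothetical placeholder id used for image positions

def pad_input_ids(input_ids: List[int], num_img_tokens: int,
                  image_pos: int) -> Tuple[List[int], List[int]]:
    """Expand the single image placeholder at `image_pos` into
    `num_img_tokens` image tokens and record where the image embedding
    will start."""
    padded = (input_ids[:image_pos]
              + [IMAGE_TOKEN_ID] * num_img_tokens
              + input_ids[image_pos + 1:])
    image_offsets = [image_pos]
    return padded, image_offsets

# Later, in forward(), those positions are overwritten with image embeddings:
#   for offset in image_offsets:
#       inputs_embeds[offset:offset + num_img_tokens] = image_embeds
```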

@ravi03071991 (Contributor, Author) commented Jan 8, 2025

> Can someone help me here?
>
> In the function pad_input_ids, it pads the original input_ids with image tokens and records the image offsets here (where an image embedding starts from). I think you can make some modifications here, and in forward you can replace the embeddings using the image embeddings and image offsets.

Thanks @yizhang2077. pad_input_ids seems to be defined in qwen2_vl, but it does not seem to be used?

@yizhang2077 (Collaborator) commented Jan 8, 2025

> Thanks @yizhang2077. pad_input_ids seems to be defined in qwen2_vl, but it does not seem to be used?

It is used in pad_input_ids_func here.

@ravi03071991 (Contributor, Author)

> It is used in pad_input_ids_func here.

Oh, I see. I missed that. The padding function seems to differ between qwen2_vl and llava. Is it specific to the model, and should it be checked against the HF implementation?

@ravi03071991 (Contributor, Author)

Looks like the model provider has some padding logic here.

@yizhang2077 (Collaborator)

> Oh, I see. I missed that. The padding function seems to differ between qwen2_vl and llava. Is it specific to the model, and should it be checked against the HF implementation?

I think it is specific to the model, since how the padding is done depends on the model implementation.

@ravi03071991 (Contributor, Author)

@yizhang2077 / @zhyncs I think I'm kind of stuck here:

  1. pad_input_ids requires num_img_tokens.
  2. The current implementation computes num_img_tokens based on the image processor, which is called inside the forward method.

I'm a bit confused about how to proceed from here. Could you help me out?

@ravi03071991 (Contributor, Author)

> @yizhang2077 / @zhyncs I think I'm kind of stuck here:
>
>   1. pad_input_ids requires num_img_tokens.
>   2. The current implementation computes num_img_tokens based on the image processor, which is called inside the forward method.
>
> I'm a bit confused about how to proceed from here. Could you help me out?

Okay, I figured out that the best way to solve this is to move the image-processing step into pad_input_ids and store the necessary information in image_inputs so that it can be used later in forward.
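
A rough sketch of that idea; the token id, processor output field, and image_inputs attributes below are assumptions for illustration, not the actual sglang ImageInputs API:

```python
# Hedged sketch: run the image processor inside pad_input_ids so the number
# of image tokens is known before forward(), then stash everything on
# image_inputs for later reuse. All attribute/field names are hypothetical.
IMAGE_TOKEN_ID = 32000  # hypothetical placeholder id

def pad_input_ids(input_ids, image_inputs, image_processor):
    # 1) Process the images up front instead of inside forward().
    processed = image_processor(images=image_inputs.images, return_tensors="pt")
    num_img_tokens = int(processed["num_img_tokens"][0])    # assumed output field
    image_inputs.pixel_values = processed["pixel_values"]   # reused in forward()

    # 2) Pad the prompt with image tokens and record where the image starts.
    image_pos = input_ids.index(IMAGE_TOKEN_ID)
    padded = (input_ids[:image_pos]
              + [IMAGE_TOKEN_ID] * num_img_tokens
              + input_ids[image_pos + 1:])
    image_inputs.image_offsets = [image_pos]
    return padded
```

The idea is that everything forward() later needs (pixel values, offsets, token count) is computed once here and simply read back from image_inputs.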

Successfully merging this pull request may close the following issue: [Feature] Do we have any plan for supporting Phi3V?