Adding PaliGemma2 to KerasHub #1998

Merged

divyashreepathihalli
Collaborator

@divyashreepathihalli divyashreepathihalli commented Dec 5, 2024

Sanity check Colab: https://colab.sandbox.google.com/drive/1xejmaZvLgMFrzIrIRm2gHzrXF5cprp7P
Model summary
PaliGemma 2 is an update of the PaliGemma vision-language model (VLM) that incorporates the capabilities of the Gemma 2 models. The PaliGemma family of models is inspired by PaLI-3 and based on open components such as the SigLIP vision model and the Gemma 2 language models. It takes both image and text as input and generates text as output, supporting multiple languages. It is designed for class-leading fine-tuning performance on a wide range of vision-language tasks such as image and short-video captioning, visual question answering, text reading, object detection, and object segmentation.

Model architecture
PaliGemma 2 is the composition of a Transformer decoder and a Vision Transformer image encoder. The text decoder is initialized from Gemma 2 in the 2B, 9B, and 27B parameter sizes. The image encoder is initialized from SigLIP-So400m/14. Similar to the original PaliGemma model, PaliGemma 2 is trained following the PaLI-3 recipes.
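
As a rough illustration of what the `/14` in SigLIP-So400m/14 means for the decoder's input length: a Vision Transformer with 14x14 patches turns a square image into one token per patch. The sketch below only does that arithmetic. The input resolutions (224, 448, 896) are an assumption taken from the published PaliGemma 2 checkpoints rather than from this PR, and `num_image_tokens` is a hypothetical helper, not a KerasHub function.

```python
PATCH_SIZE = 14  # the "/14" in SigLIP-So400m/14


def num_image_tokens(image_size: int, patch_size: int = PATCH_SIZE) -> int:
    """Number of patch tokens a ViT produces for a square image."""
    if image_size % patch_size != 0:
        raise ValueError(
            f"image_size {image_size} is not a multiple of patch_size {patch_size}"
        )
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side


# Assumed resolutions; not stated in this PR.
for size in (224, 448, 896):
    print(f"{size}x{size} image -> {num_image_tokens(size)} image tokens")
```

The token count grows quadratically with resolution, which is worth keeping in mind when choosing a checkpoint for fine-tuning.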

Inputs and outputs
Input: Image and text string, such as a prompt to caption the image, or a question.
Output: Generated text in response to the input, such as a caption of the image, an answer to a question, a list of object bounding box coordinates, or segmentation codewords.
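
For the bounding-box case, the generated text has to be decoded back into pixel coordinates. The sketch below assumes the output convention documented for the PaliGemma family, which this PR does not spell out: each box is four `<locNNNN>` tokens in `y_min, x_min, y_max, x_max` order, quantized to 1024 bins, followed by a label, with multiple detections separated by `;`. `parse_detections` is a hypothetical helper, not part of KerasHub.

```python
import re

LOC_BINS = 1024  # assumed quantization of the <locNNNN> tokens

# Four consecutive location tokens followed by a label (up to ';' or the next token).
_DETECTION = re.compile(r"((?:<loc\d{4}>){4})\s*([^;<]+)")
_LOC_VALUE = re.compile(r"<loc(\d{4})>")


def parse_detections(text: str, width: int, height: int) -> list[dict]:
    """Convert '<loc....>' detection text into pixel-space boxes.

    Returns dicts with a 'label' and a 'box' as (x_min, y_min, x_max, y_max).
    """
    detections = []
    for match in _DETECTION.finditer(text):
        # Assumed token order: y_min, x_min, y_max, x_max.
        y0, x0, y1, x1 = (int(v) for v in _LOC_VALUE.findall(match.group(1)))
        detections.append(
            {
                "label": match.group(2).strip(),
                "box": (
                    x0 / LOC_BINS * width,
                    y0 / LOC_BINS * height,
                    x1 / LOC_BINS * width,
                    y1 / LOC_BINS * height,
                ),
            }
        )
    return detections


sample = "<loc0256><loc0128><loc0768><loc0512> cat ; <loc0000><loc0000><loc1023><loc1023> dog"
for det in parse_detections(sample, width=1024, height=1024):
    print(det["label"], det["box"])
```

Text that contains no location tokens, such as a plain caption or answer, yields an empty list, so the same helper can be applied to any output without special-casing the task.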

Model implementation author: @james77777778
KerasHub PaliGemma implementation lead: @divyashreepathihalli

james77777778 and others added 3 commits December 4, 2024 01:14
* Add PaliGemma2 arch

* Enable mixed precision check for PaliGemma

* Add conversion script

* Revert ImageConverter and reduce mem usage in the conversion script

* Remove `compute_output_spec`

* Fix `compute_output_shape` issue for keras 3.1

* Add model cards and update conversion script

* update presets

---------

Co-authored-by: divyashreepathihalli <[email protected]>
@github-actions github-actions bot added the Gemma Gemma model specific issues label Dec 5, 2024
Member

@mattdangerw mattdangerw left a comment

Lgtm!

@divyashreepathihalli divyashreepathihalli added the kokoro:force-run Runs Tests on GPU label Dec 5, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Runs Tests on GPU label Dec 5, 2024
@divyashreepathihalli divyashreepathihalli merged commit f251ed3 into keras-team:master Dec 5, 2024
7 of 10 checks passed
divyashreepathihalli added a commit to divyashreepathihalli/keras-nlp that referenced this pull request Dec 5, 2024
* Add PaliGemma2 (keras-team#96)

* Add PaliGemma2 arch

* Enable mixed precision check for PaliGemma

* Add conversion script

* Revert ImageConverter and reduce mem usage in the conversion script

* Remove `compute_output_spec`

* Fix `compute_output_shape` issue for keras 3.1

* Add model cards and update conversion script

* update presets

---------

Co-authored-by: divyashreepathihalli <[email protected]>

* Update pali_gemma_presets.py - remove mix presets

* Update pali_gemma_presets.py

* Update convert_pali_gemma2_checkpoints.py

---------

Co-authored-by: james77777778 <[email protected]>
divyashreepathihalli added a commit that referenced this pull request Dec 5, 2024
* Adding PaliGemma2 to KerasHub (#1998)

* Add PaliGemma2 (#96)

* Add PaliGemma2 arch

* Enable mixed precision check for PaliGemma

* Add conversion script

* Revert ImageConverter and reduce mem usage in the conversion script

* Remove `compute_output_spec`

* Fix `compute_output_shape` issue for keras 3.1

* Add model cards and update conversion script

* update presets

---------

Co-authored-by: divyashreepathihalli <[email protected]>

* Update pali_gemma_presets.py - remove mix presets

* Update pali_gemma_presets.py

* Update convert_pali_gemma2_checkpoints.py

---------

Co-authored-by: james77777778 <[email protected]>

* Version bump to 0.18.0

* Update pali_gemma_presets.py (#2003)

* Update pali_gemma_presets.py

* code reformat

---------

Co-authored-by: james77777778 <[email protected]>
Labels
Gemma Gemma model specific issues
4 participants