Update fusion layer counting logic for Llama 3.2 weight conversion #1722
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1722
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit df8cab3 with merge base 3fddc56.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
-    num_fusion_layers = (
-        max(_layer_num(k) for k in state_dict if "cross_attention_layers" in k) + 1
-    )
+    num_fusion_layers = len(
+        set([k.split(".")[2] for k in state_dict if "fusion_layer" in k])
+    )
Can you add a comment explaining what the FQN looks like here? Why not just count the number of "fusion_layer" in k for k in state_dict?
Why not just count the number of "fusion_layer" in k for k in state_dict?

There are multiple params for each layer, so we need to dedup.
Related: #1721
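To make the dedup point concrete, here is a minimal sketch. The FQNs are illustrative assumptions about the tune-format key layout (roughly "decoder.layers.<i>.fusion_layer.<submodule>.<param>"), not keys copied from a real checkpoint.

# Each fusion layer contributes several params, so a raw count of matching keys overestimates.
state_dict = {
    "decoder.layers.3.fusion_layer.attn.q_proj.weight": None,
    "decoder.layers.3.fusion_layer.attn.k_proj.weight": None,
    "decoder.layers.7.fusion_layer.attn.q_proj.weight": None,
}

# Naive count over keys: 3, even though there are only 2 fusion layers.
overcount = sum("fusion_layer" in k for k in state_dict)

# Dedup on the layer index (k.split(".")[2] -> "3", "7") before counting.
num_fusion_layers = len({k.split(".")[2] for k in state_dict if "fusion_layer" in k})

assert overcount == 3 and num_fusion_layers == 2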
@@ -148,8 +148,8 @@ def llama3_vision_tune_to_meta(

     # Calculate fusion_interval: layer interval where cross attention layers are fused
     num_layers = max(_layer_num(k) for k in state_dict if "layers" in k) + 1
-    num_fusion_layers = (
-        max(_layer_num(k) for k in state_dict if "cross_attention_layers" in k) + 1
+    num_fusion_layers = len(
+        set([k.split(".")[2] for k in state_dict if "fusion_layer" in k])
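For context on what these counts feed into: the rest of the hunk is cut off above, but judging from the comment, fusion_interval is presumably derived as the ratio of total decoder layers to fusion layers. A sketch with illustrative numbers, not the actual lines from the file:

# Sketch only: the exact formula in the file is not shown in this diff.
num_layers = 32          # total decoder layers (illustrative)
num_fusion_layers = 8    # cross-attention (fusion) layers interleaved into the decoder (illustrative)
fusion_interval = num_layers // num_fusion_layers  # -> 4, i.e. one fusion layer every 4 layers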
What is the [2] referring to here? Isn't that the layer number?
Yeah exactly
Thanks for catching this. Can you just add a comment above the change saying you're getting the unique layer numbers, or use the _layer_num function?
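One way the suggested change could look, reusing the _layer_num helper that already appears in this file (a sketch, not the merged code):

# Get the unique layer numbers that contain a fusion layer; each layer has
# multiple params, so collect the numbers into a set before counting.
num_fusion_layers = len(
    {_layer_num(k) for k in state_dict if "fusion_layer" in k}
)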
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1722      +/-   ##
==========================================
- Coverage   70.67%   67.64%    -3.03%
==========================================
  Files         299      304       +5
  Lines       15251    15627     +376
==========================================
- Hits        10778    10571     -207
- Misses       4473     5056     +583

☔ View full report in Codecov by Sentry.
Checkpoint save errors without this change
Test plan:
Before: https://gist.github.com/ebsmothers/c9ad0175cedeb5ad2719aec4d266090d
After: