Fix for Mask Patch Failure and Quantization Issues in Latest transformers Versions #368

Open
wants to merge 4 commits into base: main

Conversation


Aintor commented Nov 10, 2024

In recent versions of the transformers library (specifically versions above 4.34.1), the _make_causal_mask function has been removed from the modeling_clip module. Previously, the code that used this function looked like this:

causal_attention_mask = _make_causal_mask(input_shape, hidden_states.dtype, device=hidden_states.device)

However, with recent updates, this call has been replaced by:

causal_attention_mask = _create_4d_causal_attention_mask(input_shape, hidden_states.dtype, device=hidden_states.device)

For more background, see huggingface/transformers#28305.
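
A minimal sketch of how to check which helper the installed transformers release exposes (this assumes both names live in the modeling_clip namespace, which matches the calls shown above; it is illustrative, not part of the PR diff):

```python
# Minimal sketch: detect which causal-mask helper the installed transformers exposes.
# Assumes both names live in the modeling_clip namespace, as in the calls above.
from transformers.models.clip import modeling_clip

if hasattr(modeling_clip, "_create_4d_causal_attention_mask"):
    print("newer transformers: _create_4d_causal_attention_mask is used")
elif hasattr(modeling_clip, "_make_causal_mask"):
    print("older transformers: _make_causal_mask is used")
```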

This change disrupts the functionality in python_coreml_stable_diffusion/torch2coreml.py, where the following line:

modeling_clip._make_causal_mask = patched_make_causal_mask

no longer has any effect, since modeling_clip no longer defines or calls _make_causal_mask, resulting in the following error during quantization:

ValueError: Input X contains infinity or a value too large for dtype('float64').

See related issues: #331, #303, #325, #246

This PR addresses the issue by adding a monkey patch to modeling_clip for the _create_4d_causal_attention_mask function, which fixes the mask patch failure and restores compatibility of the --quantize-nbits feature with the latest transformers versions. The original override of _make_causal_mask is retained to keep support for older transformers versions.
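
A minimal sketch of the approach (not the exact diff): patch whichever helper the installed transformers version exposes with the same finite-valued mask builder. The body of patched_make_causal_mask below is an illustrative stand-in for the existing override in torch2coreml.py, which replaces the -inf fill value with a large finite negative so quantization never sees infinities:

```python
# Sketch of a version-aware monkey patch (illustrative, not the exact PR diff).
import torch
from transformers.models.clip import modeling_clip


def patched_make_causal_mask(input_ids_shape, dtype, device, past_key_values_length=0, **kwargs):
    """Build the causal mask with a finite fill value (-1e4) instead of -inf."""
    bsz, tgt_len = input_ids_shape
    mask = torch.full((tgt_len, tgt_len), -1e4, dtype=dtype, device=device)
    mask_cond = torch.arange(tgt_len, device=device)
    # Zero out the lower triangle (including the diagonal) so only future positions are masked.
    mask.masked_fill_(mask_cond < (mask_cond + 1).view(tgt_len, 1), 0)
    if past_key_values_length > 0:
        pad = torch.zeros(tgt_len, past_key_values_length, dtype=dtype, device=device)
        mask = torch.cat([pad, mask], dim=-1)
    return mask[None, None, :, :].expand(bsz, 1, tgt_len, tgt_len + past_key_values_length)


# transformers <= 4.34.x: keep the original override of _make_causal_mask.
if hasattr(modeling_clip, "_make_causal_mask"):
    modeling_clip._make_causal_mask = patched_make_causal_mask

# Newer transformers: patch the replacement helper as well.
if hasattr(modeling_clip, "_create_4d_causal_attention_mask"):
    modeling_clip._create_4d_causal_attention_mask = patched_make_causal_mask
```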


Aintor commented Nov 10, 2024

By the way, I have tested this PR in a new conda environment initialized with pip install -e ., and the --quantize-nbits flag works as expected.

@Aintor Aintor requested a review from aseemw November 12, 2024 02:37

Aintor commented Nov 12, 2024

@aseemw I noticed PR #316 mentions transformers version 4.29.2, where it uses the following line:
causal_attention_mask = self._build_causal_attention_mask(bsz, seq_len, hidden_states.dtype, device=hidden_states.device)
Do you think I should add support for versions below 4.29.2 as well?


Aintor commented Nov 12, 2024


That said, I don't recommend adding support for versions below 4.30.0: the mask handling in modeling_clip changed frequently and remained unstable until that release, so I believe it is best to drop support for anything earlier than 4.30.0.
