
fix: pass rope_theta argument when initializing LlamaLikeBlock for models like qwen2, mistral, etc. #568

Merged: 1 commit into casper-hansen:main on Aug 4, 2024

Conversation

Shuai-Xie
Contributor

This PR fixes issue #567.

    def fuse_transformer(self):
            ...
            blocks.append(
                LlamaLikeBlock(
                    hidden_size=self.model.config.hidden_size,
                    n_heads=self.model.config.num_attention_heads,
                    n_kv_heads=self.model.config.num_key_value_heads,
                    qkv_layer=qkv,
                    o_proj=module.self_attn.o_proj,
                    mlp=module.mlp,
                    norm_1=norm_1,
                    norm_2=norm_2,
                    dev=device,
                    max_seq_len=self.model.config.max_seq_len,
                    rope_theta=self.model.config.rope_theta,                # the only line added by this PR
                )
            )

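For context, here is a minimal sketch (assuming the standard RoPE inverse-frequency schedule; this is illustrative code, not taken from AutoAWQ) of why forwarding rope_theta matters. If the argument is omitted, the fused block falls back to its default base (commonly 10000), while configs for models such as Qwen2 and some Mistral variants specify a much larger value, which changes the rotary frequencies and therefore the attention behavior of the fused layers.

import torch

def rope_inv_freq(head_dim: int, rope_theta: float) -> torch.Tensor:
    # Standard RoPE frequency schedule: theta^(-2i/d) for i in [0, d/2)
    return 1.0 / (rope_theta ** (torch.arange(0, head_dim, 2).float() / head_dim))

head_dim = 128
default_freqs = rope_inv_freq(head_dim, 10000.0)     # base used when rope_theta is not passed through
large_base_freqs = rope_inv_freq(head_dim, 1000000.0)  # e.g. a Qwen2-style config value (assumed for illustration)

print(default_freqs[:4])
print(large_base_freqs[:4])  # clearly different frequencies -> different positional encoding
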
@casper-hansen
Owner

Thanks for the fix!

@casper-hansen casper-hansen reopened this Aug 4, 2024
@casper-hansen casper-hansen merged commit 202b967 into casper-hansen:main Aug 4, 2024