WIP: [CPU][ARM] Weights compression f32->f16 is moved to CPU Plug-in side #21080

Closed

Conversation

antonvor
Contributor

@antonvor antonvor commented Nov 15, 2023

Details:

This PR disables fp32->fp16 weights compression on the ngraph side and moves it to the CPU plug-in, which now receives the weights in fp32 precision. This lets us reduce memory consumption on ARM64 platforms. The change affects only MatMul nodes.
PR to oneDNN fork: openvinotoolkit/oneDNN#220
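
For illustration only, here is a minimal, self-contained sketch of what an f32->f16 weights compression step does and why it halves weight storage. It is not the code from this PR or from the oneDNN fork: `float_to_half` and `compress_weights` are hypothetical names, and the conversion assumed here is plain IEEE 754 binary16 with round-to-nearest-even.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: convert one fp32 value to IEEE 754 binary16 bits
// using round-to-nearest-even. Not taken from OpenVINO or oneDNN.
std::uint16_t float_to_half(float value) {
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof(bits));

    const std::uint32_t sign = (bits >> 16) & 0x8000u;
    const std::uint32_t exp = (bits >> 23) & 0xFFu;
    std::uint32_t mant = bits & 0x7FFFFFu;

    if (exp == 0xFFu) {  // Inf or NaN: keep NaN-ness with a quiet bit
        return static_cast<std::uint16_t>(sign | 0x7C00u | (mant ? 0x200u : 0u));
    }

    const int e = static_cast<int>(exp) - 127 + 15;  // re-bias the exponent
    if (e >= 31) {  // too large for fp16: saturate to infinity
        return static_cast<std::uint16_t>(sign | 0x7C00u);
    }
    if (e <= 0) {  // result is subnormal in fp16, or rounds to zero
        if (e < -10) {
            return static_cast<std::uint16_t>(sign);
        }
        mant |= 0x800000u;  // restore the implicit leading 1
        const int shift = 14 - e;
        std::uint32_t half_mant = mant >> shift;
        const std::uint32_t rem = mant & ((1u << shift) - 1u);
        const std::uint32_t halfway = 1u << (shift - 1);
        if (rem > halfway || (rem == halfway && (half_mant & 1u))) {
            ++half_mant;
        }
        return static_cast<std::uint16_t>(sign | half_mant);
    }

    std::uint32_t half = (static_cast<std::uint32_t>(e) << 10) | (mant >> 13);
    const std::uint32_t rem = mant & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u))) {
        ++half;  // a carry into the exponent field is the correct result
    }
    return static_cast<std::uint16_t>(sign | half);
}

// Hypothetical helper: compress a whole fp32 weights buffer to fp16 bits.
// The fp16 copy takes 2 bytes per element instead of 4.
std::vector<std::uint16_t> compress_weights(const std::vector<float>& weights) {
    std::vector<std::uint16_t> half;
    half.reserve(weights.size());
    for (float w : weights) {
        half.push_back(float_to_half(w));
    }
    assert(half.size() == weights.size());
    return half;
}
```

Under these assumptions, a buffer of N fp32 weights occupies 4N bytes and its compressed copy 2N bytes, so the saving is realized only once the fp32 original is released.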

Tickets:

@antonvor antonvor added this to the 2023.3 milestone Nov 15, 2023
@antonvor antonvor self-assigned this Nov 15, 2023
@antonvor antonvor requested review from a team as code owners November 15, 2023 07:37
@antonvor antonvor requested review from ilya-lavrenov and removed request for a team November 15, 2023 07:37
@github-actions github-actions bot added category: CPU OpenVINO CPU plugin category: transformations OpenVINO Runtime library - Transformations category: samples OpenVINO Runtime Samples labels Nov 15, 2023
@antonvor antonvor force-pushed the feature/llm_memory_consumption branch from c8cee60 to d62e4d3 Compare November 15, 2023 07:40
@github-actions github-actions bot removed the category: samples OpenVINO Runtime Samples label Nov 15, 2023
@antonvor antonvor force-pushed the feature/llm_memory_consumption branch from d62e4d3 to 2e42899 Compare December 11, 2023 09:59
@antonvor antonvor requested a review from a team as a code owner December 11, 2023 09:59
@github-actions github-actions bot added the category: build OpenVINO cmake script / infra label Dec 11, 2023
@antonvor antonvor force-pushed the feature/llm_memory_consumption branch 2 times, most recently from 0f383e0 to a09364c Compare December 13, 2023 06:54
@antonvor
Contributor Author

@itikhono may I ask you to review transformation changes?

@antonvor antonvor requested a review from itikhono December 13, 2023 08:51
@antonvor antonvor requested a review from vurusovs December 14, 2023 07:33
@vurusovs
Contributor

LGTM from tests side

@antonvor antonvor force-pushed the feature/llm_memory_consumption branch 2 times, most recently from 3bcbe16 to aaa63c7 Compare December 15, 2023 08:18
@@ -43,4 +49,19 @@ class TRANSFORMATIONS_API Decompression : public RuntimeAttribute {
}
};

class TRANSFORMATIONS_API Compression : public RuntimeAttribute {
Contributor

Could you please add a comment explaining why we need this rt_info? It should make clear why we cannot use the existing ones and need a new rt_info.

Compression() = default;

bool visit_attributes(AttributeVisitor& visitor) override {
return true;
Contributor

Is it really necessary to store this rt_info in the IR?

@@ -48,6 +49,7 @@ bool ov::pass::AlignMixedFP32FP16Types::run_on_model(const std::shared_ptr<ov::M
copy_runtime_info(incoming_node, convert);
input.replace_source_output(convert);
disable_fp16_compression(convert);
mark_as_compression(convert);
Contributor

These converts are decompression converts: they upcast to fp32 for precision-sensitive subgraphs. Is it possible to rename mark_as_compression to avoid confusion?
Since mark_as_compression is used only for converts that are inserted to align types between the f16 and f32 parts, could we name it e.g. mark_type_aligning_convert?
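
To make the naming question concrete, here is a minimal sketch of how an rt_info-style marker could behave, using the name proposed in this comment. It is a toy model under stated assumptions, not OpenVINO code: `Node`, its `rt_info` map, the key `kTypeAligningConvert`, and both helper functions are hypothetical stand-ins for the real node and runtime attribute classes.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a graph node carrying runtime info (hypothetical).
struct Node {
    std::string type;                     // e.g. "Convert", "MatMul"
    std::map<std::string, bool> rt_info;  // marker name -> present
};

// Hypothetical marker key; the name follows the suggestion in the review.
const char* const kTypeAligningConvert = "type_aligning_convert";

// Tag a Convert that was inserted only to align f16 and f32 parts.
void mark_type_aligning_convert(Node& node) {
    assert(node.type == "Convert");  // the marker only makes sense on Convert
    node.rt_info[kTypeAligningConvert] = true;
}

// Query the marker, e.g. from a plug-in side pass that decides how to
// treat this Convert.
bool is_type_aligning_convert(const Node& node) {
    return node.rt_info.count(kTypeAligningConvert) != 0;
}
```

In this sketch the marker says why the Convert was inserted (type alignment) rather than in which direction it casts, which is the distinction the rename is meant to capture.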

@antonvor antonvor force-pushed the feature/llm_memory_consumption branch from aaa63c7 to 0e48570 Compare December 19, 2023 08:50
@antonvor antonvor force-pushed the feature/llm_memory_consumption branch from 0e48570 to 1b28c78 Compare December 19, 2023 09:24
@dmitry-gorokhov dmitry-gorokhov modified the milestones: 2023.3, 2024.0 Dec 20, 2023
Contributor

github-actions bot commented Jan 4, 2024

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions github-actions bot added the Stale label Jan 4, 2024
Contributor

This PR was closed because it has been stalled for 2 weeks with no activity.

@github-actions github-actions bot closed this Jan 11, 2024
Labels
category: build OpenVINO cmake script / infra category: CPU OpenVINO CPU plugin category: transformations OpenVINO Runtime library - Transformations Stale

4 participants