Started 2024-03-28
Fully understanding the PaddlePaddle project
Searching a plain "PaddlePaddle" in a search engine is nowhere near enough; try:
PaddlePaddle + architecture diagrams
PaddlePaddle + concept keywords
PaddlePaddle + troubleshooting
PaddlePaddle + interview digests
PaddlePaddle + geek challenges
PaddlePaddle + forums and conferences
PaddlePaddle + papers
PaddlePaddle + frontier talks
PaddlePaddle + application scenarios
PaddlePaddle + names of core PaddlePaddle developers
PaddlePaddle + company projects
and so on, before you can become fully familiar with PaddlePaddle.
- Build, test, and code-navigation (jump-to-definition) setup
- Paddle and Flink call for different reading orders. Flink is Java: you can split it horizontally by package and read layer by layer, moving from simple modules toward complex ones. Paddle is C++: horizontal splitting is too coarse, so you can only split vertically, first connecting the concepts behind the overall flow, then using the source build order to see how the code is layered horizontally.
- Code decomposition
- Python dimension: which APIs does the bulk of user code rely on
- C++ dimension: dependencies between packages and how tight they are
- fluid
- pir
- phi
- Build order: horizontal decomposition
- Directory layout: vertical decomposition
- Theory angle: once you have the tooling, you need business problems. How many scenarios are there, why does each scenario have its model, and what payoff does each model bring? For every paper: what is the background, what are the assumptions, what problem does it solve, and which change actually lifts the metrics on the original dataset?
- Reality angle: of all these algorithms and papers, which ones actually survive in production long-term, which are investment-heavy, and which get results by brute force?
- Keyword searches
- paddle op
- paddle program design
- paddle ir design
- paddle forward-graph design
- paddle backward-graph design
- distributed design
- StaticRNN
- DynamicRNN
- what is Place
- PlaceGroup
- Executor
- pserver
- CompiledProgram
- Transpiler
- assigning ops to hardware
- DistributedBlock
- DistributeTranspiler
- MemoryOptimizationTranspiler
- ExecutorRole
- Momentum
- paddle source-code analysis
- Compared with PyTorch the material is far too scarce; the developer-relations effort is lacking
- Mine issues, docs, comments, commits, the development paths of core developers, and major feature iterations
- jacquesqiao
- My journey of understanding Paddle
- Reading by horizontal structure: too hard, could not get through it
- Vertical structural decomposition: gives a bit of direction
- Understanding via the C++: too hard, could not get through it
- Python plus horizontal decomposition: gives a bit of direction
- Searching the web for design docs by keyword: gives a bit of direction
- Paddle framework v2.4 API upgrade! Full support for sparse computation, graph learning, speech processing, and more: https://mp.weixin.qq.com/s/8IYYEbJqIjyWd2zO7TGGFw
- PaddleBox: Baidu's GPU-based solution for training ultra-large-scale discrete DNN models: https://mp.weixin.qq.com/s/o-ZoRnAMnINGHVALj7DQRA
- Heterogeneous computing platforms in the era of large models: https://mp.weixin.qq.com/s/ZAP_8ZwkZ295I225QTh9Yw
- Committees
- Tech talks
- PFCC meeting minutes: https://github.com/PaddlePaddle/community/blob/62cb010a432155cb9265c6f98f0aa3d28cee3a0a/pfcc/meetings/2024/2024-11-14-meeting-minutes.md
- Training camp: https://github.com/PFCCLab/Camp?tab=readme-ov-file
- Code walkthroughs: https://github.com/PFCCLab/Camp
- Reading group: https://github.com/PaddlePaddle/community/tree/master/pfcc/paddle-code-reading
- pfcc: https://pfcc.blog/
- Weekly meetings: https://github.com/PaddlePaddle/community/tree/master/pfcc
- code reading: https://github.com/PaddlePaddle/community/tree/master/pfcc/paddle-code-reading
- Model zoo
- Performance optimization
- blog
- Hackathons
- Hackathon 1st: PaddlePaddle/Paddle#35940
- Hackathon 2nd: PaddlePaddle/Paddle#40234
- Hackathon 3rd: PaddlePaddle/Paddle#43938
- Hackathon 4th: PaddlePaddle/Paddle#51281
- Hackathon 5th: https://github.com/PaddlePaddle/docs/blob/release/2.5/docs/guides/10_contribution/hackathon_cn.md
- Hackathon 6th: PaddlePaddle/Paddle#62907
- Hackathon 7th: PaddlePaddle/Paddle#68242
- People and their journeys
- Courses
- Hung-yi Lee course, Machine Learning: https://aistudio.baidu.com/course/introduce/1978
- PIR Parser implementation share: https://github.com/PFCCLab/Camp/blob/main/Docs/Hackathon_5th/01_PIR_OpTestFixingAndProgramTranslatorRefinement/pir_parser_implementation_sharing.md
- PIR graph-construction API: https://github.com/PFCCLab/Camp/blob/main/Docs/Hackathon_5th/03_NewIRAPI_AutoDiffAndCoreComponentRefinement/CodeReading/Over_view_PIR_construct_API_As_CodeGen_perspective.md
- codegen: https://github.com/PaddlePaddle/community/blob/master/pfcc/paddle-code-reading/code_gen/code_gen_ops.md
- Operator-composition mechanism: https://github.com/PFCCLab/Camp/blob/main/Docs/Hackathon_5th/04_TheUnityOfOperatorForwardAndBackwardInCombinationFeatures/CodeReading/the_unity_of_operator_forward_and_backward_in_combination_features.md
- Paddle static-graph parallel orchestration and execution flow: https://github.com/PFCCLab/Camp/blob/main/Docs/Hackathon_5th/10_StaticGraph_Semi-AutomaticParallel_ExecutionFrameworkUpgrade/CodeReading/1-paddle_static_graph_pipelining.md
- Getting familiar with PIR at the code level: https://github.com/PFCCLab/Camp/blob/main/Docs/Hackathon_5th/19_PIR_Adapt_CINN/CodeReading/PIR_source_code_reading_guide.md
- ZB VPP design: https://github.com/PFCCLab/Camp/blob/main/Docs/Hackathon_6th/10_Static_Graph_semi-automatic_parallel_training_performance_optimization/zb_vpp_design.md
- PaddlePaddle inference source-code analysis
- Part 3, Operator definition and registration: https://www.cnblogs.com/zl1991/p/15712940.html
- Part 4, the inference pipeline: https://www.cnblogs.com/zl1991/p/15728406.html
- Part 5, graph and pass: https://www.cnblogs.com/zl1991/p/16010209.html
- PaddleDetection source walkthrough: https://aistudio.baidu.com/projectdetail/1327454
- Local inference + incremental use of ultra-large-scale sparse parameters (DistributedLookupTable)
- Getting-started reading list; read in order
- https://space.keter.top/docs/high_performance/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%E6%A1%86%E6%9E%B6%E5%BC%80%E5%8F%91/Paddle%E9%85%8D%E7%BD%AETensorrt
- https://space.keter.top/docs/high_performance/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%E6%A1%86%E6%9E%B6%E5%BC%80%E5%8F%91/Paddle%E7%AE%97%E5%AD%90%E5%BC%80%E5%8F%91
- https://space.keter.top/docs/high_performance/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%E6%A1%86%E6%9E%B6%E5%BC%80%E5%8F%91/%E9%9D%99%E6%80%81%E5%9B%BE%E7%BB%84%E7%BD%91%E8%BF%87%E7%A8%8B
- https://space.keter.top/docs/high_performance/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%E6%A1%86%E6%9E%B6%E5%BC%80%E5%8F%91/%E9%9D%99%E6%80%81%E5%9B%BE%E6%89%A7%E8%A1%8C%E8%BF%87%E7%A8%8B
- https://space.keter.top/docs/high_performance/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%E6%A1%86%E6%9E%B6%E5%BC%80%E5%8F%91/TensorRT%E7%AE%97%E5%AD%90%E5%BC%80%E5%8F%91
- All of Paddle's community activities live on GitHub, but the content is entirely in Chinese and was never surfaced well through WeChat or the official site, so the keywords are hard to discover
- Searching "paddle 技术串讲" gives different results on Google and Baidu. Chinese speakers should actually find Paddle easier to learn, but the channels for reaching the relevant information are broken
- There are many activities, but no unified home page to manage them
- ao
- amp
- autograd
- nn
- layer
- torch
- aot
- Short for Ahead-of-Time (AOT), a compilation technique that compiles code into machine code before it runs. In contrast to just-in-time (JIT) compilation, AOT compilation finishes before the program executes, converting source code or an intermediate representation (such as an abstract syntax tree) into machine code.
- jit
- qat
- Quantization-Aware Training, a training method for quantizing neural networks. Deep learning usually computes and trains in floating point, but floating-point computation needs considerable storage and compute. Quantization converts a floating-point model into a fixed-point model, reducing model size and compute to some degree and thereby making the model more efficient.
- sparse
- pruning
- unsqueeze
- unique
- empty
- randint
- arange
- clamp or clip
- no_grad: disables gradient computation
- TorchScript tutorial: https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html
- TorchScript tutorial (Chinese): https://pytorch.panchuang.net/EigthSection/torchScript/
- ONNX files and their structure: https://blog.csdn.net/Rolandxxx/article/details/127713806
- The proto file: https://gitcode.com/gh_mirrors/onn/onnx/blob/main/onnx/onnx.proto
- IR: https://github.com/onnx/onnx/blob/main/docs/IR.md
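The QAT entry above rests on uniform quantization: mapping floats to low-bit integers and back. A minimal, self-contained sketch of symmetric 8-bit quantize/dequantize in plain Python (a toy for intuition, not Paddle's QAT implementation):

```python
def quantize(values, num_bits=8):
    """Symmetric uniform quantization: map floats to signed integers."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 127 for int8
    scale = max(abs(v) for v in values) / qmax or 1.0
    q = [round(v / scale) for v in values]      # integer codes
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to approximate floats."""
    return [v * scale for v in q]

vals = [0.5, -1.25, 3.0]
q, scale = quantize(vals)
recovered = dequantize(q, scale)
# recovered approximates vals to within one quantization step (scale)
```

QAT differs from post-training quantization in that this round-trip error is simulated during training so the network learns to tolerate it.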
- A neural-network Program is made up of multiple Blocks (control-flow structures); each Block consists of Operators (ops) and the data they operate on, represented as Variables
- Questions to think about
- custom layer
- custom op
- custom program
- custom block
- how Python test cases verify the key classes above
- LayerHelper
- create_variable_for_type_inference: variable type inference
- append_op: op management
- Core classes
- framework.py pulls in the libpaddle C++ extension
- Program
- core.py pulls in the libpaddle C++ extension
- compiler.py
- backward.py
- executor.py
- fluid
- framework.py pulls in the libpaddle C++ extension
- Concepts
- trt
- Frontend programming surface: users drive the framework through APIs that define the network structure (the forward computation) and control the training process. API design is therefore the most direct touchpoint for whether the product feels usable.
- Automatic differentiation: the framework completes the backward logic automatically; this is the essential difference between a deep-learning framework and a scientific-computing library like NumPy. For plain math, plenty of scientific libraries suffice. The defining feature of a deep-learning framework is generating the backward computation from the user's forward logic, so the user never hand-writes the training pass.
- Internal representation (IR): a unified representation serves two purposes. It enables compute optimizations such as operator fusion, and it enables serialization for targeting different hardware and for distributed training, i.e. steps 4 and 5 below.
- Computation-graph optimization: can be done by hand by the user, or automatically by framework policy; the latter is friendlier to users.
- Operator-sequence execution: single-machine single-card training is simple, just running the forward and backward logic in order. For single-machine multi-card and multi-machine multi-card distributed training, how to run the computation concurrently and asynchronously and how to schedule heterogeneous devices needs careful design at this stage.
- Operator library: ideally complete, with efficiently implemented operators directly usable for 99%+ of models. But science keeps advancing, and newly invented models may need new compute functions, so it is equally important that users can easily add custom operators, and that those operators train, infer, and deploy across a wide range of hardware just like native ones.
- Backend hardware integration: the framework must interface with many kinds of training and inference hardware, whose interfaces open up at different levels, typically high-level IR-subgraph integration versus low-level programming-interface integration; the latest AI-compiler technology is also trying to solve this problem.
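To make the automatic-differentiation point above concrete, here is a minimal reverse-mode autodiff sketch in plain Python (a toy tape, not Paddle's implementation): each forward operation records its parents and local gradients, and backward() propagates gradients along that record.

```python
class Var:
    """A scalar value that records the ops applied to it (a toy tape)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents       # (parent_var, local_gradient) pairs
        self.grad = 0.0

    def __mul__(self, other):
        return Var(self.value * other.value,
                   parents=((self, other.value), (other, self.value)))

    def __add__(self, other):
        return Var(self.value + other.value,
                   parents=((self, 1.0), (other, 1.0)))

    def backward(self, seed=1.0):
        """Accumulate gradients from this var back through the tape."""
        self.grad += seed
        for parent, local_grad in self.parents:
            parent.backward(seed * local_grad)   # chain rule, recursively

x = Var(2.0)
y = Var(3.0)
z = x * y + x        # z = x*y + x, so dz/dx = y + 1 = 4, dz/dy = x = 2
z.backward()
```

Real frameworks deduplicate shared subgraphs with a topological sort instead of plain recursion; the idea of recording backward information while the forward pass runs is the same.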
# custom device
from .libpaddle import ( # noqa: F401
CustomDeviceEvent,
CustomDeviceStream,
_get_current_custom_device_stream,
_set_current_custom_device_stream,
_synchronize_custom_device,
)
- fluid package relationships
/Paddle/paddle/fluid/framework/framework.proto
data_feed.proto
distributed_strategy.proto
framework.proto
heter_service.proto
op_def.proto
op_proto_maker.cc
op_proto_maker.h
op_version_proto.cc
op_version_proto.h
pass_desc.proto
proto_desc.h
trainer_desc.proto
- A set of Programs usually contains a startup program (startup_program) and a main program (main_program). By default, a Paddle neural-network model has two programs, static.default_startup_program() and static.default_main_program(), and they share parameters. default_startup_program runs only once, to initialize parameters; during training, default_main_program runs on every batch and updates the weights.
- Graph-representation survey: https://github.com/PaddlePaddle/docs/blob/develop/docs/design/others/graph_survey.md
- Design: https://www.paddlepaddle.org.cn/documentation/api/paddle/static/Program_cn.html
- Block design
- Block
- a Program consists of multiple Blocks, and a Block stores VarDescs and OpDescs
- A Block is the variable-scope concept from high-level languages: in a programming language, a block is a pair of braces enclosing local variable definitions and a sequence of instructions or operators
- A Block describes a group of Operators executed sequentially, conditionally, or in a loop, together with the objects the Operators act on: Tensors.
- Control flow
- ifelse
- switch
- while
- DynamicRNN
- StaticRNN
- describes the process and is conceptually like an abstract syntax tree.
- BlockDesc
- VarDesc: variable descriptor
- OpDesc: operator descriptor
message ProgramDesc {
reserved 2, 3; // For backward compatibility.
repeated BlockDesc blocks = 1;
optional Version version = 4;
optional OpVersionMap op_version_map = 5;
}
message BlockDesc {
required int32 idx = 1;
required int32 parent_idx = 2;
repeated VarDesc vars = 3;
repeated OpDesc ops = 4;
optional int32 forward_block_idx = 5 [ default = -1 ];
}
message VarDesc {
message Attr {
required string name = 1;
required AttrType type = 2;
optional int32 i = 3;
optional string s = 4;
repeated int32 ints = 5;
};
required string name = 1;
required VarType type = 2;
optional bool persistable = 3 [ default = false ];
// True if the variable is an input data and
// have to check the feed data shape and dtype
optional bool need_check_feed = 4 [ default = false ];
optional bool is_parameter = 5 [ default = false ];
optional bool stop_gradient = 6 [ default = false ];
repeated Attr attrs = 7;
}
message OpDesc {
message Attr {
required string name = 1;
required AttrType type = 2;
optional int32 i = 3;
optional float f = 4;
optional string s = 5;
repeated int32 ints = 6;
repeated float floats = 7;
repeated string strings = 8;
optional bool b = 10;
repeated bool bools = 11;
optional int32 block_idx = 12;
optional int64 l = 13;
repeated int32 blocks_idx = 14;
repeated int64 longs = 15;
repeated double float64s = 16;
optional string var_name = 17;
repeated string vars_name = 18;
optional double float64 = 19;
optional Scalar scalar = 20;
repeated Scalar scalars = 21;
};
message Var {
required string parameter = 1;
repeated string arguments = 2;
};
required string type = 3;
repeated Var inputs = 1;
repeated Var outputs = 2;
repeated Attr attrs = 4;
optional bool is_target = 5 [ default = false ];
};
import paddle
import paddle.static as static
paddle.enable_static()
# build a static program that multiplies two tensors
train_program = static.Program()
start_program = static.Program()
places = static.cpu_places()
with static.program_guard(train_program, start_program):
data = static.data(name="data1", shape=[2, 3], dtype="float32")
data2 = static.data(name="data2", shape=[3, 4], dtype="float32")
res = paddle.matmul(data, data2)
print(static.default_main_program())
The printed structure:
{ // block 0
var data1 : LOD_TENSOR.shape(2, 3).dtype(float32).stop_gradient(True)
var data2 : LOD_TENSOR.shape(3, 4).dtype(float32).stop_gradient(True)
var matmul_v2_0.tmp_0 : LOD_TENSOR.shape(2, 4).dtype(float32).stop_gradient(False)
{Out=['matmul_v2_0.tmp_0']} = matmul_v2(inputs={X=['data1'], Y=['data2']}, fused_reshape_Out = [], fused_reshape_X = [], fused_reshape_Y = [], fused_transpose_Out = [], fused_transpose_X = [], fused_transpose_Y = [], mkldnn_data_type = float32, op_device = , op_namescope = /, op_role = 0, op_role_var = [], trans_x = False, trans_y = False, use_mkldnn = False, with_quant_attr = False)
}
Structure breakdown: the Program prints as a dict-shaped structure:
blocks{
vars{
"""vars attribute"""
}
vars{
"""vars attribute"""
}
ops{
"""opeartors attribute"""
}
version{
"""other information"""
}
}
import paddle
import paddle.static as static
paddle.enable_static()
# same program as above, this time dumping it to a file
train_program = static.Program()
start_program = static.Program()
places = static.cpu_places()
with static.program_guard(train_program, start_program):
data = static.data(name="data1", shape=[2, 3], dtype="float32")
data2 = static.data(name="data2", shape=[3, 4], dtype="float32")
res = paddle.matmul(data, data2)
print(static.default_main_program(), file=open("program.txt", 'w'))
- python/paddle/_C_ops.py
- core.eager.ops
- core.pir.ops
- test/ir/inference/program_config.py
- create_fake_model
- create_quant_model
- For opt_op it performs Placement: opt_op is placed into the pserver program;
- For backward_op and forward_op it performs Replication: every trainer program contains these ops;
- For var it performs both Placement and Replication;
- Used to turn a program into different optimization combinations: https://www.paddlepaddle.org.cn/documentation/zh/api_guides/low_level/compiled_program.html#api-guide-compiled-program
# First create the Executor.
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
exe = fluid.Executor(place)
# Run the startup program once and only once.
exe.run(fluid.default_startup_program())
# Run the main program directly, without compiling.
loss = exe.run(fluid.default_main_program(),
feed=feed_dict,
fetch_list=[loss.name])
# Or compile the program and run the model with the optimized Program.
build_strategy = fluid.BuildStrategy()
build_strategy.memory_optimize = True if memory_opt else False
compiled_prog = compiler.CompiledProgram(
fluid.default_main_program(),
build_strategy=build_strategy)
loss, = exe.run(compiled_prog,
feed=feed_dict,
fetch_list=[loss.name])
- Call chain:
- https://github.com/PaddlePaddle/community/blob/master/pfcc/paddle-code-reading/Dygraph/20221201_dygraph_forward.md
- https://github.com/PaddlePaddle/community/blob/master/pfcc/paddle-code-reading/Dygraph/20221201_dygraph_backward.md
- https://github.com/PaddlePaddle/community/blob/master/pfcc/paddle-code-reading/Dygraph/20221201_dygraph_autodifferentiation_datastructure.md
- user side; Python framework layer; python-c binding layer; dygraph_function layer; C++ API layer; C++ kernel layer
- Backward-graph-related classes
- AbstractAutogradMeta
- TensorWrapper
- GradNodeBase
- GradTensorHolder
- Differences between Paddle dynamic and static graphs
- The static-graph idea: program code fully describes a network structure and training process up front; the model is then packaged into a Program and handed to an executor to run
- Dynamic-graph mode instead uses native Python control-flow statements and interprets the model training process eagerly
- The code is organized differently
- Training with static graphs requires a lot of code for the predefined flow: declaring the program, executing it with the Executor, and so on. In dynamic-graph mode, code is interpreted and executed eagerly, and training is much easier to debug.
- The code executes differently
- In static-graph mode, the complete network structure is known before execution, so graph analysis and optimization have much more freedom and execution usually performs better, but debugging is harder.
- Dynamic execution flow: https://www.paddlepaddle.org.cn/tutorials/projectdetail/4047189#anchor-7
- Design philosophy of the graph: https://www.paddlepaddle.org.cn/tutorials/projectdetail/3991882
- Code walkthrough of the new dynamic-graph forward call path: https://github.com/PaddlePaddle/community/blob/master/pfcc/paddle-code-reading/Dygraph/20221201_dygraph_forward.md
- Official concepts: https://www.paddlepaddle.org.cn/tutorials/projectdetail/4047189
- Execution flow: https://github.com/PaddlePaddle/community/blob/master/pfcc/paddle-code-reading/static_graph_execution/20221230_static_graph_execution.md
- Dynamic graphs
- In dynamic-graph mode, Operators execute immediately: every Paddle API the user calls runs right away and returns its result
- During training, while the forward Operators run, the framework automatically records the information the corresponding backward Operators need; the backward graph is built while the forward network executes.
- In dynamic-graph mode, the executor does not first obtain the complete network graph and then execute it in a fixed batch pattern. It follows the native Python control flow, executing the forward computation step by step, while Paddle builds the backward logic automatically during the forward pass, with no user involvement.
Variable: the data in the network.
Operator: the operations in the network.
Block: the control-flow structures of programming languages, such as conditionals (if-else) and loops (while).
Program: provides model save/load via Protobuf's serialization. Protobuf is Google's structured-data serialization framework, which turns structured data into a binary stream and deserializes it back; Paddle's model save/load is built on this.
Transpiler: an optional compilation step that transforms one Program into another Program.
Intermediate Representation: before execution, the user-defined Program is lowered into a unified intermediate representation.
Executor: rapidly schedules Operators to complete network training/inference.
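The Variable/Operator/Block/Program/Executor vocabulary above can be mocked up in a few lines of plain Python (a conceptual toy, not Paddle's real classes): a Program holds Blocks, a Block holds ops and vars, and an Executor walks the op list in order.

```python
class Block:
    def __init__(self):
        self.vars = {}    # name -> value (the Variables)
        self.ops = []     # (fn, input_names, output_name) triples (the Operators)

    def append_op(self, fn, inputs, output):
        self.ops.append((fn, inputs, output))

class Program:
    def __init__(self):
        self.blocks = [Block()]   # block 0 is the global block

class Executor:
    def run(self, program, feed, fetch_list):
        block = program.blocks[0]
        block.vars.update(feed)                    # feed inputs into scope
        for fn, inputs, output in block.ops:       # sequential op scheduling
            block.vars[output] = fn(*(block.vars[n] for n in inputs))
        return [block.vars[name] for name in fetch_list]

prog = Program()
prog.blocks[0].append_op(lambda a, b: a * b, ["x", "y"], "xy")
prog.blocks[0].append_op(lambda a: a + 1, ["xy"], "out")
result = Executor().run(prog, feed={"x": 3, "y": 4}, fetch_list=["out"])
# result == [13]
```

The real Executor additionally handles scopes, devices, and garbage collection, but the feed/run/fetch shape of the API is the same.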
- python/paddle/tensor
- matrix operations, linear algebra, and common math
- the real implementation lives in C++; Python provides an interface that forwards the compute logic through to C++
tensor.py itself does little: it only needs to pass the compute logic through to the C++ side
array.py
attribute.py
creation.py
einsum.py
layer_function_generator.py
linalg.py
logic.py
manipulation.py
math.py
ops.py
random.py
search.py
stat.py
tensor.prototype.pyi
to_string.py
- DenseTensorDesc
- DenseTensorArrayDesc
- SparseCooTensor
- SparseCsrTensor
message TensorDesc {
// Should only be PODType. Is enforced in C++
required Type data_type = 1;
repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
}
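The `dims` convention in TensorDesc above (-1 marks an unknown dimension, as in [-1, 640, 480]) is what shape inference propagates. A hypothetical matmul infer-shape helper in plain Python (illustrative only, not Paddle's InferMeta code) shows how -1 flows through:

```python
def infer_matmul_shape(x_dims, y_dims):
    """Toy shape inference for a 2-D matmul; -1 marks an unknown dim."""
    assert len(x_dims) == 2 and len(y_dims) == 2
    # inner dims must match unless one of them is unknown (-1)
    if -1 not in (x_dims[1], y_dims[0]):
        assert x_dims[1] == y_dims[0], "inner dimensions mismatch"
    return [x_dims[0], y_dims[1]]   # unknown outer dims stay -1

# a [-1, 3] batch input times a fixed [3, 4] weight keeps the unknown batch dim
print(infer_matmul_shape([-1, 3], [3, 4]))  # → [-1, 4]
```

Running such rules over every OpDesc at compile time is what lets the static graph know all tensor shapes (up to the batch dimension) before execution.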
- Execution issue: PaddlePaddle/Paddle#11771
- move the opt ops into the pserver program;
- split Grad-type Variables according to the number of pservers, and add split_ops;
- add send_ops for the split Variables;
- split Param-type Variables, and add concat_ops;
- add recv_ops for the split Variables;
- in addition, add a listen_and_serv_op to the pserver program;
- finally, account for all the lr_decay_ops that the opt_ops depend on, plus all the sub_blocks those lr_decay_ops bring in...
- What's more, whenever the logic of any step above changes, the Transpiler needs major surgery...
- Dist-Transpiler
- modify the current Program until it can run in a distributed environment;
- split the current Program into two or more Programs;
- make further modifications to those Programs to fit the pserver mode;
- ParallelExecutor:https://www.paddlepaddle.org.cn/documentation/zh/api_guides/low_level/parallel_executor.html#api-guide-parallel-executor
- memory reuse
- Memory-allocation APIs: https://github.com/PaddlePaddle/docs/blob/develop/docs/dev_guides/custom_device_docs/memory_api_cn.md
- Adding a new hardware backend: https://github.com/PaddlePaddle/docs/blob/develop/docs/dev_guides/custom_device_docs/custom_device_example_cn.md
- Overview of the Custom Device integration scheme for training hardware: https://github.com/PaddlePaddle/docs/blob/develop/docs/dev_guides/custom_device_docs/custom_device_overview_cn.md
- PaddlePaddle/Paddle#11771
- PlaceGroup: hardware description
- Executor (executor), pserver
- assign each node to a designated machine
# message ProgramDesc {
# block[0] = Block {
# vars = [
# x: {
# attr: {
# place: PlaceGroup {
# roles[0] = ExecutorRole.trainer,
# nodes[0] = Node {
# ip = 192.168.10.1,
# port = 7164
# },
# }
# }
# },
#
# w: {
# attr: {
# place: PlaceGroup {
# roles[0] = ExecutorRole.pserver,
# nodes[0] = Node {
# ip = 127.0.0.1,
# port = 7164
# },
# }
# }
# },
#
# b: {
# attr: {
# place: PlaceGroup {
# roles[0] = ExecutorRole.pserver,
# nodes[0] = Node {
# ip = 127.0.0.1,
# port = 7164
# },
# }
# }
# },
#
# y: {
# attr: {
# place: PlaceGroup {
# roles[0] = ExecutorRole.trainer,
# nodes[0] = Node {
# ip = 192.168.10.1,
# port = 7164
# },
# }
# }
# },
#
# w@GRAD: {
# attr: {
# place: PlaceGroup {
# roles[0] = ExecutorRole.trainer,
# nodes[0] = Node {
# ip = 127.0.0.1,
# port = 7164
# },
# }
# }
# },
#
# b@GRAD: {
# attr: {
# place: PlaceGroup {
# roles[0] = ExecutorRole.trainer,
# nodes[0] = Node {
# ip = 127.0.0.1,
# port = 7164
# },
# }
# }
# },
#
# lr: {
# attr: {
# place: PlaceGroup {
# roles[0] = ExecutorRole.pserver,
# nodes[0] = Node {
# ip = 127.0.0.1,
# port = 7164
# },
# }
# }
# },
#     ]
#   }
# }
- https://github.com/PaddlePaddle/docs/blob/develop/docs/design/dist_train/src/local-graph.png
- https://github.com/PaddlePaddle/docs/blob/develop/docs/design/dist_train/src/dist-graph.png
- https://github.com/PaddlePaddle/docs/blob/develop/docs/design/dist_train/src/distributed_lookup_table.jpeg
- https://github.com/PaddlePaddle/docs/blob/develop/docs/design/dist_train/src/distributed_architecture.png
- https://github.com/PaddlePaddle/docs/blob/develop/docs/design/dist_train/src/remote_executor.png
- https://github.com/PaddlePaddle/docs/blob/develop/docs/design/dist_train/src/sync_distributed_training.png
- https://github.com/PaddlePaddle/docs/blob/develop/docs/design/dist_train/src/mpi_module.png
- Communication APIs: https://github.com/PaddlePaddle/docs/blob/develop/docs/dev_guides/custom_device_docs/ccl_api_cn.md
See also: machine-learning solutions
- data_loader phi_dynamic_loader
- dynamic_loader
- tensor
- tensor_meta
- tensor_base
- tensor_array
- sparse_csr_tensor
- Algorithm-related
- blas
- Performance optimization
- prune
- eigen_function
- pooling
- cross_entropy
- matrix_reduce
- cos_sim_functor
- fft
- elementwise
- fusion
- Non-core
- workqueue_utils
- memory_utils
- profiler_utils
- op_version_registry
- threadpool
- place
- ddim
- mlu_tracer custom_tracer cuda_tracer
- Op and Kernel
- layer
- torchbind
- jitbackend MIR
- SSA Graph
- Inference engine
- Graph decomposition
- Dynamic graph
- final_dygraph_node
- final_dygraph_function
- dygraph_function
- dygraph_node
- Static graph
- subgraph_util
- subgraph_detector
- graph_helper
- phi_dygraph_api
- graph_traits
- graph
- cuda_graph_with_memory_pool
- staticgraph_executor_statistics
- Dynamic graph
- Graph parallelism
- fast_threaded_ssa_graph_executor
- scope_buffered_ssa_graph_executor
- bind_threaded_ssa_graph_executor
- threaded_ssa_graph_executor
- ssa_graph_executor
- Graph operators
- op_graph_view
- graph_khop_sampler_op
- graph_sample_neighbors_op
- graph_reindex_op
- ir + pass
- ir_graph_build_pass
- ir_graph_to_program_pass
- graph_to_program_pass
- pass
- multi_devices_graph_pass
- multi_devices_graph_print_pass
- multi_devices_graph_check_pass
- graph_pattern_detector
- paddle/distributed/auto_parallel/graph.py
- github/baidu_paddle/paddle/fluid/framework/ir/node.h
- github/baidu_paddle/paddle/fluid/operators/graph_reindex_op.cc
- paddle
- autograd
- nn
- distributed distribution
- inference
- cinn: auto-tuning framework
- https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/cinn/auto_schedule_cn.html
- When implementing (fused) operators, CINN separates compute from schedule: compute expresses the operator's plain computational semantics, while schedule expresses the concrete way it is computed. auto-schedule automatically generates the schedule configuration for operators, lowering the manpower cost and technical barrier of bringing new hardware onto the compiler and serving scenarios that chase peak performance. Below is a brief overview of where the auto-tuning framework sits in CINN's overall architecture and what its core modules do.
- https://github.com/PaddlePaddle/CINN/tree/develop
- auto_schedule
- ir compiler runtime framework frontend
- extern_zlib
- extern_threadpool
- extern_dlpack
- extern_gflags
- extern_eigen3
- extern_mklml
- extern_mkldnn
- framework_proto
- phi_profiler_proto
- auto_parallel_proto
- Intro: https://github.com/PaddlePaddle/Paddle-Inference-Demo/tree/master
- Source analysis, part 1: https://www.cnblogs.com/zl1991/p/15688005.html
- Source analysis, part 2: https://www.cnblogs.com/zl1991/p/15688762.html
--cmake #CMake build scripts and the third-party libraries to build and link
--doc
--paddle #C++ code
-fluid
-distributed #distributed code, mainly for training, including in-model all_reduce for cross-card and cross-machine communication
-extension #
-framework #core framework components
-imperative #distributed-communication-related code, including nccl, all_reduce, bkcl, etc.
-inference #inference code and API definitions
-memory
-operators #operators
-platform #platform-specific code
-pybind #pybind interface definitions
-string
-scripts
-testing
-utils
--patches
--python #Python code
--r
--tools
--CMakeLists.txt #build script, covering most build options and third-party dependency logic
- No need to study this API surface; the module will be retired
- Starting from Paddle framework 2.5: https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/fluid/Overview_cn.html
- paddle_inference_api
- PHI, the Paddle HIgh reusability operator library, also called the functional operator library, supports composing new operators from existing operator kernels and the Kernel Primitives API, and supports plug-in integration of new hardware and new acceleration libraries.
- Taking a framework API contribution as the example, a hands-on guide from User to Contributor, by 飞桨PaddlePaddle on Zhihu: https://zhuanlan.zhihu.com/p/592130623
- Design background: https://github.com/PaddlePaddle/docs/blob/develop/docs/design/phi/design_cn.md
- Split builds by compute device
- e.g. build only the CPU kernels, or only the GPU kernels
- Split builds by training vs. inference scenario
- e.g. inference builds skip backward kernels, and also skip forward kernels with Intermediate outputs
- Precise operator-level build pruning for mobile devices (not yet supported)
- e.g. if a model uses only add and mul, in the extreme the build could be cut down to just those 2 kernels
- Public API and C API
paddle/phi
./api (public high-level API and its implementation)
./include (headers of the public high-level API)
./lib (implementation of the public API)
./capi (public C API and its implementation)
./include
./lib
./common (basic data structures used both internally and externally)
./core (core components: base Tensor interfaces, kernel registration, management units, etc.)
./backends (per-device and per-backend basics, with cpu, gpu, etc. subdirectories)
./infermeta (inference functions for meta information such as shape, dtype, layout)
./kernels (kernel implementations for each device and backend)
./ops (definitions of each Op; most of this will be auto-generated later, currently only compatibility code)
./tests (unit tests)
paddle/phi/kernels
./ (device-independent kernel declarations and implementations)
./cpu (kernel implementations for the cpu backend only)
./gpu
./xpu
./onednn
./gpudnn
./impl (given the status quo, implementations that were identical across CPU, GPU, or other devices live here for reuse)
./funcs (multi-device functors and funcs from the original fluid operators)
./primitive (base implementations of the Kernel Primitive API)
...
- Hardware-specific features: cpu, xpu, gpu
- Device manager and the like
- The core of a Pass is subgraph matching and replacement (graph transformation): converting one Program into a new Program by some rule. The IR contains global information about the computation graph, such as the adjacency of upstream and downstream operators, which makes graph optimizations easier, for example constant folding, operator fusion, and inplace strategies:
- Two Pass development paradigms are provided: Pattern Rewriter and Declarative Rewrite Rule (DRR)
- PIR basic concepts and development: https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/paddle_v3_features/paddle_ir_cn.html
- Tech articles
- Common deep-learning IRs
- MLIR, TVM Relay, MindIR, XLA HLO, TorchScript, etc.
- Paddle has implemented two IRs over time
- one is Program
- the other is Graph
- The new design
- Stick to the SSA (static single assignment) principle: a model is equivalent to a directed acyclic graph, with Operations as nodes and Values as edges
- Operators are layered and classified, by defining operators and types in separate Dialects
- Low-level operators align with the operator-library API
- Operator C++ definitions are generated from YAML plus Python scripts
- Optimization
- A type is essentially an abstraction over memory; different types imply different memory layouts and allocation strategies
- An Operation is a node of the computation graph: one Operation represents one operator and contains zero or more Regions; a Region is a closure containing zero or more Blocks; a Block is an SSA basic block containing zero or more Operations. The three nest recursively and can express arbitrarily complex syntactic structures
- A Value is a directed edge of the computation graph: it links two Operations and describes the program's UD chains (use-define chains); OpResult is the defining end, which defines a Value, and OpOperand is the using end, which describes one use of a Value
- Inference/deployment: simplifies the abstract computation graph, solves the cycle problem, and lowers the cost of developing Passes
- Distributed: multiple Dialects manage operators, with flexible tagging of distributed attributes
- Compiler: strictly follows the SSA principle, flexibly supporting robust compiler optimizations
- Complete and robust semantic expressiveness, a unified representation across training and inference, and an efficient, pluggable mechanism for developing performance-optimization Passes
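The Operation/Value/OpResult/OpOperand model described above is a classic SSA use-define structure. A minimal sketch in plain Python (toy names, not PIR's actual API) shows both directions of the edge:

```python
class Value:
    """An SSA value: defined by exactly one op, possibly used by many."""
    def __init__(self, defining_op):
        self.defining_op = defining_op   # the OpResult side: who defines it
        self.users = []                  # the OpOperand side: who uses it

class Operation:
    def __init__(self, name, operands):
        self.name = name
        self.operands = operands         # Values this op consumes
        for v in operands:
            v.users.append(self)         # record the use-define (UD) link
        self.result = Value(self)        # single result, for simplicity

# build:  c = add(a, b);  d = mul(c, c)
a = Operation("constant_a", []).result
b = Operation("constant_b", []).result
c = Operation("add", [a, b]).result
d = Operation("mul", [c, c]).result
# walking the edges: c is defined by "add" and used twice by "mul"
```

Because every Value has exactly one definer, optimizations like dead-code elimination or CSE become simple graph walks over these links, which is the robustness argument made above.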
- IR design
- Control-flow representation: https://github.com/PaddlePaddle/community/blob/master/pfcc/paddle-code-reading/IR_Dialect/control_flow.md
- Model representation: https://github.com/PaddlePaddle/community/blob/master/pfcc/paddle-code-reading/IR_Dialect/ir_program.md
- Data-type representation: https://github.com/PaddlePaddle/community/blob/master/pfcc/paddle-code-reading/IR_Dialect/control_flow.md
- Plan for adapting the new IR to the compiler: PaddlePaddle/Paddle#56879
- Interface, Trait, Type, Attribute, Op, Program
- The new IR Program can progressively replace today's Program, Graph, CINN Program, and CINN Graph in the framework
- Supports complex types such as List and Dict
- Define corresponding Traits to give inplace operators and View-style operators a proper abstract description
- The computation graph strictly follows the SSA principle
- The IR base library abstracts out a set of Pass building blocks
- Source
- paddle/pir/src/core
- Problems
- Code reusability
- Two scenarios, Paddle and CINN: many Passes with identical functionality had to be developed twice (ConstantFold, CSE, and so on).
- IR elements are bound tightly to concrete scenarios, so scenario-independent logic cannot be reused.
- Framework-stability problems
- The current Paddle model structure, Program, has an implicit dependency on sequential IR. Every module that can disturb operator order is incomplete and carries risk.
- In a Pass, graph transformation adds, deletes, and moves operators, changing operator order; Passes are therefore risky.
- The new executor schedules operators in parallel across multiple streams, which effectively changes operator order; the executor is therefore risky too.
- Insufficient capability
- The framework's abstractions over IR elements are fairly naive and lack expressiveness.
- There is no rigorous type system: complex types such as List and Dict cannot be supported, let alone nested type representations.
- There is no rigorous Tensor aliasing mechanism. Inplace operators can currently be marked by giving input and output the same name, but view-style aliasing operators such as share_data and reshape (compiler operators) cannot be expressed.
- The computation graph does not follow the SSA principle, so many mature optimization theories from the compiler field cannot be applied directly.
- The approach
- MLIR's emergence proved that a single data structure can provide both Graph and Program functionality, resolving the Program-and-Graph synchronization problem.
- Model-representation layer
- A Program represents one concrete model. It contains two parts: the computation graph and the weights
- Weight stores the model's weight parameters separately, which is where deep-learning frameworks differ from traditional compilers. A traditional compiler embeds the data segment into the program, because there, data and code are strongly bound and inseparable. For a neural network, every epoch of a computation graph has its own copy of the weight parameters, and multiple computation graphs may share one set of weights; the two are not strongly bound.
- Value and Operation abstract the computation graph
- Model optimization
- A graph transformation optimizes one Program into another Program. It covers every Pass we encounter, including Mapping, Fusion, Pruning, Canonicalization, CSE, and so on
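A graph-transformation Pass like the ConstantFold mentioned above can be sketched over a tiny op list in plain Python (illustrative only; real Passes match patterns on the IR graph): find ops whose inputs are all constants and replace them with a constant.

```python
# a "program" as a list of (output, op, payload); const ops carry a literal,
# other ops carry a tuple of input names
ops = [
    ("a", "const", 2),
    ("b", "const", 3),
    ("c", "add", ("a", "b")),     # foldable: both inputs are constants
    ("d", "mul", ("c", "x")),     # not foldable: x is a runtime input
]

def constant_fold(ops):
    """One rewrite rule: add(const, const) -> const."""
    consts = {out: val for out, op, val in ops if op == "const"}
    folded = []
    for out, op, args in ops:
        if op == "add" and all(a in consts for a in args):
            consts[out] = sum(consts[a] for a in args)
            folded.append((out, "const", consts[out]))   # replace the subgraph
        else:
            folded.append((out, op, args))               # keep unchanged
    return folded

print(constant_fold(ops))
```

The fold leaves `d` untouched because one of its inputs is unknown at compile time, which is exactly the match-and-replace behavior a Pass framework generalizes.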
- framework::ProgramDesc => ir::Graph => frontend::Program (NetBuilder 层) => hlir::Graph =>
Compute/Schedule() => AST 层面 => Module::Builder => CodeGen+NVRTC => Runtime::Program
- MLIR articles and videos roundup, by 法斯特豪斯 on Zhihu: https://zhuanlan.zhihu.com/p/141256429
- cinn, Compiler Infrastructure for Neural Networks: https://github.com/PaddlePaddle/CINN
- Using CINN from the Paddle training framework for compile-time optimization and acceleration: https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/cinn/paddle2cinn_intro_cn.html
- CINN neural-network compiler: https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/paddle_v3_features/cinn_cn.html
- Paddle code autogeneration explained: https://github.com/PaddlePaddle/community/blob/master/pfcc/paddle-code-reading/code_gen/code_gen_ops.md
- PIR graph-construction APIs from a CodeGen perspective: https://github.com/PFCCLab/Camp/blob/main/Docs/Hackathon_5th/03_NewIRAPI_AutoDiffAndCoreComponentRefinement/CodeReading/Over_view_PIR_construct_API_As_CodeGen_perspective.md
- Automatic generation of static-graph operators for the framework: PaddlePaddle/Paddle#51842
- Building the operator-definition generation system: static-graph operator autogeneration: PaddlePaddle/Paddle#53267
- Automatic code generation under the new IR: PaddlePaddle/Paddle#56849
- Operator code is generated automatically from infer_meta plus the YAML files
- Three CodeGen systems exist side by side: dynamic graph, static graph, and the old dynamic graph; once the old dynamic graph fully exits, its CodeGen system is expected to be removed as well.
- python API -> python-c forwarding -> python-c binding layer -> PIR op graph-construction API -> pir::Op builder functions
- Issues in operator-composition development: https://github.com/PFCCLab/Camp/blob/main/Docs/Hackathon_5th/04_TheUnityOfOperatorForwardAndBackwardInCombinationFeatures/CodeReading/the_unity_of_operator_forward_and_backward_in_combination_features.md
- Composition-mechanism design doc: https://github.com/PaddlePaddle/community/blob/master/pfcc/paddle-code-reading/operator_decomposition_mechanism/design.md
- Background
- By NVIDIA's published numbers, GPT-3 175B took 1024 A100s 34 days to train; strong distributed-training capability is required.
- Scientific-computing differential-equation solving, e.g. EulerBeam, needs fourth-order derivatives in the forward pass, plus one backward-propagation pass, i.e. fifth-order differentiation.
- Compiler acceleration and support for domestic hardware are also important recent directions in deep-learning framework evolution.
- Paddle's foundation is a native operator system (1061 operators: 691 forward, 370 backward, and still growing), which makes the scenarios above extremely hard to support, specifically:
- Distributed: auto-parallel strategies must be hand-written on top of the native operator system.
- Higher-order differentiation: higher-order operators must be hand-written; fifth-order Matmul exceeds 3000 lines of code.
- Compiler: native coarse-grained operators must be decomposed into fine-grained compiler operators before further fusion and optimization.
- New hardware: every operator must be integrated before anything runs correctly.
- Design doc of PHI, the Paddle high-reusability operator library: https://github.com/PaddlePaddle/docs/blob/develop/docs/design/phi/design_cn.md
- PHI kernel registration: https://github.com/PaddlePaddle/community/blob/master/pfcc/paddle-code-reading/PHI_kernel_registration/PHI_kernel_registration.md
- Inplace introduction & usage: https://github.com/PaddlePaddle/community/blob/master/pfcc/paddle-code-reading/Inplace/inplace_introduction.md
- Paddle Inplace usage guide: https://github.com/AndSonder/community/blob/ae18ba0d0b3b4ddd5ebc814c8190cd8f7a42b1bc/rfcs/Article/20240321_guide_to_using_Inplace.md
- https://github.com/PaddlePaddle/docs/blob/develop/docs/design/dist_train/parameter_server.md
- Reference from TF: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
- PaddlePaddle source analysis: Parameter sharding logic: PaddlePaddle/Paddle#1994
- paddle.distributed.fleet.data_generator
- paddle.onnx
- paddle.pir
- paddle.jit
- paddle.sparse
- jit load
__init__.py: marks the directory as a Python package and may contain some initialization code.
api.py: defines the API surface of PaddlePaddle's JIT features, letting users access JIT functionality programmatically.
dy2static directory: tools and modules for converting dynamic graphs into static graphs.
__init__.py: initializes the dy2static package.
ast_utils.py: utilities for working with abstract syntax trees (ASTs).
convert_call_func.py: converts function calls.
convert_operators.py: converts operators.
error.py: error types the conversion may raise.
export_subgraph.py: exports subgraphs.
function_spec.py: descriptions of function specifications.
logging_utils.py: logging utilities.
origin_info.py: utilities for handling original source information.
partial_program.py: representation of partial programs.
pir_partial_program.py: partial-program representation for the Paddle Intermediate Representation (PIR).
program_translator.py: module that translates programs.
py_layer.py: representation of Python layers.
layer.py: abstraction of deep-learning layers that can be JIT-compiled.
pir_dy2static directory: modules for Paddle Intermediate Representation (PIR) to static-graph conversion.
__init__.py: initializes the pir_dy2static package.
parameter_recorder.py: records parameter information.
sot directory: modules for the Static Operation Translator.
__init__.py: initializes the sot package.
infer_meta.py: metadata inference.
opcode_translator directory: opcode-translation modules.
__init__.py: initializes the opcode_translator package.
breakpoint.py: likely breakpoint handling.
custom_code.py: handling of custom code.
eval_frame_callback.py: handling of eval-frame callbacks.
executor directory: components needed to execute static graphs.
__init__.py: initializes the executor package.
The remaining files hold executor internals, such as dispatch functions and instruction flags.
instruction_utils directory: tools for instruction analysis and transformation.
skip_files.py: likely used to skip processing of certain files.
profiler.py: performance profiling.
psdb.py: likely related to a performance database.
symbolic directory: symbolic-execution modules.
compile_cache.py: likely compile caching.
export.py: exports symbolic representations.
interpreter.py: interpreter implementation.
statement_ir.py: intermediate representation of statements.
symbolic_context.py: symbolic-execution context.
translate.py: translation module.
utils directory: utilities.
__init__.py: initializes the utils package.
The remaining files hold specific utilities.
translated_layer.py: representation of JIT-compiled layers.
utils.py: general-purpose utilities.
- Single-machine multi-card and multi-machine multi-card setup examples: https://aistudio.baidu.com/projectdetail/1222066
- Prefer the spawn approach
When training through the paddle.Model high-level API, starting single-machine multi-card training is very simple: the code needs no modification at all; just add the parameter -m paddle.distributed.launch at launch time.
#single-machine single-card launch; defaults to card 0
$ python train.py
#single-machine multi-card launch; defaults to all currently visible cards
$ python -m paddle.distributed.launch train.py
#single-machine multi-card launch, using cards 0 and 1
$ python -m paddle.distributed.launch --selected_gpus='0,1' train.py
#single-machine multi-card launch, using cards 0 and 1 via the environment
$ export CUDA_VISIBLE_DEVICES='0,1'
$ python -m paddle.distributed.launch train.py
https://aistudio.baidu.com/projectdetail/1222066
Three changes to make:
#change 1: import the library
import paddle.distributed as dist
#change 2: initialize the parallel environment
dist.init_parallel_env()
#change 3: wrap the model in paddle.DataParallel
net = paddle.DataParallel(paddle.vision.models.LeNet())
import paddle.distributed as dist
# launch multi-process training with spawn; defaults to all visible GPU cards
if __name__ == '__main__':
    dist.spawn(train)
# launch 2 training processes; defaults to the first 2 currently visible cards
if __name__ == '__main__':
    dist.spawn(train, nprocs=2)
# launch 2 training processes, on cards 4 and 5
if __name__ == '__main__':
    dist.spawn(train, nprocs=2, selected_gpus='4,5')
Summary
The spawn approach needs no changes inside the training code itself; adding the dist.spawn(train) line effectively wraps the training code in a multi-process shell. Simple and convenient, it is the recommended way to set up single-machine multi-card training!
When spawn fails inside a notebook, the guess is that the notebook's process-management restrictions are the cause. From the command line, or when running the cell with a leading !, there is no problem.
Only when spawn is not supported should you fall back to launching single-machine multi-card with launch.
- Distributed training with Fleet
GPU single-machine multi-card training
To start a single-machine 4-card job, just point --gpus at 4 idle cards.
fleetrun --gpus=0,1,2,3 train.py
If export CUDA_VISIBLE_DEVICES=0,1,2,3 is set, you can run directly:
export CUDA_VISIBLE_DEVICES=0,1,2,3
fleetrun train.py
[Example 1] 2 machines, 8 cards (4 cards per node)
fleetrun --ips="xx.xx.xx.xx,yy.yy.yy.yy" --gpus=0,1,2,3 train.py
[Example 2] 2 machines, 16 cards (8 cards per node, assuming all 8 are usable on each machine)
fleetrun --ips="xx.xx.xx.xx,yy.yy.yy.yy" train.py
- TranslatedLayer
- github/Paddle/paddle/fluid/framework/new_executor/garbage_collector
- github/Paddle/paddle/fluid/framework/new_executor/workqueue
- https://www.paddlepaddle.org.cn/inference/master/guides/export_model/visual_model.html
- Exporting models: https://www.paddlepaddle.org.cn/inference/master/guides/export_model/index_export_model.html
- graph_compiler
- Assorted matrix-multiplication and factorial implementations
With this many norm functions, which one should you use, how are they used day to day, and how?
Their return values differ
linalg has the richer implementation
nn.functional.norm mainly implements L1 and L2
./paddle/distributed/auto_parallel/static/operators/dist_layer_norm.py
./paddle/distributed/auto_parallel/static/operators/dist_pnorm.py
./paddle/nn/initializer/normal.py
./paddle/nn/functional/norm.py
./paddle/nn/layer/norm.py
./paddle/nn/utils/spectral_norm_hook.py
./paddle/nn/utils/weight_norm_hook.py
./paddle/nn/utils/clip_grad_norm_.py
./paddle/distribution/multivariate_normal.py
./paddle/distribution/lognormal.py
./paddle/distribution/normal.py
./paddle/sparse/nn/layer/norm.py
./paddle/incubate/nn/functional/fused_rms_norm.py
./paddle/incubate/nn/functional/fused_layer_norm.py
11 results - 10 files
github/Paddle/python/paddle/nn/functional/norm.py:
    def normalize(x, p=2, axis=1, epsilon=1e-12, name=None)
github/Paddle/python/paddle/nn/utils/spectral_norm_hook.py:
    def normal_(x, mean=0.0, std=1.0)
github/Paddle/python/paddle/nn/utils/weight_norm_hook.py:
    def norm_except_dim(p, dim)
github/Paddle/python/paddle/static/io.py:
    def normalize_program(program, feed_vars, fetch_vars, **kwargs)
github/Paddle/python/paddle/tensor/linalg.py:
    def norm(x, p=None, axis=None, keepdim=False, name=None)
github/Paddle/python/paddle/tensor/random.py:
    def normal(mean=0.0, std=1.0, shape=None, name=None)
    def normal_(x, mean=0.0, std=1.0, name=None)  # @dygraph_only
github/Paddle/python/paddle/utils/cpp_extension/extension_utils.py:
    def normalize_extension_kwargs(kwargs, use_cuda=False)
github/Paddle/python/paddle/vision/transforms/functional_cv2.py:
    def normalize(img, mean, std, data_format='CHW', to_rgb=False)  # normalizes an ndarray image with mean and standard deviation
github/Paddle/python/paddle/vision/transforms/functional_tensor.py:
    def normalize(img, mean, std, data_format='CHW')  # normalizes a tensor image given mean and standard deviation
github/Paddle/python/paddle/vision/transforms/functional.py:
    def normalize(img, mean, std, data_format='CHW', to_rgb=False)  # normalizes a tensor or image with mean and standard deviation
- normalize
- Normalization function; it computes the normalized result using the Lp norm
- Mathematically, the Lp norm is a norm on a vector space that measures the magnitude of a vector's elements.
- p is the norm's exponent, usually a real number >= 1. When p=2 it is the Euclidean norm, also called the 2-norm or L2 norm.
- When p=1 it is the Manhattan norm, also called the 1-norm or L1 norm.
- batch_norm
- layer_norm
- instance_norm
- local_response_norm
- group_norm
- BCELoss
- SampledSoftmaxLoss
- This makes full use of Python's interpreted nature and its flexible syntax parsing
- Compared with a SQL DSL, this design is considerably more flexible
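The Lp-norm-based normalize described above can be sketched in plain Python (an illustrative re-implementation for intuition, not Paddle's actual code; the real API is paddle.nn.functional.normalize):

```python
def lp_norm(v, p=2):
    """Lp norm of a vector: (sum(|x|^p))^(1/p)."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def normalize(v, p=2, epsilon=1e-12):
    """Divide each element by the vector's Lp norm, as normalize() does along one axis."""
    n = max(lp_norm(v, p), epsilon)  # epsilon guards against division by zero
    return [x / n for x in v]

v = [3.0, 4.0]
print(lp_norm(v, p=2))    # L2 (Euclidean) norm: 5.0
print(lp_norm(v, p=1))    # L1 (Manhattan) norm: 7.0
print(normalize(v, p=2))  # [0.6, 0.8]
```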
def logaddexp(x, y, name=None):
    """
    .. math::
        Out = \log(e^{X} + e^{Y})
    """
    return paddle.log1p(paddle.exp(-paddle.abs(x - y))) + paddle.maximum(x, y)
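The return expression is the numerically stable form of log(e^x + e^y): factoring out max(x, y) avoids overflow when the inputs are large. A plain-Python check of the identity (illustrative, not Paddle code):

```python
import math

def logaddexp(x, y):
    # log(e^x + e^y) = max(x, y) + log1p(exp(-|x - y|))
    return math.log1p(math.exp(-abs(x - y))) + max(x, y)

# Small inputs: matches the naive formula
naive = math.log(math.exp(1.0) + math.exp(2.0))
print(abs(logaddexp(1.0, 2.0) - naive) < 1e-12)  # True

# Large inputs: the naive formula overflows to inf, the stable one does not
print(logaddexp(1000.0, 1000.0))  # ~1000.6931 (= 1000 + log 2)
```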
PyTorch 1.8 to Paddle 2.0 API mapping table
https://www.paddlepaddle.org.cn/documentation/docs/zh/2.4/guides/model_convert/pytorch_api_mapping_cn.html
github/Paddle/python/paddle/optimizer/optimizer.py
adagrad.py: implementation of the AdaGrad optimizer.
adamax.py: implementation of the Adamax optimizer.
asgd.py: implementation of the Average Stochastic Gradient Descent (ASGD) optimizer.
lbfgs.py: implementation of the L-BFGS optimizer.
momentum.py: implementation of the Momentum optimizer.
rmsprop.py: implementation of the RMSProp optimizer.
sgd.py: implementation of the Stochastic Gradient Descent (SGD) optimizer.
adadelta.py: implementation of the AdaDelta optimizer.
adam.py: implementation of the Adam optimizer.
adamw.py: implementation of the AdamW optimizer.
lamb.py: implementation of the LAMB (Layer-wise Adaptive Moments optimizer for Batch training) optimizer.
lr.py: implementation of learning-rate schedulers.
optimizer.py: implementation of the optimizer base class.
rprop.py: implementation of the Rprop (Resilient Backpropagation) optimizer.
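As an illustration of what one of these files implements, the classic momentum update rule behind momentum.py can be sketched in plain Python (a textbook formulation, not Paddle's actual code):

```python
def momentum_step(params, grads, velocity, lr=0.1, mu=0.9):
    """One momentum update: v = mu * v + g; p = p - lr * v."""
    for i, g in enumerate(grads):
        velocity[i] = mu * velocity[i] + g
        params[i] -= lr * velocity[i]
    return params, velocity

# Minimize f(x) = x^2 (gradient 2x) starting from x = 1.0
params, velocity = [1.0], [0.0]
for _ in range(100):
    grads = [2.0 * params[0]]
    params, velocity = momentum_step(params, grads, velocity, lr=0.05, mu=0.9)
print(params[0])  # close to the minimum at 0
```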
Zeroing gradients
pytorch: opt.zero_grad()
paddle: opt.clear_grad()
- Sparse computation usage guide
- https://github.com/AndSonder/community/blob/ae18ba0d0b3b4ddd5ebc814c8190cd8f7a42b1bc/rfcs/Article/20240412_guide_to_using_sparse.md
- https://github.com/PaddlePaddle/community/pull/880/files?short_path=71ec981#diff-71ec9818e57cd303cbc56336bcc80d7d6aa2a50c91a38a57a997202bd68a99c1
- https://github.com/PaddlePaddle/community/pull/883/files#diff-044bb5185eb6d238efea0f2eb7eb76648def35376960caea021c4dd6f9f2417d
- COO and CSR sparse matrix storage formats: https://cloud.tencent.com/developer/article/1766995
- Storage efficiency: https://www.cnblogs.com/xbinworld/p/4273506.html
- paddle
- Structure: COO and CSR sparse matrix storage formats
- COO (Coordinate) is the most direct sparse storage format: it stores the non-zero values and their corresponding row/column indices in separate vectors.
- CSR (Compressed Sparse Row) stores the non-zero elements of the compressed matrix row by row. The non-zeros of each row occupy contiguous storage, so data within the same row of the matrix can be accessed efficiently.
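The two formats can be illustrated with a small plain-Python conversion from a dense matrix (a sketch for intuition; in Paddle the real entry points are paddle.sparse.sparse_coo_tensor and paddle.sparse.sparse_csr_tensor):

```python
def dense_to_coo(dense):
    """COO: parallel lists of row indices, column indices, and values for each non-zero."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(dense):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i); cols.append(j); vals.append(v)
    return rows, cols, vals

def dense_to_csr(dense):
    """CSR: row pointers (crows), column indices, values; crows[i]:crows[i+1] brackets row i's non-zeros."""
    crows, cols, vals = [0], [], []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                cols.append(j); vals.append(v)
        crows.append(len(vals))
    return crows, cols, vals

m = [[1, 0, 2],
     [0, 0, 3],
     [4, 0, 0]]
print(dense_to_coo(m))  # ([0, 0, 1, 2], [0, 2, 2, 0], [1, 2, 3, 4])
print(dense_to_csr(m))  # ([0, 2, 3, 4], [0, 2, 2, 0], [1, 2, 3, 4])
```

Note how CSR replaces the per-element row index of COO with one pointer per row, which is what makes row-wise access cheap.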
To read
https://zhuanlan.zhihu.com/p/188700729
https://matteding.github.io/2019/04/25/sparse-matrices/
https://xie.infoq.cn/article/41ebc9e8a53bf4ba5fa14acca
https://github.com/PaddlePaddle/Paddle3D/blob/release/1.0/paddle3d/models/middle_encoders/sparse_resnet.py
https://github.com/PaddlePaddle/Paddle3D/blob/develop/docs/models/centerpoint/README.md
- Full table of Paddle's 0-D design: https://github.com/PaddlePaddle/community/blob/master/pfcc/paddle-code-reading/ZeroDim/all_zero_dim_api.md
- Criteria for judging 0-D: https://github.com/PaddlePaddle/community/blob/master/pfcc/paddle-code-reading/ZeroDim/judge_zero_dim.md
- 0-D Tensor:https://github.com/PaddlePaddle/community/blob/master/pfcc/paddle-code-reading/ZeroDim/zero_dim_concept.md
- 0-D Tensor usage guide: https://github.com/PaddlePaddle/community/pull/860/files?short_path=97dfe85#diff-97dfe85948dba8115a1ff933b2ebc047005fcc8b2dc22db542039663f5c800f8
- Paddle linear algebra fundamentals and domain applications
- https://github.com/PaddlePaddle/community/pull/898/files?short_path=cf9bb59#diff-cf9bb59aeef3608b9a1a175149b7d583ba49fe4fd99febdc7577a9435bcfa351
- https://github.com/PaddlePaddle/community/pull/867/files#diff-005b97daccba370e0a0fcf462c6a63028d009a7aa91ade244e282d96a169d292
- PaddlePaddle/community#861
- PaddlePaddle/community#868
Compilation order (excerpt from a CMake build log; the link targets give a horizontal decomposition of the codebase):
[ 12%] Linking CXX static library libstring_array.a
[ 12%] Linking CXX static library libtensor_base.a
[ 12%] Linking CXX static library libworkqueue_utils.a
[ 12%] Linking CXX static library libop_version_registry.a
[ 12%] Linking CXX static library libdevice_mesh.a
[ 12%] Linking CXX static library libtensor_meta.a
[ 12%] Linking CXX static library libphi_os_info.a
[ 12%] Linking CXX static library libprune.a
[ 12%] Linking CXX static library libattribute.a
[ 12%] Linking CXX static library libarg_map_context.a
[ 12%] Linking CXX static library libthreadpool.a
[ 12%] Linking CXX static library libplace.a
[ 12%] Linking CXX static library libddim.a
[ 12%] Linking CXX static library libmlu_tracer.a
[ 12%] Linking CXX static library libmemory_utils.a
[ 12%] Linking CXX static library libprofiler_utils.a
[ 12%] Linking CXX static library libextended_tensor.a
[ 13%] Linking CXX static library liblod_utils.a
[ 13%] Linking CXX static library libget_kerneltype_forvar_utils.a
[ 13%] Linking CXX static library libcustom_tracer.a
[ 14%] Linking CXX static library libcache.a
[ 15%] Linking CXX static library libcustom_device_resource_pool.a
[ 15%] Linking CXX static library libphi_profiler.a
[ 15%] Linking CXX static library libprocess_mesh.a
[ 15%] Linking CXX static library liboperants_manager.a
[ 15%] Linking CXX static library libtcp_store.a
[ 15%] Linking CXX static library libtask_loop_thread_pool.a
[ 15%] Linking CXX static library libshell.a
[ 16%] Linking CXX static library libworkqueue.a
[ 16%] Linking CXX static library libdata_loader.a
[ 16%] Linking CXX static library libpaddle_crypto.a
[ 16%] Linking CXX static library libbenchmark.a
[ 16%] Linking CXX static library libphi_dynamic_loader.a
[ 17%] Linking CXX static library libdist_mapper.a
[ 17%] Linking CXX static library libstaticgraph_executor_statistics.a
[ 17%] Linking CXX static library libop_utils.a
[ 17%] Linking CXX static library libsparse_coo_tensor.a
[ 17%] Linking CXX static library libphi_dynload_warprnnt.a
[ 17%] Linking CXX static library libphi_dynload_warpctc.a
[ 17%] Linking CXX static library libphi_dynload_lapack.a
[ 17%] Linking CXX static library libgenerator.a
[ 17%] Linking CXX static library libcuda_tracer.a
[ 17%] Linking CXX static library libtensor_api.a
[ 17%] Linking CXX static library libstats.a
[ 17%] Linking CXX static library libdynamic_loader.a
[ 17%] Linking CXX static library libcomm_context_manager.a
[ 17%] Linking CXX static library libphi_dynload_mklml.a
[ 17%] Linking CXX static library libswitch_autotune.a
[ 17%] Linking CXX static library libfs.a
[ 17%] Linking CXX static library libevent_node.a
[ 17%] Linking CXX static library libdynload_warpctc.a
[ 17%] Linking CXX static library libdynload_mklml.a
[ 17%] Linking CXX static library libcblas.a
[ 17%] Linking CXX static library libcpu_helper.a
[ 19%] Linking CXX static library libprofiler_logger.a
[ 19%] Linking CXX static library libphi_backends.a
[ 19%] Linking CXX static library libevent_bind.a
[ 19%] Linking CXX static library libconvert_utils.a
[ 20%] Linking CXX static library libphi_device_context.a
[ 20%] Linking CXX static library libselected_rows.a
[ 20%] Linking CXX static library libdense_tensor.a
[ 20%] Linking CXX static library libcpu_utilization.a
[ 20%] Linking CXX static library libops_extra_info.a
[ 20%] Linking CXX static library libtensor_array.a
[ 20%] Linking CXX static library libsparse_csr_tensor.a
[ 20%] Linking CXX static library libstring_tensor.a
[ 20%] Linking CXX static library libmeta_tensor.a
[ 20%] Linking CXX static library libblas.a
[ 21%] Linking CXX static library libphi_c_device_context.a
[ 21%] Linking CXX static library libkernel_context.a
[ 21%] Linking CXX static library libkernel_factory.a
[ 21%] Linking CXX static library libphi_c_data_type.a
[ 21%] Linking CXX static library libphi_c_tensor.a
[ 21%] Linking CXX static library libcomm_static_check.a
[ 21%] Linking CXX static library libphi_c_kernel_context.a
[ 21%] Linking CXX static library libphi_c_kernel_registry.a
[ 22%] Linking CXX static library libprocess_group.a
[ 22%] Linking CXX static library libsparse_backward_infermeta.a
[ 22%] Linking CXX static library libinfermeta_utils.a
[ 22%] Linking CXX static library libcustom_kernel.a
[ 22%] Linking CXX static library libphi_c_kernel_factory.a
[ 22%] Linking CXX static library libbackward_infermeta.a
[ 22%] Linking CXX static library libscope.a
[ 22%] Linking CXX static library libop_proto_maker.a
[ 22%] Linking CXX static library libsparse_infermeta.a
[ 22%] Linking CXX static library libstring_infermeta.a
[ 22%] Linking CXX static library libop_call_stack.a
[ 23%] Linking CXX static library libeigen_function.a
[ 23%] Linking CXX static library libvar_type_traits.a
[ 23%] Linking CXX static library libinfermeta.a
[ 23%] Linking CXX static library libscope_pool.a
[ 23%] Linking CXX static library libnan_inf_utils.a
[ 23%] Linking CXX static library libhost_tracer.a
[ 23%] Linking CXX static library libop_compat_infos.a
[ 23%] Linking CXX static library libnew_profiler.a
[ 23%] Linking CXX static library libselected_rows_utils.a
[ 23%] Linking CXX static library libshape_inference.a
[ 23%] Linking CXX static library libprofiler.a
[ 23%] Linking CXX static library libdevice_context.a
[ 24%] Linking CXX static library liballocator.a
[ 24%] Linking CXX static library libmalloc.a
[ 24%] Linking CXX static library libmaxouting.a
[ 24%] Linking CXX static library libgpc.a
[ 24%] Linking CXX static library libjit_kernel_base.a
[ 24%] Linking CXX static library libconcat_and_split_functor.a
[ 24%] Linking CXX static library libim2col.a
[ 24%] Linking CXX static library libmemcpy.a
[ 24%] Linking CXX static library liblstm_compute.a
[ 24%] Linking CXX static library liblapack_function.a
[ 24%] Linking CXX static library libno_need_buffer_vars_inference.a
[ 24%] Linking CXX static library libsequence_scale.a
[ 24%] Linking CXX static library libsequence_padding.a
[ 24%] Linking CXX static library libvol2col.a
[ 24%] Linking CXX static library libpooling.a
[ 24%] Linking CXX static library libcross_entropy.a
[ 24%] Linking CXX static library libdeformable_conv_functor.a
[ 24%] Linking CXX static library libop_kernel_type.a
[ 24%] Linking CXX static library libmatrix_bit_code.a
[ 24%] Linking CXX static library libinit.a
[ 24%] Linking CXX static library libmatrix_reduce.a
[ 24%] Linking CXX static library libheter_wrapper.a
[ 24%] Linking CXX static library libop_info.a
[ 24%] Linking CXX static library libmatrix_inverse.a
[ 24%] Linking CXX static library libtransfer_scope_cache.a
[ 24%] Linking CXX static library libcollective_helper.a
[ 24%] Linking CXX static library libcos_sim_functor.a
[ 24%] Linking CXX static library libconcat_and_split.a
[ 24%] Linking CXX static library libsequence2batch.a
[ 24%] Linking CXX static library libmemory.a
[ 24%] Linking CXX static library libsampler.a
[ 24%] Linking CXX static library libsample_prob.a
[ 24%] Linking CXX static library libsegment_pooling.a
[ 24%] Linking CXX static library libscope_buffered_monitor.a
[ 24%] Linking CXX static library libjit_kernel_intrinsic.a
[ 25%] Linking CXX static library libcuda_graph_with_memory_pool.a
[ 25%] Linking CXX static library libunpooling.a
[ 25%] Linking CXX static library libdevice_code.a
[ 25%] Linking CXX static library libjit_kernel_refer.a
[ 26%] Linking CXX static library libjit_kernel_mix.a
[ 26%] Linking CXX static library libjit_kernel_mkl.a
[ 26%] Linking CXX static library libunused_var_check.a
[ 26%] Linking CXX static library libfft.a
[ 26%] Linking CXX static library libphi_tensor_utils.a
[ 26%] Linking CXX static library libcontext_pool.a
[ 26%] Linking CXX static library libmask_util.a
[ 26%] Linking CXX static library libinit_phi.a
[ 26%] Linking CXX static library libjit_kernel_jitcode.a
[ 26%] Linking CXX static library libjit_kernel_helper.a
[ 26%] Linking CXX static library libgarbage_collector.a
[ 26%] Linking CXX static library libmixed_vector.a
[ 26%] Linking CXX static library libint_array.a
[ 26%] Linking CXX static library libscalar.a
[ 26%] Linking CXX static library libphi_c_int_array.a
[ 26%] Linking CXX static library libphi_c_scalar.a
[ 26%] Linking CXX static library libfc_functor.a
[ 26%] Linking CXX static library libtensor.a
[ 26%] Linking CXX static library libphi_capi.a
[ 26%] Linking CXX static library libprocess_group_custom.a
[ 26%] Linking CXX static library liblayout_autotune.a
[ 26%] Linking CXX static library libphi_tensor_raw.a
[ 26%] Linking CXX static library libdata_device_transform.a
[ 26%] Linking CXX static library libgather_scatter_functor.a
[ 26%] Linking CXX static library libphi_data_layout_transform.a
[ 27%] Linking CXX static library libdlpack_tensor.a
[ 28%] Linking CXX static library liblod_tensor.a
[ 28%] Linking CXX static library libapi_tensor_utils.a
[ 28%] Linking CXX static library libjit_property.a
[ 28%] Linking CXX static library libapi_gen_utils.a
[ 28%] Linking CXX static library libkernel_dispatch.a
[ 28%] Linking CXX static library libdata_type_transform.a
[ 28%] Linking CXX static library libvar_helper.a
[ 28%] Linking CXX static library libmkldnn_quantizer_config.a
[ 28%] Linking CXX static library libop_meta_info.a
[ 28%] Linking CXX static library libreader.a
[ 28%] Linking CXX static library libprocessgroup_comm_utils.a
[ 28%] Linking CXX static library libreset_tensor_array.a
[ 28%] Linking CXX static library libbox_wrapper.a
[ 28%] Linking CXX static library libpybind_util.a
[ 28%] Linking CXX static library liblod_rank_table.a
[ 28%] Linking CXX static library libvariable_visitor.a
[ 28%] Linking CXX static library libmath_function.a
[ 28%] Linking CXX static library libzero_copy_tensor.a
[ 28%] Linking CXX static library libdevice_worker.a
[ 28%] Linking CXX static library libfeed_fetch_method.a
[ 28%] Linking CXX static library libpy_reader.a
[ 28%] Linking CXX static library libvariable_helper.a
[ 28%] Linking CXX static library libanalysis_config.a
[ 28%] Linking CXX static library libjit_serializer.a
[ 28%] Linking CXX static library libbuffered_reader.a
[ 28%] Linking CXX static library libgru_compute.a
[ 28%] Linking CXX static library libdata_layout_transform.a
[ 28%] Linking CXX static library libcontext_project.a
[ 28%] Linking CXX static library libtree2col.a
[ 28%] Linking CXX static library libselected_rows_functor.a
[ 28%] Linking CXX static library libpaddle_infer_contrib.a
[ 28%] Linking CXX static library libsoftmax.a
[ 28%] Linking CXX static library libbeam_search.a
[ 28%] Linking CXX static library liblodtensor_printer.a
[ 28%] Linking CXX static library libmatrix_solve.a
[ 29%] Linking CXX static library libsequence_pooling.a
[ 29%] Linking CXX static library libgloo_wrapper.a
[ 29%] Linking CXX static library libfleet_wrapper.a
[ 29%] Linking CXX static library libdata_transform.a
[ 29%] Linking CXX static library libps_gpu_wrapper.a
[ 29%] Linking CXX static library libmetrics.a
[ 46%] Linking CXX static library libphi_cpu.a
[ 46%] Linking CXX static library libapi_int_array.a
[ 46%] Linking CXX static library libapi_scalar.a
[ 47%] Linking CXX static library libtensor_copy.a
[ 47%] Linking CXX static library libphi.a
[ 49%] Linking CXX static library libstrings_api.a
[ 49%] Linking CXX static library libphi_data_transform.a
[ 49%] Linking CXX static library libwrapped_infermeta.a
[ 49%] Linking CXX static library libphi_utils.a
[ 49%] Linking CXX static library libapi_custom_impl.a
[ 49%] Linking CXX static library libsparse_api.a
[ 49%] Linking CXX static library libsparse_bw_api.a
[ 49%] Linking CXX static library libinfershape_utils.a
[ 49%] Linking CXX static library libproto_desc.a
[ 49%] Linking CXX static library liboperator.a
[ 49%] Linking CXX static library libdist_attr.a
[ 49%] Linking CXX static library libnode.a
[ 49%] Linking CXX static library libprogram_utils.a
[ 49%] Linking CXX static library libop_compatible_info.a
[ 49%] Linking CXX static library libargument.a
[ 49%] Linking CXX static library libanalysis_pass.a
[ 49%] Linking CXX static library libstatic_global_utils.a
[ 49%] Linking CXX static library libop_variant.a
[ 49%] Linking CXX static library libauto_parallel.a
[ 49%] Linking CXX static library libget_expected_kernel_func.a
[ 49%] Linking CXX static library libcommon_infer_shape_functions.a
[ 49%] Linking CXX static library libmodel_utils.a
[ 49%] Linking CXX static library libop_registry.a
[ 49%] Linking CXX static library libvar_handle.a
[ 49%] Linking CXX static library libjit_function_utils.a
[ 49%] Linking CXX static library libjit_serializer_utils.a
[ 49%] Linking CXX static library libphi_function_api.a
[ 49%] Linking CXX static library libprim_utils.a
[ 49%] Linking CXX static library libop_handle_base.a
[ 49%] Linking CXX static library libwhile_op_helper.a
[ 49%] Linking CXX static library libstatic_utils.a
[ 49%] Linking CXX static library libgraph.a
[ 49%] Linking CXX static library libmemory_optim_pass.a
[ 49%] Linking CXX static library libprepared_operator.a
[ 49%] Linking CXX static library libdevice_event_base.a
[ 49%] Linking CXX static library libcustom_device_common_op_registry.a
[ 49%] Linking CXX static library libphi_tensor_operants.a
[ 49%] Linking CXX static library libshare_tensor_buffer_functor.a
[ 50%] Linking CXX static library libcomputation_op_handle.a
[ 50%] Linking CXX static library librpc_op_handle.a
[ 50%] Linking CXX static library libfetch_barrier_op_handle.a
[ 50%] Linking CXX static library libphi_tensor.a
[ 50%] Linking CXX static library libbroadcast_op_handle.a
[ 50%] Linking CXX static library libfetch_async_op_handle.a
[ 50%] Linking CXX static library libpy_func_op.a
[ 50%] Linking CXX static library libreduce_op_handle.a
[ 51%] Linking CXX static library libjit_function_schema.a
[ 51%] Linking CXX static library libgraph_traits.a
[ 51%] Linking CXX static library libphi_dygraph_api.a
[ 51%] Linking CXX static library libscale_loss_grad_op_handle.a
[ 51%] Linking CXX static library libgather_op_handle.a
[ 52%] Linking CXX static library libfused_all_reduce_op_handle.a
[ 52%] Linking CXX static library liball_reduce_op_handle.a
[ 52%] Linking CXX static library libreader_op_registry.a
[ 52%] Linking CXX static library libdevice_event_custom_device.a
[ 52%] Linking CXX static library libmanual_static_prim_api.a
[ 52%] Linking CXX static library libop_graph_view.a
[ 52%] Linking CXX static library libeager_nan_inf_utils.a
[ 52%] Linking CXX static library libfetch_op_handle.a
[ 52%] Linking CXX static library libfused_broadcast_op_handle.a
[ 52%] Linking CXX static library libgenerated_static_prim_api.a
[ 52%] Linking CXX static library libstatic_prim_api.a
[ 52%] Linking CXX static library libshare_tensor_buffer_op_handle.a
[ 52%] Linking CXX static library libgrad_merge_all_reduce_op_handle.a
[ 52%] Linking CXX static library libreference_count_pass_helper.a
[ 52%] Linking CXX static library liblayer.a
[ 52%] Linking CXX static library libstatic_tensor_operants.a
[ 52%] Linking CXX static library libop_desc_meta.a
[ 52%] Linking CXX static library libeager_deletion_op_handle.a
[ 52%] Linking CXX static library libgraph_helper.a
[ 52%] Linking CXX static library libamp.a
[ 52%] Linking CXX static library libgradient_accumulator.a
[ 52%] Linking CXX static library libload_combine_op.a
[ 52%] Linking CXX static library libwarpctc_op.a
[ 53%] Linking CXX static library libsave_combine_op.a
[ 53%] Linking CXX static library libquantize_linear_op.a
[ 53%] Linking CXX static library libbox_coder_op.a
[ 53%] Linking CXX static library libiou_similarity_op.a
[ 53%] Linking CXX static library libdensity_prior_box_op.a
[ 53%] Linking CXX static library libpolygon_box_transform_op.a
[ 53%] Linking CXX static library libbipartite_match_op.a
[ 53%] Linking CXX static library libyolo_box_op.a
[ 53%] Linking CXX static library libprior_box_op.a
[ 53%] Linking CXX static library libtarget_assign_op.a
[ 53%] Linking CXX static library libanchor_generator_op.a
[ 53%] Linking CXX static library libmine_hard_examples_op.a
[ 53%] Linking CXX static library librpn_target_assign_op.a
[ 53%] Linking CXX static library libbox_clip_op.a
[ 53%] Linking CXX static library libgenerate_proposal_labels_op.a
[ 53%] Linking CXX static library libmatrix_nms_op.a
[ 53%] Linking CXX static library liblocality_aware_nms_op.a
[ 53%] Linking CXX static library libmulticlass_nms_op.a
[ 54%] Linking CXX static library libbox_decoder_and_assign_op.a
[ 54%] Linking CXX static library libyolov3_loss_op.a
[ 55%] Linking CXX static library libsigmoid_focal_loss_op.a
[ 55%] Linking CXX static library libretinanet_detection_output_op.a
[ 55%] Linking CXX static library libgenerate_proposals_v2_op.a
[ 55%] Linking CXX static library libgenerate_proposals_op.a
[ 55%] Linking CXX static library libdistribute_fpn_proposals_op.a
[ 55%] Linking CXX static library libgenerate_mask_labels_op.a
[ 55%] Linking CXX static library libroi_perspective_transform_op.a
[ 55%] Linking CXX static library libcollect_fpn_proposals_op.a
[ 55%] Linking CXX static library libelementwise_floordiv_op.a
[ 55%] Linking CXX static library libelementwise_add_op.a
[ 55%] Linking CXX static library libelementwise_div_op.a
[ 55%] Linking CXX static library libelementwise_heaviside_op.a
[ 56%] Linking CXX static library libelementwise_mod_op.a
[ 56%] Linking CXX static library libelementwise_min_op.a
[ 56%] Linking CXX static library libelementwise_max_op.a
[ 56%] Linking CXX static library libfused_adam_op.a
[ 56%] Linking CXX static library libfused_conv2d_op.a
[ 56%] Linking CXX static library libelementwise_mul_op.a
[ 56%] Linking CXX static library libelementwise_pow_op.a
[ 56%] Linking CXX static library libelementwise_sub_op.a
[ 56%] Linking CXX static library libfused_elementwise_op.a
[ 56%] Linking CXX static library libfused_matmul_op.a
[ 56%] Linking CXX static library libfused_embedding_fc_lstm_op.a
[ 56%] Linking CXX static library libfused_softplus_op.a
[ 56%] Linking CXX static library libfused_embedding_seq_pool_op.a
[ 56%] Linking CXX static library libfusion_seqconv_eltadd_relu_op.a
[ 56%] Linking CXX static library libfused_transpose_op.a
[ 56%] Linking CXX static library libfusion_repeated_fc_relu_op.a
[ 56%] Linking CXX static library libfused_seqpool_cvm_op.a
[ 57%] Linking CXX static library libfusion_seqexpand_concat_fc_op.a
[ 57%] Linking CXX static library libfusion_seqpool_concat_op.a
[ 57%] Linking CXX static library libfusion_seqpool_cvm_concat_op.a
[ 57%] Linking CXX static library libfusion_squared_mat_sub_op.a
[ 57%] Linking CXX static library libaccuracy_op.a
[ 57%] Linking CXX static library libmulti_gru_op.a
[ 57%] Linking CXX static library libauc_op.a
[ 57%] Linking CXX static library libadadelta_op.a
[ 57%] Linking CXX static library libfusion_gru_op.a
[ 57%] Linking CXX static library libfusion_lstm_op.a
[ 57%] Linking CXX static library libfused_elemwise_activation_op.a
[ 57%] Linking CXX static library libprecision_recall_op.a
[ 57%] Linking CXX static library libadagrad_op.a
[ 57%] Linking CXX static library libadam_op.a
[ 57%] Linking CXX static library libadamax_op.a
[ 57%] Linking CXX static library libadamw_op.a
[ 57%] Linking CXX static library libdecayed_adagrad_op.a
[ 57%] Linking CXX static library libdistributed_fused_lamb_init_op.a
[ 57%] Linking CXX static library libdgc_momentum_op.a
[ 57%] Linking CXX static library libdpsgd_op.a
[ 57%] Linking CXX static library libdistributed_fused_lamb_op.a
[ 57%] Linking CXX static library libftrl_op.a
[ 57%] Linking CXX static library libmerged_adam_op.a
[ 57%] Linking CXX static library liblars_momentum_op.a
[ 57%] Linking CXX static library libmerged_momentum_op.a
[ 57%] Linking CXX static library libpow2_decay_with_linear_warmup_op.a
[ 57%] Linking CXX static library libmomentum_op.a
[ 57%] Linking CXX static library libproximal_gd_op.a
[ 57%] Linking CXX static library libproximal_adagrad_op.a
[ 57%] Linking CXX static library librmsprop_op.a
[ 57%] Linking CXX static library liblogsumexp_op.a
[ 57%] Linking CXX static library libreduce_amax_op.a
[ 57%] Linking CXX static library libreduce_any_op.a
[ 57%] Linking CXX static library libsparse_momentum_op.a
[ 57%] Linking CXX static library libreduce_amin_op.a
[ 57%] Linking CXX static library libreduce_max_op.a
[ 57%] Linking CXX static library libreduce_min_op.a
[ 57%] Linking CXX static library libreduce_mean_op.a
[ 57%] Linking CXX static library libreduce_prod_op.a
[ 57%] Linking CXX static library libreduce_sum_op.a
[ 57%] Linking CXX static library libsequence_concat_op.a
[ 57%] Linking CXX static library libsequence_erase_op.a
[ 57%] Linking CXX static library libsequence_enumerate_op.a
[ 57%] Linking CXX static library libsequence_expand_as_op.a
[ 57%] Linking CXX static library libsequence_mask_op.a
[ 57%] Linking CXX static library libsequence_conv_op.a
[ 57%] Linking CXX static library libsequence_pad_op.a
[ 57%] Linking CXX static library libsequence_reshape_op.a
[ 57%] Linking CXX static library libsequence_reverse_op.a
[ 58%] Linking CXX static library libsequence_expand_op.a
[ 58%] Linking CXX static library libsequence_pool_op.a
[ 59%] Linking CXX static library libsequence_scatter_op.a
[ 59%] Linking CXX static library libsequence_slice_op.a
[ 59%] Linking CXX static library libsequence_unpad_op.a
[ 59%] Linking CXX static library libsequence_softmax_op.a
[ 59%] Linking CXX static library libadd_p_op.a
[ 59%] Linking CXX static library libsequence_topk_avg_pooling_op.a
[ 59%] Linking CXX static library libabs_p_op.a
[ 59%] Linking CXX static library libbroadcast_p_op.a
[ 59%] Linking CXX static library libbernoulli_p_op.a
[ 59%] Linking CXX static library libfaster_tokenizer_op.a
[ 59%] Linking CXX static library libcast_p_op.a
[ 60%] Linking CXX static library libconcat_p_op.a
[ 60%] Linking CXX static library libcos_p_op.a
[ 60%] Linking CXX static library libdiv_p_op.a
[ 60%] Linking CXX static library libeq_p_op.a
[ 60%] Linking CXX static library liberf_p_op.a
[ 60%] Linking CXX static library libexp_p_op.a
[ 60%] Linking CXX static library libge_p_op.a
[ 60%] Linking CXX static library libfill_constant_p_op.a
[ 60%] Linking CXX static library libgt_p_op.a
[ 60%] Linking CXX static library libgather_p_op.a
[ 60%] Linking CXX static library libmatmul_p_op.a
[ 60%] Linking CXX static library libmax_p_op.a
[ 60%] Linking CXX static library liblog_p_op.a
[ 60%] Linking CXX static library libmul_p_op.a
[ 61%] Linking CXX static library libne_p_op.a
[ 61%] Linking CXX static library libpow_p_op.a
[ 61%] Linking CXX static library librsqrt_p_op.a
[ 61%] Linking CXX static library libreshape_p_op.a
[ 61%] Linking CXX static library libreduce_sum_p_op.a
[ 61%] Linking CXX static library libsin_p_op.a
[ 61%] Linking CXX static library libscatter_add_p_op.a
[ 61%] Linking CXX static library libselect_p_op.a
[ 61%] Linking CXX static library libslice_assign_p_op.a
[ 61%] Linking CXX static library libslice_select_p_op.a
[ 61%] Linking CXX static library libsqrt_p_op.a
[ 61%] Linking CXX static library libsplit_p_op.a
[ 61%] Linking CXX static library libtanh_p_op.a
[ 61%] Linking CXX static library libtranspose_p_op.a
[ 61%] Linking CXX static library libsub_p_op.a
[ 61%] Linking CXX static library liballoc_float_status_op.a
[ 61%] Linking CXX static library libuniform_random_p_op.a
[ 61%] Linking CXX static library libclear_float_status_op.a
[ 61%] Linking CXX static library libcheck_finite_and_unscale_op.a
[ 61%] Linking CXX static library libget_float_status_op.a
[ 61%] Linking CXX static library libcreate_double_buffer_reader_op.a
[ 61%] Linking CXX static library libcreate_py_reader_op.a
[ 61%] Linking CXX static library libread_op.a
[ 61%] Linking CXX static library libpass.a
[ 61%] Linking CXX static library libfuse_all_reduce_op_pass.a
[ 61%] Linking CXX static library libbasic_engine.a
[ 61%] Linking CXX static library libmulti_devices_helper.a
[ 61%] Linking CXX static library libcoalesce_grad_tensor_pass.a
[ 61%] Linking CXX static library libfuse_optimizer_op_pass.a
[ 61%] Linking CXX static library libplacement_pass_base.a
[ 61%] Linking CXX static library libprogram_desc_tracer.a
[ 61%] Linking CXX static library libpass_builder.a
[ 61%] Linking CXX static library libadd_reader_dependency_pass.a
[ 61%] Linking CXX static library libwhile_op_eager_deletion_pass.a
[ 61%] Linking CXX static library libreference_count_pass.a
[ 61%] Linking CXX static library libgraph_pattern_detector.a
[ 61%] Linking CXX static library libengine.a
[ 61%] Linking CXX static library libsequential_execution_pass.a
[ 61%] Linking CXX static library libmulti_devices_graph_check_pass.a
[ 61%] Linking CXX static library libfix_op_run_order_pass.a
[ 61%] Linking CXX static library liball_reduce_deps_pass.a
[ 61%] Linking CXX static library libbackward_optimizer_op_deps_pass.a
[ 63%] Linking CXX static library libop_compat_sensible_pass.a
[ 63%] Linking CXX static library libmodify_op_lock_and_record_event_pass.a
[ 63%] Linking CXX static library libmulti_devices_graph_print_pass.a
[ 63%] Linking CXX static library libpass_test_util.a
[ 63%] Linking CXX static library libfuse_adamw_op_pass.a
[ 63%] Linking CXX static library libmulti_devices_graph_pass.a
[ 63%] Linking CXX static library libfuse_sgd_op_pass.a
[ 63%] Linking CXX static library libfuse_adam_op_pass.a
[ 63%] Linking CXX static library libfuse_momentum_op_pass.a
[ 63%] Linking CXX static library libfuse_bn_add_act_pass.a
[ 63%] Linking CXX static library libmemory_reuse_pass.a
[ 63%] Linking CXX static library libfuse_relu_depthwise_conv_pass.a
[ 63%] Linking CXX static library libfuse_bn_act_pass.a
[ 63%] Linking CXX static library libfuse_elewise_add_act_pass.a
[ 64%] Linking CXX static library libset_reader_device_info_utils.a
[ 64%] Linking CXX static library libfuse_gemm_epilogue_pass.a
[ 64%] Linking CXX static library libfuse_pass_base.a
[ 64%] Linking CXX static library libtracer.a
[ 64%] Linking CXX static library libgraph_to_program_pass.a
[ 64%] Linking CXX static library libfused_feedforward_pass.a
[ 64%] Linking CXX static library libinplace_addto_op_pass.a
[ 64%] Linking CXX static library libfused_attention_pass.a
[ 64%] Linking CXX static library libbuffer_shared_cross_op_memory_reuse_pass.a
[ 64%] Linking CXX static library libskip_layernorm_fuse_pass.a
[ 64%] Linking CXX static library libsqueeze2_transpose2_onednn_fuse_pass.a
[ 64%] Linking CXX static library liboperator_scale_onednn_fuse_pass.a
[ 64%] Linking CXX static library libmulti_gru_seq_fuse_pass.a
[ 64%] Linking CXX static library libtrt_delete_weight_dequant_linear_op_pass.a
[ 64%] Linking CXX static library libunsqueeze2_eltwise_fuse_pass.a
[ 64%] Linking CXX static library libconv_elementwise_add_act_fuse_pass.a
[ 64%] Linking CXX static library libdelete_quant_dequant_filter_op_pass.a
[ 65%] Linking CXX static library libsimplify_with_basic_ops_pass.a
[ 65%] Linking CXX static library liboperator_unsqueeze2_onednn_fuse_pass.a
[ 65%] Linking CXX static library libdelete_c_identity_op_pass.a
[ 65%] Linking CXX static library libadaptive_pool2d_convert_global_pass.a
[ 65%] Linking CXX static library libembedding_fc_lstm_fuse_pass.a
[ 65%] Linking CXX static library libmap_op_to_another_pass.a
[ 65%] Linking CXX static library libruntime_context_cache_pass.a
[ 65%] Linking CXX static library libsync_batch_norm_pass.a
[ 66%] Linking CXX static library libgraph_viz_pass.a
[ 66%] Linking CXX static library libmulti_batch_merge_pass.a
[ 66%] Linking CXX static library liblock_free_optimize_pass.a
[ 66%] Linking CXX static library libmkldnn_placement_pass.a
[ 66%] Linking CXX static library libdelete_dropout_op_pass.a
[ 66%] Linking CXX static library libdense_fc_to_sparse_pass.a
[ 66%] Linking CXX static library libadd_support_int8_pass.a
[ 66%] Linking CXX static library libfc_lstm_fuse_pass.a
[ 66%] Linking CXX static library libshuffle_channel_detect_pass.a
[ 66%] Linking CXX static library libgpu_cpu_map_matmul_to_mul_pass.a
[ 66%] Linking CXX static library libdepthwise_conv_mkldnn_pass.a
[ 66%] Linking CXX static library libcpu_quantize_squash_pass.a
[ 66%] Linking CXX static library libseqconv_eltadd_relu_fuse_pass.a
[ 66%] Linking CXX static library libconv_affine_channel_mkldnn_fuse_pass.a
[ 66%] Linking CXX static library libcpu_quantize_pass.a
[ 67%] Linking CXX static library libinplace_op_var_pass.a
[ 67%] Linking CXX static library libconv_elementwise_add2_act_fuse_pass.a
[ 67%] Linking CXX static library libmatmul_scale_fuse_pass.a
[ 67%] Linking CXX static library libbatch_norm_act_fuse_pass.a
[ 67%] Linking CXX static library libyolo_box_fuse_pass.a
[ 67%] Linking CXX static library libdense_multihead_matmul_to_sparse_pass.a
[ 67%] Linking CXX static library libdelete_fill_constant_op_pass.a
[ 67%] Linking CXX static library libfc_elementwise_layernorm_fuse_pass.a
[ 68%] Linking CXX static library libauto_mixed_precision_pass.a
[ 68%] Linking CXX static library libseqpool_concat_fuse_pass.a
[ 68%] Linking CXX static library libconv_bias_mkldnn_fuse_pass.a
[ 68%] Linking CXX static library libparams_quantization_mkldnn_pass.a
[ 68%] Linking CXX static library libmatmul_activation_mkldnn_fuse_pass.a
[ 68%] Linking CXX static library libfused_multi_transformer_encoder_pass.a
[ 68%] Linking CXX static library libglobal_utils.a
[ 68%] Linking CXX static library libattention_lstm_fuse_pass.a
[ 68%] Linking CXX static library libconv_bn_fuse_pass.a
[ 68%] Linking CXX static library libquant_dequant_mkldnn_pass.a
[ 68%] Linking CXX static library libquant_transpose2_dequant_onednn_fuse_pass.a
[ 68%] Linking CXX static library liblayer_norm_onednn_optimization_pass.a
[ 68%] Linking CXX static library libfc_fuse_pass.a
[ 68%] Linking CXX static library libshuffle_channel_mkldnn_detect_pass.a
[ 68%] Linking CXX static library libtranspose_flatten_concat_fuse_pass.a
[ 69%] Linking CXX static library libquant_conv2d_dequant_fuse_pass.a
[ 69%] Linking CXX static library libscale_matmul_fuse_pass.a
[ 69%] Linking CXX static library libsoftplus_activation_onednn_fuse_pass.a
[ 70%] Linking CXX static library libconv_elementwise_add_fuse_pass.a
[ 70%] Linking CXX static library libint8_scale_calculation_mkldnn_pass.a
[ 70%] Linking CXX static library libconstant_folding_pass.a
[ 70%] Linking CXX static library libseqpool_cvm_concat_fuse_pass.a
[ 71%] Linking CXX static library libfc_mkldnn_pass.a
[ 71%] Linking CXX static library libvit_attention_fuse_pass.a
[ 71%] Linking CXX static library libreshape_transpose_matmul_mkldnn_fuse_pass.a
[ 71%] Linking CXX static library libinterpolate_mkldnn_pass.a
[ 71%] Linking CXX static library librepeated_fc_relu_fuse_pass.a
[ 71%] Linking CXX static library libsilu_fuse_pass.a
[ 71%] Linking CXX static library libpreln_residual_bias_fuse_pass.a
[ 71%] Linking CXX static library libnaive_executor.a
[ 71%] Linking CXX static library libcpu_bfloat16_pass.a
[ 71%] Linking CXX static library liblayer_norm_fuse_pass.a
[ 71%] Linking CXX static library libdelete_concat_op_pass.a
[ 71%] Linking CXX static library libmatmul_elementwise_add_mkldnn_fuse_pass.a
[ 71%] Linking CXX static library libfused_multi_transformer_decoder_pass.a
[ 71%] Linking CXX static library libidentity_scale_op_clean_pass.a
[ 71%] Linking CXX static library libdelete_quant_dequant_op_pass.a
[ 71%] Linking CXX static library libdelete_quant_dequant_linear_op_pass.a
[ 71%] Linking CXX static library libseq_concat_fc_fuse_pass.a
[ 71%] Linking CXX static library libdelete_op_device_pass.a
[ 71%] Linking CXX static library libconv_activation_mkldnn_fuse_pass.a
[ 71%] Linking CXX static library libfuse_multi_transformer_layer_pass.a
[ 71%] Linking CXX static library libcpu_quantize_placement_pass.a
[ 71%] Linking CXX static library libis_test_pass.a
[ 71%] Linking CXX static library libdelete_weight_dequant_linear_op_pass.a
[ 71%] Linking CXX static library liboperator_reshape2_onednn_fuse_pass.a
[ 71%] Linking CXX static library libfc_elementwise_add_mkldnn_fuse_pass.a
[ 71%] Linking CXX static library libsquared_mat_sub_fuse_pass.a
[ 71%] Linking CXX static library libconv2d_fusion_layout_transfer_pass.a
[ 71%] Linking CXX static library libelt_act_mkldnn_fuse_pass.a
[ 71%] Linking CXX static library libconv_elementwise_add_mkldnn_fuse_pass.a
[ 71%] Linking CXX static library libmulti_gru_fuse_pass.a
[ 71%] Linking CXX static library libmatmul_transpose_reshape_mkldnn_fuse_pass.a
[ 71%] Linking CXX static library libfc_gru_fuse_pass.a
[ 71%] Linking CXX static library libcpu_bfloat16_placement_pass.a
[ 71%] Linking CXX static library libir_graph_to_program_pass.a
[ 71%] Linking CXX static library libadjust_cudnn_workspace_size_pass.a
[ 71%] Linking CXX static library libfc_act_mkldnn_fuse_pass.a
[ 72%] Linking CXX static library libinference_op_replace_pass.a
[ 72%] Linking CXX static library libmultihead_matmul_roformer_fuse_pass.a
[ 72%] Linking CXX static library libcompute_propagate_scales_mkldnn_pass.a
[ 72%] Linking CXX static library libmultihead_matmul_fuse_pass.a
[ 72%] Linking CXX static library libgenerate_pass.a
[ 72%] Linking CXX static library libbuild_strategy.a
[ 72%] Linking CXX static library libinterpreter.a
[ 72%] Linking CXX static library librecurrent_op_helper.a
[ 72%] Linking CXX static library libconditional_block_op_helper.a
[ 72%] Linking CXX static library libphi_bw_function_api.a
[ 72%] Linking CXX static library libphi_api.a
[ 72%] Linking CXX static library libconditional_block_op.a
[ 72%] Linking CXX static library libautograd_meta.a
[ 72%] Linking CXX static library libgrad_node_info.a
[ 72%] Linking CXX static library libscale_node.a
[ 72%] Linking CXX static library libgrad_tensor_holder.a
[ 72%] Linking CXX static library librecurrent_op.a
[ 72%] Linking CXX static library libcustom_operator.a
[ 72%] Linking CXX static library libaccumulation_node.a
[ 72%] Linking CXX static library libpy_layer_node.a
[ 72%] Linking CXX static library libeager_scale.a
[ 72%] Linking CXX static library libinterpretercore_garbage_collector.a
[ 72%] Linking CXX static library libpaddle_inference_api.a
[ 72%] Linking CXX static library libtensor_utils.a
[ 72%] Linking CXX static library libcustom_operator_node.a
[ 72%] Linking CXX static library libinfer_io_utils.a
[ 72%] Linking CXX static library libstandalone_executor.a
[ 73%] Linking CXX static library libexecutor.a
[ 73%] Linking CXX static library libexecutor_gc_helper.a
[ 74%] Linking CXX static library libfleet_executor.a
[ 74%] Linking CXX static library libpaddle_framework.a
[ 74%] Linking CXX static library libsubgraph_detector.a
[ 74%] Linking CXX static library libbuffer_shared_inplace_op_pass.a
[ 74%] Linking CXX static library libconditional_block_op_eager_deletion_pass.a
[ 74%] Linking CXX static library librecurrent_op_eager_deletion_pass.a
[ 74%] Linking CXX static library libarg_max_op.a
[ 74%] Linking CXX static library libarg_min_op.a
[ 74%] Linking CXX static library libadd_position_encoding_op.a
[ 74%] Linking CXX static library libabs_op.a
[ 74%] Linking CXX static library libaffine_grid_op.a
[ 74%] Linking CXX static library libaffine_channel_op.a
[ 74%] Linking CXX static library libarray_to_lod_tensor_op.a
[ 74%] Linking CXX static library libascend_trigger_op.a
[ 74%] Linking CXX static library libassign_pos_op.a
[ 75%] Linking CXX static library libassert_op.a
[ 75%] Linking CXX static library libassign_value_op.a
[ 75%] Linking CXX static library libaverage_accumulates_op.a
[ 75%] Linking CXX static library libbatch_fc_op.a
[ 75%] Linking CXX static library libattention_lstm_op.a
[ 75%] Linking CXX static library libbce_loss_op.a
[ 75%] Linking CXX static library libbeam_search_op.a
[ 75%] Linking CXX static library libbeam_search_decode_op.a
[ 75%] Linking CXX static library libbilinear_tensor_product_op.a
[ 75%] Linking CXX static library libbatch_norm_op.a
[ 75%] Linking CXX static library libbilateral_slice_op.a
[ 75%] Linking CXX static library libbincount_op.a
[ 75%] Linking CXX static library libbpr_loss_op.a
[ 75%] Linking CXX static library libcheck_memory_continue_op.a
[ 75%] Linking CXX static library libcast_op.a
[ 75%] Linking CXX static library libchannel_shuffle_op.a
[ 75%] Linking CXX static library libclip_by_norm_op.a
[ 76%] Linking CXX static library libclass_center_sample_op.a
[ 76%] Linking CXX static library libchunk_eval_op.a
[ 76%] Linking CXX static library libcenter_loss_op.a
[ 76%] Linking CXX static library libcoalesce_tensor_op.a
[ 76%] Linking CXX static library libconcat_op.a
[ 76%] Linking CXX static library libcopy_cross_scope_op.a
[ 76%] Linking CXX static library libconv_shift_op.a
[ 76%] Linking CXX static library libcrf_decoding_op.a
[ 76%] Linking CXX static library libcorrelation_op.a
[ 76%] Linking CXX static library libcos_sim_op.a
[ 76%] Linking CXX static library libconv_op.a
[ 76%] Linking CXX static library libctc_align_op.a
[ 76%] Linking CXX static library libconv_transpose_op.a
[ 76%] Linking CXX static library libcrop_op.a
[ 76%] Linking CXX static library libcross_entropy_op.a
[ 76%] Linking CXX static library libdecode_jpeg_op.a
[ 76%] Linking CXX static library libcudnn_lstm_op.a
[ 76%] Linking CXX static library libcvm_op.a
[ 76%] Linking CXX static library libdelete_var_op.a
[ 76%] Linking CXX static library libcum_op.a
[ 76%] Linking CXX static library libdeformable_conv_op.a
[ 77%] Linking CXX static library libdequantize_abs_max_op.a
[ 77%] Linking CXX static library libdeformable_conv_v1_op.a
[ 77%] Linking CXX static library libdequantize_log_op.a
[ 77%] Linking CXX static library libdeformable_psroi_pooling_op.a
[ 77%] Linking CXX static library libdata_norm_op.a
[ 77%] Linking CXX static library libdequeue_op.a
[ 77%] Linking CXX static library libdequantize_op.a
[ 77%] Linking CXX static library libdiag_op.a
[ 77%] Linking CXX static library libdgc_clip_by_norm_op.a
[ 77%] Linking CXX static library libedit_distance_op.a
[ 78%] Linking CXX static library libempty_op.a
[ 78%] Linking CXX static library libenqueue_op.a
[ 78%] Linking CXX static library libeinsum_op.a
[ 78%] Linking CXX static library libdropout_op.a
[ 78%] Linking CXX static library libeigvalsh_op.a
[ 78%] Linking CXX static library libdetection_map_op.a
[ 78%] Linking CXX static library libexpand_as_v2_op.a
[ 78%] Linking CXX static library libexpand_as_op.a
[ 78%] Linking CXX static library libexpand_v2_op.a
[ 78%] Linking CXX static library libexponential_op.a
[ 78%] Linking CXX static library libfill_any_like_op.a
[ 78%] Linking CXX static library libfake_dequantize_op.a
[ 78%] Linking CXX static library libexpand_op.a
[ 78%] Linking CXX static library libfake_quantize_op.a
[ 78%] Linking CXX static library libfill_any_op.a
[ 78%] Linking CXX static library libfill_constant_batch_size_like_op.a
[ 79%] Linking CXX static library libfc_op.a
[ 80%] Linking CXX static library libfill_zeros_like_op.a
[ 80%] Linking CXX static library libfill_op.a
[ 80%] Linking CXX static library libfill_constant_op.a
[ 80%] Linking CXX static library libfilter_by_instag_op.a
[ 80%] Linking CXX static library libfused_softmax_mask_op.a
[ 80%] Linking CXX static library libfused_softmax_mask_upper_triangle_op.a
[ 80%] Linking CXX static library libfsp_op.a
[ 80%] Linking CXX static library libfused_token_prune_op.a
[ 81%] Linking CXX static library libflatten_op.a
[ 81%] Linking CXX static library libgaussian_random_batch_size_like_op.a
[ 81%] Linking CXX static library libgather_op.a
[ 81%] Linking CXX static library libgaussian_random_op.a
[ 81%] Linking CXX static library libget_tensor_from_selected_rows_op.a
[ 81%] Linking CXX static library libgraph_reindex_op.a
[ 81%] Linking CXX static library libgenerated_fused_op.a
[ 81%] Linking CXX static library libgraph_sample_neighbors_op.a
[ 81%] Linking CXX static library libgraph_khop_sampler_op.a
[ 81%] Linking CXX static library libgroup_norm_op.a
[ 81%] Linking CXX static library libgenerated_static_op.a
[ 81%] Linking CXX static library libhash_op.a
[ 81%] Linking CXX static library libgru_op.a
[ 82%] Linking CXX static library libhinge_loss_op.a
[ 82%] Linking CXX static library libidentity_loss_op.a
[ 82%] Linking CXX static library libhierarchical_sigmoid_op.a
[ 82%] Linking CXX static library libim2sequence_op.a
[ 82%] Linking CXX static library libincrement_op.a
[ 82%] Linking CXX static library libisfinite_op.a
[ 82%] Linking CXX static library libinstance_norm_op.a
[ 82%] Linking CXX static library libgru_unit_op.a
[ 82%] Linking CXX static library libl1_norm_op.a
[ 82%] Linking CXX static library libinplace_abn_op.a
[ 82%] Linking CXX static library libgenerated_sparse_op.a
[ 82%] Linking CXX static library liblayer_norm_op.a
[ 82%] Linking CXX static library liblimit_by_capacity_op.a
[ 82%] Linking CXX static library libinterpolate_op.a
[ 82%] Linking CXX static library liblod_array_length_op.a
[ 82%] Linking CXX static library liblod_rank_table_op.a
[ 82%] Linking CXX static library libload_op.a
[ 83%] Linking CXX static library liblogspace_op.a
[ 83%] Linking CXX static library liblod_tensor_to_array_op.a
[ 83%] Linking CXX static library liblinear_chain_crf_op.a
[ 83%] Linking CXX static library liblod_reset_op.a
[ 83%] Linking CXX static library libmargin_cross_entropy_op.a
[ 83%] Linking CXX static library liblookup_table_dequant_op.a
[ 83%] Linking CXX static library liblstm_unit_op.a
[ 83%] Linking CXX static library liblookup_table_v2_op.a
[ 84%] Linking CXX static library liblstsq_op.a
[ 84%] Linking CXX static library liblookup_table_op.a
[ 84%] Linking CXX static library liblrn_op.a
[ 84%] Linking CXX static library libmarker_op.a
[ 84%] Linking CXX static library liblu_op.a
[ 85%] Linking CXX static library liblstmp_op.a
[ 85%] Linking CXX static library libmargin_rank_loss_op.a
[ 85%] Linking CXX static library libmax_sequence_len_op.a
[ 85%] Linking CXX static library libmatch_matrix_tensor_op.a
[ 85%] Linking CXX static library libmatrix_rank_op.a
[ 86%] Linking CXX static library libmean_iou_op.a
[ 86%] Linking CXX static library libmean_op.a
[ 86%] Linking CXX static library libmemcpy_d2h_op.a
[ 86%] Linking CXX static library libmemcpy_h2d_op.a
[ 86%] Linking CXX static library libmatmul_v2_op.a
[ 86%] Linking CXX static library libmatmul_op.a
[ 86%] Linking CXX static library libmemcpy_op.a
[ 86%] Linking CXX static library libmoe_op.a
[ 86%] Linking CXX static library libminus_op.a
[ 86%] Linking CXX static library libmerge_lod_tensor_op.a
[ 86%] Linking CXX static library libnop_op.a
[ 86%] Linking CXX static library libmodified_huber_loss_op.a
[ 86%] Linking CXX static library libnumber_count_op.a
[ 86%] Linking CXX static library libmul_op.a
[ 86%] Linking CXX static library libnorm_op.a
[ 86%] Linking CXX static library libnce_op.a
[ 86%] Linking CXX static library libone_hot_op.a
[ 87%] Linking CXX static library libpad_op.a
[ 87%] Linking CXX static library libpad2d_op.a
[ 87%] Linking CXX static library libpad_constant_like_op.a
[ 88%] Linking CXX static library libpartial_concat_op.a
[ 88%] Linking CXX static library libpad3d_op.a
[ 88%] Linking CXX static library libpixel_unshuffle_op.a
[ 88%] Linking CXX static library libpartial_sum_op.a
[ 88%] Linking CXX static library libpool_op.a
[ 88%] Linking CXX static library libpositive_negative_pair_op.a
[ 88%] Linking CXX static library libpool_with_index_op.a
[ 89%] Linking CXX static library libprune_gate_by_capacity_op.a
[ 89%] Linking CXX static library libprint_op.a
[ 89%] Linking CXX static library libpsroi_pool_op.a
[ 89%] Linking CXX static library libpull_box_extended_sparse_op.a
[ 89%] Linking CXX static library libprroi_pool_op.a
[ 89%] Linking CXX static library libpull_gpups_sparse_op.a
[ 89%] Linking CXX static library libpull_box_sparse_op.a
[ 89%] Linking CXX static library libpush_dense_op.a
[ 89%] Linking CXX static library libpull_sparse_v2_op.a
[ 89%] Linking CXX static library libpull_sparse_op.a
[ 89%] Linking CXX static library libquantize_op.a
[ 89%] Linking CXX static library libqueue_generator_op.a
[ 89%] Linking CXX static library librandint_op.a
[ 89%] Linking CXX static library libpyramid_hash_op.a
[ 89%] Linking CXX static library librandom_crop_op.a
[ 89%] Linking CXX static library librandom_routing_op.a
[ 89%] Linking CXX static library librandperm_op.a
[ 89%] Linking CXX static library libread_file_op.a
[ 89%] Linking CXX static library librank_loss_op.a
[ 89%] Linking CXX static library librank_attention_op.a
[ 89%] Linking CXX static library libreorder_lod_tensor_by_rank_op.a
[ 89%] Linking CXX static library librequantize_op.a
[ 89%] Linking CXX static library librepeat_interleave_op.a
[ 89%] Linking CXX static library libreverse_op.a
[ 89%] Linking CXX static library librnn_memory_helper_op.a
[ 89%] Linking CXX static library libroi_pool_op.a
[ 89%] Linking CXX static library libreshape_op.a
[ 89%] Linking CXX static library librnn_op.a
[ 89%] Linking CXX static library librow_conv_op.a
[ 90%] Linking CXX static library libroi_align_op.a
[ 90%] Linking CXX static library libsampling_id_op.a
[ 90%] Linking CXX static library librrelu_op.a
[ 90%] Linking CXX static library libsample_logits_op.a
[ 90%] Linking CXX static library libsave_op.a
[ 90%] Linking CXX static library libseed_op.a
[ 90%] Linking CXX static library libselect_input_op.a
[ 91%] Linking CXX static library libselect_output_op.a
[ 91%] Linking CXX static library libshape_op.a
[ 91%] Linking CXX static library libshare_data_op.a
[ 91%] Linking CXX static library libsegment_pool_op.a
[ 91%] Linking CXX static library libshrink_rnn_memory_op.a
[ 91%] Linking CXX static library libshuffle_batch_op.a
[ 91%] Linking CXX static library libset_value_op.a
[ 91%] Linking CXX static library libshuffle_channel_op.a
[ 91%] Linking CXX static library libsimilarity_focus_op.a
[ 91%] Linking CXX static library libsparse_manual_op.a
[ 92%] Linking CXX static library libsoftmax_with_cross_entropy_op.a
[ 92%] Linking CXX static library libslice_op.a
[ 92%] Linking CXX static library libsmooth_l1_loss_op.a
[ 92%] Linking CXX static library libsoftmax_op.a
[ 92%] Linking CXX static library libsplit_lod_tensor_op.a
[ 92%] Linking CXX static library libspace_to_depth_op.a
[ 92%] Linking CXX static library libspp_op.a
[ 92%] Linking CXX static library libsplit_op.a
[ 92%] Linking CXX static library libsquared_l2_distance_op.a
[ 92%] Linking CXX static library libsquared_l2_norm_op.a
[ 92%] Linking CXX static library libstrided_slice_op.a
[ 92%] Linking CXX static library libsum_op.a
[ 92%] Linking CXX static library libtdm_child_op.a
[ 93%] Linking CXX static library libsqueeze_op.a
[ 93%] Linking CXX static library libstft_op.a
[ 93%] Linking CXX static library libteacher_student_sigmoid_loss_op.a
[ 93%] Linking CXX static library libtemporal_shift_op.a
[ 93%] Linking CXX static library libtensor_array_to_tensor_op.a
[ 93%] Linking CXX static library libtile_op.a
[ 93%] Linking CXX static library libtransfer_layout_op.a
[ 93%] Linking CXX static library libtdm_sampler_op.a
[ 93%] Linking CXX static library libtop_k_op.a
[ 93%] Linking CXX static library libtranspose_op.a
[ 93%] Linking CXX static library libtril_indices_op.a
[ 93%] Linking CXX static library libtriangular_solve_op.a
[ 93%] Linking CXX static library libtree_conv_op.a
[ 93%] Linking CXX static library libtriu_indices_op.a
[ 93%] Linking CXX static library libtril_triu_op.a
[ 93%] Linking CXX static library libuniform_random_batch_size_like_op.a
[ 93%] Linking CXX static library libtruncated_gaussian_random_op.a
[ 93%] Linking CXX static library libuniform_random_inplace_op.a
[ 93%] Linking CXX static library libuniform_random_op.a
[ 94%] Linking CXX static library libunique_op.a
[ 94%] Linking CXX static library libunzip_op.a
[ 94%] Linking CXX static library libunpool_op.a
[ 94%] Linking CXX static library libtensor_formatter.a
[ 94%] Linking CXX static library libunique_with_counts_op.a
[ 94%] Linking CXX static library libvar_conv_2d_op.a
[ 94%] Linking CXX static library libunsqueeze_op.a
[ 94%] Linking CXX static library libactivation_op.a
[ 94%] Linking CXX static library libdepend_op.a
[ 94%] Linking CXX static library libconditional_block_infer_op.a
[ 94%] Linking CXX static library liblstm_op.a
[ 94%] Linking CXX static library libfetch_op.a
[ 94%] Linking CXX static library libfeed_op.a
[ 94%] Linking CXX static library libget_places_op.a
[ 94%] Linking CXX static library libfetch_v2_op.a
[ 94%] Linking CXX static library liblogical_op.a
[ 94%] Linking CXX static library libjit_interpreter_engine.a
[ 94%] Linking CXX static library libtensor_array_read_write_op.a
[ 94%] Linking CXX static library libsubgraph_util.a
[ 95%] Linking CXX static library libcost_model.a
[ 95%] Linking CXX static library libwhile_op.a
[ 95%] Linking CXX static library libeager_deletion_pass.a
[ 95%] Linking CXX static library libssa_graph_executor.a
[ 95%] Linking CXX static library libgenerated_op.a
[ 95%] Linking CXX static library libbind_threaded_ssa_graph_executor.a
[ 95%] Linking CXX static library libthreaded_ssa_graph_executor.a
[ 95%] Linking CXX static library libscope_buffered_ssa_graph_executor.a
[ 95%] Linking CXX static library libfast_threaded_ssa_graph_executor.a
[ 95%] Linking CXX static library libhook_utils.a
[ 95%] Linking CXX static library libasync_ssa_graph_executor.a
[ 95%] Linking CXX static library libparallel_ssa_graph_executor.a
[ 95%] Linking CXX static library libutils.a
[ 95%] Linking CXX static library libparallel_executor.a
[ 95%] Linking CXX static library libeager_api.a
[ 95%] Linking CXX static library libbackward.a
[ 95%] Linking CXX static library libexecutor_cache.a
[ 95%] Linking CXX static library libjit_compilation_unit.a
[ 95%] Linking CXX static library libeager_reducer.a
[ 95%] Linking CXX static library libgenerated_eager_prim_api.a
[ 95%] Linking CXX static library librun_program_op.a
[ 95%] Linking CXX static library libeager_utils.a
[ 95%] Linking CXX static library libpaddle_inference_io.a
[ 95%] Linking CXX static library libmanual_eager_prim_api.a
[ 95%] Linking CXX static library libeager_prim_api.a
[ 95%] Linking CXX static library libanalysis_helper.a
[ 96%] Linking CXX static library libir_pass_manager.a
[ 96%] Linking CXX static library libir_params_sync_among_devices_pass.a
[ 96%] Linking CXX static library libir_graph_build_pass.a
[ 96%] Linking CXX static library libir_analysis_pass.a
[ 96%] Linking CXX static library libconvert_to_mixed_precision.a
[ 96%] Linking CXX static library libanalysis_passes.a
[ 96%] Linking CXX static library libanalysis.a
[ 97%] Linking CXX static library libdygraph_node.a
[ 97%] Linking CXX static library libanalysis_predictor.a
[ 97%] Linking CXX static library libdygraph_function.a
[ 97%] Linking CXX static library libjit_predictor_engine.a
[ 97%] Linking CXX static library libjit_function.a
[ 97%] Linking CXX static library libjit_layer.a
[ 97%] Linking CXX static library libfinal_dygraph_function.a
[ 97%] Linking CXX static library libfinal_dygraph_node.a
[ 97%] Linking CXX static library libprim_api.a
[ 97%] Linking CXX static library libeager_tensor_operants.a
[ 98%] Linking CXX static library libperformance_benchmark_utils.a
[ 98%] Linking CXX static library libpaddle_inference.a
[ 99%] Linking CXX static library libpaddle_inference_c.a
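The link log above roughly runs from graph/IR passes (~63–72%), through executor and framework libraries, to the individual operator libraries (74–94%), and finally the inference libraries. To turn a log like this into a horizontal decomposition, one can parse the `Linking CXX static library lib….a` lines and bucket the libraries by naming convention. A minimal sketch below; the regex matches the log format shown above, but the category buckets (`_pass`, `_op`, `executor`, `inference`) are my own heuristic grouping, not an official Paddle taxonomy.

```python
import re
from collections import Counter

# A few sample lines copied from the build log above.
LOG_SAMPLE = """\
[ 63%] Linking CXX static library libfuse_sgd_op_pass.a
[ 72%] Linking CXX static library libexecutor.a
[ 74%] Linking CXX static library libabs_op.a
[ 99%] Linking CXX static library libpaddle_inference_c.a
"""

LINK_RE = re.compile(r"\[\s*(\d+)%\]\s+Linking CXX static library lib(\S+)\.a")

def categorize(name: str) -> str:
    """Rough bucketing by library naming convention (heuristic guess)."""
    if name.endswith("_pass"):
        return "graph/IR pass"
    if name.endswith("_op"):
        return "operator"
    if "executor" in name:
        return "executor"
    if "inference" in name:
        return "inference"
    return "other"

def parse_log(text: str):
    """Yield (percent, lib_name, category) for each link line in the log."""
    for line in text.splitlines():
        m = LINK_RE.match(line)
        if m:
            pct, name = int(m.group(1)), m.group(2)
            yield pct, name, categorize(name)

if __name__ == "__main__":
    # Count how many libraries land in each bucket.
    counts = Counter(cat for _, _, cat in parse_log(LOG_SAMPLE))
    print(counts)
```

Running this over the full log (instead of `LOG_SAMPLE`) gives a quick picture of which layer dominates the build and in what order the layers finish linking.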