
[Hackathon No.36] Optimize the computation performance of the lerp_grad op on GPU #45946

Merged: 16 commits merged into PaddlePaddle:develop on Oct 10, 2022

Conversation

@Rayman96 (Contributor) commented Sep 10, 2022

PR types

Performance optimization

PR changes

OPs

Describe

Development environment:
Hardware: Tesla P4
Software: CUDA 11.2, cuDNN 8

| Case No. | input_shape | data_type | Paddle_modify Perf (us) | Perf_over_paddle_origin (%) | Perf_over_pytorch (%) |
| --- | --- | --- | --- | --- | --- |
| 1 | [16, 102400] | float32 | 134.98 | 65.56% | 41.04% |
| 2 | [16, 204800] | float64 | 543.17 | 37.96% | 42.55% |

Overview of the code logic:
For the CPU path, the original logic is copied as-is and works correctly.
For the GPU path, broadcasting among X, Y, and W must be handled:
1. The actual computation only involves W and Dout. If w is a single value (the common case), weight does not need to be broadcast and can be multiplied with dout directly.
2. If weight is a tensor, it is first broadcast to dout's shape and then multiplied elementwise.
3. When X and Y differ in shape and the dimension to broadcast is a leading one (e.g. x: 1x2x3, y: 2x2x3), x does not need to be expanded to y's shape.
4. When the dimension to expand sits in the middle (e.g. x: 2x1x3, y: 2x2x3), x must be expanded first, the gradient computed at the expanded shape, and the result then reduced back to x's own shape.

Case 4 costs about as much as the original implementation, and the amount of computation grows from case 1 to case 4, so case 1 should benefit the most and case 4 the least (a sketch of the scalar-weight fast path follows below).
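For illustration, a minimal sketch of the case-1 fast path (scalar weight, with x, y, and out sharing one shape), written in the style of the PR's CUDA kernels; the kernel name is illustrative, the launch configuration is omitted, and it relies on Paddle's CUDA_KERNEL_LOOP_TYPE macro as the PR's kernels do.

template <typename T>
__global__ void LerpGradScalarSketch(const T* weight,
                                     const T* dout,
                                     T* dx,
                                     T* dy,
                                     const int64_t out_size) {
  // Forward is out = x + w * (y - x), so dy = w * dout and dx = (1 - w) * dout.
  CUDA_KERNEL_LOOP_TYPE(idx, out_size, int64_t) {
    T temp = weight[0] * dout[idx];
    dy[idx] = temp;
    dx[idx] = dout[idx] - temp;
  }
}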

@paddle-bot commented Sep 10, 2022

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@paddle-bot added the contributor (External developers) and status: proposed labels on Sep 10, 2022
@paddle-bot commented Sep 10, 2022

✅ This PR's description meets the template requirements!
Please wait for other CI results.

@Rayman96 marked this pull request as ready for review on September 14, 2022 12:25
@Rayman96 changed the title from "For lerp" to "[Hackathon No.36] Optimize the computation performance of the lerp_grad op on GPU" on Sep 14, 2022
@Rayman96 (Contributor, Author)

@Ligoml There are currently two failing CI pipelines. CE-Framework reports a bug in test_quantile.py, which does not seem to test the operator I modified.
CI-Coverage reports insufficient coverage. Is the coverage measured against the unit test file paddle/python/paddle/fluid/tests/unittests/test_lerp_op.py? The existing unit test indeed does not cover everything; should I update that test file?

@Rayman96 (Contributor, Author) commented Sep 15, 2022

Commented out one newly added unit test.

The result of that test is actually correct: the printed grad values all match, but check_grad in OpTest reports a diff larger than 1e-7; the failure seems to come from the assertLessEqual check.

@Rayman96

This comment was marked as resolved.

@Rayman96 (Contributor, Author)

Could anyone advise what is causing the current CE failure?
Also, the CI-Coverage failure points at the CPU-side code, but the CPU path is a direct copy of the existing implementation, so it should be fine.

@ZzSean (Contributor) commented Sep 28, 2022

I suggest not adding code in the .cc file; keeping it in the impl.h should be enough, and that way the coverage check should not complain.

@Rayman96 (Contributor, Author)

> I suggest not adding code in the .cc file; keeping it in the impl.h should be enough, and that way the coverage check should not complain.

OK, I will give that a try.

@Rayman96 force-pushed the for_lerp branch 2 times, most recently from 303ecf0 to 89e45e1, on September 28, 2022 08:53
@Rayman96 (Contributor, Author) commented Sep 28, 2022

@ZzSean CI-build reports "refusing to merge unrelated histories"; what could be the reason? Only two commits are kept now.

Update: after rerunning CI-Clone the error above is gone, but the build now reports "docker md5 check failed" 😂

@CLAassistant commented Sep 29, 2022

CLA assistant check
All committers have signed the CLA.

@Rayman96 force-pushed the for_lerp branch 2 times, most recently from 81b4f44 to 170b04d, on September 29, 2022 01:58
@luotao1 (Contributor) commented Sep 29, 2022

> CI-build reports "refusing to merge unrelated histories"; what could be the reason? Only two commits are kept now.

This has nothing to do with the number of commits; you need to merge the latest develop branch.

@Rayman96 (Contributor, Author)

> CI-build reports "refusing to merge unrelated histories"; what could be the reason? Only two commits are kept now.
> This has nothing to do with the number of commits; you need to merge the latest develop branch.

OK, thanks. That should have fixed it.

Resolved (outdated) review threads on: paddle/phi/kernels/gpu/lerp_grad_kernel.cu, python/paddle/fluid/tests/unittests/test_lerp_op.py
// x: 1*2*3, y: 2*2*3, no reduce needed.

int reduce_flag = XYNeedReduce(x, y, out);
if (reduce_flag == 0) {

Contributor:

A suggestion: maybe you do not need to distinguish this many cases.
For dy there are two operations: 1. a broadcast multiply of weight with out_grad, call the result tmp; 2. reduce tmp to y's shape.
For dx, likewise: 1. take tmp from the first step as input and apply a scale, a*tmp+b with a=-1, b=1; 2. then reduce the result to x's shape.
Both the broadcast and the reduce have ready-made kernels that can be called directly. This would make the overall code structure much clearer and remove many of the conditionals.

Contributor (Author):

> A suggestion: maybe you do not need to distinguish this many cases. For dy there are two operations: 1. a broadcast multiply of weight with out_grad, call the result tmp; 2. reduce tmp to y's shape. For dx, likewise: 1. take tmp from the first step as input and apply a scale, a*tmp+b with a=-1, b=1; 2. then reduce the result to x's shape. Both the broadcast and the reduce have ready-made kernels that can be called directly, which would make the code structure clearer and remove many conditionals.

The current computation logic is actually quite similar to this. The cases are distinguished because, when dx or dy does not need a reduce (which also happens for some shape mismatches), that step can be skipped to gain speed; this is controlled by reduce_flag. The case analysis looks complicated, but in the end there are really only two situations: reduce needed or not (see the CPU sketch below).
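For illustration only (not Paddle code): a self-contained CPU sketch of the dataflow discussed above, assuming weight has already been broadcast to out_grad's shape, y has out_grad's shape, and x is broadcast along the leading axis; dx is written as dout - tmp, which equals (1 - w) * dout.

#include <cstddef>
#include <vector>

// tmp = w * dout (broadcast multiply); dy keeps out_grad's shape; dx is the
// sum of (dout - tmp) reduced back to x's size. With a leading-axis broadcast,
// dout index i maps to x index i % x_size.
void LerpGradReference(const std::vector<float>& w,     // already broadcast to dout's size
                       const std::vector<float>& dout,
                       std::size_t x_size,
                       std::vector<float>* dx,
                       std::vector<float>* dy) {
  const std::size_t out_size = dout.size();
  dy->assign(out_size, 0.f);
  dx->assign(x_size, 0.f);
  for (std::size_t i = 0; i < out_size; ++i) {
    const float tmp = w[i] * dout[i];    // step 1: broadcast multiply
    (*dy)[i] = tmp;
    (*dx)[i % x_size] += dout[i] - tmp;  // step 2: reduce back to x's shape
  }
}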

auto* dy = y_grad;
DDim dx_dims;
DDim dy_dims;
Eigen::DSizes<int, D * 2> dx_reshape_dims;

Contributor:

Could this reduce use Paddle's ReduceKernel instead? Its performance should be better than Eigen, and the GetRduceResult wrapper would not be needed. The ElementwiseGrad family of operators can serve as a reference.

Contributor (Author):

> Could this reduce use Paddle's ReduceKernel instead? Its performance should be better than Eigen, and the GetRduceResult wrapper would not be needed. The ElementwiseGrad family of operators can serve as a reference.

OK, I will test Paddle's ReduceKernel for comparison. I wrote my own wrapper earlier because I could not find the name of the corresponding kernel.

int out_rank = out_dims.size();
int smaller_rank = std::min(x_rank, y_rank);
if (std::max(x_rank, y_rank) < out_rank) {
return 1;

Contributor:

1 -> true (return a bool rather than an int)

int y_idx = y_rank - i;
int out_idx = out_rank - i;
if (x_dims[x_idx] != y_dims[y_idx]) {
return 1;

Contributor:

Same as above.

namespace phi {

template <typename T>
__global__ void GetLerpGrad(const T* weight,

Contributor:

Please rename this kernel; LerpGradKernelImpl would be better.

}

template <typename T>
__global__ void GetLerpGradRankZero(const T* weight,

Contributor:

LerpGradScalarKernelImpl

const int x_size,
const int y_size) {
CUDA_KERNEL_LOOP_TYPE(idx, out_size, int64_t) {
T temp_dx = weight[0] * dout[idx];

Contributor:

When weight is a scalar, it can be loaded into a register outside the loop to avoid re-reading it on every iteration.

Contributor (Author):

OK, I will change it.
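A minimal sketch of the suggested change (illustrative only; the sketch name is not the PR's kernel, and the x_size/y_size parameters of the real kernel are omitted):

template <typename T>
__global__ void LerpGradScalarRegisterSketch(const T* weight,
                                             const T* dout,
                                             T* dx,
                                             T* dy,
                                             const int64_t out_size) {
  // Read the scalar weight from global memory once and keep it in a register,
  // instead of re-reading weight[0] on every loop iteration.
  const T w = weight[0];
  CUDA_KERNEL_LOOP_TYPE(idx, out_size, int64_t) {
    const T w_dout = w * dout[idx];
    dy[idx] = w_dout;                  // dy = w * dout
    dx[idx] = dout[idx] - w_dout;      // dx = (1 - w) * dout
  }
}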

@@ -1,4 +1,4 @@
// Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
// Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.

Contributor:

This file is not newly added by this PR; do not change the copyright year.

Contributor (Author):

> This file is not newly added by this PR; do not change the copyright year.

done

namespace phi {

template <typename T>
__global__ void LerpGradKernelImpl(const T* weight,

Contributor:

This computation could apparently be implemented with ElementwiseKernel; for the multi-output usage see:

phi::funcs::ElementwiseKernel<T, decltype(functor), 2>(
dev_ctx, ins, &outs, functor);

Contributor (Author):

> This computation could apparently be implemented with ElementwiseKernel; for the multi-output usage see:
> phi::funcs::ElementwiseKernel<T, decltype(functor), 2>(dev_ctx, ins, &outs, functor);

When optimizing another operator earlier, I found ElementwiseKernel was not as fast as a hand-written kernel; I will benchmark both here and keep whichever is faster.


Contributor:

Keeping the current implementation is OK for this PR; please fix the CI issue first:
[screenshot]

Contributor (Author):

OK, fixed; let's watch the CI.

}

template <typename T>
__global__ void LerpGradScalarKernelImpl(const T* weight,

Contributor:

Same as above.

DenseTensor b_out = phi::EmptyLike<T>(ctx, out_grad);
std::vector<DenseTensor*> out_tensors = {&b_weight, &b_out};

phi::BroadcastTensorsKernel<T, Context>(ctx, in_tensors, out_tensors);

@Xreki (Contributor), Oct 8, 2022:

BroadcastKernel could also be called here; it should perform somewhat better.

Contributor (Author):

> BroadcastKernel could also be called here; it should perform somewhat better.

OK, I will test it.


Contributor:

Keeping the current implementation is OK for this PR; please fix the CI issue first.

Contributor (Author):

> Keeping the current implementation is OK for this PR; please fix the CI issue first.

Fixed; the CI checks have basically all passed. One more, Kunlun-kp-build, reports a failure, and I do not have permission to restart the pipeline. Could someone help restart it?


Contributor:

This CI is not required; it failed due to a network issue. I have checked, and it can be ignored.

@AnnaTrainingG (Contributor)

LGTM for PR-CI-Kunlun-KP-Build

const int y_grad_size,
T* x_grad_data,
T* y_grad_data) {
if (weight.dims().size() == 1) {

Contributor:

Isn't the condition in this if wrong? It should probably be weight.numel() == 1. I checked, and the unit tests have no case where w is 1-D with a shape other than [1].

Contributor (Author):

> Isn't the condition in this if wrong? It should probably be weight.numel() == 1. I checked, and the unit tests have no case where w is 1-D with a shape other than [1].

Yes, the bug is fixed. In the unit tests, TestLerpBroadWToXY was also changed to a 1-D weight with a shape other than [1], so it now covers both broadcasting and the case mentioned here (a minimal sketch of the corrected check follows).
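For clarity, a minimal sketch of the corrected condition (illustrative; the surrounding dispatch is omitted):

// A 1-D weight such as shape [3] has rank 1 but is not a scalar,
// so the scalar fast path is detected by element count, not by rank.
const bool weight_is_scalar = (weight.numel() == 1);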

if (weight.dims().size() == 1) {
// condition when weight is a scalar
const T* weight_data = weight.data<T>();
// const T weight_scalar = weight_data[0];

Contributor:

Useless comments can be removed.

Contributor (Author):

> Useless comments can be removed.

done

// const T weight_scalar = weight_data[0];
const T* out_grad_data = out_grad.data<T>();
const int out_size = out_grad.numel();
const int weight_size = weight.numel();

Contributor:

int64_t is recommended for these.

Contributor (Author):

> int64_t is recommended for these.

done

const T* dout,
T* dx,
T* dy,
const int out_size,

Contributor:

Does calling this kernel require x, y, and weight to all have the same size? If so, wouldn't passing a single numel be enough?

Contributor (Author):

The sizes of x and y here may differ, e.g. x: [2,3,4], y: [2,2,3,4]. When only axis 0 differs there is no need to broadcast them to the same shape, but the kernel has to distinguish how many elements to process for x and for y.


// broadcast weight with out_grad's dimensions
const std::vector<const DenseTensor*> in_tensors = {&weight, &out_grad};
DenseTensor b_weight = phi::EmptyLike<T>(ctx, out_grad);
DenseTensor b_out = phi::EmptyLike<T>(ctx, out_grad);

Contributor:

As I understand it, out should never need to be broadcast?

Contributor (Author):

> As I understand it, out should never need to be broadcast?

Right, out never needs broadcasting. This step exists because BroadcastTensorsKernel is used to broadcast weight, and out_grad itself is const and cannot be passed as an output argument, hence the extra step.

@Xreki (Contributor) commented Oct 9, 2022

The two existing lerp configurations in OP Benchmark are not very representative, so PaddlePaddle/benchmark#1530 replaces the second (double) configuration with one that is used in a real model and performs poorly. That PR also lists the performance of several other configurations from the model, for reference.

OP Benchmark CI results:

  • Config 1: total forward+backward GPU time speedup 1.77x; backward GPU time speedup 2.26x.
    [screenshots]

  • Config 1: nvprof results after this PR's optimization:
    [screenshot]

  • Config 2: total forward+backward GPU time speedup 686x; backward GPU time speedup 1597x.
    [screenshot]

  • Config 2: nvprof results after this PR's optimization:
    [screenshot]

The optimization results are summarized in the table below and meet the Hackathon acceptance criteria:
[screenshot]


If you are interested in further optimization, consider the following:

  1. There is currently no performance number for a configuration where w.shape is not [1]; the BroadcastTensorsKernel + LerpGradKernelImpl sequence at L123 - L146 could be replaced with a single BroadcastKernel call.
  2. nvprof shows a [CUDA memcpy DtoD], presumably from the backward pass; it is worth tracking down its source to see whether it can be avoided.

@Xreki (Contributor) left a comment:

The performance optimization has been verified and accepted, see #45946 (comment). Please also add the logic for x_grad and y_grad possibly being nullptr.

// check if x_grad and y_grad need to be reduced
// if x has a different dimension with y or weight in the middle axis, then
// they need to be broadcast and then reduced.
bool reduce_flag = XYNeedReduce(x, y, out);

Contributor:

if (dx) {
dx_dims = phi::funcs::ExtendDims2Rank(dx->dims(), D);
phi::funcs::GetBroadcastDims<D>(dx_dims, dout_dims, &dx_bcast_dims);
}
if (dy) {
dy_dims = phi::funcs::ExtendDims2Rank(dy->dims(), D);
phi::funcs::GetBroadcastDims<D>(dy_dims, dout_dims, &dy_bcast_dims);
}
phi::funcs::GetBroadcastDims<D>(w_dims, dout_dims, &w_bcast_dims);
auto eigen_w = phi::EigenTensor<T, D>::From(w, w_dims);
auto eigen_dout = phi::EigenTensor<T, D>::From(dout);

Note that the develop implementation checks whether dx and dy are null. For any Paddle operator with multiple gradient outputs, every gradient variable may be nullptr, so the corresponding logic still needs to be added.

Contributor (Author):

OK, I will add it and push again (a sketch of the guard is given below).
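A minimal sketch of the kind of guard being requested, assuming the phi convention of allocating outputs with ctx.template Alloc<T>(...) (illustrative only):

// Both gradient outputs may be nullptr; allocate and compute only the
// gradients that were actually requested.
T* x_grad_data = nullptr;
T* y_grad_data = nullptr;
if (x_grad) {
  x_grad_data = ctx.template Alloc<T>(x_grad);
}
if (y_grad) {
  y_grad_data = ctx.template Alloc<T>(y_grad);
}
// The kernels then skip writing dx or dy when the corresponding pointer is null.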

if (!reduce_flag) {
int x_grad_size = 0, y_grad_size = 0;
T* x_grad_data;
T* y_grad_data;

Contributor:

It would be best to initialize these pointers to nullptr.

DenseTensor b_xgrad = phi::EmptyLike<T, Context>(ctx, out_grad);
DenseTensor b_ygrad = phi::EmptyLike<T, Context>(ctx, out_grad);
T* x_grad_data;
T* y_grad_data;

Contributor:

It would be best to initialize these pointers to nullptr.

Contributor (Author):

Changed.

Contributor (Author):

All CI checks have passed.

@Xreki (Contributor) left a comment:

LGTM. Good work~

@Rayman96 (Contributor, Author) commented Oct 10, 2022

> If you are interested in further optimization, consider the following:
> 1. There is currently no performance number for a configuration where w.shape is not [1]; the BroadcastTensorsKernel + LerpGradKernelImpl sequence at L123 - L146 could be replaced with a single BroadcastKernel call.
> 2. nvprof shows a [CUDA memcpy DtoD], presumably from the backward pass; it is worth tracking down its source to see whether it can be avoided.

Reply:
1. I added one more configuration to the benchmark: x: [16, 1, 1, 1], y: [16, 3, 244, 244], w: [3, 1, 244], where all three need broadcasting. The forward pass uses the same implementation, so there is no gain there; the backward pass is roughly 100x faster. Detailed results to follow (updated).
[screenshot]

Replacing the two kernels with one will be done in the next round of optimization.
2. The [CUDA memcpy DtoD] seems, from the execution graph, to be introduced by the benchmark run itself. Testing other operators with the benchmark, I also see a memcpy DtoD inserted between the forward and backward passes; I have not yet looked into the exact cause.
[screenshot]

@Xreki merged commit ef61df3 into PaddlePaddle:develop on Oct 10, 2022
@Xreki (Contributor) commented Oct 10, 2022

@Rayman96 This DtoD copy is not produced by the benchmark. See PaddlePaddle/benchmark#1530: I ran nvprof on develop and there is no such DtoD copy there.

@Rayman96 (Contributor, Author)

> PaddlePaddle/benchmark#1530

That is indeed odd. Could it be that there is no DtoD there because PaddlePaddle/benchmark#1530 uses the all-Eigen implementation? In my local tests, a DtoD seems to appear whenever a hand-written kernel is involved.
[screenshots]
