-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
【PaddlePaddle Hackathon 2】15 新增 API Nanmedian #42385
Conversation
你的PR提交成功,感谢你对开源项目的贡献! |
@TCChenlong 多谢龙哥,已经修改好, 现在只剩approved error了 |
PR格式检查通过,你的PR将接受Paddle专家以及开源社区的review,请及时关注PR动态。 |
python/paddle/tensor/stat.py
Outdated
@@ -253,6 +253,140 @@ def numel(x, name=None): | |||
return out | |||
|
|||
|
|||
def nanmedian(x, axis=None, ignore_nan=True, keepdim=True, name=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no parameter of ignore_nan
in nanmedian RFC. if we want to add this parameter, should change RFC first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This RFC doc was updated with a new PR: PaddlePaddle/community#124
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I discussed this case with some paddle experts today.
In Paddle v2 3(coming this month) C++ API will be officially introduced, a program (such as model Networking) can be composed by basic C++ APIs instead of Python APIs, which is very useful for some performance pursuit scenarios.
In order to reduce the threshold for users to use C++ API, it is necessary to ensure that the experience of using C++ API is the same with Python, that is, the parameters of the two should be the same (the different ones are being modified in plan), and the documents of Python API can be reused without writing another document about C++ API.
Therefore, in this case, the RFC doc does not need to be modified, and the parameter ignore_nan
should be moved to a kernel function (e.g. BaseMedianLernel), and then both nanmediankernel and mediankernel call this function using different value of ignore_nan
. So the parameters of Python API, C++ API and kernel are the same, and the paddle Median and padding Nanmedian is used in the same way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your proposal is quite reasonable. I will move the logic into c++ implementation. I will adopt this structure, BaseMedianKernel(with parameters axis
, keepdim
), and then calculate the presence of nan values, and futher determine which kernel to pick (nanmediankernel and mediankernel). This param ignore_nan
wont be used any more. If I understood incorrectly, pls let me know, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
python/paddle/tensor/stat.py
Outdated
If ``axis`` is less than 0, it works the same way as :math:`axis + D`. | ||
If ``axis`` is None, median is calculated over all elements of ``x``. Default is None. | ||
ignore_nan (bool, optional): Whether to ignore nan values when median was calculated. | ||
If `ignore_nan` is True, the calculation process is the same as `median` operator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ignore_nan
is False, the calculation process is the same as median
operator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add link of RFC(API design documents) in description
already added.
python/paddle/tensor/stat.py
Outdated
res_shape = [x for x in out_shape if x > 0] | ||
|
||
if _in_legacy_dygraph(): | ||
medians, out = _C_ops.nanmedian(x, 'ignore_nan', ignore_nan) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the comments above, the parameters of Python API and C++ API should be the same, and the preprocessing logic in Python above should also be put into C++.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A question: can c++ api support the python data type like tuple and None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
||
int64_t pos = (stride - num_nan - 1) / 2; | ||
std::nth_element(col_vec.begin(), | ||
col_vec.begin() + pos, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use col_vec.begin() + pos + 1 is better? no need to use std::nth_element again in if statement below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Learned from https://en.cppreference.com/w/cpp/algorithm/nth_element, the order of elements before the nth element seems to be somehow unreliable, so I fetched the n-1th element again. If the index of elements are needed, this std:nth_element would not be the good choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
请补充中文文档,并将链接附在 Describe 中 |
@TCChenlong 完成 |
@@ -241,6 +241,103 @@ def numel(x, name=None): | |||
return out | |||
|
|||
|
|||
def nanmedian(x, axis=None, keepdim=True, name=None): | |||
""" | |||
Compute the median along the specified axis, while ignoring NaNs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
在文档前加r
,现在无法正常解析英文文档
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ligoml 已加
@@ -331,6 +331,7 @@ | |||
from .tensor.stat import var # noqa: F401 | |||
from .tensor.stat import numel # noqa: F401 | |||
from .tensor.stat import median # noqa: F401 | |||
from .tensor.stat import nanmedian # noqa: F401 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个api需要加在 __all__
中,否则不会被正常展示在api列表中
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Ligoml 已修改
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for docs
@@ -0,0 +1,121 @@ | |||
/*Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
license换行缩进有点问题,建议参考别的改一下
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
老师 您好,已修改
paddle/phi/infermeta/unary.cc
Outdated
out->set_dims(make_ddim(out_dim)); | ||
} | ||
|
||
void NanmedianGradInferMeta(const MetaTensor& x, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
反向的麻烦放到backward.h/cc中
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
老师 您好,已修改
bool should_ignore_nan = ignore_nan; | ||
auto stream = dev_ctx.stream(); | ||
auto* ctx = | ||
reinterpret_cast<const paddle::platform::CUDADeviceContext*>(&dev_ctx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里好像不需要转换?直接使用dev_ctx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
老师 您好,已修改
for (int i = 0; i < ndims; i++) { | ||
trans_dim[i] = input_dim[perm[i]]; | ||
} | ||
x->mutable_data<T>(trans_dim, dev_ctx.GetPlace()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
麻烦换成dev_ctx.template Alloc()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
老师 您好,已修改 @chenwhql
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for change parallel_UT_rule
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* nanmedian op * 修改cuda kernel的bug * 修复count_if在其他硬件平台不兼容 * 修复某些cpu硬件不兼容 * 修复某些cpu硬件不兼容 * 修复isnan判断 * 兼容numpy低版本不支持全部nan的情况 * 兼容numpy低版本不支持全部nan的情况 * fix code example * fix api comment error * 修改反向传播逻辑以及c++处理逻辑 * 完成修改建议 * typo pre_dim * update en docs, test=document_fix * remove numpy in en doc, test=document_fix * add r,test=document_fix * 添加api到all * follow advice from chenwhql
PR types
New features
PR changes
APIs
Describe
完成第二期第15项目开发任务: #40318
paddle.nanmedian 扩展了 paddle.median API 的功能,如果输入Tensor中有nan值, paddle.median 会将nan值考虑在内然后取中位数(有可能为nan值),而 paddle.nanmedian 只考虑非nan值的中位数。
RFC设计文档: PaddlePaddle/community#89
中文文档: PaddlePaddle/docs#4820