
Add paddle.device.cuda.stream_guard API #35623

Merged: 16 commits into PaddlePaddle:develop on Sep 15, 2021

Conversation

@DesmonDay (Contributor) commented Sep 9, 2021

PR types

New features

PR changes

APIs

Describe

This API provides a way to switch the current CUDA stream flexibly.
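A minimal usage sketch (assuming a GPU build of Paddle; the tensor values and names below are illustrative only):

import paddle

s = paddle.device.cuda.Stream()        # user-created stream
x = paddle.ones([2, 2])

with paddle.device.cuda.stream_guard(s):
    # Kernels launched here are issued on stream s instead of the default stream.
    y = paddle.matmul(x, x)

# Leaving the guard restores the previous stream automatically.
paddle.device.cuda.synchronize()
print(y.numpy())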

Offline Test

Async property test

# Test code
import numpy as np
import paddle

numpy_data = np.random.rand(10000, 10000)

# Three extra streams used to overlap H2D/D2H copies with compute.
s1 = paddle.device.cuda.Stream()
s2 = paddle.device.cuda.Stream()
s3 = paddle.device.cuda.Stream()

# Pinned-memory tensors (m1, m3, m5) for H2D copies,
# GPU tensors (m2, m4, m6) for D2H copies.
m1 = paddle.to_tensor(numpy_data, place=paddle.CUDAPinnedPlace())
m2 = paddle.to_tensor(numpy_data)
m3 = paddle.to_tensor(numpy_data, place=paddle.CUDAPinnedPlace())
m4 = paddle.to_tensor(numpy_data)
m5 = paddle.to_tensor(numpy_data, place=paddle.CUDAPinnedPlace())
m6 = paddle.to_tensor(numpy_data)

data1 = paddle.to_tensor(numpy_data)
data2 = paddle.to_tensor(numpy_data)
paddle.device.cuda.synchronize()

for i in range(0, 40):
    if i == 10:
        paddle.fluid.core.nvprof_start()
    # Compute on the default stream.
    paddle.mm(data1, data2)
    # Issue copies on side streams so they can overlap with the matmul.
    with paddle.device.cuda.stream_guard(s1):
        m2._copy_to(paddle.CUDAPinnedPlace(), 0)
        m1._copy_to(paddle.CUDAPlace(0), 0)
    with paddle.device.cuda.stream_guard(s2):
        m4._copy_to(paddle.CUDAPinnedPlace(), 0)
        m3._copy_to(paddle.CUDAPlace(0), 0)
    with paddle.device.cuda.stream_guard(s3):
        m6._copy_to(paddle.CUDAPinnedPlace(), 0)
        m5._copy_to(paddle.CUDAPlace(0), 0)

    if i == 30:
        paddle.fluid.core.nvprof_stop()

[Image: nvprof timeline captured while running the test above]
From the timeline above, we can see that the CUDA kernels and the CUDA memcpy operations run asynchronously.
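If the copied tensors are consumed later, the side streams must be synchronized first. A minimal sketch reusing s1 and m1 from the test above, and assuming the Stream.synchronize() method already exposed by paddle.device.cuda.Stream:

with paddle.device.cuda.stream_guard(s1):
    gpu_tensor = m1._copy_to(paddle.CUDAPlace(0), 0)  # async copy issued on s1

s1.synchronize()  # wait only for the work queued on s1
# paddle.device.cuda.synchronize() would instead wait for the whole device
print(gpu_tensor.shape)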

paddle-bot-old (bot) commented Sep 9, 2021

Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

@@ -104,5 +105,32 @@ def test_cuda_event_methods(self):
        self.assertTrue(event_query_2)


class TestStreamGuard(unittest.TestCase):
Contributor: Please post the verification code and the verified results in the PR description.


    cur_stream = current_stream()
    if stream is None or id(stream) == id(cur_stream):
        yield
Contributor: Should the unit test also cover passing the same stream here?

@DesmonDay (author): After discussion, no change is needed.
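For reference, a sketch of what such a same-stream test could look like (hypothetical test, not part of this PR):

import unittest

import numpy as np
import paddle


class TestSameStreamGuard(unittest.TestCase):
    def test_stream_guard_with_current_stream(self):
        if not paddle.is_compiled_with_cuda():
            return
        cur = paddle.device.cuda.current_stream()
        a = paddle.to_tensor(np.ones([2, 2], dtype="float32"))
        # When the given stream equals the current one, stream_guard just
        # yields without switching streams (the branch shown above).
        with paddle.device.cuda.stream_guard(cur):
            b = paddle.matmul(a, a)
        np.testing.assert_allclose(
            b.numpy(), np.full([2, 2], 2.0, dtype="float32"))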

'''
Set the current stream.

Parameters:
Contributor: Parameters -> Args

@DesmonDay (author): I asked Chen Long; either Args or Parameters is acceptable. To stay consistent with the other APIs on this page, I will keep it as is.


          if (device == nullptr) {
            int curr_device_id = platform::GetCurrentDeviceId();
            auto device_tmp = platform::CUDAPlace(curr_device_id);
            device = &device_tmp;
          }

-         new (&self) paddle::platform::stream::CUDAStream(*device, prio);
+         new (&self) paddle::platform::stream::CUDAStream(*device, prio,
+                                                          stream_flag);
#else
          PADDLE_THROW(platform::errors::Unavailable(
              "Class CUDAStream can only be initialized on the GPU platform."));
#endif
Contributor: Could this Stream constructor default to the non-blocking mode?

@DesmonDay (author): done

enum class StreamFlag : uint8_t {
  kDefaultFlag = 0x0,
  kStreamNonBlocking = 0x1,
  kStreamPerThread = 0x2,
Contributor: kStreamPerThread can be removed.

@DesmonDay (author): done


A context manager that switches the current CUDA stream to the given stream within its scope.

Parameters:
Contributor: Parameters -> Args

@DesmonDay (author): Same as above.

    return core._set_current_stream(stream)


@signature_safe_contextmanager
Collaborator: @dygraph_only

@DesmonDay (author): To be decided.

    if stream is None or id(stream) == id(cur_stream):
        yield
    else:
        pre_stream = _set_current_stream(stream)
Collaborator: Does the stream affect distributed environments?

@DesmonDay (author): I will run offline tests; the results will be posted in the top comment later.

@@ -200,14 +212,16 @@ void BindCudaStream(py::module *m_ptr) {
                  "Priority should be 1(high) or 2(normal) "));
            }
            auto prio = paddle::platform::stream::Priority(priority);
            auto stream_flag = paddle::platform::stream::StreamFlag(1);

Collaborator: What does the hard-coded 1 mean?

@DesmonDay (author): 1 means a non-blocking stream. We initialize the CUDA stream as non-blocking by default, following the PyTorch implementation.

Collaborator: How about using paddle::platform::stream::StreamFlag::kStreamNonBlocking instead of 1?

@DesmonDay (author): done


    if stream is None:
        raise ValueError("input stream should not be None.")
    if not isinstance(stream, paddle.device.cuda.Stream):
Collaborator: Can the isinstance check below also cover the None check above?

@DesmonDay (author): Could we unify these into TypeError? (I should not have used ValueError; I would like to change both checks to raise TypeError.)
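For illustration, a sketch of the unified check being discussed (hypothetical helper; the final behavior depends on the outcome of this thread):

import paddle


def _check_stream_arg(stream):
    # A single isinstance check also rejects None, so both invalid inputs
    # raise the same TypeError.
    if not isinstance(stream, paddle.device.cuda.Stream):
        raise TypeError(
            "stream should be a paddle.device.cuda.Stream, "
            "but received {}.".format(type(stream)))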

wawltor previously approved these changes Sep 14, 2021
@wawltor (Contributor): LGTM

@XiaoguangHu01 (Contributor): LG API

@MingMingShangTian (Contributor): LGTM

@wawltor merged commit 3218075 into PaddlePaddle:develop on Sep 15, 2021
@DesmonDay changed the title from "Add paddle.cuda.device.stream_guard API" to "Add paddle.device.cuda.stream_guard API" on Sep 26, 2021
AnnaTrainingG pushed a commit to AnnaTrainingG/Paddle that referenced this pull request Sep 29, 2021