
Add paddle.device.cuda.stream_guard API #35623

Merged: 16 commits into PaddlePaddle:develop on Sep 15, 2021

Conversation

@DesmonDay (Contributor) commented Sep 9, 2021

PR types

New features

PR changes

APIs

Describe

This API provides a way to switch the current CUDA stream flexibly.
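A minimal usage sketch (assuming a GPU build of Paddle; the tensor values and names below are illustrative only):

import paddle

s = paddle.device.cuda.Stream()        # user-created stream
x = paddle.ones([2, 2])

with paddle.device.cuda.stream_guard(s):
    # Kernels launched here are issued on stream s instead of the default stream.
    y = paddle.matmul(x, x)

# Leaving the guard restores the previous stream automatically.
paddle.device.cuda.synchronize()
print(y.numpy())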

Offline Test

Async property test

# Test code
import numpy as np
import paddle

numpy_data = np.random.rand(10000, 10000)

# Three extra streams used to overlap H2D/D2H copies with compute.
s1 = paddle.device.cuda.Stream()
s2 = paddle.device.cuda.Stream()
s3 = paddle.device.cuda.Stream()

# Pinned-memory tensors (m1, m3, m5) for H2D copies,
# GPU tensors (m2, m4, m6) for D2H copies.
m1 = paddle.to_tensor(numpy_data, place=paddle.CUDAPinnedPlace())
m2 = paddle.to_tensor(numpy_data)
m3 = paddle.to_tensor(numpy_data, place=paddle.CUDAPinnedPlace())
m4 = paddle.to_tensor(numpy_data)
m5 = paddle.to_tensor(numpy_data, place=paddle.CUDAPinnedPlace())
m6 = paddle.to_tensor(numpy_data)

data1 = paddle.to_tensor(numpy_data)
data2 = paddle.to_tensor(numpy_data)
paddle.device.cuda.synchronize()

for i in range(0, 40):
    if i == 10:
        paddle.fluid.core.nvprof_start()
    # Compute on the default stream.
    paddle.mm(data1, data2)
    # Issue copies on side streams so they can overlap with the matmul.
    with paddle.device.cuda.stream_guard(s1):
        m2._copy_to(paddle.CUDAPinnedPlace(), 0)
        m1._copy_to(paddle.CUDAPlace(0), 0)
    with paddle.device.cuda.stream_guard(s2):
        m4._copy_to(paddle.CUDAPinnedPlace(), 0)
        m3._copy_to(paddle.CUDAPlace(0), 0)
    with paddle.device.cuda.stream_guard(s3):
        m6._copy_to(paddle.CUDAPinnedPlace(), 0)
        m5._copy_to(paddle.CUDAPlace(0), 0)

    if i == 30:
        paddle.fluid.core.nvprof_stop()

[Image: nvprof timeline captured while running the test above]
From the timeline above, we can see that the CUDA kernels and the CUDA memcpy operations run asynchronously.
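If the copied tensors are consumed later, the side streams must be synchronized first. A minimal sketch reusing s1 and m1 from the test above, and assuming the Stream.synchronize() method already exposed by paddle.device.cuda.Stream:

with paddle.device.cuda.stream_guard(s1):
    gpu_tensor = m1._copy_to(paddle.CUDAPlace(0), 0)  # async copy issued on s1

s1.synchronize()  # wait only for the work queued on s1
# paddle.device.cuda.synchronize() would instead wait for the whole device
print(gpu_tensor.shape)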

paddle-bot-old (bot) commented Sep 9, 2021

Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

@@ -104,5 +105,32 @@ def test_cuda_event_methods(self):
        self.assertTrue(event_query_2)


class TestStreamGuard(unittest.TestCase):
Contributor: Please post the verification code and the verified results in the PR description.


    cur_stream = current_stream()
    if stream is None or id(stream) == id(cur_stream):
        yield
Contributor: Should the unit test also cover passing the same stream here?

@DesmonDay (author): After discussion, no change is needed.
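For reference, a sketch of what such a same-stream test could look like (hypothetical test, not part of this PR):

import unittest

import numpy as np
import paddle


class TestSameStreamGuard(unittest.TestCase):
    def test_stream_guard_with_current_stream(self):
        if not paddle.is_compiled_with_cuda():
            return
        cur = paddle.device.cuda.current_stream()
        a = paddle.to_tensor(np.ones([2, 2], dtype="float32"))
        # When the given stream equals the current one, stream_guard just
        # yields without switching streams (the branch shown above).
        with paddle.device.cuda.stream_guard(cur):
            b = paddle.matmul(a, a)
        np.testing.assert_allclose(
            b.numpy(), np.full([2, 2], 2.0, dtype="float32"))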

'''
Set the current stream.

Parameters:
Contributor: Parameters -> Args

@DesmonDay (author): I asked Chen Long; either Args or Parameters is acceptable. To stay consistent with the other APIs on this page, I will keep it as is.


          if (device == nullptr) {
            int curr_device_id = platform::GetCurrentDeviceId();
            auto device_tmp = platform::CUDAPlace(curr_device_id);
            device = &device_tmp;
          }

-         new (&self) paddle::platform::stream::CUDAStream(*device, prio);
+         new (&self) paddle::platform::stream::CUDAStream(*device, prio,
+                                                          stream_flag);
#else
          PADDLE_THROW(platform::errors::Unavailable(
              "Class CUDAStream can only be initialized on the GPU platform."));
#endif
Contributor: Could this Stream constructor default to the non-blocking mode?

@DesmonDay (author): done

enum class StreamFlag : uint8_t {
  kDefaultFlag = 0x0,
  kStreamNonBlocking = 0x1,
  kStreamPerThread = 0x2,
Contributor: kStreamPerThread can be removed.

@DesmonDay (author): done


A context manager that switches the current CUDA stream to the given stream within its scope.

Parameters:
Contributor: Parameters -> Args

@DesmonDay (author): Same as above.

    return core._set_current_stream(stream)


@signature_safe_contextmanager
Collaborator: @dygraph_only

@DesmonDay (author): To be decided.

    if stream is None or id(stream) == id(cur_stream):
        yield
    else:
        pre_stream = _set_current_stream(stream)
Collaborator: Does the stream affect distributed environments?

@DesmonDay (author): I will run offline tests; the results will be posted in the top comment later.

@@ -200,14 +212,16 @@ void BindCudaStream(py::module *m_ptr) {
                  "Priority should be 1(high) or 2(normal) "));
            }
            auto prio = paddle::platform::stream::Priority(priority);
            auto stream_flag = paddle::platform::stream::StreamFlag(1);

Collaborator: What does the hard-coded 1 mean?

@DesmonDay (author): 1 means a non-blocking stream. We initialize the CUDA stream as non-blocking by default, following the PyTorch implementation.

Collaborator: How about using paddle::platform::stream::StreamFlag::kStreamNonBlocking instead of 1?

@DesmonDay (author): done


    if stream is None:
        raise ValueError("input stream should not be None.")
    if not isinstance(stream, paddle.device.cuda.Stream):
Collaborator: Can the isinstance check below also cover the None check above?

@DesmonDay (author): Could we unify these into TypeError? (I should not have used ValueError; I would like to change both checks to raise TypeError.)
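For illustration, a sketch of the unified check being discussed (hypothetical helper; the final behavior depends on the outcome of this thread):

import paddle


def _check_stream_arg(stream):
    # A single isinstance check also rejects None, so both invalid inputs
    # raise the same TypeError.
    if not isinstance(stream, paddle.device.cuda.Stream):
        raise TypeError(
            "stream should be a paddle.device.cuda.Stream, "
            "but received {}.".format(type(stream)))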

wawltor previously approved these changes Sep 14, 2021
@wawltor (Contributor): LGTM

@XiaoguangHu01 (Contributor): LG API

@MingMingShangTian (Contributor): LGTM

@wawltor merged commit 3218075 into PaddlePaddle:develop on Sep 15, 2021
@DesmonDay changed the title from "Add paddle.cuda.device.stream_guard API" to "Add paddle.device.cuda.stream_guard API" on Sep 26, 2021
AnnaTrainingG pushed a commit to AnnaTrainingG/Paddle that referenced this pull request Sep 29, 2021