
【Hackathon 6th Article No.3】Guide to Using Tensor Indexing & Learning Notes #838

Merged

luotao1 merged 8 commits into PaddlePaddle:master from AndSonder:index

Apr 9, 2024

Contributor

AndSonder commented Mar 26, 2024 •

edited


【Hackathon 6th】Call for and Promotion of Outstanding Articles Paddle#62907

AndSonder added 2 commits

March 26, 2024 22:59


          add tensor index using guide

39a8309

add

982c622

paddle-bot bot commented Mar 26, 2024

Your PR has been submitted. Thanks for your contribution!
Please check its format and content. For this, you can refer to Template and Demo.

paddle-bot bot added the contributor label


          update

da17f20

luotao1 mentioned this pull request

【Hackathon 6th】Call for and Promotion of Outstanding Articles PaddlePaddle/Paddle#62907

Closed


          update

606bb24

luotao1 assigned luotao1 and zoooo0820


          update

e887b2a

Contributor Author

AndSonder commented Mar 28, 2024

@zoooo0820 The first draft is finished; could you please review it? ~

zoooo0820 reviewed


rfcs/Article/20240321_guide_to_using_index.md Outdated

+              #  [16 17 18 19]
+              #  [20 21 22 23]]
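+              # First select the second 2x3 sub-tensor, then, within its inner level, select the elements whose index along the first dimension is 0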

Contributor

zoooo0820 Apr 1, 2024

The wording “次内部” (“inner level”) here reads somewhat awkwardly.

rfcs/Article/20240321_guide_to_using_index.md

+              # Tensor Output:
+              # [[[[0, 1, 2]],
+              #   [[3, 4, 5]]]]

Contributor

zoooo0820 Apr 1, 2024

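For the None-related examples in this section, could you also print the shape of each result? Looking only at the printed values is not very intuitive.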
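
The shapes the reviewer asks for can be worked out by hand: each `None` in an index inserts a new axis of size 1, and each `:` keeps one existing axis. The pure-Python sketch below encodes only that rule. The helper `shape_after_none_index` is hypothetical (not a Paddle API) and supports only `None` and full slices; because the input of the printed example is not shown in this excerpt, it assumes that output came from a 2x3 Tensor indexed as `x[None, :, None, :]`, which gives shape [1, 2, 1, 3].

```python
def shape_after_none_index(shape, index):
    """Return the result shape of indexing a tensor of `shape` with `index`.

    Sketch only: `index` may contain None (insert a size-1 axis) and
    slice(None), i.e. ':' (keep the next existing axis unchanged).
    Axes not consumed by the index are appended unchanged.
    """
    result = []
    axis = 0  # next input axis that has not been consumed yet
    for item in index:
        if item is None:
            result.append(1)  # None adds a new axis of size 1
        elif item == slice(None):
            result.append(shape[axis])  # ':' keeps the current axis
            axis += 1
        else:
            raise ValueError("this sketch only supports None and ':'")
    result.extend(shape[axis:])  # trailing axes are kept as-is
    return result


# Assumed example: a 2x3 tensor indexed like x[None, :, None, :]
print(shape_after_none_index([2, 3], (None, slice(None), None, slice(None))))
# [1, 2, 1, 3]
```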

rfcs/Article/20240321_guide_to_using_index.md


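		None indexing makes it much easier to work with Tensors of different shapes: it lets you flexibly adjust a Tensor's dimensions to fit the needs of subsequent computations, which is very practical.

		> Note: In dynamic graph mode, when values are read through basic indexing, the output is a view of the original Tensor; that is, modifying the output Tensor also changes the values of the original Tensor. In static graph mode, the output is a new Tensor. Because the two modes behave differently, use this feature with care.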

Contributor

zoooo0820 Apr 1, 2024

Could you add a usage example illustrating the view behavior?

rfcs/Article/20240321_guide_to_using_index.md Outdated

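+. When the index is a boolean Tensor/Ndarray/List:
+                  - The rank of the index must be less than or equal to the rank of the indexed Tensor
+                  - Each dimension of the index must have the same size as the corresponding dimension of the indexed Tensor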

Contributor

zoooo0820 Apr 1, 2024

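“引的每一维度大小必须与被索引Tensor对应维度相同” (“each dimension size of the index must match the corresponding dimension of the indexed Tensor”)

A character is missing from this sentence: “引” should be “索引” (“index”).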
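
The two boolean-index rules can be checked mechanically. The sketch below is plain Python with hypothetical helper names (not Paddle APIs): it validates a nested-list mask against a tensor shape and returns the shape of `x[mask]`, assuming the NumPy-style result layout, namely the number of `True` entries followed by the dimensions the mask does not cover. An all-`False` mask yields a shape containing 0.

```python
def nested_shape(data):
    """Shape of a rectangular nested list, e.g. [[1, 2, 3], [4, 5, 6]] -> [2, 3]."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        data = data[0] if data else None
    return shape


def count_true(mask):
    """Number of True entries in a nested-list mask."""
    if isinstance(mask, list):
        return sum(count_true(m) for m in mask)
    return int(bool(mask))


def bool_index_shape(x_shape, mask):
    """Result shape of x[mask] under the two rules above (sketch only)."""
    m_shape = nested_shape(mask)
    # Rule 1: the mask rank must not exceed the tensor rank.
    if len(m_shape) > len(x_shape):
        raise ValueError("mask rank must be <= tensor rank")
    # Rule 2: each mask dimension must equal the matching tensor dimension.
    for dim, (m, s) in enumerate(zip(m_shape, x_shape)):
        if m != s:
            raise ValueError(f"mask dim {dim} has size {m}, tensor dim has size {s}")
    # Selected positions collapse into one axis; uncovered dimensions are kept.
    return [count_true(mask)] + list(x_shape[len(m_shape):])


mask = [[True, False, True], [False, False, True]]  # shape [2, 3]
print(bool_index_shape([2, 3, 4], mask))  # [3, 4]
print(bool_index_shape([2, 3], [False, False]))  # [0, 3], a 0-size result
```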

rfcs/Article/20240321_guide_to_using_index.md Outdated

+              # Tensor Output: []
+              ```
+              Note that if no element is selected during boolean indexing, the output will be a 0-dimensional Shape Tensor that contains no data.

Contributor

zoooo0820 Apr 1, 2024

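“输出将是一个 0 维 Shape Tensor” (“the output will be a 0-dimensional Shape Tensor”)

Let's consistently use “0-size Tensor” here, meaning tensor.numel = 0: it holds no data and its shape contains a 0. By contrast, a 0-D Tensor means tensor.rank = 0 and tensor.numel = 1, with shape = ().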
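
The distinction comes down to arithmetic on the shape: `numel` is the product of the shape entries, and the product of an empty shape is 1. A minimal pure-Python sketch of the two definitions above (the `describe` helper is hypothetical, not a Paddle API):

```python
import math


def describe(shape):
    """Return (rank, numel, kind) for a tensor shape, per the definitions above."""
    rank = len(shape)
    numel = math.prod(shape)  # the product of an empty shape is 1
    if numel == 0:
        kind = "0-size Tensor"  # shape contains a 0, so there is no data
    elif rank == 0:
        kind = "0-D Tensor"  # shape == (), exactly one element
    else:
        kind = "ordinary Tensor"
    return rank, numel, kind


for shape in [(), (0,), (3, 0), (2, 3)]:
    print(shape, describe(shape))
```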

rfcs/Article/20240321_guide_to_using_index.md

+              In dynamic graph mode, __setitem__ can still be used; under the hood, a dynamic-to-static conversion strategy guarantees correctness.
+              ```python
+              import paddle

Contributor

zoooo0820 Apr 1, 2024

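Here you could also explain how to write the second argument, for example how to express `:` in this position, and how to write it when several axes are indexed at the same time.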
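
When a subscript such as `x[:, 1]` is written, Python hands `__getitem__`/`__setitem__` a single key object: a bare `:` becomes `slice(None, None, None)`, `a:b` becomes `slice(a, b, None)`, and comma-separated indices for several axes become a tuple. The hypothetical `KeyRecorder` class below simply returns that key so it can be inspected. This sketch assumes, without confirming it against the Paddle documentation, that the index argument of a function such as `paddle.static.setitem(x, index, value)` takes these same Python objects.

```python
class KeyRecorder:
    """Returns whatever key Python builds for a subscript expression."""

    def __getitem__(self, key):
        return key


k = KeyRecorder()
print(k[:])  # slice(None, None, None)
print(k[1:3])  # slice(1, 3, None)
print(k[:, 1])  # (slice(None, None, None), 1)
print(k[0, 1:, None])  # (0, slice(1, None, None), None)
```

Under that assumption, an assignment written as `x[:, 1] = v` would correspond to passing the tuple `(slice(None), 1)` as the index argument.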

rfcs/Article/20240321_guide_to_using_index.md Outdated

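+              In Paddle, indexing operations support automatic differentiation: the framework computes the correct gradients from the inputs and outputs of the indexing operation. Specifically, for a Tensor a of shape (M, N), executing b = a[index], where index has shape (X, Y), backpropagation follows these rules:
+. Forward pass: let the forward output for a be Out; then Out.shape = (X, Y, N), where Out[i,j,:] is the value of a[index[i,j]].
+. Backward pass: let the gradient of Out be dOut; the following code expresses this process: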

Contributor

zoooo0820 Apr 1, 2024

Could the display format here be improved?
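
The forward and backward rules quoted above amount to a gather followed by a scatter-add: each row picked by `index[i,j]` receives `dOut[i,j,:]`, and rows picked more than once accumulate their gradients. The pure-Python sketch below makes that explicit with hypothetical helper names; it illustrates the rule only and is not Paddle's actual kernel.

```python
def index_forward(a, index):
    """Out[i][j] = a[index[i][j]]: gather whole rows of the M x N matrix a."""
    return [[list(a[r]) for r in row] for row in index]


def index_backward(d_out, index, m, n):
    """Scatter-add dOut back into an M x N gradient; repeated rows accumulate."""
    d_a = [[0.0] * n for _ in range(m)]
    for i, row in enumerate(index):
        for j, r in enumerate(row):
            for k in range(n):
                d_a[r][k] += d_out[i][j][k]
    return d_a


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # M=3, N=2
index = [[0, 2], [0, 1]]  # X=2, Y=2; row 0 is picked twice
out = index_forward(a, index)  # shape (X, Y, N) = (2, 2, 2)
d_out = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]  # dOut filled with ones
print(index_backward(d_out, index, m=3, n=2))
# [[2.0, 2.0], [1.0, 1.0], [1.0, 1.0]]
```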

rfcs/Article/20240321_guide_to_using_index.md Outdated

+              confidence_threshold = 0.8
+              selected_indices = paddle.nonzero(pred_scores > confidence_threshold)
+              selected_boxes = paddle.index_select(pred_boxes, selected_indices)
+              selected_classes = paddle.index_select(pred_classes, selected_indices)

Contributor

zoooo0820 Apr 1, 2024

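Could this be rewritten in the b = a[index] form, or replaced with a different example? The examples here should introduce that form of writing; general index-type APIs are outside the scope of this article.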

rfcs/Article/20240321_guide_to_using_index.md

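		## 8. Learning Notes

		Indexing is a very important part of deep learning: it helps us process and manipulate data efficiently. In my view, indexing is a fairly hard concept for beginners because it involves many details and tricks. Especially with high-dimensional data, we need to be clear about what each dimension means. Take a four-dimensional image tensor [B, C, H, W]: B is the batch size, C the channel, H the height, and W the width. Indexing the first dimension with 0 takes the data of the first sample in the batch. Indexing the second dimension with 0 takes the data of the first channel. Indexing the first and second dimensions together with [0, 0] takes the first channel of the first sample. By indexing layer by layer like this, we can extract exactly the data we want. It also helps to think spatially: a two-dimensional tensor is a plane, a three-dimensional tensor is a cube, and a four-dimensional tensor is a stack of cubes. Thinking this way makes indexing easier to understand. As the saying goes, what is learned from books is shallow; true understanding comes from doing. The most important thing in learning indexing is hands-on practice: write plenty of code, and fluency will follow.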

Contributor

zoooo0820 Apr 1, 2024

纬度 -> 维度 (typo: “纬度”, “latitude”, should be “维度”, “dimension”)
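
The layer-by-layer reading of a [B, C, H, W] tensor described in the learning notes can be tried without any framework. The sketch below models such a tensor as nested Python lists, a stand-in for a Paddle Tensor chosen only so the example runs anywhere; the helpers `make_tensor` and `shape_of` are hypothetical, and `x[:, 0]` is emulated with a list comprehension because plain lists do not support multi-axis subscripts.

```python
def make_tensor(shape, start=0):
    """Build a nested list of the given shape filled with start, start+1, ..."""
    if not shape:
        return start
    step = 1
    for s in shape[1:]:
        step *= s  # number of scalars in one sub-block
    return [make_tensor(shape[1:], start + i * step) for i in range(shape[0])]


def shape_of(t):
    """Shape of a rectangular nested list."""
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        t = t[0]
    return shape


x = make_tensor([2, 3, 4, 5])  # [B, C, H, W]
first_sample = x[0]  # index dim 0 -> [C, H, W]
first_channel_all = [b[0] for b in x]  # index dim 1 only, like x[:, 0] -> [B, H, W]
sample0_channel0 = x[0][0]  # like x[0, 0] -> [H, W]
print(shape_of(first_sample), shape_of(first_channel_all), shape_of(sample0_channel0))
# [3, 4, 5] [2, 4, 5] [4, 5]
```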

rfcs/Article/20240321_guide_to_using_index.md

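		## 9. Summary

		Indexing is a very important part of deep learning: it helps us process and manipulate data efficiently. This article introduced the basic concepts and usage of indexing, covering basic indexing, advanced indexing, index assignment, and gradient propagation through indexing, and used practical cases to show how indexing is applied in different fields. I hope this article helps readers better understand and use indexing and makes their data processing and model development more efficient.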

Contributor

zoooo0820 Apr 1, 2024

“希本本文的内容能够” (“I hope this article's content can”, with “希望” mistyped as “希本”)

There is still a typo here.

zoooo0820 reviewed


rfcs/Article/20240321_guide_to_using_index.md



		## 7. Practical Cases of Indexing

Contributor

zoooo0820 Apr 1, 2024

This section could also include a recent example from NLP, such as the LLaMA model; PaddleNLP can serve as a code reference.


          update

fe771ec

Contributor Author

AndSonder commented Apr 1, 2024

I've made all the changes you requested; could you please review it again?

AndSonder requested a review from zoooo0820

April 1, 2024 07:41


          update

55c9953

zoooo0820 previously approved these changes


Contributor

zoooo0820 left a comment

LGTM

luotao1 reviewed


Collaborator

luotao1 left a comment

In the GitHub preview, the images are displayed so large that they hurt readability; they need to be made smaller.


          Update 20240321_guide_to_using_index.md

577e639

AndSonder dismissed zoooo0820’s stale review via

577e639

April 8, 2024 11:19

luotao1 approved these changes


luotao1 merged commit 9f6491e into PaddlePaddle:master

1 check passed
