[Hackathon 6th Article No.6] Sparse computing usage guide #880

Merged
merged 5 commits into from
Apr 30, 2024

Conversation

KeithMaxwell
Contributor

No description provided.

paddle-bot bot commented Apr 22, 2024

Your PR has been submitted. Thanks for your contribution!
Please check its format and content. For this, you can refer to Template and Demo.


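As shown, the sparse ResNet code is almost identical to the regular ResNet code. Swapping the import path is enough, and the original network code needs essentially no changes. With `from paddle.sparse import nn`, the code keeps the familiar `nn.*` style, which makes it easier to get started.

## 3. 3D Point Cloud CenterPoint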
Contributor

Change this to "Paddle sparse computing: practical cases".

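As shown, the sparse ResNet code is almost identical to the regular ResNet code. Swapping the import path is enough, and the original network code needs essentially no changes. With `from paddle.sparse import nn`, the code keeps the familiar `nn.*` style, which makes it easier to get started.

## 3. 3D Point Cloud CenterPoint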

Contributor

Add an introductory sentence saying that 3D point cloud CenterPoint will be used as the example.

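The main sparse formats supported by PaddlePaddle are:

- COO (Coordinate Format): represents the non-zero elements of a sparse matrix by their coordinates, using three arrays: row indices, column indices, and values.
- CSR (Compressed Sparse Row): compresses the rows of a sparse matrix to save storage space.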
Contributor

Describe how many arrays it contains.

```python
import numpy as np
import paddle.sparse as sparse
def random_sparse_tensor(shape, density, sparse_type='coo'):
```
Contributor

Could this be simplified? Just call paddle.rand, apply a random mask or dropout, and then call to_sparse_coo/to_sparse_csr.

@KeithMaxwell
Contributor Author

@zhwesky2010 I have made all the requested changes. Could you please take another look?

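- COO (Coordinate Format): represents the non-zero elements of a sparse matrix by their coordinates, using three arrays: row indices, column indices, and values.
- CSR (Compressed Sparse Row): compresses the rows of a sparse matrix to save storage space.
- COO (Coordinate Format): this format uses coordinates to represent the non-zero elements of a sparse matrix and involves three arrays: row indices, column indices, and values.
- CSR (Compressed Sparse Row): this format saves storage space by compressing the rows of a sparse matrix. It contains three arrays: the Index Pointers, indices, and Data arrays.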
Contributor

Row pointer information, column coordinates, and values

@@ -15,15 +15,22 @@

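The main sparse formats supported by PaddlePaddle are:

- COO (Coordinate Format): represents the non-zero elements of a sparse matrix by their coordinates, using three arrays: row indices, column indices, and values.
- CSR (Compressed Sparse Row): compresses the rows of a sparse matrix to save storage space.
- COO (Coordinate Format): this format uses coordinates to represent the non-zero elements of a sparse matrix and involves three arrays: row indices, column indices, and values.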
Contributor

Row coordinates, column coordinates, and values

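@@ -84,6 +91,12 @@ The CSR format also stores three arrays: Index Pointers, indices, and Data

For example, if the first pair of index pointers is `[0,2]`, it describes row 0 of the sparse matrix (the index of 0 in the Index Pointers array is 0) and indicates that row 0 contains two non-zero elements. The slice `Indices[0:2]` gives the column indices of these two elements, and `Data[0:2]` gives their values.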
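
The row-pointer lookup described above can be sketched in plain Python. This is only an illustration of the CSR layout under simple assumptions (a small 2-D list as input), not the `paddle.sparse` implementation; `dense_to_csr` and `csr_row` are hypothetical helper names, and `crows` is used for the Index Pointers array.

```python
# Pure-Python sketch of the CSR layout. The helper names are hypothetical
# and are not part of paddle.sparse.

def dense_to_csr(matrix):
    """Return (crows, cols, values) for a 2-D list of numbers."""
    crows, cols, values = [0], [], []
    for row in matrix:
        for c, value in enumerate(row):
            if value != 0:
                cols.append(c)
                values.append(value)
        # After each row, record how many non-zeros have been seen so far.
        crows.append(len(cols))
    return crows, cols, values


def csr_row(crows, cols, values, r):
    """Return the column coordinates and values of the non-zeros in row r."""
    a, b = crows[r], crows[r + 1]
    return cols[a:b], values[a:b]


dense = [
    [1, 0, 2],
    [0, 0, 0],
    [0, 3, 0],
]
crows, cols, values = dense_to_csr(dense)
# crows == [0, 2, 2, 3]: row 0 has 2 non-zeros, row 1 has none, row 2 has 1.
row0_cols, row0_values = csr_row(crows, cols, values, 0)
# row0_cols == [0, 2] and row0_values == [1, 2]
```

The first pointer pair here is `[0, 2]`, matching the example in the text: the slice `[0:2]` selects both the column coordinates and the values of row 0. An empty row shows up as two equal adjacent pointers.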

The CSR implementation in `paddle.sparse` also uses three lists:

- the 1-D list `crows`, which corresponds to the Index Pointers array;
Contributor

Same as above.


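COO format:

![COO Matrix](images/coo.gif)

The COO format stores three arrays: row indices (Row), column indices (Column), and values (Data). Using the index of an element in the Data array to look up the Row and Column arrays gives that element's position in the original matrix.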
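
As a rough illustration of the three COO arrays, the following pure-Python sketch converts a small dense matrix to COO and back. It is not the `paddle.sparse` implementation; `dense_to_coo` and `coo_to_dense` are hypothetical helper names chosen for this example.

```python
# Pure-Python sketch of the COO layout. The helper names are hypothetical
# and are not part of paddle.sparse.

def dense_to_coo(matrix):
    """Return (rows, cols, data) for the non-zero entries of a 2-D list."""
    rows, cols, data = [], [], []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value != 0:
                rows.append(r)
                cols.append(c)
                data.append(value)
    return rows, cols, data


def coo_to_dense(rows, cols, data, shape):
    """Rebuild the dense matrix from the three COO arrays."""
    dense = [[0] * shape[1] for _ in range(shape[0])]
    for r, c, value in zip(rows, cols, data):
        dense[r][c] = value
    return dense


dense = [
    [0, 5, 0],
    [0, 0, 7],
    [3, 0, 0],
]
rows, cols, data = dense_to_coo(dense)
# The k-th value data[k] sits at position (rows[k], cols[k]).
# rows == [0, 1, 2], cols == [1, 2, 0], data == [5, 7, 3]

# Stacked 2-D form: one list of row indices and one list of column indices.
indices = [rows, cols]
```

Each non-zero costs one entry in each of the three arrays, so the layout pays off only when most entries are zero.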

The COO implementation in `paddle.sparse` uses two lists:

- the 2-D list `indices`, which holds the row indices and column indices;
Contributor

Same as above.


dense_tensor = paddle.randn(shape)
dropout = paddle.nn.Dropout(p=density)
dense_tensor = dropout(dense_tensor)
Contributor

paddle.nn.functional.dropout
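
One point worth checking in the snippet under review: dropout's `p` is the probability of zeroing an element, so passing `density` as `p` leaves a non-zero fraction of roughly `1 - p`. The sketch below shows this with a pure-Python stand-in for the masking step. It does not call Paddle; `random_mask` and `density` are hypothetical helpers, and the scaling of surviving values that dropout applies in training mode is deliberately left out.

```python
import random

# Pure-Python stand-in for the masking step of dropout. The helper names are
# hypothetical; surviving values are NOT rescaled here.

def random_mask(values, p, seed=0):
    """Zero each entry independently with probability p (the drop probability)."""
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else v for v in values]


def density(values):
    """Fraction of entries that are non-zero."""
    return sum(1 for v in values if v != 0) / len(values)


values = [1.0] * 10_000
masked = random_mask(values, p=0.9)
# With p=0.9 roughly 10% of the entries survive, so the density is about 0.1.
observed = density(masked)
```

To obtain a tensor whose density is close to a target value `d`, the drop probability under this reading would be `1 - d`.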

if sparse_type == 'coo':
return sparse.sparse_coo_tensor(indices, values.tolist(), shape)
sparse_tensor = dense_tensor.to_sparse_coo(sparse_dim=dense_tensor.dim())
Contributor

The default sparse_dim should be fine here, right?

Contributor Author

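The default sparse_dim should be fine here, right?
@zhwesky2010
As the screenshot below shows, the code raised an error when I did not pass a value for the sparse_dim parameter:

My paddle version is the stable release 2.6.1.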

@KeithMaxwell
Contributor Author

@zhwesky2010
I have made the requested changes. Could you please take another look?

@@ -15,29 +15,31 @@

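The main sparse formats supported by PaddlePaddle are:

- COO (Coordinate Format): this format uses coordinates to represent the non-zero elements of a sparse matrix and involves three arrays: row indices, column indices, and values.
- CSR (Compressed Sparse Row): this format saves storage space by compressing the rows of a sparse matrix. It contains three arrays: the Index Pointers, indices, and Data arrays.
- COO (Coordinate Format): this format uses coordinates to represent the non-zero elements of a sparse matrix and involves three arrays: the row coordinate, column coordinate, and value arrays.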
Contributor

Use "contains three arrays".


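* Two adjacent elements of the Index Pointers array determine two pieces of information
* The coordinate r of the first element in the pointer pair is the row number in the sparse matrix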
Contributor

This sentence can be deleted.


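* Second, these values give the [start: stop] slice of the Indices array, and their difference is the number of non-zero elements in each row. Use the pointers to look up the indices and determine the column of each element in the data
* If a is not equal to b, the two values form a slice [a:b]. Indexing the column coordinate array with this slice gives the column coordinates of the non-zero elements in row r, and indexing the value array with it gives the values of the non-zero elements in row r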
Contributor

Make it simpler: if a is not equal to b, then indices[a: b] holds the column coordinates of the non-zero elements in row r, and Data[a: b] holds their values.


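The CSR format also stores three arrays: row pointer information (Index Pointers), column coordinates (indices), and values (Data).

* For two adjacent elements of the row pointer array, suppose their coordinates are r and r+1 and their values are a and b. Then the following can be determined: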
Contributor

The row pointer array records ...; suppose the coordinates of two adjacent elements in it are ...

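* The Indices array records the column index of each element.
* The Data array records the values of the elements
* The column coordinate array records the column coordinates of the non-zero elements.
* The value array records the values of the non-zero elements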

Contributor

This example is no longer needed. The r and r+1 case above already serves as an example.

@KeithMaxwell
Contributor Author

@zhwesky2010 I have made the requested changes. Could you please take another look?

Contributor

zhwesky2010 left a comment

LGTM

@zhwesky2010 zhwesky2010 merged commit aabe7b2 into PaddlePaddle:master Apr 30, 2024
1 check passed
@KeithMaxwell KeithMaxwell deleted the sparse branch April 30, 2024 06:13