Update on "Switch CI to pytorch 1.13"
Closes #515
Closes #514

Note:
The `static_argnums` argument to `memory_efficient_fusion` has been removed, so some code had to be updated
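As an illustration of the kind of update this requires (a hedged sketch, not the exact change in this commit; `bias_gelu` and its `approximate` argument are made up for the example), static arguments can be bound with `functools.partial` before calling `memory_efficient_fusion`:

```python
from functools import partial

import torch
from functorch.compile import memory_efficient_fusion


def bias_gelu(x, bias, approximate="none"):  # hypothetical function to fuse
    return torch.nn.functional.gelu(x + bias, approximate=approximate)


# Previously, non-tensor arguments could be marked as static:
#   fused = memory_efficient_fusion(bias_gelu, static_argnums=(2,))
# Now, bind them up front and fuse the resulting callable instead:
fused = memory_efficient_fusion(partial(bias_gelu, approximate="tanh"))
```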

[ghstack-poisoned]
danthe3rd committed Nov 14, 2022
2 parents 985792f + 3362adf commit 9b8a17d
Showing 89 changed files with 793 additions and 547 deletions.
4 changes: 3 additions & 1 deletion BENCHMARKS.md
@@ -17,7 +17,7 @@ Some examples, generated with `python3 xformers/benchmarks/benchmark_encoder.py

## Benchmark the core sparse attention mechanisms

`python3 xformers./benchmarks/benchmark_core.py` will measure the speed of the core sparse attention mechanism. The current numbers are as follows (times in microseconds (us)):
`python3 xformers/benchmarks/benchmark_core.py` will measure the speed of the core sparse attention mechanism. The current numbers are as follows (times in microseconds (us)):

| | **matmul_with_mask** | | **softmax** | | **bmm** | |
| ---------------------- | --------------------- | ---------------------- | --------------------- | ---------------------- | --------------------- | ---------------------- |
@@ -37,6 +37,8 @@ Some examples, generated with `python3 xformers/benchmarks/benchmark_encoder.py

## Triton layers

Please note that, as of November 2022, these layers are not optimized for typical production GPUs (they have not been developed for some time and were mostly tested on a laptop GPU), and better performance is probably possible with minor changes, as other libraries have demonstrated since xFormers was released.

### Fused softmax

You can reproduce these numbers locally by running `python3 xformers/benchmarks/benchmark_triton_softmax.py`. The units are GB/s. These results are for a laptop nVidia 3080, Triton 2.0 and PyTorch 1.12.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -6,6 +6,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## TBD
### Fixed
- Updated triton dependency [#418]

### Added

160 changes: 52 additions & 108 deletions README.md
@@ -7,8 +7,8 @@
<br/><!--
![PyPI](https://img.shields.io/pypi/v/xformers)
![PyPI - License](https://img.shields.io/pypi/l/xformers)
-->
[![Documentation Status](https://github.com/facebookresearch/xformers/actions/workflows/gh-pages.yml/badge.svg)](https://github.com/facebookresearch/xformers/actions/workflows/gh-pages.yml/badge.svg)
-->
[![CircleCI](https://circleci.com/gh/facebookresearch/xformers.svg?style=shield)](https://app.circleci.com/pipelines/github/facebookresearch/xformers/)
[![Codecov](https://codecov.io/gh/facebookresearch/xformers/branch/main/graph/badge.svg?token=PKGKDR4JQM)](https://codecov.io/gh/facebookresearch/xformers)
[![black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
@@ -19,115 +19,52 @@
-->
--------------------------------------------------------------------------------

## Description

xFormers is a modular and field-agnostic library to flexibly generate transformer architectures from interoperable and optimized building blocks. These blocks are not limited to xFormers and can also be cherry-picked as the user sees fit.

## Getting started

The full [documentation](https://facebookresearch.github.io/xformers/) contains instructions for getting started, deep dives and tutorials about the various APIs.
If in doubt, please check out the [HOWTO](HOWTO.md). Only some general considerations are laid out in the README.
## xFormers - Toolbox to Accelerate Research on Transformers

For recent changes, you can have a look at the [changelog](CHANGELOG.md).
xFormers is:
- **Customizable building blocks**: Independent/customizable building blocks that can be used without boilerplate code. The components are domain-agnostic and xFormers is used by researchers in vision, NLP and more.
- **Research first**: xFormers contains bleeding-edge components that are not yet available in mainstream libraries like PyTorch.
- **Built with efficiency in mind**: Because speed of iteration matters, components are as fast and memory-efficient as possible. xFormers contains its own CUDA kernels, but dispatches to other libraries when relevant.

## Installing xFormers

### Installation

To install xFormers, it is recommended to use a dedicated virtual environment, as is often the case with Python, for instance through `python-virtualenv` or `conda`.
PyTorch must be installed. Using conda, for example:
* **(RECOMMENDED) Using binaries**: We provide binaries for Linux and recent PyTorch versions. Install xFormers with conda:

```bash
conda create --name xformers python=3.10
conda activate xformers
conda install -c pytorch -c conda-forge cudatoolkit=11.6 pytorch=1.12.1
conda install xformers -c xformers/label/dev
```

*Please note that PyTorch 1.12 or newer is required.*

There are two ways you can install xFormers locally:

<details><summary> Conda dev packages </summary><p>

There are regular builds of xformers as it is developed on the `main` branch.
To use these, you must be on Linux and have a conda environment with Python 3.9 or 3.10, CUDA 11.3 or 11.6, and PyTorch 1.12.1.
You can install the latest with

```bash
conda install xformers -c xformers/label/dev
```

</p></details>

<details><summary> Build from source (dev mode) </summary><p>

These commands will fetch the latest version of the code and then install xFormers from source.
If you want to build the sparse attention CUDA kernels, please make sure that the next point is covered prior to running these instructions.

```bash
git clone [email protected]:facebookresearch/xformers.git
git submodule update --init --recursive
conda create --name xformer_env python=3.8
conda activate xformer_env
cd xformers
pip install -r requirements.txt
pip install -e .
# or, for OSX
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ pip install -e .
```

</p></details>

### Installing custom (non-pytorch) parts
* **From source**: Alternatively, if no binaries are available (for instance for Windows), you can also install from source:

<details><summary> Sparse attention kernels </summary><p>

Installing the CUDA-based sparse attention kernels may require extra care, as this mobilizes the CUDA toolchain. As a reminder, these kernels are built when you run `pip install -e .` and the CUDA build chain (the NVCC compiler) is available. Rebuilding can be done, for instance, via `python3 setup.py clean && python3 setup.py develop`, or similarly by wiping the `build` folder and re-running `pip install -e .`.

Some advice related to building these CUDA-specific components, tentatively addressing common pitfalls. Please make sure that:

* NVCC and the current CUDA runtime match. Depending on your setup, you may be able to change the CUDA runtime with `module unload cuda; module load cuda/xx.x`, possibly also `nvcc`
* the version of GCC that you're using matches the current NVCC capabilities
* the `TORCH_CUDA_ARCH_LIST` env variable is set to the architectures that you want to support. A suggested setup (slow to build but comprehensive) is `export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;8.0;8.6"`

</p></details>

<details><summary> Triton </summary><p>

Some parts of xFormers use [Triton](http://www.triton-lang.org), and will only be exposed if Triton is installed and a compatible GPU is present (an NVIDIA GPU with tensor cores). If Triton was not installed as part of the testing procedure, you can install it directly by running `pip install triton`. You can optionally test that the installation is successful by running one of the Triton-related benchmarks, for instance `python3 xformers/benchmarks/benchmark_triton_softmax.py`.

Triton will cache the compiled kernels to `/tmp/triton` by default. If this becomes an issue, this path can be specified through the `TRITON_CACHE_DIR` environment variable.
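For example, to point the cache at a persistent location (the path below is only an example) before re-running a Triton benchmark:

```bash
# Keep Triton's compiled-kernel cache outside /tmp (example path)
export TRITON_CACHE_DIR="$HOME/.cache/triton"
python3 xformers/benchmarks/benchmark_triton_softmax.py
```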

</p></details>
```bash
# (Optional) Makes the build much faster
pip install ninja
# Set TORCH_CUDA_ARCH_LIST if running and building on different GPU types
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
# (this can take dozens of minutes)
```

<details><summary> AOTAutograd/NVFuser </summary><p>
* **pip wheels**: There is no updated package available on pip; please install from conda or from source

Some parts of xFormers use AOT Autograd from the [FuncTorch](https://pytorch.org/functorch/stable/) library, and will only be exposed if FuncTorch is installed and a compatible GPU is present. If FuncTorch was not installed as part of the testing procedure, you can install it directly through pip.

```bash
pip install functorch
```
## Results

Once installed, set the flag `_is_functorch_available = True` in `xformers/__init__.py`. You can optionally test that the installation is successful by running one of the functorch-related benchmarks, for instance `python3 xformers/benchmarks/benchmark_nvfuser.py`.
**Memory-efficient MHA**
![Benchmarks for ViTS](./docs/plots/mha/mha_vit.png)
*Setup: A100 on f16, measured total time for a forward+backward pass*

If you are importing the xFormers library in a script, you can modify the flag as such:
Note that this is exact attention, not an approximation, obtained just by calling [`xformers.ops.memory_efficient_attention`](https://facebookresearch.github.io/xformers/components/ops.html#xformers.ops.memory_efficient_attention).
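A minimal usage sketch, assuming a CUDA GPU, half-precision inputs and the `(batch, seq_len, num_heads, head_dim)` layout; the shapes below are purely illustrative:

```python
import torch
import xformers.ops as xops

# Illustrative shapes: batch=2, seq_len=1024, heads=8, head_dim=64
q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.half)
k = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.half)
v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.half)

# Exact attention, computed without materializing the full attention matrix
out = xops.memory_efficient_attention(q, k, v)  # same shape as q
```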

```python
import xformers
xformers._is_functorch_available = True
```
**More benchmarks**

</p></details>
xFormers provides many components, and more benchmarks are available in [BENCHMARKS.md](BENCHMARKS.md).

### Testing the installation
### (Optional) Testing the installation

This will run a benchmark of the attention mechanisms exposed by xFormers, and generate a runtime and memory plot.
If this concludes without errors, the installation is successful. This step is optional, and requires some extra dependencies: `pip install -r requirements-benchmark.txt`.

Once this is done, you can run this particular benchmark as follows:
This command will provide information on an xFormers installation, and what kernels are built/available:

```python
python3 xformers/benchmarks/benchmark_encoder.py --activations relu --plot -emb 256 -bs 32 -heads 16
python -m xformers.info
```

## Using xFormers
@@ -147,6 +84,8 @@ Models are thus not implemented in monolithic files, which are typically complic
### Repo map

```bash
├── ops # Functional operators
└ ...
├── components # Parts zoo, any of which can be used directly
│ ├── attention
│ │ └ ... # all the supported attentions
@@ -156,11 +95,7 @@ Models are thus not implemented in monolithic files, which are typically complic
│ │ └ ... # all the supported positional embeddings
│ ├── activations.py #
│ └── multi_head_dispatch.py # (optional) multihead wrap
├── factory # Build model programmatically
│ ├── block_factory.py # (optional) helper to programmatically generate layers
│ └── model_factory.py # (optional) helper to programmatically generate models
|
├── benchmarks
│ └ ... # A lot of benchmarks that you can use to test some parts
└── triton
@@ -258,26 +193,33 @@ Patrick et al., 2021](https://arxiv.org/abs/2106.05392)*

1. Many attention mechanisms, interchangeable
2. Optimized building blocks, beyond PyTorch primitives
1. sparse attention
2. block-sparse attention
3. fused softmax
4. fused linear layer
5. fused layer norm
6. fused dropout(activation(x+bias))
1. Memory-efficient exact attention - up to 10x faster
2. sparse attention
3. block-sparse attention
4. fused softmax
5. fused linear layer
6. fused layer norm
7. fused dropout(activation(x+bias))
8. fused SwiGLU
3. Benchmarking and testing tools
1. [micro benchmarks](BENCHMARKS.md)
2. transformer block benchmark
3. [LRA](xformers/benchmarks/LRA/README.md), with SLURM suppot
3. [LRA](xformers/benchmarks/LRA/README.md), with SLURM support
4. Programmatic and sweep-friendly layer and model construction
1. Compatible with hierarchical Transformers, like Swin or Metaformer
5. Hackable
1. Not using monolithic CUDA kernels, composable building blocks
2. Using [Triton](https://triton-lang.org/) for some optimized parts, explicit, pythonic and user-accessible
3. Native support for SquaredReLU (on top of ReLU, LeakyReLU, GeLU, ...), extensible activations

### FAQ ?
### Install troubleshooting


* NVCC and the current CUDA runtime match. Depending on your setup, you may be able to change the CUDA runtime with `module unload cuda; module load cuda/xx.x`, possibly also `nvcc`
* the version of GCC that you're using matches the current NVCC capabilities
* the `TORCH_CUDA_ARCH_LIST` env variable is set to the architectures that you want to support. A suggested setup (slow to build but comprehensive) is `export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;7.5;8.0;8.6"`
* If the build from source runs out of memory, it's possible to reduce the parallelism of ninja with `MAX_JOBS` (e.g. `MAX_JOBS=2`), as in the sketch below
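A consolidated sketch of such a source build, with example values only (adjust the architecture list and job count to your setup):

```bash
# Example: build xFormers from source for a couple of architectures only,
# with ninja parallelism capped to limit memory usage during compilation
export TORCH_CUDA_ARCH_LIST="8.0;8.6"
MAX_JOBS=2 pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
```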

We've tried to collect a relatively exhaustive list of explanations in the [HOWTO](HOWTO.md).

### License

@@ -288,11 +230,11 @@ xFormers has a BSD-style license, as found in the [LICENSE](LICENSE) file.
If you use xFormers in your publication, please cite it by using the following BibTeX entry.

``` bibtex
@Misc{xFormers2021,
author = {Benjamin Lefaudeux and Francisco Massa and Diana Liskovich and Wenhan Xiong and Vittorio Caggiano and Sean Naren and Min Xu and Jieru Hu and Marta Tintore and Susan Zhang},
@Misc{xFormers2022,
author = {Benjamin Lefaudeux and Francisco Massa and Diana Liskovich and Wenhan Xiong and Vittorio Caggiano and Sean Naren and Min Xu and Jieru Hu and Marta Tintore and Susan Zhang and Patrick Labatut and Daniel Haziza},
title = {xFormers: A modular and hackable Transformer modelling library},
howpublished = {\url{https://github.com/facebookresearch/xformers}},
year = {2021}
year = {2022}
}
```

@@ -308,3 +250,5 @@ The following repositories are used in xFormers, either in close to original for
* [Nystromformer](https://github.com/mlpen/Nystromformer)
* [FairScale](https://github.com/facebookresearch/fairscale/)
* [Pytorch Image Models](https://github.com/rwightman/pytorch-image-models)
* [CUTLASS](https://github.com/nvidia/cutlass)
* [Flash-Attention](https://github.com/HazyResearch/flash-attention)
Binary file modified docs/plots/fused_linear/FusedLinear_fp16_FW_BW_gelu.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp16_FW_BW_leaky_relu.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp16_FW_BW_none.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp16_FW_BW_relu.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp16_FW_BW_smelu.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp16_FW_BW_squared_relu.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp16_FW_gelu.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp16_FW_leaky_relu.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp16_FW_none.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp16_FW_relu.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp16_FW_smelu.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp16_FW_squared_relu.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp32_FW_BW_gelu.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp32_FW_BW_leaky_relu.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp32_FW_BW_none.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp32_FW_BW_relu.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp32_FW_BW_squared_relu.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp32_FW_gelu.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp32_FW_leaky_relu.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp32_FW_none.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp32_FW_relu.png
Binary file modified docs/plots/fused_linear/FusedLinear_fp32_FW_squared_relu.png
Binary file modified docs/plots/fused_softmax/Softmax_Bandwidth_FW_BW_fp16.png
Binary file modified docs/plots/fused_softmax/Softmax_Bandwidth_FW_BW_fp32.png
Binary file modified docs/plots/fused_softmax/Softmax_Bandwidth_FW_fp16.png
Binary file modified docs/plots/fused_softmax/Softmax_Bandwidth_FW_fp32.png
Binary file modified docs/plots/layer_norm/LayerNorm_FW+BW_torch.float16.png
Binary file modified docs/plots/layer_norm/LayerNorm_FW+BW_torch.float32.png
Binary file modified docs/plots/layer_norm/LayerNorm_FW_torch.float16.png
Binary file modified docs/plots/layer_norm/LayerNorm_FW_torch.float32.png
Binary file removed docs/plots/mha/MHA_FW+bw_torch.float16.png
Binary file removed docs/plots/mha/MHA_FW_torch.float16.png
Binary file removed docs/plots/mha/MHA_FW_torch.float32.png
Binary file added docs/plots/mha/mha_vit.png
4 changes: 1 addition & 3 deletions docs/requirements.txt
@@ -1,8 +1,6 @@
recommonmark==0.5.0
docutils==0.17.1
sphinx==3.2.1
sphinx_rtd_theme==0.4.3
sphinxcontrib-programoutput==0.16
sphinx==5.0.0
git+https://github.com/pytorch/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme
torch>=1.6.0
numpy>=1.19.5
1 change: 1 addition & 0 deletions docs/source/components/index.rst
Expand Up @@ -4,6 +4,7 @@ API Reference
.. toctree::
:maxdepth: 2

ops
attentions
feedforward
position_embedding
8 changes: 8 additions & 0 deletions docs/source/components/ops.rst
@@ -0,0 +1,8 @@
Operators
======================

.. automodule:: xformers.ops
:members:
:show-inheritance:
:imported-members:
:member-order: bysource
3 changes: 1 addition & 2 deletions requirements-test.txt
@@ -27,5 +27,4 @@ hydra-core >= 1.1
fairscale >= 0.4.5

# Dependency for fused layers, optional
triton == 2.0.0.dev20221105
networkx
triton==2.0.0.dev20221105
7 changes: 6 additions & 1 deletion tests/test_mem_eff_attention.py
@@ -202,6 +202,7 @@ def create_tensors(
k,
kv,
*,
requires_grad=False,
attn_bias_type=None,
fmt: str = "BMK",
):
@@ -231,6 +232,7 @@ def create_tensors(
dispatch = xformers.ops.AttentionOpDispatch.from_arguments(
query=query, key=key, value=value, attn_bias=attn_bias
)
dispatch.requires_grad = requires_grad
if not op.supports(dispatch):
# Ensure we free memory to avoid OOMs
del query, key, value, attn_bias
@@ -498,7 +500,10 @@ def test_backward(
kv,
) = op_device_dtype_B_Mq_Mkv_H_K_Kv
query, key, value, attn_bias = create_tensors(
*op_device_dtype_B_Mq_Mkv_H_K_Kv, attn_bias_type=attn_bias_type, fmt=fmt
*op_device_dtype_B_Mq_Mkv_H_K_Kv,
requires_grad=True,
attn_bias_type=attn_bias_type,
fmt=fmt,
)
qkv = None

47 changes: 47 additions & 0 deletions tests/test_triton_basics.py
@@ -131,3 +131,50 @@ def test_sum_strided_asserts():
with pytest.raises(AssertionError):
# This kernel expects 2D tensors, assert to prevent misuse
sum_2d_dim_0(a)

@triton.jit
def k_rand(X, Y, SEED_X, SEED_Y, stride_x, stride_y, N: tl.constexpr):
# fmt: on
"""
Check the random number generation
"""

row = tl.program_id(0)

# Generate random numbers with seed A
rand_offsets = tl.arange(0, N)
seed_x = tl.load(SEED_X + row)
randx, _, _, _ = tl.randint4x(seed_x, rand_offsets)

rand_offsets = tl.arange(0, N)
seed_y = tl.load(SEED_Y + row)
randy, _, _, _ = tl.randint4x(seed_y, rand_offsets)

# Move to this row
tl.store(X + row * stride_x + tl.arange(0, N), randx)
tl.store(Y + row * stride_y + tl.arange(0, N), randy)

def test_rand():
# Check that the random generator used in triton works fine
torch.random.manual_seed(0)
x = torch.zeros((512, 32), device=torch.device("cuda"), dtype=torch.int32)
y = torch.zeros((512, 32), device=torch.device("cuda"), dtype=torch.int32)

M, N = x.shape

seeds_x = torch.randint(65536, (M,), device=x.device)
seeds_y = torch.randint(65536, (M,), device=x.device)

assert not torch.allclose(seeds_x, seeds_y)

# enqueue kernels, one per line
# fmt: off
k_rand[(M,)](
x, y,
seeds_x, seeds_y,
x.stride(0), y.stride(0),
N,
)
# fmt: on

assert not torch.allclose(x, y)