Gradient from taichi ndarray does not match torch for euclidean distance (latest nightly) #8101

Closed
oliver-batchelor opened this issue May 29, 2023 · 6 comments

@oliver-batchelor
Contributor

oliver-batchelor commented May 29, 2023

Hi,

Below is an example that uses a Taichi kernel on torch tensors with autograd, on the most recent nightly (taichi-nightly-1.7.0.post20230529), the latest torch from pip (2.0.1), Python 3.10.

The outputs (distances) are the same in both cases.

Gradients from torch (x.grad):

tensor([[-0.5774, -0.5774, -0.5774],
        [-0.5774, -0.5774, -0.5774]]) 

Gradients from taichi (x.grad):

tensor([[-1.1547, -1.1547, -1.1547],
        [-1.1547, -1.1547, -1.1547]]) 
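
For reference, the analytic gradient of ||x - y|| with respect to x is (x - y) / ||x - y||; here each row of x - y is [-6, -6, -6] with norm 6*sqrt(3), so each component should be -1/sqrt(3) ≈ -0.5774. The torch value matches that, and the taichi value is exactly double. A quick sanity check, separate from the script below:

  import math
  # each component of the expected gradient: -6 / (6 * sqrt(3)) = -1/sqrt(3)
  print(-1 / math.sqrt(3))   # -0.5773502691896258  (matches torch)
  print(-2 / math.sqrt(3))   # -1.1547005383792515  (matches taichi, i.e. doubled)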

A couple of other problems:

  • Running this with CPU tensors (and arch=ti.cpu) causes a memory allocation crash.
  • There does not seem to be an error if I don't call .contiguous() on grad_output - the result just seems (even more) different.

I am only moderately confident I'm using this the right way, so please excuse me if I'm abusing the system somehow!

import taichi as ti
import torch
from taichi.types import ndarray
device=torch.device('cuda', 0)

@ti.kernel
def pairwise_distances_kernel(x:ndarray(ti.math.vec3), y:ndarray(ti.math.vec3), distances:ndarray(ti.f32)):
  # euclidean distance between corresponding points; the top-level loop is parallelized by taichi
  for i in range(x.shape[0]):
    distances[i] = (x[i] - y[i]).norm()
  
class PointDistance(torch.autograd.Function):
  @staticmethod
  def forward(ctx, x:torch.Tensor, y:torch.Tensor):
      distances = torch.zeros((x.shape[0],), device=x.device, dtype=torch.float32)
      pairwise_distances_kernel(x, y, distances)
      ctx.save_for_backward(x, y, distances)
      return distances

  @staticmethod
  def backward(ctx, grad_output):
      x, y, distances = ctx.saved_tensors
      # pass the upstream gradient to taichi through the .grad attribute
      distances.grad = grad_output.contiguous()
      pairwise_distances_kernel.grad(x, y, distances)
      return x.grad, y.grad

def inputs():
  x = torch.arange(0, 6).reshape(2, 3).float().requires_grad_(True)
  y = torch.arange(6, 12).reshape(2, 3).float().requires_grad_(True)  
  return x, y

def test_torch():
  x, y = inputs() 
  distances = torch.norm(x - y, p=2, dim=1)
  loss = distances.sum()
  loss.backward()
  print(x.grad, y.grad)

def test_taichi():
  x, y = inputs()
  distances = PointDistance.apply(x, y)
  loss = distances.sum()
  loss.backward()
  print(x.grad, y.grad)

if __name__ == "__main__":
  ti.init(arch=ti.cuda, debug=True, log_level=ti.DEBUG)
  test_torch()
  test_taichi()

@ailzhang
Contributor

ailzhang commented Jun 2, 2023

@oliver-batchelor The reason Taichi returns a doubled result is this part:


  @staticmethod
  def backward(ctx, grad_output):
      x, y, distances = ctx.saved_tensors
      distances.grad = grad_output.contiguous()      
      pairwise_distances_kernel.grad(x, y, distances)
      return x.grad, y.grad

After running the pairwise_distances_kernel.grad() kernel, the gradients have already been accumulated on x.grad and y.grad. If you print them at that point you'll see the values are actually correct.

But when you return x.grad and y.grad to torch, it accumulates them into the inputs' gradients again. That's why the results you saw are doubled.

The root cause is that torch expects the backward function to have no side effects on gradients (it should only return them), whereas running the Taichi grad kernel does have a side effect.
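
To see this accumulation behaviour in isolation, here is a minimal sketch that has nothing to do with taichi (the Doubler name is made up just for illustration): a backward that both writes to the leaf's .grad and returns that same tensor ends up with doubled gradients.

  import torch

  class Doubler(torch.autograd.Function):
      @staticmethod
      def forward(ctx, x):
          ctx.save_for_backward(x)
          return x.sum()

      @staticmethod
      def backward(ctx, grad_output):
          x, = ctx.saved_tensors
          # side effect: write the gradient straight onto the leaf ...
          x.grad = torch.ones_like(x) * grad_output
          # ... and also return it, so autograd accumulates it a second time
          return x.grad

  x = torch.zeros(3, requires_grad=True)
  Doubler.apply(x).backward()
  print(x.grad)   # tensor([2., 2., 2.]) instead of the expected ones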

So there are two ways to work around it: undo the Taichi side effect before returning, or just return zero gradients to torch.
E.g.

  # This may work for this example, but manipulating .grad directly is dangerous, so the solution below is recommended.
  @staticmethod
  def backward(ctx, grad_output):
      x, y, distances = ctx.saved_tensors
      distances.grad = grad_output.contiguous()
      x.grad.fill_(0)
      y.grad.fill_(0)
      pairwise_distances_kernel.grad(x, y, distances)
      x_grad, y_grad = x.grad.clone(), y.grad.clone()
      x.grad.fill_(0)
      y.grad.fill_(0)
      print(x_grad, y_grad)
      return x_grad, y_grad
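
(The fill_(0) calls before and after the kernel do different jobs: the ones before clear any previously accumulated values so the taichi grad kernel starts from zero, and the ones after clear the side effect so that torch's accumulation of the returned clones does not count the gradient twice.)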

or

  # recommended
  @staticmethod
  def backward(ctx, grad_output):
      x, y, distances = ctx.saved_tensors
      distances.grad = grad_output.contiguous()      
      pairwise_distances_kernel.grad(x, y, distances)
      return torch.zeros_like(x), torch.zeros_like(y)
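
(Returning zeros relies on x and y being leaf tensors: the taichi grad kernel has already written the correct gradients into x.grad and y.grad, so accumulating the returned zeros on top of them is a no-op.)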

@oliver-batchelor
Contributor Author

oliver-batchelor commented Jun 7, 2023

Ah ha, thanks!

It did seem a little weird to me to be returning .grad, because normally the .grad attribute is None at that point.

I must admit I just followed the recipe from here (which I assume also has doubled gradients) - but that wouldn't necessarily cause big problems, just a doubled learning rate... depending on what you use it for.

https://github.com/taichi-dev/taichi-nerfs/blob/main/modules/hash_encoder.py

@bobcao3
Collaborator

bobcao3 commented Jun 14, 2023

We might want to fix the taichi-nerfs one to ensure hyperparameter compatibility, hmmm. But I will mark this one as resolved.

@oliver-batchelor
Contributor Author

From further testing, both of these methods seem flawed - I'll create some test cases, but in essence:

  1. With the first method, the direct predecessor node(s) in the graph always seem to end up with zero grad in this case.
  2. With the second method, the direct predecessor nodes have the correct grad, but the grad further up the graph is all zero.

It seems it's not safe to mess with .grad, nor is it safe to return zeros!

@oliver-batchelor
Contributor Author

On second glance I think method 1 actually does the right thing: normally the direct predecessor node won't have a grad anyway unless retain_grad is set to True, so it matches the pytorch behaviour here. The only difference is that in this case it will have zeros as grad rather than None, which is what it would be under pytorch autograd.
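
For reference, the retain_grad behaviour in question is just standard pytorch (nothing taichi-specific):

  import torch

  a = torch.ones(3, requires_grad=True)
  b = a * 2.0              # intermediate (non-leaf) node
  b.sum().backward()
  print(a.grad)            # tensor([2., 2., 2.]) - leaves get .grad populated
  print(b.grad)            # None - non-leaf .grad is not kept by default
  # (recent torch versions also warn about accessing a non-leaf .grad;
  #  calling b.retain_grad() before backward() would keep b.grad too)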

@oliver-batchelor
Contributor Author

Is it possible to create another attribute on the tensor object, like taichi_grad or some such, so that we don't have to mess with the torch .grad and thus avoid all these problems?
