From 9a343a0ad291e95417038d9c99e07d8cc1d945a7 Mon Sep 17 00:00:00 2001
From: Rachel Guo
Date: Wed, 5 Feb 2025 18:20:41 -0800
Subject: [PATCH] Loosen unit test `atol`/`rtol` tolerances to eliminate UT
 flakiness (#3664)

Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/3664

X-link: https://github.com/facebookresearch/FBGEMM/pull/739

Upon further local testing, we found that in roughly 1 out of 10 iterations
the fast_gemv unit test fails due to a few (0.1% to 0.2%) noisy outputs.
Loosen the tolerance level to reduce the flakiness.

An example failed test:

```
Mismatched elements: 1 / 1280 (0.1%)
Greatest absolute difference: 0.001953125 at index (0, 1064) (up to 0.001 allowed)
Greatest relative difference: 0.007415771484375 at index (0, 1064) (up to 0.001 allowed)
```

Reviewed By: q10

Differential Revision: D69208318

fbshipit-source-id: b2f820b4e7ab5f5e8fcfc59e398f9b3328a963d6
---
 fbgemm_gpu/experimental/gen_ai/test/quantize/quantize_test.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fbgemm_gpu/experimental/gen_ai/test/quantize/quantize_test.py b/fbgemm_gpu/experimental/gen_ai/test/quantize/quantize_test.py
index 04b337971..d5d8b378c 100644
--- a/fbgemm_gpu/experimental/gen_ai/test/quantize/quantize_test.py
+++ b/fbgemm_gpu/experimental/gen_ai/test/quantize/quantize_test.py
@@ -1131,7 +1131,7 @@ def test_bf16_gemv(self) -> None:
         z = torch.ops.fbgemm.bf16_fast_gemv(x, w)
         z_ref = (x @ w.T).to(torch.bfloat16).to("cuda")
 
-        torch.testing.assert_close(z, z_ref, atol=1.0e-3, rtol=1.0e-3)
+        torch.testing.assert_close(z, z_ref, atol=9.0e-3, rtol=9.0e-3)
 
     @unittest.skipIf(
         torch.version.hip, "Skip on AMD: cuda quantize op is yet supported."
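For reference, `torch.testing.assert_close` treats two values as close when
`|actual - expected| <= atol + rtol * |expected|`. Below is a minimal
standalone sketch, not part of the patch, showing why the mismatch in the
failure log above trips the old `1.0e-3` bounds but passes under `9.0e-3`;
the `expected` magnitude is inferred from the log and is illustrative only:

```
# A minimal sketch, not part of the patch: replays the tolerance math from
# the failure log. torch.testing.assert_close considers values close when
#     |actual - expected| <= atol + rtol * |expected|
import torch

# |expected| ~= 0.2634 is inferred from the log as abs diff / rel diff
# = 0.001953125 / 0.007415771484375. Both tensors here are illustrative.
expected = torch.tensor([0.2634])
actual = expected + 0.001953125  # the reported greatest absolute difference

# Old bounds: threshold = 1e-3 + 1e-3 * 0.2634 ~= 1.26e-3, so the 1.95e-3
# difference raises an AssertionError, matching the flaky failure.
try:
    torch.testing.assert_close(actual, expected, atol=1.0e-3, rtol=1.0e-3)
except AssertionError as e:
    print("old bounds fail:", e)

# New bounds: threshold = 9e-3 + 9e-3 * 0.2634 ~= 1.14e-2, so the same
# difference is within tolerance and the assertion passes.
torch.testing.assert_close(actual, expected, atol=9.0e-3, rtol=9.0e-3)
```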