Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix: Replace P2ALIGN with P2ALIGN_TYPED and delete P2ALIGN. #15940

Merged
merged 1 commit into from
May 10, 2024

Conversation

chenqiuhao1997
Copy link
Contributor

@chenqiuhao1997 chenqiuhao1997 commented Feb 28, 2024

Motivation and Context

Recently, our zfs-2.2.3-1 on 5.4.265-1.el8.elrepo.x86_64 always hung on write requests without hardware problems. Restarting server made no effect.
dmesg gave logs below:

[Mon Feb 26 13:18:56 2024] INFO: task txg_quiesce:44850 blocked for more than 122 seconds.
[Mon Feb 26 13:18:56 2024]       Tainted: P           OE     5.4.265-1.el8.elrepo.x86_64 #1
[Mon Feb 26 13:18:56 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Feb 26 13:18:56 2024] txg_quiesce     D    0 44850      2 0x80004080
[Mon Feb 26 13:18:56 2024] Call Trace:
[Mon Feb 26 13:18:56 2024]  __schedule+0x284/0x6d0
[Mon Feb 26 13:18:56 2024]  schedule+0x2f/0xa0
[Mon Feb 26 13:18:56 2024]  cv_wait_common+0xf8/0x130 [spl]
[Mon Feb 26 13:18:56 2024]  ? wait_woken+0x80/0x80
[Mon Feb 26 13:18:56 2024]  txg_quiesce+0x1c1/0x250 [zfs]
[Mon Feb 26 13:18:56 2024]  txg_quiesce_thread+0xe7/0x170 [zfs]
[Mon Feb 26 13:18:56 2024]  ? txg_sync_thread+0x3e0/0x3e0 [zfs]
[Mon Feb 26 13:18:56 2024]  ? spl_taskq_fini+0x70/0x70 [spl]
[Mon Feb 26 13:18:56 2024]  thread_generic_wrapper+0x6f/0x80 [spl]
[Mon Feb 26 13:18:56 2024]  kthread+0x10c/0x130
[Mon Feb 26 13:18:56 2024]  ? kthread_park+0x80/0x80
[Mon Feb 26 13:18:56 2024]  ret_from_fork+0x1f/0x40

After adding debug messages in codes, build and reinstall, we found our write requests hung in the for loop in dmu_object_alloc_impl. With more logs below and analyzing, we finally found this bug in P2ALIGN.

1709025269   dmu_object.c:102:dmu_object_alloc_impl(): txg 556663, dmu_tx 00000000be568d82, object 32640, dnodes_per_chunk 128, dn_slots 2, cond true
1709025269   dmu_object.c:178:dmu_object_alloc_impl(): txg 556663, dmu_tx 00000000be568d82, object 32640, dn_slots 2
1709025269   dmu_object.c:189:dmu_object_alloc_impl(): txg 556663, dmu_tx 00000000be568d82, object 32640, dnode_hold_impl error 28
1709025269   dmu_object.c:102:dmu_object_alloc_impl(): txg 556663, dmu_tx 00000000be568d82, object 32768, dnodes_per_chunk 128, dn_slots 2, cond true
1709025269   dmu_object.c:157:dmu_object_alloc_impl(): txg 556663, dmu_tx 00000000be568d82, object 32768, dnode_next_offset error 0
1709025269   dmu_object.c:178:dmu_object_alloc_impl(): txg 556663, dmu_tx 00000000be568d82, object 4294969024, dn_slots 2
1709025269   dmu_object.c:189:dmu_object_alloc_impl(): txg 556663, dmu_tx 00000000be568d82, object 4294969024, dnode_hold_impl error 28
1709025269   dmu_object.c:102:dmu_object_alloc_impl(): txg 556663, dmu_tx 00000000be568d82, object 4294969056, dnodes_per_chunk 128, dn_slots 2, cond false
1709025269   dmu_object.c:178:dmu_object_alloc_impl(): txg 556663, dmu_tx 00000000be568d82, object 4294969056, dn_slots 2
1709025269   dmu_object.c:189:dmu_object_alloc_impl(): txg 556663, dmu_tx 00000000be568d82, object 4294969056, dnode_hold_impl error 28
1709025269   dmu_object.c:102:dmu_object_alloc_impl(): txg 556663, dmu_tx 00000000be568d82, object 4294969088, dnodes_per_chunk 128, dn_slots 2, cond true
1709025269   dmu_object.c:178:dmu_object_alloc_impl(): txg 556663, dmu_tx 00000000be568d82, object 1792, dn_slots 2
1709025269   dmu_object.c:189:dmu_object_alloc_impl(): txg 556663, dmu_tx 00000000be568d82, object 1792, dnode_hold_impl error 28
1709025269   dmu_object.c:102:dmu_object_alloc_impl(): txg 556663, dmu_tx 00000000be568d82, object 1920, dnodes_per_chunk 128, dn_slots 2, cond true
1709025269   dmu_object.c:178:dmu_object_alloc_impl(): txg 556663, dmu_tx 00000000be568d82, object 1920, dn_slots 2
1709025269   dmu_object.c:189:dmu_object_alloc_impl(): txg 556663, dmu_tx 00000000be568d82, object 1920, dnode_hold_impl error 28
			/*
			 * Note: if "restarted", we may find a L0 that
			 * is not suitably aligned.
			 */
			os->os_obj_next_chunk =
			    P2ALIGN(object, dnodes_per_chunk) +
			    dnodes_per_chunk;
			(void) atomic_swap_64(cpuobj, object);
			mutex_exit(&os->os_obj_lock);
#define        P2ALIGN(x, align)               ((x) & -(align))

In P2ALIGN, the result would be incorrect when align is unsigned integer and x is larger than max value of the type of align. In that case, -(align) would be a positive integer, which means high bits would be zero and finally stay zero after & when align is converted to a larger integer type.

Description

Replace P2ALIGN with P2ALIGN_TYPED and delete P2ALIGN.

How Has This Been Tested?

After fixed with this commit, our zfs has never hung again.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@chenqiuhao1997 chenqiuhao1997 changed the title Type conversion for align type promotion in P2ALIGN Bug fix: type conversion for align type promotion in P2ALIGN Feb 28, 2024
Copy link
Contributor

@youzhongyang youzhongyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice! I tested using a simple C program:

root@ubuntu-dev:~# cat test.c
#include <stdio.h>
#include <sys/types.h>
#include <stdint.h>

typedef unsigned int            uint_t;

#define P2ALIGN(x, align)       ((x) & -(align))
#define P2ALIGNNEW(x, align)    ((x) & -((sizeof(x)>sizeof(align))?(typeof(x))(align):(align)))

int main()
{
   uint64_t i64 = 0xabcdef00123456ff;
   uint_t align = 128;

   uint64_t aligned_value = P2ALIGN(i64, align);
   uint64_t aligned_value_new = P2ALIGNNEW(i64, align);

   printf("i64 = 0x%016lx\n", i64);
   printf("P2ALIGN(0x%016lx, %u) = 0x%016lx\n", i64, align, aligned_value);
   printf("P2ALIGNNEW(0x%016lx, %u) = 0x%016lx\n", i64, align, aligned_value_new);

   return(0);
}


root@ubuntu-dev:~# ./test
i64 = 0xabcdef00123456ff
P2ALIGN(0xabcdef00123456ff, 128) = 0x0000000012345680
P2ALIGNNEW(0xabcdef00123456ff, 128) = 0xabcdef0012345680

@chenqiuhao1997
Copy link
Contributor Author

I have updated my code to meet the OpenZFS code style requirements.
Is there anything else I need to do before merging the PR?

@chenqiuhao1997
Copy link
Contributor Author

Do you have any more suggestions? @youzhongyang

We feel that this is a serious bug. It makes zfs inevitably hang on write requests without hardware problems when there are lots of files. In our case, the number of files per zfs exceeds 100 million. 3 of our 6 zfs hung in sequence, which made our 4+2 EC cluster down. We also tried remaking zpool and zfs, but the result remained the same.

@amotin
Copy link
Member

amotin commented Mar 8, 2024

Just for a note, there is already a bunch of *_TYPED() macros, like P2ALIGN_TYPED().

@chenqiuhao1997
Copy link
Contributor Author

@amotin Thanks for your reply.
We have found the macro P2ALIGN_TYPED before, but we have no idea of which option below is better.

  1. Replace P2ALIGN with P2ALIGN_TYPED where necessary.
  2. Replace P2ALIGN with P2ALIGN_TYPED everywhere and delete P2ALIGN .
  3. Convert type for align in P2ALIGN when necessary.

The 1st option could accurately fix the bug, but it could not prevent future mistakes from occurring. The 2nd option could require developers to treat the type of P2ALIGN_TYPED carefully to avoid this mistake, but its impact is a bit big. The 3rd option could avoid this mistake through code, but the code is a bit dirty.

Because options 1 and 2 require us to read and understand the code in different places, which could cause omissions or mistakes, we chose option 3 as our temporary patch. But in this PR, depending on your opinions and based on your review, options 1 and 2 can also be chosen.

So, what do you think about these three options?

@youzhongyang
Copy link
Contributor

@amotin Thanks for your reply. We have found the macro P2ALIGN_TYPED before, but we have no idea of which option below is better.

  1. Replace P2ALIGN with P2ALIGN_TYPED where necessary.
  2. Replace P2ALIGN with P2ALIGN_TYPED everywhere and delete P2ALIGN .
  3. Convert type for align in P2ALIGN when necessary.

The 1st option could accurately fix the bug, but it could not prevent future mistakes from occurring. The 2nd option could require developers to treat the type of P2ALIGN_TYPED carefully to avoid this mistake, but its impact is a bit big. The 3rd option could avoid this mistake through code, but the code is a bit dirty.

Because options 1 and 2 require us to read and understand the code in different places, which could cause omissions or mistakes, we chose option 3 as our temporary patch. But in this PR, depending on your opinions and based on your review, options 1 and 2 can also be chosen.

So, what do you think about these three options?

I would suggest go with option #2, which requires you to review all instances of P2ALIGN and make changes accordingly. It's a bit of work, but it will make sure the same type of mistake won't be repeated in the future.

@chenqiuhao1997 chenqiuhao1997 changed the title Bug fix: type conversion for align type promotion in P2ALIGN Bug fix: Replace P2ALIGN with P2ALIGN_TYPED and delete P2ALIGN. Mar 14, 2024
@chenqiuhao1997
Copy link
Contributor Author

@youzhongyang Thanks for your suggestion.
I have finished option 2 and updated the title and description of this PR. Please review it when you have time. Thanks a lot.

@youzhongyang
Copy link
Contributor

It looks good to me.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Apr 4, 2024
@behlendorf
Copy link
Contributor

@chenqiuhao1997 could you rebase this and force update the PR. That will let us a get a clean CI run so this can be merged.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels May 9, 2024
In P2ALIGN, the result would be incorrect when align is unsigned
integer and x is larger than max value of the type of align.
In that case, -(align) would be a positive integer, which means
high bits would be zero and finally stay zero after '&' when
align is converted to a larger integer type.

Signed-off-by: Qiuhao Chen <[email protected]>
@chenqiuhao1997
Copy link
Contributor Author

@chenqiuhao1997 could you rebase this and force update the PR. That will let us a get a clean CI run so this can be merged.

@behlendorf the rebased commit 2d11c79 has been force pushed. Thanks for your follow up and view.

@behlendorf behlendorf merged commit 41ae864 into openzfs:master May 10, 2024
24 checks passed
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request May 10, 2024
In P2ALIGN, the result would be incorrect when align is unsigned
integer and x is larger than max value of the type of align.
In that case, -(align) would be a positive integer, which means
high bits would be zero and finally stay zero after '&' when
align is converted to a larger integer type.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Youzhong Yang <[email protected]>
Signed-off-by: Qiuhao Chen <[email protected]>
Closes openzfs#15940
behlendorf pushed a commit to behlendorf/zfs that referenced this pull request May 10, 2024
In P2ALIGN, the result would be incorrect when align is unsigned
integer and x is larger than max value of the type of align.
In that case, -(align) would be a positive integer, which means
high bits would be zero and finally stay zero after '&' when
align is converted to a larger integer type.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Youzhong Yang <[email protected]>
Signed-off-by: Qiuhao Chen <[email protected]>
Closes openzfs#15940
@chenqiuhao1997 chenqiuhao1997 deleted the debug-for-P2ALIGN branch May 11, 2024 04:38
behlendorf pushed a commit that referenced this pull request May 13, 2024
In P2ALIGN, the result would be incorrect when align is unsigned
integer and x is larger than max value of the type of align.
In that case, -(align) would be a positive integer, which means
high bits would be zero and finally stay zero after '&' when
align is converted to a larger integer type.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Youzhong Yang <[email protected]>
Signed-off-by: Qiuhao Chen <[email protected]>
Closes #15940
defaziogiancarlo pushed a commit to LLNL/zfs that referenced this pull request May 22, 2024
In P2ALIGN, the result would be incorrect when align is unsigned
integer and x is larger than max value of the type of align.
In that case, -(align) would be a positive integer, which means
high bits would be zero and finally stay zero after '&' when
align is converted to a larger integer type.

Cherry-picked-from: openzfs#16186

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Youzhong Yang <[email protected]>
Signed-off-by: Qiuhao Chen <[email protected]>
Closes openzfs#15940
Signed-off-by: Gian-Carlo DeFazio <[email protected]>
@amotin
Copy link
Member

amotin commented May 22, 2024

One of our customers just hit this problem with billions of files in a wild. It would be good to merge this patch (may be without removal of the macros for compatibility) into 2.2.5 release.

@behlendorf
Copy link
Contributor

It's already queued up in the staging branch for 2.2.5, #16186. That version does drop the macros, is there some other known consumer of the macros?

@amotin
Copy link
Member

amotin commented May 25, 2024

It's already queued up in the staging branch for 2.2.5, #16186.

Thanks. I've already noticed that.

That version does drop the macros, is there some other known consumer of the macros?

No. I'd just prefer to not break APIs in stable branches without need, but whatever.

lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
In P2ALIGN, the result would be incorrect when align is unsigned
integer and x is larger than max value of the type of align.
In that case, -(align) would be a positive integer, which means
high bits would be zero and finally stay zero after '&' when
align is converted to a larger integer type.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Youzhong Yang <[email protected]>
Signed-off-by: Qiuhao Chen <[email protected]>
Closes openzfs#15940
0mp pushed a commit to 0mp/zfs that referenced this pull request Dec 3, 2024
In P2ALIGN, the result would be incorrect when align is unsigned
integer and x is larger than max value of the type of align.
In that case, -(align) would be a positive integer, which means
high bits would be zero and finally stay zero after '&' when
align is converted to a larger integer type.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Youzhong Yang <[email protected]>
Signed-off-by: Qiuhao Chen <[email protected]>
Closes openzfs#15940
(cherry picked from commit 9edf6af)
0mp pushed a commit to 0mp/zfs that referenced this pull request Dec 3, 2024
In P2ALIGN, the result would be incorrect when align is unsigned
integer and x is larger than max value of the type of align.
In that case, -(align) would be a positive integer, which means
high bits would be zero and finally stay zero after '&' when
align is converted to a larger integer type.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Youzhong Yang <[email protected]>
Signed-off-by: Qiuhao Chen <[email protected]>
Closes openzfs#15940
(cherry picked from commit 9edf6af)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants