[Mono]: Potential deadlock during EventPipe rundown using interpreter. #58996

Merged

lateralusX
Member

Identified a potential deadlock between the finalizer thread and rundown enumerating all interpreter methods. Rundown queries the method name in the callback while iterating interpreter methods (interp_jit_info_foreach), which might lead to additional loader activity. The hash map in the interpreter that keeps the methods is locked with the default JIT memory manager, but since the callback might end up in mono_class_create_from_typedef, which takes the loader lock, that code path has the lock order memory manager->loader lock. The finalizer thread invokes OnThreadExiting using the interpreter, and that might end up with the reverse lock order, loader lock->memory manager, so the two paths have the potential to deadlock. This is not a problem under JIT or AOT, since the JIT hash table is lock free and therefore cannot cause a lock-order deadlock between the memory manager and the loader lock.

Issue hit on CI by at least:

#58781
#58599

This could be fixed by switching to a lock-free hash table in the interpreter for interp_code_hash, but that might be too risky at this point. A safer fix is to take a copy of the pointers while holding the lock and then iterate over the local copy (a simple array of pointers). Since this method is only called during rundown, only when using the interpreter, and the copy only includes the pointers (InterpMethod *) we use with the callback, it has some temporary memory impact (allocating an array of pointers), but it mitigates the deadlock since we can safely call the iterator callback without holding the lock. It will also improve interpreter performance in situations where we run session rundown, since the lock will be held for a much shorter amount of time.
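The copy-then-iterate approach described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `Method`, `table_lock`, `loader_lock`, `table_add` and `methods_foreach_snapshot` are hypothetical stand-ins (for `InterpMethod`, the memory manager lock, the loader lock and the interpreter's method table), and a fixed array replaces the real hash table. It also assumes the pointed-to methods stay alive after the lock is released.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for InterpMethod; the real struct is much larger. */
typedef struct {
    const char *name;
} Method;

typedef void (*MethodCallback)(Method *method, void *user_data);

/* Hypothetical stand-ins for the memory manager lock and the loader lock. */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t loader_lock = PTHREAD_MUTEX_INITIALIZER;

/* A fixed array stands in for the interpreter's method hash table. */
#define TABLE_CAPACITY 8
static Method *table[TABLE_CAPACITY];
static size_t table_count;

static void table_add(Method *method)
{
    pthread_mutex_lock(&table_lock);
    if (table_count < TABLE_CAPACITY)
        table[table_count++] = method;
    pthread_mutex_unlock(&table_lock);
}

/* Copy the pointers while holding the table lock, then invoke the callback
 * with the lock released. Returns the number of methods visited, or -1 if
 * the temporary array could not be allocated. */
static int methods_foreach_snapshot(MethodCallback cb, void *user_data)
{
    Method **copy = NULL;
    size_t count;

    pthread_mutex_lock(&table_lock);
    count = table_count;
    if (count > 0) {
        copy = malloc(count * sizeof(Method *));
        if (!copy) {
            pthread_mutex_unlock(&table_lock);
            return -1;
        }
        for (size_t i = 0; i < count; i++)
            copy[i] = table[i];
    }
    pthread_mutex_unlock(&table_lock);

    /* The callback may take other locks (here: loader_lock) without ever
     * nesting them inside table_lock. */
    for (size_t i = 0; i < count; i++)
        cb(copy[i], user_data);

    free(copy);
    return (int)count;
}

/* Example callback that takes the loader lock, as a name lookup might. */
static void count_under_loader_lock(Method *method, void *user_data)
{
    pthread_mutex_lock(&loader_lock);
    if (method->name)
        (*(int *)user_data)++;
    pthread_mutex_unlock(&loader_lock);
}
```

Because the callback only runs after `table_lock` is released, this path never holds both locks at once, so it cannot take part in a lock-order inversion with a thread that acquires them in the opposite order. The cost is one temporary allocation proportional to the number of entries.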

@ghost

ghost commented Sep 12, 2021

Tagging subscribers to this area: @BrzVlad
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: lateralusX
Assignees: -
Labels: area-Codegen-Interpreter-mono

Milestone: -

@BrzVlad
Member

BrzVlad commented Sep 12, 2021

@lateralusX Is it possible that this also fixes #56449?

@lateralusX
Member Author

lateralusX commented Sep 12, 2021

> @lateralusX Is it possible that this also fixes #56449?

@BrzVlad A test must start/stop EventPipe sessions with rundown enabled to potentially hit this deadlock. Normally only specific EventPipe tests do that on CI, so unless this test runs an EventPipe session, it won't trigger these code paths.

@lateralusX
Member Author

lateralusX commented Sep 12, 2021

/backport to release/6.0

@marek-safar
Contributor

/backport to release/6.0

@github-actions
Contributor

Started backporting to release/6.0: https://github.com/dotnet/runtime/actions/runs/1228737819
