Bad file descriptor when using NVIDIA Messaging Accelerator (VMA) #6923

Open
boranby opened this issue Oct 21, 2024 · 7 comments
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug.

Comments

@boranby
boranby commented Oct 21, 2024

Version
tokio v1.40.0
│ └── tokio-macros v2.4.0 (proc-macro)

Platform
The output of uname -a (UNIX), or version and 32 or 64-bit (Windows)
Linux server2 5.14.0-427.40.1.el9_4.x86_64+rt #1 SMP PREEMPT_RT Fri Oct 4 15:00:44 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux

Description
Building a single-threaded Tokio runtime fails with a "Bad file descriptor" error.

The crashing line:
let rt = tokio::runtime::Builder::new_current_thread().enable_all().build().unwrap();
The error:
called Result::unwrap() on an Err value: Os { code: 9, kind: Uncategorized, message: "Bad file descriptor" }

The code is structured like this:

std::thread::spawn(move || {
    core_affinity::set_for_current(core_affinity::CoreId { id: 7 });
    publisher::run_publisher();
});
pub fn run_publisher() {
    // build a single thread tokio runtime
    let rt = tokio::runtime::Builder::new_current_thread().enable_all().build().unwrap();

    rt.block_on(async move {...});
...
}

I expected the runtime to be built and the async part of the code to run.
Note: When I run a test using #[tokio::test], the code runs without any issues.

Instead, this happened: called Result::unwrap() on an Err value: Os { code: 9, kind: Uncategorized, message: "Bad file descriptor" }
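
For readers decoding the error value: `Os { code: 9, ... }` is how Rust's standard library formats a `std::io::Error` that was built from a raw OS error number, and 9 is `EBADF` on Linux. A minimal std-only sketch that reconstructs such an error is shown below. The helper name `os_error` is made up for illustration and is not part of Tokio; the sketch only shows how to read the code back out, it does not reproduce the failure itself.

```rust
use std::io;

/// Hypothetical helper: rebuild an `io::Error` from a raw OS error number,
/// the same kind of value a failed system call produces.
fn os_error(code: i32) -> io::Error {
    io::Error::from_raw_os_error(code)
}

fn main() {
    // 9 is EBADF ("Bad file descriptor") on Linux.
    let err = os_error(9);

    // Debug formatting gives the `Os { code: .., kind: .., message: .. }` shape
    // seen in the panic message above.
    println!("{:?}", err);

    // The raw number can be read back for programmatic checks.
    println!("raw_os_error = {:?}", err.raw_os_error());
}
```

Matching on `raw_os_error()` instead of calling `unwrap()` on the builder result would let the publisher thread log the failing code and exit cleanly rather than panic.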

@boranby boranby added A-tokio Area: The main tokio crate C-bug Category: This is a bug. labels Oct 21, 2024
@Darksonn
Contributor

Are you using forking?

@boranby
Author

boranby commented Oct 21, 2024

I am not using forking.
I have 5 synchronous threads pinned to cores and need only 1 async thread for Nanomsg. I want to use Tokio as the async runtime for that thread.

@boranby
Author

boranby commented Oct 21, 2024

Because of that, I don't want to build a Tokio runtime for the whole project.

@Darksonn
Contributor

Using Tokio on a single thread is perfectly fine. Unfortunately, it's unclear what the issue is. Could you share a minimal reproducible example, or any other information about what you are doing that is unusual?

@boranby
Author

boranby commented Oct 21, 2024

use tokio::runtime::Builder;

fn main() {
    core_affinity::set_for_current(core_affinity::CoreId { id: 3 });

    std::thread::spawn(|| {
        core_affinity::set_for_current(core_affinity::CoreId { id: 4 });
        run_publisher();
    });

    loop {}
}

fn run_publisher() {
    let rt = Builder::new_current_thread().enable_all().build().unwrap();

    rt.block_on(async move {
        println!("Hello, world!");
    });
}

When I run this on its own, it works without any issue.
However, in the production code, we use https://github.com/Mellanox/libvma.
If you run the code above with LD_PRELOAD=libvma.so, the runtime builder fails: called Result::unwrap() on an Err value: Os { code: 9, kind: Uncategorized, message: "Bad file descriptor" }

@Darksonn Darksonn changed the title on Oct 21, 2024
from: "Bad file descriptor" on tokio::runtime::Builder::new_current_thread().enable_all().build()
to: Bad file descriptor when using NVIDIA Messaging Accelerator (VMA)
@Darksonn
Contributor

It sounds like libvma does something weird to the process. This isn't something I'm able to debug myself. Can you figure out which operation within Tokio is emitting the bad fd error? Does disabling the process/signal features of Tokio fix the error?

@boranby
Author

boranby commented Oct 21, 2024

For the code snippet above, changing the dependency from tokio = {version = "1.40.0", features = ["full"]} to tokio = {version = "1.40.0", features = ["rt"]} fixed the issue. I was also able to add the "macros" and "time" features without the error coming back.

Thank you very much for your help!
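
For anyone landing here with the same error, the dependency change described above would look roughly like this in Cargo.toml. This is a sketch of the configuration reported in this thread, not a confirmed fix for every libvma setup. The thread does not establish which of the features dropped from "full" triggers the error under LD_PRELOAD=libvma.so.

```toml
[dependencies]
# Reported working in this thread: runtime only, plus macros and time.
tokio = { version = "1.40.0", features = ["rt", "macros", "time"] }

# Reported failing when run with LD_PRELOAD=libvma.so:
# tokio = { version = "1.40.0", features = ["full"] }
```

If more of Tokio is needed, re-adding the remaining features one at a time (for example "net", "process", "signal") and rebuilding after each would be one way to narrow down which one produces the bad file descriptor.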
