Can't create more than 64 client connections #449

Closed
AlfonsoSanz opened this issue Jul 28, 2023 · 10 comments

@AlfonsoSanz

AlfonsoSanz commented Jul 28, 2023

Clients beyond the 64th fail with an MQTT error:
After creating 100 clients and connecting them one by one to the broker, clients 64 through 99 give a Connect error [-1] on client.connect()->wait().

mqtt::connect_options connect_options;  // default options
std::vector<mqtt::async_client*> clients;
for (int i = 0; i < 100; i++) {
    clients.push_back(new mqtt::async_client("localhost", "cpp-mqtt-" + std::to_string(i)));
    clients[i]->start_consuming();
    try {
        // Fails with Connect error [-1] from the 65th client onward
        clients[i]->connect(connect_options)->wait();
    } catch (const mqtt::exception& exc) {
        std::cerr << "\n  " << exc << std::endl;
        return 1;
    }
    clients[i]->subscribe("test-" + std::to_string(i), 0)->wait();
    std::cout << "Connected " << i << std::endl;
}

I cannot find any option or setting to increase the number of possible connections.

@fpagliughi
Contributor

OK... I have to ask... What use case are you looking at where you need to be simultaneously connected to more than 64 different MQTT brokers?!? :-)

Honestly curious.

The thinking when these libraries were created was that you would probably need one connection for a typical device; two for a bridge application; maybe a few more for something esoteric. So the thought was never for a large number of connections.

But actually, this question was discussed recently in another GitHub issue, though I can't find it at the moment. It was either in this library, the Paho C lib, or the Rust one. This library and the Rust one both wrap the C lib, so that's where the limit is likely getting hit: in the C lib, or in the OS.

What platform are you using?

@AlfonsoSanz
Author

AlfonsoSanz commented Aug 2, 2023

I am not connecting to more than 64 brokers; I am connecting more than 64 different clients to the same broker.

The application is fairly complex; in summary, it is for monitoring and controlling simulated IoT devices and real physical models. For example, each simulated device needs its own client_id and broker connection, just as a real device has, and the interfaces that connect the real/simulated devices also require a connection to receive the real data and send instructions.
Basically, I have a separate thread for each device, and each of them has an async_client. But apparently, I cannot connect them all. I may need to switch to a single client and use callbacks to redirect the messages to each thread, but that would require a rework, and it would no longer be a "digital twin" of a real device connection.

I am using Windows. I could try Linux, but I doubt there would be any difference; also, the Python Paho MQTT library was able to connect 100 clients without problems, though I do not know whether it also uses the C library.

Any solution or help would be appreciated.

@fpagliughi
Contributor

Hmmm.... actually it may be a Windows issue after all. But first:

I found the discussions about this. I could have sworn they happened this past spring, but it was actually a year and a half ago!

First, someone mentioned that the Paho Rust client was crashing when trying to create more than 1,000 clients:
eclipse-paho/paho.mqtt.rust#143

I was able to verify this on Linux, but only after increasing my system/OS default limit of 1024 sockets per process.

This led back to a related Issue in the Paho C lib:
eclipse-paho/paho.mqtt.c#1033

It turned out that the C library was using select() for ease of portability across platforms. select() has a hard limit of 1024 sockets (at least on Linux), and there was a bug in the C lib near this limit.

Version 1.3.10 of Paho C switched to poll() by default, and the bug was fixed so it shouldn't crash, but no one working on the libraries tested the hard limits on the various platforms to see what the maximums might be. Again, this is not a use case for "normal" client apps.

So, step #1 - Make sure you're using a recent version of the Paho C library. I recommend the latest v1.3.12.

And then, according to my past self, you should theoretically be able to open at least 1,000 clients in a single app. If your OS allows it...

The number 64 is exceedingly suspicious. I think you may be hitting a limit with Windows. I found this:

https://learn.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2

The maximum number of sockets that a Windows Sockets application can use is not affected by the manifest constant FD_SETSIZE. This value, defined in the Winsock2.h header file, is used in constructing the FD_SET structures used with the select function. The default value in Winsock2.h is 64. If an application is designed to be capable of working with more than 64 sockets using the select and WSAPoll functions, the implementor should define the manifest FD_SETSIZE in every source file before including the Winsock2.h header file.

If that is the case, you might need to rebuild the Paho C library with that limit raised, probably via a build flag or something like that.
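To illustrate what the Microsoft docs describe (untested here, Windows-only): the define has to appear before the first winsock2.h include in every translation unit that builds fd_set structures, so for a library build a compiler flag such as /DFD_SETSIZE=1024 (MSVC) or -DFD_SETSIZE=1024 on the Paho C build is the more reliable route than editing one source file.

```cpp
// Windows-only fragment: FD_SETSIZE must be defined before the first
// include of winsock2.h in *every* translation unit, or set globally
// with a compiler flag when rebuilding the Paho C library.
#ifdef _WIN32
#  define FD_SETSIZE 1024   // must precede winsock2.h
#  include <winsock2.h>
#endif
```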

But an easier solution would likely be to just use a *nix system like Linux or Mac.

An alternate solution is to create an app that just opens one (or a few) connections, and then spawn as many instances of that app as you need to exercise your server.

@fpagliughi
Contributor

fpagliughi commented Aug 2, 2023

Oh, and the Python library is not based on the C lib, so its implementation is totally different and apparently does not use select() or [WSA]poll() to wait on all incoming connections at once. Therefore it's apparently not subject to this specific limit. A quick search suggests it would still hit some OS limit eventually, though I'm not sure what that is.

@AlfonsoSanz
Author

Interesting! If it is platform related, it might not be a problem, since even though this program is intended to run on a Windows Server for now, we are moving all our projects to Linux/Docker.
So I will run some tests on Linux with the library, and if that solves it, try to accelerate the migration to Linux.
Thanks for the suggestion @fpagliughi; I will post the results as soon as I run some tests.

@AlfonsoSanz
Author

You were correct @fpagliughi: it was a problem with Windows. The same code connected more than 64 devices on Linux (WSL).

The next step is either migrating to Linux or finding a way to increase that limit on Windows. I am going to try the second option first, since apparently some parts of the service need to run on the Windows machine and migration is not so easy.

I tried #define FD_SETSIZE 128 and other Windows configuration options, but it did not make any difference; the 65th client onward still fails to connect.

I am open to any ideas or options that can be done to fix this limitation on Windows.

@fpagliughi
Contributor

Sorry I don't know. I'm not much of a Windows programmer myself.

Did you set FD_SETSIZE in a rebuild of the Paho C library? It sounds like that might be just step 1 to get this working. From that same page above:

It must be emphasized that defining FD_SETSIZE as a particular value has no effect on the actual number of sockets provided by a Windows Sockets service provider. This value only affects the FD_XXX macros used by the select and WSAPoll functions.

Unfortunately I don't even have a Windows machine to test this out at the moment.

But if you pursue the Windows idea and find a solution, please do update this issue. I will leave it open, then update the docs/README appropriately to help out the next person. Either way, let me know.

@fpagliughi
Contributor

@AlfonsoSanz Any resolution to this?

@AlfonsoSanz
Author

Sorry @fpagliughi, no solution on my part; the project was migrated to Linux, where this limitation does not appear.
I tried defining FD_SETSIZE and a couple of other quick fixes found by searching, without success, but to be honest I did not spend much time on it.

@fpagliughi
Contributor

OK. Thanks for reporting it and keeping up with it as long as you did!
I'm going to say this is a platform issue beyond my control and close the issue. Please feel free to re-open if you discover anything new.
