
SOLR-17240: Fix Http2SolrClient maxConnectionsPerDestination usage #2410

Open · wants to merge 2 commits into main
Conversation

@HoustonPutman
Contributor

@markrmiller
Member

Oh wow, I thought it was just defaulting to 4 and you couldn't necessarily change that everywhere in internal use, because I saw setMaxConnectionsPerHost in the builder. I didn't realize that it is a complete no-op in http2 mode. That seems like a silly choice in any scenario.

@magibney
Contributor

The Javadoc on HttpSolrClientBuilderBase.withMaxConnectionsPerHost(int) should be updated to explain the new difference in how this is handled for http1 vs http2.

I wonder, to approximate similar high-level behavior for http1 vs http2, could the http2 client, under the hood, interpret this setting as ((maxConnectionsPerDest - 1) / 300) + 1, or something? I don't like having that 300 constant hardcoded there, as presumably the multiplexing factor in jetty could change ... but idk, something along those lines might address Sanjay's concern.
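The mapping suggested above could look something like the following rough sketch. The class and method names (`Http2ConnectionMath`, `deriveMaxConnections`) are hypothetical, and 300 is just the placeholder constant from the comment, standing in for whatever multiplexing factor Jetty actually uses:

```java
// Hypothetical helper sketching the suggested mapping: given an HTTP/1-style
// maxConnectionsPerHost, derive how many multiplexed HTTP/2 connections would
// provide roughly the same total concurrency.
public final class Http2ConnectionMath {
    public static int deriveMaxConnections(int maxConnectionsPerHost, int streamsPerConnection) {
        // Ceiling division with a floor of 1: ((n - 1) / k) + 1
        return ((maxConnectionsPerHost - 1) / streamsPerConnection) + 1;
    }
}
```

With a multiplexing factor of 300, anything up to 300 requested connections maps to a single HTTP/2 connection, 301 through 600 maps to two, and so on.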

@HoustonPutman
Contributor Author

OK, I've updated the docs and used maxConcurrentPushedStreams to get the multiplexing factor.

This should probably be good to go?
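For context, the Jetty knobs in play look roughly like this. This is a sketch assuming Jetty's 9.4/10 client API (HTTP2Client, HttpClientTransportOverHTTP2), not the actual SolrJ change in this PR:

```java
import org.eclipse.jetty.client.HttpClient;
import org.eclipse.jetty.http2.client.HTTP2Client;
import org.eclipse.jetty.http2.client.http.HttpClientTransportOverHTTP2;

public class Http2ClientConfigSketch {
    public static HttpClient build(int maxConnectionsPerDestination) throws Exception {
        HTTP2Client http2Client = new HTTP2Client();
        HttpClient httpClient = new HttpClient(new HttpClientTransportOverHTTP2(http2Client));
        // With HTTP/2, each connection multiplexes many streams, so this limit
        // has a very different effect than the HTTP/1 maxConnectionsPerHost.
        httpClient.setMaxConnectionsPerDestination(maxConnectionsPerDestination);
        httpClient.start();
        return httpClient;
    }
}
```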

@markrmiller
Member

> under the hood, interpret this setting as ((maxConnectionsPerDest - 1) / 300) + 1, or something? I don't like having that 300 constant hardcoded there, as presumably the multiplexing factor in jetty could change ... but idk, something along those lines might address Sanjay's concern.

I don't think you want to do that; this should do what it says and set the max connections, not relate to multiplexing. Multiplexing should be its own setter if you want it to be configurable, but trying to tie the two together really wouldn't do the user any favors.

@HoustonPutman
Contributor Author

> > under the hood, interpret this setting as ((maxConnectionsPerDest - 1) / 300) + 1, or something? I don't like having that 300 constant hardcoded there, as presumably the multiplexing factor in jetty could change ... but idk, something along those lines might address Sanjay's concern.
>
> I don't think you want to do that; this should do what it says and set the max connections, not relate to multiplexing. Multiplexing should be its own setter if you want it to be configurable, but trying to tie the two together really wouldn't do the user any favors.

Fair point, Mark. I'll make these things independent, add documentation, and make the defaults across Solr sensible for both http1 and http2.

@epugh
Contributor

epugh commented Jun 19, 2024

Anything I can do to help this move forward? (Testing wise!).

@iamsanjay
Contributor

I came across an article where they tested various combinations of concurrent streams and number of HTTP connections.

https://blog.vespa.ai/http2/

> To compare the throughput of HTTP/1.1 vs HTTP/2 in the Vespa container, we measured the rate of HTTP requests (QPS) we could obtain on a single node with 48 CPU cores and 256GB RAM, using h2load as a benchmarking tool. As expected, single connection throughput increased significantly with HTTP/2. HTTP/2 with 256 concurrent streams gave a throughput of 115 000 requests per seconds compared to 6 500 for HTTP/1.1. Increasing the number of connections from 1 to 128 increased throughput to 115 000 for HTTP/1.1, while HTTP/2 gave 125 000 with the same setup (without request multiplexing). HTTP/2 was also more efficient, with lower CPU utilization—a consequence of its compact protocol representation and header compression. The highest throughput for HTTP/2 was observed with 4 clients and 256 concurrent streams. This configuration resulted in 225 000 requests per seconds—roughly double the best case for HTTP/1 (128 connections). The h2load tool was CPU constrained for the single connection benchmark as it could only utilize a single CPU core per connection. Having 4 connections removed the CPU bottleneck. Increasing beyond 4 connections resulted in gradually more overhead and degraded throughput.

For them, 4 clients with 256 concurrent streams gave the highest throughput.

We should test different configurations to find the one that works well for us. Also, I believe most of the recovery code in the main branch now uses HTTP/2, so we could start by testing that part.

NOTE: maxConcurrentStreams is a server-level configuration.
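Since maxConcurrentStreams is advertised by the server, configuring it lives on the server side. As a sketch assuming Jetty's HTTP2ServerConnectionFactory API (a plain Jetty server here, not Solr's actual startup path):

```java
import org.eclipse.jetty.http2.server.HTTP2CServerConnectionFactory;
import org.eclipse.jetty.server.HttpConfiguration;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.ServerConnector;

public class Http2ServerConfigSketch {
    public static Server build(int port, int maxConcurrentStreams) {
        Server server = new Server();
        HTTP2CServerConnectionFactory h2c =
            new HTTP2CServerConnectionFactory(new HttpConfiguration());
        // The server advertises this limit via HTTP/2 SETTINGS; clients cannot
        // open more concurrent streams per connection than this.
        h2c.setMaxConcurrentStreams(maxConcurrentStreams);
        ServerConnector connector = new ServerConnector(server, h2c);
        connector.setPort(port);
        server.addConnector(connector);
        return server;
    }
}
```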


This PR has had no activity for 60 days and is now labeled as stale. Any new activity or converting it to draft will remove the stale label. To attract more reviewers, please tag people who might be familiar with the code area and/or notify the [email protected] mailing list. Thank you for your contribution!

@github-actions github-actions bot added the stale PR not updated in 60 days label Aug 20, 2024
@epugh
Contributor

epugh commented Aug 20, 2024

@iamsanjay did you get a chance to look at this one from a testing perspective? Seems like such a great ticket...

@github-actions github-actions bot removed the stale PR not updated in 60 days label Aug 23, 2024
@dsmiley
Contributor

dsmiley commented Sep 6, 2024

Curious if this change has been tested/deployed in any way?


This PR has had no activity for 60 days and is now labeled as stale. Any new activity will remove the stale label. To attract more reviewers, please tag people who might be familiar with the code area and/or notify the [email protected] mailing list. To exempt this PR from being marked as stale, make it a draft PR or add the label "exempt-stale". If left unattended, this PR will be closed after another 60 days of inactivity. Thank you for your contribution!

@github-actions github-actions bot added the stale PR not updated in 60 days label Nov 19, 2024
Labels: client:solrj, stale (PR not updated in 60 days)

9 participants