FTP: Enable SFTP connector to issue more than one unconfirmed read request #2567

Merged
merged 3 commits into from
Feb 10, 2021

Conversation

conorgriffin
Contributor

Fixes: #2557

This change allows the SFTP connector to send multiple parallel read requests to an SFTP server, significantly improving throughput.

A new API has been added to SftpSettings: withMaxUnconfirmedReads(value: Int). When this value is 2 or greater, the SFTP client sends up to that many read requests to the server without waiting synchronously for an ACK on each one. This significantly improves performance, particularly on higher-latency connections.
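As a rough usage sketch, the new setting could be applied when building the connection settings, as below. The host name, credentials, and remote path are hypothetical placeholders, and the snippet assumes a build of akka-stream-alpakka-ftp that already includes this change; only withMaxUnconfirmedReads comes from this PR.

```scala
import java.net.InetAddress

import akka.stream.alpakka.ftp.{FtpCredentials, SftpSettings}
import akka.stream.alpakka.ftp.scaladsl.Sftp

object SftpReadExample {
  // Hypothetical host and credentials, for illustration only.
  val settings: SftpSettings =
    SftpSettings(InetAddress.getByName("sftp.example.com"))
      .withPort(22)
      .withCredentials(FtpCredentials.create("demo-user", "demo-password"))
      // Added by this PR: allow up to 64 read requests to be outstanding at once.
      .withMaxUnconfirmedReads(64)

  // Hypothetical remote path; the source only starts reading once it is run.
  val download = Sftp.fromPath("/data/large-file.bin", settings)
}
```

The value 64 here mirrors the figure used in the timings in this description; it is an illustrative choice, not a recommended default.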

The timings below show the time taken in milliseconds to download a 1GB file at various latencies between an Alpakka SFTP client and an openssh SFTP server.

| Latency | 64 reads (ms) | 32 reads (ms) | 16 reads (ms) | 8 reads (ms) |
|---------|---------------|---------------|---------------|--------------|
| 0ms     | 32683         | 35815         | 35298         | 38902        |
| 20ms    | 42407         | 48456         | 52239         | 87970        |
| 40ms    | 47277         | 53954         | 88914         | 161923       |
| 60ms    | 66015         | 73512         | 127010        | 236830       |
| 80ms    | 91981         | 96287         | 166619        | 309943       |
| 100ms   | 94496         | 116160        | 206884        | 384467       |
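One way to sanity-check these timings is a simple window model: if at most N reads of a fixed size can be outstanding per round trip, the transfer needs at least ceil(fileSize / (N * chunkSize)) round trips. The sketch below is only an estimate under stated assumptions that are not confirmed by the measurements: the quoted latency is treated as the round-trip time, each read is assumed to request 32768 bytes, and 1GB is taken as 10^9 bytes. The object and method names are made up for illustration.

```scala
object ReadWindowModel {

  /** Lower bound on transfer time when at most `maxReads` requests of
    * `chunkBytes` each can be outstanding per round trip.
    * Ignores server, disk, and bandwidth limits entirely.
    */
  def estimateMillis(fileBytes: Long, maxReads: Int, chunkBytes: Int, rttMillis: Int): Long = {
    val bytesPerRoundTrip = maxReads.toLong * chunkBytes
    // Ceiling division: a partially filled last window still costs a round trip.
    val roundTrips = (fileBytes + bytesPerRoundTrip - 1) / bytesPerRoundTrip
    roundTrips * rttMillis
  }

  // Assumed inputs: 10^9-byte file, 32768-byte reads, 100 ms round trip.
  val eightReads: Long = estimateMillis(1000000000L, 8, 32768, 100)
  val sixtyFourReads: Long = estimateMillis(1000000000L, 64, 32768, 100)
}
```

Under these assumptions the model gives about 381 s for 8 reads at 100ms, close to the 384 s measured, but only about 48 s for 64 reads against 94 s measured. That would suggest round trips dominate with small windows, while something else limits throughput once the window is large.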

@lightbend-cla-validator

At least one pull request committer is not linked to a user. See https://help.github.com/en/articles/why-are-my-commits-linked-to-the-wrong-user#commits-are-not-linked-to-any-user

@conorgriffin
Contributor Author

I believe I was added to the Workday CCLA so that I could contribute this fix. Please confirm.

@conorgriffin
Contributor Author

conorgriffin commented Feb 2, 2021

It would be great if this fix could be released with a minor version bump of Alpakka V2.

@lightbend-cla-validator

Hi @conorgriffin,

Thank you for your contribution! We really value the time you've taken to put this together.

Before we proceed with reviewing this pull request, please sign the Lightbend Contributors License Agreement:

https://www.lightbend.com/contribute/cla

@conorgriffin changed the title from "FTP: Enable SFTP connector to issue more than one unconfirmed read request #2557" to "FTP: Enable SFTP connector to issue more than one unconfirmed read request" on Feb 2, 2021
@conorgriffin reopened this on Feb 3, 2021
@conorgriffin
Contributor Author

Closed and reopened to trigger the CLA validator task

@seglo
Contributor

seglo commented Feb 3, 2021

> It would be great if this fix could be released with a minor version bump of Alpakka V2.

Would a snapshot do? We've been progressing toward a 3.0.0-M1 release soon.

@conorgriffin
Contributor Author

> > It would be great if this fix could be released with a minor version bump of Alpakka V2.
>
> Would a snapshot do? We've been progressing toward a 3.0.0-M1 release soon.

It's not that big a deal; I could always use a locally built patch of v2.0.2 if needed. I wasn't sure how much inertia there is around a minor release.

@seglo
Contributor

seglo commented Feb 3, 2021

We produce timestamped snapshots on every successful master build, so you could reference one of those in lieu of the next milestone release.

@conorgriffin
Contributor Author

What's the expected timeframe for the next milestone release?

@seglo
Contributor

seglo commented Feb 4, 2021

We don't have a date planned, but I think we could release 3.0.0-M1 within the month.

@conorgriffin
Contributor Author

This is my first PR. Should I solicit feedback somewhere or should I just wait for people to get to it in time? ☺️

@seglo
Contributor

seglo commented Feb 5, 2021

I have it on my backlog to review. I'll probably get to it next Tuesday, unless someone else does first.

@conorgriffin
Contributor Author

Thanks, have a great weekend 👍🏻

Contributor

@seglo seglo left a comment

Nice find. Is sftp doing something similar to speed up transfers? I wonder if this should be the default way of retrieving files if we can come up with a reasonable default. Is there a catch?

We should add a few lines to the FTP docs page too.

@conorgriffin
Contributor Author

conorgriffin commented Feb 9, 2021

If you mean the command-line sftp client, then yes: by default it keeps up to 64 read requests of 32k each outstanding.

From the man page:

     -B buffer_size
             Specify the size of the buffer that sftp uses when transferring files.  Larger buffers require fewer round trips at the cost of higher memory consumption.  The default is 32768 bytes.
     -R num_requests
             Specify how many requests may be outstanding at any one time.  Increasing this may slightly improve file transfer speed but will increase memory usage.  The default is 64 outstanding requests.

I'm not aware of any catch to doing parallel reads like this by default, apart from some additional memory usage. In my testing with sftp and some other clients, 64 requests of 32k each seems close to optimal across a range of latency conditions.
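To put a number on the additional memory usage, here is a small worked calculation using the sftp defaults quoted from the man page. It is only a rough estimate of the read payload that can be outstanding at once, and it ignores any per-request overhead in the client; the object and value names are made up for illustration.

```scala
object OutstandingReadBuffer {
  val requests = 64       // default for sftp -R, per the man page excerpt
  val bufferBytes = 32768 // default for sftp -B, per the man page excerpt

  // Payload that can be in flight at once: 64 * 32768 bytes = 2 MiB.
  val inFlightBytes: Long = requests.toLong * bufferBytes
}
```

So the extra cost at these defaults is on the order of a couple of megabytes per transfer.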

@seglo
Contributor

seglo commented Feb 9, 2021

Thanks @conorgriffin. Ok, let's keep the current state for now and let users discover this feature if they need it. Once you add a mention in the docs I think this PR will be ready. Maybe you could suggest the defaults found in the sftp man page as reasonable defaults. Feel free to add a usage example as well, but I don't think that's required.

Contributor

@seglo seglo left a comment

Looking good, a few docs suggestions.

Two review threads on docs/src/main/paradox/ftp.md (outdated, resolved)
@conorgriffin
Contributor Author

Thanks, suggestions merged and last build was green so 🤞

Contributor

@seglo seglo left a comment

LGTM

@seglo merged commit 10c268d into akka:master on Feb 10, 2021
@seglo added this to the 3.0.0-M1 milestone on Feb 10, 2021
@conorgriffin deleted the 2557-slow-sftp branch on February 10, 2021 at 16:37
@seglo
Contributor

seglo commented Feb 10, 2021

You should be able to use this in your build by adding the Akka snapshots artifact repo and referencing 2.0.2+71-10c268dd.

Development

Successfully merging this pull request may close these issues.

Why is the SFTP connector so slow compared to the standalone SFTP client?