Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement intelligent response -> minipass stream interface #442

Open
nlf opened this issue Mar 14, 2022 · 4 comments
Open

feat: implement intelligent response -> minipass stream interface #442

nlf opened this issue Mar 14, 2022 · 4 comments
Labels

Comments

@nlf
Copy link

nlf commented Mar 14, 2022

Is your feature request related to a problem? Please describe.

Our reported Socket timeout issues all seem to stem from errors being raised on a response body stream after the initial response has been received correctly. While we are attempting to pipe from the http response directly to both the consumer (eventually tar) as well as the cache (cacache), slow downs in disk speed, cpu usage, and other factors can potentially stall the consumption of the response body allowing the socket's idle timeout to be triggered.

Today, we attempt to ignore back pressure between the http response stream and the minipass stream by simply calling stream.write() in a data event handler. This is less than ideal for several reasons, not the least of which is that it has made the problem slightly more rare but not eliminated it.

Describe the solution you'd like

I propose we create a new stream as a subclass of Minipass that is aware of idle timeouts and will automatically read from the source http stream on an interval into an internal buffer. We will then ensure that events are ordered appropriately on the new stream, which gives us a more consistent source stream following minipass conventions rather than node core stream conventions.

Describe alternatives you've considered

The most obvious alternative is to stop using minipass entirely, however this would necessitate a refactor/rewrite of lots of core modules and would potentially expose us to race conditions that were mitigated for us by minipass's synchronous behavior.

@simllll
Copy link

simllll commented Mar 21, 2022

Regarding the alternatives. minipass-fetch is an identical interface to node-fetch (This is a fork (or more precisely, a reimplementation) of node-fetch. All streams have been replaced with minipass streams.) Therefore replacing minipass with node-fetch shouldn't be too bad right? I even played around with it while debugging the timeout issue, and got it up and running just by replacing the import.

Furthermore adding another layer complicates debugging and "understanding" of "what is going on" even more, and besides that, bringing in a buffer (maybe even with a watermark then?;-)) sounds like exactly what the node core streams are doing already?

As this is something so fundamental I see two points of views here:
1.) Why don't use something that is already very popular, used by millions and has a great community behind?
2.) Switching something so essential can bring up new issues, is that worth it?

I don't know the answer to this, but I would at least bring it up for discussion: is it worth it replacing minipass at all.

@nlf nlf removed their assignment Mar 24, 2022
@nlf
Copy link
Author

nlf commented Mar 24, 2022

Therefore replacing minipass with node-fetch shouldn't be too bad right?

in theory, you're right. however we have some significant concerns since we do use minipass streams internally everywhere else. because of the differing semantics in how data is processed, i'm concerned about introducing race conditions or other problems caused by minipass' synchronous nature. it's something that's on the table, but we're going to have to proceed with caution if we do go that route.

Why don't use something that is already very popular, used by millions and has a great community behind?

this is the eventual idea, hopefully.

Switching something so essential can bring up new issues, is that worth it?

this is exactly our largest concern. it's a very major change, and not one we can take lightly.

@nlf
Copy link
Author

nlf commented Mar 24, 2022

i think npm/minipass-fetch#53 is going to fix enough of our immediate concerns that this issue can be backlogged. it's still something we should consider though, as today we're potentially buffering the entirety of every response in memory and that's less than ideal.

@simllll
Copy link

simllll commented Mar 28, 2022

@nlf Something else I stumpled upon, I have npm installs inside a docker image that takes up to 30 minutes. Well today, it took even longer, and out of curiosity I was looking into that again. Since I added already the log level http to the npm instlal command, to see what's going on, I found something super weird:
image
It downloads all packages several times, some of them 2x..some of them up to .. I don't know 50x or more. As I remember correclty make-fetch-happens has an auto retry in it, can this somehow be related with all this timeout issues? That it somehow triggers the retry, even though the download was actually successful? Or is the download not successful and it can just not be seen in the log?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants