Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Put *Sync methods behind a flag in some future major version #1665

Closed
ChALkeR opened this issue May 9, 2015 · 64 comments
Closed

Put *Sync methods behind a flag in some future major version #1665

ChALkeR opened this issue May 9, 2015 · 64 comments
Labels
discuss Issues opened for discussions and feedbacks.

Comments

@ChALkeR
Copy link
Member

ChALkeR commented May 9, 2015

No, I am not writing this from a mental hospital, please read till the bottom.

The purpose of this is to discourage using *Sync versions of the methods that could be async.

I propose to put all or at least a part of *Sync methods behind a runtime flag in some future major version of io.js and to split documentation, moving all *Sync methods to a separate page. This of course would be semver-major.

Atm, there are *Sync methods defined in zlib, fs, child_process, crypto modules and additionaly used in repl and module modules.
To allow usage inside io.js itself (i.e. for require), they could be moved to «private» methods (beginning with a _ sign).

  1. *Sync methods suggest bad practices (see 2), but when someone with full understanding of the consequences needs them, they could be constructed from userspace. See https://github.com/abbr/deasync, it creates synchronous methods from async methods.
    If deasync module is not good enough for this, this could be done when there is would be a good enough solution.

  2. When a newcomer begins writing something using io.js, he or she goes to the documentation looking how to do something, sees *Sync methods without any warnings there and almost certanly begins with using them, because that's what he or she is used to. That's easier for a newcomer than spending a few minutes reading how he or she should actually do stuff. And don't blame the newcomer, it's the presense of *Sync methods in the documentation that suggests to him or her that it's an ok way to do things. This results in a big pile of bad code by the time when the person understands that it should be rewritten. And people don't like to rewrite code for no visible reson, leaving this code to be legacy (see 3).

  3. When someone writes synchronous code (see 2) it limits how he or she can use async functions without rewriting most part of the logic, so he or she comes complaining about that there should be a *Sync version of everything out there (as the presense of *Sync versions in the core suggests it). See Is there a sync version encapsulating libmagic in node.js ? mscdex/mmmagic#32 and motivation behind https://github.com/abbr/deasync.

  4. It's completely broken either way. Even for «simple one-time scripts». See zlib: memory leak with gunzipSync #1479 — a person was doing something like files.forEach( …zlib.gzipSync(…) …), in what I suppose was a simple script. What could possibly go wrong? Memory usage has gone completely bad in his script. And even manual calls to gc() do not help. Testcase:

    'use strict';
    
    var zlib = require('zlib');
    var data = 'abcdefghijklmnopqrstuvwxyz';
    var gzipped = zlib.gzipSync(data);
    
    function call() {
      var contents = zlib.gunzipSync(gzipped);
    }
    
    for (var i = 0; i < 100000; i++) {
       call();
       if (i % 1000 === 0) {
           gc();gc();gc();gc();
           console.log(i + ' ' + JSON.stringify(process.memoryUsage()));
       }
    }
  5. People go to the doc, see *Sync versions (see 2), use them — then everyone are telling them that they are using io.js wrong just based on that fact: gripe: deprecating fs.exists/existsSync #1592 (comment).

    if you're using existsSync inside your server I'd argue you're not using io.js correctly to begin with

  6. One can argue again that using *Sync versions of methods is ok in scripts and that that's simplier, but:

    1. See 4.
    2. For one-time, throw-away scripts you can turn that flag on.
    3. Promises-based code is as clean as *Sync-based code. Promises-based code with accurate error handling is much cleaner that *Sync-based code with accurate error handling. Using Promisify won't be needed once Promises go to the core (see Feature Request: Every async function returns Promise #11).

This will require all public modules that are using *Sync methods either to rewrite things using async methods or require/include something like https://github.com/abbr/deasync to be compatible.

This can be done separately for various core modules/methods. For example, zlib.*Sync are maybe the worst of them and it looks to me that they are not actively used in public modules.

@cjihrig
Copy link
Contributor

cjihrig commented May 9, 2015

I think synchronous methods do have their place, and shouldn't be moved behind a flag. You just shouldn't be using synchronous calls in hot code. require() itself is synchronous.

@vkurchatkin
Copy link
Contributor

-1. Sync functions are very useful in various cases.

deasync is definitely the worst idea ever. 1. It doesn't actually make asynchronous calls blocking 2. It's inherently broken.

@ChALkeR
Copy link
Member Author

ChALkeR commented May 9, 2015

@cjihrig I was not talking about require, I was talking about *Sync methods — synchoronous variants of those methods that are async by themselves.

@vkurchatkin Checked the code. Isn't it possible to create an actual lock instead of a loop?

By the way, zlib.*Sync seems to be almost unused in comparison with fs.*Sync. For example, in node_modules of the project I have currently opened, fs.*Sync is used in 636 files while zlib.*Sync is not used at all.

@vkurchatkin
Copy link
Contributor

@ChALkeR I'm not sure how lock can help? To run async functions you need an event loop. It can't be the main loop, as unrelated requests would be processed as well. Something like this would work: nodejs/node-v0.x-archive#7323

@rlidwka
Copy link
Contributor

rlidwka commented May 9, 2015

+1

Only good use for sync functions is for one-line scripts, so adding a flag for it seems reasonable.

But usage of sync functions in any public modules should definitely be discouraged. Not only because of the event loop blocking, but because sync functions can't be replaced with async functions should the need for it arise.

For example, Jade uses readFileSync to read templates internally. What if I want to load them from a database instead? If it was an async readFile, I'd just monkey-patched require('fs'), but with readFileSync it just ain't happening.

Even synchronous require is a big pain in the ass if you want to require something from a remote http server or a zip archive instead of a filesystem. We've talked about that before.

I believe sync functions should be removed completely when ES7 async/await support lands in v8. Until then we should be gradually deprecating them.

@ChALkeR
Copy link
Member Author

ChALkeR commented May 9, 2015

@vkurchatkin nodejs/node-v0.x-archive#7323 looks like a cleaner solution to me than the current deasync module.

Once again: I do not propose to remove *Sync just now, I am talking about masking them in some future major version, when Promises will be treated as first-class citizens etc. And this is not about all synchronous methods, it's about those methods that have a normal async version.

This might require a better way to build a sync version of an async function than the deasync module, if it is not good enough.

@brendanashworth brendanashworth added discuss Issues opened for discussions and feedbacks. future labels May 9, 2015
@jonathanong
Copy link
Contributor

I'd rather have an async-only mode that warns/throws (an option for both). Creating CLI scripts that require runtime flags is a pain in the ass.

@piscisaureus
Copy link
Contributor

I'm -1 on putting Sync methods behind a flag. There are plenty of valid use cases for them.
If anything, maybe we could add --trace-blocking that prints a message whenever the event loop is blocked on i/o.

@ChALkeR
Copy link
Member Author

ChALkeR commented May 9, 2015

@jonathanong That woldn't do anything. It would not lower *Sync usage in modules, it wouldn't discourage newcomers from using *Sync methods, it wouldn't motivate people to port their legacy code to async. It wouldn't solve any problem highlighted in the inital post.

You could just grep for [a-z]*Sync\( if you want to just find all usages of them, there is no need for that to be traced.

@ChALkeR
Copy link
Member Author

ChALkeR commented May 9, 2015

@piscisaureus They introduce problems because they seem to be an easy solution at a first glance but result in huge problems later (see 2, 3). Even for use cases that look like valid there might be problems (see 4). Such flag might be useful, but this is a different question from the one that I want to discuss here. A relevant thing would be to propose some other way of discouraging newcomers (and not-so-newcomers) from (mis-)using *Sync methods.

@piscisaureus, @vkurchatkin Please, list specific use cases for this to be constructive.

@ChALkeR
Copy link
Member Author

ChALkeR commented May 9, 2015

Relevant: #5 (comment)

@vkurchatkin
Copy link
Contributor

To name some: scripting, bootstrapping, logging, using IO inside synchronous code. Sometimes blocking is fine. async-await + promises would be nice, but it doesn't solve all the problems.

Also, point by point:

1 -

when someone with full understanding of the consequences needs them, they could be constructed from userspace

No, they couldn't

2 - let's fix the docs

3 -

When someone writes synchronous code (see 2) it limits how he or she can use async functions without rewriting most part of the logic

This point as against removing Sync functions, if I understand it right

4 - this looks bad, actually

5 - See 2

6 -

Promises-based code is as clean as *Sync-based code.

No, it's not.

@ChALkeR
Copy link
Member Author

ChALkeR commented May 9, 2015

@vkurchatkin

  1. when someone with full understanding of the consequences needs them, they could be constructed from userspace

    No, they couldn't

    By this (added some time ago, after you shared deasync considerations):

    If deasync module is not good enough for this, this could be done when there is would be a good enough solution.

    I meant that this whole issue would have to wait until that problem (constructing sync methods from async methods) is solved. When and if that problem would be solved, the answer would be «Yes, they could».

  2. Yes, fixing the docs is part of the issue, it's mentioned at the very top. Doing only that would not solve the issue completely, but will work as a partial solution.

  3. No, it's a point for discouraging being contaminated by *Sync at the first place and against making it look like every async function should have a sync variant. No one is going to do that in all the user-contributed modules, they are generally async-only (for stuff that should be async), without sync variants.

  4. Yes. zlib.*Sync is bad, broken and not actively used, it should be the first candidate for deprecating IMO. But the same should be true for all heavy memory-using *Sync code.
    It's just an example for that you can not predict when sync code goes wild. Everything was made with garbage collection and async in mind.

  5. See 2.

  6. When Promises would be first-class citizens in io.js, it would be. And do not forget about error handling, if you take error handling into account, Promises-based code even now looks much cleaner that *Sync-based code. The latter would be a bunch of try-catches.

@benjamingr
Copy link
Member

I think we're forgetting how much userland code this sort of hard-deprecation and putting behind a flag breaks. Tons of packages use the Sync variant.

Wouldn't it be nicer to just discourage them in the docs more or soft-deprecate them? Our developers aren't retards, most io developers understand what blocking and non-blocking code means. Removing these just seems cruel and I'm not really sure what problem it actually solves.

@ChALkeR
Copy link
Member Author

ChALkeR commented May 9, 2015

@benjamingr Ok, I guess we need some stats here. fs.*Sync is indeed used by very many packages out there at the moment, but how about zlib.*Sync?

Discouraging in the docs is required, but it would not solve the problem completely. For example, there could be some intermediate step where those methods cause runtime warnings (for the first usage of every function) if the program was started without a flag.

Another way would be to allow enabling that flag per package.json, but doing so would be quite ugly and it would probably slow things down.

@benjamingr
Copy link
Member

I don't understand something here though:

  • We agree that these methods can't be moved to under a flag right away since they're used very often in libraries and frameworks that people use.
  • These functions don't have a significant maintenance overhead - so keeping them in core is free.
  • The issue you posted about is developer education. I'm both not sure it's a problem, and if it's a problem - isn't soft-deprecation from the docs just as effective?

There are tens of thousands of usages for fs.statSync only on GitHub. Do you really think moving them under a flag is pragmatic at this stage?

@ChALkeR
Copy link
Member Author

ChALkeR commented May 9, 2015

@benjamingr I repeat: this is not about right away, it is about changing the docs at some point, then moving that under a flag in some future version when all the prerequisites will be fullfilled (Promises in the core, a good deasync implementation). It should be done gracefully enough and after making sure that all problems could be solved.

About breaking compatibility — check how many packages are there that require a verion of nan lower than 1.8.0 atm. All that packages do not work with iojs 2.0.

And for reference, there are 302 results for zlib.gzipSync on GitHub, 197 results for zlib.gunzipSync, 204 for zlib.deflateSync, and 146 for zlib.deflateRawSync. We could start with that. And those are more problematic than fs.*Sync.

And no, soft-deprecation will not be as effective. People will still use *Sync, because they would not see a reason for not using them. And it is not going to fix legacy code. There obviously should be an intermediate soft-deprecation stage, but just stopping on soft-deprecation would not serve as a complete solution.

@evanlucas
Copy link
Contributor

I agree with @piscisaureus. Im -1 on putting them behind a flag as well

@CrabDude
Copy link

Regarding @piscisaureus's --trace-blocking, a BIG 👍

But it's not sufficient. It gets a little old telling every new developer to NEVER EVER EVER, no seriously, NEVER use a *Sync API in their code. It gets even more confusing when you have to explain why it's bad and even more confusing when you need to explain why they even exist in core in the first place.

My solution is to use a module like safeguards that deletes require and *Sync from the runtime (primarily for non-production environments), but justifying its use still requires the above explanations. What I'd really like to see is a flag that disables enables all blocking calls after the first tick and disables them by default, --sync-after-startup, which is the only time they should be used anyways. Bonus points for ensuring they're enabled in process.on('exit').

Further, if someone did want/need to use a *Sync version, place it in a separate module: require('fs-sync') or something that discourages lazily using *Sync because it's familiar which inevitably happens, especially in large codebases with JavaScript-as-a-second-language developers, and undermines the entire runtime by compounding latency across all concurrent requests.

@ChALkeR
Copy link
Member Author

ChALkeR commented May 10, 2015

@CrabDude Do I read it correct that you propose to loosen the restriction a bit compared to what I initially proposed and to allow using *Sync before the first tick completes by default, but put later usage of *Sync methods behind a flag?

If so, sounds like a reasonable change to me for the transition period (which could be indefinetely long).

But it would not fix the scripts like the one mentioned in (4). Maybe allowing fs.*Sync before the first tick completes would be fine, but zlib.*Sync should not be used even in the first tick.

@Qard
Copy link
Member

Qard commented May 10, 2015

-1 from me.

What about all the people pushing blocking code to child processes? They have to wait for async messages to come in, so only allowing them in the first and last tick doesn't work.

Some things simply don't work effectively as async. Certain crypto functions or machine learning code are good examples as there's so much data interaction that the thread hops kill performance.

My suggestion would be to simply wait for async/await and a promisified core, then deprecate whichever sync functions are no longer useful.

Regardless of what we decide to do, this would probably not happen for quite awhile.

@ChALkeR
Copy link
Member Author

ChALkeR commented May 10, 2015

Some things simply don't work effectively as async. Certain crypto functions or machine learning code are good examples as there's so much data interaction that the thread hops kill performance.

But some functions that have a *Sync version atm should not be used in sync, see (4).

Regardless of what we decide to do, this would probably not happen for quite awhile.

That's why it's titled «in some future major version» ☺.

@meandmycode
Copy link

I'm -1, whilst I agree the sync functions are bad. There is no advantage of doing this, because you still have the added complexity to support sync in iojs (only now with some additional flag logic).

Plus, whilst I know sync is bad- there are tons of times when performance is not a priority. Admittedly its a systemic issue in my experience (that I use sync because something I want to use was also sync).

I think I would be +1 in deprecating sync when ES7 await keyword lands.

@ChALkeR
Copy link
Member Author

ChALkeR commented May 10, 2015

@Qard Check this out:

Sync:

'use strict';

var zlib = require('zlib');
var data = 'abcdefghijklmnopqrstuvwxyz';
var gzipped = zlib.gzipSync(data);

var time = process.hrtime();
var count = 0, limit = 800000, block = 20000;

function call() {
    var contents = zlib.gunzipSync(gzipped);
    count++;
    if (count % block === 0) {
        var t = process.hrtime(time);
        console.log(count + ': ' + block / (t[0] + t[1] / 1e9) + ' per second ' + JSON.stringify(process.memoryUsage()));
        time = process.hrtime();
    }
}

for (var i = 0; i < limit; i++) {
    call();
}

Results:

20000: 32120.80078126396 per second {"rss":205680640,"heapTotal":92850176,"heapUsed":63149376}
40000: 24712.24496668192 per second {"rss":350953472,"heapTotal":142383104,"heapUsed":120764952}
60000: 19157.88530349938 per second {"rss":511909888,"heapTotal":208427008,"heapUsed":178824320}
80000: 23988.490399069695 per second {"rss":658661376,"heapTotal":262087680,"heapUsed":238506976}
100000: 13955.300721034313 per second {"rss":810508288,"heapTotal":320908032,"heapUsed":294681592}
120000: 11802.70418860893 per second {"rss":968253440,"heapTotal":384216832,"heapUsed":352865224}
140000: 17205.336577525715 per second {"rss":1116225536,"heapTotal":437877504,"heapUsed":411303744}
160000: 9461.70492469758 per second {"rss":1263710208,"heapTotal":491538176,"heapUsed":469886448}
180000: 13786.434617075574 per second {"rss":1419427840,"heapTotal":554499072,"heapUsed":529087376}
200000: 7811.700167343334 per second {"rss":1575280640,"heapTotal":616054528,"heapUsed":585289896}
220000: 7248.086406705255 per second {"rss":1722687488,"heapTotal":669715200,"heapUsed":643872584}
240000: 11829.481919493635 per second {"rss":1870372864,"heapTotal":723375872,"heapUsed":702309872}
260000: 6332.793425330343 per second {"rss":2026881024,"heapTotal":787368960,"heapUsed":762719752}
280000: 5705.23982507333 per second {"rss":2182406144,"heapTotal":848240384,"heapUsed":818202352}
300000: 9706.131023447047 per second {"rss":2330054656,"heapTotal":901901056,"heapUsed":876640232}
320000: 5173.066611812158 per second {"rss":2483449856,"heapTotal":961753344,"heapUsed":933799040}
340000: 8919.413918066206 per second {"rss":2631696384,"heapTotal":1016445952,"heapUsed":992301384}
360000: 4704.477186487043 per second {"rss":2788585472,"heapTotal":1079394048,"heapUsed":1049912904}
380000: 3790.5482333723803 per second {"rss":2942103552,"heapTotal":1139246336,"heapUsed":1108449624}
400000: 6695.01229537669 per second {"rss":3091365888,"heapTotal":1193419264,"heapUsed":1168547384}
420000: 3492.310557624931 per second {"rss":3240374272,"heapTotal":1249143808,"heapUsed":1224869560}
440000: 3331.5123270296226 per second {"rss":3404460032,"heapTotal":1318283520,"heapUsed":1283252336}
460000: 5279.244829029712 per second {"rss":3551862784,"heapTotal":1370912256,"heapUsed":1341891912}
480000: 3403.8314172820515 per second {"rss":3701026816,"heapTotal":1425604864,"heapUsed":1400620848}
500000: 4717.663996136406 per second {"rss":3850481664,"heapTotal":1482361344,"heapUsed":1457350928}
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory

Async:

'use strict';

var zlib = require('zlib');
var data = 'abcdefghijklmnopqrstuvwxyz';
var gzipped = zlib.gzipSync(data);

var time = process.hrtime();
var count = 0, limit = 800000, block = 20000;

function call() {
    zlib.gunzip(gzipped, function(err, contents) {
        count++;
        if (count % block === 0) {
            var t = process.hrtime(time);
            console.log(count + ': ' + block / (t[0] + t[1] / 1e9) + ' per second ' + JSON.stringify(process.memoryUsage()));
            time = process.hrtime();
        }
        if (count > limit) {
            process.exit(0);
        }
        setImmediate(call);
    });
}

for (var i = 0; i < 20; i++) {
    call();
}

Results:

20000: 15113.640220276682 per second {"rss":88940544,"heapTotal":44349184,"heapUsed":18067736}
40000: 15716.648838171723 per second {"rss":91709440,"heapTotal":44349184,"heapUsed":19365512}
60000: 14816.801385692159 per second {"rss":92160000,"heapTotal":44349184,"heapUsed":11115672}
80000: 15593.690573858403 per second {"rss":92631040,"heapTotal":44349184,"heapUsed":14862000}
100000: 15499.849684782734 per second {"rss":92827648,"heapTotal":44349184,"heapUsed":18603656}
120000: 15381.351118785531 per second {"rss":93102080,"heapTotal":44349184,"heapUsed":22419896}
140000: 15284.311806215188 per second {"rss":93237248,"heapTotal":44349184,"heapUsed":14339840}
160000: 15330.161436572198 per second {"rss":93364224,"heapTotal":44349184,"heapUsed":17981152}
180000: 15373.701251145398 per second {"rss":93470720,"heapTotal":44349184,"heapUsed":21744832}
200000: 15248.122554516112 per second {"rss":93536256,"heapTotal":44349184,"heapUsed":13545864}
220000: 15496.739672321237 per second {"rss":93642752,"heapTotal":44349184,"heapUsed":17286144}
240000: 15400.387513790854 per second {"rss":93646848,"heapTotal":44349184,"heapUsed":21046432}
260000: 15223.149619541675 per second {"rss":93675520,"heapTotal":44349184,"heapUsed":12787648}
280000: 15338.51160615975 per second {"rss":94035968,"heapTotal":44349184,"heapUsed":16607568}
300000: 15342.204833380902 per second {"rss":94023680,"heapTotal":44349184,"heapUsed":20498856}
320000: 15122.923957702411 per second {"rss":94068736,"heapTotal":44349184,"heapUsed":12187392}
340000: 15534.886625375988 per second {"rss":94126080,"heapTotal":44349184,"heapUsed":15917656}
360000: 15000.372545502447 per second {"rss":94232576,"heapTotal":44349184,"heapUsed":19576480}
380000: 15085.150261428105 per second {"rss":94248960,"heapTotal":44349184,"heapUsed":11382568}
400000: 15613.73561681802 per second {"rss":94253056,"heapTotal":44349184,"heapUsed":15091648}
420000: 15435.109100008664 per second {"rss":94236672,"heapTotal":44349184,"heapUsed":18774624}
440000: 15591.3070028562 per second {"rss":94310400,"heapTotal":44349184,"heapUsed":22455808}
460000: 15324.172756886877 per second {"rss":94302208,"heapTotal":44349184,"heapUsed":14327928}
480000: 15439.228484398942 per second {"rss":94318592,"heapTotal":44349184,"heapUsed":17929568}
500000: 15307.755028863501 per second {"rss":94334976,"heapTotal":44349184,"heapUsed":21877488}
520000: 14844.738156231802 per second {"rss":94339072,"heapTotal":44349184,"heapUsed":13607360}
540000: 15327.062465488294 per second {"rss":94363648,"heapTotal":44349184,"heapUsed":17284000}
560000: 15224.138038325284 per second {"rss":94371840,"heapTotal":44349184,"heapUsed":21057816}
580000: 14928.484074440235 per second {"rss":94355456,"heapTotal":44349184,"heapUsed":12795936}
600000: 14800.075890645147 per second {"rss":94433280,"heapTotal":44349184,"heapUsed":16486952}
620000: 15520.718679925192 per second {"rss":94453760,"heapTotal":44349184,"heapUsed":20235368}
640000: 15225.910148116009 per second {"rss":94519296,"heapTotal":44349184,"heapUsed":11959856}
660000: 15396.864526001433 per second {"rss":94494720,"heapTotal":44349184,"heapUsed":15681008}
680000: 15249.35174605432 per second {"rss":94523392,"heapTotal":44349184,"heapUsed":19363664}
700000: 15136.876003516221 per second {"rss":94543872,"heapTotal":44349184,"heapUsed":11101480}
720000: 15318.090473066199 per second {"rss":94552064,"heapTotal":44349184,"heapUsed":14775424}
740000: 15367.46907189277 per second {"rss":94556160,"heapTotal":44349184,"heapUsed":18518832}
760000: 15346.850775039215 per second {"rss":94568448,"heapTotal":44349184,"heapUsed":22182320}
780000: 15088.295501986055 per second {"rss":94572544,"heapTotal":44349184,"heapUsed":13954824}
800000: 15567.230348709105 per second {"rss":94572544,"heapTotal":44349184,"heapUsed":17677656}

Guess which one is faster (putting aside the fact that the sync one consumes 3.8 GiB of memory and crashes)?

@ChALkeR
Copy link
Member Author

ChALkeR commented May 10, 2015

I think I would be +1 in deprecating sync when ES7 await keyword lands.

@meandmycode That would work, I think, and will also solve the problem. I would be happy if that happens.

I agree that there is nowhere to rush on this issue now, this has to wait until there is good Promises support in the core.

@Qard
Copy link
Member

Qard commented May 10, 2015

To be fair, gunzipping a single thing nearly a million times synchronously is not very realistic code. I can take down the process even easier with a simple while (true);. That's not the fault of the language or the platform. It's the fault of the programmer for doing something they shouldn't.

I could fork bomb my own dev server. Does that mean I should be telling the bash devs to remove spawn support?

@ChALkeR
Copy link
Member Author

ChALkeR commented May 10, 2015

@Qard #1479:

@targos commented 20 days ago
I was running a script to read sequentially a lot of gzipped files, and print on stdout the result of a computation for each file.
After about 16'000 files, the process just stopped and Killed was printed on the terminal.

And that's not explained in the docs. Re-read the reasoning again — the methods are there, people expect them to work.

gunzipping a single thing nearly a million times

That's called a testcase, that's why it's a single short thing gunzipped nearly a millon times.

@Qard
Copy link
Member

Qard commented May 10, 2015

They do work. It's the garbage collector that does not work as expected. Removing the sync methods does nothing to stop the thousands of other ways I can blow up a process by blocking the garbage collector. The only fix for that is documentation.

@ChALkeR
Copy link
Member Author

ChALkeR commented May 11, 2015

@trevnorris

Let's not assume we're more intelligent than the developers writing their own apps and remove functionality because it's more theoretically correct.

I accept that point.

@bnoordhuis

Deprecating all Sync methods seems like a no go, the fallout would be enormous. I rate the chances of it making it past the TC approximately nil.

Well, that was expected. I'm glad that there is some discussion on this, though.

A flag to trace *Sync calls after the first tick would be indeed useful for debugging.

But the issue is initially about discouraging using *Sync calls, especially by newcomers.
Can something else be done with that?
What do you think about moving all the *Sync calls to a separate page of the docs, perhaps keeping a short reference on the corresponding modules pages?

@bnoordhuis
Copy link
Member

I labeled it tc-agenda. An unanimous 'no' from the TC (if that is what happens) might nip a lot of discourse (and discord) in the bud.

@Fishrock123
Copy link
Contributor

But the issue is initially about discouraging using *Sync calls, especially by newcomers.
Can something else be done with that?

Perhaps better intro docs on how to use async javascript?

@trevnorris
Copy link
Contributor

Let's just put the performance issue to bed. Depending on the use case, it can be faster to use the *Sync() functionality. The following test script reads data from /tmp/tmp.tmp, which is just a random file, as quickly as possible:

'use strict';
const ITER = 1e4;
const LEN = 1024 * 64;
var fs = require('fs');
var fd = fs.openSync('/tmp/tmp.tmp', 'r');
var b = new Buffer(LEN);
var cntr = 0;
var t, i;

// Async Test
/*
t = process.hrtime();
(function r() {
  if (++cntr === ITER)
    return printTime();
  fs.read(fd, b, 0, LEN, 0, r);
}());
/* */

// Sync Test
/*
t = process.hrtime();
for (i = 0; i < ITER; i++) {
  fs.readSync(fd, b, 0, LEN, 0);
}
printTime();
/* */

function printTime() {
  t = process.hrtime(t);
  t = t[0] * 1e9 + t[1];
  console.log((t / ITER).toFixed(1) + ' ns/op');
}

fs.read() time: ~14000 ns/op
fs.readSync() time: 4200 ns/op

So the only real question is should we allow an operation that blocks the event loop. IMO developers are intelligent enough to make the call whether they should or not.

And don't forget one important use case. If there's an uncaught exception and I want to log information about the state of the application before it's brought down. I can't create an async write because the event loop is possibly in a bad state. It's critical that I'm able to synchronously log this information out.

@awalgarg
Copy link

@ChALkeR The issue you raise about a beginner using Sync variants blindly is valid, but putting them behind a flag in whatever graceful manner doesn't solve the issue; it just shifts the problem elsewhere.

Since there are way better ways to solve this issue (suggested above like better documentation, proper warnings¹ etc.), I also -1 this suggestion.

A lot of functions in the standard library could be misused. We can't deprecate them because beginners are likely to misuse them.

¹ - See how browsers warn in the console about the use of synchronous ajax requests?

@benjamingr
Copy link
Member

@awalgarg for opt-in warnings see #1674

@Qard
Copy link
Member

Qard commented May 11, 2015

It'd be pretty trivial to write a userland module that, when required,
monkey-patches all the sync functions in core to log warnings when called.
I might even just write that tonight. :)
On May 11, 2015 11:22 AM, "Benjamin Gruenbaum" [email protected]
wrote:

@awalgarg https://github.com/awalGarg for opt-in warnings see #1674
#1674


Reply to this email directly or view it on GitHub
#1665 (comment).

@rlidwka
Copy link
Contributor

rlidwka commented May 11, 2015

Writing such a module is trivial. However, forcing all the people who misuse Sync to install it is not as easy. :(

@benjamingr
Copy link
Member

@rlidwka honestly if @Qard manages to solve the issues trev mentioned in the other issue (about process.binding) well I always prefer userland solutions to this sort of thing. I'm just not sure it's possible.

@trevnorris
Copy link
Contributor

@Qard I'm sure you're aware, but please read this thread on why that still isn't full-proof: #1674 . Here's a crazy hack as a proof-of-concept to get around that:

['read', 'write'].forEach(function(m) {
  var old_method = process.binding('fs')[m];
  process.binding('fs')[m] = function() {
    if (typeof arguments[arguments.length] !== 'function')
      console.warn('did not pass callback', (new Error()).stack.substr(6));
    return old_method.apply(null, arguments);
  };
});

@CrabDude
Copy link

@trevnorris Your example demonstrates my point on the common misperception and doesn't address the reasoning for this issue (performance degradation in high-concurrency scenarios).

Here's an example that does:

var http = require('http')
var foo = require('./packageWithBuriedSync').foo

// Note the ignorant all-to-common omission of Sync in the name
var bar = require('./packageWithBuriedSync').bar

function asyncHandler(req, res) {
  var i = 3
  foo(sendEnd)
  foo(sendEnd)
  foo(sendEnd)

  function sendEnd() {
    if (!--i) res.end()
  }
}

function syncHandler(req, res) {
  bar()
  bar()
  bar()
  res.end()

  // To eliminate performance questions about closure creation
  function sendEndNoop() {}
}

http.createServer(asyncHandler).listen(8000)
// packageWithBuriedSync.js
var child_process = require('child_process')
var exec = child_process.exec
var execSync = child_process.execSync

module.exports = {
  foo: function (callback) {
    exec('ls', callback)
  },
  bar: function () {
    return execSync('ls')
  }
}

Using ab -k -n 10000 -c 100 http://127.0.0.1:8000/

exec('ls') Time taken for tests:

1 calls: 26.429 seconds
2 calls: 48.974 seconds
3 calls: 77.823 seconds
4 calls: 99.995 seconds

execSync('ls') Time taken for tests:

1 calls: 53.748 seconds
2 calls: 103.953 seconds
3 calls: 154.733 seconds
4 calls: 218.769 seconds

Conclusion: For IO-bound cases[1], an ignorant buried *Sync consistently decreased performance by 100%.

@Qard Yes. See @rlidwka's comment though.

@benjamingr This is solved. The issue is the current stance is hostile to performance and an anti-pattern for new developers, thus the reasoning for proposing inclusion in core.

[1] Due to my MBA's flash drive, filesystem latency was extremely low and async performance had relative parity with *Sync with -c 100. An appropriate test should determine the point at which IO latency and concurrency begin to penalize performance.

@CrabDude
Copy link

My preference is to close this in favor of a more pragmatic solution like #1674.

@Qard
Copy link
Member

Qard commented May 11, 2015

Indeed. It doesn't really stop people from using sync functions where they
shouldn't it's just a way for people that care about that sort of thing to
get warnings and maybe a stack trace of where it came from, so they can
just avoid using those bad modules.

Like I said, not a fix, but it's something that can be done right now to
help optimise the performance of your code.
On May 11, 2015 1:01 PM, "Alex Kocharin" [email protected] wrote:

Writing such a module is trivial. However, forcing all the people who
misuse Sync to install it is not as easy. :(


Reply to this email directly or view it on GitHub
#1665 (comment).

@Fishrock123
Copy link
Contributor

Please let's not have an argument about who does or doesn't know more node..

@benjamingr
Copy link
Member

@Fishrock123 you're right, for what it's worth I was not making fun of @CrabDude nor was I particularly critical of his level of expertise. I was just really amused by the tone of his comments. Still am, but going to keep it at a more professional tone from now on.

@CrabDude I'm sorry if I offended you, but you got to see where I'm coming from. I never once criticized you or your ability to write serverside JavaScript. The only thing my comments were about were the tone you replied to trev (and later me).

@ChALkeR for what it's worth it was never personal, I don't recall interacting with him before.

@trevnorris
Copy link
Contributor

@CrabDude You successfully demonstrated a case where using a sync alternative is slower. Those examples are plentiful. The point of my benchmark is to 1) demonstrate that async is not always faster and 2) show we shouldn't pretend we know what's best for the app developers. There are legitimate use cases for sync operations. Recall again that if you want to log information in an uncaughtException before the process goes down that it must be done synchronously.

execSync() was added after a lot of community members reached out and requested it. Even then it took a while to make it into core because of the scale of the change that was required (see fa4eb47 and e8df267). Removing it after all that would be to spit in the community's face.

@benjamingr
Copy link
Member

I don't think you two actually disagree about anything at this point.

@trevnorris demonstrated the Sync isn't a performance anti-pattern in non-concurrent situations.
@CrabDude demonstrated that Sync is a performance anti-pattern in concurrent situations.

Neither of you disagreed about those points. Both of you agree that it would be nice to track these performance issues in real code.

I think we should ofcus on #1674 at this point.

@ChALkeR
Copy link
Member Author

ChALkeR commented May 12, 2015

I think that @trevnorris is correct here:

Let's not assume we're more intelligent than the developers writing their own apps and remove functionality because it's more theoretically correct.

I initially wanted some discussion on this and I am glad that happened and that #1674 was created.

But it misses the problem behind this issue that I specially highligthed in the first (actually second) sentance of the issue. And the discussion somewhy almost missed the documentation changes that I proposed, so I opened a separate issue for that: #1684.

I guess this issue could be closed now if there are no objections.

@CrabDude
Copy link

@trevnorris

Let's make sure we're talking about the same things...

It's stupid for us to force this type of paradigm on the user.

Stupid to remove *Sync completely? Yes. I differ from the OP in that I'm advocating for requiring opt-in in all scenarios where it wouldn't be "stupid" (i.e., first tick and exit).

execSync() was added after a lot of community members reached out and requested it. [...] Removing it after all that would be to spit in the community's face.

Unnecessary inflammatory language aside, *Sync calls are a liability and an asynchronous IO-bound anti-pattern, what node.js is optimized for. It's a "spit in the community's face" to put anti-patterns that undermine the runtime in core in the first place. Appeasing a vocal minority to support non-concurrent, non-asynchronous scenarios is harmful to node.js primary use case. This is precisely what I mean when I point to your unfamiliarity with application developers' dilemma. What's "stupid" is the proliferation of liabilities and anti-patterns in core without considerations for education, allowances for documentation, and meaningful support for avoidance. I think we're now all on the same page here.

Let's not assume we're more intelligent than the developers writing their own apps and remove functionality because it's more theoretically correct.

First, facts are not theory. It is factually correct, demonstrated above and acknowledge by yourself that blocking calls are a major performance penalty for IO-bound tasks, which arguably account for the vast majority of node's use (numbers would be useful here). Second, the addition of *Sync to appease a vocal minority and the pernicious liabilities it creates for the ecosystem is naive. domains, of which you are intimately familiar, perfectly illustrates another core anti-pattern that developers consistently ignorantly misused resulting in its removal despite "the scale of the change that was required." In your opinion, was that also a "spit in the community's face"?

Let's just put the performance issue to bed. [...] You successfully demonstrated a case where using a sync alternative is slower. Those examples are plentiful.

Precisely. So we're in agreement that *Sync is anti-performant in high-concurrency scenarios, which again is node's core focus, and only faster in non-concurrent scenarios.

we shouldn't pretend we know what's best for the app developers.

At face value, this seems reasonable, yet in this context, it's misguided. Reductio ad absurdum:

  • Who are we to say developers prefer performance over features?
  • Who are we to say developers prefer non-blocking calls in a non-blocking runtime?
  • Who are we to say http should be in core when it's perfectly implementable in userland?
  • Who are we to say how modules or packages should be loaded?

You'll notice for all of the above, core does not preclude userland from implementing their own features, blocking calls, http alternatives, or module/package systems. It does however ship solutions to further enable the best non-blocking JavaScript IO runtime, which *Sync calls after startup are deleterious to. Did you equally object that execSync belonged in userland and if so, why do you now defend its inclusion with such forceful language?

There are legitimate use cases for sync operations.

Addressed at the beginning of the conversation.

#1674 and #1684 are excellent steps in the right directly, though I would prefer to see the logic that continues to support the proliferation of *Sync calls in core to die. It was not long ago when the node community loudly advocated that a primary reason for node's success over Twisted, EventMachine, etc... was the lack of legacy (blocking) IO calls. It seems we're prone to repeat the mistakes of the past in the name of pragmatism, un-opinionatedness or "community".

@ChALkeR
Copy link
Member Author

ChALkeR commented May 15, 2015

Got replaced by #1674 (#1707) for the opt-in warnings for using *Sync after the first tick and #1684 for the docs.

TC voted against *Sync «deprecation» and (if I got things correctly) for the opt-in warnings and documentation changes.

If there are any other ideas, those could go to separate issues.

@ChALkeR ChALkeR closed this as completed May 15, 2015
@Fishrock123 Fishrock123 added child_process Issues and PRs related to the child_process subsystem. and removed future child_process Issues and PRs related to the child_process subsystem. labels May 15, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issues opened for discussions and feedbacks.
Projects
None yet
Development

No branches or pull requests