Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: readFile function adds the chunkSize option #41647

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

mawaregetsuka
Copy link
Contributor

According to the discussion in pr 41436 adding a configurable parameter is probably the best solution

@nodejs-github-bot nodejs-github-bot added fs Issues and PRs related to the fs subsystem / file system. needs-ci PRs that need a full CI run. labels Jan 22, 2022
lib/fs.js Outdated Show resolved Hide resolved
doc/api/fs.md Outdated
@@ -3205,11 +3205,15 @@ changes:
* `encoding` {string|null} **Default:** `null`
* `flag` {string} See [support of file system `flags`][]. **Default:** `'r'`.
* `signal` {AbortSignal} allows aborting an in-progress readFile
* `kPerRead` {integer} The number of kb per read, -1 means no limit.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The options are in alphabetical order. Let's keep them that way by moving this up one line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* `kPerRead` {integer} The number of kb per read, -1 means no limit.
* `kPerRead` {integer} The number of KB per read. Use `-1` for no limit.

Copy link
Contributor

@mscdex mscdex Jan 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we're using kilobytes instead of just bytes? Using the latter is more sensible and more flexible IMO.

Also, I think a name like chunkSize or similar would be a better name, especially since the /^k[A-Z]/ prefix is used internally in code for private/internal field/symbol names.

@Trott
Copy link
Member

Trott commented Jan 22, 2022

This will need tests.


if (defaultOptions.kPerRead) {
if (options.kPerRead === undefined) {
options.kPerRead = defaultOptions.kPerRead;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this mutating the end user's options object?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean by this is that I believe options is the object passed in by the user. We shouldn't overwrite/set values on the object they pass in. AFAICT the only time options is not an end user's object is when they don't supply one.

@mawaregetsuka
Copy link
Contributor Author

I've solved the alphabetical order problem @Trott
About KB vs Byte I don't think users need that kind of precision, in fact, the existing configuration is also in kilobytes
About name I haven't consider about naming conventions yet maybe chunkSize is great
About

Isn't this mutating the end user's options object?

I didn't catch your meaning maybe I need an example
@mscdex

@mscdex
Copy link
Contributor

mscdex commented Jan 23, 2022

About KB vs Byte I don't think users need that kind of precision

I think using bytes would make it more consistent with other storage-based input values used in node core. For example: highWaterMark in streams, Buffer.alloc*(), etc. I'm not aware of any commonly used APIs in core that use units larger than bytes. Besides, it's not as if the range of a double will be insufficient for representing a byte-based chunk size that anyone would ever want to use.

If someone wants to use something less than 1KB, they have to use awkward fractional values and even then because of the floating point representation they might not be able to get the exact value they want.

in fact, the existing configuration is also in kilobytes

The value currently being used (kReadFileBufferLength) is measured in bytes.

@mawaregetsuka
Copy link
Contributor Author

Okay, I'll change it to byte later
In this function if I don't use options then need to add an extra parameter, is that reaally necessary?
options changes after input should be common with nodejs

@mawaregetsuka
Copy link
Contributor Author

It looks like the absence of a period would cause the test/parallel/test-cluster-primary-error test to time out, but why? I cannot reproduce timeout in CI locally.

doc/api/fs.md Outdated
@@ -3202,6 +3202,8 @@ changes:

* `path` {string|Buffer|URL|integer} filename or file descriptor
* `options` {Object|string}
* `bPerRead` {integer} The number of bytes per read. Use `-1` for no limit.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/bPerRead/chunkSize

@aduh95
Copy link
Contributor

aduh95 commented Mar 12, 2022

@mawaregetsuka are you still interested to implement this? Can you address the comments above regarding the name of the option please?
We'd also need to add an entry to the YAML list in the docs.

node/doc/api/fs.md

Lines 3315 to 3320 in 80bfb9b

changes:
- version: REPLACEME
pr-url: https://github.com/nodejs/node/pull/41678
description: Passing an invalid callback to the `callback` argument
now throws `ERR_INVALID_ARG_TYPE` instead of
`ERR_INVALID_CALLBACK`.

@aduh95 aduh95 added the semver-minor PRs that contain new features and should be released in the next minor version. label Mar 12, 2022
@mawaregetsuka
Copy link
Contributor Author

Sure,I'll finish it as soon as possible
@aduh95

Copy link
Contributor

@aduh95 aduh95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would need a test for this new option.

Also, could you please rebase to avoid the merge commit? Merge commits tend to break our tooling unfortunately. Let me know if you need help for that.

doc/api/fs.md Outdated Show resolved Hide resolved
@mawaregetsuka
Copy link
Contributor Author

@aduh95
Thanks for your quick reply,I've considered adding test cases but I'm not sure if I want to add them in this PR.
about merge I'll make sure to add --rebase option every time, thank you for your notice.

@aduh95
Copy link
Contributor

aduh95 commented Mar 16, 2022

about merge I'll make sure to add --rebase option every time

To be clear, it doesn't have to be every time, it's only need when fixing a git conflict.

I've considered adding test cases but I'm not sure if I want to add them in this PR

I'm sure we do :) if we add a feature with no tests, we cannot guarantee it will still be working next time we change something in the source code (which happens often in a project the size of Node.js). Let us know if you need help writing the tests of course.

@mawaregetsuka
Copy link
Contributor Author

This is my first time writing tests for nodejs and I'd love to get some suggestions.

Copy link
Contributor

@aduh95 aduh95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few suggestions to improve the testing. Should we add a similar option to fs.promises.readFile?

Comment on lines 98 to 103
for (const e of fileInfo) {
fs.readFile(e.name, { chunkSize: e.chunkSize },
common.mustCall((err, buf) => {
assert.deepStrictEqual(buf, e.contents);
}));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First I think we want to make sure invalid values are not silently ignored, but is raising a useful error. Then we want to make sure the file is read as expected for valid values of chunkSize:

Suggested change
for (const e of fileInfo) {
fs.readFile(e.name, { chunkSize: e.chunkSize },
common.mustCall((err, buf) => {
assert.deepStrictEqual(buf, e.contents);
}));
}
['', Symbol(), () => {}, 1n, true, false, [], {}].forEach((chunkSize) => {
assert.throws(
() => fs.readFile(fileInfo[0].name, { chunkSize }, common.mustNotCall()),
{ code: 'ERR_INVALID_ARG_TYPE' }
);
});
[-1, 0, -Infinity, Infinity, NaN].forEach((chunkSize) => {
assert.throws(
() => fs.readFile(fileInfo[0].name, { chunkSize }, common.mustNotCall()),
{ code: 'ERR_INVALID_RANGE' }
);
});
[null, undefined, 1, 64, Number.MAX_SAFE_INTEGER].forEach((chunkSize) => {
fs.readFile(fileInfo[0].name, { chunkSize: e.chunkSize },
common.mustSucceed((buf) => {
assert.deepStrictEqual(buf, fileInfo[0].contents);
}));
});

Comment on lines 17 to 26
{ name: path.join(tmpdir.path, `${prefix}-1K.txt`),
len: 1024 },
len: 1024, chunkSize: 64 },
{ name: path.join(tmpdir.path, `${prefix}-64K.txt`),
len: 64 * 1024 },
len: 64 * 1024, chunkSize: -64 },
{ name: path.join(tmpdir.path, `${prefix}-64KLessOne.txt`),
len: (64 * 1024) - 1 },
len: (64 * 1024) - 1, chunkSize: 'string' },
{ name: path.join(tmpdir.path, `${prefix}-1M.txt`),
len: 1 * 1024 * 1024 },
len: 1 * 1024 * 1024, chunkSize: 0 },
{ name: path.join(tmpdir.path, `${prefix}-1MPlusOne.txt`),
len: (1 * 1024 * 1024) + 1 },
len: (1 * 1024 * 1024) + 1, chunkSize: -1 },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest reverting those changes.

@mawaregetsuka
Copy link
Contributor Author

I've improved the tests but there are still a few problems
1 The TypeError about Symbol() and BigInt(1n) does not have error code, should we improve this in other PR?
2 chunkSize is a parameter passed into libuv and I have no idea how to check if it works in the test phase.

@tniessen tniessen changed the title fs: readFile function adds the kPerRead option fs: readFile function adds the chunkSize option Mar 19, 2022
@aduh95
Copy link
Contributor

aduh95 commented Sep 20, 2023

@nodejs/fs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system. needs-ci PRs that need a full CI run. semver-minor PRs that contain new features and should be released in the next minor version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants