
Chunking NG: SHA1 checksumming of whole file #11811 #26582

Closed
guruz wants to merge 1 commit from the enable_sha1_chunking_ng branch

Conversation

@guruz (Contributor) commented Nov 8, 2016

@mention-bot commented:

@guruz, thanks for your PR! By analyzing the history of the files in this pull request, we identified @DeepDiver1975, @PVince81 and @icewind1991 to be potential reviewers.

if ($checksum && isset($_SERVER['HTTP_OC_CHECKSUM'])) {
    if (strpos($_SERVER['HTTP_OC_CHECKSUM'], $checksum) === false) {
        // We use "strpos" because the client might send multiple checksums
        throw new BadRequest('invalid checksum computed ' . $checksum . ' got ' . $_SERVER['HTTP_OC_CHECKSUM']);
    }
}
Inline review comment:

From a user's PoV:

Please make sure that this message has some more context, as the well-known "expected filesize xxx, got yyy" message was just confusing and people didn't know what to do with it.

@guruz (Contributor, Author) replied:

Do you have a proposal? :) "Corrupted file stream"? "Corrupted upload"?

Reply:

Unfortunately not, as I don't know what is causing that message. If it is caused by a corrupted upload, I would name it like that.

Contributor replied:

"Corrupted upload, likely related to network issues. Please try again" ? 😉

Reply:

If that's the explanation, I would go for this or a similar text. It will save us/you from user questions about the actual meaning of the current message.

@PVince81 (Contributor) commented Nov 8, 2016

Code looks good, as discussed. Great work!

But, also as discussed, I'm a bit worried that it might decrease performance, as observed in the past in #11811 (comment) (that was with MD5, though).

We could add a switch to disable checksumming altogether, or at least skip the calculation if the client did not send any checksum (the client could be any other WebDAV client).

@DeepDiver1975 @butonic @PhilippSchaffrath thoughts?
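To illustrate that suggestion, here is a minimal sketch of such a guard, assuming a hypothetical config switch named 'upload.checksum.enabled' and a $data variable holding the assembled upload; this is not the PR's actual code, and the "SHA1:<hex>" header format is also an assumption:

// Only spend CPU time on the SHA1 computation when checksumming is enabled
// and the client actually sent an OC-Checksum header.
$clientChecksum = isset($_SERVER['HTTP_OC_CHECKSUM']) ? $_SERVER['HTTP_OC_CHECKSUM'] : '';
$enabled = \OC::$server->getConfig()->getSystemValue('upload.checksum.enabled', true); // hypothetical switch

if ($enabled && $clientChecksum !== '') {
    $computed = 'SHA1:' . sha1($data); // assumes the header carries values like "SHA1:<hex>"
    if (strpos($clientChecksum, $computed) === false) {
        throw new BadRequest(
            'Corrupted upload, likely related to network issues. Please try again. ' .
            '(computed ' . $computed . ', got ' . $clientChecksum . ')'
        );
    }
}

A guard like this would also pick up the friendlier error message discussed in the review thread above.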

@PVince81 PVince81 added this to the 9.2 milestone Nov 8, 2016
@guruz (Contributor, Author) commented Nov 9, 2016

"if the client did not send any checksum (client could be any other WebDAV client)"

Not in this case, because chunking NG is only used by the oC client (and hopefully by the web interface at some point).

(The table in the linked comment is from two years ago; things might be much better now.)

@PVince81 (Contributor) commented Nov 9, 2016

@guruz hmm, indeed. And now I realize that we actually have yet another mechanism to upload files to the new endpoint: a simple PUT. Will the client still use chunking NG for small files, with the benefit of checksums?
If not, we might want to add checksum checking for regular PUT operations to the new DAV endpoint as well.

@PVince81 (Contributor) commented:

See my comment on the approach: #26655 (comment).
If we do it as a stream wrapper, we can checksum in many more situations, including the assembly stream indirectly.
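To make the stream-wrapper idea more concrete, here is a rough sketch of a PHP stream filter that hashes data as it flows through a stream; the class name, the registered filter name, and the use of an ArrayObject to hand back the digest are illustrative assumptions, not the implementation proposed in #26655:

// Sketch: a stream filter computing a SHA1 digest of everything passing
// through the stream it is attached to, without buffering the payload.
class Sha1StreamFilter extends php_user_filter {
    private $hashContext;

    public function onCreate() {
        $this->hashContext = hash_init('sha1');
        return true;
    }

    public function filter($in, $out, &$consumed, $closing) {
        while ($bucket = stream_bucket_make_writeable($in)) {
            hash_update($this->hashContext, $bucket->data);
            $consumed += $bucket->datalen;
            stream_bucket_append($out, $bucket); // pass the data through unchanged
        }
        if ($closing && $this->params instanceof ArrayObject) {
            // The ArrayObject passed to stream_filter_append() is used as an
            // out-parameter to carry the final digest back to the caller.
            $this->params['sha1'] = hash_final($this->hashContext);
        }
        return PSFS_PASS_ON;
    }
}

stream_filter_register('checksum.sha1', 'Sha1StreamFilter');

Attached to php://input or to the assembly stream, a filter like this would checksum every upload path without changing the upload code itself.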

@DeepDiver1975 (Member) commented:

@mrow4a can I ask you to do a performance comparison? What is the overhead for this? THX

@mrow4a (Contributor) commented Dec 8, 2016

I believe you are mostly interested in the worst-case scenario? So what will the overhead be on a standard VM with a standard HDD and 2-4 CPUs? Will do that, I am actually also curious.

@mrow4a (Contributor) commented Dec 12, 2016

I did some tests. I ran them on test machines with:
client: 2x2.4 GHz, 2 GB RAM, standard HDD, with a fresh client build from GitHub master
server1: 4x2.4 GHz, 2 GB RAM, standard HDD, with the stable-master TAR
server2: 4x2.4 GHz, 2 GB RAM, standard HDD, with the stable-master TAR plus the patched changes

I actually couldn't believe the results, but since you do it in memory... :>

I re-ran the test for 1x100 MB files, so 10 chunks, 20 times:
[image: benchmark results]

I did not believe it and actually even started printing checksums... but it is like that. It seems in-memory checksums are nearly free; due to the pipelining of requests, the difference is nearly unnoticeable.

The way you do it probably will not show anything revolutionary in Blackfire either, since you do it in memory anyway.

@butonic (Member) commented Dec 12, 2016

I would prefer to calculate the hash by wrapping the php://input stream; typically the network is even slower than disk access. I tried to grasp when the AssemblyStream is used... but it is far too late for me now. n8

Oh that does not mean you should change this right now. This is awesome!
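Assuming a hashing filter like the sketch above is registered as 'checksum.sha1', wrapping php://input could look roughly like this; $targetPath and the final header comparison are placeholders, not code from this PR:

// Hash the request body while it is streamed to its destination.
$digest = new ArrayObject();
$input  = fopen('php://input', 'rb');
stream_filter_append($input, 'checksum.sha1', STREAM_FILTER_READ, $digest);

$target = fopen($targetPath, 'wb'); // $targetPath: wherever the chunk or file is stored
stream_copy_to_stream($input, $target);
fclose($input);  // closing the stream finalizes the filter and fills $digest['sha1']
fclose($target);

$computed = 'SHA1:' . $digest['sha1']; // compare against the OC-Checksum header as before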

@PVince81 (Contributor) commented:

@butonic yes, as discussed in #26655, we need to discard this PR and use a stream wrapper to calculate the hash instead.

@PVince81 PVince81 closed this Dec 13, 2016
@DeepDiver1975 DeepDiver1975 deleted the enable_sha1_chunking_ng branch January 30, 2017 22:17
lock bot commented Aug 4, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators Aug 4, 2019