File Transport Checksums Between Server and Desktop #3735
Comments
What needs to be done in the client?
|
It currently always returns the empty list and thus has no effect.
@ckamm the patch you're pointing to checks the expected checksum type as in Adler32 or SHA or so. If that is empty, we do not have checksumming and assume validated() - which means all is ok. |
@dragotin Yes, but why would the checksum type configured in the config file matter when one validates a checksum the server sends? And shouldn't an empty checksum header still yield |
@moscicki I'm touching the checksumming code and wonder: Do you use |
We treat the request header name as case-insensitive. So all these are accepted: Adler32, adler32, ADLER32 Response given by the server: Adler32 |
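A minimal sketch of how case-insensitive handling of the checksum type name, as described above, could look. The registry `CHECKSUM_FUNCS` and the helper `lookup_checksum_func` are hypothetical names for illustration only; they are not taken from the client or from EOS.

```python
import hashlib
import zlib


def _adler32_hex(data: bytes) -> str:
    # zlib.adler32 returns an unsigned 32-bit value in Python 3; the mask is defensive.
    return format(zlib.adler32(data) & 0xFFFFFFFF, "08x")


# Hypothetical registry, keyed by the upper-cased type name.
CHECKSUM_FUNCS = {
    "ADLER32": _adler32_hex,
    "MD5": lambda data: hashlib.md5(data).hexdigest(),
    "SHA1": lambda data: hashlib.sha1(data).hexdigest(),
}


def lookup_checksum_func(type_name: str):
    """Return the checksum function for a type name, ignoring case; None if unknown."""
    return CHECKSUM_FUNCS.get(type_name.strip().upper())
```

With this, `Adler32`, `adler32` and `ADLER32` all resolve to the same function, while an unknown name yields `None`.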
@moscicki Great, then the client will standardize on EDIT: Note that with 'standardize' I meant the spelling of this particular checksum type (instead of also sometimes sending |
The problem with Adler32 is that it is not safe against collisions, so we cannot use it to detect copies or moves. (Or can we?) |
@moscicki can you please enlighten us? |
What exactly do you want to use the checksum for? As an ETag? Or deduplicate? I think the answer depends on what you want to do with it ;-) |
This is not decided yet. But if we have the checksum in the database, we might as well use it to detect copies and moves more reliably than now. |
I think this requires an architectural discussion, also in the context of the new storage architecture promised for ownCloud 9 server. That is to say, IMO the sync client should be flexible enough to handle storage backends which already come with checksums (e.g. S3 with MD5, EOS with Adler32, ...). It is the storage backend behind the ownCloud server which tells the client what the checksum is; you cannot change that unless you want to compute it again yourself in the ownCloud server. For file downloads, the checksum type comes in the response header, so the client just uses this information. For file uploads, the client may want to discover which checksum type the server supports (this may be per-directory!) in order to compute the expected checksum and add it to the PUT header. Except for the discovery, this is what has already been implemented in the sync client, and for protecting the file transfer itself this should be sufficient. Another point is what kind of checksum you store locally in the sync state db, whether this could/should be different from the transmission checksum (at the expense of computing the checksum twice), and how that can help you optimize operations. If it can help at all, then it should be a very strong checksum that guarantees that checksum1==checksum2 is equivalent to content1==content2. This is not true for Adler32. Here are some pointers:
Having said that, it probably makes sense to first understand what kind of optimizations you want to achieve (deduplicating the transfer when propagating a local copy would be one such case, but is it worth it?). For moves, I really do not see how this can help the client to track them. In addition, checksumming should be considered in the context of delta sync, which is potentially a big win, but this needs to be verified on real data. The bottom line is that one should analyze the real usage of the system to get some idea of how important these things are. We currently do this kind of data mining on CERNBox, so any ideas about which pieces of information to extract to provide useful, real-life input to this discussion are welcome. I propose to prepare the ground and aim for a discussion at the CS3 Workshop at ETH Zurich in January. |
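The remark that equal Adler32 values do not imply equal content can be illustrated with a small sketch. The two inputs below were chosen by hand so that both running sums used by Adler32 coincide; the helper names are made up for this example.

```python
import hashlib
import zlib


def adler32_hex(data: bytes) -> str:
    """Adler32 of data as 8 hex digits."""
    return format(zlib.adler32(data) & 0xFFFFFFFF, "08x")


def sha256_hex(data: bytes) -> str:
    """SHA-256 of data as hex, used here as an example of a strong checksum."""
    return hashlib.sha256(data).hexdigest()


# Two different 3-byte inputs with the same byte sum and the same
# weighted sum, which is all that Adler32 looks at.
first, second = b"ace", b"baf"

print(adler32_hex(first) == adler32_hex(second))  # True: Adler32 collides
print(sha256_hex(first) == sha256_hex(second))    # False: SHA-256 tells them apart
```

This is fine for catching accidental transfer corruption, but it shows why a weak checksum alone is not a safe basis for deciding that two files are copies of each other.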
Yes, actually there are three areas where we can use checksums, and these are often confused in discussions:
Checksums for Transportation Verification (#3735)
The idea here is to verify the integrity of a file's content after it has been transmitted over any kind of network, usually after it has been downloaded from the server or uploaded from a client to the server. A checksum is added to the header of the GET or PUT request, and the receiving side recalculates the content checksum of the file at its final location and compares it to the expected value from the header. This adds another level of safety to the transmission. For this use case, any checksum available from the storage backend should be used, to keep the effort for the server as low as possible. For this reason, the header that contains the checksum information prefixes the checksum with its type. If the type is unknown to the receiving side, the transmission verification cannot be done. This is currently not considered an error, but in the longer run a configuration option could make it one.
Checksums to Support Discovering Changes
This use case has two aspects:
This use case is mainly confined to a single side, so any checksum that is available and stored in the journal could be used.
Checksums to Enable Delta Sync
The main difference from the other two use cases is that here checksums over blocks of a file are needed, not checksums over the entire file. Delta sync algorithms usually split a file into blocks of equal size, calculate the checksums over these blocks, and try to transmit only the blocks that have really changed. The tricky part is how to split the file to maximize the probability of finding unchanged blocks. (See #179) For this use case, strong checksums are needed. Both client and server need to be able to recalculate and possibly store the checksum lists. |
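The block-checksum idea behind delta sync can be sketched roughly as follows. This is a deliberately naive illustration: it uses fixed-size blocks compared by position and SHA-256 as the strong checksum, both of which are assumptions for the example. It does not handle insertions that shift later blocks, which is exactly the "tricky part" of splitting mentioned above.

```python
import hashlib
from typing import List


def block_checksums(data: bytes, block_size: int) -> List[str]:
    """Split data into fixed-size blocks and return one strong checksum per block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int) -> List[int]:
    """Return indices of blocks in `new` that differ from the block at the same index in `old`."""
    old_sums = block_checksums(old, block_size)
    new_sums = block_checksums(new, block_size)
    changed = []
    for index, checksum in enumerate(new_sums):
        # A block is "changed" if it is new or its checksum differs.
        if index >= len(old_sums) or old_sums[index] != checksum:
            changed.append(index)
    return changed


# Only the second 4-byte block differs, so only block 1 would need to be sent.
print(changed_blocks(b"aaaabbbbcccc", b"aaaaXbbbcccc", 4))  # [1]
```

In a real setup one side would keep the list returned by `block_checksums` so that it does not have to be recomputed for every comparison.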
That's a very nice summary! I add this link for future reference: |
Thanks @labkode I am not sure if this is really an error. I mean, md5 != MD5, so I think this is an acceptable behaviour. This feature needs attention from the administrator, so configuring it correctly should be possible. So we can consider this working I assume? |
@dragotin The client works correctly on Mac OSX. I have to try on Linux and Windows. But if the codebase is the same they should also work correctly. |
@dragotin Tested from Win and Linux. Both working against EOS with Adler32. |
Reopening because we have to test that again against ownCloud server. |
@bboule Are you saying you're currently working on this? Should I assign this to you? |
I don't think so. This is just a label of the tool he is using 🙈 |
@MorrisJobke Thanks, that explains it. As far as I know, testing this can only start when owncloud/core#18716 is done. |
See owncloud/core#21997 for a WIP PR. It can store provided checksums from the OC-Checksum header, will return the OC-Checksum header if there is a checksum, and will also return the checksums on PROPFIND. The format for the header field is basically: |
owncloud/core#21997 is in, although slimmed down, so right now only one checksum can be sent. But all should work for 9.0. |
The acceptance criteria of this issue are not met: in the current implementation (9.0) the server does not validate the checksum that comes with the client's file upload, but only stores it. In effect, we have file transport verification only on downloads. The server also stores checksums without first verifying them against the uploaded file. |
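To make the missing step concrete, here is a rough sketch of what "verify before storing" on upload could look like. It is not the ownCloud server code. The `TYPE:value` header form with a colon separator, the function `accept_upload`, and the exception `ChecksumMismatch` are all assumptions made for this illustration; the thread only states that the header prefixes the checksum with its type and that an unknown type is currently not treated as an error.

```python
import hashlib
import zlib
from typing import Optional


class ChecksumMismatch(Exception):
    """Raised when the uploaded body does not match the checksum sent by the client."""


def compute_checksum(type_name: str, body: bytes) -> Optional[str]:
    """Compute a checksum of the given type; return None for unknown types."""
    name = type_name.strip().upper()
    if name == "SHA1":
        return hashlib.sha1(body).hexdigest()
    if name == "ADLER32":
        return format(zlib.adler32(body) & 0xFFFFFFFF, "08x")
    return None


def accept_upload(body: bytes, checksum_header: Optional[str]) -> Optional[str]:
    """Return the checksum string to store, or None if nothing could be verified.

    Assumes a hypothetical "TYPE:value" header form.
    """
    if not checksum_header:
        return None  # no checksum sent: nothing to verify
    type_name, separator, expected = checksum_header.partition(":")
    if not separator:
        return None  # not in the assumed form
    actual = compute_checksum(type_name, body)
    if actual is None:
        return None  # unknown type: not treated as an error
    if actual.lower() != expected.strip().lower():
        raise ChecksumMismatch(type_name.strip())
    # Only a checksum that was verified against the body gets stored.
    return type_name.strip().upper() + ":" + actual
```

The design choice shown is that a stored checksum is always one the server recomputed itself, so a corrupted upload is rejected instead of being recorded with the client's checksum.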
these criteria above should have been added to the corresponding server ticket owncloud/core#18716 @MTRichards @karlitschek @cmonteroluque this somehow got lost while coordinating client and server requirements - hopefully we can avoid this in the future |
for the sake of completeness, here is the current state with 9.0. With not having the server verifying the uploaded checksum, this can happen:
At the same time, UserA thinks the file was properly uploaded to the server, which is not the case. |
Hi hi:
Test 1
Steps executed
1.- Connect desktop client to a server that has cksum capabilities enabled
Results
Right now, as could be expected, the server does not validate the cksum on uploads, so in this case a corrupted file could be uploaded to the server and what is described right above could happen.
Test 2
Steps executed
1.- Connect desktop client to a server that has cksum capabilities enabled
Results
An error is raised to let us know that file1 does not match the expected cksum, and the sync for that file is retried at the next sync.
Test 3
Steps executed
1.- Connect desktop client to a server that has cksum capabilities enabled
Results
File2 is not synced again.
Test 4
Steps executed
1.- Connect desktop client to a server that has cksum capabilities enabled
Results
File2.txt is uploaded again.
Test 5
Steps executed
1.- Connect desktop client to a server that has cksum capabilities enabled
Results
It works fine, the checksum matches.
@dragotin what do you think we can do with this issue?
Server OS: Ubuntu
Client OS: OS X El Capitan 10.11.4 |
@mcastroSG very good work, all tests make sense and succeed. Thanks. I think you can close this. |
Thanks a lot !! 😃 |
What is the config option for enabling support of file transport checksums on the server side? It does not seem to be documented anywhere. |
@meekjt The option is not there yet on the server side. It will come with one of the next versions. |
As an ownCloud admin, I want ownCloud to ensure that files transported from the desktop client to the server are not corrupted in transit so that we can catch bugs and avoid corrupting customer data
Acceptance Criteria:
Note: this links to server issue XXX