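Suppose Alice and Bob work on a project.
Alice adds a file to the project.
alice: git clone my-project ; xvc file track file-a.png
And Bob adds another file:
bob: git clone my-project ; xvc file track file-b.png
Now, suppose the initial entity counter in my-project was 1000. In Alice's version, 1001 will correspond to file-a.png; in Bob's version, 1001 will be file-b.png. This breaks the ECS.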
We need a solution to this problem.
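To make the failure concrete, here is a minimal sketch of two clones that start from the same counter value. The `track` and `conflicts` helpers and the `Entity` alias are hypothetical and only illustrate the scenario; they are not Xvc's actual code.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the current scheme: a plain counter-based entity.
type Entity = usize;

/// Simulates one clone that starts from the shared counter value and
/// assigns the next counter value to each tracked path.
fn track(start: Entity, paths: &[&str]) -> HashMap<Entity, String> {
    let mut store = HashMap::new();
    let mut counter = start;
    for p in paths {
        counter += 1;
        store.insert(counter, p.to_string());
    }
    store
}

/// Returns the entities that point to different paths in the two clones.
fn conflicts(a: &HashMap<Entity, String>, b: &HashMap<Entity, String>) -> Vec<Entity> {
    let mut out: Vec<Entity> = a
        .iter()
        .filter(|(e, p)| b.get(*e).map_or(false, |q| q != *p))
        .map(|(e, _)| *e)
        .collect();
    out.sort();
    out
}
```

Calling `track(1000, &["file-a.png"])` for Alice and `track(1000, &["file-b.png"])` for Bob gives two stores in which entity 1001 means different things, and `conflicts` reports exactly that entity.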
One option is to create another command, xvc merge, to merge two histories. But adding a new command for such a frequent situation makes user education a must, and frustrating users in such a common situation is bad.
It's possible to randomize entities. Currently they are usize integers. Randomization requires checking previous values for possible collisions. Although the probability is low, two random 64-bit integers may collide (due to the birthday paradox) once we pass the few-billion mark for entities. Also, generating a random value for each entity is expensive.
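The birthday-paradox estimate can be checked with the usual approximation p ≈ 1 - exp(-n(n-1) / 2^(bits+1)). The function below is an illustrative sketch, not part of Xvc, and assumes the values are drawn uniformly at random.

```rust
/// Approximate probability that at least two of `n` uniformly random
/// `bits`-bit values coincide: p ≈ 1 - exp(-n(n-1) / 2^(bits+1)).
fn collision_probability(n: f64, bits: i32) -> f64 {
    let space = 2f64.powi(bits);
    let x = n * (n - 1.0) / (2.0 * space);
    // 1 - exp(-x), computed with exp_m1 so that tiny probabilities
    // are not rounded to zero.
    -(-x).exp_m1()
}
```

With this approximation, `collision_probability(5.0e9, 64)` comes out at roughly one half, which matches the "few billion" estimate, while a million entities in a 64-bit space or a few billion in a 128-bit space give a vanishingly small probability.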
It's possible to use a timestamp (e.g. nanoseconds since 2000-01-01) as an XvcEntity. When we think about thousands of branches running on different machines with the same base XvcEntity value, this may lead to collisions as well. Also, not all machines support nanosecond resolution, and I wouldn't want to depend on something like time, which may jump back and forth depending on the machine's clock and locale settings.
It's possible to use something from the environment, e.g. the branch name or user name. But these can also fail and are very easy to duplicate.
We only have a basic counter now. It's 32-bit on 32-bit systems and 64-bit on 64-bit systems. I think we need to add some randomization to it, but in a way that doesn't easily lead to collisions.
The solution I've been able to come up with is making XvcEntity a tuple of two 64-bit integers. The first integer is random and is generated when a command runs. The second is the counter we already use. So the EC file won't need to change, and the probability of collisions (even after a few billion elements) will be negligible. In each run, the first 64-bit integer is regenerated and the second is loaded from the existing counter.
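A minimal sketch of this proposal follows. The `EntityGenerator` type and its methods are hypothetical names for illustration. To stay dependency-free, the random prefix is taken from std's randomly seeded `RandomState`; a real implementation would more likely use a proper RNG crate.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hypothetical two-part entity: (per-run random prefix, counter).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct XvcEntity(pub u64, pub u64);

/// Hands out entities for a single command run.
pub struct EntityGenerator {
    run_prefix: u64,
    counter: u64,
}

impl EntityGenerator {
    /// `loaded_counter` is the value read from the existing counter file.
    pub fn new(loaded_counter: u64) -> Self {
        // Assumption: hashing nothing with a randomly keyed hasher is used
        // here only as a std-only source of a random-looking u64.
        let run_prefix = RandomState::new().build_hasher().finish();
        Self::with_prefix(run_prefix, loaded_counter)
    }

    /// Same as `new`, but with an explicit prefix (useful for tests).
    pub fn with_prefix(run_prefix: u64, loaded_counter: u64) -> Self {
        Self { run_prefix, counter: loaded_counter }
    }

    /// Increments the counter and pairs it with this run's prefix.
    pub fn next_entity(&mut self) -> XvcEntity {
        self.counter += 1;
        XvcEntity(self.run_prefix, self.counter)
    }
}
```

With different prefixes for Alice's and Bob's runs, both still produce counter value 1001, but the resulting entities differ, so the two histories no longer clash on the counter alone.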
It might be simpler to use a single 128-bit integer and randomize it at the beginning. But I think keeping the counter is better for sorting these objects by creation order (e.g. when we need a consistent sort across entities, but the actual order is not that important).
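The sorting argument can be sketched as follows, representing an entity as a plain `(prefix, counter)` pair. Both helpers are hypothetical, and breaking ties by prefix is an assumption made here only to get a deterministic order.

```rust
/// Sorts (random prefix, counter) pairs by counter first, so the order
/// roughly follows creation order; ties are broken by prefix (an assumption).
fn sort_by_creation(entities: &mut [(u64, u64)]) {
    entities.sort_by_key(|&(prefix, counter)| (counter, prefix));
}

/// Packs the pair into a single 128-bit value, prefix in the high bits.
/// Sorting the packed values orders by the random prefix first, which
/// says nothing about creation order.
fn pack(entity: (u64, u64)) -> u128 {
    ((entity.0 as u128) << 64) | entity.1 as u128
}
```

Keeping the two parts separate leaves both orderings available, whereas a fully random 128-bit value has no counter part to sort on.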
This brings another problem to light. Suppose, in the above scenario, Alice and Bob add the same file.
alice: xvc file track file-a.png
bob: xvc file track file-a.png
Now, as the entities are randomized, we have two entities for file-a.png. When we build an index in XvcStore, this will fail, because there will be two entities for the same XvcPath.
An extra command (called xvc fsck) may be required in this case. It would check entities and merge duplicates. It could also merge all store files into a single file for quick loading. Its capabilities may grow in the future.
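The duplicate-merging step of such a command might look like the sketch below. The `merge_duplicates` function, the `(u64, u64)` entity representation, and the rule of keeping the smallest entity per path are all assumptions for illustration; the issue does not specify how duplicates should be resolved.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical entity representation: (random prefix, counter).
type Entity = (u64, u64);

/// Groups records by path and keeps the smallest entity as canonical.
/// Returns the deduplicated path index and a map from each dropped
/// entity to the entity that replaces it.
fn merge_duplicates(
    records: &[(Entity, String)],
) -> (BTreeMap<String, Entity>, HashMap<Entity, Entity>) {
    let mut canonical: BTreeMap<String, Entity> = BTreeMap::new();
    for (entity, path) in records {
        canonical
            .entry(path.clone())
            .and_modify(|kept| *kept = (*kept).min(*entity))
            .or_insert(*entity);
    }
    let mut remap = HashMap::new();
    for (entity, path) in records {
        let kept = canonical[path];
        if kept != *entity {
            remap.insert(*entity, kept);
        }
    }
    (canonical, remap)
}
```

The returned remap could then be used to rewrite any other store that still refers to a dropped entity.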
The second problem may be solved by using a hash for the first element of the tuple. But that doesn't solve the collision problem. We can check for collisions and ask for another value (or append a counter) when one occurs, but checking for collisions requires loading all values. Also, it's an expensive solution for a comparatively rare problem: hashing all values to get their entities adds a constant-factor cost to remedy an infrequent problem.
If we went this route, we'd skip entity generation completely and hash the values to get their entities. I think this is a rather large change that I'd rather not decide on at the moment.
Having a semantically neutral way (like counters) to generate the entity values seems a better approach at the moment.
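For comparison, the hash-based alternative discussed in this comment could be sketched as below. The function names are hypothetical. std's `DefaultHasher` is used only to keep the example dependency-free; its algorithm is not guaranteed to stay the same across Rust releases, so a real implementation would need a fixed hash function.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Derives the first element of the entity from the value itself (here, a path).
fn hash_of_path(path: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    path.hash(&mut hasher);
    hasher.finish()
}

/// Returns the entity for `path`, inserting it if needed. On a hash
/// collision with a different path, the appended counter is bumped.
/// Note that this needs the whole store loaded in order to check.
fn assign(store: &mut HashMap<(u64, u64), String>, path: &str) -> (u64, u64) {
    let h = hash_of_path(path);
    let mut n = 0;
    loop {
        match store.get(&(h, n)) {
            Some(existing) if existing == path => return (h, n),
            Some(_) => n += 1,
            None => {
                store.insert((h, n), path.to_string());
                return (h, n);
            }
        }
    }
}
```

Here Alice and Bob would get the same entity for file-a.png without coordinating, at the cost of hashing every value and consulting the full store on each assignment.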