Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor ResourceIndex and Associated Methods #79

Merged
merged 6 commits into from
Aug 8, 2024
Merged

Conversation

tareknaser
Copy link
Collaborator

@tareknaser tareknaser commented Jun 30, 2024

Description

This PR outlines a series of improvements and refactors to the ResourceIndex and its associated methods, as discussed in this comment.

Changes

  1. Refactor ResourceIndex API to align with the proposed changes discussed in the linked comment.
  2. Move unit tests to a separate file from index.rs.
  3. Implement a custom Serialize implementation for ResourceIndex that avoids writing double mappings.
  4. Revise ResourceIndex::update_all() to rebuild the index and compare it with the old index.
  5. Update id_to_resources to be of type HashMap<HashType, Vec<IndexedResource>> instead of HashMap<ResourceId, CanonicalPathBuf> to handle cases where one ID maps to multiple resources due to collisions or identical content.

Related Issues:

Additional Changes:

  • Define new rules for rustfmt to wrap long comments.
  • Use the nightly build in CI for rustfmt experimental rules.
  • Change target_os to target_family.

Todo:

Copy link

Benchmark for aee18b1

Click to view benchmark
Test Base PR %
blake3_resource_id_creation/compute_from_bytes:large 249.2±1.10µs 254.3±1.13µs +2.05%
blake3_resource_id_creation/compute_from_bytes:medium 15.8±0.14µs 15.8±0.06µs 0.00%
blake3_resource_id_creation/compute_from_bytes:small 1398.5±26.77ns 1388.0±4.95ns -0.75%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg 197.7±0.75µs 198.3±0.42µs +0.30%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf 1730.1±6.72µs 1742.3±21.84µs +0.71%
crc32_resource_id_creation/compute_from_bytes:large 86.8±0.61µs 87.0±2.55µs +0.23%
crc32_resource_id_creation/compute_from_bytes:medium 5.4±0.02µs 5.4±0.01µs 0.00%
crc32_resource_id_creation/compute_from_bytes:small 92.4±0.41ns 92.4±0.50ns 0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg 65.4±0.30µs 65.4±0.80µs 0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf 954.9±8.85µs 963.1±5.07µs +0.86%
index_build/index_build/../test-assets/ 1047.5±39.38µs N/A N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksPMLazq 1073.4±44.60µs N/A N/A
resource_index/index_get_resource_by_id 12.9±0.04ns N/A N/A
resource_index/index_get_resource_by_path 60.0±0.40ns N/A N/A
resource_index/index_track_addition 1215.8±27.23µs N/A N/A
resource_index/index_track_modification 1364.4±26.55µs N/A N/A
resource_index/index_track_removal 1188.5±19.72µs N/A N/A
resource_index/index_update_all 10.2±0.44ms N/A N/A

@tareknaser tareknaser marked this pull request as ready for review July 2, 2024 14:26
@tareknaser tareknaser requested a review from kirillt July 2, 2024 14:27
Copy link

github-actions bot commented Jul 2, 2024

Benchmark for cb94ee1

Click to view benchmark
Test Base PR %
blake3_resource_id_creation/compute_from_bytes:large 253.3±0.83µs 250.1±1.29µs -1.26%
blake3_resource_id_creation/compute_from_bytes:medium 15.7±0.07µs 15.7±0.21µs 0.00%
blake3_resource_id_creation/compute_from_bytes:small 1387.9±5.89ns 1386.0±7.03ns -0.14%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg 198.7±0.40µs 198.2±2.96µs -0.25%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf 1735.6±6.81µs 1743.0±14.31µs +0.43%
crc32_resource_id_creation/compute_from_bytes:large 87.5±5.44µs 86.6±0.29µs -1.03%
crc32_resource_id_creation/compute_from_bytes:medium 5.4±0.03µs 5.4±0.05µs 0.00%
crc32_resource_id_creation/compute_from_bytes:small 92.5±0.77ns 92.6±1.44ns +0.11%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg 65.2±1.47µs 65.6±0.54µs +0.61%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf 974.7±2.93µs 964.2±16.65µs -1.08%
index_build/index_build/../test-assets/ 1072.7±7.28µs N/A N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksDIqEnR 1050.7±8.68µs N/A N/A
resource_index/index_get_resource_by_id 11.8±0.15ns N/A N/A
resource_index/index_get_resource_by_path 45.2±0.54ns N/A N/A
resource_index/index_track_addition 1194.2±21.98µs N/A N/A
resource_index/index_track_modification 1430.8±69.23µs N/A N/A
resource_index/index_track_removal 1192.5±43.15µs N/A N/A
resource_index/index_update_all 9.8±0.48ms N/A N/A

fs-index/src/tests.rs Outdated Show resolved Hide resolved
fs-index/src/index.rs Outdated Show resolved Hide resolved
fs-index/src/index.rs Outdated Show resolved Hide resolved
fs-index/src/index.rs Outdated Show resolved Hide resolved
fs-index/src/index.rs Outdated Show resolved Hide resolved
fs-index/src/index.rs Outdated Show resolved Hide resolved
fs-index/src/index.rs Outdated Show resolved Hide resolved
fs-index/src/index.rs Outdated Show resolved Hide resolved
fs-index/src/index.rs Outdated Show resolved Hide resolved
fs-index/src/index.rs Outdated Show resolved Hide resolved
dev-hash/benches/crc32.rs Outdated Show resolved Hide resolved
dev-hash/benches/blake3.rs Outdated Show resolved Hide resolved
data-resource/src/lib.rs Outdated Show resolved Hide resolved
Copy link
Member

@kirillt kirillt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly good, but update_all might be different from what it was. Not sure about modified field in the IndexUpdate struct. Overall, the index should provide API in terms of resource, not paths. Ideally, the user should not care about locations, only about content.

Also, let's move Track API into separate PR because it should be considered more carefully.

Copy link

github-actions bot commented Jul 5, 2024

Benchmark for 5d6d084

Click to view benchmark
Test Base PR %
blake3_resource_id_creation/compute_from_bytes:large 249.1±1.33µs 247.7±1.07µs -0.56%
blake3_resource_id_creation/compute_from_bytes:medium 15.8±0.22µs 15.7±0.06µs -0.63%
blake3_resource_id_creation/compute_from_bytes:small 1391.8±13.43ns 1385.1±27.55ns -0.48%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg 197.8±0.64µs 198.2±1.00µs +0.20%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf 1752.4±12.09µs 1744.2±34.78µs -0.47%
crc32_resource_id_creation/compute_from_bytes:large 86.6±0.16µs 86.6±0.15µs 0.00%
crc32_resource_id_creation/compute_from_bytes:medium 5.4±0.04µs 5.4±0.07µs 0.00%
crc32_resource_id_creation/compute_from_bytes:small 92.5±0.90ns 92.4±0.77ns -0.11%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg 65.7±0.68µs 65.7±0.22µs 0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf 961.2±7.11µs 963.7±8.38µs +0.26%
index_build/index_build/../test-assets/ 1074.5±3.18µs N/A N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksAI18h8 1069.4±8.79µs N/A N/A
resource_index/index_get_resource_by_id 11.9±0.06ns N/A N/A
resource_index/index_get_resource_by_path 45.0±0.52ns N/A N/A
resource_index/index_update_all 10.1±0.40ms N/A N/A

@tareknaser tareknaser force-pushed the fs_index_refactor branch from 82a02c6 to 41d2da4 Compare July 6, 2024 14:54
Copy link

github-actions bot commented Jul 6, 2024

Benchmark for 34eac0f

Click to view benchmark
Test Base PR %
blake3_resource_id_creation/compute_from_bytes:large 233.7±5.99µs 232.5±4.12µs -0.51%
blake3_resource_id_creation/compute_from_bytes:medium 15.1±0.47µs 15.0±0.42µs -0.66%
blake3_resource_id_creation/compute_from_bytes:small 1286.0±10.92ns 1302.8±31.33ns +1.31%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg 189.1±4.93µs 184.4±2.77µs -2.49%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf 1619.2±15.56µs 1662.8±63.94µs +2.69%
crc32_resource_id_creation/compute_from_bytes:large 80.3±0.67µs 81.4±2.01µs +1.37%
crc32_resource_id_creation/compute_from_bytes:medium 5.0±0.04µs 5.3±0.16µs +6.00%
crc32_resource_id_creation/compute_from_bytes:small 85.6±1.05ns 86.5±1.80ns +1.05%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg 61.0±0.75µs 60.7±0.88µs -0.49%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf 899.2±7.24µs 908.0±24.46µs +0.98%
index_build/index_build/../test-assets/ 972.6±9.78µs N/A N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksMfIXZr 1010.2±14.30µs N/A N/A
resource_index/index_get_resource_by_id 11.3±0.11ns N/A N/A
resource_index/index_get_resource_by_path 42.4±0.60ns N/A N/A
resource_index/index_update_all 9.3±0.28ms N/A N/A

@kirillt
Copy link
Member

kirillt commented Jul 7, 2024

By the way, let's use squashing and rebasing only before the merge into main.

Otherwise, it's difficult to see what was changed after last review...

@ARK-Builders ARK-Builders deleted a comment from github-actions bot Jul 7, 2024
Copy link

github-actions bot commented Jul 7, 2024

Benchmark for decbfeb

Click to view benchmark
Test Base PR %
blake3_resource_id_creation/compute_from_bytes:large 247.8±1.09µs 251.9±0.98µs +1.65%
blake3_resource_id_creation/compute_from_bytes:medium 15.7±0.02µs 15.7±0.09µs 0.00%
blake3_resource_id_creation/compute_from_bytes:small 1418.3±25.91ns 1404.2±96.95ns -0.99%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg 197.9±1.46µs 197.3±0.48µs -0.30%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf 1741.6±34.82µs 1742.2±15.20µs +0.03%
crc32_resource_id_creation/compute_from_bytes:large 86.9±0.30µs 86.7±1.43µs -0.23%
crc32_resource_id_creation/compute_from_bytes:medium 5.4±0.06µs 5.4±0.08µs 0.00%
crc32_resource_id_creation/compute_from_bytes:small 92.4±0.43ns 92.4±0.61ns 0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg 65.6±1.05µs 65.6±0.27µs 0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf 958.5±4.71µs 969.2±4.66µs +1.12%
index_build/index_build/../test-assets/ 1069.2±4.25µs N/A N/A
resource_index/index_build//tmp/ark-fs-index-benchmarkse9OGgy 1059.4±17.68µs N/A N/A
resource_index/index_get_resource_by_id 12.0±0.03ns N/A N/A
resource_index/index_get_resource_by_path 46.1±0.34ns N/A N/A
resource_index/index_update_all 9.8±0.37ms N/A N/A

Copy link

github-actions bot commented Jul 7, 2024

Benchmark for 78d6301

Click to view benchmark
Test Base PR %
blake3_resource_id_creation/compute_from_bytes:large 248.7±1.04µs 251.0±1.33µs +0.92%
blake3_resource_id_creation/compute_from_bytes:medium 15.7±0.09µs 15.7±0.09µs 0.00%
blake3_resource_id_creation/compute_from_bytes:small 1377.0±9.78ns 1387.5±2.89ns +0.76%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg 197.6±0.91µs 197.7±0.71µs +0.05%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf 1734.9±4.08µs 1760.1±48.86µs +1.45%
crc32_resource_id_creation/compute_from_bytes:large 86.4±0.36µs 86.6±0.47µs +0.23%
crc32_resource_id_creation/compute_from_bytes:medium 5.4±0.05µs 5.4±0.03µs 0.00%
crc32_resource_id_creation/compute_from_bytes:small 92.3±0.23ns 92.3±0.26ns 0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg 65.2±0.37µs 65.6±0.56µs +0.61%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf 962.2±5.44µs 955.8±2.86µs -0.67%
index_build/index_build/../test-assets/ 1069.9±6.59µs N/A N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksJNSByH 1067.5±2.98µs N/A N/A
resource_index/index_get_resource_by_id 11.8±0.11ns N/A N/A
resource_index/index_get_resource_by_path 46.7±0.39ns N/A N/A
resource_index/index_update_all 10.2±0.23ms N/A N/A

Copy link

github-actions bot commented Jul 7, 2024

Benchmark for f78add5

Click to view benchmark
Test Base PR %
blake3_resource_id_creation/compute_from_bytes:large 247.7±0.65µs 248.7±1.21µs +0.40%
blake3_resource_id_creation/compute_from_bytes:medium 15.7±0.04µs 15.7±0.12µs 0.00%
blake3_resource_id_creation/compute_from_bytes:small 1386.5±1.46ns 1388.7±1.78ns +0.16%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg 197.8±2.97µs 197.9±0.54µs +0.05%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf 1741.2±6.06µs 1752.9±30.86µs +0.67%
crc32_resource_id_creation/compute_from_bytes:large 86.6±0.33µs 86.6±0.72µs 0.00%
crc32_resource_id_creation/compute_from_bytes:medium 5.4±0.03µs 5.4±0.03µs 0.00%
crc32_resource_id_creation/compute_from_bytes:small 92.7±2.66ns 92.4±0.41ns -0.32%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg 64.8±0.38µs 65.5±1.21µs +1.08%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf 962.5±2.74µs 958.4±8.63µs -0.43%
index_build/index_build/../test-assets/ 1070.3±16.95µs N/A N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksOJvvdj 1079.0±15.57µs N/A N/A
resource_index/index_get_resource_by_id 11.7±0.07ns N/A N/A
resource_index/index_get_resource_by_path 47.4±0.51ns N/A N/A
resource_index/index_update_all 9.7±0.35ms N/A N/A

fs-index/src/index.rs Outdated Show resolved Hide resolved
fs-index/src/index.rs Outdated Show resolved Hide resolved
Copy link

Benchmark for 1134cc1

Click to view benchmark
Test Base PR %
blake3_resource_id_creation/compute_from_bytes:large 250.3±1.31µs 250.2±0.70µs -0.04%
blake3_resource_id_creation/compute_from_bytes:medium 15.6±0.09µs 15.6±0.30µs 0.00%
blake3_resource_id_creation/compute_from_bytes:small 1348.6±4.80ns 1348.3±2.39ns -0.02%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg 199.3±1.12µs 198.0±0.70µs -0.65%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf 1762.5±33.97µs 1740.6±22.46µs -1.24%
crc32_resource_id_creation/compute_from_bytes:large 86.7±0.34µs 87.0±1.08µs +0.35%
crc32_resource_id_creation/compute_from_bytes:medium 5.4±0.02µs 5.4±0.03µs 0.00%
crc32_resource_id_creation/compute_from_bytes:small 92.7±0.49ns 92.6±0.19ns -0.11%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg 65.2±0.45µs 65.2±1.04µs 0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf 994.8±3.76µs 973.1±6.13µs -2.18%
index_build/index_build/../test-assets/ 1056.4±7.91µs N/A N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksZ1q6Pm 111.5±3.04ms N/A N/A
resource_index/index_get_resource_by_id 76.8±0.34ns N/A N/A
resource_index/index_get_resource_by_path 35.9±0.31ns N/A N/A
resource_index/index_update_all 1111.3±43.58ms N/A N/A

Copy link

Benchmark for d2cb695

Click to view benchmark
Test Base PR %
blake3_resource_id_creation/compute_from_bytes:large 248.9±1.10µs 248.5±0.64µs -0.16%
blake3_resource_id_creation/compute_from_bytes:medium 15.5±0.03µs 15.5±0.06µs 0.00%
blake3_resource_id_creation/compute_from_bytes:small 1351.2±12.16ns 1349.8±6.35ns -0.10%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg 198.5±1.51µs 199.0±1.02µs +0.25%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf 1765.8±4.30µs 1780.4±23.61µs +0.83%
crc32_resource_id_creation/compute_from_bytes:large 86.9±0.72µs 86.8±0.57µs -0.12%
crc32_resource_id_creation/compute_from_bytes:medium 5.4±0.03µs 5.4±0.01µs 0.00%
crc32_resource_id_creation/compute_from_bytes:small 93.1±0.97ns 93.0±0.57ns -0.11%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg 65.6±2.34µs 65.6±1.01µs 0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf 978.1±5.54µs 995.1±4.47µs +1.74%
index_build/index_build/../test-assets/ 1052.6±11.98µs N/A N/A
resource_index/index_build//tmp/ark-fs-index-benchmarkspp62rB 114.7±2.04ms N/A N/A
resource_index/index_get_resource_by_id 99.3±0.51ns N/A N/A
resource_index/index_get_resource_by_path 52.5±0.35ns N/A N/A
resource_index/index_update_all 1141.1±35.00ms N/A N/A

@kirillt kirillt mentioned this pull request Jul 22, 2024
@kirillt
Copy link
Member

kirillt commented Jul 22, 2024

Actually, we need multiple values in the added field... because we can have collisions/duplicates, so better to inform the user about it:

pub struct ModifiedPath {
    path: PathBuf,
    modified: SystemTime
}

pub struct IndexUpdate<Id: ResourceId> {
    added: HashMap<Id, HashSet<UpdatedPath>>,
    removed: HashSet<Id>,
}

And we should not duplicate ResourceId in left and right sides, otherwise we'd need to maintain equality of them. So, auxiliary structure ModifiedPath is handy.

Last actions points before merging:

  • multiple values in added field of IndexUpdate
  • fix the build

Copy link

github-actions bot commented Aug 3, 2024

Benchmark for f08cd2e

Click to view benchmark
Test Base PR %
blake3_resource_id_creation/compute_from_bytes:large 247.4±1.14µs 248.6±1.10µs +0.49%
blake3_resource_id_creation/compute_from_bytes:medium 15.6±0.08µs 15.6±0.06µs 0.00%
blake3_resource_id_creation/compute_from_bytes:small 1392.5±31.00ns 1384.7±5.81ns -0.56%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg 197.3±0.34µs 197.9±0.94µs +0.30%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf 1743.1±62.52µs 1746.3±26.60µs +0.18%
crc32_resource_id_creation/compute_from_bytes:large 86.6±0.28µs 86.7±0.54µs +0.12%
crc32_resource_id_creation/compute_from_bytes:medium 5.4±0.09µs 5.4±0.08µs 0.00%
crc32_resource_id_creation/compute_from_bytes:small 93.0±0.69ns 93.0±0.49ns 0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg 64.8±0.38µs 64.8±1.18µs 0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf 973.5±12.28µs 972.4±13.97µs -0.11%
index_build/index_build/../test-assets/ 1077.5±7.88µs N/A N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksqMxIJ4 110.4±1.12ms N/A N/A
resource_index/index_get_resource_by_id 98.9±1.19ns N/A N/A
resource_index/index_get_resource_by_path 52.4±0.32ns N/A N/A
resource_index/index_update_all 1092.5±39.00ms N/A N/A

@tareknaser
Copy link
Collaborator Author

Actually, we need multiple values in the added field... because we can have collisions/duplicates, so better to inform the user about it:

Thanks for catching that.
I've made the updates and added a unit test to verify the implementation's correctness. The test involves updating the content of a file and adding a new file with the same content, then confirming that update_all() returns two entries for the new ID.
I also included some additional tests to cover corner cases we hadn't previously checked with update_all().

And we should not duplicate ResourceId in left and right sides, otherwise we'd need to maintain equality of them. So, auxiliary structure ModifiedPath is handy.

I renamed IndexEntry to ResourceIdWithTimestamp and created a separate struct called ResourcePathWithTimestamp to prevent confusion with IndexedResource and IndexEntry. Do the names make things clearer?

I also ran into a bug where the root path prefix wasn't being stripped from the files in some cases. This issue has been fixed.

Now, all that's left is to fix the build. A couple of tests are failing only on windows for some reason. I tried to resolve it today but couldn't get it to work. I still need to spend some time on it, because I rely on CI logs for debugging.

fs-index/src/index.rs Outdated Show resolved Hide resolved
Copy link

github-actions bot commented Aug 4, 2024

Benchmark for 4705382

Click to view benchmark
Test Base PR %
blake3_resource_id_creation/compute_from_bytes:large 247.2±1.13µs 251.0±5.64µs +1.54%
blake3_resource_id_creation/compute_from_bytes:medium 15.6±0.07µs 15.6±0.04µs 0.00%
blake3_resource_id_creation/compute_from_bytes:small 1382.3±8.47ns 1378.2±9.67ns -0.30%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg 198.2±1.41µs 197.5±0.87µs -0.35%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf 1739.7±4.06µs 1746.1±25.21µs +0.37%
crc32_resource_id_creation/compute_from_bytes:large 86.8±0.61µs 86.8±1.11µs 0.00%
crc32_resource_id_creation/compute_from_bytes:medium 5.4±0.01µs 5.4±0.01µs 0.00%
crc32_resource_id_creation/compute_from_bytes:small 93.0±0.40ns 93.0±0.40ns 0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg 65.1±0.41µs 65.0±1.71µs -0.15%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf 970.7±2.34µs 975.4±14.85µs +0.48%
index_build/index_build/../test-assets/ 1070.0±4.72µs N/A N/A
resource_index/index_build//tmp/ark-fs-index-benchmarkswFbTtd 111.1±1.58ms N/A N/A
resource_index/index_get_resource_by_id 98.1±2.28ns N/A N/A
resource_index/index_get_resource_by_path 53.7±0.31ns N/A N/A
resource_index/index_update_all 1131.4±35.94ms N/A N/A

@tareknaser
Copy link
Collaborator Author

Windows CI error fixed in e0f6e75

Copy link

github-actions bot commented Aug 6, 2024

Benchmark for 2c6d686

Click to view benchmark
Test Base PR %
blake3_resource_id_creation/compute_from_bytes:large 253.8±2.14µs 248.3±1.48µs -2.17%
blake3_resource_id_creation/compute_from_bytes:medium 15.6±0.07µs 15.6±0.07µs 0.00%
blake3_resource_id_creation/compute_from_bytes:small 1372.6±10.20ns 1379.9±8.71ns +0.53%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg 197.4±0.67µs 197.5±0.60µs +0.05%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf 1740.5±4.61µs 1741.3±13.77µs +0.05%
crc32_resource_id_creation/compute_from_bytes:large 86.6±0.37µs 86.7±0.45µs +0.12%
crc32_resource_id_creation/compute_from_bytes:medium 5.4±0.02µs 5.4±0.03µs 0.00%
crc32_resource_id_creation/compute_from_bytes:small 93.0±0.43ns 93.0±0.43ns 0.00%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg 64.8±0.35µs 65.6±3.61µs +1.23%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf 971.5±11.41µs 973.8±2.79µs +0.24%
index_build/index_build/../test-assets/ 1069.6±7.43µs N/A N/A
resource_index/index_build//tmp/ark-fs-index-benchmarks7WQ4tH 110.8±1.46ms N/A N/A
resource_index/index_get_resource_by_id 97.9±0.38ns N/A N/A
resource_index/index_get_resource_by_path 53.0±0.44ns N/A N/A
resource_index/index_update_all 1128.1±50.40ms N/A N/A

@kirillt kirillt self-requested a review August 7, 2024 05:16
Copy link
Member

@kirillt kirillt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Changes:
- Refine `rustfmt` style rules to improve code readability and consistency:
  - Group imports by crate
  - Use Unix-style newlines
  - Limit comment width to 80 characters
- Add `toolchain: nightly` to the Build CI workflow to support nightly features.
- Address several Clippy warnings

Signed-off-by: Tarek <[email protected]>
This refactoring includes:
- Introduced a `Timestamped<Item>` struct to represent items with their last modified time
- Added documentation comments for key structs and public functions
- Implemented an `IndexUpdate` struct to encapsulate the results of update operations, detailing resources added and removed during updates.
- Developed custom serialization and deserialization methods for `ResourceIndex`
- Created a `collisions()` method to identify and return resources with collisions
- Added helper functions like `load_or_build_index()` to load the index from the file system or construct a new one if it doesn't exist

Signed-off-by: Tarek <[email protected]>
This refactor enhances the organization `fs-index`  by:

- Relocating unit tests to a dedicated file
- Implementing the `for_each_type!` macro, which accepts a list of hash function types and executes a block of code for each type

Signed-off-by: Tarek <[email protected]>
…hods

- Benchmarks have been defined for all relevant methods of `ResourceIndex` using `criterion`
- Update README benchmark example

Signed-off-by: Tarek <[email protected]>
@tareknaser tareknaser merged commit 06fadc6 into main Aug 8, 2024
4 checks passed
@tareknaser tareknaser deleted the fs_index_refactor branch August 8, 2024 20:14
Copy link

github-actions bot commented Aug 8, 2024

Benchmark for 84b9573

Click to view benchmark
Test Base PR %
blake3_resource_id_creation/compute_from_bytes:large 248.3±1.61µs 249.5±1.04µs +0.48%
blake3_resource_id_creation/compute_from_bytes:medium 15.7±0.04µs 15.7±0.08µs 0.00%
blake3_resource_id_creation/compute_from_bytes:small 1345.1±4.88ns 1345.9±2.82ns +0.06%
blake3_resource_id_creation/compute_from_path:../test-assets/lena.jpg 197.5±2.60µs 198.0±1.91µs +0.25%
blake3_resource_id_creation/compute_from_path:../test-assets/test.pdf 1780.2±5.67µs 1784.9±37.86µs +0.26%
crc32_resource_id_creation/compute_from_bytes:large 91.7±0.23µs 91.7±0.40µs 0.00%
crc32_resource_id_creation/compute_from_bytes:medium 5.7±0.04µs 5.7±0.09µs 0.00%
crc32_resource_id_creation/compute_from_bytes:small 96.6±0.18ns 96.7±0.41ns +0.10%
crc32_resource_id_creation/compute_from_path:../test-assets/lena.jpg 68.9±2.22µs 64.5±0.24µs -6.39%
crc32_resource_id_creation/compute_from_path:../test-assets/test.pdf 977.8±4.74µs 949.9±4.85µs -2.85%
index_build/index_build/../test-assets/ 1056.1±2.55µs N/A N/A
resource_index/index_build//tmp/ark-fs-index-benchmarksMUUvqb 108.6±2.43ms N/A N/A
resource_index/index_get_resource_by_id 98.4±0.29ns N/A N/A
resource_index/index_get_resource_by_path 53.8±0.34ns N/A N/A
resource_index/index_update_all 1129.2±47.79ms N/A N/A

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants