Inaccurate claimed DAG sizes #1059

Closed
dchoi27 opened this issue Mar 3, 2022 · 2 comments · Fixed by #1196 or #1535
Labels
kind/bug A bug in existing code (including security flaws) P1 High: Likely tackled by core team if no one steps up topic/pot Issues handled by PT.

Comments

@dchoi27
Contributor

dchoi27 commented Mar 3, 2022

DAG sizes for some upload types are incorrect: the reported size is smaller than the actual size. The delta between size_claimed and size_actual can be quite large.

This is a sister issue to nftstorage/nft.storage#1427, but it is higher priority here, given that we track upload sizes for account limits.

@mbommerez, going to add this to the shortlist! It partly blocks the account limit restrictions.

@dchoi27 dchoi27 added P1 High: Likely tackled by core team if no one steps up kind/bug A bug in existing code (including security flaws) labels Mar 3, 2022
@mbommerez mbommerez added the topic/pot Issues handled by PT. label Mar 7, 2022
@mbommerez

Investigation from @flea89 in nftstorage/nft.storage#1427:

I haven't gone too deep yet, but I'll start sharing my initial investigation results (in web3.storage):

Data sample: 6484 CIDs

  • Most of the "problematic" CIDs are CBOR (5392 out of 6484)
  • For dag-pb, all 50 of the problematic CIDs I've checked are directories

Looking at the code, possible root causes are:

  • for the dag-pb codec we rely on metadata to calculate the size, and that metadata could be deliberately altered
  • in carStat we calculate the size for dag-pb and for raw (single block), but not for CBOR.
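A minimal sketch of why relying on dag-pb metadata is fragile, assuming a simplified in-memory block model. The `Link`/`Block` classes and both functions are hypothetical illustrations, not the actual carStat code: one size trusts the Tsize values written into the root's links, the other sums the bytes of the blocks actually present.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    cid: str
    tsize: int  # Tsize written by whoever built the DAG; nothing verifies it


@dataclass
class Block:
    cid: str
    size: int  # length of the encoded block in bytes
    links: list[Link] = field(default_factory=list)


def claimed_size(root: Block) -> int:
    """Metadata-based size: root block bytes plus the Tsize of each link."""
    return root.size + sum(link.tsize for link in root.links)


def actual_size(blocks: dict[str, Block], root_cid: str) -> int:
    """Data-based size: bytes of every unique block reachable from the root."""
    seen: set[str] = set()
    stack = [root_cid]
    total = 0
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        block = blocks[cid]  # assumes the CAR contains the complete DAG
        total += block.size
        stack.extend(link.cid for link in block.links)
    return total


leaf = Block("leaf", 1000)
honest = Block("root", 100, [Link("leaf", 1000)])
tampered = Block("root", 100, [Link("leaf", 10)])  # same data, smaller Tsize
print(claimed_size(honest), actual_size({"root": honest, "leaf": leaf}, "root"))  # 1100 1100
print(claimed_size(tampered), actual_size({"root": tampered, "leaf": leaf}, "root"))  # 110 1100
```

The tampered root stores the same data but under-reports its size, which is the direction of error this issue describes; only a size computed from the block bytes is immune to it.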

CBOR DAGs
AFAICT the size calculation doesn't happen in .storage, so I wonder whether size_claimed is populated incorrectly for those CIDs in cargo. I haven't had time to look there yet.

From a quick look, I suspect size_claimed stores the size of the first block rather than the whole DAG.
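To make the suspected size_claimed behaviour concrete, here is a sketch contrasting "size of the first block" with "size of the whole DAG". The DAG shape, CIDs and sizes are made up, and a real job would decode the dag-cbor links from the CAR rather than use this hypothetical dictionary.

```python
# Hypothetical in-memory DAG: CID -> (encoded block size in bytes, linked CIDs).
# A real job would read the blocks from a CAR and decode dag-cbor links.
Dag = dict[str, tuple[int, list[str]]]


def first_block_size(dag: Dag, root: str) -> int:
    """The suspected behaviour: only the root block's bytes are recorded."""
    return dag[root][0]


def whole_dag_size(dag: Dag, root: str) -> int:
    """Bytes of every unique block reachable from the root, each counted once."""
    seen: set[str] = set()
    stack = [root]
    total = 0
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        size, links = dag[cid]
        total += size
        stack.extend(links)
    return total


dag: Dag = {
    "root": (200, ["a", "b"]),
    "a": (5000, []),
    "b": (5000, []),
}
print(first_block_size(dag, "root"))  # 200
print(whole_dag_size(dag, "root"))  # 10200
```

If this suspicion holds, the gap grows with every block below the root, which would fit the large deltas reported for CBOR uploads.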

PB directories
I quickly checked a couple of CIDs, and in this case we're actually reporting a larger size in .storage.
For example:
CID: bafybeiduwb4o2fsl2lbmuyigzhjdpluahrexjpd7edlilyl5wmz332vnyq
public.content.size = 715
cargo.dag.size_actual = 690

> ipfs dag stat /ipfs/bafybeiduwb4o2fsl2lbmuyigzhjdpluahrexjpd7edlilyl5wmz332vnyq 
> Size: 690, NumBlocks: 7

> ipfs files stat /ipfs/bafybeiduwb4o2fsl2lbmuyigzhjdpluahrexjpd7edlilyl5wmz332vnyq
> bafybeiduwb4o2fsl2lbmuyigzhjdpluahrexjpd7edlilyl5wmz332vnyq
> Size: 0
> CumulativeSize: 715
> ChildBlocks: 1
> Type: directory

I haven't yet checked why the two report different sizes (UnixFS headers or a bug?), but I'm sure you know, @alanshaw.
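For reference, the two commands plausibly compute different quantities. Assuming `ipfs files stat` reports CumulativeSize as the root block's encoded length plus the Tsize recorded on each of its links, and `ipfs dag stat` sums each unique reachable block once, where $|b|$ is the encoded length of block $b$ and $\mathrm{reach}(r)$ is the set of unique blocks reachable from root $r$:

```latex
\mathrm{CumulativeSize}(r) = |r| + \sum_{\ell \in \mathrm{links}(r)} \mathrm{Tsize}(\ell)
\qquad
\mathrm{Size}_{\text{dag stat}}(r) = \sum_{b \in \mathrm{reach}(r)} |b|
```

The two agree only when every Tsize equals the true cumulative size of its target and no block is reachable through more than one link. Here the gap is $715 - 690 = 25$ bytes; which of those conditions fails for this CID has not been checked.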

@alanshaw, can you run a query in prod using the DAG size from public.content, to confirm this is really a problem for .storage?
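A sketch of the kind of comparison query being asked for, run against a toy in-memory SQLite database so it is self-contained. The table and column names (`content.dag_size`, `dag.size_actual`) are assumptions standing in for public.content and cargo.dag; the production query would run against Postgres, and the real schema may differ.

```python
import sqlite3

# Hypothetical, simplified stand-ins for public.content and cargo.dag.
# The real tables live in Postgres and their column names may differ.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE content (cid TEXT PRIMARY KEY, dag_size INTEGER);
    CREATE TABLE dag (cid TEXT PRIMARY KEY, size_actual INTEGER);
    """
)
# Toy rows: over-reported, under-reported, and matching sizes.
conn.executemany(
    "INSERT INTO content VALUES (?, ?)",
    [("cid-1", 715), ("cid-2", 200), ("cid-3", 50)],
)
conn.executemany(
    "INSERT INTO dag VALUES (?, ?)",
    [("cid-1", 690), ("cid-2", 10200), ("cid-3", 50)],
)

# Uploads whose recorded size is smaller than the real DAG, worst first.
rows = conn.execute(
    """
    SELECT c.cid, c.dag_size, d.size_actual, d.size_actual - c.dag_size AS delta
    FROM content AS c
    JOIN dag AS d ON d.cid = c.cid
    WHERE c.dag_size < d.size_actual
    ORDER BY delta DESC
    """
).fetchall()
print(rows)  # [('cid-2', 200, 10200, 10000)]
```

Filtering on `dag_size < size_actual` isolates the under-reporting that matters for account limits; over-reported rows such as the 715 vs 690 directory are excluded.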

orvn added a commit that referenced this issue Jun 13, 2022
* fix: JS errors from docs changes (#1334)

* fix: remove React fragment which is causing an error

This was causing `Each child in a list should have a unique "key" prop`

* fix: indentation from my previous commit

* fix: js errors

* fix: revert package-lock changes

* fix: lint error

Co-authored-by: Adam Alton <[email protected]>

* fix: Removed mistakenly generated link on CID header item in filemanager (#1336)

* fix!: psa pinning APIs - rename requestId to requestid

* chore(main): release website 2.4.0 (#1299)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* make tooltip accessible (#1340)

* chore: rename pinned to psaPinned (#1268)


Co-authored-by: Alan Shaw <[email protected]>

* docs: update peers (#1344)

* feat: respond with unique error message when blocked API key is used (#1302)

* chore(main): release api 6.0.0 (#1325)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* chore(main): release website 2.4.1 (#1342)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix: Set fetch date before changing isFetching state (#1341)

* feat: send email notifications for storage quota usage (#1273)

This includes:
- a general email component in cron package for sending emails
- notifications to web3.storage users when they get to specific thresholds
- notifications to web3.storage admins when users go over their quota


Co-authored-by: Gary Homewood <[email protected]>
Co-authored-by: Paolo <[email protected]>
Co-authored-by: Oli Evans <[email protected]>
Co-authored-by: francois-potato <[email protected]>

* fix: inaccurate used_storage migrations (#1360)

* feat: add user blocking functionality to web3 (#1322)

* chore: do not convert bigint to number (#1366)

* chore: use mailchimp provider in crons (#1368)

* chore: send list of storage quota violators to [email protected], not admin@ (#1369)

* chore: remove unnecessary migration for creating admin user (#1373)

Rename the subsequent migration to keep the numbers sequential.
Now that we're using support@ rather than admin@ for the admin email address, that user already exists on both staging and prod.

* feat: implement postgres optimization (#1305)

* Refactor backups
* Add db migration scripts
* Update db configuration

* fix: typo in Logging constructor (#1346)

I assume this is a typo?

@adamalton can you also look into this logger running during testing in CI?

https://github.com/web3-storage/web3.storage/runs/6553500624?check_suite_focus=true

* fix: db migrations versioning (#1375)

* chore(main): release api 6.1.0 (#1356)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* chore(main): release cron 1.1.0 (#1357)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* docs: note db schema required in api readme

...and tweak website README to note that you are using a mock API

* chore: add test for CORS OPTIONS handler (#1331)

* fix: show custom storage quota to user (#1338)

* chore: fix tags in api user info (#1379)

* chore(main): release api 6.1.1 (#1382)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* chore(main): release website 2.5.0 (#1348)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* chore: change the storage cron job config to run in prod, and every 6 hours (#1371)

* chore: trigger crons workflow on conf change (#1394)

* feat: Adding HasDeleteRestriction user_tag (#1390)

* Adding the type and failing HTTP DELETE operations if this tag is set.
* See nftstorage/admin.storage#66

* fix: clone env so new each request (#1396)

* chore(main): release api 6.2.0 (#1397)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* feat: DB schema and API for user_tag_proposal. (#1006)

* Users create records in this table and admins manage it.

Co-authored-by: trigramdev9 <[email protected]>

* feat: Adding admin ability to search by github_id (#1403)

* See nftstorage/admin.storage#68

* chore: optimise getUserByStorage query (#1405)

* fix: optimise getUserByStorage query to avoid timeouts (#1412)

* chore: update SQL migration to drop the old `users_by_storage_used` function before replacing (#1414)

* fix: update incorrect dag sizes job (#1059) (#1196)

* fix: 404 API http reference links (#1358)

Fixes: #1359

* fix(http docs): incorrect endpoint in description (#1429)

* fix(http docs): incorrect endpoint in description

* test: trigger a rebuild via CI

Co-authored-by: orun <[email protected]>

* chore: add Wrangler worker env for Josh (#1351)

* chore: Add Wrangler worker env for Josh.

* chore: Add GATEWAY_URL to josh wrangler config.

* chore: Add default `GATEWAY_URL` var to wrangler template.

* chore(main): release website 2.5.1 (#1417)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix: docs toc highlight on click (#1392)

* fix: toc highlight on click - docs

* fix: Replace document query with inline conditional classes and tweak scroll magic scene settings

* test: force rebuild

Co-authored-by: svvimming <[email protected]>
Co-authored-by: orun <[email protected]>

* fix: refactor accordion content (#1391)

* fix: refactor accordion content

* style: reduce mobile padding below faq accordion

Co-authored-by: orun <[email protected]>

* fix: Adding user_tag_proposal schema to reset.sql for local dev (#1445)

* Introduced in #1006

* feat: Split file manager table into uploaded & pinned (#1363)

* feat: Filemanager header file type tabs and split table by type

* fix: Storage manager progress bar double border & height

* feat: File manager search results in title + margins and spacing

* test: Console log all files

* test: Console log files and user storage data

* feat: Mock data for files using pinning service + api request for pinned files

* test: Fetch pinning data

* feat: Add loading state to pins tab table

* feat: Add loading state to pins tab table

* feat: Add uploaded/pinned tab url param

* test: Pinned fetch request

* test: Pinned fetch request

* revert 'test: Pinned fetch request'

* revert 'test: Pinned fetch request'

* test: Pinning fetch request with generated token

* revert 'test: Pinning fetch request with generated token'

* test: Pass generated API token to /pins GET request

* test: console log API

* feat: Disable pins table if no files are present & revert pins status to pinned in request

* chore: fix linting warning

* style: Responsive file manager header layout at small breakpoints

* feat: Account page UI minor tweaks

* feat: reduce font size (#1411)

* feat: Message bar incident/maintenance name (#1335)

* feat: Docs automatically generated files

* refactor: Message banner incident/maintenance message displays name

* chore: Remove testing materials

* chore: Remove testing materials

* Revert "chore: Remove testing materials"

This reverts commit ab5f9e9.

* Revert "refactor: Message banner incident/maintenance message displays name"

This reverts commit 7359dbe.

* Revert "feat: Docs automatically generated files"

This reverts commit 26ad9ca.

* refactor: Message banner incident/maintenance message displays name

* chore: Remove testing materials

* chore(main): release cron 1.1.1 (#1413)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix: use NODE_TLS_REJECT_UNAUTHORIZED=0 env var for storage cron job (#1418)

This allows the direct connection to the Postgres DB to work.

* chore: add package-lock change report to PRs (#1453)

Bot to add comment on PRs with a human readable report of changes to package-lock.json

see: https://github.com/marketplace/actions/npm-lockfile-changes

License: (Apache-2.0 AND MIT)
Signed-off-by: Oli Evans <[email protected]>

* docs: clarify payload size limit for /car endpoint (#1457)

* chore: get back integrity and resolved keys for deps (#1456)

* chore: only check package-lock diff on PRs (#1460)

only check package-lock diff on PRs

see: #1453
which produces nice package-lock reports for our PRs but errors when run from not-a-pr.

License: (Apache-2.0 AND MIT)
Signed-off-by: Oli Evans <[email protected]>

* fix: Adjust progress bar styles for pinned files minimum cases

Co-authored-by: Joanna Ong <[email protected]>
Co-authored-by: Adam Alton <[email protected]>
Co-authored-by: Paolo Chillari <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Alan Shaw <[email protected]>
Co-authored-by: e-schneid <[email protected]>
Co-authored-by: Gary Homewood <[email protected]>
Co-authored-by: Paolo <[email protected]>
Co-authored-by: Oli Evans <[email protected]>
Co-authored-by: francois-potato <[email protected]>
Co-authored-by: Vasco Santos <[email protected]>
Co-authored-by: Josh Jarvis <[email protected]>
Co-authored-by: Joe Spencer <[email protected]>
Co-authored-by: Joe Spencer <[email protected]>
Co-authored-by: Jorropo <[email protected]>
Co-authored-by: Yusef Napora <[email protected]>
Co-authored-by: orun <[email protected]>
Co-authored-by: Hugo Dias <[email protected]>
@mbommerez mbommerez reopened this Jun 22, 2022
@mbommerez mbommerez linked a pull request Jul 4, 2022 that will close this issue
@joshJarr
Contributor

Looks like the PRs are merged and deployed! Closing this issue now. Feel free to reopen if we need to!

4 participants