Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: unify /upload and /car uploads #480

Merged
merged 10 commits into from
Sep 23, 2021
Merged

chore: unify /upload and /car uploads #480

merged 10 commits into from
Sep 23, 2021

Conversation

olizilla
Copy link
Contributor

@olizilla olizilla commented Sep 17, 2021

  • Have the API pack uploaded files into a CAR, to make the two paths more similar and ideally simplify the upload to S3 path.
    • We need to test this out on staging to see how it impacts the total file size folks can upload, as now we have to have the upload and the CAR in memory at the same time.
  • Simplify dagSize calcualtion. If it's a dag-pb root, then just derive the cumulative size of the dag in the same way that ipfs does from the sum of the link sizes and the node size. if it's not dag-pb, just use the sum of the block sizes in the CAR.
    • This means we dont have to walk the entire DAG and we can get a dagSize figure for dags that are split across mutiple CARs (assuming they stick to the treewalk strategy from carbites, and each contain the root block).

License: (Apache-2.0 AND MIT)
Signed-off-by: Oli Evans [email protected]

Have the API pack uploaded files into a CAR, to make the two paths more similar and ideally simplify the upload to S3 path.

There were two ideas here
1. If uploaded files are packed into a CAR by the API before sending to cluster we then only have to deal wth sending CARs to S3, which would be nicer than having to handle both raw files and CARs.
2. We have a lot of different notions of "how big is this thing"... the size of the uploaded CAR, the cumulative size of uploaded files, the sum of the size of each block in gthe CAR, the size or bytes value returned by cluter, and the size of all the blocks in the dag where we understand the ipld codecs and can follow the links, which could be different to the sum of the size of blocks in a CAR if it has redundent or duplicate blocks. It would be good to simplify that.

The dream was
- To lean on cluster to get the unixFS CumulativeSize when uploading a CAR with files with a unixFS root
- ...and pack raw files uploads into a CAR.

The reality is
- We can't get the unixFS CumulativeSize out of cluster. It might return the FileSize for a CAR with a single file in, but that will be 0 for Directories.
- We can't pack files into a unixFS flavour CAR with ipfs-car in a cloudflare worker today as it fails with a about `importScrtips` not being available in CloudFlare workers.

License: (Apache-2.0 AND MIT)
Signed-off-by: Oli Evans <[email protected]>
@olizilla
Copy link
Contributor Author

olizilla commented Sep 17, 2021

Error: Failed to execute 'importScripts' on 'WorkerGlobalScope': the method is not implemented.

Looks like it might be our webpack config trying to split the service worker bundle into chunks, but as it's for upload to the cloudflare that should just be disabled. Am investigating.

this fixes the `importScripts` error

License: (Apache-2.0 AND MIT)
Signed-off-by: Oli Evans <[email protected]>
@olizilla
Copy link
Contributor Author

Nice! Forcing webpack to not split the bundle fixes the importScripts error.

License: (Apache-2.0 AND MIT)
Signed-off-by: Oli Evans <[email protected]>
License: (Apache-2.0 AND MIT)
Signed-off-by: Oli Evans <[email protected]>
@olizilla
Copy link
Contributor Author

Of note this pushes the api bundlesize up from 686.35KB to 799.28KB

> @web3-storage/[email protected] test:size
> bundlesize

 PASS  ./dist/main.js: 799.28KB < maxSize 1MB (no compression) 

}

return new JSONResponse({ cid })
// TODO: do we want to wrap file uploads like we do car uploads from the client?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when receiving a single file, it won't have a path, while directories might have in the name. For directories with several files with no path like name it would likely make sense. I suggest that we should discuss this externally within an issue and keep the current behaviour in this PR context.

packages/api/src/car.js Outdated Show resolved Hide resolved
packages/api/src/car.js Outdated Show resolved Hide resolved
packages/api/src/car.js Show resolved Hide resolved
License: (Apache-2.0 AND MIT)
Signed-off-by: Oli Evans <[email protected]>
License: (Apache-2.0 AND MIT)
Signed-off-by: Oli Evans <[email protected]>
@olizilla olizilla marked this pull request as ready for review September 21, 2021 11:23
@olizilla olizilla changed the title wip: unify /upload and /car uploads chore: unify /upload and /car uploads Sep 21, 2021
@olizilla olizilla requested a review from alanshaw September 21, 2021 11:30
packages/api/src/car.js Show resolved Hide resolved
packages/api/src/car.js Show resolved Hide resolved
Comment on lines 146 to 147
// We can't use the `bytes` from cluster where it's a unixfs root partial, as it'll be a shard of the total dag size.
// We can't use the `size` from cluster as it's the unixfs FileSize which is 0 for directories, and is not set for raw encoded files, only dag-pb ones.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can add this knowledge to the function where we compute the dag size. What do you think?
Reading through the code, it seems strange this placement as it is not clear what it relates to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, fair point, but the bytes and size are properties of the cluster.add response. I'll shuffling it around and see if I can make it make more sense.

olizilla and others added 2 commits September 21, 2021 15:15
License: (Apache-2.0 AND MIT)
Signed-off-by: Oli Evans <[email protected]>
Copy link
Contributor

@vasco-santos vasco-santos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@alanshaw alanshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not finished review, submitting what I have...

packages/api/src/car.js Outdated Show resolved Hide resolved
packages/api/src/car.js Show resolved Hide resolved
packages/api/src/car.js Outdated Show resolved Hide resolved
packages/api/src/upload.js Outdated Show resolved Hide resolved
// NOTE: Tsize is optional, but all ipfs implementations we know of set it.
// It's metadata, that could be missing or deliberately set to an incorrect value.
// This logic is the same as used by go/js-ipfs to display the cumulative size of a dag-pb dag.
return pbNodeBytes.length + pbNode.Links.reduce((acc, curr) => acc + (curr.Tsize || 0), 0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vasco-santos uh oh. i think this should be:

-  return pbNodeBytes.length + pbNode.Links.reduce((acc, curr) => acc + (curr.Tsize || 0), 0)
+  return pbNodeBytes.byteLength + pbNode.Links.reduce((acc, curr) => acc + (curr.Tsize || 0), 0)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestions to reliably test this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants