
feat: add pinned storage to user account API #1044

Merged
merged 7 commits into from
Mar 28, 2022

Conversation

GaryHomewood
Contributor

@GaryHomewood GaryHomewood commented Feb 25, 2022

Resolves #1043

Amend /user/account API endpoint to return both uploaded and pinned storage usage for the user.

The DB function is amended to include a separate query that aggregates the size of pinned content. This query is potentially expensive, as it joins 4 tables. Some benchmarking was done with indexes added to the psa_pin_request table, but it should be tested further against large data volumes before deploying to production.

@GaryHomewood GaryHomewood linked an issue Feb 25, 2022 that may be closed by this pull request
@flea89 flea89 self-assigned this Feb 25, 2022
flea89
flea89 previously requested changes Feb 28, 2022
Contributor

@flea89 flea89 left a comment


Nice one!
Left some comments!

packages/db/postgres/functions.sql (outdated review thread, resolved)
JOIN content c ON c.cid = pr.content_cid
JOIN auth_key a ON a.id = pr.auth_key_id
WHERE a.user_id = query_user_id::BIGINT
AND a.deleted_at is null
Contributor

Interesting!
TL;DR I'm wondering whether we should keep consistency with uploads and ignore auth_key.deleted_at for now.

I can see why you added this, but I'm wondering if we should care about it or not.

At the moment it's not entirely "consistent" what an auth_key represents. While in the context of the pinning APIs it can be considered a child "scope" (or application, if you like), in the web3.storage HTTP APIs tokens are just a means of authorizing requests.
I'm not too opinionated really, and not sure what's best, but I wonder if we should take the path of consistency over semantics.

Once we migrate to UCAN auth, I feel we'll have a clearer definition and be better informed as to if/how to exclude uploads for deleted keys.

Contributor Author

OK, have removed it for now, for consistency.

packages/db/postgres/functions.sql (outdated review thread, resolved)
packages/db/test/pinning.spec.js (outdated review thread, resolved)
 */
async getUsedStorage (userId) {
-  /** @type {{ data: string, error: PostgrestError }} */
+  /** @type {{ data: import('./db-client-types').UsedStorage, error: PostgrestError }} */
Contributor

The function is returning

/** @type {{ data: {uploaded: string, pinned: string}, error: PostgrestError }} */

rather than UsedStorage, isn't it?
(Please run node scripts/cli.js pg-rest-api-types and you will get the right type from the definitions in ./postgres/pg-rest-api-types.)

Contributor Author

Have made this function responsible for converting to number so this return type is now correct.

const { data, error } = await this._client.rpc('user_used_storage', { query_user_id: userId }).single()

if (error) {
  throw new DBError(error)
}

-return data || 0 // No uploads for the user
+return data
Contributor

As mentioned in the comment above, the values of this map aren't numbers atm.
I wonder if we should make this function accountable for parsing the string.

Given that the largest safe integer is 9007199254740991, I wonder if it would be OK to just convert the string to a Number here.
If we do, I reckon we should use Number.isSafeInteger and raise otherwise.
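The suggestion above could look like the following sketch. The helper name parseTextToNumber matches the one used later in this thread; the exact implementation and error message are assumptions:

```javascript
// Sketch of the suggested conversion: parse the text value returned by
// Postgres and raise if it cannot be represented as a safe JS integer.
// Hypothetical implementation; only the helper name comes from this thread.
function parseTextToNumber (text) {
  const num = Number(text)
  if (!Number.isSafeInteger(num)) {
    throw new Error(`Invalid number: ${text}`)
  }
  return num
}
```

Number.isSafeInteger rejects anything above 2^53 - 1, as well as the NaN produced by unparseable input, so a corrupt or oversized value fails loudly instead of silently losing precision.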

Contributor Author

Conversion now done here, with error handling for invalid number.

packages/db/postgres/migrations/001-user-used-storage.sql (outdated review thread, resolved)
packages/db/index.d.ts (outdated review thread, resolved)
packages/db/postgres/functions.sql (outdated review thread, resolved)
@GaryHomewood GaryHomewood requested a review from flea89 February 28, 2022 14:09
Contributor

@vasco-santos vasco-santos left a comment


Good call on separating both used storage for uploads and pinning for the consumer to decide what to do with them 👍🏼

const { data, error } = await this._client.rpc('user_used_storage', { query_user_id: userId }).single()

if (error) {
  throw new DBError(error)
}

-return data || 0 // No uploads for the user
+return {
+  uploaded: parseTextToNumber(data.uploaded),
Contributor

These numbers might get super big and not fit in a JS number. Should we keep it as text and let the WebUI deal with it?

Contributor

The MAX_SAFE_INTEGER is 9007199254740991; if I've done my conversion right, that translates to 7.999999999999999112 pebibytes, which for a single user should be safe enough?
And even if it's not safe, we're raising in that case.
I wonder if letting the client deal with it is just moving a problem that we'd have anyway?

Contributor

> which for a single user should be safe enough?

I think it should be enough, yes, even though I feel it would be safer to simply rely on BigInt in the client instead.

> I wonder if letting the client deal with it is just moving a problem that we'd have anyway?

The client would be able to use BigInt. Otherwise we would need to serialize from the API to the client.

What do you think?

Contributor

@flea89 flea89 Mar 8, 2022

I think both solutions have their merits and cons, and I'm not too opinionated to go one route or the other.

Here are the main drivers for making the DB package accountable for the conversion (and the possible error):

  • DB client packages don't need to be aware of the casting (and its problems). That means if we end up having to use that number in other contexts, it's less likely we end up with "transparent" bugs.
  • The same applies to API clients.

I'd like to avoid that; those are the kind of bugs that are really tricky to find.

Contributor

It's not a big deal for now. If we are lucky enough to get a user with 7 PiB, then we can review this. For now, let's do the most convenient thing for the calling code and give them numbers as numbers. If this turns out to be a problem, we can send them as strings and let the rendering code parse them with BigInt, which is well supported now.


const mailTo = useMemo(() => {
const { mail, subject, body } = emailContent
return `mailto:${mail}?subject=${subject}&body=${encodeURIComponent(body.join('\n'))}`
}, [])

const isLoaded = !isLoading && !isFetching
-const percentage = Math.ceil((storageData.usedStorage || 0) / MAX_STORAGE * 100)
+const percentage = Math.ceil(((storageData.usedStorage.uploaded)) / MAX_STORAGE * 100)
Contributor

Maybe not part of your PR, but should we consider both uploaded and pinned here?
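If both figures were counted, the calculation might look like this sketch. The usedPercentage helper and the quota value are hypothetical, for illustration only:

```javascript
// Hypothetical helper: percentage of quota used, counting both uploaded
// and pinned bytes as suggested above. MAX_STORAGE is an assumed 1 TiB quota.
const MAX_STORAGE = 1024 ** 4

function usedPercentage (usedStorage) {
  const used = (usedStorage.uploaded || 0) + (usedStorage.pinned || 0)
  return Math.ceil(used / MAX_STORAGE * 100)
}
```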

Comment on lines 274 to 277
FROM psa_pin_request psa_pr
JOIN content c ON c.cid = psa_pr.content_cid
JOIN pin p ON p.content_cid = psa_pr.content_cid
JOIN auth_key a ON a.id = psa_pr.auth_key_id
Contributor

@vasco-santos vasco-santos Mar 7, 2022

This query is dangerous in terms of execution time, considering we are joining 4 tables.

We should add an index for psa_pr.content_cid. Also not entirely sure whether we need an individual index for p.content_cid. Can we have some benchmarks on these?

Contributor Author

Some benchmarking based on this data volume:

[Screenshot: table row counts]

No index:

[Screenshot: query timing with no index]

Index on content_cid:

[Screenshot: query timing with index on content_cid]

Index on content_cid and auth_key_id:

[Screenshot: query timing with indexes on content_cid and auth_key_id]

Contributor

@vasco-santos vasco-santos Mar 16, 2022

So, I think we did not get much better results from the psa_pin_request.content_cid index because we still have a low number of pin requests. I think we should just add both indexes.

@alanshaw @hugomrdias what do you think about these indexes?

Contributor

@vasco-santos I'm happy to get this PR shipped without the indexes and monitor it. We can follow up with a PR to add indexes on all the foreign key source columns that are used in joins.

@olizilla
Contributor

> should be further tested against large data volumes before deploy to production.

Has this happened? Are you satisfied that this change will be performant enough not to be a problem?

@olizilla
Contributor

olizilla commented Mar 23, 2022

Please update the PR with conflict fixes

@olizilla
Contributor

Please always include relevant links in the PR description, like a link to the issue where one exists: #1043

@GaryHomewood
Contributor Author

Have included the issue number in the PR description for context, and will resolve conflicts.

Re: "should be further tested against large data volumes before deploy to production". I have done some benchmarking as suggested, screenshots here: #1044 (comment), but against thousands, not millions, of rows, so I wanted to flag it as a possible risk.

@olizilla
Contributor

Thinking about it, Heroku supports forking the db, so we could clone prod and test it out manually if we think the query perf is a risk. A fork of prod will have real amounts of data but won't have the same load, so it's not perfect, but it is an option that's available.

@olizilla
Contributor

Also, we can roll it out and monitor it. If it starts to get slow we can refactor to store the usage stats in the db and update them during an upload.

ghost
ghost previously requested changes Mar 23, 2022
packages/db/postgres/functions.sql (review thread, resolved)
@flea89 flea89 removed their assignment Mar 28, 2022
@GaryHomewood GaryHomewood dismissed ghost's stale review March 28, 2022 12:45

Has been addressed

Contributor

@olizilla olizilla left a comment


Nice! Good work on adding the extra test to check that it's summing the dag_size for unique root CIDs.

One nitpick: this changeset includes packages/website/package-lock.json, which shouldn't be part of this PR. In general we shouldn't have package-lock.json files in the packages/* directories, and they are not used in prod, so it's minor, but please trim your PRs down to just the files related to the issue being fixed.

Otherwise, merge when ready!

@GaryHomewood GaryHomewood dismissed stale reviews from vasco-santos and flea89 March 28, 2022 15:58

All changes addressed.

@GaryHomewood GaryHomewood merged commit 3200a6e into main Mar 28, 2022
@GaryHomewood GaryHomewood deleted the feat/1043-add-pinned-quota branch March 28, 2022 16:23
Linked issue closed by this pull request: Add the user's pinned storage to the API (#1043)