feat: add pinned storage to user account API #1044
Conversation
Nice one!
Left some comments!
packages/db/postgres/functions.sql (Outdated)
```sql
JOIN content c ON c.cid = pr.content_cid
JOIN auth_key a ON a.id = pr.auth_key_id
WHERE a.user_id = query_user_id::BIGINT
  AND a.deleted_at is null
```
Interesting!
TL;DR: I'm wondering whether we should keep consistency with uploads and ignore `auth_key.deleted_at` for now.

I can see why you added this, but I'm wondering whether we should care about it or not. At the moment it's not entirely "consistent" what an `auth_key` represents: in the context of the pinning APIs it can be considered a child "scope" (or application, if you like), while in the web3.storage HTTP APIs tokens are just a means of authorizing requests.

I'm not too opinionated really, and not sure what's best, but I'm wondering if we should take the path of consistency over semantics. Once we migrate to UCAN auth, I feel we'll have a clearer definition and we'll be more informed as to if/how to exclude uploads for deleted keys.
OK, have removed it for now, for consistency.
packages/db/index.js (Outdated)
```diff
   */
   async getUsedStorage (userId) {
-    /** @type {{ data: string, error: PostgrestError }} */
+    /** @type {{ data: import('./db-client-types').UsedStorage, error: PostgrestError }} */
```
The function is returning

`/** @type {{ data: {uploaded: string, pinned: string}, error: PostgrestError }} */`

rather than `UsedStorage`, isn't it?

(Please run `node scripts/cli.js pg-rest-api-types` and you will get the right type from `import('./postgres/pg-rest-api-types').definitions`.)
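For illustration, the annotation could then reference the generated definitions directly; the exact key name below is an assumption:

```js
/** @type {{ data: import('./postgres/pg-rest-api-types').definitions['user_used_storage'], error: PostgrestError }} */
```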
Have made this function responsible for converting to a number, so this return type is now correct.
packages/db/index.js (Outdated)
```diff
   const { data, error } = await this._client.rpc('user_used_storage', { query_user_id: userId }).single()

   if (error) {
     throw new DBError(error)
   }

-  return data || 0 // No uploads for the user
+  return data
```
As mentioned in the comment above, the values of this map aren't numbers at the moment. I wonder if we should make this function accountable for parsing the strings. Given that the largest safe integer is 9007199254740991, I wonder if it would be OK to just convert the string to a Number here. If we do, I reckon we should use `Number.isSafeInteger` and throw otherwise.
Conversion is now done here, with error handling for invalid numbers.
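A minimal sketch of what that conversion could look like, using `Number.isSafeInteger` as suggested; the helper name matches the `parseTextToNumber` call seen in the later diff, but the implementation here is an assumption:

```js
/**
 * Convert a Postgres bigint (serialized as text) into a JS number,
 * throwing if it cannot be represented exactly.
 * A sketch only; the actual helper in the PR may differ.
 *
 * @param {string} text
 * @returns {number}
 */
function parseTextToNumber (text) {
  const num = Number(text)
  if (!Number.isSafeInteger(num)) {
    throw new Error(`invalid number: ${text}`)
  }
  return num
}
```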
Good call on separating used storage into uploads and pinning, so the consumer can decide what to do with them 👍🏼
```diff
   const { data, error } = await this._client.rpc('user_used_storage', { query_user_id: userId }).single()

   if (error) {
     throw new DBError(error)
   }

-  return data || 0 // No uploads for the user
+  return {
+    uploaded: parseTextToNumber(data.uploaded),
```
These numbers might get super big and not fit in a JS number. Should we keep them as text and let the WebUI deal with it?
`MAX_SAFE_INTEGER` is 9007199254740991; if I've done my conversion right, that translates to 7.999999999999999112 pebibytes, which for a single user should be safe enough? And even if it's not safe, we're throwing in that case.

I wonder if letting the client deal with it is just moving a problem that we'd have anyway?
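As a quick check of that conversion (one pebibyte being 2^50 bytes):

```js
// Number.MAX_SAFE_INTEGER is 2^53 - 1 bytes; one PiB is 2^50 bytes,
// so the largest safely representable usage is 8 - 2^-50 PiB.
console.log(Number.MAX_SAFE_INTEGER / 2 ** 50) // → 7.999999999999999
```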
> which for a single user should be safe enough?

I think it should be enough, yes, even though I feel it would be safer to simply rely on BigInt in the client instead.

> I wonder if letting the client deal with it is just moving a problem that we'd have anyway?

The client would be able to use BigInt. Otherwise we would need to serialize from the API to the client. What do you think?
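For illustration, a minimal sketch of the client-side BigInt approach, assuming the API returned the values as strings (the endpoint and response shape here are assumptions):

```js
// Parse string values losslessly with BigInt instead of Number.
const res = await fetch('/user/account')
const { usedStorage } = await res.json() // e.g. { uploaded: '9007199254740993', pinned: '1024' }
const total = BigInt(usedStorage.uploaded) + BigInt(usedStorage.pinned) // exact, no precision loss
```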
I think both solutions have their merits and drawbacks, and I'm not opinionated enough to go one route or the other. Here are the main drivers for having the DB package accountable for the conversion (and the possible error):

- DB client packages don't need to be aware of the casting (and its problems). That means if we end up having to use that number in other contexts, we're less likely to end up with "transparent" bugs.
- The same applies to API clients.

I'd like to avoid that; those are the kind of bugs that are really tricky to find.
It's not a big deal for now. If we are lucky enough to get a user with 7 PiB, then we can review this. For now, let's do the most convenient thing for the calling code and give them numbers as numbers. If this turns out to be a problem, then we can send them as strings and let the rendering code parse them with BigInt, which is well supported now.
packages/website/pages/account.js (Outdated)
```diff
   const mailTo = useMemo(() => {
     const { mail, subject, body } = emailContent
     return `mailto:${mail}?subject=${subject}&body=${encodeURIComponent(body.join('\n'))}`
   }, [])

   const isLoaded = !isLoading && !isFetching
-  const percentage = Math.ceil((storageData.usedStorage || 0) / MAX_STORAGE * 100)
+  const percentage = Math.ceil(((storageData.usedStorage.uploaded)) / MAX_STORAGE * 100)
```
Maybe not part of your PR, but should we consider both uploaded and pinned here?
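A sketch of what considering both might look like, assuming the `usedStorage` shape from the diff above:

```js
// Hypothetical: count both uploaded and pinned bytes against the quota.
const used = (storageData.usedStorage.uploaded || 0) + (storageData.usedStorage.pinned || 0)
const percentage = Math.ceil(used / MAX_STORAGE * 100)
```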
packages/db/postgres/functions.sql (Outdated)
```sql
FROM psa_pin_request psa_pr
JOIN content c ON c.cid = psa_pr.content_cid
JOIN pin p ON p.content_cid = psa_pr.content_cid
JOIN auth_key a ON a.id = psa_pr.auth_key_id
```
This query is dangerous in terms of execution time, considering we are joining 4 tables. We should add an index for `psa_pin_request.content_cid`. I'm also not entirely sure whether we need an individual index for `pin.content_cid`. Can we have some benchmarks on these?
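A sketch of the indexes being discussed; the index names are assumptions and the actual migration may differ:

```sql
-- Hypothetical migration: index the join columns used by user_used_storage.
CREATE INDEX IF NOT EXISTS psa_pin_request_content_cid_idx
  ON psa_pin_request (content_cid);
CREATE INDEX IF NOT EXISTS pin_content_cid_idx
  ON pin (content_cid);
```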
So, I think we didn't get much better results with the `psa_pin_request.content_cid` index because we still have a low number of pin requests. I think we should just add both indexes.

@alanshaw @hugomrdias, what do you think about these indexes?
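For reference, one way such benchmarks could be gathered (a sketch; it assumes `user_used_storage` returns a row, and `1` is a placeholder user id):

```sql
-- Overall timing of the function as the API calls it:
EXPLAIN ANALYZE SELECT * FROM user_used_storage(1);
-- To confirm the planner actually uses the new indexes, run
-- EXPLAIN ANALYZE on the function's SELECT body directly.
```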
@vasco-santos I'm happy to get this PR shipped without the indexes and monitor it. We can follow up with a PR to add indexes on all the foreign key source columns that are used in joins.
Has this happened? Are you satisfied that this change will be performant enough not to be a problem?
Please update the PR with conflict fixes.
Please always include relevant links in the PR description, like a link to the issue where one exists.
Have included the issue number in the PR description for context, and will resolve conflicts. Re: "should be further tested against large data volumes before deploying to production": I have done some benchmarking as suggested (screenshots here: #1044 (comment)), but against thousands, not millions, of rows, so I wanted to flag this as a possible risk.
Thinking about it, Heroku supports forking the DB, so we could clone prod and test it out manually if we think the query perf is a risk. A fork of prod will have real amounts of data, but won't have the same load, so it's not perfect, but it is an option that's available.
Also, we can roll it out and monitor it. If it starts to get slow we can refactor to store the usage stats in the DB and update them during an upload.
Nice! Good work on adding the extra test to check that it's summing the `dag_size` for unique root CIDs.

One nitpick: this changeset includes `packages/website/package-lock.json`, which shouldn't be included in this PR. In general, we shouldn't have `package-lock.json` files in the `packages/*` directories, and they are not used in prod, so it's minor, but please trim your PRs down to just the files related to the issue being fixed.

Otherwise, merge when ready!
All changes addressed.
Resolves #1043

Amends the `/user/account` API endpoint to return both uploaded and pinned storage usage for the user. The DB function is amended to include a separate query to get the aggregated size of pinned content. This query is potentially expensive, joining 4 tables. Some benchmarking was done with indexes added to the `psa_pin_request` table, but it should be further tested against large data volumes before deploying to production.
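For reference, here is a sketch of the response shape implied by the diffs above; the field values are placeholders and the exact shape is an assumption:

```js
// Hypothetical response from GET /user/account after this change:
const exampleResponse = {
  usedStorage: {
    uploaded: 1024, // total bytes of content uploaded directly
    pinned: 2048 // total bytes of content pinned via the Pinning Service API
  }
}
```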