Export clips.tsv with user_id column #4

Closed
kdavis-mozilla opened this issue Dec 10, 2018 · 6 comments

@kdavis-mozilla
Contributor

The current clips.tsv does not have a "user_id" column. However, for various reasons, one is needed for the final export. The directory containing each wav cannot be used as the value of the "user_id" column, as it's not properly hashed.

@kdavis-mozilla
Contributor Author

@Gregoor Have you had a chance to create an export with a properly hashed "user_id" column?

@Gregoor
Contributor

Gregoor commented Dec 12, 2018

I've added a user column to the tsv, but it's not hashed. Could we maybe do the hashing as part of the postprocessing? We would also need to throw out the clip paths once we've got them from S3 (as they also contain the user id).

For hashing we could do this:

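import binascii, hashlib, os

# Generate the salt once and reuse it for every client ID in an export,
# so that a given client always maps to the same hashed value.
salt = os.urandom(23)
# b'<client_id>' is a placeholder for the real client ID bytes.
hashed = binascii.hexlify(hashlib.pbkdf2_hmac('sha256', b'<client_id>', salt, 100000))
print(hashed)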
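
As a rough sketch of how that could fit into postprocessing, the snippet below hashes one column of a TSV with a single salt shared across all rows (so the same client always gets the same hash) and drops the path column. The column names `client_id`, `path` and `sentence`, the helpers `hash_client_id` and `anonymize_rows`, and the in-memory sample data are all assumptions for illustration, not the actual export format.

```python
import binascii
import csv
import hashlib
import io
import os


def hash_client_id(client_id: str, salt: bytes, iterations: int = 100000) -> str:
    """Return a hex PBKDF2-HMAC-SHA256 digest of client_id (hypothetical helper)."""
    digest = hashlib.pbkdf2_hmac("sha256", client_id.encode("utf-8"), salt, iterations)
    return binascii.hexlify(digest).decode("ascii")


def anonymize_rows(tsv_text: str, salt: bytes, id_column: str = "client_id",
                   drop_columns: tuple = ("path",)) -> str:
    """Hash id_column and drop drop_columns; the column names are assumptions."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [name for name in reader.fieldnames if name not in drop_columns]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    cache = {}  # PBKDF2 is deliberately slow, so hash each distinct ID only once
    for row in reader:
        raw = row[id_column]
        if raw not in cache:
            cache[raw] = hash_client_id(raw, salt)
        row[id_column] = cache[raw]
        writer.writerow({name: row[name] for name in kept})
    return out.getvalue()


# One salt for the whole export, so equal IDs map to equal hashes.
salt = os.urandom(23)
# Made-up sample rows: two clips from one client, one clip from another.
sample = (
    "client_id\tpath\tsentence\n"
    "abc\tabc/1.wav\thello\n"
    "abc\tabc/2.wav\tworld\n"
    "xyz\txyz/1.wav\thi\n"
)
result = anonymize_rows(sample, salt)
print(result)
```

Because the salt is random and not stored here, the hashes would differ between runs; whether that is acceptable depends on whether IDs need to stay stable across exports.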

@kdavis-mozilla
Contributor Author

Two points come to mind:

  • First, a nit: could the column be called user_id?
  • Second, what are we shipping for alpha users? clips.tsv or the output of this CLI?

@Gregoor
Contributor

Gregoor commented Dec 12, 2018

  1. Will change for next export. Was just going with the naming we've been using in CV.
  2. My thinking would've been the output. And maybe an example tsv (with random user-ids) if they want to work on the CorporaCreator.
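
A minimal sketch of what generating such an example tsv might look like, assuming a two-column layout (`client_id`, `sentence`) that is purely illustrative; `make_example_tsv` is a hypothetical helper and the real clips.tsv columns may differ.

```python
import csv
import io
import uuid


def make_example_tsv(num_users: int = 2, clips_per_user: int = 2) -> str:
    """Build a small TSV whose IDs are random and not tied to any real user."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["client_id", "sentence"])  # assumed column names
    for _ in range(num_users):
        fake_id = uuid.uuid4().hex  # random 32-character hex string
        for clip in range(clips_per_user):
            writer.writerow([fake_id, f"example sentence {clip}"])
    return out.getvalue()


example = make_example_tsv()
print(example)
```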

@kdavis-mozilla
Contributor Author

I'll make the code use client_id.

@kdavis-mozilla
Contributor Author

It's best if the users, without asking for audio, get the entire clips.tsv.

kdavis-mozilla added a commit that referenced this issue Dec 13, 2018
Export has client_id not user_id, so changed code (See #4)