This repository has been archived by the owner on Dec 3, 2020. It is now read-only.

Uniquely identifying products #43

Open
Osmose opened this issue Aug 10, 2018 · 2 comments

Comments

@Osmose
Contributor

Osmose commented Aug 10, 2018

Currently, products are identified by their full URL. Several shopping sites offer the same product under different URLs, often for SEO purposes.

There are a few options:

  • Per-site custom handling of unique IDs in URLs
  • Normalize URLs to remove non-canonical bits
  • Extract canonical IDs or URLs from the webpage
  • Identify products by a compound key of vendor and product title
  • Do nothing; URLs might actually be good enough
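As a rough illustration of the second and third options, here is a minimal Python sketch. It assumes that the query string and fragment never carry product identity, and that a site may publish a `<link rel="canonical">` tag. `normalize_url` and `canonical_url_from_html` are hypothetical helpers, not existing project code.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Hypothetical normalizer: keep scheme, host and path; drop query and fragment."""
    parts = urlsplit(url)
    # Hostnames are case-insensitive, so lowercase them; paths may be case-sensitive.
    host = parts.netloc.lower()
    # Treat "/item/123/" and "/item/123" as the same page (assumption).
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), host, path, "", ""))


class CanonicalLinkParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag, if any."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated values, e.g. rel="alternate canonical".
        rels = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rels and attr_map.get("href"):
            self.canonical = attr_map["href"]


def canonical_url_from_html(html: str):
    """Return the canonical URL declared in the page, or None if there isn't one."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    return parser.canonical


print(normalize_url("https://Shop.Example.com/item/123/?utm_source=ad#reviews"))
# → https://shop.example.com/item/123
```

Dropping the whole query string is the crude part: a site that keeps its product ID in a parameter such as `?id=123` would collapse every product into one key, which is where per-site handling (the first option) would come in.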
@biancadanforth
Collaborator

biancadanforth commented Oct 5, 2018

We should do something. We currently use the page URL as the product ID, and that's Category 3+ data. Since we send the product ID along with many telemetry events, that's a problem: for the MVP, we want to keep everything at Category 1/2 (per @chuckharmston).

We could derive a UUID from some of the product information (I'm not sure whether generating one from the URL would be considered Category 3+; I've asked rrayborn).

Chuck gave me some examples of UUID modules:

Python's uuid module lets you do that with uuid.uuid5(uuid.NAMESPACE_URL, 'http://www.whatever.com')
Node does too, though I can't speak to it personally.
https://www.npmjs.com/package/uuid
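A minimal sketch of the `uuid.uuid5` approach quoted above. A version 5 UUID is a SHA-1-based hash of a namespace plus a name, so the same URL always produces the same UUID. Because `uuid.NAMESPACE_URL` is a fixed, public constant, anyone holding a candidate URL can recompute the UUID and compare; the candidate list below is made up to show that. Whether that still counts as Category 3+ is the open question for rrayborn, not something the code settles. `product_uuid` is a hypothetical helper name.

```python
import uuid


def product_uuid(url: str) -> uuid.UUID:
    # Name-based (version 5) UUID: SHA-1 of the URL namespace plus the URL string.
    return uuid.uuid5(uuid.NAMESPACE_URL, url)


a = product_uuid("http://www.whatever.com")
b = product_uuid("http://www.whatever.com")
c = product_uuid("http://www.whatever.com/other")

print(a == b)     # True: the same URL always yields the same UUID
print(a == c)     # False: a different URL yields a different UUID
print(a.version)  # 5

# Anyone with a list of candidate URLs can hash them and match the observed UUID.
observed = a
candidates = ["http://www.whatever.com/other", "http://www.whatever.com"]
matches = [u for u in candidates if product_uuid(u) == observed]
print(matches)    # ['http://www.whatever.com']
```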

@biancadanforth
Collaborator

Sharing this from my discussion with Osmose earlier over Vidyo:

Using URLs to identify the product locally is not a problem. It’s when that data gets sent away from the client's computer that it becomes a problem.

From a data collection perspective, there's no way to send a product ID that identifies a product across users without sending something that's Category 3+.

What we can do is use a product ID locally and send an alternate identifier to Mozilla (e.g. a primary key that is a UUID or an auto-incrementing number). This alternate identifier would let us tell products apart within a particular user's collection of products, but it would not reveal which product or page the user was actually on.
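A minimal sketch of that split, assuming an in-memory mapping (a real extension would need to persist it across sessions). `ProductAliasRegistry`, `make_telemetry_event`, and the event names are hypothetical. The alias is a random `uuid.uuid4()`, so unlike the `uuid5` approach it cannot be recomputed from the URL; an auto-incrementing counter would work the same way.

```python
import uuid


class ProductAliasRegistry:
    """Hypothetical client-side map from local product keys (URLs) to opaque aliases.

    The URL stays inside this object; only the alias is meant to leave the client.
    """

    def __init__(self):
        self._aliases = {}

    def alias_for(self, product_url: str) -> str:
        # Random version 4 UUID: stable for this user once assigned,
        # but not derivable from the URL by anyone else.
        if product_url not in self._aliases:
            self._aliases[product_url] = str(uuid.uuid4())
        return self._aliases[product_url]


def make_telemetry_event(registry: ProductAliasRegistry, event_name: str, product_url: str) -> dict:
    # Hypothetical event shape: the payload carries the alias, never the URL.
    return {"event": event_name, "product_id": registry.alias_for(product_url)}


registry = ProductAliasRegistry()
e1 = make_telemetry_event(registry, "price_alert_shown", "https://shop.example.com/item/123")
e2 = make_telemetry_event(registry, "product_removed", "https://shop.example.com/item/123")
e3 = make_telemetry_event(registry, "price_alert_shown", "https://shop.example.com/item/456")

print(e1["product_id"] == e2["product_id"])  # True: same product, same alias for this user
print(e1["product_id"] == e3["product_id"])  # False: different product, different alias
```

A second user's registry would assign a different alias to the same URL, which is what keeps the server from linking products across users.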

@muccimoz muccimoz removed this from the November MVP milestone Nov 8, 2018
3 participants