How do we know 2 proposals have the same content? #5

Open
BelfordZ opened this issue Mar 3, 2019 · 13 comments

Comments

@BelfordZ
Owner

BelfordZ commented Mar 3, 2019

2.) How do we know 2 proposals have the same content? For instance, let's say we have 2 ECIP repos, for the Geth and Parity clients. Someone opens a PR with the proposal on the Geth repo, and then the Parity people see it and open the same PR on their repo. Parity implements the proposal and merges the PR, and then people comment on the Geth proposal, which causes a change to the content of the proposal. The Geth client implements it, and they now both have the same filename with the same hash but different content.
Should we use the hash of the content of the file instead (this would mean we can't have the version hash in the content itself, so maybe we could use some nonce/integer or something)?
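A minimal sketch of the scenario above, assuming SHA-256 as the hash function and a hypothetical filename `ecip-proposal.md` with made-up proposal text (none of these are specified in this thread). Two repos hold a file with the same name, and hashing the content is what exposes the divergence:

```python
import hashlib


def content_hash(text: str) -> str:
    """Return the SHA-256 hex digest of a proposal's text (UTF-8 encoded)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical example: the same filename in two client repos.
parity_repo = {"ecip-proposal.md": "Raise the block gas limit.\n"}
geth_repo = {"ecip-proposal.md": "Raise the block gas limit gradually.\n"}

name = "ecip-proposal.md"
same_name = name in parity_repo and name in geth_repo
same_content = content_hash(parity_repo[name]) == content_hash(geth_repo[name])

# The filenames match, but the content digests do not.
print(same_name, same_content)
```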

Originally posted by @phyro in #2 (comment)

@meowsbits

meowsbits commented Mar 4, 2019

Or version/semversion the names (eg. The Proposal For Alien Intelligence Readiness, v1), and use the Replaces/Superseded-by header tags.

@meowsbits

The problem with numerical versioning is that forked versions can have the same number of modifications w/ different contents.

@meowsbits

meowsbits commented Mar 4, 2019

Or you could just hash the content, keeping the header/metadata separate somehow; either by using two files or just hashing the n-end lines of the file. Like everything after the

### Title

    StarIP: <StarIP Version Hash>
    Title: <StarIP title>
    Author: <list of authors' real names>
    Discussions-To: <email address>
    Created: <date created on, in ISO 8601 (yyyy-mm-dd) format>
    Replaces: <StarIP-{StarIP Version Hash}>
    Superseded-By: <StarIP-{StarIP Version Hash}>
    Resolution: <url>

or whatever
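One way the single-file variant could work, as a rough sketch. It assumes the header is the leading block of "Key: value" lines and that the body is everything after the first blank line; that delimiter rule, the SHA-256 choice, and the sample text are all assumptions, not something settled in this thread.

```python
import hashlib
from typing import Tuple


def split_proposal(text: str) -> Tuple[str, str]:
    """Split a proposal into (header, body) at the first blank line.

    Assumption: the header is the leading block of "Key: value" lines and
    the body is everything after the first blank line.
    """
    header, _, body = text.partition("\n\n")
    return header, body


def body_hash(text: str) -> str:
    """Hash only the body, so header edits do not change the digest."""
    _, body = split_proposal(text)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


original = "Title: Example\nSuperseded-By:\n\nThe proposal text.\n"
header_edited = "Title: Example\nSuperseded-By: StarIP-abc123\n\nThe proposal text.\n"
body_edited = "Title: Example\nSuperseded-By:\n\nThe revised proposal text.\n"

# Editing the header leaves the digest alone; editing the body changes it.
print(body_hash(original) == body_hash(header_edited))
print(body_hash(original) == body_hash(body_edited))
```

With this split, filling in Superseded-By later would not invalidate the version hash, which is the point of keeping header and content separate.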

@meowsbits

meowsbits commented Mar 4, 2019

Example of 2-file approach:

  • prop.2b8aef3b.header
  • prop.2b8aef3b.content

Just maybe don't use .h and .c or github will flavor them like C.

Or, ideally, just write the props in C.

@phyro
Collaborator

phyro commented Mar 4, 2019

Something that does file content hashing is what I would prefer. I 'proposed' a variant of this in the question, with the addition of having the title in the ECIP proposal filename, just to make it easier to know what is what. (I remember tracing transactions at some point, and it was hard because hashes are nondescriptive; the same thing could happen with *IPs in the long run.) Never mind, the title thing is a separate issue.

@phyro
Collaborator

phyro commented Mar 9, 2019

Proposing a machine format

What do you guys think about making it a JSON format? Something like:

{
  "header": {
    "hash": "494414ded24da13c451b13b424928821351c78fce49f93d9e1b55f102790c206",
    "content": {
      "title": "Proposing StarIP JSON",
      "author": ["Author1", "Author2"],
      "discussions-to": "[email protected]",
      "created": "ISO 8601 date",
      "replaces": "starIp assigned name including the hash",
      "superseded-by": "starIp assigned name including the hash",
      "resolution": "http://resulutionurl.com"
    }
  },
  "body": {
    "hash": "2dab7013f332b465b23e912d90d84c166aefbf60689242166e399d7add1c0189",
    "content": {
      "markdown": "# content that can use markdown?"
    }
  }
}

Then the deterministic hash for the filename is simply hash(ecip.header.hash + ecip.body.hash). It's also possible to just skip the hash fields and do hash(hash(ecip.header) + hash(ecip.body)), or simply cat ecip_xxxx.json | hash. The reason I'd go with a machine format is that it is easier than parsing some custom thing. Some benefits you get with this:

  • easy hash checking automation on CI
  • automatic validation of the header on CI
  • json file could serve as data for aggregating websites (the chronological order of proposals that the current ECIP numbering system provides can be derived also from the created header param)

If the proposals need structure, why don't we just adopt an existing structured format, e.g. JSON? I don't think we lose much by having a machine format, because it is very easy to write a transformation from *IP_hash.json -> *IP_hash.html.
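The hash(hash(header) + hash(body)) idea and the CI hash check could be sketched roughly as below. Several things here are assumptions rather than anything agreed in this thread: SHA-256 as the hash, sorted-key compact JSON as the canonical serialization (some fixed serialization is needed, or two equivalent files would hash differently), concatenating hex digests, and the `ecip_<hash>.json` filename pattern.

```python
import hashlib
import json


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def canonical(obj) -> str:
    """Serialize with sorted keys and no extra whitespace (an assumed canonical form)."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def proposal_id(proposal: dict) -> str:
    """hash(hash(header) + hash(body)), concatenating the two hex digests."""
    header_hash = sha256_hex(canonical(proposal["header"]))
    body_hash = sha256_hex(canonical(proposal["body"]))
    return sha256_hex(header_hash + body_hash)


def check_filename(filename: str, proposal: dict) -> bool:
    """CI-style check: does the filename carry the hash derived from the content?

    The "ecip_<hash>.json" naming is a hypothetical convention.
    """
    return filename == f"ecip_{proposal_id(proposal)}.json"


proposal = {
    "header": {"title": "Proposing StarIP JSON", "author": ["Author1", "Author2"]},
    "body": {"markdown": "# content that can use markdown?"},
}
filename = f"ecip_{proposal_id(proposal)}.json"
print(check_filename(filename, proposal))  # name matches the content

# Any later edit to the content makes the old filename fail the check.
proposal["body"]["markdown"] += "\nAn edit made after review."
print(check_filename(filename, proposal))
```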

@phyro
Collaborator

phyro commented Mar 9, 2019

Another thing to keep in mind is that if we want to rely on the hash of the content, then the content itself should be implementation-agnostic, i.e. not client-specific. Otherwise, another team adopting the IP would either have to live with some stuff that is not really related to them, or change the content, which would in turn change the hash. Maybe it also should not reference any specific implementations?
Perhaps a way to allow client-specific data would be to have not only a header and body but also metadata that adds such additional information if needed. The hash of the IP is derived only from the header and body hashes.
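A small sketch of that separation, assuming a top-level "metadata" key alongside header and body, SHA-256, and sorted-key compact JSON as the serialization (all hypothetical choices). Two clients attach different metadata to the same proposal and still derive the same ID:

```python
import hashlib
import json


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def section_hash(section: dict) -> str:
    """Hash one section using sorted keys and compact separators (assumed canonical form)."""
    return sha256_hex(json.dumps(section, sort_keys=True, separators=(",", ":")))


def proposal_id(proposal: dict) -> str:
    """Combine only the header and body hashes; "metadata" never enters the ID."""
    return sha256_hex(section_hash(proposal["header"]) + section_hash(proposal["body"]))


shared = {
    "header": {"title": "Example proposal"},
    "body": {"markdown": "Client-agnostic specification text."},
}
# Hypothetical client-specific notes, kept outside the hashed content.
geth_copy = {**shared, "metadata": {"client": "Geth", "status": "implemented"}}
parity_copy = {**shared, "metadata": {"client": "Parity", "status": "in review"}}

# Different metadata, same proposal ID.
print(proposal_id(geth_copy) == proposal_id(parity_copy))
```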

@phyro
Collaborator

phyro commented Mar 9, 2019

@meowsbits @BelfordZ do you guys think having the proposal be a pure json file makes sense? I think it's kinda cool especially when combined with the URI proposal from sorpaas https://ecips.that.world/24-ECIPURI/

@meowsbits

I think it's a neat idea.

My only real hesitation is barrier to entry; I'm inclined to keep the forum as open as possibly possible ;), where new and old participants alike don't have to have ANY computer skillz and don't have to use fancy computer tools.

Maybe there's a way to marry these ideas, too; like designating the migrations of "raw" proposals to a structured format to a Janitor or whatever (whether human or robot).

@phyro
Collaborator

phyro commented Mar 11, 2019

@meowsbits yeah, that was my concern too, but then I thought that it's easy to solve this problem by just making a website with 8 input fields, a live preview of GitHub markdown, and a "Generate json" button that shows the JSON structure with the hashes and everything. Is this what you meant by migrations of "raw" to structured? (I don't know what Janitor is.)
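The "Generate json" step itself could be quite small. Below is a rough sketch, not a real tool: it takes the eight raw inputs (the seven header fields from the JSON sketch plus the markdown body) and emits the structured form with hashes filled in. The function names, SHA-256, and the sorted-key serialization used for hashing are all assumptions.

```python
import hashlib
import json

# Seven header fields plus the markdown body give the eight form inputs.
HEADER_FIELDS = [
    "title", "author", "discussions-to", "created",
    "replaces", "superseded-by", "resolution",
]


def content_hash(content: dict) -> str:
    """SHA-256 over a sorted-key, compact JSON encoding (an assumed canonical form)."""
    encoded = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def generate_json(form: dict) -> str:
    """Turn raw form values into the structured proposal, hashes included."""
    header = {field: form.get(field, "") for field in HEADER_FIELDS}
    body = {"markdown": form.get("markdown", "")}
    proposal = {
        "header": {"hash": content_hash(header), "content": header},
        "body": {"hash": content_hash(body), "content": body},
    }
    return json.dumps(proposal, indent=2)


# Hypothetical form submission; unfilled fields default to empty strings.
form = {
    "title": "Proposing StarIP JSON",
    "author": ["Author1", "Author2"],
    "created": "2019-03-09",
    "markdown": "# content that can use markdown?",
}
print(generate_json(form))
```

The same function could sit behind a web form or a CLI, so the author only ever supplies plain values and never computes a hash by hand.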

@meowsbits

meowsbits commented Mar 11, 2019

Yea, that's exactly what I mean :)

By "Janitor" I just mean something like "moderator" - a person or robot whose job is to handle due diligence and chores around content management.

A web app would work; the obvious caveat here being that you'd have to use a webapp to "properly process" the proposal. Where is it hosted? What language is it written in? Do we approximate the Github markdown flavor w/ a static lib, or use someone's API key to hit the official API? Other options I can think of that would play a similar role would be like a CLI app (markdown -> JSON, etc.)

But I guess that these questions above are assuming "a single" webapp, when in fact there could be any number of apps (web-based or not) that could handle the processing... but in any case they'll need to be built, maintained, and advertised/documented, which points to the idea that the "barriers to entry" concern also involves a "beast of burden" issue, where this "Janitor's" job will need to be taken on by somebody... or something 👿 🤖 😹 .

So I guess these questions of complexity lead me to ask again: why are we pursuing this system? Is it all for the sake of an ID-creation system? Do we NEED to be able to programmatically compare proposal content? (Surely there are other benefits of structured data, too.) I just want to raise the idea that the magnitude and complexity of a solution should try to match the problem.

EDIT: And in general, I think we should work to simplify the requirements for submitting a proposal to the bare-bones, most generically accessible minimum, and shift any "hard work" or "management" duties to specialists at other stages of the process.

@meowsbits

meowsbits commented Mar 11, 2019

@phyro If we were to remove the hash fields from your JSON sketch, then everything could be filled in by hand (by our lowest-common-denominator contributor w/ no minimum necessary knowledge of any specialized computing domain).

And content hashing could be done pretty trivially as a separate task (whether by Janitor, author, curious cat) --

cat my-proposal.json | jq '.body.content.markdown' | md5
# Where 'jq' is any one of many CLI tools for working w/ JSON.

And maybe we could flatten away the content fields too, to just header.title, etc, and body.markdown, for instance.
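For comparison, here is a rough Python take on the same separate hashing step, run against the flattened layout (header.title, body.markdown, no hash fields). The sample document is hypothetical. Note that this hashes the raw markdown text, so the digest will not match the shell pipeline above as written: without `-r`, jq prints the string JSON-quoted and followed by a newline.

```python
import hashlib
import json

# Flattened, hand-writable form: no hash fields, no nested "content" objects.
raw = """
{
  "header": {"title": "Proposing StarIP JSON", "author": ["Author1", "Author2"]},
  "body": {"markdown": "# content that can use markdown?"}
}
"""


def body_digest(proposal_json: str) -> str:
    """MD5 hex digest of the raw body.markdown text."""
    proposal = json.loads(proposal_json)
    return hashlib.md5(proposal["body"]["markdown"].encode("utf-8")).hexdigest()


print(body_digest(raw))
```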

@phyro
Collaborator

phyro commented Mar 11, 2019

@meowsbits

If we were to remove the hash fields from your JSON sketch, then everything could be filled in by hand

I agree 👍 It would be better to remove the hashes and content and do what you proposed above

So I guess these questions of complexity lead me to ask again why are we pursuing this system? Is it all for the sake of an ID-creation system? Do we NEED to be able to programmatically compare proposal content?

This is of course not needed. I'll explain my thought process and why I even considered this. The *IPs solve the 'decentralization' problem of the standard ECIPs (which are all in a centralized repository), but they solve it inside the context of GitHub, which is itself centralized. The ECIP-URI proposal solves the 'GitHub context' issue, but it does not solve the deterministic proposal problem (how do we know two proposals are the same, and that people are talking about the same thing, if it is in multiple places). The determinism was solved by *IPs by suggesting the hashing. This is why I was playing with the idea of mixing both, and did a sketch of what communicating deterministic ECIPs via ecipURI would look like. However, this has its own problems, like how do you know a client is ready for proposal A, which in the case of *IPs was 'when they merged the A proposal PR'. I agree that it's probably introducing more complexity than the actual problem we have right now warrants. If this ever becomes a problem, we can solve it later.

Thanks for the feedback! (and for showing the jq example, never knew this tool existed)

This was referenced Mar 12, 2019