This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

Enable multihash support for swarm root hashes & ENS #186

Closed
5 tasks
cobordism opened this issue Jan 2, 2018 · 25 comments

Comments

@cobordism

cobordism commented Jan 2, 2018

Goal: as described in #166 we want to be able to request swarm data using URLs of the form bzz://<multi-hash>/path/in/manifest.

The reason is that this will allow people to store multi-hashes in the ENS resolver contracts at "content", thereby allowing Swarm, IPFS, and other systems to exist side by side.

This change also allows us to add to ENS Swarm content that has been uploaded with the --encrypt flag. In the current system that is not possible.

  • Enable retrieval of swarm-content using a multi-hash in the URL
  • Generate a multi-hash when uploading swarm content
  • Document the functionality in the swarm docs
  • Notify the ENS guys -> Need new resolver and new ENS tools.
  • Update all our own ENS names to use a multi-hash
@cobordism cobordism added this to the 0.3 milestone Jan 2, 2018
@gbalint

gbalint commented Jan 4, 2018

Is this really needed for 0.3?

@cobordism
Author

I bundled all the backward compatibility breaking changes under the 0.3 label.

@cobordism
Author

let's not change the hashes that go into ENS twice. Once for BMT and once more for multihash.

@gbalint

gbalint commented Jan 4, 2018

Ok, I get it, let's not break compatibility again after 0.3 if possible

@cobordism
Author

I'll repeat my comment from Issue 440 - because the --encrypt flag is new.

With the advent of encrypted swarm uploads (swarm up --encrypt) requiring double length hashes, it is time to update swarm to use multi-hash.
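To make "double length" concrete, here is a minimal Python sketch. It assumes an encrypted Swarm reference is the 32-byte chunk hash followed by a 32-byte decryption key (64 bytes in total). The name `fits_32_byte_field` is made up for illustration and is not part of any Swarm or ENS API.

```python
HASH_LEN = 32  # length of a plain Swarm root hash in bytes
KEY_LEN = 32   # assumed length of the decryption key appended for --encrypt uploads


def fits_32_byte_field(reference: bytes) -> bool:
    """Return True if the reference fits exactly into a 32-byte content field."""
    return len(reference) == 32


# Placeholder references (all zero bytes); only their lengths matter here.
plain_ref = bytes(HASH_LEN)
encrypted_ref = bytes(HASH_LEN + KEY_LEN)

print(fits_32_byte_field(plain_ref))      # a plain hash fits
print(fits_32_byte_field(encrypted_ref))  # hash + key does not
```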

Adding the double-length encrypted swarm hashes to ENS would require a lot of changes. According to Arachnid:
"Presently the return type for content hashes is 'bytes32'. Changing this would require writing a new resolver (ideally, after writing an EIP describing the new type). Existing tools that use the resolver (including manager.ens.domains) would need updating to support the new type."

And if we are already in need of changing so much, this is the perfect opportunity to:

"Instead of bytes32, we can use 'bytes', and start using multihash, so we can support Swarm, IPFS, and any other content-addressed system."

@Neurone

Neurone commented Nov 24, 2018

Enable retrieval of swarm-content using a multi-hash in the URL

Saying "multihash" does not seem to me to be enough to set a standard for URLs, because a multihash is binary data.
I suggest choosing a base58 representation using the IPFS dictionary because:

  • even with hashes that are 2 bytes longer (34 vs 32 in this case), it is shorter than the hex representation currently used by Swarm URLs
  • it can work seamlessly with IPFS
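The length claim can be checked with a short Python sketch. It assumes the base58 alphabet used by Bitcoin and IPFS and a 34-byte multihash with the keccak256 prefix 0x1b20; `b58encode` is a hand-rolled helper written for this example, not a library call.

```python
# Base58 alphabet used by Bitcoin and IPFS (no 0, O, I or l).
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    """Encode bytes as base58, keeping leading zero bytes as '1' characters."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + out


digest = bytes(range(32))                  # placeholder 32-byte hash
multihash = bytes([0x1B, 0x20]) + digest   # 34 bytes: type, length, digest

hex_url_part = digest.hex()                # what Swarm URLs use today
b58_url_part = b58encode(multihash)

print(len(hex_url_part), len(b58_url_part))
```

With this prefix the base58 string comes out at 46 characters against 64 for plain hex, which is the same length as the familiar IPFS `Qm...` hashes.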

@acud
Member

acud commented Nov 24, 2018

I suggest choosing a base58 representation using the IPFS dictionary because ...

All well and good, but Keccak256 hashes were picked in order to provide seamless interoperability with the Ethereum blockchain (which uses keccak256). This allows better and easier integration with smart contracts for various purposes. And since Swarm is, after all, a core project of the Ethereum Foundation, this is highly unlikely to change.

@homotopycolimit to veer back to your questions, there are a few considerations to make room for:

  1. should we allow both normal hash format and multihash?
  2. I still don't understand how specifically multihashes would solve the problem with ENS
  3. Multihash for keccak256 is prefixed with 0x1b, are we supposed to add just two more bytes (1b) or four more bytes (0x1b)?
  4. I don't understand how multihashes would solve the problem of publishing content uploaded with --encrypt. You'd still have to include the decryption key, unless there were another layer of encapsulation by another manifest, which would generate the shorter hash.
  5. This could go in as part of rewriting manifests. Most of the changes go there, since the code basically has to support traversal of multihash manifests and content. I have already written most of the spec for that, however we've de-prioritised it till we finish the xmas edition sprint

@cobordism
Author

cobordism commented Nov 24, 2018

A few things to note:
Since creating this issue, there have been discussions at ENS about changes to the default resolver, and we should incorporate those changes. In particular, I don't think the 'content' field is going to be used in future.

1(one) I thought yes.
2(two) I had been under the impression that a multihash describes itself, i.e. can identify itself as a swarm or ipfs hash... but I guess that's not quite accurate.

4(four) It doesn't. I'm just saying that in order to save to ENS the reference to an encrypted swarm upload, we have to change the ENS resolvers anyway.
5(five) I don't envision any new hashes in any manifest. This is only about requesting something via the http interface. Internally we continue to use standard swarm hashes.

This all came about when Swarm Feeds were introduced. They were (are?) using multihashes. Do you know the current state of affairs there?

edit: autoformatting messed up my numbering.

@jpeletier
Contributor

  1. should we allow both normal hash format and multihash?
  2. I still don't understand how specifically multihashes would solve the problem with ENS

I don't see how multihash is helping us except by complicating things around Swarm code. Swarm works using 32-byte hashes only, so why does it need to understand what type of hash it is reading out of ENS? It should just expect 32 bytes out of the resolver, and if those 32 bytes interpreted as a Keccak256 hash don't yield content on lookup, then that content simply does not exist.

  3. Multihash for keccak256 is prefixed with 0x1b, are we supposed to add just two more bytes (1b) or four more bytes (0x1b)?

I think you are confusing a hex representation of a byte array with how a byte array is stored. The 0x portion is not stored. And 2 hex digits are stored in ENS as 1 byte. Thus, 1b takes 1 byte.

Multihash for Keccak256 is actually prefixed with 0x1b20 which is 2 bytes. 1 byte hash type and 1 byte hash length (0x20 = 32 ). Again, the 0x is just part of the printable, human-readable representation. This is not stored.
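As a rough illustration of the byte layout described above, here is a minimal Python sketch. The helper names `wrap_multihash` and `unwrap_multihash` are made up for this example; the only facts taken from the discussion are the 0x1b type byte, the 0x20 length byte, and the 32-byte digest.

```python
KECCAK256_CODE = 0x1B  # multihash type byte for keccak256
DIGEST_LEN = 0x20      # 32, stored as a single length byte


def wrap_multihash(digest: bytes) -> bytes:
    """Prefix a 32-byte keccak256 digest with its 2-byte multihash header."""
    if len(digest) != DIGEST_LEN:
        raise ValueError("expected a 32-byte digest")
    return bytes([KECCAK256_CODE, DIGEST_LEN]) + digest


def unwrap_multihash(multihash: bytes) -> bytes:
    """Check the 2-byte header and return the bare 32-byte digest."""
    if len(multihash) != 2 + DIGEST_LEN:
        raise ValueError("expected 34 bytes")
    if multihash[0] != KECCAK256_CODE or multihash[1] != DIGEST_LEN:
        raise ValueError("not a keccak256 multihash")
    return multihash[2:]


digest = bytes(range(32))        # placeholder digest
wrapped = wrap_multihash(digest)
print(len(wrapped))              # 34 bytes are stored
print("0x" + wrapped.hex()[:4])  # printable form of the prefix; "0x" is not stored
```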

  4. I don't understand how multihashes would solve the problem of publishing content uploaded with --encrypt. You'd still have to include the decryption key, unless there were another layer of encapsulation by another manifest, which would generate the shorter hash.

What is the point of encrypting something and then publishing the decryption key somewhere like ENS where everyone can see it? When browsing such an encrypted website, only the content hash should be in ENS, and the decryption key should be something you have in some sort of wallet. Perhaps the browser should prompt you for it, like when you try to access a username/password protected URL.

If you insist on publishing the decryption key, then some sort of manifest should do the trick.

@jpeletier
Contributor

jpeletier commented Nov 24, 2018

This all came about when Swarm Feeds were introduced. They were (are?) using multihashes. Do you know the current state of affairs there?

Swarm Feeds does not use multihashes anymore. It works like a key-value store in which the key is YourEthAddress | Topic and the value is an arbitrary byte array of up to 3963 bytes. You can store arbitrary data there. It can be a multihash, a small JSON file or whatever.

If, however, you choose to store in that arbitrary data something that looks like a multihash, then you can use that Feed in combination with bzz:. When a bzz: URL is requested with a Feeds manifest hash, the referenced feed is looked up and the value is checked to see if it is a multihash for content. If so, that multihash is looked up to see if it points to content and that content is returned.
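The lookup flow described above could be sketched roughly as follows. This is a toy model in Python, not the Swarm implementation: the feed store and the content store are plain dictionaries, and the function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

MULTIHASH_PREFIX = bytes([0x1B, 0x20])  # keccak256, 32-byte digest


def content_hash_from_feed_value(value: bytes) -> Optional[bytes]:
    """Return the 32-byte hash if the feed value looks like a keccak256 multihash."""
    if len(value) == 34 and value[:2] == MULTIHASH_PREFIX:
        return value[2:]
    return None


def resolve_bzz_feed(
    feeds: Dict[Tuple[str, str], bytes],
    content: Dict[bytes, bytes],
    address: str,
    topic: str,
) -> Optional[bytes]:
    """Look up the feed, interpret its value as a multihash, then fetch the content."""
    value = feeds.get((address, topic))
    if value is None:
        return None  # no such feed update
    swarm_hash = content_hash_from_feed_value(value)
    if swarm_hash is None:
        return None  # arbitrary data, not a multihash
    return content.get(swarm_hash)


# Toy data: one feed pointing at content, one holding arbitrary JSON.
page_hash = bytes(range(32))
content_store = {page_hash: b"<html>hello</html>"}
feed_store = {
    ("0xabc", "site"): MULTIHASH_PREFIX + page_hash,
    ("0xabc", "notes"): b'{"any": "json"}',
}

print(resolve_bzz_feed(feed_store, content_store, "0xabc", "site"))
print(resolve_bzz_feed(feed_store, content_store, "0xabc", "notes"))
```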

Note that for the above scheme to work, the hash stored in the Feed would not need to be a multihash. It could be a straightforward regular Swarm hash if we make the necessary change.

The whole 0x1b20 prefix thing is driving most people using Feeds + bzz: + ENS nuts. I am in favor of removing it altogether. If you agree, I'll be more than happy to open a PR. This is just a few lines of code.

@cobordism
Author

one of the points of encrypting something is so that the storers don't know the contents of what they are storing.
Even if you publicise the decryption key of the root chunk on ENS, you still gain something.
If you are running a swarm node and receive an encrypted chunk, you will not know what root hash it belongs to, and you will not know if the decryption key is in ENS unless you try downloading everything registered there.

Indeed in future the encrypted upload might be the only one. No more plaintext.

@cobordism
Author

@ Javier, what is the best way forward for feeds?

@jpeletier
Contributor

jpeletier commented Nov 24, 2018

one of the points of encrypting something is so that the storers don't know the contents of what they are storing.

ok.

Indeed in future the encrypted upload might be the only one. No more plaintext.

I would then store the encrypted key in a manifest. The hash of that manifest would go to ENS. That way no changes to ENS are required. This is a huge advantage.

@jpeletier
Contributor

@ Javier, what is the best way forward for feeds?

@homotopycolimit regarding what? Regarding multihash, Feeds does not use/have any multihash anywhere in its code.

I posted above another comment about Feeds. Does that answer your question?

@cobordism
Author

I saw your comment and thus asked what is the best way forward? Shall we close this issue and change swarm feeds to just use regular swarm hashes?

in regards to decryption keys -- the manifests are themselves encrypted. Everything is. You need a decryption key just to get started. (calling @nagydani - what do you make of unencrypted manifests containing references to encrypted content?)

@jpeletier
Contributor

I saw your comment and thus asked what is the best way forward? Shall we close this issue and change swarm feeds to just use regular swarm hashes?

TL;DR. Yes

Longer read:

Just a clarification: Feeds does not "use" multihashes, it blindly stores arbitrary data. Whether that data is a multihash or not, Feeds does not know or care.

bzz: is a "client" of Feeds. In that usage, it expects Feeds to return a 34-byte multihash (2 bytes of 0x1b20 prefix + 32 bytes of swarm hash). So somehow, you should have stored those 34 bytes in a feed update before you use bzz:, or your lookup will fail.

I would change bzz: to expect a regular 32-byte swarm hash instead of a multihash. I can do this pretty quickly and am more than happy to, in order to reduce confusion everywhere.

@jpeletier
Contributor

in regards to decryption keys -- the manifests are themselves encrypted. Everything is. You need a decryption key just to get started.

I would add an additional manifest that points to the actually encrypted manifest+data, because even if you do not store the decryption key in Swarm you would then be storing it in ENS: nodes could also scan ENS for decryption keys too (?), so what is the point?

@cobordism
Author

cobordism commented Nov 24, 2018

I think using the same hashes for feeds as for regular content sounds good, but it is not my decision to make.
Let's discuss at the next round-table on Tuesday.

To clarify, one more point about encryption: every chunk in the chunk tree of a dataset is encrypted with a different key. Only the decryption key of the root chunk would be visible in ENS. You'd have to recursively download the entire trie in order to decrypt the data chunks. In particular, if you hold one chunk and do not know what dataset it belongs to, you cannot decrypt it.
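A toy model of that per-chunk key scheme, in Python, might look like the sketch below. Everything in it is an assumption made for illustration: SHA-256 stands in for the Swarm hash, the XOR keystream stands in for the real cipher, the tree has a single intermediate level, and the leading type byte is an invented marker. It only shows that a reference is hash + key and that child keys are hidden inside the encrypted parent.

```python
import hashlib
import os
from typing import Dict, List


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR data with a SHA-256 based keystream (not Swarm's real scheme)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def put_chunk(store: Dict[bytes, bytes], plaintext: bytes) -> bytes:
    """Encrypt a chunk under a fresh random key; return the 64-byte reference (hash + key)."""
    key = os.urandom(32)
    ciphertext = keystream_xor(key, plaintext)
    address = hashlib.sha256(ciphertext).digest()
    store[address] = ciphertext  # storers only ever see ciphertext
    return address + key


def upload(store: Dict[bytes, bytes], data_chunks: List[bytes]) -> bytes:
    """Store leaves (type byte 0), then a root chunk (type byte 1) listing their references."""
    refs = [put_chunk(store, b"\x00" + chunk) for chunk in data_chunks]
    return put_chunk(store, b"\x01" + b"".join(refs))


def download(store: Dict[bytes, bytes], ref: bytes) -> bytes:
    """Decrypt a chunk with the key from its reference and recurse into its children."""
    address, key = ref[:32], ref[32:]
    plaintext = keystream_xor(key, store[address])
    kind, body = plaintext[0], plaintext[1:]
    if kind == 0:
        return body
    return b"".join(download(store, body[i:i + 64]) for i in range(0, len(body), 64))


store: Dict[bytes, bytes] = {}
root_ref = upload(store, [b"hello ", b"world"])
print(download(store, root_ref))
```

In this model only `root_ref` would be published; a node holding one of the leaf ciphertexts has no key for it unless it first fetches and decrypts the root.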

@acud
Member

acud commented Nov 25, 2018

I think you are confusing a hex representation of a byte array with how a byte array is stored. The 0x portion is not stored. And 2 hex digits are stored in ENS as 1 byte. Thus, 1b takes 1 byte.

I am facepalming myself 😭

Multihash for Keccak256 is actually prefixed with 0x1b20 which is 2 bytes. 1 byte hash type and 1 byte hash length (0x20 = 32 ). Again, the 0x is just part of the printable, human-readable representation. This is not stored.

I remembered this but somehow missed the point where the length is written into the multihash. Again facepalm.

in regards to decryption keys -- the manifests are themselves encrypted. Everything is. You need a decryption key just to get started. (calling @nagydani - what do you make of unencrypted manifests containing references to encrypted content?)

That's a bit of a problem actually, because when you create an unencrypted manifest with that reference it would be possible for a node to intercept that manifest (it would probably fit into one chunk) and the decryption key could leak. The same would apply to retrieval requests, as forwarding nodes could intercept decryption keys and encrypted references.

@homotopycolimit I think this issue overlaps with #940, which will actually give a real solution to the problem (well, it won't solve storing decryption keys on ENS).

multihashes are just half of a solution since they only describe the type of the hashing algorithm; they don't describe the storage infrastructure on which the content hash resides. Unless web3 storage and delivery providers (swarm, ipfs, storj) work out a convention between themselves on extending multihashes with Dstorage provider identifiers embedded into the multihash, this won't really solve anything.

@zelig
Member

zelig commented Nov 25, 2018

In particular, if you hold one chunk and do not know what dataset it belongs to,

reverse indexing ENS is a pretty simple thing to do. So only those who do not want to know will not know.
but yes, it is better than nothing.
BTW 'root access' manifests (the ones that contain the encrypted reference and a link to an ACT) are of course unencrypted so they can be referenced by 32 bytes.
I remember when we ditched this whole topic with ENS because of this.

Now as for supporting multihash on bzz, I am ok with that but find it a bit pointless, since if any client knows that 0x1b20... hashes need swarm to resolve, they can just as well trim this prefix before handing it to bzz.

@acud
Member

acud commented Nov 25, 2018

Whoops, it seems that in the last few weeks the discussion has progressed in the provided EIP.

See:
https://ethereum-magicians.org/t/eip1577-multiaddr-support-for-ens/1969
status-im/status-mobile#6688
https://eips.ethereum.org/EIPS/eip-1577
ethereum/EIPs#1577

New contract: https://github.com/ensdomains/resolvers/blob/master/contracts/PublicResolver.sol

@homotopycolimit I think we can close this issue

@jpeletier
Contributor

jpeletier commented Nov 25, 2018 via email

@acud
Member

acud commented Nov 25, 2018

@jpeletier my vote is 👍 for getting rid of multihashes at this point. They provide no real benefit in the codebase for our current use cases.

@cobordism
Author

Ok. Let's get rid of multihash in swarm and work towards using https://eips.ethereum.org/EIPS/eip-1577 for ENS.
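For reference, a rough Python sketch of what an EIP-1577 style content hash for Swarm could look like. The layout used here (storage-system code 0xe4, CID version 1, content type 0xfa, then the keccak256 multihash) is my reading of the EIP text and should be checked against it; the function and constant names are made up for this example.

```python
def uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte, low bits first)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


# Codes as I understand them from EIP-1577 and the multicodec table (assumptions).
SWARM_NS = 0xE4        # storage system: Swarm
CID_VERSION = 0x01     # CID version 1
SWARM_MANIFEST = 0xFA  # content type: swarm-manifest
KECCAK_256 = 0x1B      # hash function


def swarm_contenthash(swarm_hash: bytes) -> bytes:
    """Build the byte string that would go into the resolver's contenthash field."""
    if len(swarm_hash) != 32:
        raise ValueError("expected a 32-byte swarm hash")
    return (
        uvarint(SWARM_NS)
        + uvarint(CID_VERSION)
        + uvarint(SWARM_MANIFEST)
        + uvarint(KECCAK_256)
        + uvarint(len(swarm_hash))
        + swarm_hash
    )


encoded = swarm_contenthash(bytes(range(32)))  # placeholder hash
print("0x" + encoded.hex())
```

Unlike a bare multihash, the leading code identifies the storage system as well as the hash function, which is the piece that was missing in the discussion above.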

I will now close this issue. We can create new ones as needed.

@jpeletier
Contributor

PR to remove multihash: ethereum/go-ethereum#18175


7 participants