if the Permanent Web is “content-addressable”, could it be designed so that each file has only one address? #126
Comments
I don't really see the problem. In my understanding, only a few import settings (such as the chunking and DAG layout) could alter the final hash of a file. All in all, I don't see this being too much of a problem. Actually, the biggest problem would be that you can check the file's content hash only when you have all the blocks on disk, rather than checking each block individually.
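To make that trade-off concrete, here is a minimal sketch (TypeScript; the helper names are made up, not real IPFS APIs): an IPFS-style block can be verified the moment it arrives, while a whole-file hash can only be checked once every block is on disk.

```ts
import { createHash } from "node:crypto";

const sha256 = (data: Buffer) => createHash("sha256").update(data).digest("hex");

// IPFS-style: each block is addressed by the hash of its own bytes,
// so a block can be verified the moment it arrives.
function verifyBlock(block: Buffer, expectedHash: string): boolean {
  return sha256(block) === expectedHash;
}

// Whole-file addressing: the hash covers the concatenated content,
// so nothing can be verified until every block is on disk.
function verifyFile(blocks: Buffer[], expectedHash: string): boolean {
  return sha256(Buffer.concat(blocks)) === expectedHash;
}

const blocks = [Buffer.from("hello "), Buffer.from("world")];
const blockHashes = blocks.map(sha256);
const fileHash = sha256(Buffer.concat(blocks));

console.log(verifyBlock(blocks[0], blockHashes[0])); // true, checked immediately
console.log(verifyFile(blocks, fileHash));           // true, but needs all blocks first
```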
Hey @Mithgol. I totally grok your line of reasoning, but I also believe IPFS offers exactly what you're proposing already, plus the natural next step.

You can already do this: at the block level, objects are addressed by exactly the multihash of their content. You can totally, 100% use IPFS this way. Things like IPFS protobuf objects, IPLD, and chunkers are simply serialization formats on top of blocks. They are something you can opt out of; you can happily use raw blocks wherever you'd like.

As you work this way (with raw blocks), you may begin to lament the lack of more powerful linking data structures (like IPLD for linked data) and the advantages that different chunkers bring. How could you add these without modifying the hash? It's not a crazy idea: it'd be really nice if you could. The natural result: you begin to hash the linked data + chunks instead of the raw data, and refer to that by its hash, which is exactly where IPFS is today.
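A rough sketch of the distinction being drawn here, under the assumption that JSON stands in for the real protobuf/IPLD serialization: a raw block's address is the hash of its bytes, while a linked object's address is the hash of a serialized structure of links + data.

```ts
import { createHash } from "node:crypto";

const sha256hex = (b: Buffer) => createHash("sha256").update(b).digest("hex");

// Raw block: address == hash(content). Same bytes, same address, always.
const raw = Buffer.from("the permanent web");
const rawAddress = sha256hex(raw);

// "DAG node": once you want links and chunking, you hash a serialized
// structure instead of the raw bytes -- and the address changes with it.
interface Node {
  links: string[]; // addresses of child blocks
  data: Buffer;
}
function nodeAddress(node: Node): string {
  const serialized = Buffer.concat([
    Buffer.from(JSON.stringify(node.links)), // assumption: JSON stands in for protobuf/IPLD
    node.data,
  ]);
  return sha256hex(serialized);
}

console.log(rawAddress);
console.log(nodeAddress({ links: [rawAddress], data: Buffer.alloc(0) }));
// The two addresses differ even though they "contain" the same content.
```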
First of all, many thanks @Mithgol for clearly articulating this problem! To be clear, I have no desire or expectation for IPFS to switch to my hash URI scheme. IPFS is focused on paths (and the file system model in general), and regardless of my personal preferences, I think it's a valid idea worth trying. And, after all, the particular string format doesn't really matter (as long as programs can understand it). What does matter, of course, is the way hashes are computed. I'm afraid I've been a constant thorn in @jbenet's side on this issue (sorry!). See our back and forth in ipfs/kubo#1953 and the resolution we eventually came to in #89.

Recently I've been working on a very simple project called the Hash Archive (https://hash-archive.org) (BTW it's still unstable, so please don't spread that link too far...) that builds a mapping of hashes to web URLs. I realized just the other day that the exact same system could be extended to IPFS (and, incidentally, BitTorrent).

So while IPFS continues to focus on its own merkle-DAG hashes (for the reasons @noffle explains), it may eventually be possible to build a mapping system of file hashes on top of it. To me, this seems to fit with IPFS's desire to be an infrastructure component: things can and will be built on top of it to make it more user-friendly and interoperable.

That said, I would suggest being careful about embedding IPFS paths in files intended to be "permanent," since those particular hashes may be long gone even if IPFS is still around.
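For illustration, a toy version of the kind of hash-to-locations mapping described above (the record shape below is hypothetical, not hash-archive.org's actual schema):

```ts
// Content hashes on one side, locations (web URLs, and potentially /ipfs/
// paths or torrents) on the other.
type Location = { url: string; firstSeen: Date };

const index = new Map<string, Location[]>(); // key: "sha256:<hex>"

function record(hash: string, url: string): void {
  const entries = index.get(hash) ?? [];
  entries.push({ url, firstSeen: new Date() });
  index.set(hash, entries);
}

// The same digest can later be resolved to every place it was ever seen,
// which is how a file hash could outlive any single URL or IPFS path.
function resolve(hash: string): Location[] {
  return index.get(hash) ?? [];
}

record("sha256:98493caa8b37eaa26343bbf7", "https://example.org/file.bin");
console.log(resolve("sha256:98493caa8b37eaa26343bbf7"));
```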
But using a hash of the raw file would work for a great number of cases where the files are small enough, and it would be a good convention to support even though it doesn't solve every case. For example, most source code files are small enough to link by the raw hash (for an IPFS code repository system).
Excellent discussion, and timely for our work on making files from the Internet Archive collections available in a world where IPLD import is non-deterministic (same file, different IPLDs).
@jefft0 this is an option |
This /ipmh/ approach assumes that current cryptographic hashes are never broken, though... Maybe there should be support for an arbitrary number of hashStrings in arbitrary order after /ipmh/? This still could result in some wrong data being downloaded (e.g. if you download files by looking up only a CRC32 of the content).

The issue still remains that there needs to be a location where the mappings multihash → [IPFS hashes] are stored; it needs to be world-writable and world-readable. Also, ideally there must be some way to ensure that vandalism (inserting bogus hashes) or Sybil attacks ("I'm 5000 users and we all think this bogus hash is valid!") are not possible, or at least not rewarded, while somehow not making it necessary to have a central "blessed" authority that downloads data 24/7 (something which might very well be illegal) and verifies hashes to resolve conflicts or hand out certificates/signatures.
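One possible shape for such a world-writable mapping, sketched below with hypothetical types: entries carry attestations, and candidates are ranked by distinct attestors. This does not solve the Sybil problem; it only makes the open problem concrete.

```ts
// Hypothetical shape for a world-writable multihash -> [IPFS hashes] mapping.
// Attestations make bogus entries visible and rankable; they do NOT stop a
// Sybil attacker from inflating counts -- that is exactly the open problem.
interface Attestation {
  ipfsHash: string;  // the merkle-DAG hash claimed to carry this content
  attestor: string;  // some peer identity (assumed: an opaque key string)
  signature: string; // placeholder; a real system would actually verify this
}

const mappings = new Map<string, Attestation[]>(); // key: file multihash

function claim(fileMultihash: string, a: Attestation): void {
  const list = mappings.get(fileMultihash) ?? [];
  list.push(a);
  mappings.set(fileMultihash, list);
}

// Rank candidate IPFS hashes by the number of distinct attestors.
function candidates(fileMultihash: string): [string, number][] {
  const counts = new Map<string, Set<string>>();
  for (const a of mappings.get(fileMultihash) ?? []) {
    const who = counts.get(a.ipfsHash) ?? new Set<string>();
    who.add(a.attestor);
    counts.set(a.ipfsHash, who);
  }
  return [...counts.entries()]
    .map(([hash, who]): [string, number] => [hash, who.size])
    .sort((x, y) => y[1] - x[1]);
}
```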
One part of this solution, which I've been suggesting, is that the IPLD as produced by IPFS should contain the hash of the content as well, when it is known. That way someone who built up a document from a list of shards would know whether they had the right final result. Of course, it would be great if there were a canonical IPLD spec, so we'd all know where to put this hash, but I'm told that isn't part of the plan. :-(
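A sketch of what such a node might look like, with made-up field names (since, as noted, there is no canonical place in the spec to put this hash):

```ts
import { createHash } from "node:crypto";

// An IPLD-style root node that also carries the plain hash of the fully
// reassembled content. Field names are hypothetical.
interface RootNode {
  links: string[];      // hashes of the shards/chunks
  contentHash: string;  // sha256 of the whole reassembled file
}

const sha256 = (b: Buffer) => createHash("sha256").update(b).digest("hex");

// Whoever reassembles the shards can now check the final result directly.
function verifyReassembly(root: RootNode, shards: Buffer[]): boolean {
  return sha256(Buffer.concat(shards)) === root.contentHash;
}

const shards = [Buffer.from("part one, "), Buffer.from("part two")];
const root: RootNode = {
  links: shards.map(sha256),
  contentHash: sha256(Buffer.concat(shards)),
};
console.log(verifyReassembly(root, shards)); // true only for the right final result
```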
Recently at ipfs/kubo#875 (comment) I have once again encountered the following fact:

- the IPFS hash of a file is not determined by the file's contents alone;
- the output of `ipfs cat $HASH | ipfs add` is not always the original hash

(it's all the same fact, just rephrased). Apparently there are several factors (encoding, sharding, IPLD) that influence the IPFS hash and make it different even if the file's contents are not different at all.
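A toy illustration of why this happens (this is a simplified merkle root, not the real IPFS DAG layout): the root hash covers a tree of chunks, so the same bytes chunked differently yield different roots, while the raw content hash stays fixed.

```ts
import { createHash } from "node:crypto";

const sha256 = (b: Buffer) => createHash("sha256").update(b).digest("hex");

function chunk(data: Buffer, size: number): Buffer[] {
  const out: Buffer[] = [];
  for (let i = 0; i < data.length; i += size) out.push(data.subarray(i, i + size));
  return out;
}

// Toy root: hash of the concatenated chunk hashes.
function root(data: Buffer, chunkSize: number): string {
  const leaves = chunk(data, chunkSize).map(sha256);
  return sha256(Buffer.from(leaves.join("")));
}

const file = Buffer.from("the very same file contents");
console.log(root(file, 8));  // one "IPFS-style" hash...
console.log(root(file, 16)); // ...a different one for identical content
console.log(sha256(file));   // the raw content hash never changes
```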
I have then found myself questioning whether the current IPFS URIs (designed in ipfs/kubo#1678) are useful in hyperlinks of the Permanent Web. After all, if a hyperlink references an IPFS hash, then that hyperlink by design becomes broken (forever) once that hash can no longer be used to retrieve the file. Even if someone somewhere discovers such lost file in some offline archive and decides to upload that file (or the whole archive) to the Permanent Web, the file is likely to yield a different IPFS hash and thus an old hyperlink (which references the original IPFS hash) is still doomed to remain broken forever. Such behaviour is not okay for the Permanent Web.
What can be done to improve it?
After a few minutes of hectic googling I've encountered @btrask's design of a “Common content address URI format”, which uses URIs such as `hash://sha256/98493caa8b37eaa26343bbf7` that are based on cryptographic hashes of the addressed content. As long as the hash function (“sha256”) stays the same, each file has only one address.

In addition to its main advantage (the improved immutability of the addresses), it also has a couple of additional advantages; for example, such hashes can be computed with `SubtleCrypto.digest`, which is available in some Web browsers before JS IPFS is completed.
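For example, a browser-side sketch using `SubtleCrypto.digest` (no IPFS code involved; the short URI above is truncated, while a real address would carry the full digest):

```ts
// Runs in browsers (and modern Node) via the global WebCrypto object.
async function hashURI(content: Uint8Array): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", content);
  const hex = [...new Uint8Array(digest)]
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
  return `hash://sha256/${hex}`;
}

hashURI(new TextEncoder().encode("hello, permanent web")).then(console.log);
// -> hash://sha256/<64 hex chars>, the same for this content everywhere, forever
```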
Therefore, here's a proposal: implement such addressing on top of IPFS to ensure that each file has only one address (minor correction: “only one” until multihash is upgraded from `sha256` to another algorithm and inevitably changes the address), an address that is determined only by the cryptographic hash of the file's content.

As an address, @btrask's scheme of `hash://algorithm/hashString` is too long and also not similar to the other IPFS addresses. I propose the form `/ipmh/hashString`, where `hashString` is a base58-encoded multihash of the file's content (not of the file's merkledag!) and `ipmh` means “InterPlanetary Multihash”. It's better to refrain from the idea of `/iphs/` (“InterPlanetary Hash System”) because `iphs` and `ipns` are visually alike (their likeness might cause perception errors in OCR and human vision).
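A sketch of how such an address could be computed (the `0x12 0x20` prefix is the standard sha2-256 multihash header; the `/ipmh/` prefix itself is, of course, only this proposal, not an existing IPFS feature):

```ts
import { createHash } from "node:crypto";

const ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

function base58(bytes: Uint8Array): string {
  let n = BigInt("0x" + (Buffer.from(bytes).toString("hex") || "0"));
  let out = "";
  while (n > 0n) {
    out = ALPHABET[Number(n % 58n)] + out;
    n /= 58n;
  }
  for (const b of bytes) { // leading zero bytes become leading '1's
    if (b !== 0) break;
    out = "1" + out;
  }
  return out;
}

function ipmhAddress(fileContents: Buffer): string {
  // sha2-256 multihash: 0x12 = sha2-256, 0x20 = 32-byte digest length.
  const digest = createHash("sha256").update(fileContents).digest();
  const multihash = Buffer.concat([Buffer.from([0x12, 0x20]), digest]);
  return "/ipmh/" + base58(multihash); // base58(0x12 0x20 ...) starts with "Qm"
}

console.log(ipmhAddress(Buffer.from("same content, same address")));
```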
I am certain that an implementation won't be an easy task and would need at least the following:
- a DHT that maps `/ipmh/` multihashes to the equivalent `/ipfs/` hashes while both are based on `sha256` hashes; such (or a similar) DHT would also eventually be necessary to find new (upgraded) multihashes that correspond to the current `sha256` multihashes (a resolver along these lines is sketched below);
- when IPFS encounters an `/ipfs/` address which it cannot resolve (the “forever dead hyperlink” case, discussed above), it should try using that DHT backwards (to find an IPMH for such an IPFS hash) and then use the IPMH to look for equivalent IPFS hashes (where “equivalent” means that they designate the same content as the original IPFS hash);
- changes to `ipfs add` to ensure that `/ipmh/` addresses are issued by default;
- changes to `ipfs get` and `ipfs cat` to ensure that `/ipmh/` addresses can be used to retrieve files;
- changes to `ipfs mount` to ensure that an `/ipmh` mountpoint is mounted;
- changes to the gateways to ensure that `https://ipfs.io/ipmh/` addresses are served (on the local gateway at `/ip4/127.0.0.1/tcp/8080` as well);
- changes to browser integrations to ensure that `https://ipfs.io/ipmh/` addresses (and, optionally, also @btrask's `hash://sha256/` addresses) are handled. However, `hash://` (unlike `ipmh://` or `https://ipfs.io/ipmh/`) is not necessarily IPFS-related, and thus the user might want another application (such as StrongLink) to handle it. (Such ambiguity is similar to the case of `magnet:` hyperlinks.)

However, it really seems that there's no other way to make the Permanent Web more permanent, to prevent dead hyperlinks from staying dead.
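A minimal sketch of the resolver flow described in the list above, with an entirely made-up DHT interface (`ipfsToIpmh`/`ipmhToIpfs` are hypothetical names, not an existing API):

```ts
interface MappingDHT {
  ipfsToIpmh(ipfsHash: string): Promise<string | null>; // the "backwards" lookup
  ipmhToIpfs(fileMultihash: string): Promise<string[]>; // all known equivalents
}

async function resolveContent(
  ipfsHash: string,
  fetchIpfs: (hash: string) => Promise<Uint8Array | null>,
  dht: MappingDHT,
): Promise<Uint8Array | null> {
  // Happy path: the original merkle-DAG hash still resolves.
  const direct = await fetchIpfs(ipfsHash);
  if (direct) return direct;

  // Dead hyperlink: recover the content multihash, then try every
  // equivalent IPFS hash (same content, different encoding/chunking).
  const ipmh = await dht.ipfsToIpmh(ipfsHash);
  if (!ipmh) return null;
  for (const candidate of await dht.ipmhToIpfs(ipmh)) {
    if (candidate === ipfsHash) continue;
    const data = await fetchIpfs(candidate);
    if (data) return data; // a verifier should also re-hash against ipmh here
  }
  return null;
}
```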
(Everything that is said here about files can probably also be said about IPFS directory structures; but I am not sure.)