Self-maintaining IPNS Record Replication #268

Closed
Winterhuman opened this issue Feb 23, 2022 · 8 comments
Labels
kind/enhancement A net-new feature or an improvement to an existing feature P3 Low: Not priority right now

Comments

@Winterhuman

Winterhuman commented Feb 23, 2022

Self-maintaining Record Replication V2

See edit history for Self-maintaining Record Replication V1.

The Problem

IPFS has some challenging attributes that make storing content for long periods of time without maintenance difficult:

  • Most IPFS nodes don't stay online for long.
  • Many IPFS nodes will clear their caches over time or will just disappear forever.
  • IPFS nodes only store content they've explicitly requested; the only exception is DHT entries.

So, if we have an IPNS record that we want to remain accessible to anyone even after the publisher has stopped republishing records under its key, how do we achieve this?

Related Issues: ipfs/kubo#8542 ipfs/kubo#4435 ipfs/kubo#3117 ipfs/kubo#1958

The Aim

The goal of this proposal is to achieve the following:

  • Distribute IPNS records amongst a dynamic set of peers. This peer set changes as record holders disappear and become inaccessible; the overall effect is that the IPNS record never disappears completely.
  • Let IPNS record holders be optionally responsible for updating this dynamic set of peers such that, once some lower threshold is reached, all responsible record holders will bring the number of record holders above the threshold again.
  • Achieve all of the above without significantly increasing background bandwidth usage for record holders, even for those responsible for updating the peer set. This follows a similar principle to CircuitV2, to ensure this system is as widely used as possible.
  • And, most importantly, remove the need for IPNS record expiration while still mitigating replay attacks as much as possible.

So, let's outline the scenario:

  • Node A wants to publish a CID under k51key.
  • Node A finds nodes 12D3KooWkay, 12D3KooWkez, and 12D3KooWbey who are willing to add the IPNS record to their DHT. These nodes are now record holders.
  • Now Node A disappears. No one is able to publish any new IPNS records under k51key; however, the IPNS record should still be around for as long as possible.
  • What can the record holders do?

Record Holder Behaviour

Each IPNS record holder needs to consider the following two questions:

  • What is the minimum number of record holders (lower threshold) that needs to be reached before I update the peer set?
  • Which nodes should be in the dynamic set of peers and by what criteria?

Let's look at question 1...

For record holders that have not opted in to maintaining the peer set above the threshold, the lower threshold is 0, meaning they don't check the threshold and don't maintain the peer set. This is analogous to Routing: dhtclient.

For record holders that have opted in to maintaining the peer set above the threshold, the lower threshold is set through a config option. The lower a node's threshold, the more likely it is that another record holder's threshold has already been reached, so only the select few record holders with high threshold values need to update the peer set.

Let's say nodes 12D3KooWkay, 12D3KooWkez, and 12D3KooWbey have threshold values of 0, 1, and 2 respectively...

  • Node 12D3KooWkay no longer resolves k51key.
  • Node 12D3KooWkez checks the DHT or the PubSub topic and sees that two record holders now remain. Its lower threshold is 1, however, so it won't respond.
  • Node 12D3KooWbey checks the DHT or the PubSub topic and sees that two record holders now remain. Its lower threshold is 2, so it will start looking for a record holder to replace 12D3KooWkay...
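The decision each record holder makes in this example can be sketched as a single comparison. This is only an illustration, not part of the proposal: the function name `should_replenish` is hypothetical, and the sketch assumes the search triggers when the observed count is at or below the threshold, which is how the example above behaves (two holders remaining triggers a threshold of 2 but not a threshold of 1).

```python
def should_replenish(observed_holders: int, lower_threshold: int) -> bool:
    """Return True when this holder should look for new record holders.

    Assumption: the trigger is "observed count <= threshold", inferred
    from the worked example rather than stated by the proposal.
    """
    if lower_threshold == 0:
        # Opted out: never checks the count, never maintains the peer set.
        return False
    return observed_holders <= lower_threshold


# Thresholds from the example; 12D3KooWkay has already dropped out.
thresholds = {"12D3KooWkez": 1, "12D3KooWbey": 2}
remaining_holders = 2
decisions = {
    peer: should_replenish(remaining_holders, threshold)
    for peer, threshold in thresholds.items()
}
```

Here only 12D3KooWbey decides to search, so a single node does the work of restoring the peer set.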

Going back to our second question, who should 12D3KooWbey find to replace 12D3KooWkay? Or in other words, who should ideally be in the dynamic peer set for k51key?

When node A wanted to publish a CID under k51key, it searched towards key to find the original three record holders. So, 12D3KooWbey should also search towards key to find new record holders in order to maintain IPFS's searching properties. Now, let's continue...

  • Node 12D3KooWbey searches towards key and eventually finds node 12D3KooWkab (in reality, multiple record holders will likely be found). 12D3KooWkab is willing to hold the IPNS record for k51key, so it becomes a new record holder. The number of record holders is now above the lower threshold of node 12D3KooWbey.
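As a rough illustration of "searching towards key", the sketch below ranks known peers by XOR distance to the key, in the style of Kademlia, and skips peers that already hold the record. Everything here is an assumption made for illustration: `keyspace_id` and `closest_candidates` are hypothetical names, hashing the string identifiers with SHA-256 is a stand-in for the real DHT keyspace mapping, and 12D3KooWzzz is an invented extra peer. A real search would walk the DHT rather than sort a local list.

```python
import hashlib


def keyspace_id(name: str) -> int:
    # Stand-in keyspace mapping: hash the identifier to a 256-bit integer.
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


def closest_candidates(key: str, peers: list[str], holders: set[str], count: int) -> list[str]:
    """Return up to `count` non-holder peers, nearest to `key` by XOR distance."""
    target = keyspace_id(key)
    candidates = [p for p in peers if p not in holders]
    candidates.sort(key=lambda p: keyspace_id(p) ^ target)
    return candidates[:count]


# 12D3KooWzzz is a hypothetical peer added so there is something to rank.
known_peers = ["12D3KooWkez", "12D3KooWbey", "12D3KooWkab", "12D3KooWzzz"]
current_holders = {"12D3KooWkez", "12D3KooWbey"}
replacements = closest_candidates("k51key", known_peers, current_holders, count=1)
```

Because every maintainer ranks candidates against the same key, replacement holders end up where resolvers already look.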

The optional 5th step stated in https://github.com/ipfs/specs/blob/main/naming/pubsub.md#protocol should prevent old IPNS records from being replicated instead of newer records. This makes replay attacks more difficult, as genuine IPFS nodes will actively attempt to keep the newest IPNS records alive.

To prevent a node from setting threshold: 10,000 and over-replicating the record, IPFS nodes that agree to hold an IPNS record must search towards the IPNS address and find the number of providers. If the number of record holders is more than double the node's own threshold value, the node must not continue to fetch the record.
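The over-replication guard amounts to one check on the receiving side. The sketch below is a minimal illustration with a hypothetical function name, `may_fetch_record`. It applies the rule literally, so a node with a threshold of 0 would refuse whenever any holder exists; the proposal does not say how opted-out nodes should be treated, so that behaviour is an assumption.

```python
def may_fetch_record(observed_holders: int, own_threshold: int) -> bool:
    """Over-replication guard applied by a node asked to hold a record.

    The node refuses when the existing holder count is more than
    double its own threshold, regardless of the requester's threshold.
    """
    return observed_holders <= 2 * own_threshold


# A requester with threshold 10,000 keeps asking, but each candidate
# judges the request against its own (hypothetical) threshold of 2.
accepts_at_four = may_fetch_record(observed_holders=4, own_threshold=2)
accepts_at_five = may_fetch_record(observed_holders=5, own_threshold=2)
```

The replica count is therefore bounded by the thresholds of the nodes being asked, not by the node doing the asking.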

Summary

IPFS nodes (IPNS record holders) have a lower threshold value they can configure. Once the number of record holders falls to this threshold, they search for new record holders.

Advantages

  • IPFS nodes can be volatile and short-lived without affecting the long-term longevity of the IPNS record; they are actively replaced when needed.
  • IPFS nodes don't need to communicate with the publisher or with each other; this is similar to the behaviour derived from the Kademlia DHT.
  • This proposal doesn't require extensive changes to the IPNS protocol and reuses the searching logic of publisher nodes for record holders.
  • The lower threshold, as well as the time that should elapse between the checks of the record holder count, are configurable per record holder without any issues.
  • IPNS records no longer need to expire to (mostly) mitigate replay attacks; IPNS records in this cycle will rarely disappear completely.

Disadvantages

  • IPNS records can be inlined, which means nodes can quickly be filled up without making any requests for content themselves. IPNS record contents could be over-replicated under this system by malicious IPFS nodes.

Possible names

"Self-maintaining IPNS Record Replication" isn't the greatest name, and it certainly won't work in a code context, so here are some proposals for possible names:

Lagrange Replication - Space themed, idea of orbiting a point in space, doesn't strictly tie itself to IPNS.

InterPlanetary Name System Replication (IPNSR) - Keeps to IPFS and IPNS name scheme, clearly defines itself as an extension of IPNS, IPNS and IPNSR may be too similar in appearance.

@Winterhuman Winterhuman added the need/triage Needs initial labeling and prioritization label Feb 23, 2022
@Winterhuman
Author

@aschmahmann Any chance you're available for feedback on this?

Sorry if you're the wrong person to ping (or if I'm not meant to ping people for spec issues), I'm going off of the other open issues.

@aschmahmann
Contributor

aschmahmann commented Mar 4, 2022

Unfortunately, I haven't had much time to look at this and probably won't for the next week or so. I am trying to write up some common FAQs around IPNS and the various issues people have with it soon to make it easier for folks like you to find good places to push forward. I'll post back here once it's up as it'll probably speed up conversation. If I don't post back in the next two weeks feel free to ping me 😄.

A few clarification points that I'll make though just in case you weren't aware:

  • Any node can already put a valid IPNS record in the DHT. You can try it out yourself using ipfs dht get and ipfs dht put using go-ipfs. This means that the ability for any peer to republish a record and keep it alive is already possible. There is some discussion about enabling this to happen within go-ipfs in the ipfs name follow proposal.
  • Unfortunately, this third-party republishing only works up until the record expires (i.e. the EOL is exceeded) and the record is no longer valid. After this the key holder needs to create and publish a new record.
  • While you could just increase your EOLs to be crazy far in the future to avoid or reduce this problem there are some tradeoffs here around what is "safe" to do with records that are not the newest one which is something that can occur with longer EOLs (e.g. Alice publishes a record with a 1 year EOL, then publishes an update the following day with a 1 year EOL, a week later Alice goes offline and Bob publishes her first record to the DHT, now when Charlie fetches the IPNS record he sees the older version not the newer one).
    • For example, am I willing to get a record that's 6 months old, or would I rather error? For a blog it might be fine for you to see an old version, but for downloading the latest version of some software it might be a problem, since the old version could be vulnerable.
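The Alice, Bob, and Charlie scenario can be made concrete with a small sketch. It assumes only that each record carries a sequence number and an EOL, and that a resolver prefers the highest sequence number among the unexpired records it has actually seen. The `Record` class, the `best_record` function, the placeholder values, and the timestamps are all hypothetical, chosen to mirror the example rather than any real implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    value: str     # placeholder for the path the record points to
    sequence: int  # higher means newer
    eol: int       # expiry time, in seconds on an illustrative clock


def best_record(seen: list[Record], now: int) -> Optional[Record]:
    """Pick the highest-sequence record that has not passed its EOL."""
    valid = [r for r in seen if r.eol > now]
    return max(valid, key=lambda r: r.sequence, default=None)


DAY = 86_400
YEAR = 365 * DAY
first = Record("/ipfs/old", sequence=1, eol=0 * DAY + YEAR)   # day 0, 1 year EOL
update = Record("/ipfs/new", sequence=2, eol=1 * DAY + YEAR)  # day 1, 1 year EOL
now = 8 * DAY  # Alice has gone offline

sees_both = best_record([first, update], now)
sees_only_old = best_record([first], now)  # Bob republished only the first record
after_expiry = best_record([first], now=2 * YEAR)
```

A resolver that sees both records picks the update, but one that is only shown the first record accepts it as valid until its EOL passes, which is the tradeoff long EOLs introduce.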

@Winterhuman
Author

Winterhuman commented Mar 4, 2022

Thanks for the feedback!

  • I wasn't aware this was already possible; I'll remove it from the disadvantages section then.
  • My main motivation for creating this system was actually to eliminate the need for IPNS record expiration. The idea is that the newest record should never disappear under this scheme and should be replicated more frequently than older records (which should make replay attacks very difficult). However, I don't have any definitive evidence that this is always the case.

Some things I need to investigate (help is appreciated, especially since some of these things are outside of what I can do) are to:

  • Create a formal math proof to show the newest IPNS record always wins in this system, even if there's only one anchor node with the newest record. Or, modify the system to ensure this is always the case if it's not already true.
  • Reduce the number of connections made because of anchor checks to a minimum, which will likely involve leveraging IPNS-PubSub to search for anchor nodes more efficiently.

EDIT: I've now updated the issue with Self-maintaining Record Replication V2.

  • The proposal now more explicitly defines its goals and purposes.
  • IPNS records no longer require adding the anchors: field; instead, a lower threshold on the number of record holders now triggers the search for new record holders.

@Winterhuman
Author

@aschmahmann Just pinging as you said to do

@aschmahmann
Contributor

aschmahmann commented Mar 27, 2022

@LynHyper thanks for the ping (and apologies for the delay). Here (https://pl-strflt.notion.site/IPNS-Overview-and-FAQ-071b9b14f12045ea842a7d51cfb47dff) are some general thoughts and FAQs on IPNS tackling some of the questions I've heard most frequently.

If you have more general IPNS questions happy to discuss on that link, on discuss.ipfs.io, or Discord/Matrix. If more interesting conversation comes out of it we can just add it into the doc for the next group of people to benefit from 😄.

Of course if you want to talk more about this proposal we can do that here.

@Winterhuman
Author

Winterhuman commented Mar 27, 2022

@aschmahmann Thanks for the link! I'd like to talk more about my proposal here. I want some feedback on the updates I made to the proposal (especially if you think you have a better mechanism for the threshold trigger and such).

It seems my proposal tackles "Increase Flexibility Around Freshness Guarantees".

(Also, I just created #274, the "Default performance" section in the link gave me the idea for it)

@Winterhuman
Author

@aschmahmann Pinging for feedback

@Winterhuman
Author

I've now converted this proposal into an IPIP: #309

@guseggert guseggert added kind/enhancement A net-new feature or an improvement to an existing feature P3 Low: Not priority right now and removed need/triage Needs initial labeling and prioritization labels Oct 6, 2022