[RFC] Public key payload format refactoring #191

Closed
neruthes opened this issue Sep 18, 2019 · 36 comments
@neruthes
Contributor

neruthes commented Sep 18, 2019

Metadata

Field | Value
Manifest | Meta/Article-3
Purpose | Refactor
Feedback deadline | 2019-12-13 12:00:00 UTC
Explicitly designated reviewers | @Jack-Works @SunriseFox @guanbinrui
Later Updated By | N/A

Abstract

The current payload format has run into several problems, so it is time to design a new one.

Background

  • Twitter forbids 🔒 emoji from being used in profile bio.
  • Maintaining a list of usable emoji is not graceful.
  • We are looking forward to adopting MnemonicWords-based keygen.

Basic Ideas

Version Indicating

The public key payload should be accompanied by a leading version indicator.

Emoji Flexibility

The detection of payload should no longer rely on .split('🔒') or certainty of string payload length, but should prefer .match(/(\d+):([\dA-Za-z\/\+]+=*)/g).

Detection shall not rely on external payload.
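
A minimal sketch of what regex-based detection could look like, assuming the payload is written as `<version>:<base64>`, as the pattern above implies. The helper name `findPayloadCandidates` and the sample bio text are made up for illustration; only the regular expression comes from this RFC.

```javascript
// The pattern proposed above: a decimal version, a colon, then Base64 text.
const PAYLOAD_PATTERN = /(\d+):([\dA-Za-z\/\+]+=*)/g;

// Hypothetical helper: list every "<version>:<base64>" candidate in free text.
function findPayloadCandidates(text) {
  // String.prototype.match with the g flag returns whole matches only,
  // so matchAll is used here to keep both capture groups.
  return Array.from(text.matchAll(PAYLOAD_PATTERN), (m) => ({
    version: Number(m[1]),
    body: m[2],
  }));
}

// Made-up profile text; the Base64 body is a placeholder, not a real key.
const bio = "Hello world 🎭1:AuO0Ls6AE+0zV0HB🎭 nice to meet you";
console.log(findPayloadCandidates(bio));
```

Because the surrounding emoji are not part of the pattern, the same helper would work whichever emoji a network allows.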

Structure of Internal Payload

Magic Header

The magic header is 31AB24, which will become Mask after Base64 encoding.

Version Indicator

  • Length: 1B
  • Type: UInt8

Starting from 0x01.

Actual Public Key

  • Length: 33B
  • Type: Binary

33-byte compressed SECP256K1 public key. Trailing padding characters should be removed.

Checksum

Append one checksum byte. Calculate the SHA-256 hash of all preceding bytes as 𝐇 (a sequence of 256 bits), and take 𝐇.slice(8, 16), i.e. the second byte of the hash, as the checksum byte.
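
The internal payload layout described above (magic header, version, key, checksum) can be sketched as follows. This assumes Node.js with its built-in `crypto` module; the function names and the placeholder key bytes are illustrative and not part of the RFC.

```javascript
const { createHash } = require("crypto");

const MAGIC_HEADER = Buffer.from("31AB24", "hex"); // Base64-encodes to "Mask"

// Checksum as described above: bits 8..15 of SHA-256(preceding bytes),
// which is the second byte of the digest.
function checksumByte(bytes) {
  return createHash("sha256").update(bytes).digest()[1];
}

// Hypothetical builder: magic header | version | 33-byte key | checksum.
function buildInternalPayload(version, publicKey) {
  if (publicKey.length !== 33) throw new Error("expected a 33-byte compressed key");
  const head = Buffer.concat([MAGIC_HEADER, Buffer.from([version]), publicKey]);
  return Buffer.concat([head, Buffer.from([checksumByte(head)])]);
}

// Base64 text with the trailing "=" padding removed.
function toBase64NoPadding(bytes) {
  return bytes.toString("base64").replace(/=+$/, "");
}

// Placeholder key bytes for illustration only (not a valid curve point).
const demoKey = Buffer.alloc(33, 0x02);
const payload = buildInternalPayload(0x01, demoKey);
console.log(payload.length, toBase64NoPadding(payload));
```

The result is 3 + 1 + 33 + 1 = 38 bytes, which is 51 Base64 characters once the single padding character is stripped.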

Structure of External Payload

We make network-specific external payloads.

Network | Payload
facebook.com | 🔒${ internal_payload }🔒
twitter.com | 🎭${ internal_payload }🎭

Example

31AB24 01 ED3A2ECE8013ED335741C1D7A78C0C32546C3D56FCFDE0348A3288D89B5D5BFB45 7C

🔒MaskAe06Ls6AE+0zV0HB16eMDDJUbD1W/P3gNIoyiNibXVv7RXw🔒 (length: 51+2)

🔒꼚뜤갞뤺껬몀괾뤳녴귁륺뎌곃깔닃륖믏맠꽈똲뒍뒛뇕럻끗렀🔒 (length: 26+2)

🔒儚夤丞嬺僬岀伾嬳却俁孺喌仃偔哃孖巏寠先堲嚍嚛叕姻剗娀🔒 (length: 26+2)
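
A sketch of how the Hangul and Hanzi forms above might be produced. The mapping is an assumption inferred from the examples, not something the RFC states: each 12-bit group is added to the first code point of the block (U+AC00 for Hangul syllables, U+4E00 for CJK Unified Ideographs), and the final group is right-padded with zero bits. The function name is hypothetical.

```javascript
const HANGUL_BASE = 0xac00; // first Hangul syllable
const HANZI_BASE = 0x4e00;  // first CJK Unified Ideograph

// Assumed Base4096 scheme: one code point per 12-bit group, offset from a base.
function encodeBase4096(bytes, baseCodePoint) {
  // Build a bit string, then right-pad with zeros to a multiple of 12 bits.
  let bits = "";
  for (const byte of bytes) bits += byte.toString(2).padStart(8, "0");
  while (bits.length % 12 !== 0) bits += "0";

  let out = "";
  for (let i = 0; i < bits.length; i += 12) {
    out += String.fromCodePoint(baseCodePoint + parseInt(bits.slice(i, i + 12), 2));
  }
  return out;
}

// The 38 example bytes from above: header, version, key, checksum.
const exampleHex =
  "31AB24" + "01" +
  "ED3A2ECE8013ED335741C1D7A78C0C32546C3D56FCFDE0348A3288D89B5D5BFB45" +
  "7C";
const exampleBytes = Uint8Array.from(exampleHex.match(/../g), (h) => parseInt(h, 16));

console.log(encodeBase4096(exampleBytes, HANGUL_BASE));
console.log(encodeBase4096(exampleBytes, HANZI_BASE));
```

38 bytes are 304 bits, so each output is 26 characters long, consistent with the "26+2" lengths quoted above.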

@Tedko
Member

Tedko commented Sep 18, 2019 via email

@Misaka-0x447f
Contributor

The version number can't be very long. Hiding it inside the baseXX string may be better for UX; otherwise users will always be drawn to the number and may think it is our team's lucky number.

@Artoria2e5
Contributor

But elliptic curve compressed X points do have a fixed binary length…

@neruthes
Contributor Author

But elliptic curve compressed X points do have a fixed binary length…

We will eventually support algorithm diversity and this is not merely political correctness.

This was referenced Oct 11, 2019
@Misaka-0x447f Misaka-0x447f removed their assignment Oct 29, 2019
@SunriseFox SunriseFox removed their assignment Nov 25, 2019
@Jack-Works
Member

We now use a network-specific way to compose the key.

@neruthes
Contributor Author

neruthes commented Dec 5, 2019

Time to continue these discussions.

@Tedko
Member

Tedko commented Dec 5, 2019

Emoji Flexibility
The detection of payload should no longer rely on .split('🔒') or certainty of string payload length, but should prefer .match(/(\d+):([\dA-Za-z/+]+=*)/g).

@SunriseFox @Jack-Works we just encountered a case on FB with a similar problem. Better to update these soon, since users are running into it.

@neruthes neruthes changed the title [RFC] Public key payload format refactoring [Draft] [RFC] Public key payload format refactoring Dec 5, 2019
@neruthes
Contributor Author

neruthes commented Dec 5, 2019

This RFC is open for discussions. Reviews are requested.

cc @yisiliu @Jack-Works @SunriseFox @guanbinrui

@Tedko
Member

Tedko commented Dec 5, 2019

Why do we need a Version Indicator within the pub key?

@neruthes neruthes assigned SunriseFox and guanbinrui and unassigned Artoria2e5 Dec 5, 2019
@neruthes
Contributor Author

neruthes commented Dec 5, 2019

Why do we need a Version Indicator within the pub key?

Disambiguation. There is no guarantee that different versions of the public key payload will come in different lengths.

@neruthes
Contributor Author

neruthes commented Dec 5, 2019

Remember that every emoji is considered 2 characters on both Facebook and Twitter. Speaking of efficiency...

Encoding | Bits per character | Counted characters per character | Actual efficiency (b/c)
Base64 | 6 | 1 | 6
Base4096Hangul | 12 | 1 | 12
Base4096Emoji | 12 | 2 | 6
Base256Emoji | 8 | 2 | 4
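
To make the table concrete, here is a rough calculation of how many characters a network would count for one payload under each encoding. It assumes the 38-byte internal payload from this RFC (3 + 1 + 33 + 1 bytes) and no Base64 padding; the names `ENCODINGS` and `countedLength` are illustrative.

```javascript
// Figures copied from the table above.
const ENCODINGS = [
  { name: "Base64", bitsPerGlyph: 6, countedPerGlyph: 1 },
  { name: "Base4096Hangul", bitsPerGlyph: 12, countedPerGlyph: 1 },
  { name: "Base4096Emoji", bitsPerGlyph: 12, countedPerGlyph: 2 },
  { name: "Base256Emoji", bitsPerGlyph: 8, countedPerGlyph: 2 },
];

// Number of characters the network counts for a payload of byteLength bytes.
function countedLength(byteLength, { bitsPerGlyph, countedPerGlyph }) {
  const glyphs = Math.ceil((byteLength * 8) / bitsPerGlyph);
  return glyphs * countedPerGlyph;
}

for (const enc of ENCODINGS) {
  console.log(enc.name, countedLength(38, enc));
}
```

For 38 bytes (304 bits) this works out to 51 counted characters for Base64, 26 for Base4096Hangul, 52 for Base4096Emoji and 76 for Base256Emoji.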

@Jack-Works
Member

Current key representation is small enough, what's the problem?

@neruthes
Contributor Author

neruthes commented Dec 6, 2019

Current key representation is small enough, what's the problem?

False Positive. Random Base64 text may be falsely considered as a public key payload if we soften the requirement for emoji presence.
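
A sketch of how the magic header, version byte and checksum from this RFC could be used to reject ordinary Base64 text. It assumes Node.js and the 38-byte layout described in the RFC; `parseInternalPayload` is a hypothetical name, and the key bytes used to build the valid sample are placeholders.

```javascript
const { createHash } = require("crypto");

// Hypothetical validator for a Base64 candidate, following the RFC layout:
// 3-byte magic header, 1-byte version, 33-byte key, 1-byte checksum.
function parseInternalPayload(base64Text) {
  const bytes = Buffer.from(base64Text, "base64");
  if (bytes.length !== 38) return null;
  if (bytes.subarray(0, 3).toString("hex") !== "31ab24") return null;
  const version = bytes[3];
  if (version !== 0x01) return null; // only version 0x01 is defined so far
  const expected = createHash("sha256").update(bytes.subarray(0, 37)).digest()[1];
  if (bytes[37] !== expected) return null;
  return { version, publicKey: bytes.subarray(4, 37) };
}

// Build one well-formed payload around placeholder key bytes.
const head = Buffer.concat([Buffer.from("31AB2401", "hex"), Buffer.alloc(33, 0x02)]);
const check = createHash("sha256").update(head).digest()[1];
const valid = Buffer.concat([head, Buffer.from([check])]).toString("base64");

console.log(parseInternalPayload(valid) !== null);             // well-formed payload
console.log(parseInternalPayload("SGVsbG8gd29ybGQ=") !== null); // ordinary Base64 text
```

Under these checks, a uniformly random 38-byte string would pass with probability of roughly 2^-40 (24 bits of header, 8 bits of version, 8 bits of checksum).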

@Tedko
Member

Tedko commented Dec 6, 2019

Remember that every emoji is considered 2 characters on both Facebook and Twitter. Speaking of efficiency...

Encoding | Bits per character | Counted characters per character | Actual efficiency (b/c)
Base64 | 6 | 1 | 6
Base4096Hangul | 12 | 1 | 12
Base4096Emoji | 12 | 2 | 6
Base256Emoji | 8 | 2 | 4

A CJK-character base is also a good idea. The only tricky part is the cultural background.

@neruthes
Contributor Author

neruthes commented Dec 6, 2019

Base4096 can be done with either Hanzi or Hangul. Choose carefully.

@Tedko
Member

Tedko commented Dec 6, 2019

Base4096 can be done with either Hanzi or Hangul. Choose carefully.

After consulting people familiar with the Unicode standard, it seems Base4096 can only be done in Hanzi and Hangul.

@neruthes
Contributor Author

neruthes commented Dec 6, 2019

Base4096Hanzi strings may sometimes form meaningful substrings. This should be avoided. In this sense, Base4096Hangul can be safer. My conclusion is that Base4096Hangul is the most desirable option. We may add this as an advanced option for users who enable debug mode, by adding a switch in the profile connecting process.

@Tedko
Member

Tedko commented Dec 6, 2019

Base4096Hanzi strings may sometimes form meaningful substrings. This should be avoided. In this sense, Base4096Hangul can be safer. My conclusion is that Base4096Hangul is the most desirable option. We may add this as an advanced option for users who enable debug mode, by adding a switch in the profile connecting process.

Again, Hangul may not contain meaningful strings, but it's still quiet 🤫 sensitive for reasons we all know, such as General KJU.

One suggestion: if the user chooses a specific locale, such as Chinese or Korean, let them choose this as an advanced option.
Generally I feel Unicode is full of meaningful characters, which is bad in our case.

@Tedko
Member

Tedko commented Dec 6, 2019

I’m against using Maskbook/Key as Magic header. It’s too long. On the other side ‘PGP’ is short enough.

Key/ is another option.

Also note that the length of the Base64 string must be a integral multiple of 4, thus the original bitstream can be whole bytes.

Proposals are welcomed.

Key/ seems good.
Or maybe Mask/ ?

@neruthes
Contributor Author

neruthes commented Dec 6, 2019

I’m against using Maskbook/Key as Magic header. It’s too long. On the other side ‘PGP’ is short enough.

Key/ is another option.
Also note that the length of the Base64 string must be a integral multiple of 4, thus the original bitstream can be whole bytes.
Proposals are welcomed.

Key/ seems good.
Or maybe Mask/ ?

"must be an integral multiple of 4"

@neruthes
Contributor Author

neruthes commented Dec 6, 2019

Base64

mb-base64

Base4096Hanzi

mb-hanzi

Base4096Hangul

mb-hangul


Base4096Hangul and Base4096Hanzi are equivalent in terms of character efficiency, but Base4096Hangul has better pixel efficiency: its output takes fewer pixels to render on most devices, because type designers of most major fonts draw Hangul characters subtly smaller than Hanzi characters. The advantage is clear, but we may still ask users how they feel about it.

@Tedko
Member

Tedko commented Dec 7, 2019

  • Remove Magic Header for now
  • Leave it as Base64 for now as well.

We can implement what’s left of the RFC.

@neruthes
Contributor Author

neruthes commented Dec 7, 2019

Then there will be no difference from the current public key payload.

This RFC is for further improvements.

@Tedko
Member

Tedko commented Dec 9, 2019 via email

@neruthes
Contributor Author

neruthes commented Dec 9, 2019

Remember that this is meant to solve false positives.

@neruthes
Contributor Author

neruthes commented Dec 9, 2019

This RFC has received substantial updates and is mostly ready for delivery. Reviews are requested.

cc @yisiliu @Jack-Works @SunriseFox @guanbinrui

@Tedko
Member

Tedko commented Dec 9, 2019

  • If there is no other good reason, I don't see why we need a Magic Header. Is it for easier detection, like 🔒 in the current version? If so, our growth plan for now covers only Facebook/Twitter, and possibly forums like Reddit. Using a regex to parse the bio won't have any problem for now without any header/🔒. If it is for branding and awareness, like PGP: fingerprint, that's a different case.
  • Unless we have a substantial number of Korean users, we should not implement a Hangul version of the payload.
  • Though we have some users in Chinese-speaking places, it's also too early to do a Hanzi pub key. We have many other important priorities. Also, both the Hanzi and Hangul versions of the pub key are not human-readable at all. Indeed, I don't think non-geeky users will be happy to put something in their bio that is written in native characters but is not human-readable. It doesn't make sense to the general public we're trying to attract.

As for the Version Indicator: do we need it for now? When will we break the pub key format? Can we avoid that case? We can definitely add a version indicator easily; just make sure it's necessary.

Instead, we should start to consider these:

  • ENS-based human-readable public key (for experiments?). A long way to go, but at least it's human-readable. Imagine using alice.twitter.maskbook.xxx as a unique identifier (e.g. Blockstack did a decent job on this). Worth thinking about deeply.

[Image: Screen Shot 2019-12-10 at 00 33 37]

  • Emoji Flexibility (on recognition etc). This part is important. And agree we should do that.

Don't waste time on standards that general users don't care about. Most of this RFC should be dismissed since it's way too early, apart from Emoji Flexibility or the version indicator; that part needs a revised version of this RFC.

For later: we can spend some time thinking about human-readable identifiers. It's a hard task and we may not have any good ideas for now.

@Tedko Tedko closed this as completed Dec 9, 2019
@Tedko Tedko reopened this Dec 9, 2019
@yisiliu
Member

yisiliu commented Dec 10, 2019

  • If there is no other good reason, I don't see why we need a Magic Header. Is it for easier detection, like 🔒 in the current version? If so, our growth plan for now covers only Facebook/Twitter, and possibly forums like Reddit. Using a regex to parse the bio won't have any problem for now without any header/🔒. If it is for branding and awareness, like PGP: fingerprint, that's a different case.

It's for easier machine detection. We will also need to use the same format in Tessercube as a standard.

  • Unless we have a substantial number of Korean users, we should not implement a Hangul version of the payload.
  • Though we have some users in Chinese-speaking places, it's also too early to do a Hanzi pub key. We have many other important priorities. Also, both the Hanzi and Hangul versions of the pub key are not human-readable at all. Indeed, I don't think non-geeky users will be happy to put something in their bio that is written in native characters but is not human-readable. It doesn't make sense to the general public we're trying to attract.

I agree. We discussed this months ago and we have already come to the conclusion that this was not useful at least for now. However, as you mentioned here, non-geeky users will be happy to have their public key in their native chars - are you serious about this conclusion? Does it mean Korean users would be happy to have a Hangul based public key instead of Latin characters? Please clarify this.

As for the Version Indicator: do we need it for now? When will we break the pub key format? Can we avoid that case? We can definitely add a version indicator easily; just make sure it's necessary.

Yes, we need it for now since it is not finalized yet.

Instead, we should start to consider these:

  • ENS-based human-readable public key (for experiments?). A long way to go, but at least it's human-readable. Imagine using alice.twitter.maskbook.xxx as a unique identifier (e.g. Blockstack did a decent job on this). Worth thinking about deeply.

[Image: Screen Shot 2019-12-10 at 00 33 37]

I've proposed another public key format as
<publickey>.<curve.name>, e.g. A8/ir492e0uckD9/JYmv+/meD7LwRt3QYiBLhwmc3ebk.secp256k1. However, this part won't help us improve public key detection. Also, what's the point of ENS here? We need a pure public key that's independent of any third-party services.
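
For illustration, the `<publickey>.<curve.name>` format could be parsed roughly like this. It is a sketch assuming Node.js and that the key part is standard Base64; `parseCurveSuffixedKey` is a hypothetical name, not an existing function in the project.

```javascript
// Hypothetical parser for "<publickey>.<curve.name>".
// Base64 never contains ".", so the first dot ends the key part.
function parseCurveSuffixedKey(text) {
  const dot = text.indexOf(".");
  if (dot <= 0 || dot === text.length - 1) return null;
  const keyText = text.slice(0, dot);
  if (!/^[\dA-Za-z\/+]+=*$/.test(keyText)) return null;
  return { publicKey: Buffer.from(keyText, "base64"), curve: text.slice(dot + 1) };
}

// The example string from the comment above.
const parsed = parseCurveSuffixedKey(
  "A8/ir492e0uckD9/JYmv+/meD7LwRt3QYiBLhwmc3ebk.secp256k1"
);
console.log(parsed.curve, parsed.publicKey.length);
```

The 44 Base64 characters in the example decode to 33 bytes, which is the size of a compressed secp256k1 key.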

  • Emoji Flexibility (on recognition etc). This part is important. And agree we should do that.

Don't waste time on standards that general users don't care about. Most of this RFC should be dismissed since it's way too early, apart from Emoji Flexibility or the version indicator; that part needs a revised version of this RFC.

These are worth discussing, but since we have already come to a conclusion, we should reopen this conversation once we face challenges putting Latin-based public keys on those giant SNSs.

@neruthes
Contributor Author

  • Magic Header: Human-readable payload type hint. Branding concerns do exist.
  • Version Indicator: Adding 1 byte to reduce future risks is a good trade in my perspective.

@neruthes
Contributor Author

I am closing all my RFCs since Meta/Article-3: RFC Peer Review Convention has been abolished. I assume that everyone agrees that this is not worth discussing for now. Some discussions may be restarted, if necessary, after Meta/Bill-4: DSD Peer Review Convention is ratified. In the meantime, if there are any questions about documentation and workflow, please consult @yisiliu. Thanks to everyone who has engaged in the discussions so far.
