Allow auto-linkification of non-standard schemas without calling mdurl.decode
#183
Comments
Thanks for opening your first issue here! Engagement like this is essential for open source projects! 🤗
There's a lot to take in here, but I'll start with a question: If there's a JS implementation, why not copy what it does? Are you trying to achieve something that the JS implementation does not do?
Hey, thanks folks for the quick replies! Thing is, they're doing pretty much what (I think) I am doing... Inserting some
// the match object after modification
Match {
  schema: '%',
  index: 14,
  lastIndex: 66,
  raw: '%9eJYIT1HDNhWOeLK0EhhiHJTPwvDGZWGd/E6CBCG5XY=.sha256',
  text: '%259eJYI...',
  url: '%259eJYIT1HDNhWOeLK0EhhiHJTPwvDGZWGd/E6CBCG5XY=.sha256'
}
result = "<p>Hey check out <a href="#/msg/%259eJYIT1HDNhWOeLK0EhhiHJTPwvDGZWGd%2FE6CBCG5XY%3D.sha256">%9eJYI...</a> to see my mad ssb skillz!</p>"
So this is my minimal example to reproduce this:
import json
import re
import urllib.parse
from markdown_it import MarkdownIt
MESSAGE_SIGIL_REGEX = r'[a-zA-Z0-9+/=]{44}\.sha256'
message_regex = re.compile(f"^{MESSAGE_SIGIL_REGEX}")
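def normalize_message_sigil(obj, match):
    # Percent-encode the whole id (including the "%" sigil) for use as the href
    old_url = match.url
    match.url = urllib.parse.quote(old_url, safe="")
    # Abbreviated link text, with the sigil pre-escaped as "%25"
    match.text = f"%25{old_url[1:6]}..."
    # Dump the modified match for comparison with the JS output
    print(json.dumps(match.__dict__, indent=2))
    print()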
md = MarkdownIt("js-default", {
"typographer": True,
"linkify": True,
"breaks": True,
})
md.linkify.add("%", {"validate": message_regex, "normalize": normalize_message_sigil})
markdown_str = "Hey check out %9eJYIT1HDNhWOeLK0EhhiHJTPwvDGZWGd/E6CBCG5XY=.sha256 it's epic"
print(md.render(markdown_str))
This generates the same kind of match:
{
"schema": "%",
"index": 14,
"last_index": 66,
"raw": "%9eJYIT1HDNhWOeLK0EhhiHJTPwvDGZWGd/E6CBCG5XY=.sha256",
"text": "%259eJYI...",
"url": "%259eJYIT1HDNhWOeLK0EhhiHJTPwvDGZWGd%2FE6CBCG5XY%3D.sha256"
}
But the result is this:
<p>Hey check out <a href="%259eJYIT1HDNhWOeLK0EhhiHJTPwvDGZWGd%2FE6CBCG5XY%3D.sha256">%259eJYI...</a> it’s epic</p>
While if I change the
<p>Hey check out <a href="%259eJYIT1HDNhWOeLK0EhhiHJTPwvDGZWGd%2FE6CBCG5XY%3D.sha256">�JYI...</a> it’s epic</p>
It works fine if I edit my local
# in markdown-it-py/markdown_it/common/normalize_url.py
return mdurl.decode(mdurl.format(parsed))
But I understand that this was actually introduced for a reason, so I'm not sure how to proceed here...
Sorry for the infodump... it's a bit late here, this is all free-time stuff for me 😆
Description / Summary
I propose to allow the unmodified handling of link text during auto-linkification.
Think something like this:
Value / benefit
I launched this as a discussion before but after thinking about it a little more I don't see a workaround for this.
I'm trying to implement some custom extensions to markdown for the scuttlebutt markdown flavour as implemented in ssb-markdown which is the JS implementation and relies on markdown-it. Hence it makes sense to me to make the re-implementation using markdown-it-py. 🙂
One of the key features of ssb is that messages are referenced by ids like this:
%9eJYIT1HDNhWOeLK0EhhiHJTPwvDGZWGd/E6CBCG5XY=.sha256
(feeds have an @ identifier, and blobs a & so they may have similar issues, but let's talk about message ids only for the sake of this discussion)
Anyway, so these message ids should be linked to urls like this:
Note that the link text is an abbreviated version of the full id, but still begins with a % sigil.
So I have this regex to match message IDs:
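For reference, the pattern from the minimal example in the comments can be tried on its own. This is only a sketch: the regex and the sample id are copied from that example, while the way the abbreviated text is built here is an assumption based on the `%9eJYI...` link text shown there.

```python
import re

# Pattern copied from the minimal example in the comments. The leading "%"
# sigil is not part of it, since "%" is registered as the schema prefix.
MESSAGE_SIGIL_REGEX = r"[a-zA-Z0-9+/=]{44}\.sha256"
message_regex = re.compile(f"^{MESSAGE_SIGIL_REGEX}")

message_id = "%9eJYIT1HDNhWOeLK0EhhiHJTPwvDGZWGd/E6CBCG5XY=.sha256"

# Match the part that follows the sigil.
match = message_regex.match(message_id[1:])
matched_length = match.end() if match else 0

# Abbreviated link text (assumed format): sigil plus the first 5 characters.
short_text = f"{message_id[:6]}..."

print(matched_length, short_text)
```

The matched length plus one for the sigil gives 52, which agrees with the `index: 14` / `lastIndex: 66` span in the match object.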
To automatically linkify these ids I set the % character up as a schema:
The problem I have with this is that once matched by linkify, the link text that results is actually interpreted as a url-encoded string, i.e. the %9e gets decoded to a (non-displayable) character.
The resulting link isn't exactly what I had hoped for:
I've stepped through this a while now and I haven't figured out yet whether this is a bug or just me holding this wrong...
The resulting text gets put through state.md.normalizeLinkText here:
markdown-it-py/markdown_it/rules_core/linkify.py
Line 104 in b1a74b4
That function in turn passes the whole thing through mdurl.decode(mdurl.format(parsed), mdurl.DECODE_DEFAULT_CHARS + "%"):
markdown-it-py/markdown_it/common/normalize_url.py
Line 63 in bb6cf6e
And I don't see any way to prevent it from doing so...
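To illustrate what percent-decoding does to the link text, here is a small sketch using the standard library's `urllib.parse.unquote` as a stand-in. That is an assumption for illustration only: `mdurl.decode` is a different implementation, but the effect on the `%9e` prefix appears to be the same kind of thing.

```python
from urllib.parse import unquote

link_text = "%9eJYI..."

# "%9e" is read as the byte 0x9e, which is not valid UTF-8 on its own,
# so it is replaced by U+FFFD (the replacement character).
decoded = unquote(link_text)

print(repr(decoded))  # '\ufffdJYI...'
```

The replacement character is what shows up as the unreadable first character of the link text.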
But I thought I could just try to replace the % with %25 and let mdurl.decode replace it back to %. Alas, if I try that, it indeed produces %259eJYI... as the output. Not what I wanted...
Now, I realize I could just generate the text to escape the % into something like &percnt;, but the result is then that the & sign is escaped into &amp;percnt;9eJYI... which is also not quite what I want...
So... is this an issue of usage? Is there something obvious I'm missing?
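The reason the %25 trick fails can be sketched with a toy decoder. `decode_except` below is a hypothetical helper written for this illustration, not mdurl's code; it only mimics the idea of an exclusion list, where escapes for excluded characters are left as they are. Bytes above 0x7f are simplified to U+FFFD instead of being assembled into UTF-8 sequences.

```python
import re


def decode_except(text: str, exclude: str) -> str:
    """Percent-decode `text`, keeping escapes for characters in `exclude`."""

    def replace(m):
        char = chr(int(m.group(1), 16))
        if char in exclude:
            return m.group(0)  # excluded: keep the escape untouched
        if ord(char) >= 0x80:
            return "\ufffd"  # simplification: no multi-byte UTF-8 handling
        return char

    return re.sub(r"%([0-9A-Fa-f]{2})", replace, text)


escaped = "%259eJYI..."

# With "%" in the exclusion list, "%25" is never turned back into "%".
with_percent_excluded = decode_except(escaped, exclude="%")

# Without it, "%25" becomes "%" and the text reads as intended.
without_exclusion = decode_except(escaped, exclude="")

print(with_percent_excluded, without_exclusion)
```

So with "%" excluded there seem to be only two outcomes for the link text: the raw `%9e` gets decoded into a replacement character, or the pre-escaped `%25` comes through literally.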
Implementation details
As I said in the beginning, I think this would best be signalled while setting up the schema. But I'm not sure how to do this cleanly, since the matches themselves are actually directly added to a linkify instance, not a class of markdown-it-py.
So assigning the flag for "raw/pass-through" mode to the match instance seems a bit iffy...
Tasks to complete
No response