Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove isHTML #519

Closed
wants to merge 1 commit into from
Closed

Remove isHTML #519

wants to merge 1 commit into from

Conversation

f6p
Copy link

@f6p f6p commented Jan 24, 2022

Hello,

Before Turbo will make a request it checks URL against regexp. The check passes for URLs without a file extension and those ending with .htm, .html and .xhtml. On success request is made using Turbo and on failure whole page is replaced.

I would like to propose removing isHTML function. Results of isHTML call are highly dubious and should not be trusted.

Example 1: Lets say there is a download action that maps to /download URL path. Response type of it would be application/octet-stream but Turbo will assume that it is HTML.

Example 2: For URLs like https://github.com/hotwired/turbo/blob/main/README.md where parameter is part of path Turbo will fail to recognize browsable URL.

I believe a better approach would be to try to open everything by default and require to explicitly mark unbrowsable links with data-turbo=”false”. That would remove a bunch of unobvious URL problems like these mentioned above.

Moreover, currently its impossible to force Turbo visit as setting data-turbo to true triggers default behavior (that calls isHTML) and normal request instead of Turbo request is made (if isHTML will return false).

The change is a breaking one because it requires unbrowsable links to have data-turbo=”false” attribute added (not the case currently) but I believe it would result in more predictable less error prone behaviour in general.

Also after removing isHTML locationIsVisitable is reduced to return just the value of isPrefixedBy and is somewhat obsolete. So I replaced calls to locationIsVisitable with isPrefixedBy.

There are some disscussions regarding isHTML probems already open: #185, #385, #480.

Cheers
🍷

@seanpdoyle
Copy link
Contributor

Thank you for opening this change!

In theory, I'm in favor. I'm having some trouble digging through the Git history to get background on why this restriction was introduced.

The earliest version of this was introduced in the port to TypeScript.

Jumping over to the project's turbolinks/turbolinks predecessor, the method was introduced to Avoid visiting links to non-HTML destinations about 6 years ago.

@packagethief I see you're in the git blame output around those lines. Do you have any recollection of what motivated introducing this check?

@packagethief
Copy link
Contributor

@packagethief I see you're in the git blame output around those lines. Do you have any recollection of what motivated introducing this check?

Hey @seanpdoyle! Yes, it was because we wanted to avoid visiting links to images and other media by default.

I agree that the heuristics are somewhat dubious. But I do think it's important that we not try to visit locations that are unlikely to return HTML. For example, it would be extremely unlikely for a path ending in ".jpg" to return an HTML document (but not impossible).

Perhaps it would be better to define the exclusions (likely also perilous), or to delegate the decision to the application.

@packagethief
Copy link
Contributor

Some additional context (from 2016!) regarding public, configurable API:

so your application can define its own rules for what constitutes an “HTML URL.” That way it will work for link clicks and Turbolinks.visit without any annotations or extra arguments.

turbolinks/turbolinks#15 (comment)

@packagethief
Copy link
Contributor

@seanpdoyle, you have my memory gears turning now 😄 It seems we concluded that delegating the decision to applications via configuration was the way to go. Past me agrees:

@f6p
Copy link
Author

f6p commented Jan 28, 2022

@seanpdoyle, @packagethief: Thank you very much for the input.

I guess it makes perfect sense to have some kind of configurable URL based HTML recognition. It would allow application wide defaults without being forced to annotate all links with data-turbo=false. And it also would be backward compatible. Thats much better.

I have some initial implementation that moves locationIsVisitable and isHTML from url to session. I will polish it up and force-push replacing current branch (I think it would be better to keep whole conversation here rather than closing this PR and opening another for configurable HTML check).

Cheers
🍷

@f6p f6p force-pushed the f6p-remove-is-html branch from 3094424 to b8e55f5 Compare January 31, 2022 05:53
@f6p
Copy link
Author

f6p commented Jan 31, 2022

So that’s what I come with.

I moved locationIsVisitible and isHTML from url to session. As Turbo.Location doesn’t exists anymore I thought that it would be the best places for these to live in. Especially that some URL navigation helpers are already there.

After doing that I couldn’t stop thinking that session.isHTML sounds weird (how a session can be a HTML document? 🤪) so I renamed it to session.isVisitable (as Location doesn’t exists anymore it won’t be backward compatible anyway). I’m not sure about the name but that's the best what I was able to think of. My reasoning was that there are already functions/methods like locationIsVisitable and formActionIsVisitable so isVisitable as a shared name part would give a idea about relation between them (they call isVisitable).

TLDR: Turbo.Location.prototype.isHTML become Turbo.session.isVisitable.

Other changes:

I added php extension to list of extensions that resolve as visitable by default. Probably it is as common as html extension and would almost certainly return some kind of HTML content type. I thought that this would be a convenient addition as in issues it already starts to pop out that Turbo doesn’t work with php extensions by default. I think this would bring some peace ✌️.

Added tests that bind user defined functions to session.isVisitable to make sure we won’t lose it again.

Added session parameter to FrameRedirector#constructor to avoid circular dependency loop.

Cheers
🍷

@f6p f6p force-pushed the f6p-remove-is-html branch from b8e55f5 to dd2c46e Compare January 31, 2022 06:01
@ajlive
Copy link

ajlive commented Mar 1, 2022

As a php developer, I thank you!!

@dmcge
Copy link

dmcge commented Mar 23, 2022

There must be an opportunity here to leverage the type attribute on <a>s. Turbo could simply require that links to non-HTML resources must be specified with a non-HTML type, with a check like:

return link.type && link.type != "text/html"

This feels more in keeping with the current scope of Turbo, as a progressive enhancement that uses the HTML we already have. It also seems to meet the need of user configuration, but without requiring users to write their own Javascript.

@dmcge
Copy link

dmcge commented Mar 23, 2022

(Incidentally, on the subject of <a> attributes, example 1 in the description of the original issue could be solved with [download]. Turbo won’t attempt to visit those.)

@f6p
Copy link
Author

f6p commented May 11, 2022

That's idea that could be considered but the problem here is that Turbo can't navigate certain URLs at all. For example now there no is way to visit page with .php extension. To accomplish this one has to build his own version with modified isHTML function.

@ajlive
Copy link

ajlive commented May 12, 2022

Turbo could simply require that links to non-HTML resources must be specified with a non-HTML type

If I'm understanding correctly this means that I would need to add a type attribute to every <a> tag in my app where Turbo is used? If so...

This feels more in keeping with the current scope of Turbo, as a progressive enhancement that uses the HTML we already have.

I don't have any HTML: I have PHP templates. So this would be the opposite of progressive for me: it would require a change to half the files in my codebase.

@f6p f6p force-pushed the f6p-remove-is-html branch from dd2c46e to ebbb3f6 Compare May 13, 2022 05:27
@dmcge
Copy link

dmcge commented May 13, 2022

If you link to a .php URL (like /show.php), my thinking was that Turbo could simply assume it will ultimately return HTML unless told otherwise by the type attribute. We’d replace the existing isHTML check with something like the check I posted above. I’d assume that most apps using Turbo mostly link to HTML pages (whether they are, under the hood, PHP templates, Ruby templates, or whatever else). If that’s correct, you’d only add a type attribute on the rarer case when you don’t want to link to HTML (hopefully that’s not half your app!).

On GitHub, for example, you often encounter links that end with README.md but they are HTML resources (but a Ruby template underneath). Consider the readme for this repo). Turbo would treat that as a non-HTML page today and you’d get a full-page reload. With my proposal, Turbo would control the navigation and you wouldn’t have a full-page reload. If the page was actually markdown and not HTML (again, there would be Ruby underneath), you’d add type="text/markdown" and Turbo would let the browser do its thing.

This may well be a bad idea (and I don’t want to hijack this PR), but I’d personally rather write type="whatever" than a custom Javascript function. 🤷‍♂️

@ajlive
Copy link

ajlive commented May 13, 2022

Ah, thanks for explaining further! Now that I understand it, I think I do like the idea, but you're right it's probably outside the scope of this PR

@kasperkamperman
Copy link

Is there any way to override isHTML or locationIsVisitable with the current release?
I could use this branch maybe, but it's already several commits behind the release.

Still struggling to get Turbo to work with .php extension.

@tleish
Copy link
Contributor

tleish commented Jun 22, 2022

I prefer data-turbo over [download] or [type]. [download] is not an attribute of form and would only work with a elements. data-turbo already works with either element.

I also prefer exposing the isHTML method to a global level to allow override:

Turbo.session.isHTML = () => true;

However, you could also provide a simpler on/off solution with a boolean variable.

Turbo.session.strictVisitMimeType = true; // default

@dhh dhh requested a review from packagethief July 15, 2022 23:42
dhh added a commit that referenced this pull request Jul 15, 2022
Before we fully resolve #519, we can sort out the main hurt from this, which seems to be .php extensions.
@dhh dhh closed this in #629 Jul 15, 2022
dhh added a commit that referenced this pull request Jul 15, 2022
Before we fully resolve #519, we can sort out the main hurt from this, which seems to be .php extensions.
@f6p
Copy link
Author

f6p commented Jul 18, 2022

Hey @dhh it looks like github accidentally closed this one as string resolve #519 was found in #629 comment.

Cheers
🍷

@dhh dhh reopened this Jul 18, 2022
@f6p f6p force-pushed the f6p-remove-is-html branch from ebbb3f6 to 081c273 Compare July 19, 2022 17:45
@f6p
Copy link
Author

f6p commented Jul 19, 2022

Hmmm, I'm not able to run tests locally anymore. Playwright rly dosn't seem to come along nicely with non Ubuntu distros.

@f6p f6p force-pushed the f6p-remove-is-html branch from 081c273 to 98f6bc6 Compare July 20, 2022 03:49
@dhh
Copy link
Member

dhh commented Jul 29, 2022

Enabled tests to be run. Needs a rebase.

@f6p f6p force-pushed the f6p-remove-is-html branch from 98f6bc6 to 1bf49fd Compare July 30, 2022 23:13
@f6p
Copy link
Author

f6p commented Jul 30, 2022

@dhh Rebased it, but approval seems to expire after I force push rebased branch.

@f6p f6p force-pushed the f6p-remove-is-html branch from 1bf49fd to 3e5a825 Compare August 1, 2022 07:33
@dhh
Copy link
Member

dhh commented Aug 1, 2022

Think it needs another rebase. Sorry!

@f6p f6p force-pushed the f6p-remove-is-html branch 2 times, most recently from 17a7678 to da2ca9e Compare August 2, 2022 12:09
@f6p
Copy link
Author

f6p commented Aug 2, 2022

np, rebeased!

Cheers
🍷

@f6p
Copy link
Author

f6p commented Aug 10, 2022

Hey @packagethief, any thoughts how we could move forward with this one?

Cheers
🍷

@manuelpuyol
Copy link
Contributor

@f6p could you fix the conflicts?

@dhh is there anything you feel like this PR is missing? This would be a great improvement to how turbo manages visitable URLs

Add `Session.isVisitable` method whitch is
equivalent of no longer accessible `URL.isHTML`.

This allows `Turbo.session.isVisitable` to be overriden by user and makes possible
navigation on URL's other than those with trailing slash or HTML extenion.
@f6p f6p force-pushed the f6p-remove-is-html branch from da2ca9e to c7f03e1 Compare September 26, 2022 10:42
@f6p
Copy link
Author

f6p commented Sep 26, 2022

@manuelpuyol: I resolved conflicts but its hard to keep this one up to date. Usually it breaks after a few commits to mian. It seems that it touches the lines that change very often.

@manuelpuyol
Copy link
Contributor

@dhh could you enable the tests in this pr? 🙏

@bennadel
Copy link

bennadel commented Jan 5, 2023

I just wanted to say that I arrived here because I am using .cfm (ColdFusion) file extensions. But, I will concede that a ColdFusion page can serve up anything (HTML, CSV, Images, etc). As such, I understand how hard it would be have a blanket acceptance of all .cfm files. I'm new to Hotwire, so I have no advice -- just adding my voice to the discussion.

bennadel added a commit to bennadel/ColdFusion-Hotwire that referenced this pull request Jan 6, 2023
After all of the CFML that I've been writing, it was a bit soul-crushing to finally install Hotwire only to find that it doesn't work with CFML. Or, rather, it doesn't work with `.cfm` extensions. This is by design, though there are talks on the Hotwire side to re-think this constraint:

**GitHub Issue**: [Remove isHTML](hotwired/turbo#519)

Since non-HTML extensions might serve up _anything_, Hotwire can't know that said URL is for an HTML page. And, therefore, can't know that it has to intercept the link-click/form-submission. As such, I have to change the `.cfm` extensions to `.htm` extensions and then enable SES URL rewriting in the CommandBox server. This will rewrite to `index.cfm` with the `cgi.path_info` value being set to the original URL. That said, since I'm only using the `index.cfm` in this app, I don't even have to worry about it.
@radanskoric
Copy link
Contributor

radanskoric commented May 7, 2024

What's preventing this PR from being merged now? Is just a rebase that's needed or some more work is required? I'd be happy to help finish this.

I landed here, like all the others, by running into the issue because Turbo mysteriously stopped working on some URLs. I have URLs that are referencing external IDs. And the IDs sometimes have a "." in it.

If there are still concerns regarding the original motivation behind this feature (preventing Turbo from fetching images) perhaps the check could be inverted from isHTML to !isImage, i.e. from allow to skip list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.