Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Programmatically found broken internal links #527

Open
ff6347 opened this issue Sep 8, 2024 · 11 comments
Open

Programmatically found broken internal links #527

ff6347 opened this issue Sep 8, 2024 · 11 comments
Labels
Bug: Broken Link Bug Something isn't working

Comments

@ff6347
Copy link
Contributor

ff6347 commented Sep 8, 2024

Most appropriate sections of the p5.js website?

Other (specify if possible)

What is your operating system?

Mac OS

Web browser and version

not related to browsers

Actual Behavior

Broken links generate a 404

Expected Behavior

Links should serve the linked content

Steps to reproduce

Since I saw some reports of broken links in the website I tried a programmatic approach using broken-link-checker.

How to:

Edit: To make it easier to generate an overview I created a script that outputs GitHub flavored markdown. https://github.com/ff6347/find-broken-links

npm install broken-link-checker
npx blc -roe https://p5js.org > report-blc-prod-live.txt

# The flags do the following
# --exclude-external, -e  Will not check external links.
# --ordered, -o           Maintain the order of links as they appear in their HTML document.
# --recursive, -r         Recursively scan ("crawl") the HTML document(s).

The output needs some cleaning using regular expressions afterwards.

Remove all reports that are okay:

^├───OK───.*?$\n

Then remove all reports of urls that have no broken links:

^Getting links from:.*?$\nFinished! \d{1,1000} links found. \d{1,1000} excluded. 0 broken.

Fix some reports where there is no new line between the report and the next.

look for:

broken\.\n(\w)

Replace with:

broken.\n\n$1

These three should do most of the grunt work. There are some false positive reports about images that I removed manually at the end.


Here is what I've found:

Low Hanging Fruits

There are some links in footer and reference that create 301 redirects which can be avoided through small tweaks.

These are the lines that need a trailing slash:

Generally these are neglect-able, but a HEAD request against them returns a 301 HTTP response. This could be fixed since it is a low hanging fruit.

301 redirects

There are many links in the reference generated form the JSDoc comments in the source of p5.js that are lacking the trailing slash and generate a 301 HTTP redirect. These need some manual labor in the source code. This is a bit tedious work. It should not be a problem keeping this as is since the site-host + Astro take care of the redirects.

Broken Links

There are some broken links also generated from the JSDoc that need fixing and I found broken links in the tutorials. They are hard-coded into the .mdx files and need to be fixed one by one.

Next Steps

IMO the next steps should be

  • creating separate issues for these since fixing it just based on this list is error prone.
  • check external links as well when the internal ones are fixed.

Below you can find the report

report created: Fri, 13 Sep 2024 11:06:07 GMT

Would you like to work on the issue?

yes

@ff6347 ff6347 added the Bug Something isn't working label Sep 8, 2024
@ff6347 ff6347 changed the title Programmatically find broken links Programmatically found broken links Sep 8, 2024
@Qianqianye
Copy link
Collaborator

Qianqianye commented Sep 13, 2024

Thanks @ff6347, this looks great! We have multiple open issues about broken links that we are gradually fixing by batches, since the reasons for the broken links vary.

To provide some context:

  • p5.js website Contribute pages are generated directly from p5.js repo contributor_docs folder, so broken links will be fixed there.
  • p5.js website Reference pages are generated directly from p5.js repo src folder, so we’ll address broken links in that folder.
  • p5.js website Tutorials broken links can be fixed directly on p5.js website repo tutorial folder.
  • Any broken links related to p5.sound are currently on hold, as we’re planning to release a new version of the p5.sound library within the next month.

Here’s my suggestion: Let's use this issue as the main issue to consolidate all the broken link issues. We can update the issue with the latest report when some links are fixed, for examples, some broken links in the report generated on 2024-09-08 are already fixed. For anyone interested in contributing, feel free to start by fixing broken links in the Contribute or Tutorial pages in batches, then we can move to Reference page and others. Please go ahead and open PRs as you work through them. Thank you!

@ff6347
Copy link
Contributor Author

ff6347 commented Sep 13, 2024

hi @Qianqianye thanks for the clarification where to look and start. 👌🏾

I updated the issue description to have check boxes rather then plaintext to track the progress. I also created a script to regenerate the report. https://github.com/ff6347/find-broken-links

I'll ping you as soon as I have the first PR.

@ff6347 ff6347 changed the title Programmatically found broken links Programmatically found broken internal links Sep 13, 2024
muffinista added a commit to muffinista/p5.js-website that referenced this issue Oct 14, 2024
The old link is broken and this seems like the proper one. Addresses processing#527
@thegodworm
Copy link

Hi! I'm new to contributing and I wanted to let you know that I'm interested in fixing the broken links. I will open PR gradually.

@thegodworm
Copy link

thegodworm commented Oct 21, 2024

So for the contributor docs, I don't get it. reading the doc ; p5js-websites pulls from p5.js then uses astro to build the static website right? " https://p5js.org/contribute/contributor_guidelines.md#software-design-principles" doesn't work but

https://p5js.org/contribute/contributor_guidelines/#software-design-principles works, it seems it's all due to .md not being present, thus refering to the correct page. I don't know how to correct this ( p5.js or p5jswebsite repo?) or the conversion script is missconfigured?

@ff6347
Copy link
Contributor Author

ff6347 commented Oct 23, 2024

@thegodworm Astro transforms all Markdown .md files to a folder with the name of that file (minus the extension) and an index.html in it.

@thegodworm
Copy link

after inspections, the links are broken because they are not built correctly (via script?)

<a href="./pathToContent.md#anchor"> point To X</a> -> error 404.

works when corrected like this:

<a href="../pathToContent#anchor">** point To X</a>

adding ../ relative path correctly and removing the md.

I would ask what would be the best course of actions to take?

writing a script automatically addressing the issues?
manually?

or modifying how the script for astro handle site building?

@davepagurek
Copy link
Collaborator

@thegodworm the content is intentionally specifying a .md file so that they work when viewed on GitHub, so I think doing it as part of a script makes sense here. There's already some initial parsing and converting done here that can be modified:

const contentWithRewrittenLinksAndComments = convertMarkdownCommentsToMDX(
rewriteRelativeImageLinks(
rewriteRelativeMdLinks(contents),
assetsOutputBaseUrl,
),
);

@thegodworm
Copy link

@davepagurek yes , I saw that, I will see then what I can do :) . I prefer to ask first because I want to be sure as it's my first collaboration 😃

@davepagurek
Copy link
Collaborator

No problem! Happy to have you helping out 🙂

@ffd8
Copy link

ffd8 commented Nov 19, 2024

Just a quick chime in while checking my teaching docs to see if/what broke and found a pretty big one:
https://p5js.org/learn/ » https://p5js.org/tutorials/ ?

One idea would be to implement legacy dirs that forward, or better yet a custom 404 page that parses the URL window.location.href and compares to a list of new homes for those links? It could include a form input (if a true 404 without redirect) that creates an issue with expected URL and new URL if they research the solution.

@ff6347
Copy link
Contributor Author

ff6347 commented Nov 20, 2024

I was thinking the same thing. The site needs a custom 404 page.

I'm not 100 % sure where this is deployed. If would be on cloudflare or netlify there is the possibility to configure redirect.

But I think it's GitHub pages with Cloudflare in front of it.

As far as I know, there is not the possibility to do this on GitHub pages. So a 404 page with some common links and a search might be necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug: Broken Link Bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants