
Local folder with subfolders #82

Closed
XhmikosR opened this issue Sep 4, 2019 · 19 comments · Fixed by #218
Labels: enhancement (New feature or request), released

@XhmikosR (Contributor) commented Sep 4, 2019

@JustinBeckwith: I'm trying to add linkinator in the nodejs.org repo.

The file structure there is like this:

build/
  ar/
    index.html
    <more folders>/
      index.html
  ca/
    index.html
    <more folders>/
      index.html
  en/
    index.html
    <more folders>/
      index.html
  <more folders>/
  static/

So, if I do linkinator build/ --recurse it fails because it doesn't see an index.html. If I do linkinator build/en --recurse then I get false positives because linkinator can't find the root.

Is there a workaround or something you could do about it?

Thanks in advance!

@JustinBeckwith (Owner)

Ahh, that's not how --recurse works :) The --recurse flag just signals that we are going to follow links from the top-level scanned page, and go another level deeper. Everything in linkinator assumes the link checking is happening in the context of either a.) a URL you've pointed it to, or b.) a directory with an index.html that sprawls out. It doesn't really know anything about the filesystem.

To get this to work today - you'd have to run linkinator for each directory. Something like:

for d in */ ; do
    npx linkinator --recurse "$d"
done

@XhmikosR (Contributor, Author) commented Sep 8, 2019

The thing is that if I do that then linkinator cannot find the root. So I'm stuck :/

@JustinBeckwith (Owner)

I suppose I'm a little confused. What do you mean by not being able to "find the root"? Effectively - linkinator expects that it starts from a single crawling point, and can follow links to all other links it wants to be checked.

Are you saying that the subdirectories throughout your structure also have independent roots that need to be crawled?

@XhmikosR (Contributor, Author) commented Sep 8, 2019

If I do linkinator build/en --recurse, then linkinator throws errors because /static/ isn't in the en folder, it's one folder up.

Maybe adding a root option could help in such cases.

I'm not sure how I can make linkinator work in this case. Here's a WIP branch https://github.com/XhmikosR/nodejs.org/tree/master-xmr-linkinator

@JustinBeckwith (Owner)

Thank you - I get it now! All linkinator does when you point it at a directory is naively spin up a web server from the CWD to serve requests. You could do something very similar:

npx http-server . &   # run the server in the background so the loop can start
for d in */ ; do
    npx linkinator --recurse "http://localhost:8080/$d"
done

@XhmikosR (Contributor, Author) commented Sep 9, 2019

Thanks for the reply!

The problem with this approach is that it's not cross-platform, unfortunately. Also, http-server is broken at the moment, by the way. I personally use the serve package, but it seems I'm having some issues with it in this case.

That being said, I'll see what I can cook up. I still think a new option telling linkinator where to load files from would be useful, though.

Also, one thing I noticed from a quick test is that linkinator throws errors for <link rel="dns-prefetch"> links.

@XhmikosR (Contributor, Author)

Alright, I managed to set up a CLI solution, but it scans only one locale.

So, I figure we need to use the API, right? I'm not very familiar with it, but it seems linkinator returns a Promise, so my first attempt doesn't work.

If you have any time, here's my WIP branch with the API solution nodejs/nodejs.org@master...XhmikosR:master-xmr-linkinator-2

First, I want to at least land the English-locale CLI scanning in nodejs/nodejs.org#2565 and then see how we can improve things by using the API directly.

@JustinBeckwith (Owner)

Using the API seems like the right thing to do here for sure. Linkinator does return a promise from the check method :) There's a simple example here:
https://github.com/JustinBeckwith/linkinator/#simple-example

@JustinBeckwith (Owner)

Just wanted to check in @XhmikosR - did you end up getting this rolling with the API? I was thinking alternatively - you could create an index page at the root level (mostly for testing purposes) that has a link to all of the per-language root pages, and then scan that page.

@XhmikosR (Contributor, Author) commented Nov 3, 2019

@JustinBeckwith I started a patch one day nodejs/nodejs.org@1cdc841 but never got to tweak it :/

Do not hesitate to PR if you have some time :) We definitely get some false positives, but my plan is to just keep a list of whitelisted domains, which will be better than nothing.

@nkuehn commented Apr 24, 2020

Hi, I have a similar setup with microsites under the root that link to each other. I am currently throwing an index.html with nothing but <a href="/foo"></a> (or several of these) into the root, but it's not really nice. In production the webserver does an HTTP-level redirect to the actual entry page, but that doesn't happen in the local webserver emulation here (how should it know my custom redirect...).

It would be great if this use case were covered by the CLI too, since it's already pretty good, with colored output, grouping, etc.

Conceptually, the issue is about separating the first entrypoint from both the scope of the recursion and the root path of the local webserver.

So what about an optional flag --entrypoint that would allow you to call, e.g.,

linkinator ./output-dir --entrypoint /home

with the effect that the local webserver serves ./output-dir, but the crawler starts at http://localhost:0000/home instead of defaulting to http://localhost:0000/?

(could be a plural form too, taking multiple entrypoints)

Does that make sense? What do you think?

@JustinBeckwith (Owner)

I think I may have finally cracked this nut 😆 The latest version of linkinator (2.6 and up) now supports passing multiple locations, and it supports globbing. So you should be able to do something like this:

$ linkinator '*/index.html' 

This will do a top-level scan (you can of course use --recurse if you want to follow links). The path config flag accepts a string or a string[], so this should work from both the CLI and the API. When you pass multiple paths like this, it will default the server root to process.cwd(). Just to be safe, I added a flag to override that too:

$ linkinator --server-root . "**/index.html"

@XhmikosR @nkuehn want to take this for a spin and let me know how it goes?

@JustinBeckwith JustinBeckwith added enhancement New feature or request and removed needs more info question Further information is requested labels Dec 3, 2020
@JustinBeckwith JustinBeckwith self-assigned this Dec 3, 2020
@nkuehn commented Dec 3, 2020

@JustinBeckwith sounds like a nice solution, I'll give it a try. Thanks for the update!

Is the --recurse crawling shared among the matches of the glob, or does linkinator recurse individually per match (and then maybe check some parts twice if the matches link to each other)?

@JustinBeckwith (Owner)

Any time we check a link, regardless of the top-level root path, it is added to a cache and not checked again. So no double checking :) On the matching logic: the --recurse flag scopes each root path given on the CLI to its own tree of requests. So if a request starts at jbeckwith.com/page/2, during the scan of that tree it will only match links that start with jbeckwith.com/page/2.

@nkuehn commented Dec 3, 2020

Thanks for the clarification! It's a bit embarrassing, but I think I don't understand what the --server-root option really does; I assume I am just projecting what I hope it does.

Given my file layout

...
src/
linkinator.config.json
public/
  website1/
    index.html
    fooContent/
      index.html
  website2/
    index.html
    fooContent/
      index.html
...

when I call npx linkinator public/*/index.html --server-root public, I am getting

[404] http://localhost:5433/public/website1/index.html
[404] http://localhost:5433/public/website2/index.html
...

but had hoped for

[200] http://localhost:5433/website1/index.html
[200] http://localhost:5433/website2/index.html
...

I can work around it by temporarily copying linkinator.config.json into the public build output directory and running linkinator from there without the --server-root option, but since you built it, I would prefer to use the server root.

@JustinBeckwith (Owner)

The paths you pass into linkinator will be relative to the server-root. So you could say:

npx linkinator public/*/index.html --server-root .

Or you could say:

npx linkinator */index.html --server-root public

@nkuehn commented Dec 3, 2020

The latter is what I initially tried, but this is what I get:

$ npx linkinator */index.html --server-root public
[404] http://localhost:5738/public/index.html
[404] http://localhost:5738/websites/index.html

The version output is 2.7.0. ("websites" accidentally also contains an index.html, so the glob catches it.) To me, the output suggests that the glob is not resolved relative to the passed server root.

@github-actions

🎉 This issue has been resolved in version 2.11.1 🎉

The release is available on:

Your semantic-release bot 📦🚀

@nkuehn commented Jan 8, 2021

Thanks, using the server root works now!
