build: add Travis job to check for dead URL links #27267
Conversation
.travis.yml
Outdated
- make doc-only
- npm install -g linkinator
script:
- linkinator out/doc/api -r
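For reference, the same steps can be reproduced locally; a minimal sketch, assuming a Node.js source checkout where `make doc-only` works and npm is available:

```sh
# Reproduce the proposed Travis steps locally (sketch only).
make doc-only                 # build the HTML docs into out/doc/api
npm install -g linkinator     # install the link checker
linkinator out/doc/api -r     # recursively check links in the built docs
```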
Maybe just scanning all.html is a good idea? Or does that just not make a difference?
Can we make it only run when the docs are touched? (Like a make target that depends on docs?)
Can we use the same mechanism that adds the doc label to PRs?
Added a change that greps the PR's patch diff for *.md files in doc/api and only runs the scan if grep returns something.

For local files, linkinator works on a directory level, so scanning, say, just all.html would involve moving files into a temporary directory. If someone has time, maybe they can run linkinator on just all.html and see how many links it scans.
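A minimal sketch of that temporary-directory workaround (the variable name and the use of npx here are assumptions for illustration, not part of this PR):

```sh
# Hypothetical workaround: copy only all.html into a scratch directory so
# linkinator scans just that one file instead of the whole out/doc/api tree.
SCRATCH="$(mktemp -d)"
cp out/doc/api/all.html "${SCRATCH}/"
npx linkinator "${SCRATCH}" -r
rm -rf "${SCRATCH}"
```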
Looks like some are false positives: https://travis-ci.com/nodejs/node/jobs/193473667#L688.
Like I said in the OP:

e.g. https://travis-ci.com/nodejs/node/jobs/193473667#L688

[404] https://chromedevtools.github.io/devtools-protocol/v8/Debugger

-bash-4.2$ curl --head https://chromedevtools.github.io/devtools-protocol/v8/Debugger
HTTP/1.1 404 Not Found
Server: GitHub.com
Content-Type: text/html; charset=utf-8
ETag: "5cb52d6a-1f0"
Access-Control-Allow-Origin: *
X-GitHub-Request-Id: 2612:7F6A:221DC7:2D2A1F:5CB6B0D2
Content-Length: 496
Accept-Ranges: bytes
Date: Wed, 17 Apr 2019 04:51:31 GMT
Via: 1.1 varnish
Age: 0
Connection: keep-alive
X-Served-By: cache-mdw17362-MDW
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1555476691.380921,VS0,VE29
Vary: Accept-Encoding
X-Fastly-Request-ID: 8ced15e319228e1c9c4666e404a8a66c20f7d9d8
-bash-4.2$ curl https://chromedevtools.github.io/devtools-protocol/v8/Debugger
<!DOCTYPE html><meta charset="utf-8"><title>Chrome DevTools Protocol Viewer (redirecting)</title><script>// Copyright (c) 2016 Rafael Pedicini, licensed under the MIT License
var segmentCount=1,l=window.location;l.replace(l.protocol+'//'+l.hostname+(l.port?':'+l.port:'')+l.pathname.split('/').slice(0,1+segmentCount).join('/')+'/?p=/'+l.pathname.slice(1).split('/').slice(segmentCount).join('/').replace(/&/g,'~and~')+(l.search?'&q='+l.search.slice(1).replace(/&/g,'~and~'):'')+l.hash);</script>

It seems very weird for a 404 Not Found response page to be redirecting -- there are specific HTTP response codes for redirection.
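If pages like this keep showing up as 404s, one possible mitigation would be to exclude them from the scan. This is only a sketch, assuming linkinator's --skip option (which takes a regex of URLs to leave out of the check); the specific pattern is illustrative:

```sh
# Hypothetical: skip the DevTools protocol viewer URLs, which are served as
# a 404 page that then redirects client-side via JavaScript.
linkinator out/doc/api -r --skip "chromedevtools.github.io"
```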
Force-pushed from 3022b98 to 2c15115.
- npm install -g linkinator
script:
- if [ "${TRAVIS_PULL_REQUEST}" != "false" ]; then
DOC_FILES=`curl -sL https://github.com/nodejs/node/pull/${TRAVIS_PULL_REQUEST}.patch | grep -o '\bdoc/api/.*\.md\b'`;
DOC_FILES=`curl -sL https://github.com/nodejs/node/pull/${TRAVIS_PULL_REQUEST}.patch | grep -o '\bdoc/api/.*\.md\b'`;
DOC_FILES=`git diff --name-only HEAD...$TRAVIS_BRANCH | grep -o '\bdoc/api/.*\.md\b'`;
Refs: https://stackoverflow.com/questions/41145041/list-files-modified-in-a-pull-request-within-travis
I think we had trouble with commit message linting before with $TRAVIS_BRANCH, which is why we ended up downloading the patch files, but I forget the details.
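Putting the diff excerpt and the description above together, the script Travis runs presumably ends up roughly like the following sketch. Only the outer conditional and the DOC_FILES assignment mirror the diff (rewritten here with $() instead of backticks); the inner "only scan when grep found doc/api *.md files" check is an assumption based on the earlier comment, not code shown in this PR:

```sh
# Sketch only: reconstructs the conditional doc-link check described above.
if [ "${TRAVIS_PULL_REQUEST}" != "false" ]; then
  # List doc/api *.md files touched by the PR by grepping its patch.
  DOC_FILES=$(curl -sL "https://github.com/nodejs/node/pull/${TRAVIS_PULL_REQUEST}.patch" \
    | grep -o '\bdoc/api/.*\.md\b')
  # Only run the (slow, network-heavy) scan if any docs were changed.
  if [ -n "${DOC_FILES}" ]; then
    linkinator out/doc/api -r
  fi
fi
```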
- make doc-only
- npm install -g linkinator
script:
- if [ "${TRAVIS_PULL_REQUEST}" != "false" ]; then
Redundant because of line 81 in 3ed6684:
if: type = pull_request
Nice idea, but why not run it in our CI? (I'm thinking it might be a good idea anyway to move out the
Yeah, it probably makes more sense to run it on our CI. It could be run on the nightlies rather than PRs so that we won't be hammering remote webservers.
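A minimal sketch of gating the check on nightly builds, assuming the job stays on Travis for now and using the TRAVIS_EVENT_TYPE variable Travis sets to "cron" for scheduled builds:

```sh
# Hypothetical: only run the link check on cron (nightly) builds instead of
# on every pull request, to avoid hammering remote webservers.
if [ "${TRAVIS_EVENT_TYPE}" = "cron" ]; then
  linkinator out/doc/api -r
fi
```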
Refs: #27168 (comment)
Seems easy enough to put together.
This is a proof of concept of running https://github.com/JustinBeckwith/linkinator on our built HTML docs to find broken URL links. I don't think we want to run this for every PR: when I run it locally it checks 900+ links, which could be seen as a denial-of-service attack if we ran it too often.

Also, it seems to flag some URLs as broken that I can navigate to in a web browser (although there does appear to be redirecting going on). If anyone wants to look into that or adopt this PR, great, as I probably won't be spending much time on it.
Checklist
make -j4 test (UNIX) or vcbuild test (Windows) passes