Adds WARC generation for offline viewing #95

Open: gjvnq wants to merge 11 commits into master

gjvnq (Contributor) commented May 26, 2022

I added a gulp task that generates WARC files from the website so it can be read offline. These can be viewed through WARC players like replayweb.page, and they are attached to this pull request so the maintainers can evaluate this feature.

There's a minor bug about the sidebar/bookmarks which I'm still investigating (see webrecorder/replayweb.page#99).

Finally, WARCs aren't as good as EPUBs, but they are here and don't have as many compatibility quirks as EPUBs.


Attached files: (you will likely need to rename the extensions from .warc.txt to .warc)

gdb-en.warc.txt
gdb-zh.warc.txt
gdb-de.warc.txt
gdb-hu.warc.txt
gdb-pl.warc.txt
gdb-zh.warc.txt
gdb-es.warc.txt
gdb-fr.warc.txt
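
For context on what such a file contains: a WARC file is a sequence of records, each made of a short header block followed by the captured payload. The following is a minimal Node.js sketch of building one `response` record. It is not the code from this pull request; the function name `buildWarcResponseRecord` is made up for illustration, and a complete file would normally also begin with a `warcinfo` record.

```javascript
const { randomUUID } = require("node:crypto");

// Build one WARC/1.0 "response" record for an already-fetched page.
// `httpResponse` is the raw HTTP response: status line, headers, blank line, body.
// (Hypothetical helper for illustration, not the PR's implementation.)
function buildWarcResponseRecord(targetUri, httpResponse, date = new Date()) {
  const block = Buffer.isBuffer(httpResponse)
    ? httpResponse
    : Buffer.from(httpResponse, "utf8");

  const header = [
    "WARC/1.0",
    "WARC-Type: response",
    `WARC-Target-URI: ${targetUri}`,
    // WARC-Date is a UTC timestamp; drop the milliseconds toISOString() adds.
    `WARC-Date: ${date.toISOString().replace(/\.\d{3}Z$/, "Z")}`,
    `WARC-Record-ID: <urn:uuid:${randomUUID()}>`,
    "Content-Type: application/http; msgtype=response",
    // Content-Length counts the bytes of the block, not its characters.
    `Content-Length: ${block.length}`,
  ].join("\r\n");

  // Header, blank line, block, then the two CRLFs that terminate a record.
  return Buffer.concat([
    Buffer.from(header + "\r\n\r\n", "utf8"),
    block,
    Buffer.from("\r\n\r\n", "utf8"),
  ]);
}
```

Concatenating the buffers returned for each page and writing the result to a `.warc` file gives something a WARC player can attempt to load, assuming each `httpResponse` is a well-formed HTTP message.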

gjvnq added 3 commits May 26, 2022 00:23
The Gothic A1 type-face is divided into 120 files! (see [1])
This can make the website slow to load and especially slow to
generate a WARC file for.

The Inter type-face is very similar in looks and is divided
only into 21 files. (see [2])

[1]: https://fonts.googleapis.com/css2?family=Gothic+A1:wght@300;400;600&display=swap
[2]: https://fonts.googleapis.com/css2?family=Inter:wght@300;400;600&display=swap
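
One way to check file counts like the ones above is to count the distinct `url(...)` references in the stylesheet that each link returns. The sketch below works on a CSS string that has already been downloaded; `countFontFiles` is a hypothetical helper name, and the count may differ between browsers because the font service can return different CSS depending on who is asking.

```javascript
// Count the distinct files a stylesheet references through url(...).
// (Hypothetical helper for illustration.)
function countFontFiles(css) {
  const urls = new Set();
  // Matches url(x), url('x') and url("x"); group 2 is the URL itself.
  for (const match of css.matchAll(/url\(\s*(['"]?)([^'")]+)\1\s*\)/g)) {
    urls.add(match[2].trim());
  }
  return urls.size;
}
```

Each distinct URL is one extra request the page makes, and one extra resource a WARC generator has to capture.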

Twipped (Contributor) commented May 26, 2022

I've never heard of this format. Is there a meta tag or something that exposes it as available?

gjvnq (Contributor, Author) commented May 26, 2022 via email

Twipped (Contributor) commented May 26, 2022

Ah, okay, I thought it was something that browsers hooked into for offline caching.

So, one small problem with this is that I'm actually trying to get off of gulp. There's also several other changes that I just can't merge at this time without seeing what they do. I'll have to review this more in-depth later.

gjvnq (Contributor, Author) commented May 26, 2022

> So, one small problem with this is that I'm actually trying to get off of gulp. There's also several other changes that I just can't merge at this time without seeing what they do. I'll have to review this more in-depth later.

I dislike gulp too. What do you plan to use as a replacement?

And what can I use for WARC generation? Is it okay if I use something written in Go, Rust or Python?

gjvnq (Contributor, Author) commented May 26, 2022

> Ah, okay, I thought it was something that browsers hooked into for offline caching.

I could use cache manifests (see http://diveintohtml5.info/offline.html) to allow offline browsing, but I think the usability is subpar.

Twipped (Contributor) commented May 31, 2022

> What do you plan to use as a replacement?

Just plain JS tied together with yargs. The main thing holding me back right now is the AWS publish system. I haven't found anything else that supports pattern-based cache control headers. I need to port gulp-awspublish out into a standalone library, but haven't had the time to do it.
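
To illustrate what "pattern-based cache control headers" could mean in plain JS, here is a minimal sketch that maps file paths to a `Cache-Control` value using an ordered rule table. It is not the gulp-awspublish API; the rule table, the header values, and the name `cacheControlFor` are all assumptions made up for this example.

```javascript
// Hypothetical rule table: the first pattern that matches the path wins.
const CACHE_RULES = [
  { pattern: /\.html?$/i, cacheControl: "public, max-age=300" },
  { pattern: /\.(?:css|js)$/i, cacheControl: "public, max-age=86400" },
  {
    pattern: /\.(?:png|jpe?g|gif|svg|woff2?)$/i,
    cacheControl: "public, max-age=31536000, immutable",
  },
];

// Fallback for paths that match no rule (also an assumption).
const DEFAULT_CACHE_CONTROL = "no-cache";

// Return the Cache-Control header value to send when uploading `key`.
function cacheControlFor(key, rules = CACHE_RULES) {
  const rule = rules.find(({ pattern }) => pattern.test(key));
  return rule ? rule.cacheControl : DEFAULT_CACHE_CONTROL;
}
```

An upload script could call `cacheControlFor(path)` for each file and pass the result as that object's `Cache-Control` metadata, whichever upload client ends up being used.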

Twipped (Contributor) commented May 31, 2022

> Is it okay if I use something written in Go, Rust or Python?

Only if it's something quick and easy to install. Python dependency management is a hell I'd rather not get involved with, and I have zero experience with Go or Rust. I'd much rather it be in JS.

gjvnq (Contributor, Author) commented May 31, 2022

> Is it okay if I use something written in Go, Rust or Python?
>
> Only if it's something quick and easy to install. Python dependency management is a hell I'd rather not get involved with, and I have zero experience with Go or Rust. I'd much rather it be in JS.

Golang generates static binaries so installing them is pretty easy.

As for the current JS script (offline.js), it uses Gulp very minimally so converting it to work with another tool should be very easy.
