
shiori update doesn't clear cache after it's finished, maxing out /tmp/ dir. #966

Closed
Skivling opened this issue Aug 20, 2024 · 20 comments · Fixed by #969 · May be fixed by #972
Labels
type:bug Something isn't working

Comments

@Skivling

Data

  • Shiori version: 1.7.0
  • Database Engine: SQLite (the default)
  • Operating system: Fedora GNU/Linux
  • CLI/Web interface/Web Extension: CLI

Describe the bug / actual behavior

I ran shiori update 500-600 to update the descriptions, archives and images for 100 bookmarks. After it completed, the last few hadn't worked because they had 'run out of storage'. I have 50GB free on my disk, so I didn't know what that meant. It turns out it had filled up the /tmp/ directory, which I think is RAM-backed (tmpfs). This resulted in my computer running very slowly until I restarted and it was cleared.

Expected behavior

After a bookmark update is finished, shiori shouldn't need anything in /tmp/ anymore and should clear out its temporary files.

To Reproduce

Steps to reproduce the behavior:

  1. Run shiori update on a certain number of entries, depending on how much RAM you have; e.g. it took about 100 entries to fill up my 4GB of RAM.
  2. Notice that updates may start failing with an error about not enough space being available.
  3. Stop the command.
  4. See lots of files in /tmp/ prefixed with either archive or image and suffixed by random numbers. Notice that the /tmp/ directory has little or no space left in it.
@Skivling Skivling added the type:bug Something isn't working label Aug 20, 2024
@github-project-automation github-project-automation bot moved this to To do in Roadmap Aug 20, 2024
@Oppen
Contributor

Oppen commented Aug 24, 2024

I came here to report exactly this. I suspect it has to do with a mismatch between the packages used to create and remove the files. In these lines the file is created with the stdlib, but the afero abstraction is used for cleanup instead. That abstraction provides its own TempFile utility, so I suspect it may not touch files it doesn't own.
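
A minimal sketch of the suspected mismatch, with the data-dir path borrowed from the strace output later in this thread (this is not shiori's actual code; it only reproduces the create/remove asymmetry):

package main

import (
    "log"
    "os"

    "github.com/spf13/afero"
)

func main() {
    // The temp file is created with the standard library, so it lands in the
    // global temp dir, e.g. /tmp/archive123456789.
    tmp, err := os.CreateTemp("", "archive")
    if err != nil {
        log.Fatal(err)
    }
    tmp.Close()

    // Cleanup goes through an afero Fs rooted at the data directory, so the
    // same name resolves under the data dir instead; the removal fails with
    // ENOENT and the real file in /tmp is leaked.
    dataFS := afero.NewBasePathFs(afero.NewOsFs(), "/srv/shiori/data")
    if err := dataFS.Remove(tmp.Name()); err != nil {
        log.Printf("remove failed, temp file leaked: %v", err)
    }
}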

@Oppen
Contributor

Oppen commented Aug 24, 2024

It looks like there's also a double delete 🤔
That defer would run the removal twice.
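
For illustration only (simplified, with the afero import assumed), the double-delete pattern being pointed at looks roughly like this: a deferred removal plus an explicit one, so the second call just gets an ErrNotExist.

// processArchive sketches the pattern: both the defer and the explicit call
// try to remove the same temp file.
func processArchive(fs afero.Fs) error {
    f, err := afero.TempFile(fs, "", "archive")
    if err != nil {
        return err
    }
    defer fs.Remove(f.Name()) // removal #1: runs on every return path

    // ... write the archive contents here ...

    f.Close()
    return fs.Remove(f.Name()) // removal #2: redundant with the defer above
}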

@Oppen
Contributor

Oppen commented Aug 25, 2024

OK, I can confirm the cause of the issue. I ran it under strace and got this sequence:

pwrite64(8</tmp/archive1692607475>, "\16\0\0\0\0\0\0\0\20\0\3\0\0\0\0\0\30\0\0\0\0\0\0\0\32\0\0\0\0\0\0\0"..., 4096, 5
unlinkat(AT_FDCWD</srv/shiori/bin>, "/srv/shiori/data/tmp/archive1692607475", 0) = -1 ENOENT

As seen, going through the abstraction constructs a different path, because the Fs member is rooted at the shiori data dir, while os.CreateTemp uses the global /tmp directory.
I may send a PR with a fix tomorrow. It's simple, but I need to try it first, and for reasons I'm too lazy to explain that takes time on my side.
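
One way to keep the paths consistent, sketched here as a guess rather than what #969 actually changes, is to create and remove the temp file through the same afero Fs so both operations resolve to the same location:

func withTempArchive(fs afero.Fs, fn func(afero.File) error) error {
    tmp, err := afero.TempFile(fs, "", "archive")
    if err != nil {
        return err
    }
    defer fs.Remove(tmp.Name()) // single removal, same Fs, same resolved path
    defer tmp.Close()           // runs before the removal (defers are LIFO)

    return fn(tmp)
}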

@Oppen
Contributor

Oppen commented Aug 25, 2024

BTW, using git blame I saw that the lack of a removal on the success path was intentional, although based on faulty logic: the original author of that piece seems to have believed they could resume where they left off if the file persisted, but that's not really true, because they don't know the random component of the file name.

@Oppen
Contributor

Oppen commented Aug 25, 2024

I can confirm #969 fixed it for me. @Skivling, if you wish, you could try my branch.
To build, run:

go build -tags osusergo,netgo,fts5 main.go

Then run the main binary produced as you would run shiori, or replace (after backing up) your shiori binary with it.

@Skivling
Author

To quickly back up shiori before this, do I need to copy a directory, the database file, or something else? Then I can build and test that version.

@Oppen
Contributor

Oppen commented Aug 25, 2024

I would say only copy the binary, but if you want to be extra cautious, maybe copy the data dir.

@Skivling
Author

It didn't work - I cloned your forked repo, switched to the branch, and built it with the command. I ran it using the resulting binary (./main instead of shiori), but it still crashed my computer once /tmp was full and nothing else could manage to run.

@Oppen
Contributor

Oppen commented Aug 26, 2024

What do the files look like for you? Mine were archive<random_string>. Those seem to appear and disappear as expected on my system.

@Oppen
Contributor

Oppen commented Aug 26, 2024

Looking more closely, I think I understand why I didn't solve your issue. I hit the leak itself while running shiori add (I found the issue while migrating from Wallabag). The update command has a second issue: it fires up an unbounded number of concurrent goroutines to update the articles, each storing a file in /tmp. Your RAM gets full not because of the leak, but because of the concurrency.

@Monirzadeh I think a reasonable fix would be to have an env var to limit concurrency here, what do you think? If that's OK with you I may make a PR. I can first do a test version for @Skivling to test.

@Oppen
Contributor

Oppen commented Aug 26, 2024

Oh, there is a semaphore already there, never mind. I still think the concurrency is the issue, though. Maybe the archives are individually too big. Concurrency is currently set to 10 simultaneous archives.
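
For reference, the general shape of the pattern being discussed, using a buffered channel as a counting semaphore (Bookmark and updateBookmark are placeholders rather than shiori's actual types, and sync is assumed imported; shiori's own semaphore is currently capped at 10):

func updateAll(bookmarks []Bookmark, maxConcurrent int) {
    sem := make(chan struct{}, maxConcurrent)
    var wg sync.WaitGroup

    for _, b := range bookmarks {
        wg.Add(1)
        go func(b Bookmark) {
            defer wg.Done()
            sem <- struct{}{}        // blocks while maxConcurrent updates are in flight
            defer func() { <-sem }() // frees a slot when this update finishes
            updateBookmark(b)        // downloads the archive and images into temp files
        }(b)
    }
    wg.Wait()
}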

@Oppen
Contributor

Oppen commented Aug 26, 2024

@Skivling this branch on my fork should be useful to confirm or reject the hypothesis: test/limit_concurrency.
Same procedure as before. What to expect: if I'm right, the operation should be slow (as it is sequential) but succeed without crashes (because it lowers the peak number of simultaneous temporary files to one archive plus its images at a time).

@Skivling
Author

The limit-concurrency branch worked 🎉, although now I think I didn't switch branches properly the first time and actually just built the main branch, because I only ran git branch (which creates a branch) instead of git checkout (which switches to it). I'll try the original fix again in case it fixes it as well; then you can figure out what to merge.

@Skivling
Author

Ok. The limit-concurrency branch was required to fix the issue.

a reasonable fix would be to have an env var to limit concurrency here

Setting it as an env variable would be good. On my system I'd maybe set it to 3 so it isn't as slow but still avoids the issue. I'd be happy to test a new branch implementing it.

What do the files look like for you? Mine were archive<random_string>. Those seem to appear and disappear as expected on my system.

Yes that's what mine are like.

@Oppen
Contributor

Oppen commented Aug 27, 2024

@Monirzadeh @fmartingr since adding a config flag may be invasive I'd like to have your input before doing so.
The idea would be to simply add an env var named SHIORI_MAX_CONCURRENT_DL or something like that, default it to 10 (the current default) if unset, and use that number as the limit for the semaphore on the update command.
Do note the other fix is still necessary because the temp files were leaking.
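
A sketch of how that knob could look (the variable name and default follow the proposal above; none of this is merged shiori code, and os and strconv are assumed imported):

// maxConcurrentDownloads reads SHIORI_MAX_CONCURRENT_DL and falls back to the
// current hardcoded semaphore size of 10 when the variable is unset or invalid.
func maxConcurrentDownloads() int {
    if v := os.Getenv("SHIORI_MAX_CONCURRENT_DL"); v != "" {
        if n, err := strconv.Atoi(v); err == nil && n > 0 {
            return n
        }
    }
    return 10
}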

@Oppen
Contributor

Oppen commented Aug 27, 2024

Re: the FS abstraction. I would add another domain that represents the tmp filesystem, which may or may not be based on the data dir depending on config. By default it should still be /tmp, because that's what most people would expect, but it could, for example, be an on-disk directory to avoid filling up the RAM in constrained environments. What do you think?
If I get the time I may try that, but I'd still much rather have the other fix merged first so we have something working.
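
A rough sketch of that tmp-filesystem domain, assuming afero stays in place (the function and the configured directory are hypothetical, not an existing shiori API):

// newTempFs returns a dedicated Fs for temporary files. With no configured
// directory it falls back to the OS temp dir (usually /tmp), but it can be
// pointed at an on-disk directory on RAM-constrained systems.
func newTempFs(configuredDir string) afero.Fs {
    dir := configuredDir
    if dir == "" {
        dir = os.TempDir()
    }
    return afero.NewBasePathFs(afero.NewOsFs(), dir)
}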

@fmartingr
Member

Hey @Oppen, I'm good with your two proposals. Notes:

  • On the first one (concurrency) I would limit to the number of CPU cores by default, optionally modifying it using the proposed env var.
  • On the second one (temp fs), just fixing it using the os package seems fine for now; no need to add more complexity until we need it. I think there's already a PR for it, but I have a ton of notifications to go through.

@Oppen
Contributor

Oppen commented Aug 27, 2024

  • On the first one (concurrency) I would limit to the number of CPU cores by default, optionally modifying it using the proposed env var.

IIUC, there are two different kinds of concurrency at play here. What the semaphore limits is asynchronous processing, because it limits the number of READY goroutines (every other goroutine performing a download will be PARKED waiting on the channel), which the Go runtime multiplexes whenever they block on IO.
There is already a default limit on parallelism set by GOMAXPROCS, which matches the number of CPUs by default. Should I default to match that one unless explicitly set otherwise? Currently the limit is hardcoded to 10, which is why I suggested that as a default.
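
If matching Go's own limit is preferred, the default could simply query it (a hypothetical helper; runtime is assumed imported):

// defaultDownloadLimit matches the effective GOMAXPROCS value, which defaults
// to the number of CPUs; passing 0 reads the setting without changing it.
func defaultDownloadLimit() int {
    return runtime.GOMAXPROCS(0)
}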

  • On the second one (temp fs), just fixing it using the os package seems fine for now, no need to add more complexity until we need it. I think there's a PR already for it, I have a ton of notifications to go through.

I'm all for avoiding/postponing complexity, but I understood the abstraction was the preferred way.

@Oppen
Contributor

Oppen commented Aug 27, 2024

Another possibility would be to limit on what we actually care about, which is the number/size of temporary files. It would be more complex, but if it sounds sensible I may give it some thought.
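
Sketching that idea with a weighted semaphore from golang.org/x/sync/semaphore: every download reserves its (approximate) size before writing a temp file, so total temporary space stays under a budget regardless of how many goroutines exist. All names and the 512 MiB budget here are illustrative, not an actual shiori proposal:

var tempBudget = semaphore.NewWeighted(512 << 20) // total bytes of temp space allowed

// downloadWithBudget blocks until `size` bytes of the budget are free, runs the
// download, then returns the reservation.
func downloadWithBudget(ctx context.Context, size int64, download func() error) error {
    if err := tempBudget.Acquire(ctx, size); err != nil {
        return err
    }
    defer tempBudget.Release(size)
    return download()
}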

@Oppen
Contributor

Oppen commented Aug 27, 2024

I sent a new PR that I think implements that idea. Testing and review welcome.

@github-project-automation github-project-automation bot moved this from To do to Done in Roadmap Aug 28, 2024