
shiori update doesn't clear cache after it's finished, maxing out /tmp/ dir. #966

Closed
Skivling opened this issue Aug 20, 2024 · 20 comments · Fixed by #969 · May be fixed by #972
Labels
type:bug Something isn't working

Comments

@Skivling

Data

  • Shiori version: 1.7.0
  • Database Engine: SQLite (the default)
  • Operating system: Fedora GNU/Linux
  • CLI/Web interface/Web Extension: CLI

Describe the bug / actual behavior

I ran shiori update 500-600 to update the descriptions, archives and images for 100 bookmarks. After it completed, the last few hadn't worked because they had 'run out of storage'. I have 50GB free on my disk, so I didn't know what that meant. It turns out it had filled up the /tmp/ directory, which I think is RAM-backed (tmpfs). This resulted in my computer running very slowly until I restarted and it was cleared.

Expected behavior

After a bookmark update is finished, shiori shouldn't need anything in /tmp/ anymore and should clear out its temporary files.

To Reproduce

Steps to reproduce the behavior:

  1. Run shiori update on a certain number of entries, depending on how much RAM you have; e.g. it took about 100 entries to fill up my 4GB of RAM.
  2. Notice that updates may start failing with an error about not enough space being available.
  3. Stop the command.
  4. See lots of files in /tmp/ prefixed with either archive or image and suffixed by random numbers. Notice that the /tmp/ directory has little or no space left in it.
@Skivling Skivling added the type:bug Something isn't working label Aug 20, 2024
@github-project-automation github-project-automation bot moved this to To do in Roadmap Aug 20, 2024
@Oppen
Contributor

Oppen commented Aug 24, 2024

I came here to report exactly this. I suspect it has to do with a mismatch between the packages used to create and remove the files. In these lines the file is created with the stdlib, but the afero abstraction is used for cleanup instead. That abstraction provides its own TempFile utility, so I suspect it may not touch files it doesn't own.
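
A minimal sketch of the suspected mismatch, with the data-dir path borrowed from the strace output later in this thread (this is not shiori's actual code; it only reproduces the create/remove asymmetry):

package main

import (
    "log"
    "os"

    "github.com/spf13/afero"
)

func main() {
    // The temp file is created with the standard library, so it lands in the
    // global temp dir, e.g. /tmp/archive123456789.
    tmp, err := os.CreateTemp("", "archive")
    if err != nil {
        log.Fatal(err)
    }
    tmp.Close()

    // Cleanup goes through an afero Fs rooted at the data directory, so the
    // same name resolves under the data dir instead; the removal fails with
    // ENOENT and the real file in /tmp is leaked.
    dataFS := afero.NewBasePathFs(afero.NewOsFs(), "/srv/shiori/data")
    if err := dataFS.Remove(tmp.Name()); err != nil {
        log.Printf("remove failed, temp file leaked: %v", err)
    }
}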

@Oppen
Contributor

Oppen commented Aug 24, 2024

It looks like there's also a double delete 🤔
That defer would run the removal twice.
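
For illustration only (simplified, with the afero import assumed), the double-delete pattern being pointed at looks roughly like this: a deferred removal plus an explicit one, so the second call just gets an ErrNotExist.

// processArchive sketches the pattern: both the defer and the explicit call
// try to remove the same temp file.
func processArchive(fs afero.Fs) error {
    f, err := afero.TempFile(fs, "", "archive")
    if err != nil {
        return err
    }
    defer fs.Remove(f.Name()) // removal #1: runs on every return path

    // ... write the archive contents here ...

    f.Close()
    return fs.Remove(f.Name()) // removal #2: redundant with the defer above
}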

@Oppen
Contributor

Oppen commented Aug 25, 2024

OK, I can confirm the cause of the issue. I ran it under strace and got this sequence:

pwrite64(8</tmp/archive1692607475>, "\16\0\0\0\0\0\0\0\20\0\3\0\0\0\0\0\30\0\0\0\0\0\0\0\32\0\0\0\0\0\0\0"..., 4096, 5
unlinkat(AT_FDCWD</srv/shiori/bin>, "/srv/shiori/data/tmp/archive1692607475", 0) = -1 ENOENT

As seen, going through the abstraction constructs a different path, because the Fs member is rooted at the shiori data dir, while os.CreateTemp uses the global /tmp directory.
I may send a PR with a fix tomorrow. It's simple, but I need to try it first, and for reasons I'm too lazy to explain that takes time on my side.
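
One way to keep the paths consistent, sketched here as a guess rather than what #969 actually changes, is to create and remove the temp file through the same afero Fs so both operations resolve to the same location:

func withTempArchive(fs afero.Fs, fn func(afero.File) error) error {
    tmp, err := afero.TempFile(fs, "", "archive")
    if err != nil {
        return err
    }
    defer fs.Remove(tmp.Name()) // single removal, same Fs, same resolved path
    defer tmp.Close()           // runs before the removal (defers are LIFO)

    return fn(tmp)
}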

@Oppen
Contributor

Oppen commented Aug 25, 2024

BTW, using git blame I saw that the lack of a removal on the success path was intentional, although based on faulty logic: the original author of that piece seems to have believed they could resume where they left off if the file persisted, but that's not really true, because they don't know the random component of the file name.

@Oppen
Contributor

Oppen commented Aug 25, 2024

I can confirm #969 fixed it for me. @Skivling, if you wish, you could try my branch.
To build, run:

go build -tags osusergo,netgo,fts5 main.go

Then run the main binary produced as you would run shiori, or replace (after backing up) your shiori binary with it.

@Skivling
Author

To quickly back up shiori before this, do I need to copy a directory, the database file, or something else? Then I can build and test that version.

@Oppen
Contributor

Oppen commented Aug 25, 2024

I would say only copy the binary, but if you want to be extra cautious, maybe copy the data dir.

@Skivling
Author

It didn't work - I cloned your forked repo, switched to the branch, and built it with the command. I ran it using the resulting binary (./main instead of shiori), but it still crashed my computer once /tmp was full and nothing else could manage to run.

@Oppen
Contributor

Oppen commented Aug 26, 2024

What do the files look like for you? Mine were archive<random_string>. Those seem to appear and disappear as expected on my system.

@Oppen
Contributor

Oppen commented Aug 26, 2024

Looking more closely, I think I understand why I didn't solve your issue. I hit the leak itself while running shiori add (I found the issue while migrating from Wallabag). The update command has a second issue: it fires up an unbounded number of concurrent goroutines to update the articles, each storing a file in /tmp. Your RAM gets full not because of the leak, but because of the concurrency.

@Monirzadeh I think a reasonable fix would be to have an env var to limit concurrency here, what do you think? If that's OK with you I may make a PR. I can first do a test version for @Skivling to test.

@Oppen
Contributor

Oppen commented Aug 26, 2024

Oh, there is a semaphore already there, never mind. I still think the concurrency is the issue, though. Maybe the archives are individually too big. Concurrency is currently set to 10 simultaneous archives.
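
For reference, the general shape of the pattern being discussed, using a buffered channel as a counting semaphore (Bookmark and updateBookmark are placeholders rather than shiori's actual types, and sync is assumed imported; shiori's own semaphore is currently capped at 10):

func updateAll(bookmarks []Bookmark, maxConcurrent int) {
    sem := make(chan struct{}, maxConcurrent)
    var wg sync.WaitGroup

    for _, b := range bookmarks {
        wg.Add(1)
        go func(b Bookmark) {
            defer wg.Done()
            sem <- struct{}{}        // blocks while maxConcurrent updates are in flight
            defer func() { <-sem }() // frees a slot when this update finishes
            updateBookmark(b)        // downloads the archive and images into temp files
        }(b)
    }
    wg.Wait()
}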

@Oppen
Contributor

Oppen commented Aug 26, 2024

@Skivling this branch on my fork should be useful to confirm or reject the hypothesis: test/limit_concurrency.
Same procedure as before. What to expect: if I'm right, the operation should be slow (as it is sequential) but succeed without crashes (because it lowers the peak number of simultaneous temporary files to one archive plus its images at a time).

@Skivling
Author

The limit-concurrency branch worked 🎉, although now I think I didn't switch branches properly the first time and actually just built the main branch, because I only ran git branch (which creates a branch) instead of git checkout (which switches to it). I'll try the original fix again in case it fixes it as well; then you can figure out what to merge.

@Skivling
Author

Ok. The limit-concurrency branch was required to fix the issue.

a reasonable fix would be to have an env var to limit concurrency here

Setting it as an env variable would be good. On my system I'd maybe set it to 3 so it isn't as slow but still avoids the issue. I'd be happy to test a new branch implementing it.

What do the files look like for you? Mine were archive<random_string>. Those seem to appear and disappear as expected on my system.

Yes that's what mine are like.

@Oppen
Contributor

Oppen commented Aug 27, 2024

@Monirzadeh @fmartingr since adding a config flag may be invasive I'd like to have your input before doing so.
The idea would be to simply add an env var named SHIORI_MAX_CONCURRENT_DL or something like that, default it to 10 (the current default) if unset, and use that number as the limit for the semaphore on the update command.
Do note the other fix is still necessary because the temp files were leaking.
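
A sketch of how that knob could look (the variable name and default follow the proposal above; none of this is merged shiori code, and os and strconv are assumed imported):

// maxConcurrentDownloads reads SHIORI_MAX_CONCURRENT_DL and falls back to the
// current hardcoded semaphore size of 10 when the variable is unset or invalid.
func maxConcurrentDownloads() int {
    if v := os.Getenv("SHIORI_MAX_CONCURRENT_DL"); v != "" {
        if n, err := strconv.Atoi(v); err == nil && n > 0 {
            return n
        }
    }
    return 10
}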

@Oppen
Contributor

Oppen commented Aug 27, 2024

Re: the FS abstraction. I would add another domain that represents the tmp filesystem, which may or may not be based on the data dir depending on config. By default it should still be /tmp, because that's what most people would expect, but it could, for example, be an on-disk directory to avoid filling up the RAM in constrained environments. What do you think?
If I get the time I may try that, but I'd still much rather have the other fix merged first so we have something working.
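
A rough sketch of that tmp-filesystem domain, assuming afero stays in place (the function and the configured directory are hypothetical, not an existing shiori API):

// newTempFs returns a dedicated Fs for temporary files. With no configured
// directory it falls back to the OS temp dir (usually /tmp), but it can be
// pointed at an on-disk directory on RAM-constrained systems.
func newTempFs(configuredDir string) afero.Fs {
    dir := configuredDir
    if dir == "" {
        dir = os.TempDir()
    }
    return afero.NewBasePathFs(afero.NewOsFs(), dir)
}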

@fmartingr
Member

Hey @Oppen, I'm good with your two proposals. Notes:

  • On the first one (concurrency) I would limit to the number of CPU cores by default, optionally modifying it using the proposed env var.
  • On the second one (temp fs), just fixing it using the os package seems fine for now; no need to add more complexity until we need it. I think there's already a PR for it, but I have a ton of notifications to go through.

@Oppen
Contributor

Oppen commented Aug 27, 2024

  • On the first one (concurrency) I would limit to the number of CPU cores by default, optionally modifying it using the proposed env var.

IIUC, there are two different kinds of concurrency at play here. What the semaphore limits is asynchronous processing, because it limits the number of READY goroutines (every other goroutine performing a download will be PARKED waiting on the channel), which the Go runtime multiplexes whenever they block on IO.
There is already a default limit on parallelism set by GOMAXPROCS, which matches the number of CPUs by default. Should I default to match that one unless explicitly set otherwise? Currently the limit is hardcoded to 10, which is why I suggested that as a default.
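
If matching Go's own limit is preferred, the default could simply query it (a hypothetical helper; runtime is assumed imported):

// defaultDownloadLimit matches the effective GOMAXPROCS value, which defaults
// to the number of CPUs; passing 0 reads the setting without changing it.
func defaultDownloadLimit() int {
    return runtime.GOMAXPROCS(0)
}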

  • On the second one (temp fs), just fixing it using the os package seems fine for now, no need to add more complexity until we need it. I think there's a PR already for it, I have a ton of notifications to go through.

I'm all for avoiding/postponing complexity, but I understood the abstraction was the preferred way.

@Oppen
Contributor

Oppen commented Aug 27, 2024

Another possibility would be to limit on what we actually care about, which is the number/size of temporary files. It would be more complex, but if it sounds sensible I may give it some thought.
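
Sketching that idea with a weighted semaphore from golang.org/x/sync/semaphore: every download reserves its (approximate) size before writing a temp file, so total temporary space stays under a budget regardless of how many goroutines exist. All names and the 512 MiB budget here are illustrative, not an actual shiori proposal:

var tempBudget = semaphore.NewWeighted(512 << 20) // total bytes of temp space allowed

// downloadWithBudget blocks until `size` bytes of the budget are free, runs the
// download, then returns the reservation.
func downloadWithBudget(ctx context.Context, size int64, download func() error) error {
    if err := tempBudget.Acquire(ctx, size); err != nil {
        return err
    }
    defer tempBudget.Release(size)
    return download()
}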

@Oppen
Contributor

Oppen commented Aug 27, 2024

I sent a new PR that I think implements that idea. Testing and review welcome.

@github-project-automation github-project-automation bot moved this from To do to Done in Roadmap Aug 28, 2024