Thread Exception when Downloading #81

Open
naortega opened this issue Oct 9, 2016 · 13 comments
@naortega

naortega commented Oct 9, 2016

Forgive me if this is a duplicate, but I don't believe I've seen it anywhere when passing through the issues.

Here's the error:

minamoto-kun.monogatari.[mh].c029: 100% |######################| ETA:  00:00:00

Exception in thread Thread-21:031:  60% |#############         | ETA:  00:00:09
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/nicolas/bin/manga_downloader/src/parsers/base.py", line 403, in run
    raise FatalError("Thread crashed while downloading chapter: %s" % str(exception))
FatalError: Thread crashed while downloading chapter: CRC check failed 0xe0e6dd4f != 0xfeb86693L

At the same time I also get the following error, which I am not sure if it is related:

Exception in thread Thread-27:033:  11% |##                    | ETA:  00:00:00
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/nicolas/bin/manga_downloader/src/parsers/base.py", line 403, in run
    raise FatalError("Thread crashed while downloading chapter: %s" % str(exception))
FatalError: Thread crashed while downloading chapter: Error -3 while decompressing: invalid distance too far back

The Python version I'm running this on is the default for my system (Debian Testing):

nicolas@pulse $ python --version
Python 2.7.12+

It also seems that although it gives this error it continues to download the next files in the background (not showing any progress), and also continues to have errors every now and then with other files.

I hope this is helpful.

@CharlieCorner
Contributor

Do you have the exact steps you followed before you found the exception? I've never seen it before, what site were you trying to download from? What manga and what chapters? Any particular arguments you passed on to the application?

@naortega
Author

First, I'm using the code from the following git commit: 13e4eaa4c23ce99a5e5c0ddffbb5e928a34aa991, since the code from the last release is old af.

As for what I did, here's what it is:

nicolas@pulse $ manga_downloader "minamoto-kun monogatari"

Program: Copyright (c) 2010. GPL v3 (http://www.gnu.org/licenses/gpl.html).
Icon:      Copyright (c) 2006. GNU Free Document License v1.2 (Author:Kasuga).
           http://ja.wikipedia.org/wiki/%E5%88%A9%E7%94%A8%E8%80%85:Kasuga

minamoto-kun monogatari
Which site?
(1) MangaFox
(2) MangaReader
(3) MangaPanda
(4) MangaHere
(5) EatManga
(6) Batoto
4
Beginning MangaHere check: minamoto-kun monogatari
(1) c001
(2) c002
(3) c003
...
(193) c193
(194) c194

Download which chapters?
190-194
...

On the last ... is where it starts downloading and every now and then gives me the error with one of the downloads (it seems quite random to me). If I limit the number of threads to 1 it doesn't seem to give me any errors (which isn't surprising since this is a thread exception I'm getting, and I can't get one if the program doesn't use threads).

Just in case you were wondering, the manga_downloader command is a bash alias I made to the file, literally as follows:

alias manga_downloader="~/bin/manga_downloader/src/manga.py"

That is, I run the Python file as an executable (since it has executable permissions and the file starts with #!/usr/bin/env python, which tells my shell which interpreter to use). Should I change my alias to use python3?

@CharlieCorner
Contributor

@jiaweihli can correct me if I'm wrong, but while there is (was?) an effort to migrate manga_downloader to Python 3, it is my understanding that the migration was never completed; Python 2 continues to be the supported interpreter.

As for your bug, and prepare yourself for a long post, I'm not able to reproduce it.

You do mention that this happens randomly and by looking at your traces it doesn't look like this is related to thread handling code, but rather, something is happening in the underlying code; quite possibly related to a ZIP file getting corrupted somewhere and for some reason. Both CRC check failed 0xe0e6dd4f != 0xfeb86693L and Error -3 while decompressing: invalid distance too far back are the actual culprits of your problem.
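For reference, the wording of both messages matches what Python's gzip and zlib modules report on damaged compressed data, as far as I can tell. Below is a minimal Python 3 sketch (the application itself runs on Python 2.7, where `gzip.compress` and `gzip.decompress` do not exist) that assumes the payload was damaged or cut short in transit. The names `original`, `payload` and `describe` are made up for illustration and are not part of manga_downloader:

```python
import gzip
import zlib

# Build a small, valid gzip payload to stand in for a downloaded response.
original = b"page data " * 1000
payload = gzip.compress(original)

# Simulate corruption: flip one bit of the CRC-32 stored in the 8-byte trailer.
corrupted = bytearray(payload)
corrupted[-8] ^= 0x01

# Simulate a dropped connection: keep only the first half of the stream.
truncated = payload[: len(payload) // 2]


def describe(data):
    """Return "ok" if data decompresses cleanly, else the error that was raised."""
    try:
        gzip.decompress(bytes(data))
        return "ok"
    except (OSError, EOFError, zlib.error) as error:
        return "%s: %s" % (type(error).__name__, error)


for label, data in (("intact", payload), ("corrupted", corrupted), ("truncated", truncated)):
    print(label, "->", describe(data))
```

If this assumption holds, the exceptions would come from decoding a damaged response rather than from the threading code itself.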

Which brings me to the next point. I think you've found a bigger issue with the current code: we're not able to correctly debug the application even in edge cases because we're silencing all Exceptions in the DownloadChapterThread code.

The code that is raising this FatalError is the following, in base.py lines 391-403 of the DownloadChapterThread class:

    def run(self):
        try:
            self.siteParser.processChapter(self, self.chapter)
        except Exception as exception:
            # Assume semaphore has not been released
            # This assumption could be faulty if the error was thrown in the compression function
            # The worst case is that releasing the semaphore would allow one more thread to
            # begin downloading than it should
            #
            # If the semaphore was not released before the exception, it could cause deadlock
            chapterThreadSemaphore.release()
            self.isThreadFailed = True
            raise FatalError("Thread crashed while downloading chapter: %s" % str(exception))

If an Exception is raised in any part of the underlying code in self.siteParser.processChapter(self, self.chapter), it will be silenced by the code on line 403, where we raise FatalError: only the exception's message survives, and the original traceback is lost.
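One possible way to keep that information is to record the full traceback text inside the thread before doing anything else. This is only a sketch under my own assumptions, not the project's code: `DownloadThread`, `work` and `failure_details` are hypothetical names, and here the thread acquires and releases the semaphore itself, which differs from how DownloadChapterThread handles it today.

```python
import threading
import traceback


class DownloadThread(threading.Thread):
    """Hypothetical, simplified stand-in for DownloadChapterThread."""

    def __init__(self, work, semaphore):
        threading.Thread.__init__(self)
        self.work = work                # callable that does the download
        self.semaphore = semaphore      # limits how many threads run at once
        self.isThreadFailed = False
        self.failure_details = None     # full traceback text, if any

    def run(self):
        # The "with" block releases the semaphore on success and on failure,
        # so there is no need to guess whether it was already released.
        with self.semaphore:
            try:
                self.work()
            except Exception:
                self.isThreadFailed = True
                # format_exc() keeps the original exception type, message
                # and every stack frame, not just str(exception).
                self.failure_details = traceback.format_exc()


def failing_download():
    raise IOError("CRC check failed")


if __name__ == "__main__":
    limiter = threading.Semaphore(2)
    thread = DownloadThread(failing_download, limiter)
    thread.start()
    thread.join()
    if thread.isThreadFailed:
        print(thread.failure_details)
```

The main thread can then decide what to do with `failure_details` (log it, retry the chapter, or abort) instead of seeing only the one-line message.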

While we may not be able to further troubleshoot your problem at least for now I believe we should tag your issue as a bug and proceed to open another issue to take care of the silenced exceptions, maybe then we'll be able to get more information on what is going on.

If nobody has any objection, I'll proceed to document that in another ticket.

@naortega
Author

I just remembered another issue that may have to do with this (I don't know how your file streaming works, so it's hard for me to say this is part of the problem). My connection to the internet from my room is rather iffy; sometimes it cuts off completely and unexpectedly and then goes back to 100%. Could any slight cuts in the connection like this cause the file stream to be corrupted, or in general cause issues with an individual thread? As I said, if one thread has an error downloading it gives me this error, but the other threads seem to continue.

@jklmli
Owner

jklmli commented Oct 11, 2016

My hunch is that you're seeing high packet loss, and that this is causing issues when creating an archive.

@naortega
Author

@jiaweihli, that seems like it would be it, but that shouldn't be a problem if these packets were transferred over TCP (which would be normal considering it's a file transfer). I'm not sure as to the workings of manga_downloader (mostly 'cause I don't know any real Python), but it should be able to use TCP.

Of course, this is all assuming that the issue is packet loss.

@naortega naortega added the Bug label Nov 7, 2016
@jklmli
Owner

jklmli commented Nov 7, 2016

If your internet is cutting out intermittently, it's possible that the connection to the server is getting closed and you're unable to resume the original connection.

In days past when I had a wonky internet setup, I would sometimes end up with incomplete/truncated images. I'm not sure if you're experiencing a similar problem, since I never ran into an issue building the archive.
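A rough way to spot truncated images after a download is to check the end-of-file markers that JPEG and PNG define. The sketch below is a heuristic only, and `looks_complete` is a hypothetical helper, not something manga_downloader provides; some valid JPEG files carry extra bytes after the end marker and would be reported as incomplete.

```python
# JPEG files start with the SOI marker and end with the EOI marker.
JPEG_START = b"\xff\xd8"
JPEG_END = b"\xff\xd9"

# PNG files start with a fixed 8-byte signature and end with an IEND chunk:
# a zero length field, the type "IEND", and the CRC of that type.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
PNG_TRAILER = b"\x00\x00\x00\x00IEND\xaeB`\x82"


def looks_complete(data):
    """Return True/False for JPEG and PNG data, or None for other formats."""
    if data.startswith(JPEG_START):
        return data.endswith(JPEG_END)
    if data.startswith(PNG_SIGNATURE):
        return data.endswith(PNG_TRAILER)
    return None  # unknown format, no opinion


if __name__ == "__main__":
    whole = JPEG_START + b"image bytes" + JPEG_END
    cut_off = whole[:-5]
    print(looks_complete(whole), looks_complete(cut_off))
```

A check like this could run before the archive is built, so a bad page is re-downloaded instead of ending up in the chapter.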

Fixing #82 will help diagnose this issue.

@naortega
Author

naortega commented Nov 7, 2016

@jiaweihli, I'm not getting any truncated images, just this thread exception. It seems to have stopped since I created an alias and now it only runs on one thread, but that isn't necessarily fixing the problem but rather avoiding it ('cause I want to download my next chapter of manga). So yeah, it's just the exception.

@jklmli
Owner

jklmli commented Dec 29, 2016

Fixed in #90

@jklmli jklmli closed this as completed Dec 29, 2016
@jklmli jklmli reopened this Dec 29, 2016
@jklmli
Owner

jklmli commented Dec 29, 2016

Whoops, not fixed - but should be easier to debug.

@naortega
Author

So for quite a while now I've been using only a single thread for downloading manga, which has worked just fine (with my shitty internet), however recently I got a PLC so I can have a much more consistent connection. I'll try using multi-threaded again and see if it works fine, which should also help to see where this is coming from.

Have a great New Year's!

@jklmli
Owner

jklmli commented Dec 29, 2016

@Deathsbreed
Thanks for keeping us updated. I'm actually really curious what's wrong, this is a puzzler 🤔

And happy holidays! 🎉

@jtara1

jtara1 commented May 9, 2017

This is still an issue, I think it may be the same issue anyways:

Thread crashed while downloading chapter: tokyo ghoul - 39
Exception in thread Thread-42:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 801, in __bootstrap_inner
    self.run()
  File "C:\Users\James\Documents\_Github-Projects\manga_downloader\src\parsers\base.py", line 395, in run
    self.siteParser.processChapter(self, self.chapter)
  File "C:\Users\James\Documents\_Github-Projects\manga_downloader\src\parsers\base.py", line 274, in processChapter
    self.downloadChapter(downloadThread, max_pages, url, manga_chapter_prefix, current_chapter)
  File "C:\Users\James\Documents\_Github-Projects\manga_downloader\src\parsers\mangafox.py", line 161, in downloadChapter
    self.downloadImage(downloadThread, page, pageUrl, manga_chapter_prefix)
  File "C:\Users\James\Documents\_Github-Projects\manga_downloader\src\parsers\base.py", line 156, in downloadImage
    img_url = self.__class__.re_getImage.search(source_code).group(1)
TypeError: expected string or buffer

The workaround is to disable multi-threading, as Deathsbreed suggested, by passing the optional -t CLI argument with a value of 1.

e.g.

python manga.py -t 1 "naruto"
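The TypeError in the traceback above is what `re` raises when `search` is given something that is not a string, for example None. Assuming `source_code` ends up as None when a page fetch fails (I have not confirmed that in the code), a guard along these lines would turn the crash into a value the caller can handle. `RE_GET_IMAGE` and `extract_image_url` are made-up names, and the pattern is only a placeholder for whatever each parser's re_getImage actually is:

```python
import re

# Placeholder pattern for illustration; each real parser defines its own.
RE_GET_IMAGE = re.compile(r'<img src="([^"]+)"')


def extract_image_url(source_code):
    """Return the first image URL in source_code, or None if unavailable."""
    if source_code is None:
        # The page was never fetched, so there is nothing to search.
        return None
    match = RE_GET_IMAGE.search(source_code)
    if match is None:
        # The page was fetched but does not contain the expected tag.
        return None
    return match.group(1)


if __name__ == "__main__":
    print(extract_image_url('<img src="http://example.com/p1.jpg">'))
    print(extract_image_url(None))
```

The caller could then retry the page or mark the chapter as failed when it gets None back.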


4 participants