eating up all my memory #79
Comments
I was just about to post the same here. I was running a previous version (2.2.18) until a few days ago, when it stopped working (the problem started on Monday the 13th), so I upgraded to 2.2.20, which does not complete either. This suggests some problem with the data itself. Perhaps some infinite loop? |
Same thing happened again this morning. Attached the log file. |
Forgot to mention that I renamed the cache, just as a test to see whether it would make a difference, since the cache was about 650 MB in size. |
I do not know what is happening, but there were some issues with vpro and primo. If you upgrade to 2.2.21 those are addressed through sourcematching. Possibly there are also some issues with oorboekje. Anyhow a limited test in fast mode (not using rtl, vrt and oorboekje) runs OK. I'm now testing slow mode and will test some channels on the other sources. |
The last version doesn't seem to fix Primo and VPRO issues. I can't attach a zip-file to this message, although it is mentioned as supported type.... (tar.gz doesn't work either) I use this as script:
I hope this may help to debug the problems. |
First I suggest you run --configure to update your configuration. The last time was April 30th last year with version 2.2.14. Several channelids (the ids for the separate sources) have changed, been removed or are new! Then try again with that updated configuration. |
Tried that, and still the swapping starts in no time. |
I'll see what happens if I use your channel set, which, aside from radio and regional, is almost the same as mine. It differs only in Fox and History, so possibly the problem lies there. But I run version 3 on my production system. My working/testing machine has 24 GB, so I don't expect swapping. |
Well, it starts swapping in seconds. I could take out Fox and the History channel and see what happens. |
I'm now up to 5+ GB swap. I shut down one of my two VMs, but the memory that freed up is almost filled now. I am thinking of closing this VM too. Swap is up to 9.2 of 15 GB and not growing anymore. |
It stayed for quite some time at the full 24 GB plus some 7.5 GB swap. Then suddenly the log started running again, and now it is stationary at 13.7 GB swap. I closed everything, so it's using the full 24 GB minus what the system and X use. I never saw anything like this. I'm now using my laptop and I'll leave it running for now. It must be one of the html sources that goes into some weird loop. If I have to kill it, I will run it for half the days. |
Here's a log of mine that ends when I had to reboot my computer after it locked up. This grab had primo.eu disabled. Not sure if it will give you an idea on which source caused it. |
My guess is tvgids.tv. They have had weird stuff before. The way things go there, I think the various channels are divided among several "managers" and some are sloppy, like putting descriptions in subtitles, which happens only on some channels, and there most of the time. They also regularly have malformed tags, sometimes so malformed that browsers, which are quite good at correcting, display nonsense. |
I'll leave it running tonight, and longer if it does not run out of swap (2 GB left). |
I did break it off and am now running the same fetch, but for 7 instead of 14 days. I also updated your config, which I hadn't done yet, removed matchlogging and added a debug key to get more output. And it seems to run normally. Memory usage stays at 606 MB and it is happily fetching detail pages. |
I just tried a grab with tvgids.tv disabled, it still locked up. |
So unless it has to do with those minor configuration changes, it almost certainly must be tvgids.tv, as only they, horizon, rtl, vrt and nieuwsblad give data for longer than 7 days, and of those only tvgids.tv and nieuwsblad.be are html. |
Try disabling horizon. It could be that something goes wrong with the last page detection and that it keeps fetching pages, although that should not fill up memory. |
It however could explain why version 3 is not affected as it uses a different mechanism. |
Is it possible that the parser creates some loop? |
The culprit is vrt.be. It starts happening when you fetch 9 days or more. |
I also have this problem. I can reliably avoid the problem by removing the Belgian channel Eén from my configuration, and bring it back by uncommenting that line again: Eén;2;0-5;5;een;;;;24443943058;22;een;een;een;O8;;;4;een_v2.png Another observation: when this runaway memory issue happens, all other processes are forced out of physical memory and onto swap, but tv_grab_nl.py itself does not seem to allow its memory to be swapped out. So regardless of how much swap is available, the machine grinds to a halt once tv_grab_nl.py hits the physical memory barrier. I have observed this on two separate machines (ubuntu 12.04 with 4GB physical memory and ubuntu 16.04 with 8GB). |
Yes, when it reaches day 9 it suddenly starts adding 400 MB/s. |
So the fast solution is to either limit your grab to 8 days or remove O8 from the channelstring for Één in your configuration. Do not remove any of the ";". |
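For anyone who wants to script that edit, here is a minimal sketch. The helper name `blank_source_id` is made up for illustration and is not part of tv_grab_nl_py. It empties every field that equals the given id and leaves all ";" separators in place, so the number of fields does not change.

```python
def blank_source_id(channel_line, source_id):
    """Empty every field equal to source_id, keeping all ';' separators."""
    fields = channel_line.split(";")
    # Replace only exact matches, so "O8" inside a longer value is untouched.
    cleaned = ["" if field == source_id else field for field in fields]
    return ";".join(cleaned)


# Channel line as quoted earlier in this thread.
line = "Eén;2;0-5;5;een;;;;24443943058;22;een;een;een;O8;;;4;een_v2.png"
fixed = blank_source_id(line, "O8")
print(fixed)
```

Note that this blanks any field with exactly that value, so check the result before saving it to a real configuration file.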
Possibly something goes wrong with filling up the night loop. Those programs are only listed once and marked for repeat. I then fill in that loop. |
I just commented out the line for Een and indeed, it seems to work like normal. Glad some work-around is found. |
You do not need to remove the whole line, only the channelid for vrt.be "O8" |
It was just a quick test, but good to know that clearing those 2 letters will be enough. |
For those who chose the --days 7 workaround instead of disabling vrt entirely, will this be a problem about 2 days from now when the day that's causing problems is in that 7 day range? |
Yes, it looks like it. It's now past midnight and the issue now also shows when setting days = 8. Also, if I now set the offset higher than 6, it works again. So it seems to be isolated to the data from Saturday 25th. |
Correction: Sunday 26th. |
Is it isolated to a specific channel like Een or is it for everything from VRT on that day? |
That day is the time change. Any chance that's the cause for the bad data, missing an hour of overnight causing a bad calculation? |
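To illustrate the missing hour, here is a minimal sketch with made-up programme times. It assumes the switch in question is the one on Sunday 26 March 2017 (the year is an assumption, the thread only gives the day). It compares the wall-clock difference with the real elapsed time for a programme that spans the change from UTC+1 to UTC+2.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets for Belgian local time before and after the switch.
CET = timezone(timedelta(hours=1))   # winter time, UTC+1
CEST = timezone(timedelta(hours=2))  # summer time, UTC+2

# Hypothetical programme spanning the switch (times are made up).
start = datetime(2017, 3, 26, 1, 30, tzinfo=CET)
stop = datetime(2017, 3, 26, 3, 10, tzinfo=CEST)

# Subtracting aware datetimes compares the underlying UTC instants.
elapsed = stop - start

# Subtracting the bare clock readings ignores the offset change.
wall_clock = stop.replace(tzinfo=None) - start.replace(tzinfo=None)

print("elapsed:   ", elapsed)
print("wall clock:", wall_clock)
print("difference:", wall_clock - elapsed)
```

A calculation that mixes the two views, or a conversion that uses the wrong offset, ends up an hour off for everything after the switch.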
Canvas is going OK. I'm thinking it might be related to the start of summer time. The swap from +1 to +2 happens at the end of the last day that comes in correctly. |
The swap happens during a regular program before the start of the nightloop, but that is not filled for the last day of a fetch. Anyhow, I see multiple irregularities. The news directly following it is missing, and on version 3 the programme itself too. The weird thing is that it does not happen on Canvas, and it also did not happen last year or last fall. |
I now see it, they messed up the timings. They use GMT on their site but made an error translating it from CET:
Obviously something goes wrong in our program on continuing the next day and filling up the night loop with those last two programmes. |
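For readers wondering how a single bad item could eat memory, here is a hypothetical sketch. It is not the actual tv_grab_nl_py code, and `fill_loop` and its parameters are invented for illustration. It repeats a list of programme lengths until a time window is full. If a length were zero or negative and the check at the top were missing, the cursor might never reach the end of the window and the list would keep growing.

```python
from datetime import datetime, timedelta


def fill_loop(window_start, window_end, durations):
    """Repeat the given programme lengths until the window is full."""
    # Guard: a zero or negative length would stop the cursor from advancing.
    if not durations or any(d <= timedelta(0) for d in durations):
        raise ValueError("every looped programme needs a positive length")
    slots = []
    cursor = window_start
    index = 0
    while cursor < window_end:
        length = durations[index % len(durations)]
        # Clip the last repeat so it ends exactly at the window end.
        stop = min(cursor + length, window_end)
        slots.append((cursor, stop))
        cursor = stop
        index += 1
    return slots


night_start = datetime(2017, 3, 26, 2, 0)
night_end = datetime(2017, 3, 26, 6, 0)
slots = fill_loop(night_start, night_end,
                  [timedelta(minutes=45), timedelta(minutes=30)])
print(len(slots), "slots, last one ends at", slots[-1][1])
```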
Is this just for Een or does it extend to the radio stations too? |
Don't know for sure, but I think they do not have that grouping of the night programmes. Canvas does, but that works fine. |
Ah, Canvas stops with the regular programming before the time jump. This is the nightloop programme
|
And Ketnet/Een+/Canvas+ shouldn't be a problem either, since overnight it's just a still image with audio from Ketnet Hits after the random program from Een+ or Canvas+. Should we put Een O8 from VRT in empty_channels in sourcematching.json as a temporary fix until after the time change, or are you going to fix this in the code? |
They always update the timings on the day itself, so possibly they correct it next Saturday. |
We can mask the channelid in sourcematching till next Sunday by putting it in empty programs. |
We really think in sync tonight! ;-) |
No need to do it for Version 3 though. Although it does not handle it perfectly, it throws out the item with negative length before filling the night loop. |
Please disregard the part of my previous comment that talked about the lack of swapping. It seems that I had set /proc/sys/vm/swappiness to 1 on that machine, which isn't exactly default. I set it back to 60 now. |
Perhaps I misunderstand, but is there not a weakness in the
currently-released code? Although the weakness can only be triggered by
incorrect input data, this episode has shown that it can indeed be
triggered. The consequences are quite severe, in that when this happens
the machine is brought down due to severe memory pressure.
So I would recommend still fixing this in version 2 as well.
|
I'm thinking about that. But this only applies to vrt.be and is due to the winter/summertime switch and their use of GMT, so I have till this fall, or actually till next year! The easy solution would be to throw out any programme with a negative length (as happens in Version 3). But the data from vrt.be also has a length field, and it should be possible to detect any local time shift through the timezone data. But that would also mean shifting any following programmes. |
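The "easy solution" could look roughly like the sketch below. This is an illustration under assumptions, not the Version 3 code: the dictionary layout, the field names `start` and `stop`, and the function name `drop_invalid_programmes` are all hypothetical, and the sample data is made up.

```python
from datetime import datetime, timezone


def drop_invalid_programmes(programmes):
    """Keep only programmes whose stop time is strictly after their start."""
    kept = []
    for prog in programmes:
        if prog["stop"] > prog["start"]:
            kept.append(prog)
    return kept


utc = timezone.utc
# Made-up sample data; the middle entry has a negative length.
programmes = [
    {"name": "Late film",
     "start": datetime(2017, 3, 26, 0, 30, tzinfo=utc),
     "stop": datetime(2017, 3, 26, 1, 10, tzinfo=utc)},
    {"name": "Bad entry",
     "start": datetime(2017, 3, 26, 1, 10, tzinfo=utc),
     "stop": datetime(2017, 3, 26, 0, 40, tzinfo=utc)},
    {"name": "News",
     "start": datetime(2017, 3, 26, 1, 10, tzinfo=utc),
     "stop": datetime(2017, 3, 26, 1, 40, tzinfo=utc)},
]

clean = drop_invalid_programmes(programmes)
print([prog["name"] for prog in clean])
```

Dropping the item is the blunt option. Correcting it from the length field, as suggested above, would keep the programme but needs the following items shifted as well.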
I run tv_grab_py every day and have been for the last four years or so. Very happy with it. I'm now at version 2.2.20, running it at 6:00 am.
Since last weekend, the utility starts eating all my memory, bringing my machine to a grinding halt and eventually failing with a zero length output file.
I have removed the program_cache.db file but that did not help.
Attached is my config file.
tv_grab_nl_py.conf.gz