WAL recovery issue #37
Comments
Another WAL-related question: Is it expected to grow really fast? I don't scrape a lot of metrics (I'd guess 1000 metrics / 15s), but after 2 minutes the WAL file has already reached 11 MB in file size.
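For a rough sense of scale, the figures above can be turned into an average size per sample. This is only a back-of-the-envelope sketch: it assumes the guessed 1000 metrics per 15s scrape is accurate, that 11 MB means 11,000,000 bytes, and that every scrape in the 2 minutes was written to the WAL. The function name `bytesPerSample` is made up for this illustration.

```go
package main

import "fmt"

// bytesPerSample estimates the average on-disk WAL size of one sample,
// given the total WAL size and how many samples were scraped in that time.
// All inputs are the rough figures from the report, not measured values.
func bytesPerSample(walBytes, samplesPerScrape, intervalSec, elapsedSec int) int {
	scrapes := elapsedSec / intervalSec // number of completed scrapes
	return walBytes / (scrapes * samplesPerScrape)
}

func main() {
	// 11 MB after 2 minutes at roughly 1000 metrics every 15 seconds.
	fmt.Println(bytesPerSample(11_000_000, 1000, 15, 120), "bytes per sample")
}
```

Under those assumptions, each sample costs on the order of a kilobyte in the WAL, which is far more than a timestamp and a float64 value need, so the growth is most likely down to the row encoding rather than the scrape volume.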
@weeco Thanks for giving it a quick try, and good catch! Let me look into what's going on.
To be honest, the current WAL format prioritizes simplicity. I'm in the middle of looking for another format that reduces the size without slowing down writing and recovery. Anyway, your testing and report are a big boost for me. A big thank you goes out to you!
It turns out the root cause is that the unit to be flushed to disk is now left to I'll fix it ASAP, but until then, you can possibly work around it by setting
I haven't solved this problem yet, but as of v0.3.1 I've made WAL recovery tolerate invalid rows. You should now be able to recover with the same settings as before (the last few lines may still be discarded though).
I tried this version with two of my previously failed cases (the WAL and disk files are stored on separate disks, so this is easy to test). The first one reports this issue upon starting:
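To illustrate what "allow invalid rows" can mean in practice, here is a minimal sketch of a recovery loop that keeps every complete record and drops an incomplete tail instead of failing. It is not tstorage's actual WAL format or code: the 4-byte length-prefixed framing and the names `encodeRecord` and `recoverRecords` are assumptions made up for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeRecord frames a payload as a 4-byte big-endian length followed by
// the payload bytes. This framing is hypothetical, chosen for illustration.
func encodeRecord(payload []byte) []byte {
	buf := make([]byte, 4+len(payload))
	binary.BigEndian.PutUint32(buf[:4], uint32(len(payload)))
	copy(buf[4:], payload)
	return buf
}

// recoverRecords returns every complete record found in data, plus the
// number of trailing bytes discarded because they did not form a full record.
func recoverRecords(data []byte) (records [][]byte, discarded int) {
	for len(data) > 0 {
		if len(data) < 4 {
			// Not even a full length header: treat the rest as a torn write.
			return records, len(data)
		}
		n := int(binary.BigEndian.Uint32(data[:4]))
		if n > len(data)-4 {
			// Header promises more bytes than are present: torn write.
			return records, len(data)
		}
		records = append(records, data[4:4+n])
		data = data[4+n:]
	}
	return records, 0
}

func main() {
	var wal []byte
	wal = append(wal, encodeRecord([]byte("cpu 1"))...)
	wal = append(wal, encodeRecord([]byte("mem 2"))...)
	wal = append(wal, encodeRecord([]byte("disk 3"))...)

	// Simulate an abrupt stop: cut the log in the middle of the last record.
	wal = wal[:len(wal)-3]

	records, discarded := recoverRecords(wal)
	fmt.Printf("recovered %d records, discarded %d trailing bytes\n", len(records), discarded)
}
```

The trade-off in a design like this is the one mentioned above: recovery succeeds after a crash, but whatever was written last may be lost. A checksum per record would additionally catch rows that are complete in length but corrupted in content.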
The second one threw a panic:
A WAL file written with v0.3.1 also caused the panic I pasted above.
Nice catch! I addressed the second one: https://github.com/nakabonne/tstorage/releases/tag/v0.3.2 But I'm not sure why the first one happens, as I'm not familiar with the Windows file system. Is there
@nakabonne Now I don't get an error on startup anymore, but at query/select time. The returned error says (I think the first part is from within Kowl):
The file does not exist, but I'm unsure why it doesn't exist. I'll keep observing it and I will test more as soon as WAL is a bit more stable. Given that I just run it in debug mode currently the missing/malfunctioning WAL functionality doesn't make it too much fun to test the TSDB atm, because it takes some time until the DB is filled with meaningful data :).
Seems like that partition is expired.
@nakabonne I think we made it! Stopping the application abruptly and restarting it works just fine at the moment. I'll keep playing around with it, but generally it seems to work now :). Thank you for your support!
Happy to hear that!
Hey @nakabonne ,
I appreciate your quick work on the WAL, that's awesome! I just tried it and apparently I ran into an issue with the recovery from the WAL file. I let it run for ~2 minutes, during which it added a few metrics to the TSDB, and I noticed that the WAL file was also written. During recovery the following error popped up:
My settings are:
I sent you the WAL file via email.