uncaught exceptions in audio captcha generation #238

anarcat · 2024-12-02T20:12:14Z

In our environment, we're getting spurious exceptions like this:

Oct 03 13:27:05 donate-01 donate[949]: /usr/bin/sox FAIL formats: can't open input file `/tmp/1a0e906c0bb38acf4526812068c1e497d90c1b4d_arbitrary.wav': No such file or directory
Oct 03 13:27:05 donate-01 donate[949]: [ERROR: django.request.log_response] Internal Server Error: /captcha/audio/1a0e906c0bb38acf4526812068c1e497d90c1b4d.wav
Oct 03 13:27:05 donate-01 donate[949]: Traceback (most recent call last):
Oct 03 13:27:05 donate-01 donate[949]:   File "/home/tordonate/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
Oct 03 13:27:05 donate-01 donate[949]:     response = get_response(request)
Oct 03 13:27:05 donate-01 donate[949]:                ^^^^^^^^^^^^^^^^^^^^^
Oct 03 13:27:05 donate-01 donate[949]:   File "/home/tordonate/venv/lib/python3.11/site-packages/django/core/handlers/base.py", line 197, in _get_response
Oct 03 13:27:05 donate-01 donate[949]:     response = wrapped_callback(request, *callback_args, **callback_kwargs)
Oct 03 13:27:05 donate-01 donate[949]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 03 13:27:05 donate-01 donate[949]:   File "/home/tordonate/venv/lib/python3.11/site-packages/captcha/views.py", line 197, in captcha_audio
Oct 03 13:27:05 donate-01 donate[949]:     os.remove(arbnoisepath)
Oct 03 13:27:05 donate-01 donate[949]: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/1a0e906c0bb38acf4526812068c1e497d90c1b4d_arbitrary.wav'
Oct 03 13:27:05 donate-01 donate[949]: 0.0.0.0 - "GET /captcha/audio/1a0e906c0bb38acf4526812068c1e497d90c1b4d.wav HTTP/1.1" 500 6628 45
Oct 03 13:27:05 donate-01 donate[949]: 0.0.0.0 - "GET /captcha/audio/1a0e906c0bb38acf4526812068c1e497d90c1b4d.wav HTTP/1.1" 200 33088 33
Oct 03 13:27:05 donate-01 donate[949]: 0.0.0.0 - "GET /captcha/image/a0f3b95e27cdadccd06aacb165004bf80ea4b97d/ HTTP/1.1" 200 6001 7
Oct 03 13:27:05 donate-01 donate[949]: 0.0.0.0 - "GET /captcha/audio/a0f3b95e27cdadccd06aacb165004bf80ea4b97d.wav HTTP/1.1" 206 32044 31
Oct 03 13:27:07 donate-01 donate[949]: 0.0.0.0 - "GET /captcha/audio/a0f3b95e27cdadccd06aacb165004bf80ea4b97d.wav HTTP/1.1" 200 32044 30
Oct 03 13:27:07 donate-01 donate[949]: /usr/bin/sox FAIL formats: can't open input file `/tmp/a0f3b95e27cdadccd06aacb165004bf80ea4b97d_arbitrary.wav': No such file or directory
Oct 03 13:27:07 donate-01 donate[949]: [ERROR: django.request.log_response] Internal Server Error: /captcha/audio/a0f3b95e27cdadccd06aacb165004bf80ea4b97d.wav
Oct 03 13:27:07 donate-01 donate[949]: Traceback (most recent call last):
Oct 03 13:27:07 donate-01 donate[949]:   File "/home/tordonate/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
Oct 03 13:27:07 donate-01 donate[949]:     response = get_response(request)
Oct 03 13:27:07 donate-01 donate[949]:                ^^^^^^^^^^^^^^^^^^^^^
Oct 03 13:27:07 donate-01 donate[949]:   File "/home/tordonate/venv/lib/python3.11/site-packages/django/core/handlers/base.py", line 197, in _get_response
Oct 03 13:27:07 donate-01 donate[949]:     response = wrapped_callback(request, *callback_args, **callback_kwargs)
Oct 03 13:27:07 donate-01 donate[949]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 03 13:27:07 donate-01 donate[949]:   File "/home/tordonate/venv/lib/python3.11/site-packages/captcha/views.py", line 197, in captcha_audio
Oct 03 13:27:07 donate-01 donate[949]:     os.remove(arbnoisepath)
Oct 03 13:27:07 donate-01 donate[949]: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/a0f3b95e27cdadccd06aacb165004bf80ea4b97d_arbitrary.wav'
Oct 03 13:27:07 donate-01 donate[949]: 0.0.0.0 - "GET /captcha/audio/a0f3b95e27cdadccd06aacb165004bf80ea4b97d.wav HTTP/1.1" 500 6628 24
[...]
Oct 03 14:31:41 donate-01 donate[949]: /usr/bin/sox FAIL formats: can't open input file `/tmp/ed9879ca11de0a7ad7f3d910793f888af26b4d3d_arbitrary.wav': WAVE: RIFF header not found
Oct 03 14:31:41 donate-01 donate[949]: [ERROR: django.request.log_response] Internal Server Error: /captcha/audio/ed9879ca11de0a7ad7f3d910793f888af26b4d3d.wav
Oct 03 14:31:41 donate-01 donate[949]: Traceback (most recent call last):
Oct 03 14:31:41 donate-01 donate[949]:   File "/home/tordonate/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
Oct 03 14:31:41 donate-01 donate[949]:     response = get_response(request)
Oct 03 14:31:41 donate-01 donate[949]:                ^^^^^^^^^^^^^^^^^^^^^
Oct 03 14:31:41 donate-01 donate[949]:   File "/home/tordonate/venv/lib/python3.11/site-packages/django/core/handlers/base.py", line 197, in _get_response
Oct 03 14:31:41 donate-01 donate[949]:     response = wrapped_callback(request, *callback_args, **callback_kwargs)
Oct 03 14:31:41 donate-01 donate[949]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 03 14:31:41 donate-01 donate[949]:   File "/home/tordonate/venv/lib/python3.11/site-packages/captcha/views.py", line 199, in captcha_audio
Oct 03 14:31:41 donate-01 donate[949]:     os.rename(mergedpath, path)
Oct 03 14:31:41 donate-01 donate[949]: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ed9879ca11de0a7ad7f3d910793f888af26b4d3d_merged.wav' -> '/tmp/ed9879ca11de0a7ad7f3d910793f888af26b4d3d.wav'
Oct 03 14:31:41 donate-01 donate[949]: 0.0.0.0 - "GET /captcha/audio/ed9879ca11de0a7ad7f3d910793f888af26b4d3d.wav HTTP/1.1" 500 6628 33
Oct 03 14:31:41 donate-01 donate[949]: /usr/bin/sox FAIL formats: can't open input file `/tmp/ed9879ca11de0a7ad7f3d910793f888af26b4d3d.wav': No such file or directory
Oct 03 14:31:41 donate-01 donate[949]: [ERROR: django.request.log_response] Internal Server Error: /captcha/audio/ed9879ca11de0a7ad7f3d910793f888af26b4d3d.wav
Oct 03 14:31:41 donate-01 donate[949]: Traceback (most recent call last):
Oct 03 14:31:41 donate-01 donate[949]:   File "/home/tordonate/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
Oct 03 14:31:41 donate-01 donate[949]:     response = get_response(request)
Oct 03 14:31:41 donate-01 donate[949]:                ^^^^^^^^^^^^^^^^^^^^^
Oct 03 14:31:41 donate-01 donate[949]:   File "/home/tordonate/venv/lib/python3.11/site-packages/django/core/handlers/base.py", line 197, in _get_response
Oct 03 14:31:41 donate-01 donate[949]:     response = wrapped_callback(request, *callback_args, **callback_kwargs)
Oct 03 14:31:41 donate-01 donate[949]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 03 14:31:41 donate-01 donate[949]:   File "/home/tordonate/venv/lib/python3.11/site-packages/captcha/views.py", line 197, in captcha_audio
Oct 03 14:31:41 donate-01 donate[949]:     os.remove(arbnoisepath)
Oct 03 14:31:41 donate-01 donate[949]: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/ed9879ca11de0a7ad7f3d910793f888af26b4d3d_arbitrary.wav'
Oct 03 14:31:41 donate-01 donate[949]: 0.0.0.0 - "GET /captcha/audio/ed9879ca11de0a7ad7f3d910793f888af26b4d3d.wav HTTP/1.1" 500 6628 30

it's unclear why this is happening, we suspect it might be related with multithreading issues in the codebase and lack of locking. we also have issues with old captcha files not being cleaned up for example.

but exactly why this is happening is somewhat besides the point: such failures should at least be handled better than a 500 error with a backtrace. we should add exception handling in (e.g.) this code:

django-simple-captcha/captcha/views.py

Lines 165 to 199 in 153cfd9

    
           if settings.CAPTCHA_SOX_PATH: 
        
               arbnoisepath = str( 
        
                   os.path.join(tempfile.gettempdir(), "%s_arbitrary.wav") % key 
        
               ) 
        
               mergedpath = str(os.path.join(tempfile.gettempdir(), "%s_merged.wav") % key) 
        
               subprocess.call( 
        
                   [ 
        
                       settings.CAPTCHA_SOX_PATH, 
        
                       "-r", 
        
                       "8000", 
        
                       "-n", 
        
                       arbnoisepath, 
        
                       "synth", 
        
                       "2", 
        
                       "brownnoise", 
        
                       "gain", 
        
                       "-15", 
        
                   ] 
        
               ) 
        
               subprocess.call( 
        
                   [ 
        
                       settings.CAPTCHA_SOX_PATH, 
        
                       "-m", 
        
                       arbnoisepath, 
        
                       path, 
        
                       "-t", 
        
                       "wavpcm", 
        
                       "-b", 
        
                       "16", 
        
                       mergedpath, 
        
                   ] 
        
               ) 
        
               os.remove(arbnoisepath) 
        
               os.remove(path) 
        
               os.rename(mergedpath, path)

so that we can somewhat recover more gracefully when that happens. if at least this was a custom exception from this module (instead of a fairly generic OSError/FileNotFoundError, we could handle it (or ignore it, at least for now) differently than other exceptions...

i'd be happy to propose a PR to at least do the latter if there's an opening here.

Note that this issue is tracked at https://gitlab.torproject.org/tpo/web/donate-neo/-/issues/142 where you can also see the entire source code for this project.

The text was updated successfully, but these errors were encountered:

mbi · 2024-12-03T10:19:09Z

Hello, thanks for opening this issue.

but exactly why this is happening is somewhat besides the point:

I suspect this is due to #196, i.e. sox producing wav files at different sample rates and then being unable to merge them. If you'd like to invest some time into investigating the real source of this and submitting a pull request, I'd really appreciate this.

so that we can somewhat recover more gracefully when that happens. if at least this was a custom exception from this module (instead of a fairly generic OSError/FileNotFoundError, we could handle it (or ignore it, at least for now) differently than other exceptions

I mean, I agree, but in the end I'm more concerned by the user experience, so I'd rather fix the underlying problem. In the end the user will get a 404 instead of a 500 when requesting the audio, and both suck in pretty much the same way.

…av files prior to merging with sox

mbi · 2024-12-03T14:29:18Z

@anarcat I just pushed a fix for #196 and, if it is indeed related, could potentially also fix this issue.
Could you please test against HEAD and report back?

anarcat · 2024-12-03T19:15:04Z

awesome! we're going to deploy this for testing soon, thanks, you can follow our progress in https://gitlab.torproject.org/tpo/web/donate-neo/-/issues/142 but i'll try to report back here specifically soon!

part of the challenge is this is intermittent, so it's hard to tell whether it's fixed... but i'll do my best!

anarcat · 2024-12-16T19:45:38Z

@anarcat I just pushed a fix for #196 and, if it is indeed related, could potentially also fix this issue.
Could you please test against HEAD and report back?

we're currently running 0613733 and we're still getting FileNotFound exceptions, here's an example:

Dec 16 17:31:00 donate-01 donate[1311653]: Traceback (most recent call last):
Dec 16 17:31:00 donate-01 donate[1311653]:   File "/home/tordonate/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
Dec 16 17:31:00 donate-01 donate[1311653]:     response = get_response(request)
Dec 16 17:31:00 donate-01 donate[1311653]:                ^^^^^^^^^^^^^^^^^^^^^
Dec 16 17:31:00 donate-01 donate[1311653]:   File "/home/tordonate/venv/lib/python3.11/site-packages/django/core/handlers/base.py", line 197, in _get_response
Dec 16 17:31:00 donate-01 donate[1311653]:     response = wrapped_callback(request, *callback_args, **callback_kwargs)
Dec 16 17:31:00 donate-01 donate[1311653]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Dec 16 17:31:00 donate-01 donate[1311653]:   File "/home/tordonate/venv/lib/python3.11/site-packages/captcha/views.py", line 213, in captcha_audio
Dec 16 17:31:00 donate-01 donate[1311653]:     os.remove(arbnoisepath)
Dec 16 17:31:00 donate-01 donate[1311653]: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/7ba478e90cc50f10d9c8aff4c658a571d245f942_arbitrary.wav'

update: fix copy-paste, previous version was from before the fix, but the error is mostly unchanged.

mbi · 2024-12-17T10:14:49Z

Right, I suppose that what is happening is that for some reason the URL to the audio file is being requested multiple times at a very short interval.

The way audio files are handled is

Generate a text-to-speach wav file at /tmp/hash-of-captcha.wav
Generate random noise file at /tmp/hash-of-captcha-noise.wav
Merge the two files into /tmp/hash-of-captcha-noise-merged.wav
Delete /tmp/hash-of-captcha-noise.wav
Delete /tmp/hash-of-captcha.wav
Rename /tmp/hash-of-captcha-noise-merged.wav to /tmp/hash-of-captcha.wav
Read /tmp/hash-of-captcha.wav into a temporary file
Delete /tmp/hash-of-captcha.wav
Serve the temporary file in a RangedFileResponse

Now, your errors all happen at steps 4, 5 and 6, which probably means that a second process is serving another request to the same url. The second process generates the files (steps 1, 2, 3) but the first process deletes them (steps 4, 5, 6) before they can be deleted / renamed / read again.

The underlying issue is that each process should have its own temporary directory to avoid one process deleting the other's files, but that's not the case today.

As a workaround, 1c92409 adds a random postfix to the three filenames generated in the first three steps, so that each process has their own set of files.

Could you please test this commit?

anarcat · 2024-12-17T15:21:59Z

Hi!

As a workaround, 1c92409 adds a random postfix to the three filenames generated in the first three steps, so that each process has their own set of files.

I don't want to be that guy, but random, in general, and randint in particular, seems like a Bad Idea. I would suggest using secrets.token_urlsafe instead, which has higher entropy, is not predictable, and will be less likely to result in a crash.

(now, i know, you'd need like half a billion parallel requests to hit that collision, but i always feel seeing random in codebases is code smell and just generally avoid it.)

Could you please test this commit?

will do!

The underlying issue is that each process should have its own temporary directory to avoid one process deleting the other's files, but that's not the case today.

note that we're using multithreading here, so having "processes" in their own directories wouldn't fix this.

in general though, it seems we're fixing this the wrong way: if the files already exist, why not just reuse them or wait for another (async?) process to complete the generation, instead of rebuilding the files?

mbi · 2024-12-18T08:26:50Z

I don't want to be that guy, but random, in general, and randint in particular, seems like a Bad Idea. I would suggest using secrets.token_urlsafe instead, which has higher entropy, is not predictable, and will be less likely to result in a crash.

No that's a good point, fixed in b7abe2e

note that we're using multithreading here, so having "processes" in their own directories wouldn't fix this.

Yes I was thinking of using tempfile.TemporaryDirectory which ought to be threadsafe and also handles automatic deletion when used as a context manager. This way every thread should get their own copy.

in general though, it seems we're fixing this the wrong way: if the files already exist, why not just reuse them or wait for another (async?) process to complete the generation, instead of rebuilding the files?

Because repetition attacks. This is precisely why we're injecting noise into the generated wav file, otherwise one cleartext will generate the same output sound file when requested multiple times, which will give an attacker agency to create a rainbow table of file hashes. And also, it makes the architecture of the audio generation view utterly complicated for no good reason at all IMO. 🤷

mbi added a commit that referenced this issue Dec 3, 2024

Fix for #196 and possibly #238: detect sample rate of generated PCM W…

0613733

…av files prior to merging with sox

mbi closed this as completed in 1c92409 Dec 17, 2024

mbi reopened this Dec 17, 2024

mbi added a commit that referenced this issue Dec 18, 2024

chore:use secrets.token_urlsafe instead of random.randint #238

b7abe2e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

uncaught exceptions in audio captcha generation #238

uncaught exceptions in audio captcha generation #238

anarcat commented Dec 2, 2024 •

edited

Loading

mbi commented Dec 3, 2024

mbi commented Dec 3, 2024

anarcat commented Dec 3, 2024 •

edited

Loading

anarcat commented Dec 16, 2024 •

edited

Loading

mbi commented Dec 17, 2024 •

edited

Loading

anarcat commented Dec 17, 2024

mbi commented Dec 18, 2024 •

edited

Loading

uncaught exceptions in audio captcha generation #238

uncaught exceptions in audio captcha generation #238

Comments

anarcat commented Dec 2, 2024 • edited Loading

mbi commented Dec 3, 2024

mbi commented Dec 3, 2024

anarcat commented Dec 3, 2024 • edited Loading

anarcat commented Dec 16, 2024 • edited Loading

mbi commented Dec 17, 2024 • edited Loading

anarcat commented Dec 17, 2024

mbi commented Dec 18, 2024 • edited Loading

anarcat commented Dec 2, 2024 •

edited

Loading

anarcat commented Dec 3, 2024 •

edited

Loading

anarcat commented Dec 16, 2024 •

edited

Loading

mbi commented Dec 17, 2024 •

edited

Loading

mbi commented Dec 18, 2024 •

edited

Loading