New concurrency strategy, tested against Datasette pre and post 1.0a7 #36

simonw · 2024-01-24T06:04:50Z

Improved compatibility with Datasette 1.0a+ #37

simonw · 2024-01-24T06:07:50Z

Weird, it's failing in CI against 1.0+ with this error:

>       assert rows == expected_rows
E       AssertionError: assert [{'age': '5',...: 'Pancakes'}] == [{'age': 5, '...: 'Pancakes'}]
E         At index 0 diff: {'name': 'Cleo', 'age': '5'} != {'name': 'Cleo', 'age': 5}
E         Full diff:
E         - [{'age': 5, 'name': 'Cleo'}, {'age': 4, 'name': 'Pancakes'}]
E         + [{'age': '5', 'name': 'Cleo'}, {'age': '4', 'name': 'Pancakes'}]
E         ?          + +                           + +

But passes on my laptop.

simonw · 2024-01-24T06:11:47Z

Since it's the type conversions that are failing I am suspicious of this code:

datasette-upload-csvs/datasette_upload_csvs/__init__.py

Lines 166 to 176 in a0b64f7

    
    def insert_docs_catch_errors(conn):
        database = sqlite_utils.Database(conn)
        try:
            insert_docs(database)
        except Exception as error:
            database["_csv_progress_"].update(
                task_id,
                {"error": str(error)},
            )

    await db.execute_write_fn(insert_docs_catch_errors, block=False)

Why is that block=False? The insert_docs() call inside of that uses TypeTracker to set the types.
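
For context on why block=False looks suspicious: a write that is queued without waiting can still be pending when the caller moves on. The sketch below is not Datasette's implementation. It uses a single-worker ThreadPoolExecutor as a stand-in for the write thread, and the names `write_executor`, `execute_write_fn` and `convert_types` are hypothetical, chosen only to illustrate the fire-and-forget behaviour.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a single dedicated write thread (an assumption for illustration).
write_executor = ThreadPoolExecutor(max_workers=1)


def execute_write_fn(fn, block=True):
    """Hypothetical helper: queue fn on the write thread, optionally waiting."""
    future = write_executor.submit(fn)
    if block:
        return future.result()  # wait for the write to finish
    return future  # fire and forget: the caller carries on immediately


column_types = {"age": "text"}
release = threading.Event()  # lets us hold the write back deterministically


def convert_types():
    release.wait()  # simulate a slow write
    column_types["age"] = "integer"


future = execute_write_fn(convert_types, block=False)
seen_before = column_types["age"]  # the queued write has not run yet
release.set()
future.result()  # only now is the conversion guaranteed to be done
seen_after = column_types["age"]
write_executor.shutdown()
```

Anything that reads the table between the two points, such as a test assertion, would see the unconverted types.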

I want to see all the test failures to spot patterns in what fails.

simonw · 2024-01-25T18:15:36Z

Weird, still that same test failure where types are not converted on some Python versions for Datasette 1.0.

Even weirder: I saw it pass on 3.7, 3.10 and 3.11 with Datasette <1.0 in this run: https://github.com/simonw/datasette-upload-csvs/actions/runs/7658565625

But then when I added fail-fast: false it failed on everything except for 3.7 and 3.9: https://github.com/simonw/datasette-upload-csvs/actions/runs/7658713572?pr=36

Maybe a race condition or something else that's intermittent?

simonw · 2024-01-25T18:20:06Z

Some of those tests are taking way longer than they should. A deadlock or race condition of some sort?

simonw · 2024-01-25T18:21:36Z

Looks like that is affecting ALL of the >=1.0a test runs:

simonw · 2024-01-25T18:22:36Z

Tests pass without that weird pause on my laptop against Datasette 1.0ax.

simonw · 2024-01-29T22:07:19Z

I'll try running this in Codespaces, see if I can recreate that weird testing bug.

simonw · 2024-01-29T22:12:25Z

In Codespaces the tests passed... but then it hung at the end of the test run rather than giving me back the terminal.

Hitting Ctrl+C there showed this error:

^CException ignored in: <module 'threading' from '/usr/local/python/3.10.13/lib/python3.10/threading.py'>
Traceback (most recent call last):
  File "/usr/local/python/3.10.13/lib/python3.10/threading.py", line 1537, in _shutdown
    atexit_call()
  File "/usr/local/python/3.10.13/lib/python3.10/concurrent/futures/thread.py", line 31, in _python_exit
    t.join()
  File "/usr/local/python/3.10.13/lib/python3.10/threading.py", line 1096, in join
    self._wait_for_tstate_lock()
  File "/usr/local/python/3.10.13/lib/python3.10/threading.py", line 1116, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
KeyboardInterrupt:

simonw · 2024-01-29T22:20:49Z

Still a problem.

The Python jobs that are hung all seem to have run their tests but then hung on quitting the pytest process.

simonw · 2024-01-29T22:25:18Z

OK, at least the tests fail after a minute now rather than hanging around for hours.

simonw · 2024-01-30T05:26:12Z

I'm going to try a different strategy: I'm going to run the CSV parsing in an async task on the main thread, but have it yield that task every 100 items inserted. Let's see if that works better.
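
A rough sketch of that idea, using only asyncio from the standard library. It is not the plugin's code: `insert_rows`, `other_requests` and the list used as a sink are hypothetical stand-ins, and `await asyncio.sleep(0)` is just one way to hand control back to the event loop after each batch of 100.

```python
import asyncio

BATCH_SIZE = 100  # matches the "every 100 items" idea above


async def insert_rows(rows, sink, batch_size=BATCH_SIZE):
    """Append rows to sink, yielding to the event loop after each full batch."""
    yields = 0
    for index, row in enumerate(rows, start=1):
        sink.append(row)  # stand-in for a database insert
        if index % batch_size == 0:
            await asyncio.sleep(0)  # hand control back to other tasks
            yields += 1
    return yields


async def other_requests(ticks, stop):
    """Stand-in for unrelated requests served while the upload runs."""
    while not stop.is_set():
        ticks.append(1)
        await asyncio.sleep(0)


async def main():
    sink, ticks = [], []
    stop = asyncio.Event()
    background = asyncio.create_task(other_requests(ticks, stop))
    yields = await insert_rows(range(1000), sink)
    stop.set()
    await background
    return len(sink), yields, len(ticks)


inserted, yields, ticks = asyncio.run(main())
```

The background task only gets to run at the points where the insert loop yields, which is why the batch size controls how responsive the app stays during a large upload.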

Backup plan after that: could I run the CSV parsing in a separate thread but have 100 rows at a time passed from that thread back to a task in the main thread which then sends them to the write database?
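
That backup plan could look something like the following sketch, again standard library only. The function names and the in-memory `written` list are hypothetical. The worker thread parses the CSV and uses `loop.call_soon_threadsafe` to push batches of up to 100 rows onto an asyncio.Queue, which a task on the main thread drains.

```python
import asyncio
import csv
import io
import threading

BATCH_SIZE = 100


def parse_in_thread(csv_text, loop, queue, batch_size=BATCH_SIZE):
    """Runs in a worker thread: parse rows and hand batches to the event loop."""
    batch = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        batch.append(row)
        if len(batch) == batch_size:
            loop.call_soon_threadsafe(queue.put_nowait, batch)
            batch = []
    if batch:
        loop.call_soon_threadsafe(queue.put_nowait, batch)
    # Sentinel value telling the consumer that parsing has finished.
    loop.call_soon_threadsafe(queue.put_nowait, None)


async def consume(csv_text):
    """Runs on the event loop: receive batches and 'write' them."""
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    worker = threading.Thread(target=parse_in_thread, args=(csv_text, loop, queue))
    worker.start()
    batch_sizes, written = [], []
    while True:
        batch = await queue.get()
        if batch is None:
            break
        batch_sizes.append(len(batch))
        written.extend(batch)  # stand-in for sending rows to the write database
    worker.join()
    return batch_sizes, written


csv_text = "name,age\n" + "".join(f"cat{i},{i}\n" for i in range(250))
batch_sizes, written = asyncio.run(consume(csv_text))
```

Note that the parsed values are still strings at this point, so a type conversion step would be needed afterwards either way.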

simonw · 2024-01-30T05:36:25Z

OK, that async strategy seems to work pretty well! While uploading a large CSV file the app stayed responsive to other requests.

simonw · 2024-01-30T05:37:21Z

Test failures still look like this:

>       assert rows == expected_rows
E       AssertionError: assert [{'age': '5',...: 'Pancakes'}] == [{'age': 5, '...: 'Pancakes'}]
E         At index 0 diff: {'name': 'Cleo', 'age': '5'} != {'name': 'Cleo', 'age': 5}
E         Full diff:
E         - [{'age': 5, 'name': 'Cleo'}, {'age': 4, 'name': 'Pancakes'}]
E         + [{'age': '5', 'name': 'Cleo'}, {'age': '4', 'name': 'Pancakes'}]
E         ?          + +                           + +

Suggesting a timing issue where the types transform sometimes hasn't completed when the test runs.
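
To make the symptom concrete, here is a small sketch with the standard library sqlite3 module. The table name and the hand-written SQL are assumptions standing in for whatever the plugin's transform step does. Rows read before the conversion come back as strings, and rows read after it come back as integers, matching the two sides of the failing assertion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row

# Step 1: CSV values arrive as strings, so they land in TEXT columns.
conn.execute("CREATE TABLE cats (name TEXT, age TEXT)")
conn.executemany(
    "INSERT INTO cats (name, age) VALUES (?, ?)",
    [("Cleo", "5"), ("Pancakes", "4")],
)
query = "SELECT name, age FROM cats ORDER BY rowid"
rows_before = [dict(row) for row in conn.execute(query)]

# Step 2: a later, separate step rebuilds the table with an INTEGER column.
# (Hand-written SQL; an assumption standing in for the real transform.)
with conn:
    conn.execute("CREATE TABLE cats_new (name TEXT, age INTEGER)")
    conn.execute(
        "INSERT INTO cats_new SELECT name, CAST(age AS INTEGER) FROM cats ORDER BY rowid"
    )
    conn.execute("DROP TABLE cats")
    conn.execute("ALTER TABLE cats_new RENAME TO cats")
rows_after = [dict(row) for row in conn.execute(query)]
```

A test that reads the table between step 1 and step 2 would get `rows_before`, which is what the CI failure shows.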

simonw · 2024-01-30T05:40:26Z

Weird that the tests ONLY fail for Datasette >= 1.0a

simonw · 2024-01-30T05:44:05Z

That debug output looks correct to me:

Transforming types to {'IncidentNumber': 'integer', 'DateTimeOfCall': 'text', 'CalYear': 'integer', 'FinYear': 'text', 'TypeOfIncident': 'text', 'PumpCount': 'integer', 'PumpHoursTotal': 'integer', 'HourlyNotionalCost(£)': 'float', 'IncidentNotionalCost(£)': 'float'}

So why isn't that taking effect before the test assertion runs?

datasette-upload-csvs/tests/test_datasette_upload_csvs.py

Lines 167 to 186 in 233010e

    
        # Now things get tricky... the upload is running in a task, so poll for completion
        fail_after = 20
        iterations = 0
        while True:
            response = await client.get(
                "http://localhost/data/_csv_progress_.json?_shape=array"
            )
            rows = json.loads(response.content)
            assert 1 == len(rows)
            row = rows[0]
            assert row["table_name"] == expected_table
            assert not row["error"], row
            if row["bytes_todo"] == row["bytes_done"]:
                break
            iterations += 1
            assert iterations < fail_after, "Took too long: {}".format(row)
            await asyncio.sleep(0.5)

    rows = list(db[expected_table].rows)
    assert rows == expected_rows
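
One possible reading of the loop above: it stops as soon as bytes_todo equals bytes_done, so if the type conversion finishes after the byte counts do, the final assertion can run too early. The sketch below illustrates that with a hypothetical `wait_until` helper and a made-up `state` dictionary. It is not the project's test code, and the `types_converted` flag is an assumption; nothing in the thread says the progress table exposes one.

```python
import asyncio


async def wait_until(check, attempts=20, delay=0.5):
    """Poll check() until it returns a truthy value or attempts run out."""
    for _ in range(attempts):
        result = check()
        if result:
            return result
        await asyncio.sleep(delay)
    raise AssertionError("Took too long")


async def demo():
    # Hypothetical state: byte counts finish first, type conversion later.
    state = {"bytes_todo": 10, "bytes_done": 10, "types_converted": False}

    async def finish_conversion():
        await asyncio.sleep(0.01)
        state["types_converted"] = True

    task = asyncio.create_task(finish_conversion())
    bytes_finished = state["bytes_todo"] == state["bytes_done"]
    converted_when_bytes_finished = state["types_converted"]
    # Polling on the condition the assertion depends on closes the gap.
    await wait_until(lambda: state["types_converted"], delay=0.005)
    await task
    return bytes_finished, converted_when_bytes_finished, state["types_converted"]


result = asyncio.run(demo())
```

Here the byte counts already match while the conversion is still pending, so polling on bytes alone would let the assertion run against unconverted rows.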

Test against Datasette pre and post 1.0a7

e8da06f

simonw added the enhancement label Jan 24, 2024

Show Datasette version in pytest header

7a34bb1

Run everything in a transaction

4c19d5e

simonw mentioned this pull request Jan 24, 2024

Redesign how this plugin handles parsing CSV and writing to the DB #38

Closed

simonw added 2 commits January 25, 2024 10:00

Run CSV parsing in a separate thread, refs #38

8461488

fail-fast: false

1d53b13

I want to see all the test failures to spot patterns in what fails.

Wait for transform() to run using future.complete()

1a0e0ab

simonw added 2 commits January 29, 2024 22:16

add more future.result() calls

1342c44

Do not use future temp variables

70e80ed

timeout-minutes: 1 on pytest steps

79a3db1

simonw changed the title ~~Test against Datasette pre and post 1.0a7~~ New concurrency strategy, tested against Datasette pre and post 1.0a7 Jan 29, 2024

Different strategy, avoiding threads

cac7b60

Run the writes in transactions

20c6877

Debug output

233010e

simonw added 2 commits January 29, 2024 21:45

Extra 0.5s sleep in the test

9e6cc3f

Fixed some ruff lint errors

a425d85

simonw merged commit cc2c6ab into main Jan 30, 2024
9 checks passed

simonw deleted the test-against-multiple-versions branch January 30, 2024 05:49

simonw mentioned this pull request Jan 30, 2024

Improved compatibility with Datasette 1.0a+ #37

Closed
