Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New concurrency strategy, tested against Datasette pre and post 1.0a7 #36

Merged
merged 14 commits into from
Jan 30, 2024

Conversation

simonw
Copy link
Owner

@simonw simonw commented Jan 24, 2024

@simonw simonw added the enhancement New feature or request label Jan 24, 2024
@simonw
Copy link
Owner Author

simonw commented Jan 24, 2024

Weird, it's failing in CI against 1.0+ with this error:

>       assert rows == expected_rows
E       AssertionError: assert [{'age': '5',...: 'Pancakes'}] == [{'age': 5, '...: 'Pancakes'}]
E         At index 0 diff: {'name': 'Cleo', 'age': '5'} != {'name': 'Cleo', 'age': 5}
E         Full diff:
E         - [{'age': 5, 'name': 'Cleo'}, {'age': 4, 'name': 'Pancakes'}]
E         + [{'age': '5', 'name': 'Cleo'}, {'age': '4', 'name': 'Pancakes'}]
E         ?          + +                           + +

But passes on my laptop.

@simonw
Copy link
Owner Author

simonw commented Jan 24, 2024

Since it's the type conversions that are failing I am suspicious of this code:

def insert_docs_catch_errors(conn):
database = sqlite_utils.Database(conn)
try:
insert_docs(database)
except Exception as error:
database["_csv_progress_"].update(
task_id,
{"error": str(error)},
)
await db.execute_write_fn(insert_docs_catch_errors, block=False)

Why is that block=False? The insert_docs() call inside of that uses TypeTracker to set the types.

I want to see all the test failures to spot patterns in what fails.
@simonw
Copy link
Owner Author

simonw commented Jan 25, 2024

Weird, still that same test failure where types are not converted on some Python versions for Datasette 1.0.

Even weirder: I saw it pass on 3.7, 3.10 and 3.11 with Datasette <1.0 in this run: https://github.com/simonw/datasette-upload-csvs/actions/runs/7658565625

But then when I added fail-fast: false it failed on everything except for 3.7 and 3.9: https://github.com/simonw/datasette-upload-csvs/actions/runs/7658713572?pr=36

Maybe a race condition or something else that's intermittent?

@simonw
Copy link
Owner Author

simonw commented Jan 25, 2024

Some of those tests are taking way longer than they should, deadlock or race condition of some sort?

CleanShot 2024-01-25 at 10 19 39@2x

@simonw
Copy link
Owner Author

simonw commented Jan 25, 2024

Looks like that is affecting ALL of the >=1.0a test runs:

CleanShot 2024-01-25 at 10 21 11@2x

@simonw
Copy link
Owner Author

simonw commented Jan 25, 2024

Tests pass without that weird pause on my laptop against Datasette 1.0ax.

@simonw
Copy link
Owner Author

simonw commented Jan 29, 2024

I'll try running this is Codespaces, see if I can recreate that weird testing bug.

@simonw
Copy link
Owner Author

simonw commented Jan 29, 2024

In Codespaces the tests passed... but then it hung at the end of the test run rather than giving me back the terminal.

Hitting Ctrl+C there showed this error:

^CException ignored in: <module 'threading' from '/usr/local/python/3.10.13/lib/python3.10/threading.py'>
Traceback (most recent call last):
  File "/usr/local/python/3.10.13/lib/python3.10/threading.py", line 1537, in _shutdown
    atexit_call()
  File "/usr/local/python/3.10.13/lib/python3.10/concurrent/futures/thread.py", line 31, in _python_exit
    t.join()
  File "/usr/local/python/3.10.13/lib/python3.10/threading.py", line 1096, in join
    self._wait_for_tstate_lock()
  File "/usr/local/python/3.10.13/lib/python3.10/threading.py", line 1116, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
KeyboardInterrupt: 

@simonw
Copy link
Owner Author

simonw commented Jan 29, 2024

Still a problem.

CleanShot 2024-01-29 at 14 20 05@2x

The Python jobs that are hung all seem to have run their test but then hung on quitting the pytest process.

@simonw
Copy link
Owner Author

simonw commented Jan 29, 2024

OK, at least the tests fail after a minute now rather than hanging around for hours.

@simonw simonw changed the title Test against Datasette pre and post 1.0a7 New concurrency strategy, tested against Datasette pre and post 1.0a7 Jan 29, 2024
@simonw
Copy link
Owner Author

simonw commented Jan 30, 2024

I'm going to try a different strategy: I'm going to run the CSV parsing in an async task on the main thread, but have it yield that task every 100 items inserted. Let's see if that works better.

Backup plan after that: could I run the CSV parsing in a separate thread but have 100 rows at a time passed from that thread back to a task in the main thread which then sends them to the write database?

@simonw
Copy link
Owner Author

simonw commented Jan 30, 2024

OK, that async strategy seems to work pretty well! While uploading a large CSV file the app stayed responsive to other requests.

@simonw
Copy link
Owner Author

simonw commented Jan 30, 2024

Test failures still look like this:

>       assert rows == expected_rows
E       AssertionError: assert [{'age': '5',...: 'Pancakes'}] == [{'age': 5, '...: 'Pancakes'}]
E         At index 0 diff: {'name': 'Cleo', 'age': '5'} != {'name': 'Cleo', 'age': 5}
E         Full diff:
E         - [{'age': 5, 'name': 'Cleo'}, {'age': 4, 'name': 'Pancakes'}]
E         + [{'age': '5', 'name': 'Cleo'}, {'age': '4', 'name': 'Pancakes'}]
E         ?          + +                           + +

Suggesting a timing issue where the types transform sometimes hasn't completed when the test runs.

@simonw
Copy link
Owner Author

simonw commented Jan 30, 2024

Weird that the tests ONLY fail for Datasette >= 1.0a

CleanShot 2024-01-29 at 21 40 10@2x

@simonw
Copy link
Owner Author

simonw commented Jan 30, 2024

That debug output looks correct to me:

Transforming types to {'IncidentNumber': 'integer', 'DateTimeOfCall': 'text', 'CalYear': 'integer', 'FinYear': 'text', 'TypeOfIncident': 'text', 'PumpCount': 'integer', 'PumpHoursTotal': 'integer', 'HourlyNotionalCost(£)': 'float', 'IncidentNotionalCost(£)': 'float'}

So why isn't that taking effect before the test assertion runs?

# Now things get tricky... the upload is running in a task, so poll for completion
fail_after = 20
iterations = 0
while True:
response = await client.get(
"http://localhost/data/_csv_progress_.json?_shape=array"
)
rows = json.loads(response.content)
assert 1 == len(rows)
row = rows[0]
assert row["table_name"] == expected_table
assert not row["error"], row
if row["bytes_todo"] == row["bytes_done"]:
break
iterations += 1
assert iterations < fail_after, "Took too long: {}".format(row)
await asyncio.sleep(0.5)
rows = list(db[expected_table].rows)
assert rows == expected_rows

@simonw simonw merged commit cc2c6ab into main Jan 30, 2024
9 checks passed
@simonw simonw deleted the test-against-multiple-versions branch January 30, 2024 05:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant