
Support max_concurrency for running Looker queries #110

Merged
merged 8 commits into spectacles-ci:master on Nov 25, 2019

Conversation

@iserko (Contributor) commented Nov 11, 2019

When we ran Spectacles against our Looker instance with 128 explores in our model, it killed the instance: Looker stopped responding and eventually crashed. I diagnosed that the cause was simply the number of queries running at once; Spectacles scheduled nearly all 128 queries immediately.

This PR adds a --max-concurrency flag you can use to tune how much load you want to put on your Looker instance.

I've tried a value of 10 against our staging server and it works nicely without crashing anything.

Summary of the changes:

  • made the SQL validation function async and moved the loop code into the parent function
  • removed the explicit creation of an asyncio event loop, which I don't believe is needed (get_event_loop creates one if it does not exist)
  • when you specify max_concurrency, the code creates at most <max_concurrency> queries and will not queue up more until the previous ones have finished
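The gating described in the last bullet can be sketched with an asyncio.Semaphore. This is an illustrative sketch, not the PR's actual code: run_query and the sleep call stand in for the real Looker query submission.

```python
import asyncio

async def run_query(semaphore: asyncio.Semaphore, query_id: int) -> int:
    # Hold a slot for the duration of the "query"; at most
    # max_concurrency of these run at any one time.
    async with semaphore:
        await asyncio.sleep(0.001)  # placeholder for the Looker API call
        return query_id

async def validate_all(query_ids, max_concurrency: int):
    semaphore = asyncio.Semaphore(max_concurrency)
    tasks = [run_query(semaphore, qid) for qid in query_ids]
    return await asyncio.gather(*tasks)

results = asyncio.run(validate_all(range(128), 10))
print(len(results))  # 128
```

With 128 explores and a limit of 10, only 10 coroutines ever hold the semaphore at once; the rest wait instead of hammering the instance.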

@iserko iserko force-pushed the max_concurrency_queries branch 2 times, most recently from f0f31b5 to 07c4691 Compare November 12, 2019 14:41
@codecov

codecov bot commented Nov 12, 2019

Codecov Report

Merging #110 into master will decrease coverage by 0.97%.
The diff coverage is 31.74%.


@@            Coverage Diff             @@
##           master     #110      +/-   ##
==========================================
- Coverage   66.89%   65.92%   -0.98%     
==========================================
  Files           9        9              
  Lines         716      719       +3     
==========================================
- Hits          479      474       -5     
- Misses        237      245       +8
Impacted Files Coverage Δ
spectacles/client.py 58.33% <25%> (+1.29%) ⬆️
spectacles/validators.py 56.04% <30.9%> (-1.59%) ⬇️
spectacles/runner.py 47.36% <50%> (ø) ⬆️
spectacles/cli.py 77.39% <50%> (+0.19%) ⬆️
spectacles/exceptions.py 52.17% <0%> (-26.09%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e875b3c...bd289f7.

@iserko iserko force-pushed the max_concurrency_queries branch 3 times, most recently from d15410c to e29c39d Compare November 14, 2019 12:08
@iserko iserko force-pushed the max_concurrency_queries branch from e29c39d to 106fedd Compare November 14, 2019 12:09
@joshtemple (Collaborator) left a comment

Thanks @iserko, this is really awesome and will be extremely important so people don't overload their instances. This is my first time writing async code in Python, so would love it if you could elaborate on the general approach here and if/why it's better so I can learn.

I want to benchmark this a bit because I have a hunch it will be faster, but I'm getting an error with --max-concurrency 0. I've added a few more comments as well.

spectacles/cli.py
@@ -352,6 +353,14 @@ def _build_sql_subparser(
user's branch to the revision of the branch that is on the remote. \
WARNING: This will delete any uncommited changes in the user's workspace.",
)
subparser.add_argument(
"--max-concurrency",
default=0,
@joshtemple (Collaborator):
It would be good to set the default to a higher number (10? 20?) so that a user who doesn't read the documentation avoids maxing out their instance accidentally. Better to provide a performance benefit for an informed user than a performance risk for an uninformed one. Open to suggestions on what that number should be.

  sql_validator = SqlValidator(self.client, self.project)
  sql_validator.build_project(selectors)
- errors = sql_validator.validate(mode)
+ errors = sql_validator.validate(mode, max_concurrent)
@joshtemple (Collaborator):
Let's keep the variable name consistent throughout.

len(task_calls),
max_concurrency,
)
while task_calls:
@joshtemple (Collaborator):
When I run this with max_concurrency = 0, I get stuck in this loop forever and queries are never launched. query_task_ids is empty as defined above, so it continually waits for non-existent query task IDs to complete. Any idea what's going on here?

I wonder if you need to change if len(query_task_ids) >= max_concurrency: to a >?
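The hang can be reduced to the gating predicate itself. A hypothetical reconstruction, not the PR's code: with the >= comparison, a max_concurrency of 0 means the launch condition can never be satisfied, even with nothing running.

```python
def can_launch(running: int, max_concurrency: int) -> bool:
    # Reconstruction of the check under discussion: we wait (don't launch)
    # whenever the number of running queries is >= max_concurrency.
    return not (running >= max_concurrency)

print(can_launch(0, 0))    # False: 0 >= 0, so we wait forever
print(can_launch(0, 10))   # True: a slot is free, launch
print(can_launch(10, 10))  # False: at the limit, wait
```

Switching >= to > would make can_launch(0, 0) return True, though treating 0 as "unlimited" (or rejecting it) might be the clearer fix.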

@DylanBaker (Collaborator)
One thing I realised here is that I think we lose the 250 limit on fetching queries? I can test this later on a big enough explore, but I think we'll get errors when fetching the queries if it goes over 293 or whatever that number is.
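If that per-request cap is real, one way to stay under it is to chunk the query task IDs before each fetch. A hypothetical sketch: the chunks helper and the task_id naming are illustrative, only the 250 figure comes from this comment.

```python
def chunks(items, size=250):
    # Yield successive slices of at most `size` items, so each
    # results-fetch request stays under the per-request limit.
    for i in range(0, len(items), size):
        yield items[i:i + size]

task_ids = [f"task_{n}" for n in range(600)]
batch_sizes = [len(batch) for batch in chunks(task_ids)]
print(batch_sizes)  # [250, 250, 100]
```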

@joshtemple (Collaborator)
@iserko, inspired by your PR, I did a bunch of digging on asyncio and found some ways to improve this approach.

A few major changes:

  • I've added an asyncio.BoundedSemaphore object to the validator, which is used to block when the maximum number of queries is running against the data warehouse (releasing a slot once we get the result back)
  • I've used an asyncio.Queue object to track the running queries. This lets us periodically check for query results as a coroutine as well, pulling all Looker query task IDs, extracting results, and pushing the running or pending queries back onto the Queue.
  • I've renamed --max-concurrency to --query-slots, which I think is a more intuitive name for the flag, and set the default to 10. Interested to hear your and @DylanBaker's thoughts on this, though; the name is definitely open for discussion.

Let me know what you think of these changes.
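The semaphore-plus-queue design described above can be sketched roughly as follows. This is a simplified illustration, not the merged code: run_query, get_results, and the zero-length sleep are placeholders for query submission and result polling against Looker.

```python
import asyncio

async def run_query(slots: asyncio.BoundedSemaphore, queue: asyncio.Queue, query_id):
    await slots.acquire()       # block while all query slots are in use
    queue.put_nowait(query_id)  # track the now-running query task

async def get_results(slots, queue, results, total):
    while len(results) < total:
        query_id = await queue.get()
        await asyncio.sleep(0)  # placeholder for polling the query task
        results.append(query_id)
        slots.release()         # free a slot once the result is back

async def run_validation(query_ids, num_slots):
    slots = asyncio.BoundedSemaphore(num_slots)
    queue = asyncio.Queue()
    results = []
    producers = [run_query(slots, queue, q) for q in query_ids]
    await asyncio.gather(
        *producers, get_results(slots, queue, results, len(query_ids))
    )
    return results

print(len(asyncio.run(run_validation(range(25), 10))))  # 25
```

A BoundedSemaphore (rather than a plain Semaphore) has the nice property of raising if a slot is released more times than it was acquired, which catches accounting bugs in the release path.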

I've adjusted the approach to use an asyncio Queue (to preserve state 
for running query task IDs) and a BoundedSemaphore (to limit the number 
of queries running concurrently). I've also renamed the flag to 
query_slots, which is certainly open to discussion.
@joshtemple joshtemple force-pushed the max_concurrency_queries branch from cc3d447 to 68ebf6b Compare November 23, 2019 01:48
@joshtemple joshtemple added this to the 0.1.1 milestone Nov 23, 2019
@joshtemple joshtemple merged commit 14cb800 into spectacles-ci:master Nov 25, 2019