Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Tracing for SQLAlchemy and Flask-SQLAlcemy #14

Merged
merged 21 commits into from
Feb 20, 2018

Conversation

therealryanbonham
Copy link
Contributor

Traces calls from SQLAlchemy ORM query and session classes, and calls from Flask-SQLAlchemy.
Also patches tox.ini so that py36 is used for coverage run, as if your default python was 27 it errors.

Added a couple of basic unit test for both sqlalchemy and flask-sqlalcemy tracing. Updated README as well.

Welcome feedback, this is my first attempt to extend X-Ray Tracing.

README.md Outdated
@@ -175,7 +175,61 @@ app.router.add_get("/", handler)

web.run_app(app)
```
**Add Flask middleware**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a duplicate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Removing.

README.md Outdated

xray_recorder.end_segment()
app = Flask(__name__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for a general SQLAlchemy guide there is no need to involve the web framework.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to initialize the app from Flask in order to Initialize Flask-SQLAlchemy

db = XRayFlaskSqlAlchemy(app)

was just trying to show complete example..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable. I'm fine with this.

README.md Outdated

```python
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason you use wild import?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, in fact that entire import can go. I refactored the way i was doing imports and that is not needed.

README.md Outdated
from aws_xray_sdk.core.context import Context
from aws_xray_sdk.ext.sqlalchemy.query import XRaySessionMaker

xray_recorder.configure(service='test', sampling=False, context=Context())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For quick start guide for a SQL library you don't need to configure anything on the recorder. And you could also remove the import on context as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

from flask_sqlalchemy import SQLAlchemy, BaseQuery, _SessionSignalEvents, get_state
from aws_xray_sdk.ext.sqlalchemy.query import XRaySession, XRayQuery

def decorate_all_functions(function_decorator):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this file be in the ext/flask folder? aws_xray_sdk/ext/flask/sqlalchemy.py for example

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flask-SQLAlchemy is a separate python package then Flask, which is why i kept them separated. Willing to change if needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can leave it for now.

@@ -0,0 +1,95 @@
from __future__ import print_function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused import? Print statement should not be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Was used for debugging.

from functools import wraps


def decorate_all_functions(function_decorator):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can create a sqlalchemy util module that holds these shared functions so that you don't duplicate code.
decorate_all_functions and xray_on_call can be shared in flask_sqlalchemy

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good point. I meant to go back and do this, and completely forgot.

pass

@decorate_all_functions(xray_on_call)
class XRayQuery(Query):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason you decorate functions from Query class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Query Class called by flask-sqlalchemyt, so this gets us complete traces

@@ -0,0 +1,61 @@
from __future__ import print_function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

@@ -35,7 +35,7 @@
'Programming Language :: Python :: 3.6',
],

install_requires=['jsonpickle', 'wrapt', 'requests'],
install_requires=['jsonpickle', 'wrapt', 'requests', 'future'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need future if you remove all print statement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No future will still be needed for the super() builtin
from builtins import super

@haotianw465
Copy link
Contributor

Hi, could you take a look at http://docs.aws.amazon.com/xray/latest/devguide/xray-api-segmentdocuments.html#api-segmentdocuments-sql
When you are sending data to X-Ray back-end service, if you have invalid key under subsegment.sql, your data will not be rejected, but the data will not propagate through our system. Some of the listed keys in that linked doc might be unavailable for specific sql libraries, which is OK.

Also for the actual query recording you have to submit that change to a separate CR. We have more strict policies for reviewing code that related to query recording. And also the query is required to be sanitized.

password = db.Column(db.String(255), nullable=False)


def _search_entity(entity, name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This along with find_sub could also be a helper class shared between your two query_test.py. And since these two functions are very general they could be under tests/ root so any other unit test can utilize your contribution.

return None


def find_sub(segment, name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend rename it to find_subsegment. Abbreviate on functions is not a good practice in general if it is not absolutely needed.

@therealryanbonham
Copy link
Contributor Author

@haotianw465 I made the change and used the set_sql to set the sanitized SQL, and added one unit test to test sanitization. That should not me an issue with sqlalchemy orm, as it passes everything via params and those are not included in the sql output i exposed. I will go ahead and comment out the set_sql and test for that, and then make second PR for that work once this first batch is accepted.

@haotianw465
Copy link
Contributor

Hi, per doc link http://docs.aws.amazon.com/xray/latest/devguide/xray-api-segmentdocuments.html#api-segmentdocuments-sql
This is the sample sql section

  "sql" : {
    "url": "jdbc:postgresql://aawijb5u25wdoy.cpamxznpdoq8.us-west-2.rds.amazonaws.com:5432/ebdb",
    "preparation": "statement",
    "database_type": "PostgreSQL",
    "database_version": "9.5.4",
    "driver_version": "PostgreSQL 9.4.1211.jre7",
    "user" : "dbuser",
    "sanitized_query" : "SELECT  *  FROM  customers  WHERE  customer_id=?;"
  }

Could you try to get some other info and add them to the sql section as well? For an architecture that has multiple databases the info like database_type, user, url could be very helpful for customers.

For sql subsegment name we have convention of using databas@host just like the one in the sample showed "name": "[email protected]". This makes sure when you have multiple databases the back-end won't aggregate the latency altogether.

@therealryanbonham
Copy link
Contributor Author

therealryanbonham commented Dec 16, 2017

I have a separate branch started to get the set_sql working. I have it adding: url, user, database_type and sanitized_query.

https://github.com/therealryanbonham/aws-xray-sdk-python/tree/set_sql

One issue i see with the way this tracing I implemented works is it is tracing each function in sqlalchemy and thus setting a local namespace on the subsegments, so it is not creating a remote segment..

Thoughts? Want that as a seperate PR?

@haotianw465
Copy link
Contributor

trace = xray_recorder.begin_subsegment(class_name+'.'+func.__name__) could you change the variable name from trace to subsegments? Trace is a collection of segments and subsegments. Then you can do subsegment.namespace = 'remote'. Database subsegment should always be remote.

For setting sql metadata could you have url user database_type included in this PR and a separated PR only for sanitized_query?

@therealryanbonham
Copy link
Contributor Author

The combination or using the class and function name as the subsegment name, and namespace of remote, makes a very noisy service map, as every remote call shows up as it's own object. It seems remote namespace wants the name of the subsegment to be something like the hostname you are calling.

Something like
xray_recorder.begin_subsegment(sql['url'])
instead of
xray_recorder.begin_subsegment(class_name+'.'+func.name)

The issue in that though is the trace become just a bunch of repeat named subsegment calls.. It seems like remote namespaces need a way to be named, separate from the subsegment name... so that multiple sql functions can be traced as subsegments to a single remote namespace..

Something like

xray_recorder.begin_subsegment(class_name+'.'+func.name, namespace=remote, namespace_name=sql['url'])

Is that supported in any way? I have tried to figure out some solution to this but am coming up blank...

@haotianw465
Copy link
Contributor

Hi, as I pointed out in the previous thread:
For sql subsegment name we have convention of using databas@host just like the one in the sample showed "name": "[email protected]". This provides proper level of aggregation on service graph.

Using url only will result in all databases on that instance aggregated to one node, and looks like the naming of your current PR result in too many nodes on service graph.

The mentioned naming convention works well on all of our SDKs so far and we haven't heard any complain. But it really depends on if the patching code can retrieve the info about database name and endpoint. In this case looks like you can have all the info you need to construct database@endpoint.

@haotianw465
Copy link
Contributor

Hi, the PR is very close to a ready state. Do you have any update on SQL subsegment naming? Or do you prefer us to take whatever you have right now and finish the work by ourselves?

@therealryanbonham
Copy link
Contributor Author

@haotianw465 I think this is ready for review again. It has SQL URL, user, and datatype added to the sql metadata. The sanatized_query is commented out for this PR. Traces are named by URL and have the function name as an annotation with the key "sqlalchemy".

I added another set of helper function to util.py to search for subsegments based on annotation key/value and updated unit test to use this as well.

if u.password is None:
safe_url = u.geturl()
else:
# String password from URL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean strip?

if sql is not None:
if getattr(c._local, 'entities', None) is not None:
subsegment = xray_recorder.begin_subsegment(sql['url'], namespace='remote')
# subsegment = xray_recorder.begin_subsegment(class_name+'.'+func.__name__, )
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick. Probably should be removed?

""" Test calling all() on get all records.
Verify we run the query and return the SQL as metdata"""
# with capsys.disabled():
with capsys.disabled():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate code on line63 and 64?

@haotianw465 haotianw465 merged commit d110386 into aws:master Feb 20, 2018
@chanchiem chanchiem mentioned this pull request Jan 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants