Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MANTA-4341 Optimize conditional object operations #47

Merged
merged 44 commits into from
Apr 6, 2020

Conversation

chudley
Copy link
Contributor

@chudley chudley commented Mar 12, 2020

This change gives buckets-mdapi the option of handling conditional requests itself for the getobject, createobject, updateobject, and deleteobject RPCs. I attempted to mirror the existing behaviour from buckets-api (which I think is based on Restify's approach) where possible. I'm happy to revisit any of that if we're OK with API changes that might make sense here.

There's a few parts in this change that I'm not totally sure about, but I wanted to get some eyes on it while it's being finished up. I still would also like to add more tests to tests/rpc_handlers.rs, and I'm sure there's some Rust-related no-nos I've committed. I also haven't actually verified that the transactions at the database level are working as I expect (I would like to include this in the test suite but I'm not sure what that would look like).

Probably my biggest concern is that the conditionals here aren't strictly following HTTP's definitions, such as error codes and supported conditions for certain operations (e.g. "If-Modified-Since" is supported on "updateobject" RPCs, but HTTP says it should only be used on GETs and HEADs). That enforcement happens in TritonDataCenter/manta-buckets-api#53 by way of selectively passing these conditional headers to buckets-mdapi.

This is on purpose for now, at least because of the precondition failures that would lead to a 304 Not Modified, which isn't a HTTP error but a redirect, which buckets-api handles as a successful response. It also helps with the CreateBucketObject case, where having the initial "getobject" in that request fail with the same conditions that the subsequent "createobject" would fail on is nice. I'd be interested to hear thoughts on this, and I also plan on documenting this somewhere with whatever conclusion we reach.

The change is make check clean. I've also written some new tests for node-manta that flex these conditional requests. They can be seen here.

A few examples of the getobject RPC using this new (optional) conditions object:

No conditions object

./target/debug/examples/fastcall --method getobject --args '[ { "bucket_id": "852e53fb-d09b-4b37-844c-d06a497d215f", "owner": "f55a289e-d1a4-476b-bc3f-6e15cd8e26d2", "name": "testing", "vnode": 0, "request_id": "bd0b716a-2873-4d75-a3d4-13f30c2c850b" } ]' | json
[
  {
    "bucket_id": "852e53fb-d09b-4b37-844c-d06a497d215f",
    "content_length": 0,
    "content_md5": "test",
    "content_type": "application/json",
    "created": "2019-10-24T08:14:21.945237Z",
    "headers": {},
    "id": "d2420c61-9bec-406c-a0e2-b992dfdbc0a9",
    "modified": "2019-10-24T08:14:21.945237Z",
    "name": "testing",
    "owner": "f55a289e-d1a4-476b-bc3f-6e15cd8e26d2",
    "properties": null,
    "sharks": [
      {
        "datacenter": "test-0",
        "manta_storage_id": "1.stor"
      }
    ]
  }
]

if-match success

./target/debug/examples/fastcall --method getobject --args '[ { "bucket_id": "852e53fb-d09b-4b37-844c-d06a497d215f", "owner": "f55a289e-d1a4-476b-bc3f-6e15cd8e26d2", "name": "testing", "vnode": 0, "request_id": "bd0b716a-2873-4d75-a3d4-13f30c2c850b", "conditions": { "if-match": [ "d2420c61-9bec-406c-a0e2-b992dfdbc0a9" ] } } ]' | json
[
  {
    "bucket_id": "852e53fb-d09b-4b37-844c-d06a497d215f",
    "content_length": 0,
    "content_md5": "test",
    "content_type": "application/json",
    "created": "2019-10-24T08:14:21.945237Z",
    "headers": {},
    "id": "d2420c61-9bec-406c-a0e2-b992dfdbc0a9",
    "modified": "2019-10-24T08:14:21.945237Z",
    "name": "testing",
    "owner": "f55a289e-d1a4-476b-bc3f-6e15cd8e26d2",
    "properties": null,
    "sharks": [
      {
        "datacenter": "test-0",
        "manta_storage_id": "1.stor"
      }
    ]
  }
]

if-match failure

./target/debug/examples/fastcall --method getobject --args '[ { "bucket_id": "852e53fb-d09b-4b37-844c-d06a497d215f", "owner": "f55a289e-d1a4-476b-bc3f-6e15cd8e26d2", "name": "testing", "vnode": 0, "request_id": "bd0b716a-2873-4d75-a3d4-13f30c2c850b", "conditions": { "if-match": [ "xxx" ] } } ]' | json
[
  {
    "error": {
      "message": "if-match '\"xxx\"' didn't match etag 'd2420c61-9bec-406c-a0e2-b992dfdbc0a9'",
      "name": "PreconditionFailedError"
    }
  }
]

if-modified-since failure

./target/debug/examples/fastcall --method getobject --args '[ { "bucket_id": "852e53fb-d09b-4b37-844c-d06a497d215f", "owner": "f55a289e-d1a4-476b-bc3f-6e15cd8e26d2", "name": "testing", "vnode": 0, "request_id": "bd0b716a-2873-4d75-a3d4-13f30c2c850b", "conditions": { "if-modified-since": "2020-03-12T10:00:00Z" } } ]' | json
[
  {
    "error": {
      "message": "object was modified at '2019-10-24T08:14:21.945237+00:00'; if-modified-since '2020-03-12T10:00:00+00:00'",
      "name": "PreconditionFailedError"
    }
  }
]

Richard Bradley and others added 30 commits January 15, 2020 12:21
Copy link

@kellymclaughlin kellymclaughlin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't gone over everything yet, but I wanted to share my comments thus far just to give you some early feedback. This is looking great btw!

pub if_unmodified_since: Option<types::Timestamptz>,
}

pub fn error(error_type: error::BucketsMdapiErrorType, msg: String) -> serde_json::Value {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is expected to be a specialization for this module. I don't see it used anywhere else, but I may have just missed it. If that's true then maybe it makes sense to specialize it a bit further here to the BucketsMdapiErrorType::PreconditionFailedError type rather than having that parameterized. It also probably doesn't need to be pub if it's only used in this module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea good point. I've changed this to be an internal function now that always returns a PreconditionFailedError. It's a bit thin on the ground with the changes to BucketsMdapiError as well, and I'm not totally sure on its utility any more. Thinking about it, it could probably just do with being removed

.and_then(|_rows| {
// XXX
//
// Somehow this should skip this second get if the above conditional already gets the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing the conditional::request first if we first do this query for the object since the SQL used is the same and then apply the conditional checks if appropriate afterwards in the and_then block that follows. I think that would work to get the same result, but avoid the extra query. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started looking at this, but I wasn't totally keen on having to also check for 0 or >1 rows in do_get, similar to what conditional::request has to do.

I've made some changes in fe7e575 that causes conditional::request to return the rows it finds in its query, and then go_get will just pass those back if there are Some(rows). Otherwise, None causes go_get to do the query itself (which would be the case if the request wasn't conditional).

I think it's doing the right thing, but I'd appreciate eyes on it!

* Is reaching into the Value like this safe? I think we get `Value::Null` on a bad
* value, so possibly it's ok. Is it fast enough?
*/
if e["name"] == "PostgresError" {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could possibly improve on this a bit and not have to deal with the raw Value here. What I'm thinking is maybe we can have do_get return a Result<Option<ObjectResponse>, BucketsMdapiErrorType> rather than Result<Option<ObjectResponse>, Value>. If we extend BucketsMdapiErrorTyep with a new variant that looks like PostgresError(String) maybe we can achieve similar results without the need to muck directly with that Value. This is pretty hand wavy at this point so if you want I can try this out and share more details about what I mean if it seems like it will work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one ended up being a bit of a mission! I agree though because the error type was a little bit confusing for me at first.

Basically I've made enum BucketsMdapiErrorType into our internal representation of an error, and it's now enum BucketsMdapiError. Like you've said, we've now got String variants that include a message with the error which is nice. Only when it comes to serializing and deserializing do we wrap this into BucketsMdapiWrappedError, which buys us some niceties from Serde with wrapping it up into what Fast expects our errors to look like. I've kind of just inverted the existing BucketsMdapiErrorType and BucketsMdapiError types.

I hope that makes sense. It took me a while to get to this point because Serde can almost afford us to do away with ToString (except for our test suite) and BucketsMdapiWrappedError. I've written that up into a comment.

The delta for just this change is between fe7e575 and 1ceb15b.

Again, I think this is doing what I expect, but I'd appreciate extra eyes on it.

})
}

fn is_conditional(conditions: &Conditions) -> bool {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For is_conditional and check_conditional you could consider making them methods attached to the Conditions type just to enable a call syntax of conditions.is_conditional() or conditions.check(). Doesn't affect functionality at all so feel free to ignore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it, thanks! check and is_conditional are now methods of the Conditions type.

* return those rows as-is.
*/
if let Some(r) = rows {
return Ok(r);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One caution here is that this would bail out early without committing the transaction which would result in a rollback. It doesn't matter in this case since we're only reading data out. You could actually just avoid using an explicit transaction for do_get because we're only doing the single query so there won't be any transaction id savings by explicitly managing a transaction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, didn't plan on that. I thought the rollback happened only when txn gets dropped at the end of do_get if txt.commit() hasn't been called, and that my early return would still lead to that commit happening. Maybe I've misunderstood this a bit. Mind pointing out where the rollback happens?

You're right though on the whole transaction handling thing. Your initial recommendation is probably the way to go. My main problem with having go_get check the conditions itself was that it would also have to check the row count for 0 or >1 rows, but maybe if I have Conditions::check() take RowSlice instead I can have that method do that row count check also. I'll have a little play around!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've taken another go at this in chudley@17b9772. do_get now just runs the SQL for a get and passes an ObjectResponse to the conditional check, which has also been changed to accept that instead.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I like the way that looks now. Very nice!

buckets-mdapi/src/error.rs Show resolved Hide resolved
buckets-mdapi/src/sql.rs Outdated Show resolved Hide resolved
@kellymclaughlin
Copy link

I think the CI checks failed just because it needs a merge from master.

@chudley
Copy link
Contributor Author

chudley commented Apr 2, 2020

Looks like checks are passing for this now after the merge from master.

I wanted to check whether this could be integrated ahead of all the buckets-api parts. It looks like that's the case, where the new node-manta tests in TritonDataCenter/node-manta#382 around conditional requests all pass except for the 400 Bad Request ones, which is for a bad date being supplied (as expected, noted in MANTA-4341).

So subject to review, we should be able to integrate this and followup later with the remaining up-stack pieces that are also in review.

@chudley chudley merged commit 548b0f2 into TritonDataCenter:master Apr 6, 2020
@chudley chudley deleted the dev-MANTA-4341 branch April 6, 2020 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants