RFC: Support borrowed deserialization #309
Conversation
Tested with rustc 1.51.0 (2fd73fabe 2021-03-23).
let client = &runtime.client;
let incoming = incoming(client).take(1);
runtime.run(incoming, f).await?;
do_run(runtime, crate::handler_fn(func)).await?;
Is there a reason why this code was taken out into a separate function? The previous version was easier to follow. Just my 2c.
That's because of rust-lang/rust#44721: we currently can't mix `impl Trait` and explicit generic arguments.
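A minimal, self-contained sketch of that limitation (a toy function, not this crate's API): once a function takes an `impl Trait` argument, rustc rejects any explicit generic arguments at the call site.

```rust
// Mixing a named generic parameter with an `impl Trait` argument:
fn run<A>(event: A, f: impl Fn(A) -> A) -> A {
    f(event)
}

fn main() {
    // Inference works fine:
    let x = run(1u32, |e| e + 1);
    assert_eq!(x, 2);

    // But naming `A` explicitly is rejected: `impl Trait` introduces an
    // anonymous type parameter that can never be written out, so rustc
    // refuses any explicit generic arguments here (error E0632):
    // let x = run::<u32>(1, |e| e + 1);
}
```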
body: response,
let request_id = ctx.request_id.clone();
let request_id_for_panic = ctx.request_id.clone();
let task = tokio::spawn(async move {
I found this part a bit confusing. It was easier to follow after I renamed a few variables and added comments.
Here is my edit for lines 169 - 222:
let handler_call = tokio::spawn(async move {
    let event_body = serde_json::from_slice(&body);
    // call the handler function if `event_body` was deserialized successfully or log an error
    // in either case return `Result<Request<Body>, Error>`
    match event_body {
        Ok(event_body) => match handler.call(event_body, ctx).await {
            Ok(response) => {
                // handler call success
                trace!("Ok response from handler (run loop)");
                EventCompletionRequest {
                    request_id: &request_id,
                    body: response,
                }
                .into_req()
            }
            Err(err) => {
                // handler call error
                error!("{}", err); // logs the error in CloudWatch
                EventErrorRequest {
                    request_id: &request_id,
                    diagnostic: Diagnostic {
                        error_type: type_name_of_val(&err).to_owned(),
                        error_message: format!("{}", err), // returns the error to the caller via Lambda API
                    },
                }
                .into_req()
            }
        },
        Err(err) => {
            // `event_body` deserialization error
            error!("{}", err); // logs the error in CloudWatch
            EventErrorRequest {
                request_id: &request_id,
                diagnostic: Diagnostic {
                    error_type: type_name_of_val(&err).to_owned(),
                    error_message: format!("{}", err), // returns the error to the caller via Lambda API
                },
            }
            .into_req()
        }
    }
});
// execute handler call from the task above
// the error handling is done inside the `handler_call` task,
// so the only way it may fail is if it panics
let handler_response = match handler_call.await {
    Ok(req) => req,
    Err(err) if err.is_panic() => {
        error!("{:?}", err); // inconsistent with other log record formats - to be reviewed
        EventErrorRequest {
            request_id: &request_id_for_panic,
            diagnostic: Diagnostic {
                error_type: type_name_of_val(&err).to_owned(),
                error_message: format!("Lambda panicked: {}", err),
            },
        }
        .into_req()
    }
    Err(_) => unreachable!("tokio::task should not be canceled"),
}?;
// send the handler function response back to AWS API
client
    .call(handler_response)
    .await
    .expect("Unable to send response to Runtime APIs");
}
I'll incorporate those changes asap.
@@ -31,7 +31,7 @@ async fn main() -> Result<(), Error> {
     SimpleLogger::new().with_level(LevelFilter::Info).init().unwrap();

     let func = handler_fn(my_handler);
-    lambda_runtime::run(func).await?;
+    lambda_runtime::run::<Request, _, _>(func).await?;
A comment about `_, _>` here would be good.
To be honest, I don't understand how this change works and why we need `lambda_runtime::run::<Value, _, _>(func).await?;`, but that's just me not knowing GATs and serde well enough. What I think would be good is to explain what the `_, _>` part means, why we need it, and whether it should be copied as-is or whether something else can go in place of `_`. Quite often `_` is used in examples for brevity, which raises the question: what do I put in there in my case? Do I go with `_` or do I have to provide some value?
A macro will definitely help resolve the confusion for dummies like me.
I think a final version of this PR will definitely include the macro - there's just a lot of incidental complexity without it. The `_`'s are currently used for brevity, because we are required to provide the first generic parameter. Because that first one is different from the actual handler argument for borrowed deserialize, there's no way for the compiler to infer it. That's not true for the handler itself and the return type, so we can just say `_`.
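A self-contained toy of the inference situation described above (the trait and function names are hypothetical stand-ins, not the lambda_runtime API): when a parameter only reaches the arguments through an associated type, the compiler cannot infer it "backwards" and it must be named; the remaining parameters are inferred as usual, so `_` suffices for them.

```rust
// `A` never appears directly in the argument types, only via its
// associated type, so it cannot be inferred from the call site.
trait Event {
    type Body;
}

struct Raw;
impl Event for Raw {
    type Body = String;
}

// `A::Body` is an associated-type projection; `B` is an ordinary parameter.
fn run<A: Event, B>(body: A::Body, out: B) -> (A::Body, B) {
    (body, out)
}

fn main() {
    // `A` must be named explicitly; `B` is inferred from `out`, so we
    // write `_` for it. `run("hi".to_string(), 42)` would not compile.
    let (body, out) = run::<Raw, _>("hi".to_string(), 42);
    assert_eq!(body, "hi");
    assert_eq!(out, 42);
}
```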
@duarten, thank you for explaining it to me. I think I get it now :)
I paraphrased your explanation as a comment:
// The 1st param (Request) has to be specified every time because the compiler cannot infer it.
// The 2nd and 3rd parameters can use their default types and be omitted (_).
Thanks for opening this. While most of my comments were focused on "bits of this are a breaking change" and "this might need to be sealed", I also want to ask—perhaps unfairly—whether this change is necessary in the first place. Given the architecture of the Lambda service itself, I'm not sure that zero-copy deserialization will help here for a while—I think there's significantly lower-hanging, performance-related fruit available (such as using different allocators like jemalloc or mimalloc). I'm worried that this change does not carry its weight in complexity in return for the performance gains. Have you benchmarked these changes?
-    type Error;
+    type Error: fmt::Display + Send + Sync;
     /// The type of Response this Handler will return
-    type Response: IntoResponse;
+    type Response: IntoResponse + Send + Sync;
Do these bounds need to be added? If they're necessary, we'd need to prepare a new breaking change, which I'd rather avoid if not necessary.
These bounds are lifted from `run()`, so in that sense they're not really breaking (any `Handler` would already need to satisfy those bounds for it to be useful).
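As a sketch of why those bounds are effectively already required (toy code only; the real `run()` uses tokio tasks, not `std::thread`): moving the handler's work across a task boundary forces `Send`, and logging or reporting the error forces `Display`.

```rust
use std::fmt;
use std::thread;

// A toy "run loop" step: the error type must be Send to cross the
// thread/task boundary, and Display so we can format it for the logs.
fn run_handler<E>(make_err: impl FnOnce() -> E + Send + 'static) -> String
where
    E: fmt::Display + Send + 'static,
{
    thread::spawn(move || format!("handler error: {}", make_err()))
        .join()
        .unwrap()
}

fn main() {
    assert_eq!(run_handler(|| "boom"), "handler error: boom");
}
```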
R: IntoResponse + Send + Sync,
E: Send + Sync,
(Repeating this comment) Do these bounds need to be added? If they're necessary, we'd need to prepare a new breaking change, which I'd rather avoid if not necessary.
@@ -107,7 +108,7 @@ pub fn handler<H: Handler>(handler: H) -> Adapter<H> {
 impl<F, R, Fut> Handler for F
 where
     F: Fn(Request, Context) -> Fut,
-    R: IntoResponse,
+    R: IntoResponse + Send + Sync,
(Repeating this comment) Do these bounds need to be added? If they're necessary, we'd need to prepare a new breaking change, which I'd rather avoid if not necessary.
/// A helper trait implemented by an event type that is deserialized with a 'de lifetime.
pub trait Deserializable<'de> {
    /// The associated type that is the concrete event type.
    type Deserialize: Deserialize<'de> + Send;
}

impl<T: Send + DeserializeOwned> Deserializable<'_> for T {
    type Deserialize = T;
}
A few notes:
- If this is gonna be added, it might make sense for this to be a sealed trait. That being said, do you intend for this trait to be implemented by consumers of this library?
- Most traits in Rust are verbs, not adjectives. If we keep this trait, I'd call it `Deserialize` and refer to it as `lambda_runtime::Deserialize`.
Yes, the trait would have to be implemented by consumers wanting to use borrowed deserialization, hidden behind a macro.
 pub trait Handler<'de, A: Deserializable<'de>, B: Send> {
     /// Errors returned by this handler.
-    type Error;
+    type Error: fmt::Display + Send + Sync;
     /// Response of this handler.
-    type Fut: Future<Output = Result<B, Self::Error>>;
+    type Fut: Future<Output = Result<B, Self::Error>> + Send;
     /// Handle the incoming event.
-    fn call(&self, event: A, context: Context) -> Self::Fut;
+    fn call(&self, event: <A as Deserializable<'de>>::Deserialize, context: Context) -> Self::Fut;
I've mentioned this before, but these changes are breaking, so we'd need to make a breaking release pretty soon.
Didn't we have to remove Sync bounds in issue #287 because it was breaking something in other libraries?
Regarding Sync: note this PR doesn't (intentionally) change anything, it merely lifts bounds to the trait. The error and response type (
I'm sure there are lots of other performance gains to have, and I agree that without higher-kinded types the implementation is complex (maybe it can be simplified by someone with more Rust experience). I haven't benchmarked these specific changes (although over time I've benchmarked my fair share of zero-copy algorithms), but I can write a quick benchmark if that will influence the merge/don't merge decision. For now we can assume there will always be gains, proportional to the payload size. This being said, not making this change means that event definitions must be

As a consumer of this crate I want to write the most performant code I can right now, and reap cost/latency benefits over time without having to do anything. But I totally understand wanting to wait until the impact is more significant or the language makes the change trivial :)
(Disclosure: I no longer work at Amazon, so my vote/opinion will matter less—this project is more in @coltonweaver et al.'s hands now, but I figured it made sense for me to respond rather than to ghost you.) Broadly speaking, I think I agree with the sentiment of this statement. However, Rust's present limitations, such as the absence of GATs, imply that pervasive, cheap cloning in the style of bytes or tower is often necessary. If you're interested in making performance improvements here, I'd suggest looking into Arc'ing/Bytes'ing the payload. However, I feel obligated to warn you that you might not see any benefit in this runtime due to factors outside of this library's control. Since I don't have the permissions to do this, my suggestion to the maintainers of this project would be to close this PR and revisit this approach if/when Lambda supports microsecond-level billing and Rust has stabilized support for GATs.
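A sketch of the Arc'ing approach mentioned above, using only the standard library (`bytes::Bytes` would behave similarly): sharing the payload behind an `Arc` makes each "copy" a pointer copy plus a reference-count bump instead of a byte-for-byte clone.

```rust
use std::sync::Arc;

fn main() {
    // Share the payload behind an `Arc<[u8]>` instead of deep-copying a
    // `Vec<u8>` for every consumer of the event body.
    let payload: Arc<[u8]> = Arc::from(vec![0u8; 1024]);

    // Each clone is cheap: no bytes are copied, only the count bumps.
    let for_handler = Arc::clone(&payload);
    let for_logging = Arc::clone(&payload);

    assert_eq!(for_handler.len(), 1024);
    assert_eq!(for_logging.len(), 1024);
    assert_eq!(Arc::strong_count(&payload), 3);
}
```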
Weighing the possible gains here against the complexity of the solution, and the need to prepare another breaking change so soon after 0.3, I'm inclined to agree with @davidbarsky here. @duarten, thank you for all your work on this, but I'm going to go ahead and close this PR. Perhaps this is something we can revisit in the future, but I'm not sure it's the right direction considering the current environment.
@duarten, keep us updated if you make any progress on this. Closing this PR is probably the right thing to do, but I think this problem is worth solving one way or the other.
Description of changes:
This PR allows the request passed to a handler to be deserialized with a lifetime `'de`, thus avoiding copies when deserializing. Due to the lack of higher-kinded types in Rust, the implementation relies on GATs. This makes things ugly on the Lambda side. For example, calling `run()` for `DeserializeOwned` types now requires the event type to be named explicitly, as in `lambda_runtime::run::<Request, _, _>(func)`, and borrowed deserialized types are more verbose still.
This ugliness can be fixed by macros, like:
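A self-contained toy of the idea (the `lambda!` name and the simplified `run()` are hypothetical stand-ins for the real lambda_runtime API): the macro supplies the explicit event type, so callers never write the turbofish themselves.

```rust
// A declarative macro that hides the explicit generic argument.
// The real macro would expand to `lambda_runtime::run::<Event, _, _>(handler)`.
macro_rules! lambda {
    ($event:ty, $handler:expr) => {
        run::<$event, _>($handler)
    };
}

// Toy `run`: the first parameter must be named, the closure type is inferred.
fn run<A: Default, F: Fn(A) -> A>(f: F) -> A {
    f(A::default())
}

fn main() {
    // Callers name only the event type; `_` stays hidden in the macro.
    let out = lambda!(u32, |e| e + 1);
    assert_eq!(out, 1);
}
```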
If the PR looks good, I can prepare a follow-up with the macro.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.