-
Notifications
You must be signed in to change notification settings - Fork 260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add reqest properties for azure_kusto_data
crate
#685
Conversation
In general, with the naming convention in other kusto sdks the crate should be called "azure-kusto-data" |
Hey @AsafMah, we followed https://azure.github.io/azure-sdk/general_design.html#namespaces when naming the other crates. In the guidelines it says
I think that the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments from going over the code
&mut request, | ||
) | ||
.unwrap(); | ||
add_mandatory_header2(&Accept::new("application/json"), &mut request).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for all the unwraps in this file?
Using ?
seems to compile without error, and of course won't trigger a panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an artifact from migration from where i developed the code ... the plan is to remove all unwraps throughout the code before merging!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved to new header handling and removed the panics.
use http::request::Builder; | ||
|
||
#[derive(Debug, Clone, Copy, PartialEq, Eq)] | ||
pub struct Accept<'a>(&'a str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file seems a bit sus,
Accept and AcceptEncoding are really common headers, it's a bit weird that we need to define them like that in the kusto package specfically.
It seems like something that should be defined in azure_core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
completely agree, this should be moved to core! Having it here to get things to work initially!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved the request options to core crate
AUTHORITY_ID_NAME => authority_id = Some(v), | ||
MSI_PARAMS_NAME => msi_params = Some(v), | ||
FEDERATED_SECURITY_NAME => match v { | ||
"true" => federated_security = Some(true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DRY - you're doing this exact comparison for multiple props
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved boolean parsing into its own reusable function.
match k { | ||
DATA_SOURCE_NAME => data_source = Some(v), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parsing connection strings are a bit more complex than that - the names aren't case sensitive, and some of them have a aliases.
Take a look at https://github.com/Azure/azure-kusto-node/blob/master/azure-kusto-data/source/connectionBuilder.ts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a basic function to support aliases for keys based on the link. Need to look into the implementation there a bit more, but i think we need to normalize a bit more via lower casing input strings.
request: &mut Request, | ||
next: &[Arc<dyn Policy>], | ||
) -> PolicyResult { | ||
assert!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why assert over an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was copied from other pipelines. My understanding would be that this should either never or always happen at runtime, since creating the pipeline is crate internal. Thus - i think - the decision was made to just panic. @cataggar - is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct. Triggering this would mean a bug in the SDK code itself and thus we want to fail fast here.
context: Context => context, | ||
} | ||
|
||
pub fn into_future(self) -> ExecuteQuery { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can completely avoid Box pin in this method and also simplify it:
pub async fn into_future(self) -> crate::error::Result<KustoResponseDataSetV2> {
let url = self.client.query_url();
let mut request = self.client.prepare_request(url, http::Method::POST);
add_mandatory_header2(
&ContentType::new("application/json; charset=utf-8"),
&mut request,
)
.unwrap();
add_mandatory_header2(&Accept::new("application/json"), &mut request).unwrap();
//add_mandatory_header2(&AcceptEncoding::new("gzip,deflate"), &mut request).unwrap();
add_optional_header2(&self.client_request_id, &mut request).unwrap();
let body = QueryBody {
db: self.database,
csl: self.query,
};
request.set_body(bytes::Bytes::from(serde_json::to_string(&body)?).into());
let mut context = self.context;
let response = self
.client
.pipeline()
.send(&mut context, &mut request)
.await?;
KustoResponseDataSetV2::try_from(response).await
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to adjust the into_future
method to the latest approach from the cosmos crate and also added some more optional request options to illustrate the usage of the request builder. Also added the implementation for the so far only in nightly IntoFuture
trait to show how the explicit inio_future
call will dissapear once that feature lands in stable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still left the box::pin and copying which again I don't think are needed here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had one issue in the past, that i think will re-appear if we drop that. Essentially, when used in API we may want to return the future rather then evaluating the future in the API itself, and then turning the response into a future again. Main reason being that if we don't copy the client and box/pin it, the lifetime of the future is directly tied to the lifetime of the client itself, which sometimes significantly limits usability in async contexts. However I am not entirely sure that I fully understand all implications - I think @rylev and @yoshuawuyts did the majority of the designs around this topic.
a related issues (i think):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct that we currently don't need to Box::pin
the futures when not using IntoFuture
, but with std::future::IntoFuture
this becomes necessary (at least for now). This is because std::future::IntoFuture
requires you to name the IntoFuture
associated type. We can't currently do that if we use an anonymous future type which is created when using async/await notation. In the future, with the introduction of impl Trait in associated types we'll be able to drop this requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to understand this. Aside from the Box::pin
, if we do not clone the client and do the async move
, my understanding was we would again run into the situation, where the futures and clients lifetimes would be correlated, and could not have a function that return the future directly for use in downstream functions, but rather have to await the future and then wrap the returned value into a future again. We ran into a situation like this with the list_blobs
function from storage in the delta-rs crate, and I wanted to understand the actual root cause, since we have yet to clean that up in delta.
) | ||
.unwrap(); | ||
add_mandatory_header2(&Accept::new("application/json"), &mut request).unwrap(); | ||
add_mandatory_header2(&AcceptEncoding::new("gzip,deflate"), &mut request).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This header causes the response to get back as gzip, which the azure core library doesn't seem to decypher, so I get an error unless I remove it. Is it the same experience for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm.. when i run it locally, it actually seems to be required, strange. One way to compare would be to record a query using the mock transport framework and compare how the same response behaves on different machines. Do you maybe have a test cluster to run queries against that we can record? I could then replay the test and see if it fails or passes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turns out, now it's the same experience for me :D. SO removed the header for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was related to the core crate not including the gzip
feature of reqwest
, added it as an optional feature so we can use gzip in this crate.
use crate::headers::{self, Header}; | ||
|
||
#[derive(Debug, Clone, Copy, PartialEq, Eq)] | ||
pub struct Accept<'a>(&'a str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use String
or Cow<'static, str>
instead of &'a str
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used String
since this seems more "common" right now. Was almost about to migrate more headers, but that's better left for another PR :)
request: &mut Request, | ||
next: &[Arc<dyn Policy>], | ||
) -> PolicyResult { | ||
assert!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct. Triggering this would mean a bug in the SDK code itself and thus we want to fail fast here.
sdk/data_kusto/src/error.rs
Outdated
use thiserror; | ||
|
||
#[derive(thiserror::Error, Debug)] | ||
pub enum Error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as an FYI: we want to move away from crates have their own error types in favor of just a single error type defined in azure_core. This is fine for now though - we can make that change later once we finish making the change in the cosmos crate.
sdk/data_kusto/src/lib.rs
Outdated
@@ -0,0 +1,9 @@ | |||
#[macro_use] | |||
extern crate serde_derive; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get rid of this and just refer to Serialize
and Deserialize
through full paths: e.g., serde::Deserialize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, much cleaner now. Do I understand correctly that the use of macro_use
could be considered deprecated in recent Rust versions in favour of direct imports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deprecated is perhaps too strong of a word, but it is becoming less and less idiomatic over time, and there is movement in the language team to deprecate this perhaps in the next edition.
sdk/data_kusto/Cargo.toml
Outdated
futures = "0.3" | ||
http = "0.2" | ||
serde = "1" | ||
serde_derive = "1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just bring this in through serde
: serde = { version = "1.0", features = ["derive"] }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes :)
#[derive(Debug, Serialize, Deserialize)] | ||
struct QueryBody { | ||
db: String, | ||
csl: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are going to use short names like this we should document what they need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added some docs.
.send(&mut ctx.clone(), &mut request) | ||
.await?; | ||
|
||
<KustoResponseDataSetV2 as TryFrom<HttpResponse>>::try_from(response).await |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yoshuawuyts - I tried using your async-convert
crate here to eventually make this builder generic over the response type. The management and query request builders would likely look more or less the same, and differ only in the response type. I wanted to end up up with something like
response.try_into().await
but I am doing something wrong. Or am completely off track :). Could you give us a hint if this is the right place to use it, and if so, how?
@AsafMah - when writing some tests i took data from the python implementation and ran into some errors. specifically in some example data over there the json response contains the tokens i found the crate python-json-read-adapter which replaces these tokens with 0.0 so it can be parsed, but altering data seems like a bad idea to me. I thought about converting the bytes to string and doing some string manipulation, but that feels like it would yield a significant performance hit. do you have any ideas / thoughts how to best handle that situation? Also, do you know if its just Rust that thinks this is invalid, or doe json parsers in other languages share that opinion? Update: This seems to only related to the test json files in the python kusto repo. Tried it with some queries that return infinity form the service, and there we can parse the response. |
Yeah, I've been through this exact thing - sorry for not updating you. Btw - have you seen my PR? roeap#5 |
Also we need to consider where we want this project to live - |
no i had not. Somehow GitHub failed to notify me :). Looks much better then what we have now - merged it! |
This pretty much answers it for me :). As you mentioned, since the repo is still moving quite fast, there is a question if this should be now or later. But from a technical standpoint we could keep using git dependencies for a while. This would block releasing it to crates.io but my personal experience is that integrating changes in this repo is quite fast and I also believe that the maintainers here are quite willing to do fairly frequent updates / releases. As such this does not need to be the case for long ... |
@roeap |
.filter(|t| matches!(t, ResultTable::DataTable(tbl) if tbl.table_kind == TableKind::PrimaryResult)) | ||
.map(|t| match t { | ||
ResultTable::DataTable(tbl) => tbl, | ||
_ => unreachable!("All other variants are excluded by filter"), | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.filter_map(|table| match table {
ResultTable::DataTable(table) if table.table_kind == TableKind::PrimaryResult => Some(table),
_ => None,
})
sdk/data_kusto/src/arrow.rs
Outdated
const TICK_TO_NANOSECONDS: i64 = 100; | ||
|
||
#[inline] | ||
fn to_nanoseconds(days: i64, hours: i64, minutes: i64, seconds: i64, ticks: i64) -> i64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might have easier time with this using the time or chrono crates
@roeap but basically - I've created a repo detached from this one that follows the conventions of our other kusto SDKs, and has the code from here - Azure/azure-kusto-rust#3 Also there is a roadmap onenote document. We obviously don't want you to implement the whole SDK yourself - so I'd like to discuss which features you'd like to do and what you'd want to leave for us. I'd like to speak to you more, but again - we need a better way to communicate :) |
@AsafMah - I finally read my emails :). |
After talking with the Kusto PG we moved the kusto specific code into a separate repository - https://github.com/Azure/azure-kusto-rust, so I removed the specific code here. I think the changes to header / request properties in the core crate are still relevant as they are used by other azure services as well, so it would be great to still get them merged here. Thanks everyone for the great feedback here! |
azure_data_kusto
crateazure_data_kusto
crate
azure_data_kusto
crateazure_kusto_data
crate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it is just adding support for a few standard headers now.
With Kusto being my favourite cloud analytics engine, i have been working with a rust implementation to interact with the service. Recently I updated the code to conform with what I hope is the current best practices in this repository. Talking with some members of the Kusto product group, they mentioned they would welcome a contribution towards a rust SDK for their service (@adieldar @cosh). MY naive hope is, that is would be OK to contribute the code to this repo, such that it can evolve alongside the best practices developed in this repo.
So far the client only supports executing queries, but i this hope that it could be useful to a wider audience.
Any feedback on whether such a contribution would be welcome is greatly appreciated.
@yoshuawuyts @ctaggart @rylev @thovoll