Add request properties for azure_kusto_data crate #685

Merged
merged 29 commits into from
Apr 27, 2022

roeap
Contributor

@roeap roeap commented Mar 12, 2022

With Kusto being my favourite cloud analytics engine, I have been working on a Rust implementation to interact with the service. Recently I updated the code to conform with what I hope are the current best practices in this repository. Talking with some members of the Kusto product group (@adieldar @cosh), they mentioned they would welcome a contribution towards a Rust SDK for their service. My naive hope is that it would be OK to contribute the code to this repo, so that it can evolve alongside the best practices developed here.

So far the client only supports executing queries, but I hope that it could be useful to a wider audience.

Any feedback on whether such a contribution would be welcome is greatly appreciated.

@yoshuawuyts @ctaggart @rylev @thovoll

@AsafMah
Contributor

AsafMah commented Mar 14, 2022

In general, following the naming convention of the other Kusto SDKs, the crate should be called "azure-kusto-data".

@cataggar
Member

cataggar commented Mar 14, 2022

Hey @AsafMah, we followed https://azure.github.io/azure-sdk/general_design.html#namespaces when naming the other crates. In the guidelines it says data is for "Dealing with structured data stores like databases". We followed those guidelines for naming both:

  • azure_data_cosmos
  • azure_data_tables

I think that the azure_data_kusto matches the guidelines and our other crates the best. Please ping me on Teams if you want to discuss. Good to have you on here.

Contributor

@AsafMah AsafMah left a comment

Some comments from going over the code

&mut request,
)
.unwrap();
add_mandatory_header2(&Accept::new("application/json"), &mut request).unwrap();
Contributor

Any reason for all the unwraps in this file?
Using ? seems to compile without error, and of course won't trigger a panic.

Contributor Author

This is an artifact of migrating from where I developed the code... the plan is to remove all unwraps throughout the code before merging!

Contributor Author

Moved to the new header handling and removed the panics.
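A minimal sketch of the `?`-based style suggested in this thread, assuming a hypothetical `Request` type, `HeaderError` type and `add_header` helper (these are illustrative stand-ins, not the azure_core API): each fallible step returns a `Result`, and `?` hands the first error back to the caller instead of panicking.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical error type for header validation (not azure_core's).
#[derive(Debug, PartialEq)]
pub struct HeaderError(String);

impl fmt::Display for HeaderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid header: {}", self.0)
    }
}

impl std::error::Error for HeaderError {}

/// Hypothetical stand-in for an HTTP request with a header map.
#[derive(Debug, Default)]
pub struct Request {
    pub headers: HashMap<String, String>,
}

/// Adds a header, returning an error for an empty value instead of panicking.
pub fn add_header(request: &mut Request, name: &str, value: &str) -> Result<(), HeaderError> {
    if value.is_empty() {
        return Err(HeaderError(name.to_string()));
    }
    request.headers.insert(name.to_string(), value.to_string());
    Ok(())
}

/// Builds a query request; `?` propagates the first failure to the caller.
pub fn prepare_query_request(client_request_id: &str) -> Result<Request, HeaderError> {
    let mut request = Request::default();
    add_header(&mut request, "content-type", "application/json; charset=utf-8")?;
    add_header(&mut request, "accept", "application/json")?;
    add_header(&mut request, "x-ms-client-request-id", client_request_id)?;
    Ok(request)
}
```

The caller decides how to react to a bad header; nothing in the request-building path can abort the process.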

sdk/data_kusto/src/operations/query.rs
use http::request::Builder;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Accept<'a>(&'a str);
Contributor

This file seems a bit sus.
Accept and AcceptEncoding are really common headers; it's a bit weird that we need to define them like that specifically in the kusto package.
It seems like something that should be defined in azure_core.

Contributor Author

Completely agree, this should be moved to core! It's here just to get things working initially.

Contributor Author

Moved the request options to the core crate.

AUTHORITY_ID_NAME => authority_id = Some(v),
MSI_PARAMS_NAME => msi_params = Some(v),
FEDERATED_SECURITY_NAME => match v {
"true" => federated_security = Some(true),
Contributor

DRY - you're doing this exact comparison for multiple props

Contributor Author

Moved boolean parsing into its own reusable function.
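One way such a reusable boolean parser could look, as a sketch: the name `parse_boolean` and the error message are hypothetical, and accepting "true"/"false" in any letter case is an assumption made in the spirit of the later comment that connection-string handling should not be case sensitive.

```rust
/// Parses a connection-string boolean value (hypothetical helper).
/// Accepts "true"/"false" in any ASCII letter case; anything else is an
/// error naming the offending key, so each match arm stays a one-liner.
pub fn parse_boolean(key: &str, value: &str) -> Result<bool, String> {
    if value.eq_ignore_ascii_case("true") {
        Ok(true)
    } else if value.eq_ignore_ascii_case("false") {
        Ok(false)
    } else {
        Err(format!("invalid boolean value '{}' for key '{}'", value, key))
    }
}
```

Each property arm then reduces to something like `federated_security = Some(parse_boolean(k, v)?)`, instead of repeating the string comparison per property.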

Comment on lines 213 to 214
match k {
DATA_SOURCE_NAME => data_source = Some(v),
Contributor

Parsing connection strings is a bit more complex than that: the names aren't case sensitive, and some of them have aliases.
Take a look at https://github.com/Azure/azure-kusto-node/blob/master/azure-kusto-data/source/connectionBuilder.ts

Contributor Author

Added a basic function to support aliases for keys, based on the link. I need to look into the implementation there a bit more, but I think we need to normalize further by lower-casing the input strings.
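A sketch of the normalization idea: trim and lower-case each key, then map known aliases onto one canonical key. The `ConnectionStringKey` enum and the alias table are an illustrative subset and an assumption; the authoritative list is the one in the Node SDK's connectionBuilder.ts linked above.

```rust
use std::collections::HashMap;

/// Canonical key names this sketch understands (illustrative subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum ConnectionStringKey {
    DataSource,
    InitialCatalog,
}

/// Trims and lower-cases a raw key, then resolves aliases.
/// Returns None for keys this sketch does not know about.
pub fn normalize_key(raw: &str) -> Option<ConnectionStringKey> {
    match raw.trim().to_lowercase().as_str() {
        "data source" | "addr" | "address" | "network address" | "server" => {
            Some(ConnectionStringKey::DataSource)
        }
        "initial catalog" | "database" => Some(ConnectionStringKey::InitialCatalog),
        _ => None,
    }
}

/// Splits "k1=v1;k2=v2" into canonical keys. Splitting on the first '='
/// keeps any later '=' inside the value; quoted values containing ';'
/// are not handled by this sketch.
pub fn parse(conn: &str) -> HashMap<ConnectionStringKey, String> {
    let mut out = HashMap::new();
    for segment in conn.split(';').filter(|s| !s.trim().is_empty()) {
        if let Some((k, v)) = segment.split_once('=') {
            if let Some(key) = normalize_key(k) {
                out.insert(key, v.trim().to_string());
            }
        }
    }
    out
}
```

Normalizing once, up front, lets the rest of the parser match on an enum instead of comparing raw strings in several places.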

request: &mut Request,
next: &[Arc<dyn Policy>],
) -> PolicyResult {
assert!(
Contributor

Why assert over an error?

Contributor Author

This was copied from other pipelines. My understanding is that this should either never or always happen at runtime, since creating the pipeline is crate-internal. Thus, I think, the decision was made to just panic. @cataggar - is that correct?

Contributor

This is correct. Triggering this would mean a bug in the SDK code itself and thus we want to fail fast here.
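To illustrate the fail-fast reasoning, here is a minimal sketch of a policy chain. It assumes a simplified, synchronous `Policy` trait and a `Vec<String>` standing in for the request; the real azure_core pipeline is async and its signatures differ. A non-terminal policy that finds no successor can only result from the crate assembling the pipeline incorrectly, so `assert!` is used rather than returning an error.

```rust
use std::sync::Arc;

/// Simplified, synchronous stand-in for a pipeline policy (hypothetical).
pub trait Policy {
    /// `next` holds the policies after this one; the last is the transport.
    fn send(&self, request: &mut Vec<String>, next: &[Arc<dyn Policy>]) -> String;
}

/// A policy that tags the request and delegates to the rest of the chain.
pub struct TelemetryPolicy;

impl Policy for TelemetryPolicy {
    fn send(&self, request: &mut Vec<String>, next: &[Arc<dyn Policy>]) -> String {
        // Only the crate assembles the pipeline, so an empty `next` is an SDK
        // bug rather than a user error: fail fast instead of returning Err.
        assert!(!next.is_empty(), "TelemetryPolicy cannot be the last policy");
        request.push("telemetry".to_string());
        next[0].send(request, &next[1..])
    }
}

/// Terminal policy standing in for the HTTP transport.
pub struct TransportPolicy;

impl Policy for TransportPolicy {
    fn send(&self, request: &mut Vec<String>, _next: &[Arc<dyn Policy>]) -> String {
        format!("sent with [{}]", request.join(", "))
    }
}
```

Returning an error here would push handling of an impossible-by-construction state onto every caller, which is the trade-off the reply above describes.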

context: Context => context,
}

pub fn into_future(self) -> ExecuteQuery {
Contributor

I think we can completely avoid Box pin in this method and also simplify it:

pub async fn into_future(self) -> crate::error::Result<KustoResponseDataSetV2> {
        let url = self.client.query_url();
        let mut request = self.client.prepare_request(url, http::Method::POST);

        add_mandatory_header2(
            &ContentType::new("application/json; charset=utf-8"),
            &mut request,
        )
        .unwrap();
        add_mandatory_header2(&Accept::new("application/json"), &mut request).unwrap();
        //add_mandatory_header2(&AcceptEncoding::new("gzip,deflate"), &mut request).unwrap();
        add_optional_header2(&self.client_request_id, &mut request).unwrap();

        let body = QueryBody {
            db: self.database,
            csl: self.query,
        };
        request.set_body(bytes::Bytes::from(serde_json::to_string(&body)?).into());

        let mut context = self.context;

        let response = self
            .client
            .pipeline()
            .send(&mut context, &mut request)
            .await?;

        KustoResponseDataSetV2::try_from(response).await
    }

Contributor Author

I tried to adjust the into_future method to the latest approach from the cosmos crate and also added some more optional request options to illustrate how the request builder is used. I also added an implementation of the IntoFuture trait (so far only available on nightly) to show how the explicit into_future call will disappear once that feature lands in stable.

Contributor

You still left the Box::pin and the copying, which again I don't think are needed here.

Contributor Author

We had one issue in the past that I think will reappear if we drop that. Essentially, when used in an API, we may want to return the future rather than evaluating it inside the API itself and then turning the response into a future again. The main reason is that if we don't clone the client and box/pin the future, the lifetime of the future is directly tied to the lifetime of the client, which sometimes significantly limits usability in async contexts. However, I am not entirely sure that I fully understand all the implications; I think @rylev and @yoshuawuyts did the majority of the design work on this topic.

Related issues (I think):

Contributor

You are correct that we currently don't need to Box::pin the futures when not using IntoFuture, but with std::future::IntoFuture this becomes necessary (at least for now). This is because std::future::IntoFuture requires you to name the IntoFuture associated type. We can't currently do that if we use an anonymous future type which is created when using async/await notation. In the future, with the introduction of impl Trait in associated types we'll be able to drop this requirement.

Contributor Author

@roeap roeap Mar 26, 2022

Trying to understand this. Aside from the Box::pin, my understanding was that if we do not clone the client and use async move, we would again run into the situation where the lifetimes of the future and the client are tied together. We then could not have a function that returns the future directly for use in downstream functions; instead we would have to await the future and then wrap the returned value in a future again. We ran into a situation like this with the list_blobs function from storage in the delta-rs crate, and I wanted to understand the actual root cause, since we have yet to clean that up in delta.
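A sketch that combines the two points in this exchange, under stated assumptions: `KustoClient`, `ExecuteQueryBuilder` and `ExecuteQuery` are hypothetical stand-ins, and `std::future::IntoFuture` is used as it exists on stable Rust since 1.64. The associated `IntoFuture` type must be nameable, which an `async` block's anonymous type is not, hence the boxed `Pin<Box<dyn Future>>`. Because the builder owns a clone of the client and `async move` takes ownership of it, the resulting future is `'static` and not tied to a borrow of the original client. The tiny `block_on` is a demo-only poller, not a real executor.

```rust
use std::future::{Future, IntoFuture};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Hypothetical client; cheap to clone because its state sits behind an Arc.
#[derive(Clone)]
pub struct KustoClient {
    inner: Arc<String>, // stands in for endpoint, credentials, pipeline
}

impl KustoClient {
    pub fn new(endpoint: &str) -> Self {
        Self { inner: Arc::new(endpoint.to_string()) }
    }

    pub fn execute_query(&self, database: &str, query: &str) -> ExecuteQueryBuilder {
        // The builder owns a clone, so it does not borrow `self`.
        ExecuteQueryBuilder { client: self.clone(), database: database.into(), query: query.into() }
    }
}

pub struct ExecuteQueryBuilder {
    client: KustoClient,
    database: String,
    query: String,
}

/// Nameable future type, required for `IntoFuture::IntoFuture`.
pub type ExecuteQuery = Pin<Box<dyn Future<Output = Result<String, String>> + Send + 'static>>;

impl IntoFuture for ExecuteQueryBuilder {
    type Output = Result<String, String>;
    type IntoFuture = ExecuteQuery;

    fn into_future(self) -> Self::IntoFuture {
        // `async move` takes ownership of the builder's fields: no borrows remain.
        Box::pin(async move {
            if self.query.is_empty() {
                return Err("empty query".to_string());
            }
            Ok(format!("{} | {} | {}", self.client.inner, self.database, self.query))
        })
    }
}

fn noop_raw_waker() -> RawWaker {
    fn no_op(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

/// Demo-only: spins until ready, which is fine because these futures never
/// return Pending. Use a real runtime in practice.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Inside an async runtime, `client.execute_query(db, q).await` calls `into_future` implicitly, and the builder can be returned from a function even after the original client goes out of scope.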

)
.unwrap();
add_mandatory_header2(&Accept::new("application/json"), &mut request).unwrap();
add_mandatory_header2(&AcceptEncoding::new("gzip,deflate"), &mut request).unwrap();
Contributor

This header causes the response to come back gzip-compressed, which the azure core library doesn't seem to decompress, so I get an error unless I remove it. Is it the same experience for you?

Contributor Author

Hmm... when I run it locally, it actually seems to be required, which is strange. One way to compare would be to record a query using the mock transport framework and check how the same response behaves on different machines. Do you have a test cluster we could run and record queries against? I could then replay the test and see whether it fails or passes.

Contributor Author

Turns out it's now the same experience for me :D. So I removed the header for now.

Contributor Author

@roeap roeap Mar 26, 2022

This was related to the core crate not enabling the gzip feature of reqwest; I added it as an optional feature so we can use gzip in this crate.

use crate::headers::{self, Header};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Accept<'a>(&'a str);
Contributor

Can we use String or Cow<'static, str> instead of &'a str?

Contributor Author

Used String since this seems more "common" right now. Was almost about to migrate more headers, but that's better left for another PR :)
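A sketch of a `String`-backed header type of the kind discussed, assuming a hypothetical `Header` trait with `name`/`value` methods (azure_core's actual trait and constructors may differ). Owning the value means request builders that store the header need no lifetime parameter, unlike the `&'a str` version.

```rust
/// Hypothetical header abstraction (not azure_core's exact trait).
pub trait Header {
    fn name(&self) -> &'static str;
    fn value(&self) -> String;
}

/// `Accept` owns its value, so no lifetime leaks into builders.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Accept(String);

impl Accept {
    pub fn new(value: impl Into<String>) -> Self {
        Self(value.into())
    }
}

impl Header for Accept {
    fn name(&self) -> &'static str {
        "accept"
    }
    fn value(&self) -> String {
        self.0.clone()
    }
}

/// Same pattern for Accept-Encoding.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AcceptEncoding(String);

impl AcceptEncoding {
    pub fn new(value: impl Into<String>) -> Self {
        Self(value.into())
    }
}

impl Header for AcceptEncoding {
    fn name(&self) -> &'static str {
        "accept-encoding"
    }
    fn value(&self) -> String {
        self.0.clone()
    }
}

/// Hypothetical options struct: stores a header without borrowing from its caller.
pub struct QueryOptions {
    pub accept: Option<Accept>,
}
```

`Cow<'static, str>` would avoid allocating for string literals; `String` trades that small cost for a simpler type, which matches the choice made in the reply above.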

sdk/core/src/request_options/accept.rs
sdk/data_kusto/src/connection_string.rs
use thiserror;

#[derive(thiserror::Error, Debug)]
pub enum Error {
Contributor

Just as an FYI: we want to move away from crates having their own error types in favor of a single error type defined in azure_core. This is fine for now though; we can make that change later once we finish making it in the cosmos crate.

@@ -0,0 +1,9 @@
#[macro_use]
extern crate serde_derive;
Contributor

Can we get rid of this and just refer to Serialize and Deserialize through full paths, e.g., serde::Deserialize?

Contributor Author

Done, much cleaner now. Do I understand correctly that the use of macro_use could be considered deprecated in recent Rust versions in favour of direct imports?

Contributor

Deprecated is perhaps too strong of a word, but it is becoming less and less idiomatic over time, and there is movement in the language team to deprecate this perhaps in the next edition.

futures = "0.3"
http = "0.2"
serde = "1"
serde_derive = "1"
Contributor

Can we just bring this in through serde: serde = { version = "1.0", features = ["derive"] }

Contributor Author

yes :)

#[derive(Debug, Serialize, Deserialize)]
struct QueryBody {
db: String,
csl: String,
Contributor

If we are going to use short names like this, we should document what they mean.

Contributor Author

Added some docs.

.send(&mut ctx.clone(), &mut request)
.await?;

<KustoResponseDataSetV2 as TryFrom<HttpResponse>>::try_from(response).await
Contributor Author

@yoshuawuyts - I tried using your async-convert crate here to eventually make this builder generic over the response type. The management and query request builders would likely look more or less the same and differ only in the response type. I wanted to end up with something like

response.try_into().await

but I am doing something wrong, or I am completely off track :). Could you give us a hint whether this is the right place to use it, and if so, how?

@roeap
Contributor Author

roeap commented Mar 26, 2022

@AsafMah - when writing some tests, I took data from the Python implementation and ran into some errors. Specifically, in some example data over there the JSON response contains the tokens Infinity / -Infinity, which serde_json rejects as invalid, so parsing fails. It seems that the Python package is the only one among Go / Java / Node / Python where this scenario is covered, so I'm not sure how the other SDKs behave in that case.

I found the crate python-json-read-adapter, which replaces these tokens with 0.0 so they can be parsed, but altering the data seems like a bad idea to me. I also thought about converting the bytes to a string and doing some string manipulation, but that feels like it would incur a significant performance hit.

Do you have any ideas / thoughts on how to best handle that situation? Also, do you know if it's just Rust that considers this invalid, or do JSON parsers in other languages share that opinion?

Update: This seems to relate only to the test JSON files in the Python Kusto repo. I tried some queries that return infinity from the service, and there we can parse the response.

@AsafMah
Contributor

AsafMah commented Mar 27, 2022

Yeah, I've been through this exact thing - sorry for not updating you.

Btw - have you seen my PR? roeap#5

@AsafMah
Contributor

AsafMah commented Mar 27, 2022

Also, we need to consider where we want this project to live.
From our team's end, we want it to be in its own repo like the other SDKs.
But for now it seems very rooted here, and putting it in a different repo would mean waiting until changes land in core, which may take a while.

@roeap
Contributor Author

roeap commented Mar 27, 2022

Btw - have you seen my PR? roeap#5

No, I had not. Somehow GitHub failed to notify me :). Looks much better than what we have now - merged it!

@roeap
Contributor Author

roeap commented Mar 27, 2022

From our team's end, we want it to be in its own repo like the other sdks

This pretty much answers it for me :).

As you mentioned, since the repo is still moving quite fast, there is a question of whether this should happen now or later. But from a technical standpoint we could keep using git dependencies for a while. This would block releasing it to crates.io, but my personal experience is that integrating changes in this repo is quite fast, and I also believe the maintainers here are quite willing to do fairly frequent updates / releases. As such, this does not need to be the case for long...

@AsafMah
Contributor

AsafMah commented Mar 28, 2022

@roeap
Hey, sorry for "abusing" this platform, but it seems like you don't get my emails, so I'll write to you here.
Maybe I got your address wrong - I'm in [email protected] .
How do you prefer I contact you? Thanks!

Comment on lines 133 to 137
.filter(|t| matches!(t, ResultTable::DataTable(tbl) if tbl.table_kind == TableKind::PrimaryResult))
.map(|t| match t {
ResultTable::DataTable(tbl) => tbl,
_ => unreachable!("All other variants are excluded by filter"),
})
Contributor

        .filter_map(|table| match table {
            ResultTable::DataTable(table) if table.table_kind == TableKind::PrimaryResult => Some(table),
            _ => None,
        })
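A self-contained version of the suggested `filter_map` refactor. `ResultTable`, `DataTable` and `TableKind` below are simplified stand-ins carrying only what the snippet needs, not the crate's real definitions. Matching and extracting in one closure removes the `unreachable!` arm that the `filter` + `map` version required.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TableKind {
    PrimaryResult,
    QueryProperties,
}

#[derive(Debug, Clone, PartialEq)]
pub struct DataTable {
    pub table_name: String,
    pub table_kind: TableKind,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ResultTable {
    DataSetHeader,
    DataTable(DataTable),
    DataSetCompletion,
}

/// Keeps only primary-result data tables, matching and unwrapping the
/// variant in a single step instead of filter + map + unreachable!().
pub fn primary_results(tables: Vec<ResultTable>) -> Vec<DataTable> {
    tables
        .into_iter()
        .filter_map(|table| match table {
            ResultTable::DataTable(table) if table.table_kind == TableKind::PrimaryResult => Some(table),
            _ => None,
        })
        .collect()
}
```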

const TICK_TO_NANOSECONDS: i64 = 100;

#[inline]
fn to_nanoseconds(days: i64, hours: i64, minutes: i64, seconds: i64, ticks: i64) -> i64 {
Contributor

You might have an easier time with this using the time or chrono crates.
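For reference, the arithmetic `to_nanoseconds` has to perform, written with only `std` since the original body is not shown (so this body is a sketch, not the PR's code). One Kusto timespan tick is 100 ns, which is what `TICK_TO_NANOSECONDS = 100` encodes. `to_duration` is a hypothetical std-only alternative for non-negative spans that lets `std::time::Duration` do the unit bookkeeping; the time or chrono crates offer richer signed duration types.

```rust
use std::time::Duration;

const TICK_TO_NANOSECONDS: i64 = 100;
const NANOS_PER_SECOND: i64 = 1_000_000_000;

/// Sketch: combines timespan components into nanoseconds. Plain i64 math is
/// fine for realistic spans (i64 nanoseconds cover roughly 292 years) but
/// overflows for larger inputs.
#[inline]
fn to_nanoseconds(days: i64, hours: i64, minutes: i64, seconds: i64, ticks: i64) -> i64 {
    let total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds;
    total_seconds * NANOS_PER_SECOND + ticks * TICK_TO_NANOSECONDS
}

/// Hypothetical std-only alternative for non-negative spans.
/// `ticks` is the sub-second fraction (< 10_000_000), so `ticks * 100` fits u32.
fn to_duration(days: u64, hours: u64, minutes: u64, seconds: u64, ticks: u32) -> Duration {
    let secs = ((days * 24 + hours) * 60 + minutes) * 60 + seconds;
    Duration::new(secs, ticks * 100)
}
```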

@AsafMah
Contributor

AsafMah commented Mar 28, 2022

@roeap but basically - I've created a repo detached from this one that follows the conventions of our other kusto SDKs, and has the code from here - Azure/azure-kusto-rust#3

Also, there is a roadmap OneNote document.

We obviously don't want you to implement the whole SDK yourself - so I'd like to discuss which features you'd like to do and what you'd want to leave for us.

I'd like to speak to you more, but again - we need a better way to communicate :)

@roeap
Contributor Author

roeap commented Mar 28, 2022

@AsafMah - I finally read my emails :).

@roeap
Contributor Author

roeap commented Apr 23, 2022

After talking with the Kusto PG we moved the kusto specific code into a separate repository - https://github.com/Azure/azure-kusto-rust, so I removed the specific code here. I think the changes to header / request properties in the core crate are still relevant as they are used by other azure services as well, so it would be great to still get them merged here.

Thanks everyone for the great feedback here!

@cataggar @AsafMah

@roeap roeap marked this pull request as ready for review April 23, 2022 11:42
@roeap roeap changed the title Add azure_data_kusto crate Add request properties for azure_data_kusto crate Apr 23, 2022
@roeap roeap changed the title Add request properties for azure_data_kusto crate Add request properties for azure_kusto_data crate Apr 23, 2022
Member

@cataggar cataggar left a comment

Looks like it is just adding support for a few standard headers now.

@cataggar cataggar requested a review from rylev April 25, 2022 23:10
@cataggar cataggar merged commit a0ef5e5 into Azure:main Apr 27, 2022
@roeap roeap deleted the kusto branch April 27, 2022 22:16