Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streaming query does not work in release mode. #6

Open
zliang-min opened this issue Feb 14, 2024 · 5 comments
Open

Streaming query does not work in release mode. #6

zliang-min opened this issue Feb 14, 2024 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@zliang-min
Copy link
Collaborator

Describe what's wrong

Proton has two sets of HTTP APIs, one is running on port 8123, this is the same as clickhouse. And the other is running on port 3218 which supports streaming queries. Simply put, the same query SELECT * FROM some_stream behaves differently on these two APIs. On port 8123 it returns immediately, but on port 3128, it will keep running and returnning fresh data.

This isssue is about running streaming query on port 3128 with this library. It won't work in release mode (e.g. using cargo run --release). The reason why it worked for non-release mode is because of this:

[dev-dependencies]
clickhouse = { version = "0.11", features = ["test-util"] }

with the test-util, the clickhouse crate does not use any compression by default (otherwise, it uses lz4), this is the tricky part. And with lz4, streaming query does not work. So, in order to make streaming query work, we need to disable compression when do streaming query:

let client = Client::default()....;
/// do non-streaming query with the `client` ...

/// when you want to do streaming query, disable compression
client.with_compression(Compression::None);
client.query("<... my streaming query ...>").fetch ...

How to reproduce

Error message and/or stacktrace

Additional context

@zliang-min zliang-min added the bug Something isn't working label Feb 14, 2024
@marvin-hansen marvin-hansen self-assigned this Feb 14, 2024
@marvin-hansen
Copy link
Contributor

@zliang-min

I've had a look at LZ4 in ClickHouse and found an extensive blogpost about compression.

https://clickhouse.com/blog/lz4-compression-in-clickhouse

From the post:

"Our conclusion is thus that it's always preferable to store data in a compressed format. Always write data to disk in compressed format. Transmit data over the network with compression, as well. "

Apparently, they only added the no compression option for a very specific corner case.

The bigger question is:

Why does streaming query on Port 3128 does not work with compression?

Doesn't Proton compresses outgoing streaming data by default or is it related to how the client decompresses the incoming data?

In the first case, something need to be worked out in Proton whereas in the second case I have to talk to the folks maintaining the underlying ClickHouse client.

For the time being, I simply patch in a new constructor for streaming that sets the port correctly and disables compression.

This is a workaround until the root cause of the issue has been identified and hopefully resolved.

@zliang-min
Copy link
Collaborator Author

Why does streaming query on Port 3128 does not work with compression?

The key difference here is, for streaming query, the returned result is a unbounded stream, and this is totally a new thing introduced by Proton, it does not exists in ClickHouse. Compressing / de-compressing a stream is quite different from compressing / de-compressing the whole response ( for non-streaming queries ). There are quite some details need to be designed on how we want the compression should be implemented.

Let's put the complexity aside for now, once implement the compression on streaming, we need to implement the de-compression logic on the client side as well. And this unlikely will be part of the clickhouse crate, because, again, this is not a clickhouse feature, but a Proton one. So this will be implemented in this crate I believe.

But yes, for now, we need to work around the issue first. I think a new constructor probably is not the best solution, because it could be troublesome to construct different clients for different use cases. IMO, maybe keeping two clickhouse::Client inside ProtonClient is a better way. Something like this:

    pub fn new(url: &str) -> Self {
        let client = Client::default().with_url(url.to_string());
        let client_without_compression = client.clone().with_compression(clickhouse::Compression::None);
        let url = url.to_string();
        Self { client, client_without_compression, url }
    }

Then we introduce a function fetch_streaming for streaming queries:

    pub async fn fetch_streaming<T: Row>(&self, query: &str) -> Result<RowCursor<T>> {
        match self.client_without_compression.query(query).fetch::<T>() {
            Ok(cursor) => Ok(cursor),
            Err(e) => Err(ProtonClientError::FetchFailed(e.to_string())),
        }
    }

@marvin-hansen
Copy link
Contributor

marvin-hansen commented Feb 14, 2024 via email

@jovezhong
Copy link
Contributor

Thank you both for looking into the details and refine the proton rust client

@marvin-hansen
Copy link
Contributor

@zliang-min

PR #7 resolves this issue. When @jovezhong has merged both, PR #7 and PR #5 , a new version 0.1.1. should be available on crates.io within a few minutes after the release bot completed. Please test this patched version to verify if the issue has been resolved.

If you encounter any problem, please report back here. Otherwise, feel free to close the issue.

@jovezhong I think its best to push back the blog post to next week so that I have a bit more time to write some meaningful code examples and update the post accordingly. While the current workaround might be imperfect, at least it should resolve this issue for the time being.

Details for better and more future proof solution can be discussed in Slack. I will take another look once I am back in Office on Monday,

jovezhong added a commit that referenced this issue Feb 14, 2024
* Updated Readme.

Signed-off-by: Marvin Hansen <[email protected]>

* Enabled relase-plz

Signed-off-by: Marvin Hansen <[email protected]>

* Added client without compression to resolve issue with handling streaming data in release mode.

Added new method fetch_stream to ,well, fetch data from a continuous query of a stream.

Signed-off-by: Marvin Hansen <[email protected]>

* Fixed Docstring test in fetch_stream.

Signed-off-by: Marvin Hansen <[email protected]>

* Update src/lib/stream.rs

Co-authored-by: Gimi Liang <[email protected]>

* Update src/lib/mod.rs

Co-authored-by: Gimi Liang <[email protected]>

---------

Signed-off-by: Marvin Hansen <[email protected]>
Co-authored-by: Jove Zhong <[email protected]>
Co-authored-by: Gimi Liang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants