Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for ClickHouse #377

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

foxy-eyed
Copy link

@foxy-eyed foxy-eyed commented Jan 17, 2022

ClickHouse adapter implementation (resolves #378).

Requires click_house Ruby driver to interact with clickhouse-server via HTTP interface.

Adapter methods:

  • run_statement
  • tables
  • schema
  • preview_statement
  • explain

Readme instructions also added.

@ankane
Copy link
Owner

ankane commented Jan 20, 2022

Hey @foxy-eyed, thanks for the PR! Overall, looks great.

Have you tried clickhouse-activerecord? It looks like they both have ~60k downloads, and I think we could use the SQL adapter if it works.

@foxy-eyed
Copy link
Author

foxy-eyed commented Jan 20, 2022

Hi @ankane, thank you for feedback!

I saw clickhouse_activerecord but I prefer not to choose it)

My understanding is, that ClickHouse is made for (and good for) processing big analytical data, like user activity, events etc into reports. In real-time.
ActiveRecord feels like unnecessary abstraction that does not fit scenarios because:

  • singe events are not as interesting (with all its numerous attributes) as chains of events and aggregated data; we read large number of rows, but only a small subset of columns;
  • in most cases we need some kind of aggregated report => we have to build custom SQL including subexpressions, functions, specific response format etc => dsl does not fit;
  • #create method mostly useless because data is inserted in large batches (otherwise it's inefficient);
  • we don't have relations;
  • usually data is not mutable after is is added to the DB.

Also support of different response format (https://clickhouse.com/docs/en/interfaces/formats/) would be handy, but clickhouse_activerecord processes only JSON. In other cases it returns response body as a string (btw, clickhouse_activerecord also uses HTTP interface under the hood but with Net::Http instead of Faraday).

But if someone already uses clickhouse_activerecord we can take it, right?)

I added detection of driver dependency and created classes for both drivers with common interface, blazer adapter now can work with both. Please take a look, what do you think? Should we keep both?

@ankane
Copy link
Owner

ankane commented Jan 22, 2022

The main benefit of clickhouse-activerecord is we wouldn't need to maintain a separate adapter (assuming it works with SqlAdapter).

@ankane
Copy link
Owner

ankane commented Jan 22, 2022

It looks like queries work with SqlAdapter.

data_sources:
  my_source:
    url: clickhouse://playground:[email protected]:8443/datasets?ssl=true

However, information_schema isn't available until 21.11.2.2-stable (ClickHouse/ClickHouse#28691), so the queries for tables and the schema would need to be updated to support earlier versions.

Also, it looks like neither library typecasts date and time columns, which is needed for charting, and FixedString(N) types aren't showing up properly (SELECT * FROM visits_v1 LIMIT 10 in data source above).

With that additional info, I'm leaning towards clickhouse-activerecord still being a better fit for Blazer.

@foxy-eyed
Copy link
Author

and FixedString(N) types aren't showing up properly (SELECT * FROM visits_v1 LIMIT 10 in data source above).

It looks like dataset issue, not ruby drivers.
I checked the same query with curl, with Tabix and with RubyMine database tool (with official clickhouse jdbc driver) — same result.

$ curl "https://play-api.clickhouse.com:8443/?query=SELECT+ClientIP6+FROM+datasets.visits_v1+LIMIT+3;&user=playground&password=clickhouse&database=datasets"
{=�������bn�|I
{=�������bn�|I
{=�������bn�|I

Also, it looks like neither library typecasts date and time columns, which is needed for charting

My bad, forgot to check charting.
I found out that click_house lib correctly typecasts query result in JSON format if one of Clickhouse::Extend::ConnectionSelective methods was called.
You can check the demo with fix:
https://clickhouse-blazer-demo.herokuapp.com/blazer/queries/1
https://clickhouse-blazer-demo.herokuapp.com/blazer/queries/2

Unfortunately, I couldn't make it work with clickhouse_activerecord.

Now it looks like click_house is better option.

@foxy-eyed
Copy link
Author

Now I am little confused)
I'll try to summarize:

  1. clickhouse_activerecord + sql-adapter:
    + no need to maintain a separate adapter
    - growing complexity of sql-adapter (need to add some 'if's to support clickhouse < 21.11.2.2-stable)
    - issues with date/time columns (I guess we could handle it on our own, but => even more complexity)
    - force to install clickhouse_activerecord for Blazer even if click_house already used in project

  2. click_house + separate clickhouse adapter
    + keep the code separate and simple
    + date/time columns + charts based on them works
    - one more class to maintain
    - force to install click_house for Blazer even if clickhouse_activerecord already used in project

  3. Support both libs + separate clickhouse adapter
    + no need to install extra gem for Blazer (there's a high probability that one of them already installed)
    + separate adapter still keeps the code quite simple
    - one more class to maintain
    +/- ability of charting depends on the choice of lib.

@ankane what do you think?

@ankane
Copy link
Owner

ankane commented Jan 25, 2022

Let me think on this for a bit. I don't really want to support multiple adapters, so it'll likely be 1 or 2.

@foxy-eyed
Copy link
Author

@ankane hi!)
Have you come to any conclusion?

@ankane
Copy link
Owner

ankane commented Feb 6, 2022

Let's go with option 2 since date/time columns already work. A few other notes:

  1. If DateTime objects are converted to Time in the adapter, they'll appear in Blazer.time_zone in the UI
  2. In the schema method, think we can use WHERE database = currentDatabase()
  3. In the config method, let's only remove the starting / from the path for the database, like this adapter
  4. Would be good to support the ssl_verify option in the YAML config

@foxy-eyed
Copy link
Author

@ankane hi!

If DateTime objects are converted to Time in the adapter, they'll appear in Blazer.time_zone in the UI

I'm not sure I got it right, but as I see none of ClickHouse types (Date, DateTime, DateTime64) are converted by click_house driver to ruby Time objects:

  • CH Date parses as Ruby Date
  • CH DateTime, DateTime64 — as Ruby DateTime

DateTime#to_s produces strings like 2022-02-06T01:10:11+00:00 (with original tz offset) in the UI.

All other comments — done ✅
Hope it is ok now)

@ankane
Copy link
Owner

ankane commented Feb 7, 2022

Looks great! For times, we'll want Blazer to display them in Blazer.time_zone to be consistent with other adapters (rather than the original tz offset), but to do that, they'll need to be converted to Time objects (value.to_time) when building rows.

Copy link
Author

@foxy-eyed foxy-eyed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ankane I believe I fixed last thing from your notes)

I was hoping to use custom type casting as described here and overwrite default behaviour (parse DateTime columns as Time). But it can affect main app if it uses ClickHouse for business logic.
That is why I came to a decision to process rows in adapter and apply conversion DateTime#to_time after type casting.

def convert_time_columns(row, date_time_columns)
time_values = row.slice(*date_time_columns).transform_values!(&:to_time)
row.merge(time_values)
rescue NoMethodError
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot be absolutely sure that we handle DateTime object here, because built-in types can be overwritten. If conversion to Time fails, we return initial object.

@foxy-eyed
Copy link
Author

foxy-eyed commented Feb 19, 2022

Hi @ankane, sorry for bothering you. Do you have any other concerns about this PR? Can it be merged?

UPD: after I wrote this I found a bug, sorry. Fixed. Now it looks like everything is ok. Updated demo.

@okdas
Copy link

okdas commented May 18, 2022

Hi @ankane, this adapter would be extremely useful as clickhouse is a very popular column store often utilized as analytics data store. Are there any blockers for this pr? Thanks.

@abuisman
Copy link

I also have a PR open and e-mailed @ankane about this a while back. It seems like for now the priorities are not with this gem, which is perfectly fine of course. So I'd recommend forking Blazer if you want to keep updating it.

The downside is missing out on various PRs, so maybe it is an idea to create a maintained fork and to start pulling in PRs from here if @ankane is no longer able or interested in developing this gem. If there isn't such a fork already of course.

@khasinski
Copy link

khasinski commented Oct 28, 2023

@abuisman I've forked the project and added Clickhouse support from this branch (including a small bugfix which handles the situation when schema doesn't have any tables).

My plan is to selectively merge some PRs from blazer and rewrite a couple components to make it easier extending the project (for example, I'm using bootstrap 5 and stimulus for interactions). It's still a work in progress, but feel free to share your thoughts, here is the fork

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Idea] Support for ClickHouse
5 participants