Add support for group by Decimal numbers #210

Closed
alamb opened this issue Apr 27, 2021 · 4 comments
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Apr 27, 2021

In the context of #107 from @joshuataylor

This will likely require apache/arrow-rs#230 (support for pretty-printing decimal numbers)

Describe the solution you'd like
It should be possible to group by data in columns of DataType::Decimal type

Right now you get an error such as:

(Internal("Unsupported GROUP BY type creating key Decimal(9, 0)"))
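To illustrate what "creating a key" for a decimal column can mean, here is a minimal Python sketch. It is not DataFusion's implementation. It assumes only that a fixed-scale decimal can be represented by its unscaled integer (the value multiplied by 10^scale), which is hashable and can therefore serve as a group key. The helper names `unscaled` and `group_counts` are hypothetical.

```python
from collections import Counter
from decimal import Decimal


def unscaled(value: Decimal, scale: int) -> int:
    """Hypothetical helper: return value * 10**scale as an int.

    Digits beyond `scale` fractional places are truncated by int().
    """
    return int(value.scaleb(scale))


def group_counts(values, scale):
    """Count rows per distinct decimal value, like
    SELECT COUNT(*), a FROM t GROUP BY a (illustrative only)."""
    # Group on the unscaled integer key, then convert keys back to Decimal.
    counts = Counter(unscaled(v, scale) for v in values)
    return {Decimal(key).scaleb(-scale): n for key, n in counts.items()}
```

Calling `group_counts(values, 5)` on a list of `Decimal` values with scale 5 returns a dict that maps each distinct value to its row count.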

Reproducer
In the datafusion-cli:

CREATE EXTERNAL TABLE something STORED AS PARQUET LOCATION 'demo.parquet';
select O_ORDERKEY from something group by O_ORDERKEY;

Where demo.parquet is here: https://drive.google.com/file/d/1aCW7SW2rUVioSePduhgo_91F5-xDMyjp/view?usp=sharing

(note the file is large, so I am not sure how long this query will take)

@alamb alamb added the enhancement New feature or request label Apr 27, 2021
@liukun4515
Contributor

@alamb This issue can be closed; grouping by a Decimal column now works:

❯ \d food
+---------------+--------------+------------+-------------+-----------------+-------------+
| table_catalog | table_schema | table_name | column_name | data_type       | is_nullable |
+---------------+--------------+------------+-------------+-----------------+-------------+
| datafusion    | public       | food       | a           | Decimal(10, 5)  | NO          |
| datafusion    | public       | food       | b           | Decimal(20, 15) | NO          |
| datafusion    | public       | food       | c           | Boolean         | NO          |
+---------------+--------------+------------+-------------+-----------------+-------------+
3 rows in set. Query took 0.010 seconds.
❯ select count(*),a from food group by a;
+-----------------+---------+
| COUNT(UInt8(1)) | a       |
+-----------------+---------+
| 3               | 0.00003 |
| 1               | 0.00001 |
| 4               | 0.00004 |
| 5               | 0.00005 |
| 2               | 0.00002 |
+-----------------+---------+

@houqp
Member

houqp commented Jan 27, 2022

good work @liukun4515 🎉

@houqp houqp closed this as completed Jan 27, 2022
@liukun4515
Contributor

liukun4515 commented Jan 27, 2022

Thanks to @alamb and @houqp for pushing this feature forward.
There are still many tasks to do for decimals and data types.
For example:

  1. Move some operations or logic to the arrow-rs kernels.
  2. Improve the performance of decimal operations.

@alamb
Contributor Author

alamb commented Jan 27, 2022

Thanks for all your help, @liukun4515, and for pushing it through. Really nicely done.
