Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support struct array in pretty display #579

Merged
merged 5 commits into from
Jul 23, 2021
Merged

Conversation

houqp
Copy link
Member

@houqp houqp commented Jul 21, 2021

Which issue does this PR close?

support struct array in pretty print

Rationale for this change

support display of nested fields in downstream projects that displays record batches in console.

What changes are included in this PR?

code to format struct arrays in pretty print
print null value as null string to be consistent with postgresql

Are there any user-facing changes?

better pretty print

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 21, 2021
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way to add some test coverage to this code so can avoid introducing regressions (aka breaking your code in the future)?

@houqp
Copy link
Member Author

houqp commented Jul 21, 2021

Yep, I will add some tests. I was thinking about this for the arrow2 migration as well ;)

@houqp houqp requested a review from alamb July 22, 2021 05:02
@codecov-commenter
Copy link

codecov-commenter commented Jul 22, 2021

Codecov Report

Merging #579 (da79ba9) into master (68a4b5b) will decrease coverage by 0.05%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #579      +/-   ##
==========================================
- Coverage   82.46%   82.41%   -0.06%     
==========================================
  Files         167      167              
  Lines       46205    46232      +27     
==========================================
- Hits        38101    38100       -1     
- Misses       8104     8132      +28     
Impacted Files Coverage Δ
arrow/src/util/display.rs 24.03% <0.00%> (-8.43%) ⬇️
parquet/src/encodings/encoding.rs 94.85% <0.00%> (-0.20%) ⬇️
arrow/src/ffi.rs 87.07% <0.00%> (ø)
arrow/src/compute/kernels/sort.rs 94.15% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 68a4b5b...da79ba9. Read the comment docs.

@houqp
Copy link
Member Author

houqp commented Jul 22, 2021

@alamb added struct test, also changed null display to be consistent with postgres and mysql.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@houqp

THe struct tests look good 👍 thank you

also changed null display to be consistent with postgres / mysql / sqlite.

I am not sure the string"null" is consistent with postgres. Here is what happens for me with psql:

alamb=# create table bar(x int);
CREATE TABLE
alamb=# insert into bar values (1);
INSERT 0 1
alamb=# insert into bar values (null);
INSERT 0 1
alamb=# select * from bar;
 x 
---
 1
  
(2 rows)

Sqlite also uses an empty string:

(arrow_dev) alamb@MacBook-Pro:~/Software/influxdb_iox2$ sqlite3 
SQLite version 3.32.3 2020-06-18 14:16:19
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> create table bar(x int);
sqlite> insert into bar values (1);
sqlite> insert into bar values (null);
sqlite> select * from bar;
1

sqlite> 

mysql seems to use the string "NULL" rather than "null":

mysql> use foo;
Database changed
mysql> create table bar(x int);
Query OK, 0 rows affected (0.00 sec)

mysql> insert into bar values (1);
Query OK, 1 row affected (0.01 sec)

mysql> insert into bar values (null);
Query OK, 1 row affected (0.01 sec)

mysql> select * from bar;
+------+
| x    |
+------+
|    1 |
| NULL |
+------+
2 rows in set (0.00 sec)

r#"+-------------------------------------+----+"#,
r#"| c1 | c2 |"#,
r#"+-------------------------------------+----+"#,
r#"| {"c11": 1, "c12": {"c121": "e"}} | a |"#,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is cool -- it is almost like JSON

@houqp
Copy link
Member Author

houqp commented Jul 23, 2021

I am not sure the string"null" is consistent with postgres. Here is what happens for me with psql:

Good call, I was using a different postgres client to test the display. I have reverted that commit and updated the code to handle null only within struct display code.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @houqp -- thanks!

@alamb alamb merged commit 87d2840 into apache:master Jul 23, 2021
r#"| c1 | c2 |"#,
r#"+-------------------------------------+----+"#,
r#"| {"c11": 1, "c12": {"c121": "e"}} | a |"#,
r#"| {"c11": null, "c12": {"c121": "f"}} | b |"#,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could have been useful to have a struct with a validity tested, to make sure we do not miss that case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will send a follow up PR

r#"| c1 | c2 |"#,
r#"+-------------------------------------+----+"#,
r#"| {"c11": 1, "c12": {"c121": "e"}} | a |"#,
r#"| {"c11": null, "c12": {"c121": "f"}} | b |"#,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we have a null in the fields' values but not in all other types?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion on this. I went with this way because I feel like it looks more clear than "c11": ,. There is also the other option of skipping the column with null value entirely in the display. Let me know if you feel strongly on one way or the other, happy to send a PR to change the behavior as well.

@houqp houqp deleted the qp_struct branch July 25, 2021 05:10
alamb pushed a commit that referenced this pull request Jul 25, 2021
* support struct array in pretty display

* print null value as null

to be consistent with postgresql

* add test for struct array pretty print

* Revert "print null value as null"

This reverts commit 3d620f6.

* handle null value as sepcial case in struct display
alamb added a commit that referenced this pull request Jul 26, 2021
* support struct array in pretty display

* print null value as null

to be consistent with postgresql

* add test for struct array pretty print

* Revert "print null value as null"

This reverts commit 3d620f6.

* handle null value as sepcial case in struct display

Co-authored-by: QP Hou <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants