[SPARK-47628][SQL] Fix Postgres bit array issue 'Cannot cast to boolean'
### What changes were proposed in this pull request?

This PR fixes the error below, which occurs when reading a bit array from Postgres.

```
[info]   Cause: org.postgresql.util.PSQLException: Cannot cast to boolean: "10101"
[info]   at org.postgresql.jdbc.BooleanTypeUtil.cannotCoerceException(BooleanTypeUtil.java:99)
[info]   at org.postgresql.jdbc.BooleanTypeUtil.fromString(BooleanTypeUtil.java:67)
[info]   at org.postgresql.jdbc.ArrayDecoding$7.parseValue(ArrayDecoding.java:267)
[info]   at org.postgresql.jdbc.ArrayDecoding$AbstractObjectStringArrayDecoder.populateFromString(ArrayDecoding.java:128)
[info]   at org.postgresql.jdbc.ArrayDecoding.readStringArray(ArrayDecoding.java:763)
[info]   at org.postgresql.jdbc.PgArray.buildArray(PgArray.java:320)
[info]   at org.postgresql.jdbc.PgArray.getArrayImpl(PgArray.java:179)
[info]   at org.postgresql.jdbc.PgArray.getArray(PgArray.java:116)
[info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$25(JdbcUtils.scala:548)
[info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.nullSafeConvert(JdbcUtils.scala:561)
[info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$24(JdbcUtils.scala:548)
[info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$24$adapted(JdbcUtils.scala:545)
[info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:365)
[info]   at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:346)
[info]   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
[info]   at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
```

The issue is caused by both an upstream limitation and an improper mapping on our side. On the Postgres side (the JDBC driver, per the stack trace above), bit(1) arrays and bit(n>1) arrays are not distinguished: both are decoded as boolean arrays, which causes a cast error on the task execution side.
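As a rough illustration of how bit(n>1)[] values could be read without going through boolean decoding, the sketch below parses the text form of a one-dimensional array (for example `{10101,01010}`) into one byte array per element. The class name `BitArrayText`, the method `parse`, and the choice to keep each element as the ASCII bytes of its bit string are assumptions made for this example; it is not the code from this PR.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Spark or pgjdbc.
final class BitArrayText {
    private BitArrayText() {}

    /**
     * Parses a one-dimensional array literal of bit strings, e.g. "{10101,01010}".
     * Each element is returned as the ASCII bytes of its bit string (an assumption
     * of this sketch); a NULL element becomes a null entry.
     */
    static byte[][] parse(String literal) {
        if (literal == null) {
            return null;
        }
        String s = literal.trim();
        if (s.length() < 2 || s.charAt(0) != '{' || s.charAt(s.length() - 1) != '}') {
            throw new IllegalArgumentException("Not an array literal: " + literal);
        }
        String body = s.substring(1, s.length() - 1);
        if (body.isEmpty()) {
            return new byte[0][];
        }
        // Bit strings contain only '0' and '1', so a plain comma split is enough here.
        String[] parts = body.split(",", -1);
        byte[][] out = new byte[parts.length][];
        for (int i = 0; i < parts.length; i++) {
            String element = parts[i].trim();
            if (element.equalsIgnoreCase("NULL")) {
                out[i] = null;
                continue;
            }
            if (!element.matches("[01]+")) {
                throw new IllegalArgumentException("Not a bit string: " + element);
            }
            out[i] = element.getBytes(StandardCharsets.US_ASCII);
        }
        return out;
    }

    public static void main(String[] args) {
        // Print each parsed element back as text.
        for (byte[] value : parse("{10101,01010}")) {
            System.out.println(new String(value, StandardCharsets.US_ASCII));
        }
    }
}
```

Working from the text form avoids asking the driver to coerce `"10101"` into a single boolean, which is the step that fails in the trace above. The sketch handles only flat arrays; nested arrays would need a real parser.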
On our side, the problem is the mirror image: we map both bit(1)[] and bit(n>1)[] to `ArrayType(BinaryType)`, which is exactly the opposite of Postgres' behaviour. This PR fixes the mapping and adds a special getter for bit(n>1)[] values, which addresses both problems.

### Why are the changes needed?

bugfix

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

new tests

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #45751 from yaooqinn/SPARK-47628.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>