[Spark] Honor codegen configs in DataSkippingStatsTracker #4120

LukasRupprecht · 2025-02-03T21:26:53Z

Which Delta project/connector is this regarding?

Spark

Description

This PR uses MutableProjection.create instead of directly generating the MutableProjection via GenerateMutableProjection.generate. Because the latter forces codegen, it can fail if the generated code is too large, which can happen, e.g., for very wide schemas. MutableProjection.create is the correct Spark API here: it automatically falls back to an InterpretedMutableProjection if codegen fails for any reason.
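To illustrate the fallback behavior described above, here is a minimal, self-contained sketch of the pattern. It is a simplified model and not Spark's actual implementation: `FallbackSketch`, `FactoryMode`, `Expr`, `Projection`, and the `maxExprs` size limit are all hypothetical stand-ins, and "codegen" is simulated by a builder that throws once there are too many expressions. In Spark itself, the choice between the three modes is, to my understanding, driven by the internal `spark.sql.codegen.factoryMode` setting.

```scala
import scala.util.control.NonFatal

// Simplified model of "codegen with interpreted fallback".
// All names here are hypothetical; none of this is Spark code.
object FallbackSketch {
  // Stand-ins for the codegen factory modes.
  sealed trait FactoryMode
  case object Fallback extends FactoryMode    // try codegen, fall back on failure
  case object CodegenOnly extends FactoryMode // force codegen, propagate failures
  case object NoCodegen extends FactoryMode   // always interpret

  // An "expression" computes one output column from an input row.
  type Expr = Seq[Int] => Int

  trait Projection {
    def kind: String
    def apply(row: Seq[Int]): Seq[Int]
  }

  // Simulated codegen: fails when the "generated code" would be too large,
  // which is what a very wide schema can trigger in practice.
  private def generated(exprs: Seq[Expr], maxExprs: Int): Projection = {
    if (exprs.size > maxExprs) {
      throw new IllegalStateException(
        s"generated code too large: ${exprs.size} expressions")
    }
    new Projection {
      val kind = "codegen"
      def apply(row: Seq[Int]): Seq[Int] = exprs.map(e => e(row))
    }
  }

  // Interpreted path: same results, no size limit.
  private def interpreted(exprs: Seq[Expr]): Projection =
    new Projection {
      val kind = "interpreted"
      def apply(row: Seq[Int]): Seq[Int] = exprs.map(e => e(row))
    }

  // Higher-level factory that honors the configured mode.
  def create(exprs: Seq[Expr], mode: FactoryMode, maxExprs: Int = 3): Projection =
    mode match {
      case CodegenOnly => generated(exprs, maxExprs)
      case NoCodegen   => interpreted(exprs)
      case Fallback =>
        try generated(exprs, maxExprs)
        catch { case NonFatal(_) => interpreted(exprs) }
    }
}
```

In this model, calling the simulated codegen builder directly corresponds to the old code path, which fails outright on a wide input, while `create` with the `Fallback` mode corresponds to the new one, which degrades to the interpreted projection and still produces the same values.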

How was this patch tested?

This PR just uses the correct, higher-level API call to create a MutableProjection so existing tests are sufficient.

Does this PR introduce any user-facing changes?

No

scovich

Nice stability fix!

scovich · 2025-02-04T16:18:44Z

spark/src/main/scala/org/apache/spark/sql/delta/stats/DataSkippingStatsTracker.scala

+    val expressions = JoinedProjection.bind(
+      aggBufferAttrs, dataCols, aggregates.flatMap(_.updateExpressions))


Tiny nit -- while we're cleaning up the nesting, it may be worth cleaning up all the way, for maximum readability?

Suggested change

-    val expressions = JoinedProjection.bind(
-      aggBufferAttrs, dataCols, aggregates.flatMap(_.updateExpressions))
+    val aggs = aggregates.flatMap(_.updateExpressions)
+    val expressions = JoinedProjection.bind(aggBufferAttrs, dataCols, aggs)

makes DataSkippingStatsTracker aware of codegen configs

a9c9add

LukasRupprecht force-pushed the stats-codegen-fix branch from 3818b7e to a9c9add · February 4, 2025 15:39

scovich approved these changes Feb 4, 2025

addresses comment

0b93912
