Accept Type Widening RFC #4094

Open. Wants to merge 1 commit into base: master
126 changes: 125 additions & 1 deletion PROTOCOL.md
@@ -1387,6 +1387,130 @@ Furthermore, when attempting timestamp-based time travel where table state must
1. If `timestamp X` >= `delta.inCommitTimestampEnablementTimestamp`, only table versions >= `delta.inCommitTimestampEnablementVersion` should be considered for the query.
2. Otherwise, only table versions less than `delta.inCommitTimestampEnablementVersion` should be considered for the query.

# Type Widening

The Type Widening feature enables changing the type of a column or field in an existing Delta table to a wider type.

The supported type changes are:
- Integer widening:
  - `Byte` -> `Short` -> `Int` -> `Long`
- Floating-point widening:
  - `Float` -> `Double`
  - `Byte`, `Short` or `Int` -> `Double`
- Date widening:
  - `Date` -> `Timestamp without timezone`
- Decimal widening - `p` and `s` denote the decimal precision and scale respectively.
  - `Decimal(p, s)` -> `Decimal(p + k1, s + k2)` where `k1 >= k2 >= 0`.
  - `Byte`, `Short` or `Int` -> `Decimal(10 + k1, k2)` where `k1 >= k2 >= 0`.
  - `Long` -> `Decimal(20 + k1, k2)` where `k1 >= k2 >= 0`.
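
The rules above can be sketched as a small predicate. This is an illustrative sketch, not part of the protocol: the function names are hypothetical, the type names (`byte`, `short`, `integer`, `long`, `float`, `double`, `date`, `timestamp_ntz`, `decimal(p, s)`) are assumed to follow the schema serialization format, the integer chain is assumed to be transitive (for example `byte` -> `long`), and an identical source and target type is treated as "not a change".

```python
import re

# Integer types ordered from narrowest to widest (assumed schema type names).
INTEGER_ORDER = ["byte", "short", "integer", "long"]
DECIMAL_RE = re.compile(r"^decimal\(\s*(\d+)\s*,\s*(\d+)\s*\)$")


def parse_decimal(type_name):
    """Return (precision, scale) for 'decimal(p, s)', or None for any other type."""
    match = DECIMAL_RE.match(type_name)
    return (int(match.group(1)), int(match.group(2))) if match else None


def fits_decimal(base_precision, base_scale, target):
    """True if target is Decimal(base_precision + k1, base_scale + k2) with k1 >= k2 >= 0."""
    k1 = target[0] - base_precision
    k2 = target[1] - base_scale
    return k1 >= k2 >= 0


def is_supported_widening(from_type, to_type):
    """Hypothetical helper: check one type change against the list of supported changes."""
    target_decimal = parse_decimal(to_type)
    if from_type in INTEGER_ORDER:
        if to_type in INTEGER_ORDER:
            # Integer widening: strictly wider integer type only.
            return INTEGER_ORDER.index(from_type) < INTEGER_ORDER.index(to_type)
        if to_type == "double":
            # Only Byte, Short and Int may widen to Double; Long is not listed.
            return from_type != "long"
        if target_decimal:
            # Long needs at least 20 integral digits, the smaller integers 10.
            base_precision = 20 if from_type == "long" else 10
            return fits_decimal(base_precision, 0, target_decimal)
        return False
    if from_type == "float":
        return to_type == "double"
    if from_type == "date":
        return to_type == "timestamp_ntz"
    source_decimal = parse_decimal(from_type)
    if source_decimal and target_decimal:
        # Exclude the identity case k1 = k2 = 0, which is not a type change.
        return source_decimal != target_decimal and fits_decimal(*source_decimal, target_decimal)
    return False
```

For instance, `is_supported_widening("decimal(6, 2)", "decimal(10, 4)")` holds because `k1 = 4` and `k2 = 2`, whereas `decimal(6, 2)` -> `decimal(7, 4)` is rejected because the precision grows by less than the scale.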

To support this feature:
- The table must be on Reader Version 3 and Writer Version 7.
- The feature `typeWidening` must exist in the table `protocol`'s `readerFeatures` and `writerFeatures`, either during its creation or at a later stage.

When supported:
- A table may have a metadata property `delta.enableTypeWidening` in the Delta schema set to `true`. Writers must reject widening type changes when this property isn't set to `true`.
- The `metadata` for a column or field in the table schema may contain the key `delta.typeChanges` storing a history of type changes for that column or field.
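
As a rough illustration of the difference between "supported" and "enabled", the checks could look like the sketch below. It assumes the `protocol` action is available as a dictionary with the fields `minReaderVersion`, `minWriterVersion`, `readerFeatures` and `writerFeatures`, and that table properties are a string-to-string map; both function names are hypothetical.

```python
def supports_type_widening(protocol):
    """True if the protocol action lists typeWidening as a reader and writer feature."""
    # The text above requires Reader Version 3 and Writer Version 7.
    on_required_versions = (
        protocol.get("minReaderVersion") == 3
        and protocol.get("minWriterVersion") == 7
    )
    in_reader_features = "typeWidening" in protocol.get("readerFeatures", [])
    in_writer_features = "typeWidening" in protocol.get("writerFeatures", [])
    return on_required_versions and in_reader_features and in_writer_features


def type_widening_enabled(protocol, table_properties):
    """True if the feature is supported and delta.enableTypeWidening is set to true."""
    # Assumption: property values are stored as strings, so "true" is compared literally.
    return (
        supports_type_widening(protocol)
        and table_properties.get("delta.enableTypeWidening") == "true"
    )
```

A writer following this sketch would reject a widening type change whenever `type_widening_enabled` returns `False`, even if the feature is listed in the protocol.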

## Type Change Metadata

Type changes applied to a table are recorded in the table schema and stored in the `metadata` of their nearest ancestor [StructField](#struct-field) using the key `delta.typeChanges`.
The value for the key `delta.typeChanges` must be a JSON list of objects, where each object contains the following fields:
Field Name | optional/required | Description
-|-|-
`fromType`| required | The type of the column or field before the type change.
`toType`| required | The type of the column or field after the type change.
`fieldPath`| optional | When updating the type of a map key/value or array element only: the path from the struct field holding the metadata to the map key/value or array element that was updated.

The `fieldPath` value is `key`, `value` or `element` when updating the type of a map key, map value or array element, respectively.
The `fieldPath` values for nested maps and nested arrays are prefixed by their parent's path, separated by dots.
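
To make the dotted path concrete, the following sketch resolves a `fieldPath` against the JSON type of the struct field that holds the metadata. The helper name `resolve_field_path` and the lookup table are hypothetical; the keys `elementType`, `keyType` and `valueType` are the ones used by the array and map types in the schema serialization format.

```python
# Maps each fieldPath segment to the container type it applies to
# and to the JSON key that holds the nested type.
SEGMENTS = {
    "element": ("array", "elementType"),
    "key": ("map", "keyType"),
    "value": ("map", "valueType"),
}


def resolve_field_path(field_type, field_path=None):
    """Return the type reached by following field_path from a struct field's type."""
    current = field_type
    for segment in field_path.split(".") if field_path else []:
        if segment not in SEGMENTS:
            raise ValueError(f"unknown fieldPath segment: {segment!r}")
        container, child_key = SEGMENTS[segment]
        # Primitive types are plain strings and cannot be descended into.
        if not isinstance(current, dict) or current.get("type") != container:
            raise ValueError(f"segment {segment!r} does not apply to type {current!r}")
        current = current[child_key]
    return current
```

With an array of maps, the path `element.value` first descends into the array element and then into the map value, so the resolved type should equal the `toType` of the most recent type change recorded for that path.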

The following is an example for the definition of a column that went through two type changes:
```json
{
    "name" : "e",
    "type" : "long",
    "nullable" : true,
    "metadata" : {
        "delta.typeChanges": [
            {
                "fromType": "short",
                "toType": "integer"
            },
            {
                "fromType": "integer",
                "toType": "long"
            }
        ]
    }
}
```

The following is an example for the definition of a column after changing the type of a map key:
```json
{
    "name" : "e",
    "type" : {
        "type": "map",
        "keyType": "double",
        "valueType": "integer",
        "valueContainsNull": true
    },
    "nullable" : true,
    "metadata" : {
        "delta.typeChanges": [
            {
                "fromType": "float",
                "toType": "double",
                "fieldPath": "key"
            }
        ]
    }
}
```

The following is an example for the definition of a column after changing the type of a map value nested in an array:
```json
{
    "name" : "e",
    "type" : {
        "type": "array",
        "elementType": {
            "type": "map",
            "keyType": "string",
            "valueType": "decimal(10, 4)",
            "valueContainsNull": true
        },
        "containsNull": true
    },
    "nullable" : true,
    "metadata" : {
        "delta.typeChanges": [
            {
                "fromType": "decimal(6, 2)",
                "toType": "decimal(10, 4)",
                "fieldPath": "element.value"
            }
        ]
    }
}
```

## Writer Requirements for Type Widening

When Type Widening is supported (when the `writerFeatures` field of a table's `protocol` action contains `typeWidening`), then:
- Writers must reject applying any unsupported type change.
- Writers must record type change information in the `metadata` of the nearest ancestor [StructField](#struct-field). See [Type Change Metadata](#type-change-metadata).
- Writers must preserve the `delta.typeChanges` entries in the field metadata of the schema when the table schema is updated.
- Writers may remove the `delta.typeChanges` metadata in the table schema if all data files use the same column and field types as the table schema.

When Type Widening is enabled (when the table property `delta.enableTypeWidening` is set to `true`), then:
- Writers should allow updating the table schema to apply a supported type change to a column, struct field, map key/value or array element.
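
A minimal sketch of how a writer might record a type change, assuming a struct field is represented as a dictionary in the schema serialization format. The helper `record_type_change` is hypothetical; it appends to any existing `delta.typeChanges` history instead of replacing it, and it only rewrites the field's own `type` for a top-level change (updating a nested map or array type is left to the caller).

```python
import copy


def record_type_change(struct_field, from_type, to_type, field_path=None):
    """Return a copy of struct_field with one more entry in delta.typeChanges."""
    updated = copy.deepcopy(struct_field)  # leave the caller's schema untouched
    change = {"fromType": from_type, "toType": to_type}
    if field_path is not None:
        # fieldPath is only present for map key/value and array element changes.
        change["fieldPath"] = field_path
    else:
        # The struct field itself changes type.
        updated["type"] = to_type
    metadata = updated.setdefault("metadata", {})
    metadata.setdefault("delta.typeChanges", []).append(change)
    return updated
```

Applying it twice to a `short` field, first with `short` -> `integer` and then with `integer` -> `long`, yields a two-entry history in the order the changes were made.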

## Reader Requirements for Type Widening
When Type Widening is supported (when the `readerFeatures` field of a table's `protocol` action contains `typeWidening`), then:
- Readers must allow reading data files written before the table underwent any supported type change, and must convert such values to the current, wider type.
- Readers must validate that they support all type changes in the `delta.typeChanges` field in the table schema for the table version they are reading and fail when finding any unsupported type change.
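
The validation requirement could be sketched as a recursive walk over the schema that collects every `delta.typeChanges` entry and fails on the first one the reader does not support. The function names are hypothetical, the schema is assumed to be the parsed JSON of a struct type (`{"type": "struct", "fields": [...]}`), and `is_supported` stands in for whatever predicate a given reader implements.

```python
def iter_type_changes(data_type, path=""):
    """Yield (field name, change) for every delta.typeChanges entry under data_type."""
    if not isinstance(data_type, dict):
        return  # primitive type names are plain strings
    kind = data_type.get("type")
    if kind == "struct":
        for field in data_type.get("fields", []):
            name = f"{path}.{field['name']}" if path else field["name"]
            for change in field.get("metadata", {}).get("delta.typeChanges", []):
                yield name, change
            yield from iter_type_changes(field["type"], name)
    elif kind == "array":
        yield from iter_type_changes(data_type["elementType"], path)
    elif kind == "map":
        yield from iter_type_changes(data_type["keyType"], path)
        yield from iter_type_changes(data_type["valueType"], path)


def validate_type_changes(schema, is_supported):
    """Raise ValueError if the schema records a type change the reader cannot handle."""
    for name, change in iter_type_changes(schema):
        if not is_supported(change["fromType"], change["toType"]):
            raise ValueError(
                f"unsupported type change on {name}: "
                f"{change['fromType']} -> {change['toType']}"
            )
```

Failing before any data file is read keeps a reader from silently returning values it cannot convert to the current, wider type.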

# Requirements for Writers
This section documents additional requirements that writers must follow in order to preserve some of the higher level guarantees that Delta provides.
@@ -1974,7 +2098,7 @@ delta.columnMapping.*| These keys are used to store information about the mappin
delta.identity.*| These keys are for defining identity columns. See [Identity Columns](#identity-columns) for details.
delta.invariants| JSON string contains SQL expression information. See [Column Invariants](#column-invariants) for details.
delta.generationExpression| SQL expression string. See [Generated Columns](#generated-columns) for details.

delta.typeChanges| JSON string containing information about previous type changes applied to this column. See [Type Change Metadata](#type-change-metadata) for details.

### Example

2 changes: 1 addition & 1 deletion protocol_rfcs/README.md
@@ -18,7 +18,6 @@ Here is the history of all the RFCs propose/accepted/rejected since Feb 6, 2024,

| Date proposed | RFC file | Github issue | RFC title |
|:--------------|:---------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------|:---------------------------------------|
-| 2023-02-09 | [type-widening.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/type-widening.md) | https://github.com/delta-io/delta/issues/2623 | Type Widening |
| 2023-02-14 | [managed-commits.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/managed-commits.md) | https://github.com/delta-io/delta/issues/2598 | Managed Commits |
| 2023-02-26 | [column-mapping-usage.tracking.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/column-mapping-usage-tracking.md) | https://github.com/delta-io/delta/issues/2682 | Column Mapping Usage Tracking |
| 2023-04-24 | [variant-type.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/variant-type.md) | https://github.com/delta-io/delta/issues/2864 | Variant Data Type |
@@ -30,6 +29,7 @@ Here is the history of all the RFCs propose/accepted/rejected since Feb 6, 2024,
|:-|:-|:-|:-|:-|
| 2023-02-28 | 2023-03-26 |[vacuum-protocol-check.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/vacuum-protocol-check.md)| https://github.com/delta-io/delta/issues/2630 | Enforce Vacuum Protocol Check |
| 2023-02-02 | 2023-07-24 |[in-commit-timestamps.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/in-commit-timestamps.md) | https://github.com/delta-io/delta/issues/2532 | In-Commit Timestamps |
+| 2023-02-09 | 2025-01-28 |[type-widening.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/type-widening.md) | https://github.com/delta-io/delta/issues/2623 | Type Widening |

### Rejected RFCs

@@ -108,7 +108,7 @@ The following is an example for the definition of a column after changing the ty
{
"fromType": "decimal(6, 2)",
"toType": "decimal(10, 4)",
-"fieldPath": "element.key"
+"fieldPath": "element.value"
}
]
}
@@ -117,7 +117,7 @@ The following is an example for the definition of a column after changing the ty

## Writer Requirements for Type Widening

-When Type Widening is supported (when the `writerFeatures` field of a table's `protocol` action contains `enableTypeWidening`), then:
+When Type Widening is supported (when the `writerFeatures` field of a table's `protocol` action contains `typeWidening`), then:
- Writers must reject applying any unsupported type change.
- Writers must record type change information in the `metadata` of the nearest ancestor [StructField](#struct-field). See [Type Change Metadata](#type-change-metadata).
- Writers must preserve the `delta.typeChanges` field in the metadata fields in the schema when the table schema is updated.
@@ -127,7 +127,7 @@ When Type Widening is enabled (when the table property `delta.enableTypeWidening
- Writers should allow updating the table schema to apply a supported type change to a column, struct field, map key/value or array element.

## Reader Requirements for Type Widening
-When Type Widening is supported (when the `readerFeatures` field of a table's `protocol` action contains `enableTypeWidening`), then:
+When Type Widening is supported (when the `readerFeatures` field of a table's `protocol` action contains `typeWidening`), then:
- Readers must allow reading data files written before the table underwent any supported type change, and must convert such values to the current, wider type.
- Readers must validate that they support all type changes in the `delta.typeChanges` field in the table schema for the table version they are reading and fail when finding any unsupported type change.
