[BUG] StructuredDataset `file_format` becomes an empty str through dataclass attribute access
#6096
Open
Describe the bug
In this case, a workflow runs with a dataclass input that contains a `StructuredDataset` attribute. The following shows a simple definition:
When we run the workflow remotely, we observe that the `file_format` field becomes an empty string, as illustrated in the following screenshot:
Initial Thoughts
We think that msgpack serialization doesn't process `file_format` properly, because `file_format` is already an empty string right after `inputs.pb` is loaded as `input_proto`:
If I've not misunderstood it, `\240` (`0xA0` in hex) is a msgpack fixstr with a length of zero, which means `file_format` is an empty string.
Expected behavior
`file_format` of `StructuredDataset` should keep the original input value (i.e., `"parquet"` in this case).
Additional context to reproduce
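The fixstr reading of `\240` from Initial Thoughts can be checked locally, without a remote run. The sketch below is not part of the reproduction script and does not use flytekit or a msgpack library. It hand-decodes only the MessagePack fixstr format (header byte `0b101XXXXX`, where the low five bits give the payload length). The helper names `encode_fixstr` and `decode_fixstr` are made up for illustration.

```python
def encode_fixstr(text: str) -> bytes:
    """Encode a short string as a MessagePack fixstr (payload of at most 31 bytes)."""
    payload = text.encode("utf-8")
    if len(payload) > 31:
        raise ValueError("fixstr holds at most 31 bytes")
    # Header byte is 0b101XXXXX; XXXXX is the payload length in bytes.
    return bytes([0xA0 | len(payload)]) + payload


def decode_fixstr(data: bytes) -> str:
    """Decode a MessagePack fixstr located at the start of `data`."""
    header = data[0]
    if header & 0xE0 != 0xA0:
        raise ValueError(f"not a fixstr header: {header:#04x}")
    length = header & 0x1F
    return data[1 : 1 + length].decode("utf-8")


# The byte seen in the dumped input: octal escape \240 is the same byte as 0xA0.
observed = b"\240"
# What a fixstr carrying the original value would look like instead.
expected = encode_fixstr("parquet")

print(repr(decode_fixstr(observed)))  # the observed byte decodes to a zero-length string
print(expected.hex())                 # header 0xA7 followed by the 7 bytes of "parquet"
```

If the input had carried the original value, the dumped field would be expected to start with `\247` (`0xA7`) followed by the seven bytes of `parquet`, rather than a bare `\240`.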
Run the following script to trigger the remote run of the workflow:
Screenshots
No response
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?