fix(automl): fix uploading pandas dataframe to AutoML Tables #9647
By default, pandas.DataFrame.to_csv() exports the DataFrame index as a column with an empty column name. This makes the upload of the exported CSV file fail, because AutoML Tables does not allow empty column names. Since the index is not useful for training the model, this PR fixes the problem by passing index=False so that the index is not exported.
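A minimal sketch of the behavior described above, assuming only that pandas is installed. It compares the CSV header written with the default settings against the one written with index=False. The helpers header_of and has_empty_column_name are hypothetical, written for illustration; they are not part of the AutoML client library.

```python
import csv
import io

import pandas as pd


def header_of(csv_text):
    """Return the header row of a CSV string as a list of column names."""
    return next(csv.reader(io.StringIO(csv_text)))


def has_empty_column_name(columns):
    # Hypothetical pre-upload check mirroring the AutoML Tables rule
    # that column names must not be empty.
    return any(name.strip() == "" for name in columns)


df = pd.DataFrame({"feature": [1.5, 2.5], "label": [0, 1]})

# Default: the unnamed index is written as a leading column with an empty header.
with_index = df.to_csv()
# With index=False the index column is omitted entirely.
without_index = df.to_csv(index=False)

print(header_of(with_index))     # ['', 'feature', 'label']
print(header_of(without_index))  # ['feature', 'label']
```

With no path argument, to_csv() returns the CSV as a string, which keeps the example free of temporary files.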
Thanks for the patch! @tswast Can you evaluate the correctness here, please?
LGTM
It appears the index=False option for to_csv has been in pandas for a long time.
P.S. What's the reason we require 0.24 and above?
google-cloud-python/automl/setup.py, line 29 (at 5cbfa26):
"pandas": ["pandas>=0.24.0"],
If you want to be consistent with BigQuery, we use the requirement in google-cloud-python/bigquery/setup.py, line 44 (at cab728b).
Unless you count dropping Python 2 support in 0.25, pandas 0.17 was the release with the most breaking changes in recent history, so any minimum version after that should cover most customers.
Fixes #9483