fix(automl): fix uploading pandas dataframe to AutoML Tables #9647

Merged: 1 commit on Nov 12, 2019

Conversation

helinwang
Contributor

pandas.DataFrame.to_csv() by default exports the data index as a column
with an empty column name. This causes uploading the exported CSV file to
fail because AutoML Tables does not allow empty column names. Since
the data index is not useful for training the model, this PR
fixes the problem by setting the index argument to False so that the
index is not exported.
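
For illustration, here is a minimal sketch of the behavior described above. It is not the code changed in this PR; the DataFrame and its column names ("feature", "label") are made up for the example. It only shows how the default `to_csv()` output starts with an unnamed index column and how `index=False` drops it.

```python
import pandas as pd

# Hypothetical training data; the column names are illustrative only.
df = pd.DataFrame({"feature": [1.0, 2.0], "label": ["a", "b"]})

# Default behavior: the index is written as the first column,
# and its header cell is empty.
with_index = df.to_csv()

# With index=False only the named columns are written.
without_index = df.to_csv(index=False)

header_with_index = with_index.splitlines()[0]
header_without_index = without_index.splitlines()[0]

print(repr(header_with_index))     # ',feature,label'
print(repr(header_without_index))  # 'feature,label'
```

The leading comma in the first header is the empty column name that the upload rejects.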

Fixes #9483

@helinwang helinwang requested a review from busunkim96 as a code owner November 8, 2019 19:33
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Nov 8, 2019
@helinwang helinwang changed the title from "fix(automl): Fix uploading pandas dataframe to AutoML Tables." to "fix(automl): fix uploading pandas dataframe to AutoML Tables" Nov 8, 2019
@busunkim96 busunkim96 requested a review from sirtorry November 8, 2019 21:21
@tseaver tseaver added the api: automl Issues related to the AutoML API. label Nov 11, 2019
Contributor

@tseaver tseaver left a comment

Thanks for the patch! @tswast Can you evaluate the correctness here, please?

Contributor

@tswast tswast left a comment

LGTM

It appears the index=False option for to_csv has been in pandas for a long time.

P.S. What's the reason we require 0.24 and above?

"pandas": ["pandas>=0.24.0"],

@helinwang
Contributor Author

Thanks, @tswast. I am not sure, as I just became the maintainer of the AutoML Tables Python SDK.
@TrucHLe @lwander, are there any specific reasons for pandas>=0.24.0?
@tswast, do you have any suggestion for the pandas version? If there are no specific reasons for pandas>=0.24.0, I am happy to send a PR to change it.

@helinwang helinwang merged commit 8fdd2a4 into googleapis:master Nov 12, 2019
@tswast
Contributor

tswast commented Nov 12, 2019

If you want to be consistent with BigQuery, we use 0.17.1.

"pandas": ["pandas>=0.17.1"],

Unless you count dropping Python 2 support in 0.25, pandas 0.17 was the version with the most breaking changes in recent history, so something after that will get most customers.
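
As a rough sketch of where such a pin lives: the line quoted above looks like an entry in an extras mapping passed to setuptools. Only the "pandas" entry comes from this thread; the variable name `extras` and the surrounding layout are assumptions for illustration, not a copy of the project's setup.py.

```python
# Hypothetical excerpt of a setup.py. Only the "pandas" entry is taken
# from the discussion; everything else is an illustrative assumption.
extras = {
    # Optional dependency: installed only when the "pandas" extra is requested.
    "pandas": ["pandas>=0.17.1"],
}

# In a real setup.py this mapping would typically be passed along as:
#     setuptools.setup(..., extras_require=extras)
print(extras["pandas"])
```

Lowering the minimum version in this entry widens the set of existing pandas installations that satisfy the extra without forcing an upgrade.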

@helinwang
Contributor Author

Thanks @tswast, sent #9824 to change it to 0.17.1 as well.

parthea pushed a commit that referenced this pull request Oct 21, 2023
Labels
api: automl Issues related to the AutoML API. cla: yes This human has signed the Contributor License Agreement.
Development

Successfully merging this pull request may close these issues.

AutoML: Tables client importing data with 'pandas_dataframe' fails.
4 participants