Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange behavior on saving a dataframe containing = at the start of a cell value #32741

Closed
AMR-KELEG opened this issue Mar 16, 2020 · 1 comment
Labels
Duplicate Report Duplicate issue or pull request IO Excel read_excel, to_excel

Comments

@AMR-KELEG
Copy link

import pandas as pd

index = [0, 1, 2]
text = ['====== 2', '======= abd', 'abc']
df = pd.DataFrame({'index':index, 'text': text})
df.to_excel('temp.xlsx', index=False)
print(df.head())

loaded_df = pd.read_excel('temp.xlsx')
print(loaded_df.head())

Problem description

Excel sheets interpret an = sign at the start of the cell as a signal for writing equations. If a column contains text data where one of the values start with = then the values are corrupted after saving the dataframe to an excel sheet.

Expected Output

   index         text
0      0     ====== 2
1      1  ======= abd
2      2          abc

Output after loading the exported xlsx sheet

   index text
0      0    0
1      1    0
2      2  abc

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 059f9bf
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-88-generic
Version : #88-Ubuntu SMP Tue Feb 11 20:11:34 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : C.UTF-8
LANG : en_US.utf8
LOCALE : en_US.UTF-8

pandas : 1.1.0.dev0+793.g059f9bff6
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.0.0
Cython : 0.29.15
pytest : 5.4.1
hypothesis : 5.6.0
sphinx : 2.4.4
blosc : 1.8.3
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : 0.3.3
gcsfs : None
matplotlib : 3.2.0
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.48.0

@WillAyd
Copy link
Member

WillAyd commented Mar 19, 2020

The underlying excel engines by default will write cells starting with an equals sign as a formula. There as another discussion of this in #29095 so closing this as a duplicate of that.

As mentioned in that issue though, would definitely welcome updates to the documentation for this if you have interest in submitting a PR

@WillAyd WillAyd closed this as completed Mar 19, 2020
@WillAyd WillAyd added Duplicate Report Duplicate issue or pull request IO Excel read_excel, to_excel labels Mar 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Duplicate Report Duplicate issue or pull request IO Excel read_excel, to_excel
Projects
None yet
Development

No branches or pull requests

2 participants