Strange behavior on saving a dataframe containing = at the start of a cell value #32741

AMR-KELEG · 2020-03-16T00:33:01Z

import pandas as pd

index = [0, 1, 2]
text = ['====== 2', '======= abd', 'abc']
df = pd.DataFrame({'index':index, 'text': text})
df.to_excel('temp.xlsx', index=False)
print(df.head())

loaded_df = pd.read_excel('temp.xlsx')
print(loaded_df.head())

Problem description

Excel sheets interpret an = sign at the start of the cell as a signal for writing equations. If a column contains text data where one of the values start with = then the values are corrupted after saving the dataframe to an excel sheet.

Expected Output

   index         text
0      0     ====== 2
1      1  ======= abd
2      2          abc

Output after loading the exported xlsx sheet

   index text
0      0    0
1      1    0
2      2  abc

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : 059f9bf
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-88-generic
Version : #88-Ubuntu SMP Tue Feb 11 20:11:34 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : C.UTF-8
LANG : en_US.utf8
LOCALE : en_US.UTF-8

pandas : 1.1.0.dev0+793.g059f9bff6
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 46.0.0
Cython : 0.29.15
pytest : 5.4.1
hypothesis : 5.6.0
sphinx : 2.4.4
blosc : 1.8.3
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.2
fastparquet : 0.3.3
gcsfs : None
matplotlib : 3.2.0
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.48.0

The text was updated successfully, but these errors were encountered:

WillAyd · 2020-03-19T19:16:02Z

The underlying excel engines by default will write cells starting with an equals sign as a formula. There as another discussion of this in #29095 so closing this as a duplicate of that.

As mentioned in that issue though, would definitely welcome updates to the documentation for this if you have interest in submitting a PR

WillAyd closed this as completed Mar 19, 2020

WillAyd added Duplicate Report Duplicate issue or pull request IO Excel read_excel, to_excel labels Mar 19, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Strange behavior on saving a dataframe containing = at the start of a cell value #32741

Strange behavior on saving a dataframe containing = at the start of a cell value #32741

AMR-KELEG commented Mar 16, 2020

INSTALLED VERSIONS

WillAyd commented Mar 19, 2020

Strange behavior on saving a dataframe containing = at the start of a cell value #32741

Strange behavior on saving a dataframe containing = at the start of a cell value #32741

Comments

AMR-KELEG commented Mar 16, 2020

Problem description

Expected Output

Output after loading the exported xlsx sheet

Output of pd.show_versions()

INSTALLED VERSIONS

WillAyd commented Mar 19, 2020

Output of `pd.show_versions()`