SheetReader lets you read Excel spreadsheet files (.xlsx) blazingly fast. This repository contains the Python bindings; the core library, sheetreader-core, is implemented in C++.
SheetReader can be installed with pip:

```shell
pip install pysheetreader
```
After installation, a spreadsheet can be loaded with `read_xlsx`:

```python
import pysheetreader as sr

sheet = sr.read_xlsx("my_favorite_sheet.xlsx")
```
To convert a spreadsheet into a pandas DataFrame:

```python
import pysheetreader as sr
import pandas as pd

# Parse the spreadsheet with SheetReader
sheet = sr.read_xlsx("my_favorite_sheet.xlsx")
# Build a DataFrame from the parsed data
df = pd.DataFrame.from_dict(sheet[0])
```
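If pandas is not needed, the parsed data can also be consumed with the standard library. The sketch below assumes, as the `from_dict` call above suggests, that `sheet[0]` maps column names to equal-length lists of values; this shape is an assumption, and `columns_to_records` is a hypothetical helper, not part of pysheetreader.

```python
from typing import Any


def columns_to_records(columns: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Turn a column-oriented mapping {name: [values, ...]} into row dicts.

    Hypothetical helper; assumes the shape that pd.DataFrame.from_dict
    consumes with its default orient="columns".
    """
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of values")
    names = list(columns)
    # zip(*...) walks the columns in lockstep, yielding one tuple per row.
    return [dict(zip(names, row)) for row in zip(*columns.values())]
```

Under that assumption, `columns_to_records(sheet[0])` yields one dictionary per data row, keyed by column name.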
`read_xlsx` accepts the following parameters:

| Parameter | Type | Description | Default |
|---|---|---|---|
| `path` | string | The path of the .xlsx file to parse. | - |
| `sheet` | integer or string | The sheet of the file to parse, given either as its index (starting at 1) or its name. | 1 |
| `headers` | boolean | Whether to interpret the first parsed row as headers. | True |
| `skip_rows` | integer | Number of rows to skip before parsing data. | 0 |
| `skip_columns` | integer | Number of columns to skip before parsing data. | 0 |
| `num_threads` | integer | Number of threads to use for parsing. Use -1 for automatic threading. | -1 |
| `col_types` | dict or list | How to interpret parsed data, either by column name (dict) or by position (list). Types: `numeric`, `text`, `logical`, `date`, `skip`, `guess`. | None |
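The following pure-Python sketch models how `skip_rows`, `skip_columns` and `headers` select data from a grid of cell values, and checks a `col_types` specification against the type names listed in the table. Both functions, `apply_layout_options` and `check_col_types`, are hypothetical illustrations of the documented parameter descriptions; they are not part of pysheetreader and do not reflect how sheetreader-core actually parses files.

```python
from typing import Any, Optional, Union

# Type names listed for col_types in the parameter table.
VALID_COL_TYPES = {"numeric", "text", "logical", "date", "skip", "guess"}


def apply_layout_options(
    grid: list[list[Any]],
    skip_rows: int = 0,
    skip_columns: int = 0,
    headers: bool = True,
) -> tuple[Optional[list[Any]], list[list[Any]]]:
    """Illustrative model of skip_rows, skip_columns and headers.

    Returns (header_row, data_rows); header_row is None when headers=False.
    """
    # Drop the leading rows, then the leading cells of every remaining row.
    rows = [row[skip_columns:] for row in grid[skip_rows:]]
    if headers and rows:
        # The first remaining row is interpreted as the column names.
        return rows[0], rows[1:]
    return None, rows


def check_col_types(col_types: Union[dict, list, None]) -> None:
    """Raise if col_types is not None, a dict or a list of known type names."""
    if col_types is None:
        return  # None is the documented default
    if isinstance(col_types, dict):
        type_names = list(col_types.values())  # types given by column name
    elif isinstance(col_types, list):
        type_names = col_types  # types given by column position
    else:
        raise TypeError("col_types must be a dict, a list or None")
    unknown = [name for name in type_names if name not in VALID_COL_TYPES]
    if unknown:
        raise ValueError(f"unknown column types: {unknown}")
```

For a sheet whose first row holds a title and whose first column is empty, `apply_layout_options(grid, skip_rows=1, skip_columns=1)` returns the second row as the header row and the rest as data. With the library itself, and assuming the parameters are accepted as keyword arguments, the same options would be passed as, e.g., `sr.read_xlsx("my_favorite_sheet.xlsx", skip_rows=1, skip_columns=1, col_types={"score": "numeric"})`.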
To build from source, first clone the repository together with its submodules, which contain the sheetreader-core dependency:

```shell
git clone --recurse-submodules https://github.com/polydbms/sheetreader-python.git
cd sheetreader-python
```

The repository provides a `pyproject.toml`, so the SheetReader wheel can be generated with the `build` package (`pip install build`):

```shell
python -m build .
```

or the package can be installed directly with pip:

```shell
pip install .
```
SheetReader is part of the PolyDB Project. We also provide bindings/extensions for several other environments:
- R language: Load spreadsheets into dataframes, also available via CRAN.
- PostgreSQL FDW: Foreign data wrapper for PostgreSQL that allows registering spreadsheets as foreign tables.
- DuckDB Extension: Extension for DuckDB that allows loading spreadsheets into tables. Also available as a community extension.
SheetReader was published in the journal Information Systems. Cite as:
```bibtex
@article{DBLP:journals/is/GavriilidisHZM23,
  author    = {Haralampos Gavriilidis and
               Felix Henze and
               Eleni Tzirita Zacharatou and
               Volker Markl},
  title     = {SheetReader: Efficient Specialized Spreadsheet Parsing},
  journal   = {Inf. Syst.},
  volume    = {115},
  pages     = {102183},
  year      = {2023},
  url       = {https://doi.org/10.1016/j.is.2023.102183},
  doi       = {10.1016/J.IS.2023.102183},
  timestamp = {Mon, 26 Jun 2023 20:54:32 +0200},
  biburl    = {https://dblp.org/rec/journals/is/GavriilidisHZM23.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```