SheetReader Python Bindings

SheetReader is a fast parser for Excel spreadsheet files (.xlsx). This repository contains the Python bindings; the core library is implemented in C++.

Quickstart

SheetReader can be installed with pip:

pip install pysheetreader

After successful installation, spreadsheets can be loaded:

import pysheetreader as sr
sheet = sr.read_xlsx("my_favorite_sheet.xlsx")

To convert a spreadsheet into a pandas DataFrame:

import pysheetreader as sr
import pandas as pd
sheet = sr.read_xlsx("my_favorite_sheet.xlsx")
df = pd.DataFrame.from_dict(sheet[0])

Parameters:

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| path | string | The path of the .xlsx file to parse. | - |
| sheet | integer or string | The sheet of the file to parse, given either as an index (starting at 1) or as a name. | 1 |
| headers | boolean | Whether to interpret the first parsed row as headers. | True |
| skip_rows | integer | How many rows to skip before parsing data. | 0 |
| skip_columns | integer | How many columns to skip before parsing data. | 0 |
| num_threads | integer | How many threads to use for parsing. Use -1 for automatic threading. | -1 |
| col_types | dict or list | How to interpret parsed data, either by column name (dict) or by position (list). Types: numeric, text, logical, date, skip, guess. | None |
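The parameters above can be collected and checked before a file is parsed. The following is a minimal sketch, not part of the library: build_read_options and load_sheet are hypothetical helpers, and the sketch assumes that read_xlsx accepts the listed parameters as keyword arguments. The value checks only restate the constraints given in the parameter descriptions.

```python
# Sketch: collect and sanity-check read_xlsx options before calling the library.
# Assumption: the documented parameters can be passed as keyword arguments.

# Column type names listed in the parameter description.
VALID_COL_TYPES = {"numeric", "text", "logical", "date", "skip", "guess"}


def build_read_options(sheet=1, headers=True, skip_rows=0, skip_columns=0,
                       num_threads=-1, col_types=None):
    """Return a dict of options for read_xlsx (hypothetical helper)."""
    # A sheet is addressed by 1-based index or by name.
    if isinstance(sheet, bool) or not isinstance(sheet, (int, str)):
        raise TypeError("sheet must be an index (int) or a name (str)")
    if isinstance(sheet, int) and sheet < 1:
        raise ValueError("sheet indices start at 1")
    if skip_rows < 0 or skip_columns < 0:
        raise ValueError("skip_rows and skip_columns must not be negative")
    # -1 requests automatic threading; otherwise expect a positive count.
    if num_threads != -1 and num_threads < 1:
        raise ValueError("num_threads must be -1 or a positive integer")
    if col_types is not None:
        # dict: types keyed by column name; list: types by position.
        names = col_types.values() if isinstance(col_types, dict) else col_types
        unknown = sorted(set(names) - VALID_COL_TYPES)
        if unknown:
            raise ValueError(f"unknown column types: {unknown}")
    return {
        "sheet": sheet,
        "headers": headers,
        "skip_rows": skip_rows,
        "skip_columns": skip_columns,
        "num_threads": num_threads,
        "col_types": col_types,
    }


def load_sheet(path, **options):
    """Validate the options, then parse the file (hypothetical wrapper)."""
    # Imported lazily so the validation helper works without the package.
    import pysheetreader as sr
    return sr.read_xlsx(path, **build_read_options(**options))
```

With these helpers, a call such as load_sheet("my_favorite_sheet.xlsx", sheet="Sales", skip_rows=2) would reject a misspelled column type or a sheet index of 0 before any parsing starts. The sheet name "Sales" is only an illustration.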

Build Instructions

First, clone the repository together with its submodules, which contain the sheetreader-core dependency:

git clone --recurse-submodules https://github.com/polydbms/sheetreader-python.git

To build from source, this repository provides a pyproject.toml. The SheetReader wheel file can be generated through:

python -m build .

or installed with pip through:

pip install .

More resources

SheetReader is part of the PolyDB Project. We also provide bindings/extensions for several other environments:

Paper

SheetReader was published in the journal Information Systems. Cite as:

@article{DBLP:journals/is/GavriilidisHZM23,
  author       = {Haralampos Gavriilidis and
                  Felix Henze and
                  Eleni Tzirita Zacharatou and
                  Volker Markl},
  title        = {SheetReader: Efficient Specialized Spreadsheet Parsing},
  journal      = {Inf. Syst.},
  volume       = {115},
  pages        = {102183},
  year         = {2023},
  url          = {https://doi.org/10.1016/j.is.2023.102183},
  doi          = {10.1016/J.IS.2023.102183},
  timestamp    = {Mon, 26 Jun 2023 20:54:32 +0200},
  biburl       = {https://dblp.org/rec/journals/is/GavriilidisHZM23.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
