redisJSONPythonProductCatalog

A simple product catalog solution based on icecat files

Initial project setup

Get this github code

git clone https://github.com/jphaugla/redisPythonProductCatalog.git

Three options for setting up the environment are given:

  • run with docker-compose using a flask and redis container
  • installing for mac os
  • running on linux (probably in the cloud)

docker-compose is much easier and is the main method documented here.

docker compose startup

docker-compose up -d --build

Code and File discussion

This is an implementation of a product catalog using data downloaded from icecat and loaded into redis-stack.

Download the datafiles to the data subdirectory

  • To download the datafiles, a free login id from icecat is required. WARNING: these data files are large, especially prodid_d.txt (over 4GB).
  • Once registered with icecat, retrieve these files using the registered username and password. The quotes are needed. Also gunzip the files.
mkdir data
cd data
curl -u 'yourUN':'yourPW' https://data.Icecat.biz/export/freexml/refs/CategoriesList.xml.gz -o CategoriesList.xml.gz
gunzip CategoriesList.xml.gz
mkdir index
cd index
curl -u 'yourUN':'yourPW' https://data.Icecat.biz/export/freexml/files.index.csv.gz -o files.index.csv.gz
gunzip files.index.csv.gz
cd ..
mkdir prodid
cd prodid
curl -u 'yourUN':'yourPW' https://data.Icecat.biz/prodid/prodid_d.txt.gz -o prodid_d.txt.gz
gunzip prodid_d.txt.gz
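
Where the gunzip command is not available, the decompression step could be sketched in Python with the standard library alone. The helper name gunzip_file is hypothetical and is not part of this repository.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(gz_path, remove_source=True):
    """Decompress gz_path next to itself, similar to what gunzip does."""
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {gz_path}")
    # CategoriesList.xml.gz -> CategoriesList.xml
    out_path = gz_path.with_suffix("")
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        # Copy in chunks so a multi-gigabyte file is never held in memory.
        shutil.copyfileobj(src, dst)
    if remove_source:
        gz_path.unlink()
    return out_path
```

For example, gunzip_file("data/prodid/prodid_d.txt.gz") would leave data/prodid/prodid_d.txt in place of the archive.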

Set environment

The docker compose file has the environment variables set for the redis connection and the location of the data files. This code uses redisjson and redisearch, so the redis database must have both of these modules installed. [Redis stack](https://redis.com/blog/introducing-redis-stack/) makes this easy to work with. docker-compose is set to use redis-stack.
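
A quick way to confirm that both modules are present is to inspect the output of MODULE LIST. The sketch below is not part of this repository. It assumes a redis-py client (which exposes module_list()) and assumes the modules report themselves as "ReJSON" and "search", which is how redis-stack typically names them.

```python
def missing_modules(client, required=("ReJSON", "search")):
    """Return the required module names that MODULE LIST does not report.

    `client` is assumed to be a redis-py client; the default names in
    `required` are assumptions about how redis-stack labels its modules.
    """
    loaded = set()
    for module in client.module_list():
        # Keys and values may be bytes or str depending on decode_responses.
        name = module.get("name", module.get(b"name", b""))
        if isinstance(name, bytes):
            name = name.decode()
        loaded.add(name.lower())
    return [m for m in required if m.lower() not in loaded]
```

An empty list means both modules were found. With redis-py installed, the client could be created with redis.Redis(host="localhost", port=6379).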

load categories

The redis node and port can be changed. The python code uses two environment variables, REDIS_SERVER and REDIS_PORT. The defaults are REDIS_SERVER=redis and REDIS_PORT=6379.

docker exec -it flask bash -c "python categoryImport.py"
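
As a minimal sketch of how these two variables could be read with the stated defaults (the function name redis_settings is hypothetical, and the repository's own code may do this differently):

```python
import os


def redis_settings(env=None):
    """Return (host, port) from REDIS_SERVER / REDIS_PORT with the README defaults."""
    env = os.environ if env is None else env
    host = env.get("REDIS_SERVER", "redis")
    port = int(env.get("REDIS_PORT", "6379"))
    return host, port


if __name__ == "__main__":
    host, port = redis_settings()
    print(f"would connect to {host}:{port}")
```

The resulting pair can be passed to redis-py as redis.Redis(host=host, port=port). Outside of docker, REDIS_SERVER would typically be set to localhost.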

load Product Title

This can take quite a long time (maybe 35 minutes) and consumes quite a bit of space. The title lookup in the product index load can be disabled by setting the DO_TITLE environment variable to false. It is possible to speed up the load by splitting the file: the python code that loads the product titles uses python multi-processing based on the number of files found in the data/prodid directory. To split the file while keeping the header row on each of the split files, a script is provided. It creates the separate files in the prodid directory and moves the original prodid_d.txt file up to the data directory. Set the PROCESSES environment variable according to the power of the client machine. The splitProdid.sh script can be adjusted to make a number of files that is a multiple of the PROCESSES parameter. These are the steps to run the script:

cd scripts
./splitProdid.sh
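
The idea behind the split, repeating the header row at the top of every chunk, can be sketched in Python. This is an illustration only and not the contents of splitProdid.sh. The function name, the default chunk size, and the output naming pattern are assumptions.

```python
from pathlib import Path


def split_with_header(source, out_dir, rows_per_file=100_000):
    """Split a text file into chunks, repeating the header row in each chunk."""
    source, out_dir = Path(source), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    # Binary mode keeps the original bytes untouched, whatever the encoding.
    with open(source, "rb") as src:
        header = src.readline()
        for row_number, line in enumerate(src):
            if row_number % rows_per_file == 0:
                if out is not None:
                    out.close()
                # Assumed naming pattern: <source name>.<5-digit index>.csv
                chunk = out_dir / f"{source.name}.{len(written):05d}.csv"
                out = open(chunk, "wb")
                out.write(header)
                written.append(chunk)
            out.write(line)
    if out is not None:
        out.close()
    return written
```

Choosing rows_per_file so that the number of chunks is a multiple of PROCESSES keeps every worker busy until the end of the load.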

The redis node and port can be changed. The python code uses two environment variables, REDIS_SERVER and REDIS_PORT.
The defaults are REDIS_SERVER=redis and REDIS_PORT=6379. See the docker-compose.yml file.

docker exec -it flask bash -c "python productTitleImport.py"

You can observe the load progress by checking the progress hash for each file:

docker exec -it redis redis-cli hgetall prod_title_load:prodid_d.txt.00063.csv
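
To check several files at once, the same HGETALL lookup could be wrapped in a small Python helper. This is a sketch and not part of the repository. It assumes a redis-py style client and returns the hash fields as stored, since the field names are not documented here.

```python
def load_progress(client, file_names, prefix="prod_title_load:"):
    """Fetch the progress hash for each file.

    `client` is assumed to offer hgetall(key), as redis-py clients do.
    Returns {file_name: hash_fields}; a file with no hash yet maps to {}.
    """
    return {name: client.hgetall(prefix + name) for name in file_names}
```

With redis-py installed, a client created with redis.Redis(host="localhost", port=6379, decode_responses=True) could be passed in, together with the names of the split files.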

load Products

This can take quite a long time (maybe 35 minutes). It is possible to speed up the product load by splitting the file: the python code that loads the products uses python multi-processing based on the number of files found in the data/index directory. To split the file while keeping the header row on each of the split files, a script is provided that splits the file into 100,000-row chunks. It creates the separate files in the index directory and moves the original files.index.csv file up to the data directory. Set the PROCESSES environment variable according to the power of the client machine. The splitFile.sh script can be adjusted to make a number of files that is a multiple of the PROCESSES parameter. These are the steps to run the script:

cd scripts
./splitFile.sh
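
The one-worker-per-file pattern described above can be sketched with a multiprocessing pool. This is an illustration under assumptions and not the code in productImport.py: the worker only counts rows where a real worker would write to redis, and the fallback of 1 for PROCESSES is an assumed default.

```python
import os
from multiprocessing import Pool
from pathlib import Path


def worker_count(env=None):
    """Read PROCESSES from the environment (the default of 1 is an assumption)."""
    env = os.environ if env is None else env
    return max(int(env.get("PROCESSES", "1")), 1)


def count_rows(path):
    """Stand-in worker: count the data rows in one chunk, excluding the header."""
    with open(path, "rb") as f:
        return str(path), max(sum(1 for _ in f) - 1, 0)


def load_all(data_dir, processes=None):
    """Hand every file in data_dir to a pool of worker processes."""
    processes = worker_count() if processes is None else processes
    files = sorted(p for p in Path(data_dir).iterdir() if p.is_file())
    with Pool(processes=processes) as pool:
        # Each file is processed independently; results come back in file order.
        return dict(pool.map(count_rows, files))


if __name__ == "__main__":
    data_dir = Path("data/index")
    if data_dir.is_dir():
        print(load_all(data_dir))
```

Because each worker takes one file at a time, a file count that is a multiple of PROCESSES avoids a final round in which only some workers have anything to do.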

The redis node and port can be changed. The python code uses two environment variables, REDIS_SERVER and REDIS_PORT.
The defaults are REDIS_SERVER=redis and REDIS_PORT=6379. See the docker-compose.yml file.

docker exec -it flask bash -c "python productImport.py"

You can observe the load progress by checking the progress hash for each file:

docker exec -it redis redis-cli hgetall prod_load:files.index.00004.csv
  • This is how to start the flask app server
  • However, it is already running as part of the flask container
docker exec -it flask bash -c "python app.py"
redis-cli -f scripts/searchQueries.txt

Notes for running outside of Docker

Follow most of the same steps as above, with some changes.

Instead of executing with docker, use a python virtualenv

  • create a virtualenv
cd src
python3 -m venv venv
source venv/bin/activate
  • Use an environment file for locations
  • Need to make sure the data location variables are set correctly
  • Can also set the number of concurrent processes for the client using the "PROCESSES" environment variable
source scripts/app.env
  • execute python scripts from the src directory
cd src
pip install -r requirements.txt
python categoryImport.py
python productTitleImport.py
python productImport.py
python app.py

installing on mac

  1. install xcode
  2. install homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  3. verify homebrew
brew doctor
  4. install python
brew install python
  5. install redis-py
pip install redis
  6. install flask
pip install flask
  7. clone repository
git clone https://github.com/jphaugla/redisPythonProductCatalog.git
  8. install redis
brew install redis
  9. start redis
redis-server /usr/local/etc/redis.conf