A simple product catalog solution based on Icecat files
Get this GitHub code
git clone https://github.com/jphaugla/redisPythonProductCatalog.git
Three options for setting up the environment are given:
- run with docker-compose using a Flask container and a Redis container
- install on macOS
- run on Linux (probably in the cloud)
docker-compose is much easier and is the main method documented here.
docker-compose up -d --build
This is an implementation of a product catalog using data downloaded from Icecat and stored in Redis Stack.
- To download the data files, a free login ID from Icecat is required. WARNING: these data files are large, especially prodid_d.txt (over 4GB).
- Once registered with Icecat, retrieve these files using the registered username and password (the quotes around them are needed), then gunzip the files.
mkdir data
cd data
curl -u 'yourUN':'yourPW' https://data.Icecat.biz/export/freexml/refs/CategoriesList.xml.gz -o CategoriesList.xml.gz
gunzip CategoriesList.xml.gz
mkdir index
cd index
curl -u 'yourUN':'yourPW' https://data.Icecat.biz/export/freexml/files.index.csv.gz -o files.index.csv.gz
gunzip files.index.csv.gz
cd ..
mkdir prodid
cd prodid
curl -u 'yourUN':'yourPW' https://data.Icecat.biz/prodid/prodid_d.txt.gz -o prodid_d.txt.gz
gunzip prodid_d.txt.gz
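For environments without curl, the download-and-decompress steps above can be sketched in Python with only the standard library. This is a hypothetical helper, not part of the repository: `build_request`, `download` and `gunzip` are illustrative names. `curl -u user:pw` sends an HTTP Basic `Authorization` header, which `build_request` reproduces, and `gunzip` mirrors the shell command by writing the file without its `.gz` suffix and deleting the archive.

```python
"""Hypothetical Python equivalent of the curl + gunzip steps (a sketch, not part of the repo)."""
import base64
import gzip
import shutil
import urllib.request
from pathlib import Path


def build_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying an HTTP Basic Authorization header, like `curl -u user:pw`."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def download(url: str, username: str, password: str, dest: Path) -> Path:
    """Stream the response body to dest so multi-GB files are never held in memory."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(build_request(url, username, password)) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


def gunzip(src: Path) -> Path:
    """Decompress src next to itself, then delete src, as the `gunzip` command does."""
    target = src.with_suffix("")  # removes only the final ".gz"
    with gzip.open(src, "rb") as fin, open(target, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    src.unlink()
    return target


# Usage (performs a real download, so it is left as a comment):
# gz = download("https://data.Icecat.biz/prodid/prodid_d.txt.gz", "yourUN", "yourPW",
#               Path("data/prodid/prodid_d.txt.gz"))
# gunzip(gz)
```

Streaming with `shutil.copyfileobj` matters here because prodid_d.txt alone is over 4GB once decompressed.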
The docker-compose file sets the environment variables for the Redis connection and the location of the data files. This code uses RedisJSON and RediSearch, so the Redis database must have both of these modules installed. [Redis Stack](https://redis.com/blog/introducing-redis-stack/) makes this easy to work with, and docker-compose is set up to use redis-stack.
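Because both modules are required, it can help to check a server before loading data. The sketch below is a hypothetical helper, not part of this project. It calls redis-py's `Redis.module_list()` (the `MODULE LIST` command) and assumes the module names Redis Stack reports are `ReJSON` and `search`; adjust `REQUIRED_MODULES` if your server reports different names.

```python
"""Sketch: report which of the RedisJSON / RediSearch modules a Redis server is missing."""

# Assumption: the names Redis Stack reports in MODULE LIST for RedisJSON and RediSearch.
REQUIRED_MODULES = {"ReJSON", "search"}


def _text(value):
    """redis-py returns bytes unless decode_responses=True; normalise to str."""
    return value.decode() if isinstance(value, bytes) else value


def missing_modules(client, required=REQUIRED_MODULES):
    """Return the required module names that do not appear in client.module_list()."""
    loaded = set()
    for entry in client.module_list():
        # Each entry is a dict of module attributes, e.g. {"name": ..., "ver": ...}
        fields = {_text(k): v for k, v in entry.items()}
        loaded.add(_text(fields.get("name")))
    return set(required) - loaded
```

For example, `missing_modules(redis.Redis(host="localhost", port=6379))` should return an empty set against a Redis Stack server.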
The Redis node and port can be changed. The Python code uses two environment variables, REDIS_SERVER and REDIS_PORT. The defaults are REDIS_SERVER=redis and REDIS_PORT=6379 (see the docker-compose.yml file).
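Resolving these two variables with their documented defaults can be sketched as follows. The function name `redis_settings` is hypothetical, and this is not necessarily how the project's scripts read them.

```python
"""Sketch: resolve the Redis connection settings from REDIS_SERVER / REDIS_PORT."""
import os


def redis_settings(environ=None):
    """Return (host, port), defaulting to redis:6379 as documented above."""
    env = os.environ if environ is None else environ
    host = env.get("REDIS_SERVER", "redis")
    port = int(env.get("REDIS_PORT", "6379"))  # environment values are strings
    return host, port
```

The result can be passed straight to redis-py, e.g. `host, port = redis_settings()` followed by `redis.Redis(host=host, port=port, decode_responses=True)`.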
Load the categories:
docker exec -it flask bash -c "python categoryImport.py"
Next, load the product titles. This can take quite a long time (maybe 35 minutes) and consumes quite a bit of space. The title lookup in the product index load can be disabled by setting the DO_TITLE environment variable to false. The load can be sped up by splitting the file: the Python code that loads the products uses Python multiprocessing based on the number of files found in the data/prodid directory. To split the file while keeping the header row on each split file, a script is provided. It creates the separate files in the prodid directory and moves the original prodid_d.txt file up to the data directory. Set the PROCESSES environment variable according to the power of the client machine. The splitProdid.sh script can be adjusted to produce a number of files that is a multiple of the PROCESSES parameter. These are the steps:
cd scripts
./splitProdid.sh
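The header-preserving split that these scripts perform can be sketched in Python. This is not the logic of splitProdid.sh or splitFile.sh; it is a hypothetical equivalent. The `<name>.NNNNN.csv` naming is an assumption modeled on the key `prod_title_load:prodid_d.txt.00063.csv` used below, and unlike the scripts this sketch leaves the original file where it is.

```python
"""Sketch: split a large delimited file into chunks, repeating the header row in each chunk."""
from itertools import islice
from pathlib import Path


def split_with_header(src: Path, out_dir: Path, rows_per_chunk: int) -> list:
    """Write src as <name>.00000.csv, <name>.00001.csv, ... in out_dir; return the chunk paths.

    Works in binary mode so line endings and any non-UTF-8 bytes are copied unchanged.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with open(src, "rb") as fin:
        header = fin.readline()
        while True:
            rows = list(islice(fin, rows_per_chunk))  # next batch of data rows
            if not rows:
                break
            path = out_dir / f"{src.name}.{len(chunks):05d}.csv"
            with open(path, "wb") as out:
                out.write(header)  # every chunk starts with the original header
                out.writelines(rows)
            chunks.append(path)
    return chunks
```

Choosing `rows_per_chunk` so that the number of chunks comes out as a multiple of PROCESSES gives every loader process the same number of files.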
docker exec -it flask bash -c "python productTitleImport.py"
The load progress can be observed by checking the progress hash for each file, for example:
docker exec -it redis redis-cli hgetall prod_title_load:prodid_d.txt.00063.csv
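Rather than querying one file at a time, all progress hashes sharing a key prefix can be collected with redis-py's `scan_iter` and `hgetall`. This is a hypothetical helper. The prefixes come from the key names shown in this README (`prod_title_load:` here and `prod_load:` for the product load), but the hash fields are whatever the loaders write, so the sketch simply returns them all.

```python
"""Sketch: gather every per-file load-progress hash whose key starts with a prefix."""


def _text(value):
    """redis-py returns bytes unless decode_responses=True; normalise to str."""
    return value.decode() if isinstance(value, bytes) else value


def load_progress(client, prefix="prod_title_load:"):
    """Return {file name: {field: value}} for every key matching prefix*.

    SCAN (via client.scan_iter) is used instead of KEYS so the server is not blocked.
    """
    progress = {}
    for key in client.scan_iter(match=prefix + "*"):
        key = _text(key)
        fields = client.hgetall(key)
        progress[key[len(prefix):]] = {_text(k): _text(v) for k, v in fields.items()}
    return progress
```

For the product load, call `load_progress(client, "prod_load:")` with a `redis.Redis` client.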
Then load the products. This can also take quite a long time (maybe 35 minutes). The load can be sped up by splitting the file: the Python code that loads the products uses Python multiprocessing based on the number of files found in the data/index directory. To split the file while keeping the header row on each split file, a script is provided that splits it into 100,000-row chunks. It creates the separate files in the index directory and moves the original files.index.csv file up to the data directory. Set the PROCESSES environment variable according to the power of the client machine. The splitFile.sh script can be adjusted to produce a number of files that is a multiple of the PROCESSES parameter. These are the steps:
cd scripts
./splitFile.sh
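The "one task per split file, PROCESSES workers" pattern described above can be sketched with `multiprocessing.Pool`. This is not the project's loader. `count_data_rows` is a hypothetical stand-in worker that only counts rows where the real code would write to Redis, and falling back to `os.cpu_count()` when PROCESSES is unset is an assumption of this sketch.

```python
"""Sketch: run one loader task per split file, using a process pool sized by PROCESSES."""
import os
from multiprocessing import Pool
from pathlib import Path


def count_data_rows(path):
    """Hypothetical stand-in worker: the real loader would parse rows and write them to Redis.

    Here it only counts the data rows (every line after the header)."""
    with open(path, "rb") as f:
        next(f, None)  # skip the header row repeated at the top of each split file
        return Path(path).name, sum(1 for _ in f)


def split_files(data_dir):
    """The per-file tasks: every regular file in data_dir, in a stable order."""
    return sorted(str(p) for p in Path(data_dir).iterdir() if p.is_file())


def load_all(data_dir, worker=count_data_rows, processes=None):
    """Map worker over the files in data_dir using a Pool of `processes` workers.

    If the file count is a multiple of the pool size and the files are similar in size,
    the work divides evenly across the processes.
    """
    if processes is None:
        processes = int(os.environ.get("PROCESSES", os.cpu_count() or 1))
    with Pool(processes=processes) as pool:
        # chunksize=1 hands out one file per task
        return dict(pool.map(worker, split_files(data_dir), chunksize=1))
```

When run as a script, call `load_all` under an `if __name__ == "__main__":` guard so that platforms using the spawn start method do not re-create the pool in child processes.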
docker exec -it flask bash -c "python productImport.py"
The load progress can be observed by checking the progress hash for each file, for example:
docker exec -it redis redis-cli hgetall prod_load:files.index.00004.csv
- This is how to start the Flask app server
- However, it is already running as part of the flask container
docker exec -it flask bash -c "python app.py"
- Run the API tests. The easiest way is to use Postman: use File->Import to import the following collection: https://github.com/jphaugla/redisJSONProductCatalog/blob/main/scripts/Product-Category%20APIs.postman_collection.json Once the collection is imported, run each request to test the APIs. Alternatively, use the commands in this file: https://github.com/jphaugla/redisJSONProductCatalog/blob/main/scripts/sampleput.sh Make sure to use bash, as zsh has issues with the curl command. Note: there are multiple API tests in the file, but only one should be run at a time, so comment out the tests that should not be run.
- Run sample search queries
Run the sample RediSearch queries as provided, one at a time, using
redis-cli -f scripts/searchQueries.txt
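Running the queries one at a time can also be scripted with redis-py's generic `execute_command`. This is a hypothetical sketch. It assumes the query file holds one command per line and treats `#` lines as comments, which is an assumption about the file's format. Arguments are split with `shlex`, whose shell-style quoting is close to, but not identical to, redis-cli's.

```python
"""Sketch: send each command in a text file to Redis in turn and keep each reply separate."""
import shlex


def read_commands(path):
    """Yield each non-blank, non-comment line of path as an argument list."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                yield shlex.split(line)  # keeps quoted query strings as one argument


def run_commands(client, path):
    """Execute each command with client.execute_command; return (command, reply) pairs."""
    return [(args, client.execute_command(*args)) for args in read_commands(path)]
```

With `client = redis.Redis(host="localhost", port=6379, decode_responses=True)`, each `FT.SEARCH` reply in `run_commands(client, "scripts/searchQueries.txt")` can be inspected on its own.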
To run without docker, follow most of the same steps as above, with these changes:
- create a virtualenv
cd src
python3 -m venv venv
source venv/bin/activate
- Use an environment file for the data file locations
- Make sure the data location variables are set correctly
- The number of concurrent client processes can also be set using the "PROCESSES" environment variable
source scripts/app.env
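If the variables need to be loaded from inside Python rather than with `source`, a minimal parser can be sketched. The function `load_env_file` is hypothetical, and it assumes app.env contains only simple `KEY=VALUE` or `export KEY=VALUE` lines. It strips shell quotes but, unlike the shell, does not expand `$VARIABLES`. Like `source`, it overwrites variables that are already set.

```python
"""Sketch: load simple KEY=VALUE lines from an env file (such as scripts/app.env)."""
import os
import shlex


def load_env_file(path, environ=None):
    """Parse `export KEY=VALUE` or `KEY=VALUE` lines into environ; return what was loaded.

    Blank lines, # comments and lines without "=" are skipped; no $VAR expansion is done.
    """
    env = os.environ if environ is None else environ
    loaded = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            if line.startswith("export "):
                line = line[len("export "):]
            key, sep, value = line.partition("=")
            if not sep:
                continue  # not an assignment
            parts = shlex.split(value)  # removes surrounding shell quotes
            loaded[key.strip()] = parts[0] if parts else ""
    env.update(loaded)  # later values win, as with `source`
    return loaded
```

After `load_env_file("scripts/app.env")`, the loaders can read values such as `os.environ["PROCESSES"]` as usual.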
- execute python scripts from the src directory
cd src
pip install -r requirements.txt
python categoryImport.py
python productTitleImport.py
python productImport.py
python app.py
- install Xcode
- install homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- verify homebrew
brew doctor
- install python
brew install python
- install redis-py
pip install redis
- install flask
pip install flask
- clone repository
git clone https://github.com/jphaugla/redisPythonProductCatalog.git
- install redis
brew install redis
- start redis
redis-server /usr/local/etc/redis.conf