pip install -r requirements.txt
Install the Playwright headless browsers
playwright install
If needed, run the following command to install the required system libraries
sudo apt-get install libatk1.0-0 libatk-bridge2.0-0 libcups2 libatspi2.0-0 libxcomposite1 libxdamage1 libxfixes3 libxrandr2 libgbm1 libxkbcommon0 libpango-1.0-0 libcairo2 libasound2
python main.py
python scrape_by_topic.py
python convert_parquet.py
python upload_hf.py
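The four scripts above can also be chained from a single Python entry point. The following is a minimal sketch, assuming the scripts are meant to run in the order listed and that each one exits with a non-zero status on failure. The `run_pipeline` helper and the `PIPELINE` list are hypothetical names introduced for illustration and are not part of the repository.

```python
import subprocess
import sys

# Script order copied from the commands above. That each step depends on the
# previous one is an assumption, not something the instructions state.
PIPELINE = ["main.py", "scrape_by_topic.py", "convert_parquet.py", "upload_hf.py"]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script in order with the current interpreter.

    Returns the list of scripts that were run. With the default runner,
    check=True raises subprocess.CalledProcessError on a non-zero exit
    code, so later steps are skipped after a failure.
    """
    completed = []
    for script in scripts:
        # sys.executable keeps every step inside the same virtual environment.
        runner([sys.executable, script], check=True)
        completed.append(script)
    return completed
```

Calling `run_pipeline()` from the repository root runs all four steps. Passing a shorter list, for example `run_pipeline(["convert_parquet.py", "upload_hf.py"])`, reruns only the later steps.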