LEAQ is the scraper. Currently it has 2.1M companies with websites and 2.6M legal entities (referred to as orgs below).
The backend has an API and an option to download all 2.1M companies as a single CSV file. Backend services add and index companies 24/7, using queues and cron jobs for repetitive tasks. All workers and services are stateless, so scraping scales horizontally.
- Companies export constructor
- Add a new company by URL, with an option to confirm website ownership via a meta tag and edit your company info
- Billing with free and paid plans
- Social sign-in
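Meta-tag ownership verification typically means checking the claimed site's HTML for a tag containing a per-user token. A minimal sketch of that check in Go follows; the tag name `leaq-verification` is an assumption for illustration, not necessarily what the project uses.

```go
package main

import (
	"fmt"
	"strings"
)

// hasVerificationTag reports whether the page HTML contains the
// ownership meta tag with the expected token. The tag name
// "leaq-verification" is assumed here; the real name may differ.
func hasVerificationTag(html, token string) bool {
	tag := fmt.Sprintf(`<meta name="leaq-verification" content=%q>`, token)
	return strings.Contains(html, tag)
}

func main() {
	page := `<html><head>
<meta name="leaq-verification" content="abc123">
</head><body></body></html>`
	fmt.Println(hasVerificationTag(page, "abc123")) // true
	fmt.Println(hasVerificationTag(page, "wrong"))  // false
}
```

A production check would fetch the page over HTTP and use a real HTML parser rather than substring matching, but the flow is the same: issue a token, ask the owner to place it in a meta tag, then fetch and compare.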
First you need to run some infrastructure services:
- MongoDB;
- STAN (NATS Streaming);
- Redis;
- S3-compatible object storage: MinIO, DO Spaces, AWS S3, etc.;
- (Optionally) Kubernetes;
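For local development, the infrastructure above can be started with Docker. This is a sketch: image tags, ports, and credentials are illustrative defaults, so adjust them to your environment.

```shell
# MongoDB on the default port
docker run -d --name mongo -p 27017:27017 mongo:4

# STAN (NATS Streaming) for the job queues
docker run -d --name stan -p 4222:4222 nats-streaming:0.17

# Redis for caching
docker run -d --name redis -p 6379:6379 redis:5

# MinIO as the S3-compatible object storage;
# credentials here are placeholders, change them
docker run -d --name minio -p 9000:9000 \
  -e MINIO_ACCESS_KEY=minio -e MINIO_SECRET_KEY=minio123 \
  minio/minio server /data
```
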
Every service accepts config via environment variables. Go services have their config at config/config.go, the Wappalyzer config is at config/env.js, and the web config is at nuxt.config.js.
Discover repositories here
You can build the code from source or use the Docker images. Each service also has a K8s manifest.