scraping

Here are 6,216 public repositories matching this topic...

scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.

python crawler framework scraping crawling web-scraping hacktoberfest web-scraping-python

Updated Nov 19, 2024
Python

gocolly / colly

Elegant Scraper and Crawler Framework for Golang

go golang crawler scraper framework spider scraping crawling

Updated Jul 30, 2024
Go

AIHawk-FOSS / Auto_Jobs_Applier_AI_Agent

Auto_Jobs_Applier_AI_Agent by AIHawk is an AI agent that automates the job application process. Using artificial intelligence, it lets users apply for multiple jobs in an automated and personalized way.

Updated Nov 22, 2024
Python

mendableai / firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

markdown crawler data scraper ai html-to-markdown web-crawler scraping webscraping rag llm ai-scraping

Updated Nov 21, 2024
TypeScript

ScrapeGraphAI / Scrapegraph-ai

Python scraper based on AI

machine-learning ai scraping webscraping sc automated-scraper scraping-python gpt-3 gpt-4 llm scrapingweb llama3

Updated Nov 22, 2024
Python

apify / crawlee

Crawlee—A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.

nodejs javascript npm crawler scraper automation typescript web-crawler headless scraping crawling web-scraping web-crawling headless-chrome apify puppeteer playwright

Updated Nov 22, 2024
TypeScript

psf / requests-html

Pythonic HTML Parsing for Humans™

python html http scraping requests kennethreitz beautifulsoup lxml css-selectors pyquery

Updated Apr 16, 2024
Python

code4craft / webmagic

A scalable web crawler framework for Java.

java crawler framework scraping

Updated Oct 25, 2024
Java

ultrafunkamsterdam / undetected-chromedriver

Custom Selenium Chromedriver | Zero-Config | Passes ALL bot mitigation systems (like Distil / Imperva / DataDome / CloudFlare IUAM)

testing chrome automation webdriver browser captcha scraping selenium navigator python3 cloudflare chromedriver anti-bot bot-detection cloudflare-bypass distil anti-detection

Updated Jun 25, 2024
Python

tabulapdf / tabula

Tabula is a tool for liberating data tables trapped inside PDF files

pdf csv excel scraping tables

Updated Sep 23, 2024
CSS

lorien / awesome-web-scraping

List of libraries, tools and APIs for web scraping and data processing.

crawler spider scraping crawling web-scraping captcha-recaptcha webscraping crawling-framework scraping-framework captcha-bypass scraping-tool crawling-tool scraping-python crawling-python

Updated Oct 27, 2024
Makefile

alirezamika / autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

python crawler machine-learning scraper automation ai scraping artificial-intelligence web-scraping scrape webscraping webautomation

Updated Oct 12, 2024
Python

MontFerret / ferret

Declarative web scraping

go cli golang crawler chrome data-mining scraper library tool dsl scraping crawling query-language scraping-websites hacktoberfest cdp

Updated Nov 20, 2024
Go

yujiosaka / headless-chrome-crawler

Distributed crawler powered by Headless Chrome

jquery crawler chrome scraper promise scraping crawling chromium headless-chrome puppeteer

Updated Apr 29, 2023
JavaScript

apify / crawlee-python

Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless mode. With proxy rotation.

python crawler scraper automation web-crawler headless scraping crawling pip web-scraping beautifulsoup web-crawling hacktoberfest headless-chrome apify playwright

Updated Nov 22, 2024
Python

sparklemotion / mechanize

Mechanize is a Ruby library that makes automated web interaction easy.

ruby web scraping

Updated Oct 2, 2024
Ruby

khuyentran1401 / Data-science

Collection of useful data science topics along with articles, videos, and code

python data-science machine-learning natural-language-processing time-series scraping data-visualization artificial-intelligence data-analysis articles

Updated Oct 12, 2024
Jupyter Notebook

fake-useragent / fake-useragent

Up-to-date, simple user-agent faker with a real-world database

python agent user-agent scraping fake faker python3 user useragent user-agent-spoofer useragent-scraper

Updated Nov 20, 2024
Python

adbar / trafilatura

Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

Updated Nov 22, 2024
Python

aapatre / Automatic-Udemy-Course-Enroller-GET-PAID-UDEMY-COURSES-for-FREE

Do you want to LEARN NEW STUFF for FREE? Don't worry, with the power of web-scraping and automation, this script will find the necessary Udemy coupons & enroll you for PAID UDEMY COURSES, ABSOLUTELY FREE!

python scraper scraping selenium python3

Updated May 10, 2024
Python
