Crawlab Go SDK

Crawlab Go SDK enables Go-based spiders to integrate with Crawlab. It provides APIs for saving crawled items to a variety of data sources, including MongoDB, MySQL, Postgres, Elasticsearch, and Kafka.

Basic Usage

package main

import (
	"github.com/crawlab-team/crawlab-go-sdk"
)

func main() {
	// Build an item as a map of field names to values.
	item := make(map[string]interface{})
	item["url"] = "http://example.com"
	item["title"] = "hello world"

	// Save the item through the SDK; the returned error is ignored here for brevity.
	_ = crawlab.SaveItem(item)
}
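
SaveItem returns an error, which real spiders should check rather than discard. A minimal sketch of saving several items in a loop, using only SaveItem from the example above (the item fields and error handling are illustrative):

package main

import (
	"log"

	"github.com/crawlab-team/crawlab-go-sdk"
)

func main() {
	items := []map[string]interface{}{
		{"url": "http://example.com/1", "title": "first"},
		{"url": "http://example.com/2", "title": "second"},
	}
	for _, item := range items {
		// Save each item and fail fast on error.
		if err := crawlab.SaveItem(item); err != nil {
			log.Fatalf("failed to save item: %v", err)
		}
	}
}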

Example Using Colly
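
As the example below shows, CollyOnHTMLMany works much like Colly's own OnHTML: it registers a callback on the collector for a CSS selector, and every item in the slice returned by the callback is saved through the SDK.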

package main

import (
	"fmt"
	"github.com/crawlab-team/crawlab-go-sdk"

	"github.com/gocolly/colly/v2"
)

func main() {
	// Instantiate default collector
	c := colly.NewCollector(
		// Visit only the quotes.toscrape.com domain
		colly.AllowedDomains("quotes.toscrape.com"),
	)

	// For every <a> element with an href attribute, save an item with its text and link
	crawlab.CollyOnHTMLMany(c, "a[href]", func(e *colly.HTMLElement) []map[string]any {
		return []map[string]any{
			{
				"text": e.Text,
				"link": e.Attr("href"),
			},
		}
	})

	// Before making a request print "Visiting ..."
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL.String())
	})

	// Start scraping on https://quotes.toscrape.com; Visit's error is ignored here for brevity
	_ = c.Visit("https://quotes.toscrape.com")
}
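
Because the callback returns a slice, a single matched element can yield one or more structured items. A short sketch that extracts each quote's text and author instead of raw links (the div.quote, span.text, and small.author selectors are assumptions based on the quotes.toscrape.com markup):

package main

import (
	"github.com/crawlab-team/crawlab-go-sdk"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector(
		colly.AllowedDomains("quotes.toscrape.com"),
	)

	// Each div.quote block yields one item with the quote text and its author.
	crawlab.CollyOnHTMLMany(c, "div.quote", func(e *colly.HTMLElement) []map[string]any {
		return []map[string]any{
			{
				"text":   e.ChildText("span.text"),
				"author": e.ChildText("small.author"),
			},
		}
	})

	_ = c.Visit("https://quotes.toscrape.com")
}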