Skip to content
This repository has been archived by the owner on Aug 16, 2021. It is now read-only.
/ gann Public archive

[[ ARCHIVED ]] gann(go-approximate-nearest-neighbor) is a library for Approximate Nearest Neighbor Search written in Go

License

Notifications You must be signed in to change notification settings

mathetake/gann

Repository files navigation

gann

CircleCI MIT License

portfolio_view

gann (go-approximate-nearest-neighbor) is a library for approximate nearest neighbor search purely written in golang.

The implemented algorithm is truly inspired by Annoy (https://github.com/spotify/annoy).

feature

  1. purely written in Go: no dependencies out of Go world.
  2. easy to tune with a bit of parameters

installation

go get github.com/mathetake/gann

parameters

setup phase parameters

name type description run-time complexity space complexity accuracy
dim int dimension of target vectors the larger, the more expensive the larger, the more expensive N/A
nTree int # of trees the larger, the more expensive the larger, the more expensive the larger, the more accurate
k int maximum # of items in a single leaf the larger, the less expensive N/A the larger, the less accurate

runtime (search phase) parameters

name type description time complexity accuracy
searchNum int # of requested neighbors the larger, the more expensive N/A
bucketScale float64 affects the size of bucket the larger, the more expensive the larger, the more accurate

bucketScale affects the size of bucket which consists of items for exact distance calculation. The actual size of the bucket is calculated by int(searchNum * bucketScale).

In the search phase, we traverse index trees and continuously put items on reached leaves to the bucket until the bucket becomes full. Then we calculate the exact distances between a item in the bucket and the query vector to get approximate nearest neighbors.

Therefore, the larger bucketScale, the more computational complexity while the more accurate result to be produced.

example

package main

import (
	"fmt"
	"math/rand"
	"time"

	"github.com/mathetake/gann"
	"github.com/mathetake/gann/metric"
)

var (
	dim    = 3
	nTrees = 2
	k      = 10
	nItem  = 1000
)

func main() {
	rawItems := make([][]float64, 0, nItem)
	rand.Seed(time.Now().UnixNano())

	for i := 0; i < nItem; i++ {
		item := make([]float64, 0, dim)
		for j := 0; j < dim; j++ {
			item = append(item, rand.Float64())
		}
		rawItems = append(rawItems, item)
	}

	m, err := metric.NewCosineMetric(dim)
	if err != nil {
		// err handling
		return
	}

	// create index
	idx, err := gann.CreateNewIndex(rawItems, dim, nTrees, k, m)
	if err != nil {
		// error handling
		return
	}

	// search
	var searchNum = 5
	var bucketScale float64 = 10
	q := []float64{0.1, 0.02, 0.001}
	res, err := idx.GetANNbyVector(q, searchNum, bucketScale)
	if err != nil {
		// error handling
		return
	}

	fmt.Printf("res: %v\n", res)
}

references

License

MIT