Runar Buvik edited this page Sep 10, 2013 · 6 revisions

Stub article: This article is a work in progress. You can help Searchdaimon by expanding it with information you know.

The ES groups documents in sets of 5000 and stores each group in a type of "bucket" we call a "lot". Most command line tools work at either the lot level or the document level. Individual lots are addressed with a lot id and a collection name, and documents with a document id and a collection name.
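
As a rough illustration of the grouping, the lot that holds a given document could be derived from its document id with integer arithmetic. This is only a sketch under an assumption this page does not confirm: that lot 1 holds document ids 0 to 4999, lot 2 holds 5000 to 9999, and so on. `lot_for_doc` is a hypothetical helper, not one of the ES tools.

```shell
#!/bin/sh
# Hypothetical helper, not part of the ES tool set.
# Assumption: lot 1 holds document ids 0-4999, lot 2 holds 5000-9999, and so on.
DOCS_PER_LOT=5000

lot_for_doc() {
    # Integer division gives a zero-based index; add 1 because lots are
    # assumed to be numbered from 1.
    echo $(( $1 / DOCS_PER_LOT + 1 ))
}

lot_for_doc 481     # prints 1
lot_for_doc 5000    # prints 2
```

If the real id-to-lot mapping differs, only the arithmetic inside `lot_for_doc` needs to change.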

Tools for interacting with documents

There are two basic tools for interacting with documents: rreadbb, which prints all documents in a lot, and PageInfobb, which prints extended information about a single document.

rreadbb – Print information on all documents in a lot

The rreadbb tool prints out information on the documents in a lot by reading the document repository.

Example:
Print information about lot 1 in the Enronsmall collection.
bin/rreadbb 1 Enronsmall

You can also pass 0 as the lot number to read all the lots in a collection.

Command line arguments

Argument  Description
-h        Html. Print the content of the document.
-a        Acl. Print any access control associated with the document.
-s        Statistics. Print statistics about the lot.
-r        Reponame. Use an alternative repository name.
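
The options can be combined with the lot and collection arguments in a small wrapper script. The sketch below only prints the commands it would run, so nothing is executed against the repository. It assumes that options are placed before the lot number, which this page does not confirm; `stats_commands` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the rreadbb command that requests statistics for a range of lots.
# Assumption: options go before the positional arguments (not confirmed by this page).
RREADBB="bin/rreadbb"

stats_commands() {
    collection=$1
    first=$2
    last=$3
    lot=$first
    while [ "$lot" -le "$last" ]; do
        # -s asks rreadbb for statistics about the lot.
        echo "$RREADBB -s $lot $collection"
        lot=$((lot + 1))
    done
}

stats_commands Enronsmall 1 3
```

Once the argument order has been checked against the tool itself, the printed lines can be piped to `sh` to run them.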

PageInfobb – Print information about a single document

Example:
Print information about the document with document id 481 in the Enronsmall collection.
bin/PageInfobb 481 Enronsmall

Command line arguments

Argument  Description
-h        Html. Print the content of the document.
-s        Summary. Print the text summary if we have it.
-w        Words. Print all the words in the document.

_Other options may also be available; please see the source code for more info._

Tools for interacting with indexes

readDocumentIndexbb – Reads the document index

On the ES, documents are stored sequentially in a repository. This repository is then indexed to create an ISAM index for fast retrieval.

The readDocumentIndexbb program prints the raw ISAM index.

Example:
Print the document index for lot 1 in the Enronsmall collection.
bin/readDocumentIndexbb 1 Enronsmall

readIIndex – Prints an inverted index

readIIndex prints out all document ids and other information stored in an inverted index file.

Example:
Print all documents in bucket 1 of the Enronsmall collection.
bin/readIIndex /boithoData/lot/1/iindex/Enronsmall/Main/index/aa/1.txt
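
Because readIIndex takes a full file path rather than a lot id and collection name, a small helper can assemble the path. The sketch below is inferred from the single example path above and nothing else: it assumes the directory after `lot/` is the lot number and the file name before `.txt` is the bucket number, and it copies `Main` and `aa` verbatim because their meaning is not documented here. `iindex_path` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: rebuild the example inverted index path from its parts.
# Assumptions (inferred from the one example path only):
#   - the directory after "lot/" is the lot number
#   - the file name before ".txt" is the bucket number
#   - "Main" and "aa" are copied verbatim; their meaning is not documented here
iindex_path() {
    lot=$1
    collection=$2
    bucket=$3
    echo "/boithoData/lot/$lot/iindex/$collection/Main/index/aa/$bucket.txt"
}

# Print (do not run) the readIIndex command for the example above.
echo "bin/readIIndex $(iindex_path 1 Enronsmall 1)"
```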