Command line tools
This article is a work in progress. You can help Searchdaimon by expanding it with information you know.
The ES groups documents into "buckets" of 5000 documents each, which we call "lots". Most command line tools work at either the lot level or the document level. A lot is addressed by a lot id and a collection name; a document is addressed by a document id and a collection name.
There are two basic tools for interacting with documents: rreadbb, which prints all documents in a lot, and PageInfobb, which prints extended information about a single document.
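As a rough illustration of lot addressing, the sketch below maps a document id to a lot id. It assumes that document ids are assigned to lots in contiguous ranges of 5000 and that lot numbering starts at 1 (lot 0 is reserved by rreadbb to mean "all lots"). Neither the formula nor the names `DOCS_PER_LOT` and `lot_for_docid` are taken from the ES source, so treat the mapping as an assumption.

```python
# Hypothetical sketch: map a document id to the lot that would hold it.
# Assumptions (not confirmed by this article): lots hold contiguous
# ranges of 5000 document ids, and lot numbering starts at 1.
DOCS_PER_LOT = 5000


def lot_for_docid(docid: int) -> int:
    """Return the assumed lot id for a document id."""
    if docid < 0:
        raise ValueError("document id must be non-negative")
    return docid // DOCS_PER_LOT + 1


if __name__ == "__main__":
    for docid in (481, 4999, 5000):
        print(docid, "->", "lot", lot_for_docid(docid))
```

Under these assumptions, a document with id 481 would fall in lot 1.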
The rreadbb tool prints out information on the documents in a lot by reading the document repository.
Example:
Print information about lot 1 in the Enronsmall collection.
bin/rreadbb 1 Enronsmall
You can also pass 0 as the lot number to print all the lots in a collection.
Argument | Description |
-h | Html. Print the content of the document. |
-a | Acl. Print any access control information associated with the document. |
-s | Statistics. Print statistics about the lot. |
-r | Reponame. Use an alternative repository name. |
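When scripting against several lots, it can help to build the rreadbb command line programmatically. The sketch below only assembles the positional form shown in the example (`bin/rreadbb <lot> <collection>`). The helper name `rreadbb_argv` is made up for illustration, and the option flags are left out because this article does not say where they go on the command line.

```python
import shlex

ALL_LOTS = 0  # per the article, lot number 0 selects every lot


def rreadbb_argv(lot: int, collection: str, binary: str = "bin/rreadbb") -> list[str]:
    """Build the argument vector for `bin/rreadbb <lot> <collection>`."""
    if lot < 0:
        raise ValueError("lot number must be 0 (all lots) or positive")
    if not collection:
        raise ValueError("collection name must not be empty")
    return [binary, str(lot), collection]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works
    # on a machine without the ES installed.
    print(shlex.join(rreadbb_argv(1, "Enronsmall")))
    print(shlex.join(rreadbb_argv(ALL_LOTS, "Enronsmall")))
```

On a machine with the ES installed, the returned list could be handed to `subprocess.run(argv, check=True)` from the ES install directory.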
Example:
Print extended information about the document with document id 481 in the Enronsmall collection.
bin/PageInfobb 481 Enronsmall
Argument | Description |
-h | Html. Print the content of the document |
-s | Summary. Print the text summary if we have it |
-w | Words. Print all the words in the document |
On the ES, documents are stored sequentially in a repository. The repository is then indexed to build an ISAM index for fast retrieval.
The readDocumentIndexbb program prints the raw ISAM index.
Example:
Print the raw ISAM index for lot 1 in the Enronsmall collection.
bin/readDocumentIndexbb 1 Enronsmall
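To illustrate why an ISAM-style index over a sequential repository gives fast retrieval, here is a generic sketch. It is not the ES on-disk format: it assumes a hypothetical index made of fixed-size records, one per document id, each holding the offset and length of a document in the repository. The record layout and all names are invented for the example.

```python
import io
import struct

# Hypothetical record layout: 8-byte offset + 4-byte length, little-endian.
RECORD = struct.Struct("<QI")


def build(docs: list[bytes]) -> tuple[bytes, bytes]:
    """Store documents back to back and index them by position."""
    repo, index = io.BytesIO(), io.BytesIO()
    for body in docs:
        index.write(RECORD.pack(repo.tell(), len(body)))
        repo.write(body)
    return repo.getvalue(), index.getvalue()


def fetch(repo: bytes, index: bytes, docid: int) -> bytes:
    """Look up one document with a single computed seek into the index."""
    start = docid * RECORD.size  # fixed-size records: no scan needed
    if docid < 0 or start + RECORD.size > len(index):
        raise KeyError(docid)
    offset, length = RECORD.unpack_from(index, start)
    return repo[offset:offset + length]


if __name__ == "__main__":
    repo, index = build([b"first", b"second document", b"third"])
    print(fetch(repo, index, 1))
```

Because every index record has the same size, the position of a record follows directly from the document id, so a lookup never has to scan the repository itself.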
readIIndex prints out all document ids and other information stored in an inverted index file.
Example:
Prints all documents in bucket 1 for the Enronsmall collection.
bin/readIIndex /boithoData/lot/1/iindex/Enronsmall/Main/index/aa/1.txt
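The example path appears to encode the lot, the collection and a bucket file. The sketch below splits such a path into its components so that scripts can group index files. The field names (`lot`, `collection`, `index_type`, `subdir`, `bucket`) are guesses based on this one example path, not documented names.

```python
import re
from typing import NamedTuple, Optional

# Pattern derived from the single example path in this article; the
# meaning assigned to each component is an assumption.
_IINDEX_RE = re.compile(
    r"/lot/(?P<lot>\d+)/iindex/(?P<collection>[^/]+)/(?P<index_type>[^/]+)"
    r"/index/(?P<subdir>[^/]+)/(?P<bucket>\d+)\.txt$"
)


class IIndexPath(NamedTuple):
    lot: int
    collection: str
    index_type: str
    subdir: str
    bucket: int


def parse_iindex_path(path: str) -> Optional[IIndexPath]:
    """Split an inverted index path into its parts, or return None."""
    m = _IINDEX_RE.search(path)
    if m is None:
        return None
    return IIndexPath(int(m["lot"]), m["collection"], m["index_type"],
                      m["subdir"], int(m["bucket"]))


if __name__ == "__main__":
    print(parse_iindex_path(
        "/boithoData/lot/1/iindex/Enronsmall/Main/index/aa/1.txt"))
```

Paths that do not follow the layout of the example return `None`, so callers can skip unrelated files.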