/conf
    schema.xml
    solrconfig.xml
    stopwords.txt
/data
    index
    spellchecker
Loads plugins, sets access URLs, directories, listeners, warmup queries, etc
Contains the field type definitions (filters, tokenizers, etc)
And field definitions (name, type, indexed, stored, multiValued, etc)
And possibly other instructions, eg: unique key field, fallback field, query adjustments, etc
<field name="id" type="[field_type_name]" indexed="[boolean]" stored="[boolean]" multiValued="[boolean]" default="[value]" />
Set to true if field should be searchable, sortable, or facetable
If true, goes through 'analysis' phase
Transformation applied (lower case, remove word stems, etc), and tokens added to index
Set to true if field should be retrievable, or for using highlighting
Does not go through analysis phase
Boolean
Will throw an error if the value does not exist
Including compressThreshold, omitNorms, termVectors, omitPositions
Instead of sending data twice to different fields, let Solr do it for you
<field name="one" />
<field name="two" />
<field name="three" />
<copyField source="two" dest="three" />
If you set something to 'two', it will be stored in 'two' and 'three'
A common requirement is to copy (or merge) all input fields into one Solr field;
<copyField source="*" dest="allText"/>
Allows you to process many fields in a similar way
Most useful feature is using wildcards
<dynamicField name="*" param="value" />
Another common method is using a few dynamic fields to allow for schema-less indexing;
<dynamicField name="*_i" type="int" indexed="true" stored="true" />
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
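The wildcard rules above can be modelled roughly in Python. This is only a sketch of the idea (a field name is accepted if it matches a dynamic field pattern), not Solr's actual matching code. The document and its field names `price_i` and `colour_s` are made up for illustration.

```python
from fnmatch import fnmatchcase

# Patterns taken from the dynamicField rules above.
DYNAMIC_FIELD_PATTERNS = ["*_i", "*_s"]


def matching_pattern(field_name, patterns=DYNAMIC_FIELD_PATTERNS):
    """Return the first dynamic field pattern that matches, or None."""
    for pattern in patterns:
        if fnmatchcase(field_name, pattern):
            return pattern
    return None


# Hypothetical document: only 'id' would need an explicit <field> definition.
doc = {"id": "1", "price_i": 100, "colour_s": "red"}
for name in doc:
    print(name, "->", matching_pattern(name))
```

A name that matches no pattern (here `id`) has to be declared as a regular field.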
<fieldType name="xx" class="solr.BaseClass" defaultAttribute="defaultValue" />
<fieldType name="xx" class="solr.BaseClass">
  <analyzer>
    <tokenizer class="solr.ClassName" />
    <filter class="solr.ClassName" attribute="value" />
  </analyzer>
  <analyzer type="query">
    <!-- ... -->
  </analyzer>
</fieldType>
string
boolean
int
float
long
double
date
Format: 1995-12-31T23:59:59Z
text_general
Reasonable, generic cross-language defaults
Tokenizes, removes stop words, lowercases
Applies synonyms at query time
text_en
Removes English stop words, lowercases, applies stemming
text_en_splitting
Aggressive word-splitting and autophrase features enabled
'wi fi' will match 'WiFi' or 'Wi-Fi'
text_en_splitting_tight
Less flexible matching (use if you get false matches)
Good for SKUs (can insert dashes in wrong place and still match)
binary
location
currency
Using "Data Import Handler" (DIH)
http://wiki.apache.org/solr/DataImportHandler
Including those exported by MySQL
http://wiki.apache.org/solr/UpdateCSV
http://wiki.apache.org/solr/UpdateJSON
<field name="tags">tag1</field>
<field name="tags">tag2</field>
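A multi-valued field is sent by repeating the same field name, as in the two 'tags' lines above. The following Python sketch builds such an XML update message. The helper name `build_add_xml` and the sample document are illustrative, and it assumes 'tags' is declared with multiValued="true" in the schema.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Build an <add><doc>...</doc></add> message from a dict.

    List or tuple values are written as repeated <field> elements
    with the same name, which is how multi-valued fields are sent.
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(item)
    return ET.tostring(add, encoding="unicode")


# Hypothetical document with a multi-valued 'tags' field.
xml_payload = build_add_xml({"id": "1", "tags": ["tag1", "tag2"]})
print(xml_payload)
```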
aka Atomic Updates. Can send commands such as 'set', 'add', and 'inc'
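As a sketch of what such an update could look like in JSON, each changed field maps to an object holding one command and its value. The field names `price`, `tags` and `views` are hypothetical, and the unique key is assumed to be 'id'.

```python
import json

ALLOWED_COMMANDS = {"set", "add", "inc"}


def atomic_update(doc_id, changes):
    """Return a JSON payload that modifies only the given fields.

    'changes' maps a field name to a (command, value) pair.
    """
    doc = {"id": doc_id}
    for field, (command, value) in changes.items():
        if command not in ALLOWED_COMMANDS:
            raise ValueError("unsupported command: " + command)
        doc[field] = {command: value}
    return json.dumps([doc])


payload = atomic_update(
    "1",
    {
        "price": ("set", 99),    # replace the current value
        "tags": ("add", "tag3"), # append to a multi-valued field
        "views": ("inc", 1),     # increment a numeric field
    },
)
print(payload)
```

The payload would then be posted to the update handler. Fields that are not mentioned keep their current values.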
Use HTTP GET
Query string
To search all documents;
q=*:*
Query strings are based on Lucene syntax
at least one word +musthavethisword +andthistoo -butnotthis "exact phrase" "proximity matching"~5 (fieldname:boostme)^2
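Characters such as '+', ':' and '*' have a meaning in URLs or need escaping, so the query string should be URL-encoded before it is sent. A minimal Python sketch follows. The host, port and path in `SOLR_SELECT_URL` are assumptions about a local install, not something this document specifies.

```python
from urllib.parse import urlencode

# Assumed location of the select handler on a local Solr instance.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def select_url(query):
    """Return a GET URL for the given Lucene-syntax query string."""
    return SOLR_SELECT_URL + "?" + urlencode({"q": query})


print(select_url("*:*"))
print(select_url('+musthavethisword -butnotthis "exact phrase"'))
```

Without encoding, a literal '+' in a URL is decoded as a space, which would silently turn a required term into an optional one.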
Field list
Specify set of fields to return, limiting amount of information in the response
Space or comma separated
Examples:
id, name
id, score
*
*, score
Filter query
Some examples:
fl=name,id
Field List - only name and id fields
fl=name,id,score
Return relevancy score too
fl=*,score
All fields and relevancy score
wt=json
Return response in JSON format
qf=title^2 feature
Fields and weighting to search (space separated)
sort=price desc
Default sort is 'score desc'
Uses em HTML tags to highlight relevant snippets in returned document
This example enables for 'name' and 'features' fields;
hl=true&hl.fl=name,features
Generates counts for various properties
Allows users to 'drill down' / refine their search
Request counts for category field 'cat':
facet=true&facet.field=cat
Add a facet on the boolean 'inStock' field:
facet=true&facet.field=cat&facet.field=inStock
Can also generate counts on the fly!
This example queries for prices below 100 and above 100 using range queries
facet=true&facet.query=price:[* TO 100]&facet.query=price:[100 TO *]
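Because facet.field and facet.query are repeated, a plain dict cannot hold them. One way to build the query string in Python is a list of pairs, as sketched here using the parameters from the examples above. The `q=*:*` part is added only to make the request complete.

```python
from urllib.parse import urlencode

# A list of pairs keeps repeated parameter names, which a dict would drop.
params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.field", "cat"),
    ("facet.field", "inStock"),
    ("facet.query", "price:[* TO 100]"),
    ("facet.query", "price:[100 TO *]"),
]

query_string = urlencode(params)
print(query_string)
```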
Collapses a group of results with the same field value down to a single entry (or a fixed number of entries)
Eg: Google does this per site, so only a couple of entries per site are shown
Groups documents with a common field value into groups, and returns top documents per group
Eg: search for DVDs, show top results for TV, Movies, Documentaries, etc
Can be done a few ways - boost when indexing, or boost when querying
Store something such as important:true, and boost that match
Can read value (eg: property, or how recently updated the document is) and apply a boosting value
The default query parser is the Lucene query parser (aka 'standard')
Also includes DisMax, and Extended DisMax (aka 'eDisMax')
DisMax = similar to Google, rarely displays syntax errors to users
eDisMax = improves DisMax, handles full Lucene syntax, and more (@todo - like what?)
- sort
- start
- rows
- fq = filter query
Enables users to specify very precise queries, however, it's less tolerant of syntax errors
Contains lots of Lucene stuff - fuzzy, proximity, boosting, phrasing, etc
https://cwiki.apache.org/confluence/display/solr/The+Standard+Query+Parser
- q = query (mandatory)
- df = default field to query (usually defined in schema.xml file)
Similar to Google. Rarely displays syntax error messages.
Implements an extremely simplified subset of Lucene QueryParser syntax (quotes, +, -)
Can also do boolean clauses and boosts in query
Solr administrator can provide additional boosting queries, functions, and filtering (solrconfig.xml)
These can be overridden in query URL
- q = query
- qf = Query Fields. Which fields to perform query on. If absent, defaults to 'df'. Can contain boosting stuff.
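A DisMax request might be assembled as in this Python sketch. It assumes the parser is selected per request with the `defType` parameter and that qf entries are separated by spaces. The field names `title` and `feature` and the search text are examples only.

```python
from urllib.parse import urlencode

params = {
    "defType": "dismax",      # select the DisMax parser for this request
    "q": "wi fi router",      # plain user input, no field names needed
    "qf": "title^2 feature",  # search both fields, title counts double
}

dismax_query_string = urlencode(params)
print(dismax_query_string)
```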
Supports full Lucene syntax
Seems it's always best to choose this over DisMax
Set variables within query param
q={!key1=value key2=value}regular query
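The prefix can be generated with a small helper, sketched here in Python. The function name is made up, and the sketch does not quote values, so it only suits values without spaces or '}' characters.

```python
def with_local_params(query, local_params):
    """Prefix a query with {!key=value ...} local params."""
    if not local_params:
        return query
    pairs = " ".join(
        "{}={}".format(key, value) for key, value in local_params.items()
    )
    return "{!" + pairs + "}" + query


print(with_local_params("solr rocks", {"q.op": "AND", "df": "title"}))
# prints: {!q.op=AND df=title}solr rocks
```

The result is the value of the q parameter and still needs URL-encoding before it is sent.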
- facet = if set to 'true', enables faceting
- facet.query = lucene query to generate facet count
More here: https://cwiki.apache.org/confluence/display/solr/Faceting
Handles requests coming to Solr. Perhaps query requests, or index update requests.
<requestHandler name="/select" class="solr.SearchHandler">
Contains a bunch of 'lists';
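- defaults = parameter values used when the request does not supply them
- appends = parameter values added to every request, in addition to whatever the user sends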