This is a Kirby plugin which enables incrementally sending new or updated content to Algolia for indexing. Supports batch indexing all content through a CLI script.
The default indexing method is inspired by the rationale behind DocSearch, featured in this blog post. It is referred to as fragment indexing in this readme and in the code.
The basic idea behind fragment indexing is that when people search for a phrase, they expect results where the searched terms appear relatively close to each other. For instance, when looking for red door, I am more likely to be interested in matches where these two words are in the same sentence (e.g. "this door, which I see across the room, has a red handle") than in matches where the words red and door appear in different paragraphs.
Algolia already exposes this proximity metric as one of the criteria in its tie-breaking ranking algorithm. With the right configuration options, matches whose terms are closest to each other are ranked higher. Fragment indexing goes a step further by ensuring that matches can only be found within the same fragment. In this case, a fragment consists of a heading and the text immediately following it. The fragment stops at the next heading (or the end of the file).
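The fragmentation rule above can be sketched in a few lines of PHP. This is a minimal illustration, not the plugin's actual parser: the function name `fragment_markdown` and the `heading`/`content` array shape are assumptions made for this example, and only `#`-style headings are handled.

```php
<?php
// Hypothetical helper (not the plugin's real implementation).
// Splits Markdown into fragments: one heading plus the text that follows it.
function fragment_markdown(string $markdown): array
{
    $fragments = [];
    $current = null; // stays null until the first heading is seen

    foreach (preg_split('/\R/', $markdown) as $line) {
        // A heading must start on the first character of the line.
        if (preg_match('/^(#{1,6})\s+(.*\S)\s*$/', $line, $m)) {
            if ($current !== null) {
                $fragments[] = $current; // the previous fragment ends here
            }
            $current = ['heading' => $m[2], 'content' => ''];
        } elseif ($current !== null) {
            $current['content'] .= $line . "\n";
        }
        // Lines that appear before the first heading are dropped.
    }
    if ($current !== null) {
        $fragments[] = $current; // the last fragment ends with the file
    }

    // Trim the accumulated text of each fragment.
    return array_map(function (array $fragment): array {
        $fragment['content'] = trim($fragment['content']);
        return $fragment;
    }, $fragments);
}

// Usage: two headings produce two fragments; "intro" is ignored.
$text = "intro\n# Doors\nThe door has a red handle.\n## Windows\nThe window is blue.";
print_r(fragment_markdown($text));
```

With fragments like these, a search for red door can only match when both words occur under the same heading.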
- Go through the setup.
- Create a piece of content in Kirby's panel and click 'Save'.
- Check your Algolia dashboard to see your indexed content.
Run php path/to/batch-index.php from either the site root or the plugin folder. It uses the same configuration options as incremental indexing.
$ git clone https://github.com/mlbrgl/kirby-algolia [KIRBY_ROOT]/site/plugins
where KIRBY_ROOT is the folder where your Kirby site lives.
Then, run composer install.
In [KIRBY_ROOT]/site/config/config.php, add the configuration options:
// Example configuration options in config.php
$config = [
"mlbrgl.kirby-algolia" => [
"algolia" => [
"application_id" => "[ALGOLIA_APP_ID]", // required
"index" => "[ALGOLIA_INDEX_NAME]", // required
"api_key" => "[ALGOLIA_API_KEY]" // required, needs to have write access to the ALGOLIA_INDEX_NAME
],
"fields" => [
"article" => [ // example, at least one blueprint required
"meta" => ["title", "datetime", "author"], // example, optional. See description below.
"boost" => ["teaser"], // example, optional. See description below.
"main" => ["text"], // example, optional. See description below.
],
... // other blueprints, optional
],
"active" => true, // false to parse without sending to Algolia
],
... // other config
];
Each blueprint entry maps the keys below to arrays of field IDs (defined in the site blueprints):
- meta: if the whole content were indexed as a single record, each field would likely be an attribute on Algolia's side. However, in a fragmented context, we need to decide which fields are content and which fields are metadata attached to that content. This is the purpose of the meta array. As a starting point, title and author can be used here.
  - Expects raw text (no Markdown)
- boost: allows for a very basic priority system between fields. Typically used for teasers or semantically rich fields. The content in this field should be short and should not contain headings, as it will not be fragmented.
  - Expects raw text (no Markdown)
- main: this is where your main content lives. Fragmentation happens whenever a new heading is found.
  - Expects Markdown or Kirby's Markdown flavor

- Date fields are converted to timestamps (the only format supported by Algolia for sorting) only if the field is called date or datetime and is listed in the meta fields.
- If the content of a main field does not start with a heading, all content up to the first heading will be ignored.
- Headings are only recognized as such when they start on the first character of the line (no leading spaces).
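To make the roles of meta, boost and main more concrete, here is a rough sketch of how records for one page could be assembled. Everything in it is an assumption for illustration: the function name `build_records`, the `importance` attribute and the record layout are hypothetical and may differ from what the plugin actually sends to Algolia. Fragments are assumed to be arrays with `heading` and `content` keys.

```php
<?php
// Hypothetical sketch: record shape and names are assumptions,
// not the plugin's actual output format.
function build_records(array $page, array $fields, array $fragments): array
{
    // Meta fields are attached to every record of the page.
    $shared = [];
    foreach ($fields['meta'] ?? [] as $name) {
        if (!isset($page[$name])) {
            continue;
        }
        $value = $page[$name];
        // Only meta fields named date or datetime become timestamps.
        if ($name === 'date' || $name === 'datetime') {
            $timestamp = strtotime($value);
            $value = $timestamp === false ? $value : $timestamp;
        }
        $shared[$name] = $value;
    }

    $records = [];
    // Boost fields are not fragmented (assumed here: one record per field).
    foreach ($fields['boost'] ?? [] as $name) {
        if (isset($page[$name])) {
            $records[] = $shared + ['content' => $page[$name], 'importance' => 'boost'];
        }
    }
    // Each fragment of a main field becomes its own record.
    foreach ($fragments as $fragment) {
        $records[] = $shared + [
            'heading' => $fragment['heading'],
            'content' => $fragment['content'],
            'importance' => 'main',
        ];
    }
    return $records;
}

// Usage with made-up page data.
$page = ['title' => 'Doors', 'datetime' => '2017-03-01 10:00', 'teaser' => 'All about doors.'];
$fields = ['meta' => ['title', 'datetime'], 'boost' => ['teaser'], 'main' => ['text']];
$fragments = [['heading' => 'Handles', 'content' => 'The door has a red handle.']];
print_r(build_records($page, $fields, $fragments));
```

Repeating the meta attributes on every record is what lets a single fragment be displayed, filtered or sorted on its own.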
Run tests with composer test.
Tests inherit the configuration from the parent site. This can be overridden in bootstrap.php.
Parsed 872 pages (8275 fragments) in 4.1703071594238 s.
Memory usage: 21.64 MB
- Kirby 3: v3.x.x releases
- Kirby 2: v1.x.x releases
This work is unlicensed under the terms of https://unlicense.org/.