Skip to content

Latest commit

 

History

History
15 lines (10 loc) · 966 Bytes

README.md

File metadata and controls

15 lines (10 loc) · 966 Bytes

sphinxsearch-wordforms-pl

Polish wordforms file (dictionary) for use with Sphinx Search, as it lacks official polish stemmer. Provided text file has already been stripped of duplicate wordforms.

Usage

Link to the dictionary file in an index definition in your Sphinx config (typically /etc/sphinxsearch/sphinx.conf).

wordforms = /path/to/pl_PL.UTF-8.txt

Apart from applying wordforms file into your Sphinx config, you must make sure that polish letters are considered valid keywords part.

Polish charset_table from official Sphinx wiki:

charset_table = 0..9, A..Z->a..z, a..z, U+0143->U+0144, U+0104->U+0105, U+0106->U+0107, U+0118->U+0119, U+0141->U+0142, U+00D3->U+00F3, U+015A->U+015B, U+0179->U+017A, U+017B->U+017C, U+0105, U+0107, U+0119, U+0142, U+00F3, U+015B, U+017A, U+017C, U+0144