Add filtering, normalisation and variants for postcodes #2756

Closed
wants to merge 25 commits into from

Conversation

lonvia
Member

@lonvia lonvia commented Jun 21, 2022

This adds a special sanitizer and tokenizer for postcodes. The sanitizer filters the postcodes from the OSM tags so that only those conforming to the official postcode format of the country pass. The tokenizer creates variants for postcodes that contain optional spaces.
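The two steps described above can be illustrated with a minimal sketch. It is not the code from this PR: the function names, the `POSTCODE_PATTERNS` table and the regular expressions are assumptions made for illustration, and the real per-country formats come from the configuration file.

```python
import re

# Hypothetical per-country patterns, for illustration only.
# The actual formats are configured per country and may differ.
POSTCODE_PATTERNS = {
    "de": re.compile(r"\d{5}"),
    "gb": re.compile(r"([A-Z]{1,2}\d[A-Z\d]?) ?(\d[A-Z]{2})"),
}


def filter_postcode(country, raw):
    """Return the normalised postcode, or None if it does not fit the format."""
    pattern = POSTCODE_PATTERNS.get(country)
    if pattern is None:
        return None
    match = pattern.fullmatch(raw.strip().upper())
    if match is None:
        return None
    groups = match.groups()
    # Patterns with groups are normalised to "part1 part2";
    # patterns without groups are returned as matched.
    return " ".join(groups) if groups else match.group(0)


def postcode_variants(normalised):
    """Return the spellings under which a normalised postcode should be found."""
    variants = {normalised}
    if " " in normalised:
        # The space is optional, so also index the spelling without it.
        variants.add(normalised.replace(" ", ""))
    return sorted(variants)
```

With these assumed patterns, `filter_postcode("gb", "ec1a1bb")` yields the normalised form with a space, and `postcode_variants` on that result returns both the spaced and the unspaced spelling.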

Per-country postcode formats can be configured in settings/country_settings.yaml. There is a new section in the customization documentation that explains the details.

The main effect of this change is that only postcodes in the proper format end up in the location_postcode table. Postcodes with typos and other bad postcode entries no longer propagate to other objects. Nominatim also gives special treatment to words recognized as postcodes in a search request. This now works much better because a word is recognized as a postcode based on 'whatever can be found in the location_postcode table'.

Still to do are handling of hierarchical postcodes (#1011) and recognition of unknown postcodes during search (#1452).

Fixes #927. Fixes #1207.

lonvia added 25 commits June 20, 2022 22:47
The postcodes will only be removed as a 'computed postcode'; they
are still searchable for the given object.
Adds patterns for countries that have simple numeric-only postcodes.
Moves postcodes that are either in countries without a postcode
system or don't correspond to the local pattern for postcodes into
a field for a normal address part. Makes them searchable but not as
a special address. This has two consequences: they are no longer a
skippable part of the address and the postcodes cannot be searched
on their own.
If the country code is not part of the mandatory output, the
country code filter will do the correct handling.
Now includes all postcodes that have optional parts.
Optional groups are not implemented yet.
Also documents the changes to the SQL functions of the tokenizer.
Includes smaller code fixes found by the tests.
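One of the commit notes above describes moving postcodes that do not match the local pattern, or that belong to a country without a postcode system, into a normal address part. A rough sketch of that idea follows. The `AddressPart` type, the `PATTERNS` table and the `unofficial_postcode` kind name are assumptions for illustration, not taken from this PR.

```python
import re
from dataclasses import dataclass


@dataclass
class AddressPart:
    kind: str
    name: str


# Hypothetical pattern table; countries without an entry are treated
# as having no postcode system.
PATTERNS = {"de": re.compile(r"\d{5}")}


def sanitize_address(country, parts):
    """Re-tag postcodes that do not fit the country's format."""
    result = []
    for part in parts:
        if part.kind == "postcode":
            pattern = PATTERNS.get(country)
            if pattern is None or not pattern.fullmatch(part.name.strip()):
                # Keep the value searchable as an ordinary address part,
                # but no longer treat it as a postcode.
                part = AddressPart("unofficial_postcode", part.name)
        result.append(part)
    return result
```

In this sketch a re-tagged value stays attached to the object, which matches the stated consequences: it remains searchable, but it is no longer a skippable part of the address and cannot be searched for on its own as a postcode.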
@lonvia lonvia closed this Jun 21, 2022
Successfully merging this pull request may close these issues.

Check postcodes for format correctness
Postcode search requires a space to return results