hyphenated housenumber parsing #204
Comments
we already have some configurable values to control how we handle hyphens: |
@missinglink Putting my "UK centric addressing format" hat on for a second ... When running
... both of which make sense for the UK. Apartment/flat/unit numbers are (almost) always expressed as Apt 1 or Flat 1 followed by the rest of the address, so Also, given that a significant number of UK building number allocations follow odd numbers on the one side of the road and even numbers on the other, then Looking at the constants at https://github.com/pelias/interpolation/blob/master/lib/analyze.js#L1-L8 ... I'd welcome some suggestions on what magic values to drop in here and tweak to make the |
Hmm yeah so we can totally add country-specific logic, some potential issues adding that:
OSM has the concept of interpolation ranges; these are much more reliable and already supported out-of-the-box, as are TIGER ranges. You should also consider just doing nothing, which I know sounds like an anti-solution but let me explain 😄 Interpolation ranges are only valuable when they are valid: if one or more erroneous members are introduced into a range, it can screw up most of the street. With fewer points we only lose out on precision, so a valid sparse index is probably preferable to a dense range with errors, if that makes sense? Maybe you could send me an example of a street which you'd like to improve? |
Ugh, the address coverage in the UK is just so bad. Whatever happened to the OpenAddressesUK project and the rumours of Ordnance Survey opening some block range data up? There's an interactive demo where you can click streets to see the coverage, which just proves how sparse the coverage is in the UK, even in London:
Maybe I missed a bunch of data in the last import? |
Here's a good example, which happens to be my local supermarket ... AFAIK OpenAddresses UK almost got there, but then died due to claims of legal rights ov There is a whole new load of OS open data coming this month as a result of the UK Geospatial Commission shaking things up, and I'm waiting eagerly to see just what gets released and whether I can a) use this in my Pelias instance and then (of course) b) contribute this back to Pelias. But right now ... I'm waiting |
END OF RANT. To answer your question: the easiest thing to do is split the data yourself, so a single row in your file becomes two rows, one for the beginning number and one for the end. That's it; there is no added value in generating all the rest of the values within the range, since they can be interpolated. |
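A minimal sketch of that preprocessing step, assuming you already know that hyphens in your data denote ranges (as in the UK case above). The row shape (`number`, `street`) and the function name `splitRangeRow` are hypothetical, not part of Pelias; adapt them to whatever your source file looks like.

```javascript
// Matches simple numeric ranges such as "4-6" or "10 - 14".
const RANGE = /^(\d+)\s*-\s*(\d+)$/;

// Turn one row with a hyphenated range into two rows: range start and range end.
// Rows without a range are passed through unchanged.
function splitRangeRow(row) {
  const match = RANGE.exec(String(row.number).trim());
  if (!match) return [row];

  const [, start, end] = match;
  if (start === end) return [{ ...row, number: start }];

  // Both rows inherit every other field (including any coordinates) from the original.
  return [
    { ...row, number: start },
    { ...row, number: end },
  ];
}

// Hypothetical input rows.
const rows = [
  { number: '4-6', street: 'Example Road' },
  { number: '12', street: 'Example Road' },
];

const split = rows.flatMap(splitRangeRow);
console.log(split.map((r) => r.number));
```

Note that in this sketch both output rows share the original row's coordinates, so the split only adds the end numbers to the series; it does not add positional precision.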
That makes a lot of sense and I'll give that a try. Also, I appreciated the rant about open addressing data in the UK. I feel that way ... a lot |
Amazing, I've been waiting 6 years for this day, if/when that happens we should jump on a call. |
@missinglink Hmm ... three new open data sets are now up on the new OS Data Hub: Open TOIDs, Open UPRNs and Open USRNs ... sadly I'm underwhelmed at first glance. Not what I'd hoped for. It's just identifiers which are linkages into proprietary data sets such as AddressBase and MasterMap ... https://osdatahub.os.uk/downloads/open |
👑 📧 is the 😈 |
What I would love to have (at minimum) is 4 house numbers per street: just the start-left, start-right, end-left & end-right house numbers. From this we can figure out quite a lot, and if those 4 points also had postal code info then this would make a huge difference. What I'm describing is the TIGER file I'm using for the USA; to some degree we could delete all of OSM and OA for the USA and it wouldn't be too bad. |
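To illustrate why so few points go a long way, here is a rough sketch of estimating a position from just the first and last house number on one side of a street. Everything here is an assumption for illustration: the `side` object shape, the function name, the straight-line interpolation (a real implementation would follow the street geometry), and the rule that numbers on one side step by 2.

```javascript
// Estimate the position of `target` on one side of a street, given only the
// house numbers and coordinates at the two ends of that side.
function interpolateHouseNumber(side, target) {
  const { startNumber, endNumber, startPoint, endPoint } = side;

  // Reject numbers outside the known range.
  const lo = Math.min(startNumber, endNumber);
  const hi = Math.max(startNumber, endNumber);
  if (target < lo || target > hi) return null;

  // Assumed odd/even scheme: the target must share parity with this side.
  if ((target - startNumber) % 2 !== 0) return null;

  if (startNumber === endNumber) return { ...startPoint };

  // Fraction of the way along the side, then a straight-line blend of the endpoints.
  const t = (target - startNumber) / (endNumber - startNumber);
  return {
    lat: startPoint.lat + t * (endPoint.lat - startPoint.lat),
    lon: startPoint.lon + t * (endPoint.lon - startPoint.lon),
  };
}

// Hypothetical odd-numbered side running from number 1 to number 9.
const oddSide = {
  startNumber: 1,
  endNumber: 9,
  startPoint: { lat: 0, lon: 0 },
  endPoint: { lat: 2, lon: 8 },
};

console.log(interpolateHouseNumber(oddSide, 5));
```

The same function would be called once for the left side and once for the right, which is why four numbers per street are enough to get started.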
Hmmm ... with some custom preprocessing and tooling you might be able to cobble that together from (off the top of my head) ONS PD, OS OpenNames, CodePoint Open and OpenRoads and WOF. Maybe. Plus some interpolation into OSM. Though that may veer into (OSM ODbL) derived data set licensing horrendousness. |
But that would only be for England and Wales. Maybe for Scotland too. But definitely not for Northern Ireland. Because history and politics. |
Hmm... I had a quick look at those data sources today and I couldn't find anything more granular than a street 😢 |
@missinglink There's a conversation about UK open data, probably mainly about admin polygons, going on over on Gitter which would be good to get your take on when you have a moment |
We have a conservative setting for parsing hyphenated house numbers.
i.e. is 4-6 a 'number range' or a 'house number and apartment number'?
In some countries, such as Canada, the postal authority recommends separating the house number and apartment number with a hyphen.
https://en.wikipedia.org/wiki/Address#Canada
As we cannot reliably determine the postal addressing format, we discard the address rather than potentially corrupting the number series with an incorrect value.
We have tests for this behaviour here: https://github.com/pelias/interpolation/blob/master/test/lib/analyze.js#L41
It would probably be better to assume these numbers are ranges and then try to detect countries where hyphens are used to delimit apartment numbers (possibly via a bbox check) and then only apply the conservative logic for these countries.
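A rough sketch of what that proposal could look like. This is not the existing code in lib/analyze.js; the function names, the result shape and the bounding box are all hypothetical. The Canada box below is a crude approximation that also covers parts of the northern USA, so a real check would need proper country polygons or a better lookup.

```javascript
// Hypothetical list of regions where a hyphen separates unit and house number.
// Coordinates are rough, illustrative values only.
const UNIT_HYPHEN_BBOXES = [
  { name: 'Canada (rough)', minLon: -141.0, minLat: 41.7, maxLon: -52.6, maxLat: 83.1 },
];

const HYPHENATED = /^(\d+)-(\d+)$/;

function inBbox(point, box) {
  return (
    point.lon >= box.minLon && point.lon <= box.maxLon &&
    point.lat >= box.minLat && point.lat <= box.maxLat
  );
}

// Classify a house number string given the location of the address.
function classifyHousenumber(housenumber, point) {
  const match = HYPHENATED.exec(String(housenumber).trim());
  if (!match) return { kind: 'not_hyphenated' };

  // Conservative logic only where hyphens may delimit apartment numbers:
  // the value is ambiguous, so drop it rather than corrupt the number series.
  if (UNIT_HYPHEN_BBOXES.some((box) => inBbox(point, box))) {
    return { kind: 'discard' };
  }

  // Everywhere else, assume the hyphen denotes a range.
  return { kind: 'range', start: Number(match[1]), end: Number(match[2]) };
}

console.log(classifyHousenumber('4-6', { lat: 51.5, lon: -0.12 })); // a point in London
console.log(classifyHousenumber('4-6', { lat: 45.4, lon: -75.7 })); // a point in Ottawa
```

Keeping the discard path as the fallback inside the listed regions preserves the current conservative behaviour where it matters, while letting ranges through elsewhere.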