"#number" after street name does not parse as unit number, but as street name. #671

Open
arya6000 opened this issue Sep 21, 2024 · 4 comments

@arya6000

Hi!

I was checking out libpostal, and saw something that could be improved.


My country is

US


Here's how I'm using libpostal

Parsing a list of addresses in my city to store in a normalized relational database.


Here's what I did

Parsed the following address "1141 Kendall Town Blvd #3202, Jacksonville, FL 32225"


Here's what I got

house_number: 1141
road: kendall town blvd #3202
city: jacksonville
state: fl
postcode: 32225
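
For anyone trying to reproduce this: the report does not say which binding was used, so the following is only a sketch assuming the Python bindings (pypostal), where `parse_address` returns a list of `(value, label)` pairs. The helper names `format_components` and `parse_and_format` are made up for illustration.

```python
def format_components(parsed):
    """Render (value, label) pairs one per line as 'label: value'."""
    return "\n".join(f"{label}: {value}" for value, label in parsed)


def parse_and_format(address):
    """Parse an address with libpostal and format the labeled components.

    Assumes libpostal and the pypostal bindings are installed; the import
    is kept local so the formatting helper works without them.
    """
    from postal.parser import parse_address

    return format_components(parse_address(address))


if __name__ == "__main__":
    print(parse_and_format("1141 Kendall Town Blvd #3202, Jacksonville, FL 32225"))
```

The exact labels printed depend on the data model that is installed.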


Here's what I was expecting

house_number: 1141
unit: #3202
road: kendall town blvd
city: jacksonville
state: fl
postcode: 32225


  • Does the input address exist in OpenStreetMap?
    No

  • Do all the toponyms exist in OSM (city, state, region names, etc.)?
    City and state are in OSM

  • If the address uses a rare/uncommon format, does changing the order of the fields yield the correct result?
    "1141 #3202 Kendall Town Blvd, Jacksonville, FL 32225" results in the following format

house_number: 1141 #3202
road: kendall town blvd
city: jacksonville
state: fl
postcode: 32225

But "#3202" should be listed under "unit", not under house number. However, "1141 apt 3202 Kendall Town Blvd, Jacksonville, FL 32225" produces the correct output:

house_number: 1141
unit: apt 3202
road: kendall town blvd
city: jacksonville
state: fl
postcode: 32225

  • If the address contains apartment/floor/sub-building information or uncommon formatting, does removing that help? Is there any minimum form of the address that gets the right parse?

Yes, the following results in correct output

"1141 apt 3202 Kendall Town Blvd, Jacksonville, FL 32225"


Here's what I think could be improved

If "#" followed by numbers appears before the city, it should be treated as a unit number.
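
Until the parser handles this itself, the rule can be approximated as a post-processing step on the parser output. This is a rough sketch, not libpostal's behavior: it assumes the components arrive as a list of `(value, label)` pairs, and `split_hash_unit` is a hypothetical helper name.

```python
import re

# A road value that ends in "#<digits>", e.g. "kendall town blvd #3202".
_TRAILING_UNIT = re.compile(r"^(?P<road>.*\S)\s+(?P<unit>#\s*\d+)$")


def split_hash_unit(components):
    """Move a trailing '#123' out of the road into its own 'unit' component.

    `components` is a list of (value, label) pairs in parser output order.
    If a unit is already present, the components are returned unchanged.
    """
    if any(label == "unit" for _, label in components):
        return list(components)

    fixed = []
    for value, label in components:
        match = _TRAILING_UNIT.match(value) if label == "road" else None
        if match:
            # Emit the unit before the road, matching the expected output above.
            fixed.append((match.group("unit"), "unit"))
            fixed.append((match.group("road"), "road"))
        else:
            fixed.append((value, label))
    return fixed
```

This only covers the case where "#3202" ends up at the end of the road; the "1141 #3202" house_number case from the reordered address would need a similar rule.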

@brianmacy

brianmacy commented Sep 21, 2024 via email

@arya6000
Author

arya6000 commented Sep 21, 2024

> Did you try the Senzing provided model?

I was not aware of Senzing. Are you referring to https://github.com/Senzing/libpostal-data?

@brianmacy

brianmacy commented Sep 21, 2024 via email

@arya6000
Author

> Yes. If you search the libpostal docs for alternative data models you should see how to enable it.

I just tried the Senzing model and it solved the issue. Thanks.
