This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
What data to add next? #6
Comments
It might be useful to provide the country name in the native language of the country itself (e.g. |
The CLDR database of the Unicode project contains country-to-language data, including the percentage of speakers: http://www.unicode.org/cldr/charts/latest/supplemental/territory_language_information.html |
The native name of Germany is already in 'alt-spellings'. I recognize that the name 'alt-spellings' isn't good since it contains alternative spellings and the native name of the country. So there are two solutions here:
Initially, I created this dataset with a country selector in mind [1] but it would make more sense to be able to get the native names separately. So I would choose the second option. But the second option raises the question of how to write the native name of the country. German uses Latin characters so it's easy to know that it's Germany, but what about Armenia, for example, which is written Հայաստան in Armenian [2]? For some people it might be difficult to know that it's Armenia. What do you think? I know that alternative spellings and native names are missing for many countries; I'm currently working on adding them. Also, I'll add the native/official language(s) of each country. [1] https://github.com/JamieAppleseed/selectToAutocomplete |
Not all people speak English, so they might be confused while selecting their locale. It might be useful to be able to see the English and native versions of the country name side by side in the selector. I would recommend providing both versions for different use cases. |
Right, it's valid for non-English speakers. If you want, feel free to start working on adding the native names as I'll be off for a few days. |
I think it would be great to have a way to make Countries Hierarchical and have meta data describing whether they are countries or sovereign states. For the UK currently it says "alt-spellings":"GB,Great Britain,England,UK,Wales,Scotland,Northern Ireland". The full name of the UK is "The United Kingdom of Great Britain and Northern Ireland". It is not a country, it is a sovereign state. Great Britain also isn't a country, it's an island. There are three countries in Great Britain: England, Scotland and Wales. So the types I think needed are: Country, State, Sovereign State and potentially Nation and Union as well. Then it would be good to have a way to specify that England is within the UK and if you also have unions that it is within the EU. Another nice feature would be to list what land borders a country has. So you could specify that England borders Scotland and Wales for example. |
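The hierarchy described in this comment could be modelled with a type, a parent reference and a list of land borders per entry. The following is only a minimal sketch: the keys (`UK`, `ENG`, `SCT`, `WLS`), the field names and the helper functions are hypothetical and are not part of the dataset, and the sample is an illustrative subset.

```javascript
// Hypothetical shape: each entry has a type, an optional parent and land borders.
// Keys and field names are made up for illustration; this is a subset only.
const entities = {
  UK:  { name: "United Kingdom", type: "sovereign state", parent: null, borders: [] },
  ENG: { name: "England",  type: "country", parent: "UK", borders: ["SCT", "WLS"] },
  SCT: { name: "Scotland", type: "country", parent: "UK", borders: ["ENG"] },
  WLS: { name: "Wales",    type: "country", parent: "UK", borders: ["ENG"] },
};

// Walk up the parent chain, e.g. England -> United Kingdom.
function ancestors(code) {
  const chain = [];
  let current = entities[code];
  while (current && current.parent) {
    chain.push(current.parent);
    current = entities[current.parent];
  }
  return chain;
}

// List the keys of all entries whose direct parent is the given entry.
function children(code) {
  return Object.keys(entities).filter((key) => entities[key].parent === code);
}
```

A union such as the EU could be expressed the same way, by giving the sovereign state a parent entry of type "union".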
From https://github.com/ProGNOMmers
It would be wonderful if it were possible to retrieve regions, provinces and cities. Something like:
// Regions of country
// /rest/alpha2/it/regions ->
{ regions: [ "Abruzzi e Molise",
"Basilicata",
"Calabria",
"Campania",
"Emilia-Romagna",
"Friuli-Venezia Giulia",
"Lazio",
"Liguria",
"Lombardia",
"Marche",
"Piemonte",
"Puglia",
"Sardegna",
"Sicilia",
"Toscana",
"Trentino-Alto Adige",
"Umbria",
"Valle d'Aosta",
"Veneto" ] }
// Provinces of region
// /rest/alpha2/it/regions/Veneto/provinces ->
{ provinces: [ "Verona", "Venezia", ... ] }
// Cities of province
// /rest/alpha2/it/regions/Veneto/provinces/Venezia/cities ->
{ cities: [ { name: "Venezia", zip_codes: [ "30121", ... , "30176" ] },
{ name: "Chioggia", zip_codes: [ "30015" ] },
{ name: "San Donà di Piave", zip_codes: [ "30027" ] },
... ] }
// Cities of country by name
// /rest/alpha2/it/regions/Veneto/provinces/Venezia/cities ->
{ cities: [ { name: "Venezia", zip_codes: [ "30121", ... , "30176" ] },
{ name: "Chioggia", zip_codes: [ "30015" ] },
{ name: "San Donà di Piave", zip_codes: [ "30027" ] },
... ] }
Cities could have metadata such as zip codes, which are very useful. This is a huge amount of work, because recording and maintaining the whole list of regions, provinces and cities for every country in the world is hard, but it is a good goal for an open source project. |
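The paths proposed in this comment map naturally onto a nested object, so a lookup can simply walk the path segments. This is a rough sketch under assumptions: the nested layout, the `resolve` and `listing` helpers and the trimmed sample data are all hypothetical, and no such REST API exists in the project.

```javascript
// Hypothetical nested data mirroring the regions/.../provinces/.../cities paths.
// Only two cities from the example are included; the rest is omitted.
const italy = {
  regions: {
    Veneto: {
      provinces: {
        Venezia: {
          cities: [
            { name: "Chioggia", zip_codes: ["30015"] },
            { name: "San Donà di Piave", zip_codes: ["30027"] },
          ],
        },
      },
    },
  },
};

// Follow each path segment into the nested object; undefined if any is missing.
function resolve(data, path) {
  let node = data;
  for (const segment of path.split("/").filter(Boolean)) {
    if (node === null || typeof node !== "object" || !(segment in node)) {
      return undefined;
    }
    node = node[segment];
  }
  return node;
}

// Build a response like { regions: [...] }: names for objects, items for arrays.
function listing(data, path) {
  const node = resolve(data, path);
  if (node === undefined) return undefined;
  const key = path.split("/").filter(Boolean).pop();
  return { [key]: Array.isArray(node) ? node : Object.keys(node) };
}
```

For example, `listing(italy, "regions/Veneto/provinces")` yields an object whose `provinces` key lists the province names found in the sample data.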
I agree, I'll add this to the todo. I know that many entries in the dataset are not actual countries. I wanted to provide simple and factual data about world countries but I understand that more accuracy is needed. |
Yes it is a huge work. First I want to continue to add more data at the country level (native and official names, official language, etc.) and add the master file as soon as possible (#12) to ease the contributions. Thank you for your help/feedback, I appreciate it! |
add country official language(s) in English
add alt spellings: official country name in English and in its official language(s)
add region and subregion for Bonaire, Sint Maarten and South Sudan
add capital for British Indian Ocean Territory, Micronesia, Réunion, South Georgia, Virgin Islands (British) and Virgin Islands (U.S.)
add currency for Palestinian Territory
rename Brunei Darussalam to Brunei
rename Falkland Islands (Malvinas) to Falkland Islands
rename French Southern Territories to French Southern and Antarctic Lands
rename Myanmar (Burma) to Myanmar (added Burma in alt-spellings)
rename Palestinian Territory to Palestine
rename Pitcairn to Pitcairn Islands
rename Russian Federation to Russia
rename Syrian Arab Republic to Syria
rename Virgin Islands (British) to British Virgin Islands
rename Virgin Islands (U.S.) to United States Virgin Islands
fix ccn3 padding
fix subregion for Brunei Darussalam, Cambodia, Indonesia, Laos, Malaysia, Myanmar, Philippines, Singapore, Thailand, Timor-Leste and Vietnam
fix TLD for Bonaire, Heard and McDonald Islands, Kazakhstan and Saint Martin
fix capital for Moldova
fix alt-spellings for United Kingdom (#6 (comment))
update README
@stephenpaulger in bd22b4a I have removed most of the names in |
We can also add time zone data from http://timezonedb.com/download. |
It would be really nice if there would be also a list of states per country such as the United States states. http://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States |
@shanti2530 yes, this has been suggested #6 (comment) but it has not been done yet because the work is pretty huge. |
@mledoze don't know if this is what you were looking for http://vikku.info/programming/geodata/geonames-get-country-state-city-hierarchy.htm |
@shanti2530 this seems very good, thank you. I'll create an issue for this. Would you like to work on this? |
GeoJSON outlines of the countries: https://github.com/datasets/geo-boundaries-world-110m |
@gerbenjacobs yes good idea, I'll add this to the to-do |
I agree for the gerbenjacobs idea of GeoJSON outlines of the countries |
@mledoze don't know if it's in the scope of this project, but I would love to see financial information like GDP, GDP per capita, GNI etc. - problem with this is of course that these numbers would change every year. |
@matiassingers no it's not really in the scope of this project. I prefer to stick with static data that do not change. The dataset currently contains population data, which are not in scope and which I would like to remove in the near future. Although it does not currently contain GDP data, you should check this project https://github.com/tinata/tinatapi which contains other financial data. |
@dalu the postal prefixes is a good idea! |
@dalu you are saying that postal services want the native country name instead of the country postal prefix? |
I would like to inform you that I am about to remove population data because they require frequent updates to stay relevant. I recently added CONTRIBUTING explaining the contributions rules of this project. Population data do not follow these instructions. |
acknowledged |
source http://thematicmapping.org/downloads/world_borders.php you can test these files here: http://geojsonlint.com/ original idea from @gerbenjacobs #6 (comment)
How about the address format, from the page mentioned above: http://en.wikipedia.org/wiki/Address_(geography) This may be fairly difficult to do as it requires some pseudo templating language, so say for US: And would need agreement on the labels used... |
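One way to picture the "pseudo templating language" mentioned here is a per-country template string with labelled placeholders. This is only a sketch: the `{label}` syntax, the label names and the US template are assumptions made for illustration, not an agreed format.

```javascript
// Hypothetical templates keyed by country code; the {labels} are placeholders
// whose names would need agreement among contributors.
const addressFormats = {
  US: "{recipient}\n{street}\n{city}, {state} {postal_code}",
};

// Fill a country's template with the given fields; unknown labels become "".
function formatAddress(countryCode, fields) {
  const template = addressFormats[countryCode];
  if (!template) {
    throw new Error(`No address format for ${countryCode}`);
  }
  return template.replace(/\{(\w+)\}/g, (_, label) => fields[label] ?? "");
}
```

Calling `formatAddress("US", { recipient: "Jane Doe", street: "1 Main St", city: "Springfield", state: "IL", postal_code: "62701" })` produces a three-line address in the order the template defines.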
@maquejp The Countries project's ideology is KISS. Adding cities would greatly increase the size of the project without adding proportional value, so I personally vote no on that idea. About the SQL dump: I don't see much of an issue in writing a script that loads JSON into a DB. There are more than enough solutions online and on Stack Overflow. In addition, loading random SQL dumps into databases is a major security risk. So I vote no on that one as well. |
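To illustrate the kind of script meant here, the sketch below turns one JSON record into a parameterized INSERT statement that any database driver accepting `?` placeholders could execute. The table name, the record fields and the `toInsert` helper are hypothetical, and the table and column names are assumed to come from trusted code since they are not escaped.

```javascript
// Build a parameterized INSERT for one flat JSON record.
// Assumption: table and column names are trusted; only values are parameters.
function toInsert(table, record) {
  const columns = Object.keys(record);
  const placeholders = columns.map(() => "?").join(", ");
  return {
    sql: `INSERT INTO ${table} (${columns.join(", ")}) VALUES (${placeholders})`,
    params: columns.map((column) => record[column]),
  };
}

// Hypothetical flattened record; the real countries.json has nested fields.
const statement = toInsert("countries", { alpha2: "FR", name: "France" });
```

Passing values as parameters rather than concatenating them into the SQL string is what avoids the injection risk that comes with loading an untrusted dump.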
What about the past? In using countries.json to analyse past pieces of text, I found "soviet" was not identified. Would adding a historical extension of the data with applicable dates be achievable in the long term? |
@cmacdonald The Countries project's goal is to provide basic information about the current de jure political state of the world. We provide basic, static information. History will never be part of this project, as it contradicts that goal and would serve only a single use case. |
Hello @cmacdonald, I'm sorry but historical data are out of scope for this project. |
Hello! I think adding emoji for country flag will be nice. You can find emoji of flags here: http://emojiflags.com/ |
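Flag emoji are pairs of Unicode regional indicator symbols, so they can be derived from a two-letter country code instead of being stored. A minimal sketch, assuming the input is an ISO 3166-1 alpha-2 code; `flagEmoji` is a hypothetical helper name, and whether the result renders as a flag depends on the platform's fonts.

```javascript
// Convert a two-letter country code into its flag emoji by mapping each
// letter A-Z onto the regional indicator symbols U+1F1E6..U+1F1FF.
function flagEmoji(alpha2) {
  if (!/^[A-Za-z]{2}$/.test(alpha2)) {
    throw new TypeError("Expected a two-letter country code");
  }
  const base = 0x1f1e6; // REGIONAL INDICATOR SYMBOL LETTER A
  const codePoints = [...alpha2.toUpperCase()].map(
    (letter) => base + letter.charCodeAt(0) - "A".charCodeAt(0)
  );
  return String.fromCodePoint(...codePoints);
}
```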
If someone is searching for a country list with info if the country is in the EU: Here's a short script to append the info to the country list. https://gist.github.com/Bartinger/f0507c786bad45cc942de471b1427e48 |
@Bartinger thank you for that! :) |
Hi Thanks for a great project. |
Hi @mledoze, how accurate should the country name translations be? Should they be 100% accurate, or would output from an online translator be OK? |
Hello @marc-ed-raffalli, I am sorry but I don't understand your question. The translations currently available in this project were gathered from various sources (mainly from Wikipedia). I hope that they are correct but I cannot guarantee that they are 100% correct. |
Hello @mledoze, I wanted to know if the translation has to be exact or is it ok to have translation from online translator like Google Translate (which may produce inaccurate output). Then, if it is ok, what would be the list of targeted languages? |
Well in that case, the translations have to be exact :) There is no list of targeted languages, any language can be added to the list of translations. |
Good to know, I'll look for better translation sources :) |
Hi @mledoze capital:{
en: 'London',
fr: 'Londres'
} |
@marc-ed-raffalli this change was already made by @blumk in d9e81cc. But translations for countries capital(s) are welcome :) |
@mledoze That change supports multiple capitals as an array of strings, but it does not allow a mapping to multiple languages. I would propose something like:
edited with better example |
Looking at the current
Creating separate language files brings the following benefits:
@marc-ed-raffalli @mledoze What do you guys think of this approach? |
Do you mean 3 files? one pure data (language agnostic), and two translation files |
Yes. Adding support for two additional languages ( |
@blumk Let's see what's the feedback from the other contributors |
I find this idea interesting, but what would be your proposal for the directories/files structure? Given your example, would the |
I guess there are two options:
I might prefer one file per language containing all 250 records for the following reasons:
E.g.
|
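As a rough illustration of the one-file-per-language idea, the sketch below merges language-agnostic records with a per-language translation table, falling back to English when a language is missing. Everything here is an assumption: the `code` key, the field names, the in-memory `translations` object standing in for separate files, and the `localize` helper.

```javascript
// Language-agnostic records (hypothetical, trimmed to two fields).
const base = [
  { code: "GB", tld: ".uk" },
  { code: "FR", tld: ".fr" },
];

// One entry per language, standing in for one translation file per language.
const translations = {
  en: { GB: { name: "United Kingdom", capital: "London" }, FR: { name: "France", capital: "Paris" } },
  fr: { GB: { name: "Royaume-Uni", capital: "Londres" }, FR: { name: "France", capital: "Paris" } },
};

// Merge each record with its translation, preferring `lang` over `fallback`.
function localize(records, allTranslations, lang, fallback = "en") {
  const primary = allTranslations[lang] || {};
  const secondary = allTranslations[fallback] || {};
  return records.map((record) => ({
    ...record,
    ...(secondary[record.code] || {}),
    ...(primary[record.code] || {}),
  }));
}
```

With this layout, adding a language means adding one new translation entry, and the base records stay untouched.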
My suggestion on "what to add next" would be NDD and IDD prefixes. |
Thank you for your suggestion, I'll look into it. |
@blumk I would like to move forward with your proposal to refactor translations to separate files; could you please open a new issue with your proposal? Thank you very much. |
One thing I would really like to see added is the most commonly used native language / administrative language of the country. For example, Norway has three different, unweighted native names: And there is no way to tell which of the three is the most commonly used, for example in language or country choosers. In this example, Norge would obviously be the way to go, not Noreg or Norgga. Can we either have a key for the official native name or a weighted/ranked list of languages in the country? Also, there are altSpellings: But I don't feel comfortable trusting the first items in the list to be the most common. It's also pretty much a mix of abbreviations and languages, which is generally not very helpful in this case. Also, Norgga from the native array above is missing in the altSpellings array, which is confusing. |
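A weighted list, as requested here, would let a chooser pick the most common native name directly. The sketch below assumes a per-language weight stored next to the native names; the language keys and the weights are made-up placeholders, not real statistics, and `preferredNativeName` is a hypothetical helper.

```javascript
// Hypothetical record: language keys and weights are placeholders only.
const norway = {
  nativeNames: { nob: "Norge", nno: "Noreg", smi: "Norgga" },
  languageWeight: { nob: 3, nno: 2, smi: 1 },
};

// Return the native name of the highest-weighted language, if any.
function preferredNativeName(country) {
  const ranked = Object.entries(country.nativeNames).sort(
    ([langA], [langB]) =>
      (country.languageWeight[langB] ?? 0) - (country.languageWeight[langA] ?? 0)
  );
  return ranked.length > 0 ? ranked[0][1] : undefined;
}
```

Real weights could come from a source such as the CLDR territory-language data mentioned earlier in the thread, which includes the share of speakers per language.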
I would like to discuss here the data that should be added to this repository.
A similar project, OxJS [1], contains a lot more data, such as the land area or the latitude/longitude coordinates of each country.
Is it interesting/useful to have this kind of data too?
Data that can be added:
What would you like to be added?
Please let me know in the comments.
[1] http://oxjs.org/#doc/Ox.COUNTRIES
[2] source: http://opengeocode.org/
[3] source: https://oxjs.org/#doc/Ox.COUNTRIES
From the comments