This issue was moved to a discussion.

What data to add next? #6

Closed
mledoze opened this issue Oct 4, 2013 · 120 comments

Comments

@mledoze
Owner

mledoze commented Oct 4, 2013

I would like to discuss here the data that should be added to this repository.

A similar project, OxJS [1], contains a lot more data, such as the land area or the latitude/longitude coordinates of each country.

Is it interesting/useful to have this kind of data too?

Data that can be added:

  • land (land mass in square kilometers [3])
  • latitude (latitude coordinate of the capital [2])
  • longitude (longitude coordinate of the capital [2])
  • east (longitude of the country's eastern boundary [3])
  • north (latitude of the country's northern boundary [3])
  • south (latitude of the country's southern boundary [3])

What would you like to be added?

Please let me know in the comments.

[1] http://oxjs.org/#doc/Ox.COUNTRIES
[2] source: http://opengeocode.org/
[3] source: https://oxjs.org/#doc/Ox.COUNTRIES


From the comments

@scento

scento commented Oct 22, 2013

It might be useful to provide the country name in the native language of the country itself (e.g. {"name": "Germany", "name_native": "Deutschland"}...).

@scento

scento commented Oct 22, 2013

The CLDR database of the Unicode project contains country-to-language data, including the percentage of speakers: http://www.unicode.org/cldr/charts/latest/supplemental/territory_language_information.html

@mledoze
Owner Author

mledoze commented Oct 23, 2013

It might be useful to provide the country name in the native language of the country itself

The native name of Germany is already in 'alt-spellings'. I recognize that the name 'alt-spellings' isn't good since it contains alternative spellings and the native name of the country. So there are two solutions here:

  • either we change 'alt-spellings' to 'alt-names' and keep the native name here
  • or we keep 'alt-spellings' just for alternative spellings and create 'name-native' as you suggested.

Initially, I created this dataset with a country selector in mind [1] but it would make more sense to be able to get the native names separately. So I would choose the second option.

But the second option raises the question of how to write the native name of the country. German uses Latin characters, so it's easy to recognize Germany, but what about Armenia, for example, which is written Հայաստան in Armenian [2]? Some people might find it difficult to recognize that as Armenia.

What do you think?

I know that alternative spellings and native names are missing for many countries; I'm currently working on adding them. I'll also add the native/official language(s) of each country.

[1] https://github.com/JamieAppleseed/selectToAutocomplete
[2] http://en.wikipedia.org/wiki/Armenia

@scento

scento commented Oct 23, 2013

Not everyone speaks English, so some people might be confused when selecting their locale. It would be useful to be able to show the English and native versions of the country name side by side in the selector.

I would recommend providing both versions to cover different use cases.

@mledoze
Owner Author

mledoze commented Oct 24, 2013

Right, that's a valid point for non-English speakers.

If you want, feel free to start working on adding the native names as I'll be off for a few days.

@stephenpaulger
Contributor

I think it would be great to have a way to make Countries Hierarchical and have meta data describing whether they are countries or sovereign states.

For the UK currently it says "alt-spellings":"GB,Great Britain,England,UK,Wales,Scotland,Northern Ireland".

The full name of the UK is "The United Kingdom of Great Britain and Northern Ireland". It is not a country, it is a sovereign state.

Great Britain also isn't a country, it's an island.

There are three countries in Great Britain: England, Scotland and Wales.

So the types I think needed are: Country, State, Sovereign State and potentially Nation and Union as well.

Then it would be good to have a way to specify that England is within the UK and, if you also have unions, that it is within the EU.

Another nice feature would be to list what land borders a country has. So you could specify that England borders Scotland and Wales for example.
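
As a rough illustration of the idea, here is a minimal Python sketch of such a hierarchy. The `Entity` class and its fields (`kind`, `parent`, `borders`) are hypothetical names chosen for this example; they are not part of the dataset.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entity:
    name: str
    kind: str                     # e.g. "country", "sovereign state", "union"
    parent: Optional[str] = None  # name of the containing entity, if any
    borders: List[str] = field(default_factory=list)  # land borders


# Hypothetical sample data following the UK example above.
ENTITIES: Dict[str, Entity] = {
    e.name: e
    for e in [
        Entity("United Kingdom", "sovereign state"),
        Entity("England", "country", parent="United Kingdom",
               borders=["Scotland", "Wales"]),
        Entity("Scotland", "country", parent="United Kingdom",
               borders=["England"]),
        Entity("Wales", "country", parent="United Kingdom",
               borders=["England"]),
    ]
}


def ancestors(name: str) -> List[str]:
    """Return the chain of containing entities, nearest first."""
    chain = []
    parent = ENTITIES[name].parent
    while parent is not None:
        chain.append(parent)
        parent = ENTITIES[parent].parent
    return chain
```

A union such as the EU could be added the same way, as another entity that a sovereign state names as its parent.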

@fayderflorez
Contributor

From https://github.com/ProGNOMmers

It would be wonderful if it would be possible to retrieve regions, provinces and cities.

Something like:

// Regions of country
// /rest/alpha2/it/regions ->
{ regions:  [ "Abruzzi e Molise",
              "Basilicata",
              "Calabria",
              "Campania",
              "Emilia-Romagna",
              "Friuli-Venezia Giulia",
              "Lazio",
              "Liguria",
              "Lombardia",
              "Marche",
              "Piemonte",
              "Puglia",
              "Sardegna",
              "Sicilia",
              "Toscana",
              "Trentino-Alto Adige",
              "Umbria",
              "Valle d'Aosta",
              "Veneto" ] }

// Provinces of region
// /rest/alpha2/it/regions/Veneto/provinces ->
{ provinces: [ "Verona", "Venezia", ... ] }

// Cities of province
// /rest/alpha2/it/regions/Veneto/provinces/Venezia/cities ->
{ cities: [ { name: "Venezia", zip_codes: [ "30121", ... , "30176" ] }, 
            { name: "Chioggia", zip_codes: [ "30015" ] },
            { name: "San Donà di Piave", zip_codes: [ "30027" ] }, 
            ... ] }

// Cities of country by name
// /rest/alpha2/it/regions/Veneto/provinces/Venezia/cities ->
{ cities: [ { name: "Venezia", zip_codes: [ "30121", ... , "30176" ] }, 
            { name: "Chioggia", zip_codes: [ "30015" ] },
            { name: "San Donà di Piave", zip_codes: [ "30027" ] }, 
            ... ] }

Cities could have metadata such as zip codes, which are very useful.

It is a huge amount of work, because recording and maintaining the whole list of regions, provinces and cities for every country in the world is hard, but it is a good goal for an open source project.

@mledoze
Owner Author

mledoze commented Nov 4, 2013

@stephenpaulger

I think it would be great to have a way to make Countries Hierarchical and have meta data describing whether they are countries or sovereign states.

I agree, I'll add this to the to-do list. I know that many entries in the dataset are not actual countries. I wanted to provide simple and factual data about world countries, but I understand that more accuracy is needed.

@mledoze
Owner Author

mledoze commented Nov 4, 2013

@fayder

It would be wonderful if it would be possible to retrieve regions, provinces and cities.

Yes, it is a huge amount of work. First I want to continue adding data at the country level (native and official names, official language, etc.) and add the master file as soon as possible (#12) to make contributions easier.

Thank you for your help/feedback, I appreciate it!

mledoze added a commit that referenced this issue Nov 16, 2013
add country official language(s) in English
add alt spellings: official country name in English and in its official language(s)
add region and subregion for Bonaire, Sint Maarten and South Sudan
add capital for British Indian Ocean Territory, Micronesia, Réunion, South Georgia, Virgin Islands (British) and Virgin Islands (U.S.)
add currency for Palestinian Territory
rename Brunei Darussalam to Brunei
rename Falkland Islands (Malvinas) to Falkland Islands
rename French Southern Territories to French Southern and Antarctic Lands
rename Myanmar (Burma) to Myanmar (added Burma in alt-spellings)
rename Palestinian Territory to Palestine
rename Pitcairn to Pitcairn Islands
rename Russian Federation to Russia
rename Syrian Arab Republic to Syria
rename Virgin Islands (British) to British Virgin Islands
rename Virgin Islands (U.S.) to United States Virgin Islands
fix ccn3 padding
fix subregion for Brunei Darussalam, Cambodia, Indonesia, Laos, Malaysia, Myanmar, Philippines, Singapore, Thailand, Timor-Leste and Vietnam
fix TLD for Bonaire, Heard and McDonald Islands, Kazakhstan and Saint Martin
fix capital for Moldova
fix alt-spellings for United Kingdom (#6 (comment))
update README
@mledoze
Owner Author

mledoze commented Nov 16, 2013

For the UK currently it says "alt-spellings":"GB,Great Britain,England,UK,Wales,Scotland,Northern Ireland".

@stephenpaulger in bd22b4a I have removed most of the names in altSpellings; now it's just GB,UK,Great Britain.

@mledoze
Owner Author

mledoze commented Nov 17, 2013

We can also add time zone data from http://timezonedb.com/download.

@shanti2530

It would be really nice if there were also a list of states per country, such as the states of the United States. http://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States

@mledoze
Owner Author

mledoze commented Feb 13, 2014

@shanti2530 yes, this has been suggested in #6 (comment), but it has not been done yet because it is a huge amount of work.
Do you know a source where we can find the states for every country?

@shanti2530

@mledoze don't know if this is what you were looking for http://vikku.info/programming/geodata/geonames-get-country-state-city-hierarchy.htm

@mledoze
Owner Author

mledoze commented Feb 13, 2014

@shanti2530 this seems very good, thank you. I'll create an issue for this. Would you like to work on this?

@gerbenjacobs

GeoJSON outlines of the countries: https://github.com/datasets/geo-boundaries-world-110m

@mledoze
Owner Author

mledoze commented Mar 20, 2014

@gerbenjacobs yes good idea, I'll add this to the to-do

@oriolfg

oriolfg commented Apr 1, 2014

I agree with @gerbenjacobs's idea of adding GeoJSON outlines of the countries.

@matiassingers

@mledoze don't know if it's in the scope of this project, but I would love to see financial information like GDP, GDP per capita, GNI etc. - problem with this is of course that these numbers would change every year.

@mledoze
Owner Author

mledoze commented Apr 28, 2014

@matiassingers no it's not really in the scope of this project. I prefer to stick with static data that do not change. The dataset currently contains population data which are not in the scope and I would like to remove it in the near future.

Although it does not currently contain GDP data, you should check out this project, https://github.com/tinata/tinatapi, which contains other financial data.

@mledoze
Owner Author

mledoze commented May 2, 2014

@dalu the postal prefixes are a good idea!

@mledoze
Owner Author

mledoze commented May 2, 2014

@dalu you are saying that postal services want the native country name instead of the country postal prefix?

@mledoze
Owner Author

mledoze commented May 6, 2014

I would like to inform you that I am about to remove population data because they require frequent updates to stay relevant.

I recently added CONTRIBUTING, which explains the contribution rules of this project. Population data do not follow these rules.

@fayderflorez
Contributor

acknowledged

mledoze added a commit that referenced this issue May 9, 2014
@tdegrunt

How about the address format, from the page mentioned above: http://en.wikipedia.org/wiki/Address_(geography)

This may be fairly difficult to do as it requires some pseudo templating language, so say for US:
"addressFormat": "{{name}}\n{{houseNumber}} {{street}}\n{{locality}}\n{{city}}\n{{postalCode}}"

And would need agreement on the labels used...
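
To make the templating idea concrete, here is a minimal Python sketch that fills in the format string above. The `render_address` helper and the sample field values are hypothetical, and the label names are simply those from the example; nothing here is an agreed-upon format.

```python
import re

# Template copied from the example above.
ADDRESS_FORMAT = (
    "{{name}}\n{{houseNumber}} {{street}}\n{{locality}}\n{{city}}\n{{postalCode}}"
)


def render_address(template: str, fields: dict) -> str:
    """Replace each {{label}} with its value; missing labels become empty."""
    filled = re.sub(
        r"\{\{(\w+)\}\}",
        lambda match: str(fields.get(match.group(1), "")),
        template,
    )
    # Drop lines that ended up empty, e.g. when no locality was given.
    lines = [line.strip() for line in filled.split("\n")]
    return "\n".join(line for line in lines if line)


# Hypothetical sample data.
print(render_address(ADDRESS_FORMAT, {
    "name": "Jane Doe",
    "houseNumber": "1",
    "street": "Main St",
    "city": "Springfield",
    "postalCode": "12345",
}))
```

Dropping empty lines is one possible way to handle optional parts such as the locality; a real format might need explicit rules for that.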

@yackermann
Contributor

@maquejp The ideology of the Countries project is KISS. Adding cities would greatly increase the size of the project and would not add corresponding value. So I personally vote no on that idea.

About the SQL dump: I don't see much of an issue in writing a script that loads the JSON into a DB. There are more than enough solutions online, e.g. on Stack Overflow. In addition, it is a major security risk to just load random SQL dumps into databases. So I vote no on that one as well.
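
For illustration, such a script could look like the following minimal Python sketch, which uses the standard `sqlite3` module. The table layout, the `load_countries` helper and the inline sample are assumptions made for this example; only the `cca3` and `name.common` fields are taken from the dataset.

```python
import json
import sqlite3


def load_countries(conn: sqlite3.Connection, json_text: str) -> int:
    """Load country records from JSON text into a small SQLite table."""
    countries = json.loads(json_text)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS countries ("
        "cca3 TEXT PRIMARY KEY, common_name TEXT NOT NULL)"
    )
    rows = [(c["cca3"], c["name"]["common"]) for c in countries]
    # Parameterized statements avoid executing anything found in the data.
    conn.executemany("INSERT OR REPLACE INTO countries VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


# Hypothetical two-record sample standing in for countries.json.
SAMPLE = (
    '[{"cca3": "ZAF", "name": {"common": "South Africa"}},'
    ' {"cca3": "FRA", "name": {"common": "France"}}]'
)

conn = sqlite3.connect(":memory:")
inserted = load_countries(conn, SAMPLE)
```

With a real file, the JSON text would be read from disk instead of the inline sample.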

@cmacdonald

What about the past? In using countries.json to analyse past pieces of text, I found "soviet" was not identified. Would adding a historical extension of the data with applicable dates be achievable in the long term?

@yackermann
Contributor

@cmacdonald The goal of the Countries project is to provide basic information about the current de jure political state of the world. We provide basic, static information.

History will never be part of this project, as it contradicts the project's goal of providing basic and static information and would serve only a single use case.

@mledoze
Owner Author

mledoze commented Aug 31, 2016

Hello @cmacdonald, I'm sorry but historical data are out of scope for this project.

@8alery

8alery commented Sep 1, 2016

Hello! I think adding the emoji for each country's flag would be nice. You can find flag emoji here: http://emojiflags.com/
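
As a side note, flag emoji can also be derived from a two-letter country code, because each flag is encoded as a pair of Unicode regional indicator symbols. A minimal Python sketch, assuming the code is an ISO 3166-1 alpha-2 code (the `flag_emoji` helper name is made up for this example):

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # code point of REGIONAL INDICATOR SYMBOL LETTER A


def flag_emoji(alpha2: str) -> str:
    """Map a two-letter country code to its pair of regional indicators."""
    code = alpha2.upper()
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"expected a two-letter code, got {alpha2!r}")
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(letter) - ord("A")) for letter in code
    )


print(flag_emoji("NO"))
```

Whether the pair is displayed as a flag depends on the font and platform, so storing the emoji in the dataset may still be worthwhile.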

@dominicbartl

If someone is searching for a country list with info if the country is in the EU: Here's a short script to append the info to the country list.

https://gist.github.com/Bartinger/f0507c786bad45cc942de471b1427e48

@mledoze
Owner Author

mledoze commented May 16, 2017

@Bartinger thank you for that! :)

@pfwd

pfwd commented May 31, 2017

Hi, thanks for a great project.
Are there any plans to add ISO 639-1 codes? I'm looking for a way to adjust HTML language codes.
If the language section included the ISO 639-1 then it would be perfect for my use case.
I see that https://restcountries.eu supports this, and it's listed in the showcase.

@fayderflorez
Contributor

Hi @pfwd, I started restcountries.eu based on @mledoze's great data some years ago, but then I started to maintain my own version. That's why they are not very similar anymore.

@marc-ed-raffalli

Hi @mledoze, how accurate should the country name translations be? Do they have to be 100% accurate, or would output from an online translator be OK?

@mledoze
Owner Author

mledoze commented Dec 6, 2017

Hi @mledoze, how accurate should the country name translations be? Do they have to be 100% accurate, or would output from an online translator be OK?

Hello @marc-ed-raffalli, I am sorry but I don't understand your question. The translations currently available in this project were gathered from various sources (mainly from Wikipedia). I hope that they are correct but I cannot guarantee that they are 100% correct.

@marc-ed-raffalli

Hello @mledoze, I wanted to know whether the translations have to be exact, or whether it is OK to use translations from an online translator like Google Translate (which may produce inaccurate output). If that is OK, what would be the list of targeted languages?

@mledoze
Owner Author

mledoze commented Dec 6, 2017

Well in that case, the translations have to be exact :)

There is no list of targeted languages, any language can be added to the list of translations.

@marc-ed-raffalli

Good to know, I'll look for better translation sources :)

@marc-ed-raffalli

marc-ed-raffalli commented Jan 24, 2018

Hi @mledoze
I found translations for a good number of country names and capitals, e.g. London => Londres.
In #263 (comment) you mentioned planning for 2.0. I would propose to change capital to a multi dimensional array too; I will send a PR once that is OK.

capital:{
  en: 'London',
  fr: 'Londres'
}

@mledoze
Owner Author

mledoze commented Jan 28, 2018

I would propose to change capital to a multi dimensional array

@marc-ed-raffalli this change was already made by @blumk in d9e81cc. But translations for countries capital(s) are welcome :)

@marc-ed-raffalli

marc-ed-raffalli commented Jan 28, 2018

@mledoze That change supports multiple capitals as an array of strings, but it does not allow a mapping to multiple languages. I would propose something like:

capital: {
  en: ['Pretoria', 'Cape Town', 'Bloemfontein'],
  fr: ['Pretoria', 'Le Cap', 'Bloemfontein'],
  ru: ['Претория', 'Кейптаун', 'Блумфонтейн'],
  zh: ['比勒陀利亚', '开普敦', '布隆方丹']
}

edited with better example
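
A consumer of such a structure would probably want a fallback when a language is missing. A minimal Python sketch, where the `capitals_for` helper and the choice of `en` as the default fallback are assumptions for illustration:

```python
# Two of the languages from the proposal above.
CAPITAL = {
    "en": ["Pretoria", "Cape Town", "Bloemfontein"],
    "fr": ["Pretoria", "Le Cap", "Bloemfontein"],
}


def capitals_for(capital: dict, lang: str, fallback: str = "en") -> list:
    """Return the capitals in `lang`, or in `fallback` if `lang` is missing."""
    return capital.get(lang) or capital.get(fallback, [])


print(capitals_for(CAPITAL, "fr"))  # French names
print(capitals_for(CAPITAL, "de"))  # no German entry, falls back to English
```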

@blumk
Contributor

blumk commented Jan 29, 2018

Looking at the current countries.json implementation I'd suggest refactoring translations into separate *.json files, e.g. fra.json:

{
  "cca3": "ZAF",
  "name": {"official": "R\u00e9publique d'Afrique du Sud", "common": "Afrique du Sud"},
  "capital": ["Pretoria", "Le Cap", "Bloemfontein"],
  "demonym": "Sud-Africain"
},
...

Creating separate language files brings the following benefits:

  • The current implementation has a "translations" property. Adding capitals would lead to confusion: a translation of what (name, capital)? Dedicated files solve this problem (see the example above)
  • Languages shouldn't pollute the main countries.json file since there are 6000+ languages. Adding a new language is difficult (see Arabic translations PR)
  • You can give a single language file to a native speaker and she can confirm the correctness without knowing too much about the countries.json file
  • Better support for lazy loading (e.g. two supported languages means loading two *.json files)

@marc-ed-raffalli @mledoze What do you guys think of this approach?

@marc-ed-raffalli

@blumk

Better support for lazy loading (e.g. two supported languages means loading two *.json files)

Do you mean 3 files? one pure data (language agnostic), and two translation files

@blumk
Contributor

blumk commented Jan 30, 2018

@marc-ed-raffalli

Do you mean 3 files? one pure data (language agnostic), and two translation files

Yes. Adding support for two additional languages (countries.json defaults to English) means that you'll have to load countries.json and e.g. fra.json and slk.json in order to support ENG, FRA, SLK.

@marc-ed-raffalli

@blumk
I like the idea of separate language files. I would even push the idea further and put all languages into dedicated files (including ENG), since not all apps necessarily support English :)

Let's see what feedback the other contributors have.

@mledoze
Owner Author

mledoze commented Jan 30, 2018

@blumk

Looking at the current countries.json implementation I'd suggest refactoring translations into separate *.json files

I find this idea interesting, but what would be your proposal for the directories/files structure? Given your example, would the fra.json file be in a data/zaf folder?

@blumk
Contributor

blumk commented Jan 30, 2018

@mledoze

I find this idea interesting, but what would be your proposal for the directories/files structure? Given your example, would the fra.json file be in a data/zaf folder?

I guess there are two options:

  • one translation file per language containing all 250 records: you can use a folder like data | translations | i18n containing all translation files
  • one translation file per language and record: in that case subfolders like data/zaf might be the way to go. However you end up having n*250 files

I might prefer one file per language containing all 250 records for the following reasons:

  • it is use-case driven: e.g. a drop-down list of all countries in French, or a map with all capitals in Dutch.
  • one language, one location: you can give a single language file to a translator, you'll only need to load one file

E.g.

{
  "cca3": "ZAF",
  "name": {"official": "R\u00e9publique d'Afrique du Sud", "common": "Afrique du Sud"},
  "capital": ["Pretoria", "Le Cap", "Bloemfontein"],
  "demonym": "Sud-Africain"
},
{
  "cca3": "FRA",
  "name": ...
},
...
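
To show how a consumer might combine the files under the one-file-per-language option, here is a minimal Python sketch that overlays a language file on the language-agnostic data, keyed by `cca3`. The `localize` helper and the inline samples (standing in for countries.json and fra.json) are hypothetical.

```python
import json

# Hypothetical stand-in for the language-agnostic countries.json.
BASE_JSON = '[{"cca3": "ZAF", "region": "Africa", "name": {"common": "South Africa"}}]'

# Hypothetical stand-in for fra.json, following the example above.
FRA_JSON = """[{
  "cca3": "ZAF",
  "name": {"official": "République d'Afrique du Sud", "common": "Afrique du Sud"},
  "capital": ["Pretoria", "Le Cap", "Bloemfontein"],
  "demonym": "Sud-Africain"
}]"""


def localize(base: list, translation: list) -> list:
    """Overlay translated fields on the base records, matched by cca3."""
    by_code = {record["cca3"]: record for record in translation}
    # Translated keys replace base keys; untranslated keys are kept as-is.
    return [{**country, **by_code.get(country["cca3"], {})} for country in base]


localized = localize(json.loads(BASE_JSON), json.loads(FRA_JSON))
print(localized[0]["name"]["common"])
```

Records without a translation simply keep their base values, so an incomplete language file degrades gracefully.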

@ColinH

ColinH commented Jun 15, 2018

My suggestion on "what to add next" would be NDD and IDD prefixes.

@mledoze
Owner Author

mledoze commented Jul 23, 2018

My suggestion on "what to add next" would be NDD and IDD prefixes.

Thank you for your suggestion, I'll look into it.

@mledoze
Owner Author

mledoze commented Jul 23, 2018

@blumk I would like to move forward with your proposal to refactor translations to separate files; could you please open a new issue with your proposal? Thank you very much.

@lupinitylabs

lupinitylabs commented Mar 10, 2020

One thing I would really like to see added is the most commonly used native language / administrative language used in the country.

For example, Norway has three different, unweighted native names:
"native":{"nno":{"official":"Kongeriket Noreg","common":"Noreg"},"nob":{"official":"Kongeriket Norge","common":"Norge"},"smi":{"official":"Norgga gonagasriika","common":"Norgga"}}

And there is no way to tell which one of the three is the commonly used, for example in language or country choosers. In this example, Norge would obviously be the way to go, not Noreg or Norgga.

Can we either have a key for the official native name or a weighted/ranked list of languages in the country?

Also, there are altSpellings:
"altSpellings":["NO","Norge","Noreg","Kingdom of Norway","Kongeriket Norge","Kongeriket Noreg"]

But I don't feel comfortable relying on the first items in the list being the most common. It is also pretty much a mix of abbreviations and languages, which is generally not very helpful in this case. Also, Norgga from the native array above is missing from the altSpellings array, which is confusing.
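
To illustrate what a ranked list would enable, here is a minimal Python sketch. The `LANGUAGE_RANK` list is a made-up ranking that does not exist in the dataset, and `preferred_native_name` is a hypothetical helper.

```python
# Native names copied from the Norway example above.
NATIVE = {
    "nno": {"official": "Kongeriket Noreg", "common": "Noreg"},
    "nob": {"official": "Kongeriket Norge", "common": "Norge"},
    "smi": {"official": "Norgga gonagasriika", "common": "Norgga"},
}

# Hypothetical ranking, most commonly used language first.
LANGUAGE_RANK = ["nob", "nno", "smi"]


def preferred_native_name(native: dict, ranking: list):
    """Return the common name in the highest-ranked available language."""
    for lang in ranking:
        if lang in native:
            return native[lang]["common"]
    # Without a usable ranking, fall back to whichever entry comes first.
    return next(iter(native.values()))["common"] if native else None


print(preferred_native_name(NATIVE, LANGUAGE_RANK))
```

Without the ranking, a consumer can only take the first entry, which is exactly the ambiguity described above.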

Repository owner locked and limited conversation to collaborators Oct 2, 2023
@mledoze mledoze converted this issue into discussion #501 Oct 2, 2023
