What data to add next? #6

mledoze · 2013-10-04T23:47:53Z

I would like to discuss here the data that should be added to this repository.

A similar project, OxJS [1], contains a lot more data, such as the land area or the latitude/longitude coordinates of each country.

Is it interesting/useful to have this kind of data too?

Data that can be added:

land (land mass in square kilometers [3])
latitude (latitude coordinate of the capital [2])
longitude (longitude coordinate of the capital [2])
east (longitude of the country's eastern boundary [3])
north (latitude of the country's northern boundary [3])
south (latitude of the country's southern boundary [3])

What would you like to be added?

Please let me know in the comments.

[1] http://oxjs.org/#doc/Ox.COUNTRIES
[2] source: http://opengeocode.org/
[3] source: https://oxjs.org/#doc/Ox.COUNTRIES

From the comments

add the type of the country (country, sovereign state, public body, territory, etc.)
add the land borders (done, see https://github.com/mledoze/countries/tree/v1.3)
add regions, provinces and cities

scento · 2013-10-22T17:50:25Z

It might be useful to provide the country name in the native language of the country itself (e.g. {"name": "Germany", "name_native": "Deutschland"}).

scento · 2013-10-22T18:41:06Z

The CLDR database of the Unicode project contains country-to-language data, including the percentage of speakers: http://www.unicode.org/cldr/charts/latest/supplemental/territory_language_information.html

mledoze · 2013-10-23T09:24:56Z

It might be useful to provide the country name in the native language of the country itself

The native name of Germany is already in 'alt-spellings'. I recognize that the name 'alt-spellings' isn't good since it contains alternative spellings and the native name of the country. So there are two solutions here:

either we change 'alt-spellings' to 'alt-names' and keep the native name here
or we keep 'alt-spellings' just for alternative spellings and create 'name-native' as you suggested.

Initially, I created this dataset with a country selector in mind [1] but it would make more sense to be able to get the native names separately. So I would choose the second option.

But the second option raises the question of how to write the native name of the country. German uses Latin characters, so it's easy to recognize that it's Germany, but what about Armenia, for example, which is written Հայաստան in Armenian [2]? For some people it might be difficult to know that it's Armenia.

What do you think?

I know that alternative spellings and native names are missing for many countries, I'm currently working on adding them. Also, I'll add the native/official language(s) of each country.

[1] https://github.com/JamieAppleseed/selectToAutocomplete
[2] http://en.wikipedia.org/wiki/Armenia

scento · 2013-10-23T17:04:28Z

Not all people speak English, so they might be confused while selecting their locale. It might be useful to show the English and native versions of the country name side by side in the selector.

I would recommend providing both versions to cover different use cases.

mledoze · 2013-10-24T14:01:13Z

Right, that's a valid point for non-English speakers.

If you want, feel free to start working on adding the native names as I'll be off for a few days.

stephenpaulger · 2013-10-31T13:37:17Z

I think it would be great to have a way to make countries hierarchical and to have metadata describing whether they are countries or sovereign states.

For the UK currently it says "alt-spellings":"GB,Great Britain,England,UK,Wales,Scotland,Northern Ireland".

The full name of the UK is "The United Kingdom of Great Britain and Northern Ireland". It is not a country, it is a sovereign state.

Great Britain also isn't a country, it's an island.

There are three countries in Great Britain: England, Scotland and Wales.

So the types I think needed are: Country, State, Sovereign State and potentially Nation and Union as well.

Then it would be good to have a way to specify that England is within the UK and if you also have unions that it is within the EU.

Another nice feature would be to list what land borders a country has. So you could specify that England borders Scotland and Wales for example.
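As a rough illustration of the hierarchy and land-border ideas in this comment, here is a minimal Python sketch. The `type`, `parent` and `borders` keys and the helper functions are hypothetical names chosen for this example; they are not fields of the dataset, and only a subset of entities is listed.

```python
# Hypothetical records: "type", "parent" and "borders" are illustrative
# assumptions, not fields of the actual dataset. Subset for illustration.
ENTITIES = {
    "United Kingdom": {"type": "sovereign state", "parent": None, "borders": ["Ireland"]},
    "England": {"type": "country", "parent": "United Kingdom", "borders": ["Scotland", "Wales"]},
    "Scotland": {"type": "country", "parent": "United Kingdom", "borders": ["England"]},
    "Wales": {"type": "country", "parent": "United Kingdom", "borders": ["England"]},
}


def ancestors(name):
    """Return the chain of containing entities, nearest first."""
    chain = []
    parent = ENTITIES[name]["parent"]
    while parent is not None:
        chain.append(parent)
        # Stop if the parent is not itself described in ENTITIES.
        parent = ENTITIES.get(parent, {}).get("parent")
    return chain


def children(name):
    """Return the entities whose direct parent is `name`, sorted by name."""
    return sorted(key for key, value in ENTITIES.items() if value["parent"] == name)


print(ancestors("England"))
print(children("United Kingdom"))
```

A union level such as the EU could presumably be modelled the same way, by adding another entity and pointing `parent` at it.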

fayderflorez · 2013-11-02T11:21:48Z

From https://github.com/ProGNOMmers

It would be wonderful if it would be possible to retrieve regions, provinces and cities.

Something like:

// Regions of country
// /rest/alpha2/it/regions ->
{ regions:  [ "Abruzzi e Molise",
              "Basilicata",
              "Calabria",
              "Campania",
              "Emilia-Romagna",
              "Friuli-Venezia Giulia",
              "Lazio",
              "Liguria",
              "Lombardia",
              "Marche",
              "Piemonte",
              "Puglia",
              "Sardegna",
              "Sicilia",
              "Toscana",
              "Trentino-Alto Adige",
              "Umbria",
              "Valle d'Aosta",
              "Veneto" ] }

// Provinces of region
// /rest/alpha2/it/regions/Veneto/provinces ->
{ provinces: [ "Verona", "Venezia", ... ] }

// Cities of province
// /rest/alpha2/it/regions/Veneto/provinces/Venezia/cities ->
{ cities: [ { name: "Venezia", zip_codes: [ "30121", ... , "30176" ] }, 
            { name: "Chioggia", zip_codes: [ "30015" ] },
            { name: "San Donà di Piave", zip_codes: [ "30027" ] }, 
            ... ] }

// Cities of country by name
// /rest/alpha2/it/regions/Veneto/provinces/Venezia/cities ->
{ cities: [ { name: "Venezia", zip_codes: [ "30121", ... , "30176" ] }, 
            { name: "Chioggia", zip_codes: [ "30015" ] },
            { name: "San Donà di Piave", zip_codes: [ "30027" ] }, 
            ... ] }

Cities could have metadata such as zip codes, which are very useful.

It is a huge amount of work, because recording and maintaining the whole list of regions, provinces and cities for every country in the world is hard, but it is a good goal for an open source project.
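The lookups in this proposal could be sketched in Python over a nested dictionary, as below. The structure and function names are assumptions made for illustration (no such API exists in the project); the sample names and zip codes are taken from the example above. `find_city` is one guess at what the "Cities of country by name" heading is meant to describe.

```python
# Hypothetical nested structure: country -> region -> province -> cities.
DATA = {
    "it": {
        "Veneto": {
            "Venezia": [
                {"name": "Chioggia", "zip_codes": ["30015"]},
                {"name": "San Donà di Piave", "zip_codes": ["30027"]},
            ],
        },
    },
}


def regions(alpha2):
    """Mimics /rest/alpha2/<code>/regions."""
    return {"regions": sorted(DATA[alpha2])}


def provinces(alpha2, region):
    """Mimics /rest/alpha2/<code>/regions/<region>/provinces."""
    return {"provinces": sorted(DATA[alpha2][region])}


def cities(alpha2, region, province):
    """Mimics /rest/alpha2/<code>/regions/<region>/provinces/<province>/cities."""
    return {"cities": DATA[alpha2][region][province]}


def find_city(alpha2, name):
    """Search every region and province of a country for a city by name."""
    matches = []
    for region, provs in DATA[alpha2].items():
        for province, city_list in provs.items():
            for city in city_list:
                if city["name"] == name:
                    matches.append((region, province, city["zip_codes"]))
    return matches


print(regions("it"))
print(find_city("it", "Chioggia"))
```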

mledoze · 2013-11-04T09:43:43Z

@stephenpaulger

I think it would be great to have a way to make countries hierarchical and to have metadata describing whether they are countries or sovereign states.

I agree, I'll add this to the to-do list. I know that many entries in the dataset are not actual countries. I wanted to provide simple and factual data about world countries, but I understand that more accuracy is needed.

mledoze · 2013-11-04T09:50:43Z

@fayder

It would be wonderful if it would be possible to retrieve regions, provinces and cities.

Yes, it is a huge amount of work. First I want to continue adding more data at the country level (native and official names, official language, etc.) and add the master file as soon as possible (#12) to ease contributions.

Thank you for your help/feedback, I appreciate it!

add country official language(s) in English
add alt spellings: official country name in english and in its official language(s)
add region and subregion for Bonaire, Sint Maarten and South Sudan
add capital for British Indian Ocean Territory, Micronesia, Réunion, South Georgia, Virgin Islands (British) and Virgin Islands (U.S.)
add currency for Palestinian Territory
rename Brunei Darussalam to Brunei
rename Falkland Islands (Malvinas) to Falkland Islands
rename French Southern Territories to French Southern and Antarctic Lands
rename Myanmar (Burma) to Myanmar (added Burma in alt-spellings)
rename Palestinian Territory to Palestine
rename Pitcairn to Pitcairn Islands
rename Russian Federation to Russia
rename Syrian Arab Republic to Syria
rename Virgin Islands (British) to British Virgin Islands
rename Virgin Islands (U.S.) to United States Virgin Islands
fix ccn3 padding
fix subregion for Brunei Darussalam, Cambodia, Indonesia, Laos, Malaysia, Myanmar, Philippines, Singapore, Thailand, Timor-Leste and Vietnam
fix TLD for Bonaire, Heard and McDonald Islands, Kazakhstan and Saint Martin
fix capital for Moldova
fix alt-spellings for United Kingdom (#6 (comment))
update README

mledoze · 2013-11-16T16:54:10Z

For the UK currently it says "alt-spellings":"GB,Great Britain,England,UK,Wales,Scotland,Northern Ireland".

@stephenpaulger in bd22b4a I have removed most of the names in altSpellings, now it's just GB,UK,Great Britain.

mledoze · 2013-11-17T11:28:53Z

We can also add time zone data from http://timezonedb.com/download.

shanti2530 · 2014-02-13T15:08:25Z

It would be really nice if there were also a list of states per country, such as the states of the United States: http://en.wikipedia.org/wiki/List_of_states_and_territories_of_the_United_States

mledoze · 2014-02-13T15:26:28Z

@shanti2530 yes, this has been suggested in #6 (comment), but it has not been done yet because the amount of work is pretty huge.
Do you know a source where we can find the states for every country?

shanti2530 · 2014-02-13T15:50:30Z

@mledoze don't know if this is what you were looking for http://vikku.info/programming/geodata/geonames-get-country-state-city-hierarchy.htm

mledoze · 2014-02-13T15:59:12Z

@shanti2530 this seems very good, thank you. I'll create an issue for this. Would you like to work on this?

gerbenjacobs · 2014-03-20T13:37:56Z

GeoJSON outlines of the countries: https://github.com/datasets/geo-boundaries-world-110m

mledoze · 2014-03-20T14:29:23Z

@gerbenjacobs yes good idea, I'll add this to the to-do

oriolfg · 2014-04-01T14:59:08Z

I agree with gerbenjacobs's idea of GeoJSON outlines of the countries.

matiassingers · 2014-04-28T07:46:52Z

@mledoze don't know if it's in the scope of this project, but I would love to see financial information like GDP, GDP per capita, GNI etc. - problem with this is of course that these numbers would change every year.

mledoze · 2014-04-28T10:13:26Z

@matiassingers no, it's not really in the scope of this project. I prefer to stick with static data that do not change. The dataset currently contains population data, which are not in scope, and I would like to remove them in the near future.

Although it does not currently contain GDP data, you should check this project, https://github.com/tinata/tinatapi, which contains other financial data.

mledoze · 2014-05-02T08:13:23Z

@dalu the postal prefixes are a good idea!

mledoze · 2014-05-02T13:43:15Z

@dalu you are saying that postal services want the native country name instead of the country postal prefix?

mledoze · 2014-05-06T16:07:39Z

I would like to inform you that I am about to remove population data because they require frequent updates to stay relevant.

I recently added a CONTRIBUTING file explaining the contribution rules of this project. Population data do not follow these rules.

fayderflorez · 2014-05-07T12:26:47Z

acknowledged

@gerbenjacobs

source http://thematicmapping.org/downloads/world_borders.php you can test these files here: http://geojsonlint.com/ original idea from @gerbenjacobs #6 (comment)

tdegrunt · 2014-05-27T19:41:25Z

How about the address format, as described on this page: http://en.wikipedia.org/wiki/Address_(geography)

This may be fairly difficult to do, as it requires some pseudo-templating language. For the US, say:
"addressFormat": "{{name}}\n{{houseNumber}} {{street}}\n{{locality}}\n{{city}}\n{{postalCode}}"

And would need agreement on the labels used...
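To make the templating idea concrete, here is a minimal Python sketch that fills the `{{label}}` placeholders of the format string above. The `render_address` helper and the sample field values are hypothetical, and dropping lines whose fields are missing is just one possible design choice, not something the comment specifies.

```python
import re

# Format string copied from the comment above.
ADDRESS_FORMAT = "{{name}}\n{{houseNumber}} {{street}}\n{{locality}}\n{{city}}\n{{postalCode}}"


def render_address(template, fields):
    """Replace each {{label}} with its value; unknown labels become empty."""
    def substitute(match):
        return str(fields.get(match.group(1), ""))

    filled = re.sub(r"\{\{(\w+)\}\}", substitute, template)
    # Drop lines that ended up empty because a field was missing.
    lines = (line.strip() for line in filled.split("\n"))
    return "\n".join(line for line in lines if line)


# Made-up sample data; "locality" is deliberately left out.
sample = {
    "name": "Jane Doe",
    "houseNumber": "12",
    "street": "Main St",
    "city": "Springfield",
    "postalCode": "12345",
}
print(render_address(ADDRESS_FORMAT, sample))
```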

yackermann · 2016-06-21T13:54:12Z

@maquejp The Countries project's ideology is KISS. Adding cities would greatly increase the size of the project and would not provide corresponding value, so I personally vote no on that idea.

About the SQL dump: I don't see much of an issue in writing a script that loads the JSON into a DB. There are more than enough solutions online (e.g. on Stack Overflow). In addition, it is a major security risk to just load random SQL dumps into databases, so I vote no on that one as well.
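For readers wondering what such a loading script might look like, here is a minimal sketch using Python's standard `sqlite3` module. The table layout and the `load_countries` helper are assumptions made for this example; only the `cca3` and `name.common`/`name.official` fields are taken from snippets elsewhere in this thread, and the sample record stands in for the real countries.json.

```python
import json
import sqlite3


def load_countries(conn, countries):
    """Insert country records into a (hypothetical) three-column table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS countries ("
        "cca3 TEXT PRIMARY KEY, common_name TEXT, official_name TEXT)"
    )
    rows = [
        (c["cca3"], c["name"]["common"], c["name"]["official"])
        for c in countries
    ]
    # Parameterized statements avoid building SQL from untrusted strings.
    conn.executemany("INSERT OR REPLACE INTO countries VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Tiny inline sample instead of reading the real file from disk.
sample = json.loads(
    '[{"cca3": "ZAF", "name": {"common": "South Africa", '
    '"official": "Republic of South Africa"}}]'
)
conn = sqlite3.connect(":memory:")
inserted = load_countries(conn, sample)
print(inserted, conn.execute("SELECT * FROM countries").fetchall())
```

With the real data, `sample` would presumably come from `json.load(open("countries.json", encoding="utf-8"))` instead.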

cmacdonald · 2016-08-10T19:28:30Z

What about the past? In using countries.json to analyse past pieces of text, I found "soviet" was not identified. Would adding a historical extension of the data with applicable dates be achievable in the long term?

yackermann · 2016-08-10T19:53:13Z

@cmacdonald The goal of the Countries project is to provide basic information about the current de jure political state of the world. We provide basic, static information.

History will never be part of this project, as it contradicts the goal of providing basic, static information and serves only a narrow use case.

mledoze · 2016-08-31T16:07:23Z

Hello @cmacdonald, I'm sorry but historical data are out of scope for this project.

8alery · 2016-09-01T09:37:21Z

Hello! I think adding an emoji for each country's flag would be nice. You can find flag emoji here: http://emojiflags.com/
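As a side note, a flag emoji can also be derived from a two-letter country code, because Unicode encodes flags as pairs of Regional Indicator Symbols (U+1F1E6 for "A" through U+1F1FF for "Z"). The following Python sketch shows the mapping; the `flag_emoji` helper name is made up for this example, and whether a given pair actually renders as a flag depends on the platform's font.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def flag_emoji(alpha2):
    """Map a two-letter country code to its pair of regional indicators."""
    code = alpha2.upper()
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError(f"expected a two-letter code, got {alpha2!r}")
    return "".join(
        chr(REGIONAL_INDICATOR_A + ord(letter) - ord("A")) for letter in code
    )


print(flag_emoji("NO"), flag_emoji("fr"))
```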

dominicbartl · 2017-04-28T14:35:54Z

If someone is searching for a country list with info on whether each country is in the EU: here's a short script to append that info to the country list.

https://gist.github.com/Bartinger/f0507c786bad45cc942de471b1427e48

mledoze · 2017-05-16T14:14:25Z

@Bartinger thank you for that! :)

pfwd · 2017-05-31T17:44:20Z

Hi, thanks for a great project.
Are there any plans to add ISO 639-1 codes? I'm looking for a way to adjust HTML language codes.
If the language section included the ISO 639-1 codes, it would be perfect for my use case.
I see that https://restcountries.eu has this support, and it's listed in the showcase.

fayderflorez · 2017-06-01T06:51:59Z

Hi @pfwd, I started restcountries.eu based on @mledoze's great data some years ago, but then I started to maintain my own version. That's why they are not very similar anymore.

marc-ed-raffalli · 2017-12-05T20:59:29Z

Hi @mledoze, how accurate should the country name translations be? Should they be 100% accurate, or would output from an online translator be OK?

mledoze · 2017-12-06T09:15:46Z

Hi @mledoze, how accurate should the country name translations be? Should they be 100% accurate, or would output from an online translator be OK?

Hello @marc-ed-raffalli, I am sorry but I don't understand your question. The translations currently available in this project were gathered from various sources (mainly from Wikipedia). I hope that they are correct but I cannot guarantee that they are 100% correct.

marc-ed-raffalli · 2017-12-06T09:50:26Z

Hello @mledoze, I wanted to know whether the translations have to be exact or whether it is OK to use translations from an online translator like Google Translate (which may produce inaccurate output). Then, if it is OK, what would be the list of targeted languages?

mledoze · 2017-12-06T09:53:58Z

Well in that case, the translations have to be exact :)

There is no list of targeted languages, any language can be added to the list of translations.

marc-ed-raffalli · 2017-12-06T10:02:53Z

Good to know, I'll look for better translation sources :)

marc-ed-raffalli · 2018-01-24T10:55:13Z

Hi @mledoze
I found translations for a good number of country names and capitals, e.g. London => Londres.
In #263 (comment) you mentioned planning for 2.0. I would propose to change capital to a multi dimensional array too; I will send a PR once that is OK.

capital:{
  en: 'London',
  fr: 'Londres'
}

mledoze · 2018-01-28T14:48:23Z

I would propose to change capital to a multi dimensional array

@marc-ed-raffalli this change was already made by @blumk in d9e81cc. But translations for countries capital(s) are welcome :)

marc-ed-raffalli · 2018-01-28T15:58:03Z

@mledoze That change supports multiple capitals as an array of strings, but it does not allow a mapping to multiple languages. I would propose something like:

capital: {
  en: ['Pretoria', 'Cape Town', 'Bloemfontein'],
  fr: ['Pretoria', 'Le Cap', 'Bloemfontein'],
  ru: ['Претория', 'Кейптаун', 'Блумфонтейн'],
  zh: ['比勒陀利亚', '开普敦', '布隆方丹']
}

edited with better example
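A consumer of such a per-language mapping would probably want a fallback when a language is missing. The Python sketch below illustrates one way to do that; the `capitals_for` helper and the choice of English as the default fallback are assumptions, and the sample values are copied from the proposal above.

```python
# Sample values copied from the proposed structure above.
CAPITAL = {
    "en": ["Pretoria", "Cape Town", "Bloemfontein"],
    "fr": ["Pretoria", "Le Cap", "Bloemfontein"],
}


def capitals_for(capital, lang, fallback="en"):
    """Return the capitals in `lang`, or in the fallback language if absent."""
    if lang in capital:
        return capital[lang]
    # Hypothetical policy: fall back to English, then to an empty list.
    return capital.get(fallback, [])


print(capitals_for(CAPITAL, "fr"))
print(capitals_for(CAPITAL, "de"))
```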

blumk · 2018-01-29T22:01:47Z

Looking at the current countries.json implementation I'd suggest refactoring translations into separate *.json files, e.g. fra.json:

{
  "cca3": "ZAF",
  "name": {"official": "R\u00e9publique d'Afrique du Sud", "common": "Afrique du Sud"},
  "capital": ["Pretoria", "Le Cap", "Bloemfontein"],
  "demonym": "Sud-Africain"
},
...

Creating separate language files brings the following benefits:

The current implementation has a "translations" property. Adding capitals would lead to confusion: translation of what (name, capital)? Dedicated files solve this problem (see the example above)
Languages shouldn't pollute the main countries.json file since there are 6000+ languages. Adding a new language is difficult (see Arabic translations PR)
You can give a single language file to a native speaker and she can confirm the correctness without knowing too much about the countries.json file
Better support for lazy loading (e.g. two supported languages means loading two *.json files)

@marc-ed-raffalli @mledoze What do you guys think of this approach?

marc-ed-raffalli · 2018-01-30T10:19:59Z

@blumk

Better support for lazy loading (e.g. two supported languages means loading two *.json files)

Do you mean 3 files? one pure data (language agnostic), and two translation files

blumk · 2018-01-30T10:25:46Z

@marc-ed-raffalli

Do you mean 3 files? one pure data (language agnostic), and two translation files

Yes. Adding support for two additional languages (countries.json defaults to English) means that you'll have to load countries.json and e.g. fra.json and slk.json in order to support ENG, FRA, SLK.

marc-ed-raffalli · 2018-01-30T13:56:10Z

@blumk
I like the idea of separate language files. I would even push the idea further and put all languages into dedicated files (including ENG), since not all apps necessarily support English :)

Let's see what feedback the other contributors have.

mledoze · 2018-01-30T22:01:06Z

@blumk

Looking at the current countries.json implementation I'd suggest refactoring translations into separate *.json files

I find this idea interesting, but what would be your proposal for the directories/files structure? Given your example, would the fra.json file be in a data/zaf folder?

blumk · 2018-01-30T22:15:32Z

@mledoze

I find this idea interesting, but what would be your proposal for the directories/files structure? Given your example, would the fra.json file be in a data/zaf folder?

I guess there are two options:

one translation file per language containing all 250 records: you can use a folder like data | translations | i18n containing all translation files
one translation file per language and record: in that case subfolders like data/zaf might be the way to go. However you end up having n*250 files

I might prefer one file per language containing all 250 records for the following reasons:

it is use-case driven: e.g. a drop-down list of all countries in French, or a map with all capitals in Dutch.
one language, one location: you can give a single language file to a translator, you'll only need to load one file

E.g.

{
  "cca3": "ZAF",
  "name": {"official": "R\u00e9publique d'Afrique du Sud", "common": "Afrique du Sud"},
  "capital": ["Pretoria", "Le Cap", "Bloemfontein"],
  "demonym": "Sud-Africain"
},
{
  "cca3": "FRA",
  "name": ...
},
...
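To illustrate the "one file per language" option, the following Python sketch overlays a language file onto language-agnostic base records, matching on `cca3`. The `localize` helper, the inline JSON strings standing in for countries.json and fra.json, and the `region` field in the base record are assumptions made for this example; the French record is the one shown above.

```python
import json

# Stand-ins for countries.json (language-agnostic) and fra.json.
BASE_JSON = '[{"cca3": "ZAF", "region": "Africa"}, {"cca3": "FRA", "region": "Europe"}]'
FRA_JSON = r"""[{
  "cca3": "ZAF",
  "name": {"official": "R\u00e9publique d'Afrique du Sud", "common": "Afrique du Sud"},
  "capital": ["Pretoria", "Le Cap", "Bloemfontein"],
  "demonym": "Sud-Africain"
}]"""


def localize(base_records, translated_records):
    """Overlay translated fields onto base records, matched by cca3."""
    by_code = {record["cca3"]: record for record in translated_records}
    # Records without a translation are returned unchanged.
    return [{**record, **by_code.get(record["cca3"], {})} for record in base_records]


merged = localize(json.loads(BASE_JSON), json.loads(FRA_JSON))
print(merged[0]["name"]["common"], merged[0]["region"])
```

Loading a second language would then mean reading one more file and calling `localize` again with it, which matches the lazy-loading argument made earlier in the thread.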

ColinH · 2018-06-15T11:05:55Z

My suggestion on "what to add next" would be NDD and IDD prefixes.

mledoze · 2018-07-23T11:37:02Z

My suggestion on "what to add next" would be NDD and IDD prefixes.

Thank you for your suggestion, I'll look into it.

mledoze · 2018-07-23T11:37:59Z

@blumk I would like to move forward with your proposal to refactor translations to separate files; could you please open a new issue with your proposal? Thank you very much.

lupinitylabs · 2020-03-10T14:25:50Z

One thing I would really like to see added is the most commonly used native language / administrative language used in the country.

For example, Norway has three different, unweighted native names:
"native":{"nno":{"official":"Kongeriket Noreg","common":"Noreg"},"nob":{"official":"Kongeriket Norge","common":"Norge"},"smi":{"official":"Norgga gonagasriika","common":"Norgga"}}

And there is no way to tell which one of the three is the commonly used, for example in language or country choosers. In this example, Norge would obviously be the way to go, not Noreg or Norgga.

Can we either have a key for the official native name or a weighted/ranked list of languages in the country?

Also, there are altSpellings:
"altSpellings":["NO","Norge","Noreg","Kingdom of Norway","Kongeriket Norge","Kongeriket Noreg"]

But I don't feel comfortable relying on the first items in the list being the most common. It's also pretty much a mix of abbreviations and languages, which is generally not very helpful in this case. In addition, Norgga from the native object above is missing from the altSpellings array, which is confusing.
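One way the requested ranking could work is sketched below in Python. The `native` object is copied from the comment above; the ranked language list and the `preferred_native_name` helper are hypothetical, since the dataset does not currently provide any such ordering.

```python
# Native names for Norway, copied from the comment above.
NATIVE = {
    "nno": {"official": "Kongeriket Noreg", "common": "Noreg"},
    "nob": {"official": "Kongeriket Norge", "common": "Norge"},
    "smi": {"official": "Norgga gonagasriika", "common": "Norgga"},
}

# Hypothetical ranking, most commonly used language first.
RANKED_LANGUAGES = ["nob", "nno", "smi"]


def preferred_native_name(native, ranked_languages):
    """Return the common name in the highest-ranked available language."""
    for lang in ranked_languages:
        if lang in native:
            return native[lang]["common"]
    # No ranked language matched: fall back to the first entry, if any.
    first = next(iter(native.values()), None)
    return first["common"] if first else None


print(preferred_native_name(NATIVE, RANKED_LANGUAGES))
```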

mledoze mentioned this issue Nov 11, 2013

Åland Islands is not a country #22

Closed

mledoze mentioned this issue Feb 13, 2014

Add the country states/regions #40

Closed

mledoze added a commit that referenced this issue May 7, 2014

remove population data (see #6 (comment))

5aef9ee

mledoze added a commit that referenced this issue May 9, 2014

add countries GeoJSON outline

ab6e0ce


rubengmurray mentioned this issue Sep 29, 2019

Adding GB Subdivisions Per ISO 3166-2:GB #357

Open

Repository owner locked and limited conversation to collaborators Oct 2, 2023

mledoze converted this issue into discussion #501 Oct 2, 2023

This issue was moved to a discussion.
