Wrong capitalized letter in Portuguese names #72

Closed
kelvins opened this issue May 7, 2018 · 10 comments
@kelvins
Contributor

kelvins commented May 7, 2018

First of all, congrats on the great project.

I have found a small issue related to Portuguese names. By running the following code:

from nameparser import HumanName
name = HumanName('joao da silva do amaral de souza')
name.capitalize()
str(name)

I get the following result:

'Joao da Silva Do Amaral de Souza'

when it should be:

'Joao da Silva do Amaral de Souza'

The d from do should be lowercase.
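
To make the expected behavior concrete, here is a minimal standalone sketch of capitalization that keeps known prefixes lowercase. It does not use nameparser's internals: `LOWERCASE_PREFIXES` and `capitalize_name` are hypothetical names chosen for illustration, and the prefix set is only an assumption based on the names discussed in this issue.

```python
# Hypothetical prefix set for illustration; not nameparser's actual constants.
LOWERCASE_PREFIXES = {"da", "de", "do", "dos"}


def capitalize_name(full_name):
    """Capitalize each piece of a name, keeping known prefixes lowercase."""
    result = []
    for index, piece in enumerate(full_name.split()):
        lowered = piece.lower()
        # The first piece is treated as a given name, so it is always capitalized.
        if index > 0 and lowered in LOWERCASE_PREFIXES:
            result.append(lowered)
        else:
            result.append(lowered.capitalize())
    return " ".join(result)


print(capitalize_name("joao da silva do amaral de souza"))
# prints: Joao da Silva do Amaral de Souza
```

If "do" were missing from the set, the same function would produce 'Do', which is the symptom reported here.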

@derek73 derek73 added this to the v0.5.7 milestone Jun 14, 2018
@derek73
Owner

derek73 commented Jun 14, 2018

Thanks for the ticket. I can add "do" to the prefixes which will fix the capitalization. What should the correct parsing of the name be? This is what it does currently. Is that correct?

$ python tests.py "Joao da Silva do Amaral de Souza"
<HumanName : [
	title: '' 
	first: 'Joao' 
	middle: 'da Silva do Amaral de' 
	last: 'Souza' 
	suffix: ''
	nickname: ''
]>

@derek73 derek73 self-assigned this Jun 14, 2018
@derek73
Owner

derek73 commented Jun 16, 2018

If that's not the correct way to parse Portuguese names, just open another ticket and I'll fix it.

@derek73 derek73 closed this as completed Jun 16, 2018
@kelvins
Contributor Author

kelvins commented Jun 16, 2018

Hello @derek73, sorry for the delay. Yes, it is correct. Thank you.

@derek73
Owner

derek73 commented Aug 30, 2018

hey @kelvins, I'm working on fixing a different bug and my fix results in a new parsing of your example name where everything ends up in the last name:

$ python tests.py "Joao da Silva do Amaral de Souza"
<HumanName : [
	title: '' 
	first: 'Joao' 
	middle: '' 
	last: 'da Silva do Amaral de Souza' 
	suffix: ''
	nickname: ''
]>

My understanding of Portuguese last names is limited, but it seems like it could be more technically correct to include them all as last names since they appear to be joined by conjunctions. How do you feel about that? If it's obviously not correct to you, I'd like to know the reasons why it's not correct so I can figure out how to change my fix for #60.

@kelvins
Contributor Author

kelvins commented Aug 31, 2018

Hello @derek73, to be honest, in Brazilian Portuguese we rarely use the last name on its own. We usually use the entire surname, for example:

First name: João
Middle name: da Silva do Amaral
Last name: de Souza
"Entire Surname": da Silva do Amaral de Souza

Personally, I don't think it is correct to say da Silva do Amaral de Souza is the last name, but, as I have said, we do not often use only the last name, so I don't think it is a big problem.

If you're interested, here are some other examples of Brazilian names:

| First Name | Middle Name | Last Name |
|---|---|---|
| Kelvin | Salton | do Prado |
| João | da Silva | Siqueira |
| Pedro | Albuquerque | da Rosa |
| Joana | Silveira | Pinto |
| Arlindo | Luz | da Conceição |
| Jonas | Andriolli | da Fonseca |
| Maria | Pereira | de Queiroz |
| Tereza | Jesus | de Oliveira |
| Ana | Gonçalves | dos Santos |
| Luiza | Lima | Ferreira |
| Carla | Rodrigues | Almeida |
| José | do Nascimento | Carvalho |
| Taís | Araújo | Ribeiro |
| Daiane | | dos Santos |
| Flávio | | Prado |

I hope it helps.

@derek73
Owner

derek73 commented Aug 31, 2018

Thanks, that's what I needed. The table is super helpful. Here's my interpretation, let me know if I got anything wrong.

do/da/de/dos/das/des are always connected to the piece that follows them, so we can always treat each pair as one name piece. Then, the attribute bucket that the pair belongs to (first, middle, etc.) is determined by the position of the pair (e.g. da + following_piece) in the name, just as if the pair were a single name piece. As long as those articles stay connected to the thing that follows them, we should be able to parse out the position just like any other name.

I think I can do that. I think it just means that middle names can have prefixes too, and right now I only allow prefixes on last names.
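
The interpretation above can be sketched in a few lines of standalone Python. This is not nameparser's implementation, only an illustration of the idea: `PREFIXES`, `join_prefixes`, and `parse_name` are hypothetical names, the prefix set is an assumption limited to the forms seen in this thread, and titles, suffixes, and nicknames are ignored.

```python
# Assumed prefix set, limited to the forms that appear in this thread.
PREFIXES = {"da", "de", "do", "dos"}


def join_prefixes(pieces):
    """Merge each prefix with the piece that follows it."""
    joined = []
    pending = []  # prefixes waiting for the piece they attach to
    for piece in pieces:
        if piece.lower() in PREFIXES:
            pending.append(piece)
        else:
            joined.append(" ".join(pending + [piece]))
            pending = []
    if pending:
        # A trailing prefix has nothing to attach to; keep it as its own piece.
        joined.append(" ".join(pending))
    return joined


def parse_name(full_name):
    """Assign joined pieces to first/middle/last purely by position."""
    pieces = join_prefixes(full_name.split())
    first = pieces[0] if pieces else ""
    last = pieces[-1] if len(pieces) > 1 else ""
    middle = " ".join(pieces[1:-1])
    return {"first": first, "middle": middle, "last": last}


print(parse_name("Joao da Silva do Amaral de Souza"))
# prints: {'first': 'Joao', 'middle': 'da Silva do Amaral', 'last': 'de Souza'}
```

Because the prefix is merged before positions are assigned, a name like "José do Nascimento Carvalho" keeps "do Nascimento" together as the middle name, and "Daiane dos Santos" ends up with an empty middle name.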

Also, I noticed I only have the plural of one of those articles right now. I don't have "des" and "das" as prefixes. Are those possible? I only see a "dos" in your example names.

@kelvins
Contributor Author

kelvins commented Aug 31, 2018

Yes, that's exactly it. You can treat the pair (do/da/de/dos + following_piece) as a single name piece.

In Portuguese, I have never seen "des" or "das" as prefixes. If they exist at all, they must be very rare.

Here is a list of the most common surnames in South America.

Note that there are variations in surnames, for example:

João de Oliveira - João Oliveira
Flávio do Prado - Flávio Prado
Pedro da Silva - Pedro Silva
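
If an application wanted to match these variants loosely (the thread does not say that it should), one possible approach is to compare surnames with the prefix pieces dropped. The sketch below is an assumption for illustration only; `PREFIXES` and `surname_key` are hypothetical names and not part of nameparser.

```python
# Assumed prefix set, limited to the forms that appear in this thread.
PREFIXES = {"da", "de", "do", "dos"}


def surname_key(surname):
    """Lowercase a surname and drop prefix pieces, for loose comparison only."""
    kept = [piece.lower() for piece in surname.split() if piece.lower() not in PREFIXES]
    return " ".join(kept)


print(surname_key("de Oliveira") == surname_key("Oliveira"))  # prints: True
print(surname_key("do Prado") == surname_key("Silva"))        # prints: False
```

Two surnames that share a key are not necessarily the same family name, so a key like this is better suited to suggesting candidates than to merging records.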

derek73 added a commit that referenced this issue Aug 31, 2018
while continuing to support multiple names after a prefix #23
@derek73
Owner

derek73 commented Aug 31, 2018

I just released v1.0 which I think handles prefixes as I described. This is the output now for your test name:

$ python tests.py "Joao da Silva do Amaral de Souza"
<HumanName : [
	title: '' 
	first: 'Joao' 
	middle: 'da Silva do Amaral' 
	last: 'de Souza' 
	suffix: ''
	nickname: ''
]>

It's probably best then to leave "des" and "das" out of the constants if they are rare, since including them would more likely cause errors than produce the desired output.

Thanks again for your help on this. I like it when it works right. :) Let me know if I misinterpreted anything.

@kelvins
Contributor Author

kelvins commented Sep 1, 2018

Perfect, thank you so much @derek73.

@fabiuz

fabiuz commented Feb 3, 2019

In Portuguese, 'do' is a contraction of the preposition 'de' plus the article 'o'.
'Do' comes before a masculine noun.
'Da' comes before a feminine noun.
'Dos' is the plural of 'do'.
'Das' is the plural of 'da'.
'De' is a preposition.
'Des' does not exist in Portuguese.

Although 'do' is a contraction of the preposition 'de' plus the article 'o', in Portuguese you never see the uncontracted form:
[Incorrect] O github de o Fábio.
Instead, you see:
[Correct] O github do Fábio.
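
As a rough illustration of the contraction rule described above, the following naive sketch rewrites 'de' followed by a definite article into its contracted form. `contract_de` is a hypothetical helper, it only handles lowercase input, and it ignores any grammatical contexts where the contraction is not made.

```python
import re

# Matches "de" followed by a definite article (o, a, os, as) as whole words.
_DE_ARTICLE = re.compile(r"\bde (o|a|os|as)\b")


def contract_de(phrase):
    """Naively contract 'de' + article into do/da/dos/das."""
    return _DE_ARTICLE.sub(lambda match: "d" + match.group(1), phrase)


print(contract_de("O github de o Fábio."))
# prints: O github do Fábio.
```

The four possible results are 'do', 'da', 'dos', and 'das'; there is no article that would produce 'des'.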

In Portuguese, there is no such tradition of 'Title', 'first', 'middleName', 'lastname'.

I speak Brazilian Portuguese.
