Ask the expert forum > Trying to validate foreign given names, advice for good quality name lists required

Message received from registered member: 03-April-09:

I'm trying to validate the given names in my customer's contact information. All the addresses are located in Germany, but they contain both German and foreign given names.

For the validation I used a list of given names from my vendor, but the list is too incomplete (it is missing foreign given names), so I can't use it any longer. I'm looking for a better list from another vendor, or another way to do it.

Does anybody have experience with good first-name lists? Any other suggestions on how to validate given names?

It does boil down to what the questioner means by "validate". He is doubtless aware that the naming laws in Germany are strict: for given names, German law states that they must be “by nature” given names (i.e. may not include family names, common nouns and so on); that they must be gender-specific (or, if the first given name is not gender-specific, a second, gender-specific, name must be given); and that the given name may not have the potential to cause harm to the namee (e.g. Mickey Mouse, Kain, Osama bin Laden).

For non-Germans in Germany these laws don't apply, and there are very few countries which have similar rules. Thus, in Germany for Germans it is fairly simple to check names for gender and (to some degree) for spelling, and to parse a given name from a surname/family name. This cannot be done for foreign names: French male Jean or British female Jean? Gene or Jean or Gean? Cliff Richard or Richard Cliff? And so on.

I do collect given names: http://www.grcdi.nl/givennames.htm - but the degree to which such lists can be used to process names is limited.

Apr 3, 2009 | Registered Commenter Graham Rhind

In some ways Mr. Rhind is right and in some ways he is not. Indeed, very few countries have rules similar to those that apply in Germany, and according to the German naming laws a given name must not harm the namee.
On the other hand, German naming laws have become much more liberal in recent years. One reason is that naming conventions are based less on statutes than on case-by-case interpretation by judges. In December 2008 the “Bundesverfassungsgericht” ruled that a given name does not need to be gender-specific and that no second (gender-specific) name needs to be given. The name in question was “Kiran”, which is known in India as both a male and a female given name. According to a judgement of the “Bundesgerichtshof” in April 2008, even family names are no longer prohibited as given names.
The most important thing is growing ambiguity - caused by judgements like the ones mentioned above and by migration of people and data as well. Migration of data means that databases are becoming more and more international, including records with given names from all over the world. That means that just a single given name can’t be validated without its context: In some respects it is only the context that constitutes the meaning of a given name and it needs knowledge and interpretation to draw the right conclusion.
At Human Inference (http://www.humaninference.com) we have more than 20 years of experience in this field, so we know that validating given names is quite a tricky thing.
We are eagerly awaiting the first data sets with “Metallica Carlsson” and “Budweiser Svensson”, because since 2008 the names of music bands or trademarks can be used as given names in Sweden. This decision was taken by the Swedish financial authorities, which are responsible for the registration offices in Sweden and thus for the approval of given names as well.
Michael Grigat

Apr 17, 2009 | Registered Commenter M Grigat

Metallica is already here: the Swedish statistical bureau provides a list of the frequency by gender of all given names, and on the one I got last summer Metallica is counted at 2 females.

Actually the Swedes are very liberal with the citizen registration and the access to these data, which means that private entities may integrate with the citizen register and obtain a full citizen ID and other data considered very personal in most countries.

But in general this is surely about probabilities, and the possibilities vary between countries. In my country, Denmark, the statistical bureau also provides lists with counts of given names by gender and of surnames.

So I know that “Henrik” is counted at 43,989 males, 0 females and 1 surname. With Danish data there is a pretty good probability that I am a male using “Henrik” as my given name, written as the first word in my full name.

For “Kim”, though, there are 30,941 males, 316 females and 115 surnames in Denmark. There is a fair probability that “Kim” is a male given name in Denmark, but the picture will be different in Anglophone countries, where “Kim” is a female nickname, and different again in Korea, where “Kim” is a very frequent surname written as the first word in a full name.
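
To make the probability idea concrete, here is a minimal sketch in Python that turns the Danish counts quoted above into per-role probabilities. The table layout, the role labels and the function names are my own assumptions for illustration, not any vendor's or statistical bureau's format, and the result only holds for the country the counts come from.

```python
from typing import Dict, Optional

# Counts quoted in this post for Denmark. The role labels
# ("male_given", "female_given", "surname") are illustrative assumptions.
DANISH_COUNTS: Dict[str, Dict[str, int]] = {
    "Henrik": {"male_given": 43989, "female_given": 0, "surname": 1},
    "Kim": {"male_given": 30941, "female_given": 316, "surname": 115},
}


def role_probabilities(
    name: str, counts: Dict[str, Dict[str, int]] = DANISH_COUNTS
) -> Dict[str, float]:
    """Return the share of each role among all recorded uses of `name`.

    An empty dict means the name is not in the table (or has no counts),
    which is "unknown", not "invalid".
    """
    roles = counts.get(name)
    if not roles:
        return {}
    total = sum(roles.values())
    if total == 0:
        return {}
    return {role: n / total for role, n in roles.items()}


def most_likely_role(
    name: str, counts: Dict[str, Dict[str, int]] = DANISH_COUNTS
) -> Optional[str]:
    """Return the role with the highest share, or None if the name is unknown."""
    probs = role_probabilities(name, counts)
    if not probs:
        return None
    return max(probs, key=probs.get)


if __name__ == "__main__":
    for name in ("Henrik", "Kim", "Metallica"):
        print(name, most_likely_role(name), role_probabilities(name))
```

A name missing from the table comes back as unknown rather than invalid, which matters for the foreign given names in the original question. For Korean or Anglophone data a different counts table would have to be passed in, since the same word can point to a different role there.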