Person Given Name Mask

This mask shall generate given names, such as "Anne", "John", "William", etc.

The longest given name is 11 characters. If the receiving column is narrower than a generated given name, the generated name shall be truncated to fit.

Every generated name consists only of alphabetic characters; it contains no hyphens or apostrophes.
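A minimal Python sketch of these two rules follows. The helper names `fit_to_column` and `is_valid_generated_name` are hypothetical, and treating "alphabetic" as the ASCII letters A-Z is an assumption; this is not DataVeil's implementation.

```python
import re

MAX_NAME_LENGTH = 11  # longest generated given name, as stated on this page


def fit_to_column(name: str, column_width: int) -> str:
    """Truncate a generated name so that it fits the receiving column (hypothetical helper)."""
    if column_width < 1:
        raise ValueError("column_width must be positive")
    return name[:column_width]


def is_valid_generated_name(name: str) -> bool:
    """True if the name is 1-11 ASCII letters (assumption): no hyphens, apostrophes or spaces."""
    return len(name) <= MAX_NAME_LENGTH and re.fullmatch(r"[A-Za-z]+", name) is not None


print(fit_to_column("Christopher", 8))  # Christop
```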


Select from

At least one of the Male or Female checkboxes must be selected.

If the male 'Common names only' checkbox is selected, a male name shall be selected only from the 329 most popular male names. Otherwise, the name shall be chosen from all 800 male names, which may result in many exotic and unfamiliar names being selected.

If the female 'Common names only' checkbox is selected, a female name shall be selected only from the 355 most popular female names. Otherwise, the name shall be chosen from all 2,000 female names, which may result in many exotic and unfamiliar names being selected.
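The effect of the 'Common names only' checkboxes can be sketched as restricting the candidate pool to the first N names of a popularity-ordered list, using the pool sizes stated above. The sorted name list, the uniform choice within the pool, and the name `pick_name` are all assumptions for illustration; this page does not say how DataVeil samples within a pool.

```python
import random

# Pool sizes stated on this page: (gender, common_only) -> number of candidate names.
POOL_SIZES = {
    ("male", True): 329, ("male", False): 800,
    ("female", True): 355, ("female", False): 2000,
}


def pick_name(names_by_popularity: list[str], gender: str, common_only: bool,
              rng: random.Random) -> str:
    """Pick uniformly (assumption) from the first N names of a popularity-sorted list."""
    try:
        n = POOL_SIZES[(gender, common_only)]
    except KeyError:
        raise ValueError(f"unknown gender: {gender!r}") from None
    return rng.choice(names_by_popularity[:n])
```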

The Weighting fields specify the probability ratio used whenever the gender of the masked given name must be guessed. This occurs when either:

a) the 'Attempt to match original gender' option is not selected; or

b) the 'Attempt to match original gender' option is selected with the 'Guess' parameter, AND the gender of the original given name is unisex or unknown.
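The two cases above can be expressed as a single predicate. This sketch assumes, hypothetically, that the 'Guess' parameter can be modelled as a boolean and that the lookup result is 'male', 'female', 'unisex' or None for unknown. It only decides whether the Weighting is consulted; it says nothing about DataVeil's behavior in other cases.

```python
from typing import Optional


def must_guess_gender(match_original: bool, guess_enabled: bool,
                      original_gender: Optional[str]) -> bool:
    """Return True when the Weighting fields decide the gender of the masked name.

    original_gender: 'male', 'female', 'unisex', or None when unknown (assumed encoding).
    """
    if not match_original:
        return True  # case a): gender matching is switched off
    # case b): matching is on, 'Guess' is used, and the lookup was inconclusive
    return guess_enabled and original_gender in (None, "unisex")
```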

For example, if the Male Weighting is 1 and the Female Weighting is 2, the Male:Female ratio shall be 1:2, i.e. roughly one male name for every two female names. This ratio approximates the probability of each gender of the given name returned by the mask whenever an original name's gender must be guessed.

The valid range of each Weighting value is from 1 to 1,000,000.

Some other weighting examples:

2:5 (2 male names for every 5 female names)

100:96 (100 male names for every 96 female names)

1:1000000 (1 male name for every 1,000,000 female names)

Please note that these weightings define probabilities (approximations) only; they do not guarantee that the generated names will follow the exact gender ratio.
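A Male:Female weighting m:f corresponds to a probability of m / (m + f) that a guessed name is male; for example, 1:2 gives 1/3 and 2:5 gives 2/7. The sketch below validates the documented range of 1 to 1,000,000 and draws a gender with that probability. The function names are hypothetical, and using Python's `random.Random` is an illustrative choice, not a description of DataVeil's internals.

```python
import random

MIN_WEIGHT, MAX_WEIGHT = 1, 1_000_000  # documented valid range for each Weighting


def male_probability(male_weight: int, female_weight: int) -> float:
    """Convert a Male:Female weighting into the probability of a male name."""
    for w in (male_weight, female_weight):
        if not MIN_WEIGHT <= w <= MAX_WEIGHT:
            raise ValueError(f"weighting {w} is outside {MIN_WEIGHT}..{MAX_WEIGHT}")
    return male_weight / (male_weight + female_weight)


def guess_gender(male_weight: int, female_weight: int, rng: random.Random) -> str:
    """Draw one gender; over many draws the ratio approaches male_weight:female_weight."""
    p_male = male_probability(male_weight, female_weight)
    return "male" if rng.random() < p_male else "female"


rng = random.Random(42)
draws = [guess_gender(2, 5, rng) for _ in range(70_000)]
print(draws.count("male"), draws.count("female"))  # near, but generally not exactly, 20000 and 50000
```

Because each draw is independent, the observed counts fluctuate around the expected ratio, which is why the weightings are only approximate.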


Case of new names

Specifies the case of the masked values to be generated.

For example, "James" is in Title Case, "JAMES" is in Upper Case and "james" is in Lower Case.
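The three case options can be sketched as simple string transformations. The option identifiers "title", "upper" and "lower" are hypothetical. Because generated names are single alphabetic words, Python's `str.capitalize()` serves as Title Case here.

```python
def apply_case(name: str, case: str) -> str:
    """Return the generated name in the requested case (hypothetical option names)."""
    if case == "title":
        return name.capitalize()  # "jAMES" -> "James"
    if case == "upper":
        return name.upper()
    if case == "lower":
        return name.lower()
    raise ValueError(f"unknown case option: {case!r}")
```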

Note: The case of the original name is not relevant to the selection of the masked name. For example, if the original name "jane" is masked as "Sally", then "Jane" and "JANE" shall also be masked as "Sally" (assuming Deterministic mode masking is used).
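One way to obtain the case-insensitive deterministic behavior described in this note is to fold the original name's case before deriving a stable index into the name pool. The SHA-256 hashing scheme, the `seed` parameter and the sample pool below are hypothetical illustrations; DataVeil's actual deterministic algorithm is not described on this page.

```python
import hashlib


def deterministic_index(original: str, pool_size: int, seed: str = "example-seed") -> int:
    """Map an original name to a stable pool index; the original's case is ignored."""
    key = f"{seed}|{original.casefold()}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % pool_size


pool = ["Sally", "Anne", "Grace", "Ruth"]  # hypothetical masked-name pool
for original in ("jane", "Jane", "JANE"):
    print(original, "->", pool[deterministic_index(original, len(pool))])
# all three lines print the same masked name
```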


Attempt to match original gender

If this option is selected, DataVeil shall first look up each original name in an attempt to identify its gender, and then generate a masked given name of the same gender. This is generally reliable, particularly for given names used in the United States; however, it is not perfect because a great many given names are used by both genders. Please refer to 'Unisex given names' below.

Any non-alphabetic characters in a name shall be ignored for the purposes of the lookup.
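Ignoring non-alphabetic characters amounts to normalizing the name before looking it up. In this sketch the `GENDER_LOOKUP` table and its entries are hypothetical stand-ins for DataVeil's own name data, and case-insensitive matching is an assumption consistent with the note on case above.

```python
from typing import Optional

# Hypothetical sample entries; the real lookup data belongs to DataVeil.
GENDER_LOOKUP = {"maryanne": "female", "john": "male", "jordan": "unisex"}


def lookup_key(name: str) -> str:
    """Drop non-alphabetic characters and fold case before the lookup."""
    return "".join(ch for ch in name if ch.isalpha()).casefold()


def lookup_gender(name: str) -> Optional[str]:
    """Return 'male', 'female', 'unisex', or None when the name is not found."""
    return GENDER_LOOKUP.get(lookup_key(name))
```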

If this option is not selected and both Male and Female are selected in the 'Select from' panel, the gender of the masked given name shall be determined by the Weighting. This applies in both deterministic and random modes.

Note: As mentioned, this lookup method is generally reliable; however, the gender of some names is ambiguous, and a name of the wrong gender will occasionally be returned. If this is not a critical issue, you may simply use the 'Attempt to match original gender' option. If, however, the masked names must match the gender of the original names exactly, the ideal method is to define two 'Person Given Name' masks: one that generates only male names and another that generates only female names.

The first mask should define a Where condition that positively identifies one gender, for instance by testing another column that may be called something like 'gender_columnname' or 'sex_columnname', e.g. "WHERE gender_columnname = 'M'". The second 'Person Given Name' mask should have no Where condition, so that all remaining rows are masked using the other gender.
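The row routing produced by this two-mask setup can be modelled as follows. The column names `first_name` and `gender_columnname`, the value 'M', and the callables standing in for the two masks are hypothetical; in DataVeil the routing is configured through the masks' Where conditions, not written as code.

```python
from typing import Callable

Row = dict  # one table row: column name -> value


def apply_two_masks(rows: list[Row], name_col: str, gender_col: str,
                    male_mask: Callable[[str], str],
                    female_mask: Callable[[str], str]) -> None:
    """Emulate the two-mask setup, modifying rows in place."""
    for row in rows:
        if row.get(gender_col) == "M":
            # First mask: WHERE gender_columnname = 'M', generates male names only
            row[name_col] = male_mask(row[name_col])
        else:
            # Second mask: no Where condition, so every remaining row gets a female name
            row[name_col] = female_mask(row[name_col])


rows = [
    {"first_name": "John", "gender_columnname": "M"},
    {"first_name": "Mary", "gender_columnname": "F"},
    {"first_name": "Sam", "gender_columnname": None},
]
apply_two_masks(rows, "first_name", "gender_columnname", lambda _: "William", lambda _: "Anne")
```

A consequence of the second mask having no Where condition is that rows whose gender column is NULL, or holds any value other than 'M', also receive a name of the other gender.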


Data source information

The given name values are principally based on data from the 2000 United States census. DataVeil has categorized these names as male, female or unisex. Please refer to the subsection "Unisex given names" below.

The DataVeil Person Given Name Mask returns:

800 of the most popular Male given names. The longest name is 11 characters.

2,000 of the most popular Female given names. The longest name is 11 characters.


Unisex given names

Strictly speaking, a great many names are unisex in the sense that they are used by both men and women and are recorded as such in a census.

However, such a simple interpretation would classify an unrealistically high number of names as unisex. For example, in the United States census of 2010, 82 of the 100 most common male given names (James, John, Robert, etc.) are also recorded as female given names.

Therefore, DataVeil assigns a name to a gender only if that gender uses the name at least twice as frequently as the other gender; otherwise, the name is treated as unisex.
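The two-to-one rule can be written as a small classifier. Whether DataVeil compares raw census counts or normalized frequencies is not stated, so the inputs here are simply two comparable usage figures; the "unknown" result for a name with no recorded usage is an added assumption.

```python
def classify_name(male_usage: float, female_usage: float) -> str:
    """Classify a name as 'male', 'female' or 'unisex' using a 2x usage threshold."""
    if male_usage <= 0 and female_usage <= 0:
        return "unknown"  # assumption: no recorded usage for either gender
    if male_usage >= 2 * female_usage:
        return "male"
    if female_usage >= 2 * male_usage:
        return "female"
    return "unisex"  # neither gender uses the name at least twice as often
```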


Null and Empty Value Handling

Deterministic mode: A null or empty string is preserved.

Non-Deterministic mode: A null or empty string is overwritten with a random given name value.
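The two null-handling rules can be sketched as a wrapper around the masking function. The callables `mask_fn` and `random_name_fn` are hypothetical placeholders, and only the literal empty string is treated as empty here (whether whitespace-only values count as empty is not stated).

```python
from typing import Callable, Optional


def mask_given_name(value: Optional[str], deterministic: bool,
                    mask_fn: Callable[[str], str],
                    random_name_fn: Callable[[], str]) -> Optional[str]:
    """Apply the documented null/empty rules before normal masking."""
    if value is None or value == "":
        # Deterministic: preserve NULL/empty; non-deterministic: overwrite with a random name
        return value if deterministic else random_name_fn()
    return mask_fn(value)
```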


Pre/Post SQL

The format of names can be adjusted if required. Please refer to the Pre/Post SQL Option.