The Determinism Option
You can execute masks in a deterministic or non-deterministic manner.
In DataVeil, 'deterministic' mode means that a given input value shall always be masked to the same output value. This is true regardless of when the project is executed.
For example, if "Smith" happens to be masked as "Jones" in deterministic mode, then every occurrence of "Smith" shall be masked as "Jones". This holds even across separate masking runs (today, tomorrow, next year, etc.): "Smith" would always be masked as "Jones".
This is sometimes also referred to as 'consistent' masking.
The default Determinism for each mask is 'Use Project Determinism setting'.
Use Project Determinism setting
This setting means that the mask shall use the global Determinism value set under the project Settings tab.
If the Deterministic setting is chosen then a seed value is used as a basis from which to generate masked values consistently.
There are two options for how a deterministic seed is to be used, Fixed and Automatically generated, as described below:
A fixed seed is specified by the user. Use this when you need consistent masking results from one execution of a masking project to the next.
For example, testers in the QA department may have become accustomed to seeing specific masked values for specific records in the masked database. If a fixed deterministic seed is used then each time a refreshed database is masked the same familiar masked values shall be generated.
Another example is where multiple databases are masked at different times and masking must be consistent across all of them. That is, if "John" is masked to "William" in one database, then "John" must be masked to "William" in every other database too.
The fixed seed value will enable the same set of masked values to be generated for each instance of a mask on each masking run, provided that the mask has the same configuration parameters and that the fixed seed value is the same.
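As a rough illustration of why a fixed seed yields repeatable results, the following Python sketch derives each masked value from a keyed hash of the seed and the original value. It is not DataVeil's actual algorithm; the function mask_name, the replacement list and the seed string are hypothetical names chosen only for this example.

```python
import hashlib
import hmac

# Hypothetical replacement pool; a real mask would draw from a far larger list.
REPLACEMENT_NAMES = ["Jones", "William", "Garcia", "Chen", "Okafor", "Novak"]


def mask_name(original: str, seed: str) -> str:
    """Pick a replacement name using a keyed hash of the original value.

    Assumption: HMAC-SHA256 stands in for whatever the product really does.
    """
    digest = hmac.new(seed.encode("utf-8"), original.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(REPLACEMENT_NAMES)
    return REPLACEMENT_NAMES[index]


FIXED_SEED = "my fixed seed"  # hypothetical seed value
originals = ["Smith", "John", "Smith"]

# Two separate "masking runs" with the same seed and the same configuration.
first_run = [mask_name(value, FIXED_SEED) for value in originals]
second_run = [mask_name(value, FIXED_SEED) for value in originals]

print(first_run == second_run)  # the runs agree because the seed is the same
```

Because the output depends only on the seed and the original value, the same approach would also give matching results across different databases masked at different times, as long as the seed and the replacement list stay the same.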
To specify a fixed seed value, click on the button.
The seed value can consist of any printable characters. Characters are case-sensitive. Spaces are significant except that leading and trailing spaces are trimmed.
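The seed rules above can be sketched in Python as follows. The function name normalize_seed is hypothetical and this only mirrors the stated behaviour (leading and trailing spaces trimmed, everything else kept as typed); it is not taken from DataVeil itself.

```python
def normalize_seed(raw: str) -> str:
    """Trim leading and trailing spaces; keep case and inner spaces as typed."""
    return raw.strip(" ")


# Leading and trailing spaces do not change the seed.
print(normalize_seed("  My Seed  ") == normalize_seed("My Seed"))

# Case differences and inner spaces do produce a different seed.
print(normalize_seed("My Seed") == normalize_seed("my seed"))
print(normalize_seed("My Seed") == normalize_seed("My  Seed"))
```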
After clicking OK the displayed value in the Seed field is shown as asterisks to indicate that a value is present. The actual seed value is not displayed because seeds are considered sensitive.
Seed values are stored in the project file encrypted using the Project Key, which you shall also need to specify. If you later open the project file and click on the button to view or edit the seed value then you shall first be prompted for the Project Key, if it hasn't been entered already, to unlock the project file for access to sensitive fields.
With the Automatically generated option, a new deterministic seed is generated by DataVeil at the start of each masking run. The seed value is never revealed and is destroyed immediately at the end of the masking run. The generated seed value is at least 64 characters long and consists of upper and lower case characters, digits and special symbols.
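To give a feel for what such a seed might look like, here is a minimal Python sketch that produces a 64-character random seed containing all four character classes. The function generate_seed and the chosen alphabet are assumptions made for illustration; how DataVeil actually generates its seed is not documented here.

```python
import secrets
import string

SEED_LENGTH = 64  # the minimum length stated above
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_seed(length: int = SEED_LENGTH) -> str:
    """Return a random seed with upper case, lower case, digits and symbols."""
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Retry in the unlikely case that a character class is missing.
        if (
            any(c.islower() for c in candidate)
            and any(c.isupper() for c in candidate)
            and any(c.isdigit() for c in candidate)
            and any(c in string.punctuation for c in candidate)
        ):
            return candidate


seed = generate_seed()
print(len(seed))
```

The sketch uses the secrets module rather than random because the seed is meant to be unpredictable.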
The purpose of this setting is to enable the consistent masking of values within the currently executing masking project without revealing the seed value to anyone. This prevents someone from using a known seed value in an attempt to reverse the masking using techniques such as a dictionary attack.
For example, an attacker could create a table with two columns where each row contains a distinct surname in both columns. According to the US Census Bureau there are fewer than 200,000 distinct surnames in the USA so this is a relatively small list for data processing. If the masking seed becomes known to an attacker, they could easily create the same mask using the known seed and mask one of the columns. They would now have an index of original and masked values.
Therefore, although deterministic masking can be very useful, a fixed seed value must be kept secure. An automatic deterministic seed is implicitly secure and should be favored over a fixed seed. The only potential disadvantage of an automatic seed is that the set of consistent masked values shall be different each time the masking project is executed.
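The dictionary attack described above can be sketched in a few lines of Python. This is an illustration under assumptions: mask_value is a hypothetical keyed-hash mask, not DataVeil's algorithm, and the four surnames stand in for a full surname list.

```python
import hashlib
import hmac


def mask_value(original: str, seed: str) -> str:
    """Hypothetical deterministic mask: a truncated keyed hash of the value."""
    return hmac.new(seed.encode("utf-8"), original.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


leaked_seed = "known fixed seed"  # assumption: the attacker has learned the seed
surname_dictionary = ["Smith", "Jones", "Garcia", "Nguyen"]

# The attacker masks every dictionary entry and indexes the results.
reverse_index = {mask_value(name, leaked_seed): name for name in surname_dictionary}

# A masked value taken from the masked database can now be looked up.
masked_record = mask_value("Garcia", leaked_seed)
recovered = reverse_index.get(masked_record)
print(recovered)

# With a seed the attacker does not know, the lookup finds nothing useful.
unknown_seed_record = mask_value("Garcia", "seed the attacker never saw")
print(reverse_index.get(unknown_seed_record))
```

The attack depends entirely on knowing the seed, which is why a seed that is never revealed removes this avenue.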
Ignore case of original string values for calculating determinism
If this option is selected then the same deterministic masked value shall be generated for an original value regardless of its case. E.g. If the masked value for "ABC" is "XYZ" then the masked value shall be the same for "abc", "Abc", "abC", etc.
If this option is not selected then the masked value shall vary according to the case of the original value. E.g. If the masked value for "ABC" is "XYZ" then the masked value for "abc" could be "PAR", for "Abc" it could be "HSI", for "abC" it could be "ERA", etc.
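One way to picture the effect of this option is to normalize the case of the original value before it is used to derive the masked value. The Python sketch below does exactly that; mask_value, its ignore_case parameter and the seed are hypothetical and serve only to illustrate the behaviour described above.

```python
import hashlib
import hmac


def mask_value(original: str, seed: str, ignore_case: bool) -> str:
    """Hypothetical mask: keyed hash of the value, optionally case-insensitive."""
    basis = original.lower() if ignore_case else original
    digest = hmac.new(seed.encode("utf-8"), basis.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


SEED = "example seed"  # hypothetical seed value
variants = ["ABC", "abc", "Abc", "abC"]

# Option selected: every case variant shares one masked value.
ignoring_case = {mask_value(v, SEED, ignore_case=True) for v in variants}

# Option not selected: each case variant is masked independently.
respecting_case = {mask_value(v, SEED, ignore_case=False) for v in variants}

print(len(ignoring_case), len(respecting_case))
```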
This option may not appear for all masks. E.g. For the Redact mask neither the 'Seed' nor 'Ignore case..' fields appear because redaction does not rely on either setting.
The 'Not Deterministic' setting means that the generated masked values will be different each time the masking project is run.
Furthermore, some masks may generate different masked values for the same original value during the same masking run. E.g. The masked value for the first occurrence of "John" could be "Frank" but the second occurrence of "John" could be masked with "Tim".
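For contrast with deterministic masking, a non-deterministic mask can be pictured as a fresh random choice for every value, with no seed and no dependence on the original. The following Python sketch assumes a hypothetical function mask_name_randomly and a small replacement list; it is not DataVeil's implementation.

```python
import secrets

# Hypothetical replacement pool.
REPLACEMENT_NAMES = ["Frank", "Tim", "Oscar", "Ravi", "Leo", "Marek"]


def mask_name_randomly(original: str) -> str:
    """Return a random replacement; the original value is deliberately ignored."""
    return secrets.choice(REPLACEMENT_NAMES)


# The same original value may receive different masked values within one run.
masked = [mask_name_randomly("John") for _ in range(5)]
print(masked)
```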