Consistent Masking
It is a common requirement that all occurrences of an original value be masked with the same masked value.
Within DataVeil this is known as 'deterministic' masking. It is also sometimes known as 'consistent' masking.
There are several ways to achieve consistent masking within DataVeil, depending on the required scope, such as whether consistency is required only within a column or across an entire database. Each method is described below.
Duplicates Column Option
If you require consistent masking only within a single column then you can enable the Synchronize duplicates option for that column.
DataVeil shall then ensure that all duplicates of an original value receive the same masked value in that column.
This is particularly useful if you want to use non-deterministic masking mode and still have consistency within the column.
Note: If you use Deterministic mode masking (see below) for a column then also enabling the Synchronize duplicates option provides no additional consistency. It only introduces an unnecessary duplicates processing step that slows down the masking for that column.
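The idea behind synchronizing duplicates can be sketched in a few lines of Python. This is only an illustration of the concept and not DataVeil's implementation: the function name mask_column_sync_duplicates and the random 8-letter masked values are assumptions made for the example.

```python
import random
import string


def mask_column_sync_duplicates(values, rng=None):
    """Mask a column with random (non-deterministic) values while
    giving every duplicate of an original value the same masked value."""
    rng = rng or random.Random()  # unseeded: results differ on every run
    seen = {}                     # original value -> masked value, for this column only
    masked = []
    for value in values:
        if value not in seen:
            # First occurrence: generate a fresh random masked value.
            seen[value] = "".join(rng.choice(string.ascii_uppercase) for _ in range(8))
        # Repeat occurrences reuse the value generated earlier.
        masked.append(seen[value])
    return masked


column = ["Smith", "Jones", "Smith", "Lee", "Jones"]
masked = mask_column_sync_duplicates(column)
print(masked)
```

Both occurrences of "Smith" receive the same masked value within this run, but a second run would produce different masked values because nothing is seeded. The lookup table is also the extra processing step that the note above refers to.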
Deterministic Masking
DataVeil offers Deterministic and Non-Deterministic masking modes.
Deterministic mode in DataVeil is always repeatable. This means that a given original value shall always produce the same masked value on repeated executions. For example, if you run the project today, tomorrow or next year then the masked values shall always be the same for the corresponding original values. This is true even if you run the project on different databases. The only caveat is that the same deterministic seed must be used each time. If a different seed is used then an entirely different set of masked values shall be generated.
In general, using Deterministic mode masking is the easiest way to achieve consistent masking throughout a database and across multiple databases.
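The repeatability described above can be illustrated with a keyed hash. The following Python sketch is not how DataVeil derives masked values; the function mask_deterministic, the SURNAMES list and the seed strings are all hypothetical and exist only to show the role of the seed.

```python
import hashlib
import hmac

# Hypothetical pool of replacement values for the example.
SURNAMES = ["Baker", "Carter", "Dixon", "Ellis", "Foster", "Grant", "Hayes", "Irwin"]


def mask_deterministic(value, seed):
    """Return a masked value that depends only on the original value and the seed."""
    digest = hmac.new(seed.encode("utf-8"), value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(SURNAMES)
    return SURNAMES[index]


# Same value and same seed: always the same masked value, on any run or database.
first_run = mask_deterministic("Smith", "seed-A")
second_run = mask_deterministic("Smith", "seed-A")
print(first_run == second_run)

# A different seed produces a different overall set of masked values.
originals = [f"name{n}" for n in range(200)]
with_seed_a = [mask_deterministic(v, "seed-A") for v in originals]
with_seed_b = [mask_deterministic(v, "seed-B") for v in originals]
print(with_seed_a == with_seed_b)
```

Because no state is kept between calls, the same mapping holds in every table and every database, provided the seed is unchanged.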
Non-Distinct
When using Deterministic mode, almost all DataVeil masks produce non-distinct values.
This means that multiple different original values can produce the same masked value. These are also sometimes referred to as collisions. For example, using Deterministic mode, "Smith" may always mask to "Baker", but "Jones" could also mask to "Baker".
This is normal and reflects the real world; for example, it is common for different people to have the same name.
It is very important to understand this non-distinct property because it is the cause of some misunderstandings and runtime masking errors such as unique constraint violations.
DataVeil shall generally emit a compile warning if a non-distinct mask is used on a column with a unique constraint. However, if the column is part of a multi-column unique constraint then a warning may not be generated, because in such cases it may not be feasible to predict whether duplicates in one of those columns would cause a unique constraint violation. Note: Performing a Complete Preview run will validate unique constraints whereas a Quick Preview will not.
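A short Python sketch shows why a non-distinct mask can violate a unique constraint. It is an illustration under stated assumptions only: the three-name POOL and the function mask_non_distinct are hypothetical and do not represent any DataVeil mask.

```python
import hashlib
from collections import Counter

# Hypothetical replacement pool, deliberately smaller than the input set.
POOL = ["Baker", "Carter", "Dixon"]


def mask_non_distinct(value):
    """Deterministically map a value into a small pool; collisions are expected."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return POOL[digest[0] % len(POOL)]


# Five unique original values, as a column with a unique constraint would hold.
originals = ["Smith", "Jones", "Lee", "Patel", "Garcia"]
masked = [mask_non_distinct(v) for v in originals]

# With five inputs and only three possible outputs, at least two inputs must collide.
duplicates = {value: count for value, count in Counter(masked).items() if count > 1}
print(masked)
print(duplicates)
```

Every masked value that appears in duplicates would be rejected by a unique constraint on the column, even though each original value was unique and the mask itself is fully repeatable.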
Distinct
Masks that can produce distinct values (one-to-one mapping, no collisions) when used in Deterministic mode are:
• Bank Account Number (Belgium)
• National Identifiers
• Primary Account Number
• Randomize (choose Distinct option)
• Randomize Hex (choose Distinct option)
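One way to picture a distinct (one-to-one) deterministic mapping is a seeded permutation of the whole value domain. The Python sketch below is an assumption-laden illustration, not the algorithm used by the masks listed above; build_distinct_mask, the 4-digit domain and the seed value are hypothetical.

```python
import random


def build_distinct_mask(seed, digits=4):
    """Build a one-to-one mask over all zero-padded numbers with the given digit count."""
    domain = 10 ** digits
    permutation = list(range(domain))
    # A seeded shuffle yields the same permutation on every run with the same seed.
    random.Random(seed).shuffle(permutation)

    def mask(value):
        # Each original maps to exactly one position in the permutation,
        # so two different originals can never share a masked value.
        return f"{permutation[int(value)]:0{digits}d}"

    return mask


mask = build_distinct_mask(seed=12345)
originals = [f"{n:04d}" for n in range(10000)]
masked = [mask(v) for v in originals]

print(len(set(masked)) == len(originals))
```

Because a permutation never maps two inputs to the same output, a mask built this way is safe to use on a column with a unique constraint, in contrast to the non-distinct case.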
Dependencies
Dependencies are defined either by foreign keys in the DBMS or by the user within a DataVeil project.
DataVeil processes user-defined dependencies exactly as it processes foreign keys. User-defined dependencies exist as a convenience, allowing the user to define a dependency without having to alter the DBMS schema with a new foreign key definition.
Whenever a mask is defined for any column that is part of a dependency, DataVeil shall identify all dependent columns and apply consistent masking for them automatically.
Please review all the topics (not just the Overview) in the Dependencies chapter for details.
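The effect of consistent masking across a dependency can be sketched in Python with a parent and a child table. This is a conceptual illustration only and does not show how DataVeil processes dependencies; the table contents, the mask_key function and the seed are hypothetical.

```python
import hashlib
import hmac


def mask_key(value, seed="example-seed"):
    """Deterministically mask a key value (hypothetical helper for this example)."""
    digest = hmac.new(seed.encode("utf-8"), value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "C" + digest[:10]


# Parent table and a child table that references it through customer_id.
customers = [
    {"customer_id": "C001", "name": "Smith"},
    {"customer_id": "C002", "name": "Jones"},
]
orders = [
    {"order_id": 1, "customer_id": "C001"},
    {"order_id": 2, "customer_id": "C002"},
    {"order_id": 3, "customer_id": "C001"},
]

# Apply the same mask to the key column on both sides of the dependency.
masked_customers = [{**row, "customer_id": mask_key(row["customer_id"])} for row in customers]
masked_orders = [{**row, "customer_id": mask_key(row["customer_id"])} for row in orders]

# Every masked child row should still point at an existing masked parent row.
parent_keys = {row["customer_id"] for row in masked_customers}
orphans = [row for row in masked_orders if row["customer_id"] not in parent_keys]
print(orphans)
```

An empty orphans list means the relationship between the two tables survived masking, which is the outcome that consistent masking of dependent columns is intended to guarantee.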