You can specify that duplicate values in a column are always masked with the same value.
For example, suppose that the phone column contains 10 occurrences of "(415) 123-1234" and that you want all of these duplicate values to be masked with the same value.
If you select the "Synchronize duplicates" checkbox under the Duplicates tab, DataVeil ensures that all duplicates in the masked column are synchronized.
By default, duplicates are identified by the value of the column being masked alone. However, you can define duplicates to be identified by the combined value of multiple columns. In this case, you 'Add' each of the columns whose combined value is to be considered when identifying duplicates.
Note: Duplicates processing is not performed during a Quick Preview Run. If you would like to see a preview of masked values that takes duplicate values into consideration, you should perform a Complete Preview Run.
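The idea of identifying duplicates by one column or by a combination of columns can be sketched in a few lines of Python. This is only an illustration and not DataVeil's implementation: the function name, the "MASK-" prefix, the sample rows and the "name" column are all assumptions made for the example.

```python
import secrets

def mask_with_synchronized_duplicates(rows, target, key_columns=None):
    """Mask rows[target] so that rows sharing the same duplicate key share one masked value.

    key_columns defaults to the target column alone; passing several columns
    means duplicates are identified by their combined value.
    """
    key_columns = key_columns or [target]
    assigned = {}       # duplicate key -> masked value already chosen for it
    masked_rows = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in assigned:
            # Non-deterministic placeholder mask (hypothetical format).
            assigned[key] = "MASK-" + secrets.token_hex(8)
        masked_rows.append({**row, target: assigned[key]})
    return masked_rows

# Hypothetical sample data.
rows = [
    {"name": "Ann", "phone": "(415) 123-1234"},
    {"name": "Bob", "phone": "(415) 123-1234"},
    {"name": "Ann", "phone": "(212) 555-0100"},
]

# Duplicates identified by the phone column alone: rows 0 and 1 share a mask.
by_phone = mask_with_synchronized_duplicates(rows, "phone")

# Duplicates identified by name + phone: rows 0 and 1 are no longer duplicates.
by_name_and_phone = mask_with_synchronized_duplicates(rows, "phone", ["name", "phone"])
```

With the single-column key, the two rows holding "(415) 123-1234" receive the same masked value. With the combined key, they are treated as different values because the names differ.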
Another Way To Synchronize Duplicates
The "Synchronize duplicates" column setting is usually not necessary if all masks defined for a column are using deterministic mode. This is because deterministic mode in DataVeil means that the same masked value will always be generated for the same original value.
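To see why deterministic masking already keeps duplicates in step, consider the following minimal Python sketch. It assumes a keyed hash (HMAC-SHA256) as the source of determinism; DataVeil's actual algorithm is not described here, and the key and function name are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key, used only for this illustration.
SECRET_KEY = b"example-masking-key"

def deterministic_mask_digits(value, key=SECRET_KEY):
    """Replace each digit in value with a digit derived from an HMAC of the whole value.

    The same original value and key always give the same masked value;
    non-digit characters (brackets, spaces, hyphens) are left in place.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    masked = []
    digit_index = 0
    for ch in value:
        if ch.isdigit():
            # Reuse digest bytes cyclically if the value has many digits.
            masked.append(str(digest[digit_index % len(digest)] % 10))
            digit_index += 1
        else:
            masked.append(ch)
    return "".join(masked)

first = deterministic_mask_digits("(415) 123-1234")
second = deterministic_mask_digits("(415) 123-1234")
```

Because the masked value depends only on the original value and the key, every occurrence of "(415) 123-1234" is masked identically without any separate duplicates step.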
The following are circumstances in which you should consider using the Duplicates setting if you want to ensure that duplicates are preserved in the masked values:
1) Non-deterministic mode masks. Unless the Duplicates setting is used, all masked values are unpredictable, so duplicate original values will generally receive different masked values.
2) User SQL Value mask. DataVeil has no control over what you define in a User SQL Value mask. Therefore, if your SQL code produces non-deterministic masked values, the Duplicates setting is necessary if you want to preserve duplicates.
3) Shuffle mask. Using the Duplicates setting ensures that all rows that originally have duplicate values also have duplicate values after the column is shuffled, but now sharing a different value from the column. Note: Deterministic mode for Shuffle only guarantees that the rows will be shuffled into the same order every time for the exact same original table, without any consideration of duplicate values (unless the Duplicates setting is also used).
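The Shuffle case can be illustrated with a small Python sketch. It is an assumption-laden simplification rather than DataVeil's shuffle: the function name and the seed parameter are hypothetical, and it moves values between groups of duplicates instead of between individual rows.

```python
import random

def shuffle_preserving_duplicates(values, seed=None):
    """Reassign column values so that rows with equal originals stay equal afterwards.

    Each distinct value is mapped to another distinct value from the same
    column, so groups of duplicates move together.
    """
    rng = random.Random(seed)
    distinct = list(dict.fromkeys(values))   # distinct values, first-seen order
    rng.shuffle(distinct)
    # Map each value to its neighbour in the shuffled order. With two or more
    # distinct values, no value is mapped to itself.
    mapping = {
        value: distinct[(i + 1) % len(distinct)]
        for i, value in enumerate(distinct)
    }
    return [mapping[value] for value in values]

# Hypothetical column contents.
phones = ["111", "111", "222", "333", "222"]
shuffled = shuffle_preserving_duplicates(phones, seed=1)
```

Rows 0 and 1 still share a value after the call, as do rows 2 and 4, but each group now holds a different value from the column. A fixed seed plays the role of deterministic mode here: the same input and seed give the same result.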
In summary, if you are using deterministic mode masking (the default) and none of the circumstances above apply then you should avoid using the Duplicates setting because it introduces a redundant step into the masking process. The effect is harmless - it simply increases the masking execution time unnecessarily.
Please refer to The Deterministic Option for more details.
Ambiguous Masking of Duplicates
You may encounter a DataVeil masking error that states that an ambiguous masking of duplicates has been attempted.
Here is a simple example of how this can occur.
Suppose you create a Number Sequence mask for Col2 (100, 101, 102, 103, 104).
You also create a User SQL Value mask for Col1 that uses the masked values from Col2, and you define that duplicates in Col1 are to be maintained.
If you run this project you will get an error similar to:
Ambiguous masking definition. Multiple masked values have been generated for a single original value in column "dvtest.dbo.AmbiguousDuplicates.Col1". This may be because a DataVeil Duplicates definition for this column is ambiguous or the mask in this column references another column that returns multiple non-unique values for a single value in column "dvtest.dbo.AmbiguousDuplicates.Col1"
This is because DataVeil identified two distinct values in Col1 (A and B), each of which had a distinct corresponding value in Col2 (0 and 1 respectively).
When the User SQL Value mask attempts to use the masked values of Col2 (100, 101, 102, 103, 104), it becomes ambiguous which value to use for the distinct value 'A' (100, 101 or 102?) and which value to use for the distinct value 'B' (103 or 104?), given that the "Synchronize duplicates" setting was enabled on Col1.
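The conflict can be reproduced outside DataVeil with a short Python sketch. The row layout below (three rows with 'A' followed by two rows with 'B') is inferred from the description above and is an assumption, as is the function name; the check is an illustration, not DataVeil's own validation logic.

```python
from collections import defaultdict

def find_ambiguous_duplicates(original_values, candidate_masks):
    """Return the original values that were given more than one candidate masked value."""
    seen = defaultdict(set)
    for original, masked in zip(original_values, candidate_masks):
        seen[original].add(masked)
    return {
        original: sorted(masks)
        for original, masks in seen.items()
        if len(masks) > 1
    }

# Assumed table contents: Col1 originals and the masked Col2 values
# produced by the Number Sequence mask.
col1 = ["A", "A", "A", "B", "B"]
col2_masked = [100, 101, 102, 103, 104]

conflicts = find_ambiguous_duplicates(col1, col2_masked)
# conflicts == {"A": [100, 101, 102], "B": [103, 104]}
```

Every original value in Col1 ends up with several candidate masked values, so there is no single value that could be synchronized across its duplicates.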