Duplicates Option 

You can specify that duplicate values in a column are always to be synchronized with the same masked value.

For example, suppose that the phone column contains 10 occurrences of "(415) 123-1234" and that you want all of these duplicate values to be masked with the same value.

If you select the "Synchronize duplicates" checkbox under the Duplicates tab then DataVeil shall ensure that all duplicates in the masked column shall be synchronized.

By default, duplicates in the masked column are defined simply as duplicates within the column being masked. However, you can define duplicates to be identified by the combined value of multiple columns. In this case you 'Add' each of the columns whose combined value is to be considered for identifying duplicates.


Note: Duplicates processing is not performed during a Quick Preview Run. If you would like to see a preview of masked values that takes into consideration of duplicate values then you should perform a Complete Preview Run.

Another Way To Synchronize Duplicates

The "Synchronize duplicates" column setting is usually not necessary if all masks defined for a column are using deterministic mode. This is because deterministic mode in DataVeil means that the same masked value will always be generated for the same original value.

The following are circumstances in which you should consider using the Duplicates setting if you wanted to ensure that duplicates are preserved with masked values:

1) Non-deterministic mode masks. Unless the Duplicates setting is used then all masked values are unpredictable.

2) User SQL Value mask. DataVeil has no control over what you define in a User SQL Value mask. Therefore, if your SQL code produces non-deterministic masked values then the Duplicates setting would be necessary if you wanted to preserve duplicates.

3) Shuffle mask. Using the Duplicates setting will ensure that all rows that originally have duplicate values will also have duplicate values after the column is shuffled but with a different value from the column. Note: Deterministic mode for Shuffle only guarantees that the rows will be shuffled into the same order every time for the exact same original table without any consideration of duplicate values (unless the Duplicates setting is also used).

In summary, if you are using deterministic mode masking (the default) and none of the circumstances above apply then you should avoid using the Duplicates setting because it introduces a redundant step into the masking process. The effect is harmless - it simply increases the masking execution time unnecessarily.

Please refer to The Deterministic Option for more details.


Ambiguous Masking of Duplicates

You may encounter a DataVeil masking error that states that an ambiguous masking of duplicates has been attempted.

Here is a simple example of one possibility of how this can occur.

Suppose you create a Number Sequence mask for Col2 (100, 101, 102, 103, 104).

You also create a User SQL Mask for Col1, with duplicates synchronized, to use the masked values from Col2.


As you can see in the original values above, there is a 1-to-1 mapping of each distinct value in Col1 to Col2.
i.e. In Col1 the value 'A' always has the '0' in Col2.

If you run this project you will get an error similar to:

Ambiguous masking definition. Multiple masked values have been generated for a single original value in column "dvtest.dbo.AmbiguousDuplicates.Col1". This may be because a DataVeil Duplicates definition for this column is ambiguous or the mask in this column references another column that returns multiple non-unique values for a single value in column "dvtest.dbo.AmbiguousDuplicates.Col1"

This is because after running the Number Sequence mask on Col2, the distinct mapping is lost. Col2 will have masked values of (100, 101, 102, 103, 104).

Therefore, since Col1 has 'Synchronize duplicates' enabled and Col1's mask is to use the masked value of Col2, it is no longer possible to determine which masked value to use from Col2 for Col1.
E.g. Should all instances (duplicates) of 'A" be replaced with 100, 101 or 102? Similarly, should all instances of 'B' be replaced with 103 or 104?
Clearly, this ambiguity is not resolvable and so DataVeil reports the error.


For the sake of completeness, to fix this problem, you could instead define 'Synchronize duplicates' on Col2. This would cause the Number Sequence mask to instead generate only the values 100 (3 instances) and 101 (2 instances) as shown below. You could then also remove the 'Synchronize duplicates' setting from Col1 as this relationship would already be implicitly maintained by Col2. It is best to remove unnecessary 'Synchronize duplicates' settings as this will improve performance.