The Shuffle Mask

 

The Shuffle mask randomly shuffles the rows within one or more columns.

You can specify a range of rows to be shuffled that satisfy a condition using a Where condition.

You can also shuffle rows within partitions that you define according to distinct values in one or more columns.

When you create a Shuffle mask, all of the Shuffle parameters are optional. The default is that every row in the column shall be shuffled to a randomly selected row within the column.

Example:

Consider the original data:

If you want to simply shuffle all the rows in the FamilyName column then all that is needed is a Shuffle mask with its default parameters. Such a Shuffle mask is shown below followed by a sample before and after masking values. 

Please note: Person names are being used here as an example to demonstrate the Shuffle mask. The Shuffle mask may suit your requirements for masking person names but please be aware that DataVeil also provides masks specifically designed to mask person names, such as Person First Name, Person Last Name and Person Full Name.

 Shuffle data mask

 

 

Using the Partition option

You may have noticed that the original data has a column called 'Gender'. What if we wanted to shuffle the names in the FirstName column only among rows that had the same value in the Gender column? i.e. Male names shuffled only to other rows containing male names and female names shuffled only to other rows containing female names.

This is easily accomplished by using the Partition option.

We create a Shuffle mask for the FirstName column, enable the Partition option checkbox, and Add the Gender column from the combo box as shown:

Shuffle partitions data mask

Shuffle mask example

You may notice in this example that 'Laureen' did not shuffle to a different row. That is because this example is using a very small data sample and the 'F' partition has only 7 rows so it was just coincidence that the particular row shuffled to the same row. This row is still technically considered masked.

'Auro' was also not shuffled, but that is because its partition value is NULL (see below).

 

NULL partition values

Rows that have a NULL partition value are not considered having a common partition value and will therefore not be shuffled. DataVeil shall consider such rows as not masked and will log warnings of any unmasked rows at runtime.

A simple solution is to specify an unconditional Shuffle mask as the last mask in the column's mask sequence. It will shuffle all remaining rows that had not yet been masked. It is recommended that an unconditional mask is always present to ensure that all rows in a column are masked. The unconditional mask must appear last in a column's mask sequence. Please also see Mask Execution Order.

 

 

However, only one row has NULL and therefore it would make no difference in this example. An alternative approach could be considered when there is only a small number of partitions (2 in this example) such as by using a Where condition as described below.

 

Using the Where condition

In the example above, with only two actual partitions within which we wish to shuffle rows, we could define the first mask with a Where condition to shuffle among rows that have a Gender value of 'F', followed by an unconditional mask to shuffle all remaining rows regardless of the value in the partition column ('M', NULL, or any other value).

Shuffle conditional filter data masking example

Please also see Mask Execution Order and The Where Condition.

 

Column Synchronization

If you would like to shuffle multiple columns together so that the values from multiple columns on the same row are moved together to a random row then you can define a Column Group.

Please note: Street addresses are being used in the example below to demonstrate the Shuffle mask. The Shuffle mask may suit your needs for masking street addresses but please be aware that DataVeil also provides the Address Street mask specifically designed to generate fictitious street addresses.

For example, in the table below, suppose that we wanted to shuffle the Street, City and Zip columns together.

First, you will need to define a Column Group that specifies the columns that are to be shuffled together. Please refer to the Column Group Option for details on how to create Column Groups.

Therefore, having selected the parent Table ('dbo.Address') in the Project explorer tree and then opening the Masking tab, then the Column Groups tab, the Column Group would be created as shown:

Shuffle multiple columns together data mask

 

Now add a Shuffle mask to any one of the Column Group columns (say, 'Street'). DataVeil will create a mask that automatically synchronizes the rows of all columns in the Column Group.

Shuffle column group definition

 

Shuffling within Partitions 

At this point, we have defined a Shuffle mask that will shuffle the Street, City and Zip column values together from one row onto some other row.

Let's narrow the range so that all the Street, City and Zip columns are shuffled only within their own State.

All that is required is to define the State column as the Shuffle mask's partition values, as shown:

Shuffle mask partition example

 

Using the Determinism option

If the Shuffle mask is executed using the Not Deterministic setting then it is impossible to predict how the rows shall be shuffled and the result shall be different each time it is executed on the exact same input data.

If you use the Deterministic setting then then DataVeil shall attempt to shuffle the rows in exactly the same order upon each execution of this mask. For the order to be predictable the following must be true:
- Properties of the table must be the same on each masking execution
- The table contains the same number of rows on each masking execution
- Please note that there is no guarantee that each execution will shuffle into the same order. This is because the order is determined by a pseudo random number function and if there is a sufficiently large number of rows in the table then it is possible that this randomization function may return duplicate numbers and therefore the order of rows that share the duplicate numbers are unpredictable. Therefore the precise shuffle repeatability is more likely on tables with smaller row counts (such as in the order of several thousand or less).