Relationship Obfuscation

It is possible to obfuscate relationships by disabling a Dependency.

For example, suppose that your company is considering a joint venture, or even being bought outright. During the initial stage you need to provide a copy of your database for a limited revenue audit. The sales amounts must be preserved, but specific trading details, such as which sales locations are most lucrative, should remain confidential because they could prematurely leak sensitive strategic business information.

Suppose you have a table called Store that lists all of your retail stores and their location information, and another table called Sale in which each row identifies a transaction and the retail store where the transaction originated. A Foreign Key is defined on Sale.storeID that references the parent Store.storeID.


Because the sales amounts must be preserved, you might think of shuffling the sales amounts column (Sale.saleAmt) and the Store.storeID column. Although that would shuffle sales amounts among stores, it would still leave a statistical hint as to the frequency of transactions produced by individual stores, so it may not be sufficient obfuscation.
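The limitation of shuffling can be illustrated outside the masking tool with a minimal Python sketch. The Sale rows, the amounts (whole currency units) and the seed below are hypothetical and are not taken from the example database; the sketch only shows that shuffling amounts and relabelling stores leaves the set of per-store transaction counts unchanged.

```python
import random
from collections import Counter

# Hypothetical Sale rows as (storeID, saleAmt); values are illustrative only.
sales = [(1, 120), (1, 35), (1, 80), (1, 60),
         (2, 15), (2, 200), (3, 45), (4, 99), (5, 10)]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Shuffle the saleAmt column across rows.
amounts = [amt for _, amt in sales]
rng.shuffle(amounts)

# Relabel the stores with a permutation of the existing storeIDs.
store_ids = sorted({sid for sid, _ in sales})
permuted = store_ids[:]
rng.shuffle(permuted)
relabel = dict(zip(store_ids, permuted))

shuffled = [(relabel[sid], amt) for (sid, _), amt in zip(sales, amounts)]

before = Counter(sid for sid, _ in sales)
after = Counter(sid for sid, _ in shuffled)

# Total revenue is preserved, but so is the set of per-store counts:
# one store still has four transactions, two stores still have one, etc.
print(sorted(before.values()), sorted(after.values()))
```

Whichever label the busiest store ends up with, it can still be identified as the store with the most transactions.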


Now consider disabling the dependency between Store.storeID and Sale.storeID and generating a random distribution of storeIDs. In this case we created a simple Random Number mask (not shown) to generate values from 1 to 5, which are the valid storeIDs in this simple example. If you disable a dependency, you must make sure that you generate only valid values in order to maintain referential integrity. If your real-world situation requires more than a random number, or more than any simple mask can generate, consider creating a CSV file of valid values and using a DataSet mask instead.
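As a rough illustration of the idea, and not of how the Random Number mask is implemented, the following Python sketch assigns each Sale row a random storeID and then checks that every generated value exists in Store. The row data and the seed are hypothetical.

```python
import random
from collections import Counter

# Assumed parent keys (Store.storeID) for this simple example.
valid_store_ids = {1, 2, 3, 4, 5}

# Hypothetical original Sale.storeID column; storeID 1 appears four times.
sale_store_ids = [1, 1, 1, 1, 2, 2, 3, 4, 5]

rng = random.Random(2024)  # arbitrary seed for repeatability

# Each row gets an independent random storeID in the range 1 to 5,
# regardless of its original value, so the relationship is broken.
masked = [rng.randint(1, 5) for _ in sale_store_ids]

# Referential integrity check: no masked value may be missing from Store.
orphans = [sid for sid in masked if sid not in valid_store_ids]

print("original counts:", sorted(Counter(sale_store_ids).items()))
print("masked counts:  ", sorted(Counter(masked).items()))
print("orphan rows:    ", len(orphans))
```

If the valid storeIDs were not a contiguous range, the same check would fail for any generated value with no matching Store row, which is the situation where a list of valid values is the safer source.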

The outcome is shown below. Notice that the frequency distribution of storeIDs in the Sale table has been altered. For example, storeID 1 previously appeared four times in the Sale table, but its apparent masked value of 3 now appears five times. You may also notice a side effect in this simple example: storeIDs 4 and 5 no longer appear in the masked values. This is a consequence of using the Random Number mask on a very small range of values (1 to 5). Changing the deterministic seed can change which numbers are chosen.
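The side effect of missing storeIDs can be explored with a small Python sketch. It is not the tool's output: the row count of 9, the seeds, and the helper names missing_ids and prob_id_absent are all hypothetical, and Python's generator will not reproduce the values produced by the Random Number mask. The sketch assumes independent, uniformly distributed draws.

```python
import random

VALID_IDS = range(1, 6)  # storeIDs 1 to 5, as in the simple example

def missing_ids(seed, n_rows):
    """Return the valid storeIDs that never occur in n_rows random draws."""
    rng = random.Random(seed)
    drawn = {rng.randint(1, 5) for _ in range(n_rows)}
    return sorted(set(VALID_IDS) - drawn)

def prob_id_absent(n_rows, n_ids=5):
    """Probability that one particular ID is never drawn in n_rows draws."""
    return ((n_ids - 1) / n_ids) ** n_rows

# Different seeds can leave different storeIDs unused on a small table.
for seed in range(3):
    print("seed", seed, "unused storeIDs:", missing_ids(seed, 9))

# Chance that a given storeID is absent from 9 rows.
print(round(prob_id_absent(9), 3))
```

With only a handful of rows, each storeID has a noticeable chance of never being drawn; as the number of rows grows, that chance shrinks rapidly.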

In summary, the relationships of individual sales to stores have been completely obfuscated, including the statistical properties of how any individual store is performing relative to the others.