Format Preserving Data Masking

We often receive enquiries along the lines of ‘How does DataVeil compare to ...’ followed by the name of some other masking tool. Recently, the tool in question has typically been Redgate’s Data Masker, so we thought we would address such questions on our blog for our readers’ convenience.

There are numerous differences, some obvious and others subtle but quite important. Today we’ll discuss one major area of difference and leave the others for future posts.

A key difference between DataVeil and Redgate Data Masker is the overall approach to masking.

DataVeil has smart masks whereas Redgate Data Masker relies mostly on simple substitutions.

DataVeil smart masks are aware of the content and structure of every individual original value and can therefore produce a masked value tailored to each one. These capabilities include format-preserving masking and partial masking.

These smart masking capabilities matter because they retain the utility of the original data, such as its format and structure, which simple substitution would otherwise overwrite and lose forever.

To illustrate, in this post we’ll consider format-preserving masks. In the next post, we’ll discuss partial masking.

Suppose that an occasional error is occurring in production and the cause is unclear. It is therefore decided to move a copy of the production data into the test environment so that the developers can investigate. The data is masked before being made available to the developers.

As it happens, the problem is caused by a column that contains millions of account numbers. Every value should be hyphen-separated, but a few rows have an invalid format in which the hyphens are missing and the parts are separated by spaces instead.

Tools that rely on simple substitution will simply overwrite the invalid data with canned fake data, effectively cleansing it. The developers will spend a lot of time and effort looking for the problem in the masked data but will never find it, because the invalid values no longer exist there.
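To make this concrete, here is a minimal Python sketch of simple substitution. It is only an illustration, not the implementation of any particular product; the pool of canned values, the three-group account number format and the function name are all assumptions made for this example.

```python
import random
import re

# Hypothetical pool of canned, well-formed fake account numbers.
CANNED_ACCOUNTS = ["1000-2000-3000", "4000-5000-6000", "7000-8000-9000"]

# Assumed valid format for this example: three hyphen-separated groups of digits.
VALID_FORMAT = re.compile(r"\d{4}-\d{4}-\d{4}")


def substitute_mask(value, rng):
    """Ignore the original value entirely and return a canned replacement."""
    return rng.choice(CANNED_ACCOUNTS)


rng = random.Random(0)

# The second row has the invalid, space-separated format.
production = ["1234-5678-9012", "2345 6789 0123", "3456-7890-1234"]
masked = [substitute_mask(v, rng) for v in production]

invalid_before = [v for v in production if not VALID_FORMAT.fullmatch(v)]
invalid_after = [v for v in masked if not VALID_FORMAT.fullmatch(v)]

print(len(invalid_before), len(invalid_after))  # prints: 1 0
```

The one malformed row in the production sample disappears from the masked copy, which is exactly why the developers cannot reproduce the error.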

In contrast, DataVeil’s format-preserving masking will mask all of these rows but preserve every hyphen and space, and therefore the format, of each individual value. This will preserve the invalid formats that are causing the problems in production and thereby provide the developers with the opportunity to detect and fix the problem in the test environment.
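The general idea of format-preserving masking can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not DataVeil’s actual algorithm: the function name is hypothetical, and it replaces characters at random, so it is neither deterministic across runs nor a form of format-preserving encryption.

```python
import random
import string


def format_preserving_mask(value, rng):
    """Replace each digit with a random digit and each letter with a random
    letter of the same case; leave every other character untouched."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            # Hyphens, spaces and other separators pass through unchanged.
            out.append(ch)
    return "".join(out)


rng = random.Random()

valid_row = format_preserving_mask("1234-5678-9012", rng)
invalid_row = format_preserving_mask("2345 6789 0123", rng)

print(valid_row)    # still hyphen-separated
print(invalid_row)  # still space-separated, so the bad format survives masking
```

Because separators pass through unchanged, the space-separated row is still space-separated after masking, and a format check run against the masked data would flag the same rows it would flag in production.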

Clearly, format-preserving masks provide an important benefit: they can mean the difference between useful and useless masked data.