Why do I need to mask sensitive data? Shouldn’t it already be encrypted?

This type of question seems to pop up on internet forums from time to time.

Recently I saw a question like this:

“How can I mask Social Security Numbers when restoring Production data onto our Development environment?”

The responses were along the lines of:

  • “Fix it. You shouldn’t be storing these in the clear to begin with”, or
  • “Your sensitive data, such as SSN, should already be encrypted!”

Although these responses may be well-intentioned, they miss what is actually being asked, and they overlook the use cases for masking sensitive data EVEN IF that data is already stored in encrypted form.

In other words, encryption and data masking typically serve two different purposes, although they sometimes overlap.

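Encryption is typically used to defeat unauthorized access to sensitive data, for example by rendering it unreadable to hackers. But encryption is reversible: any user or application holding the key can recover the original values.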

Data masking, by contrast, replaces the sensitive data, partially or in full, with a non-sensitive substitute.

Note: I am specifically referring to Static Data Masking (also known as Persistent Data Masking). This is where the data is permanently overwritten so that it can be used in non-secure environments such as Development, Testing, Training, Outsourcing, etc. Dynamic Data Masking is something different and is not discussed here.
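To make the difference between partial and full replacement concrete, here is a minimal Python sketch for a single SSN value. The function names `mask_ssn_partial` and `mask_ssn_full` are hypothetical illustrations, not part of DataVeil or any other product. The partial version hides all but the last four digits. The full version discards the original and draws a random but well-formed number, assuming the usual SSN structure (area 001-899 excluding 666, group 01-99, serial 0001-9999).

```python
import secrets


def _digits(ssn: str) -> str:
    """Strip separators and check that exactly nine digits remain."""
    digits = "".join(c for c in ssn if c.isdigit())
    if len(digits) != 9:
        raise ValueError("expected a 9-digit SSN")
    return digits


def mask_ssn_partial(ssn: str) -> str:
    """Partial masking: hide everything except the last four digits."""
    return "XXX-XX-" + _digits(ssn)[-4:]


def mask_ssn_full(ssn: str) -> str:
    """Full substitution: replace the whole value with a random, well-formed SSN.

    The input is only validated; none of its digits are reused.
    """
    _digits(ssn)
    area = secrets.randbelow(899) + 1      # 001-899
    while area == 666:                     # 666 is never issued as an area number
        area = secrets.randbelow(899) + 1
    group = secrets.randbelow(99) + 1      # 01-99
    serial = secrets.randbelow(9999) + 1   # 0001-9999
    return f"{area:03d}-{group:02d}-{serial:04d}"


print(mask_ssn_partial("123-45-6789"))  # XXX-XX-6789
print(mask_ssn_full("123-45-6789"))     # different on every run
```

Partial masking is fine when a user only needs to recognize a record; full substitution is what lets a tester or trainee work with a value that looks and behaves like a real SSN.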

For example, suppose that a developer needs to modify software that maintains customers’ sensitive information, such as their SSNs. To test the modifications, the developer may need to actually VIEW the full unencrypted SSN on the Customer Management screen. This is true regardless of whether the SSN is stored in encrypted form in the database, because the application decrypts it for display. Another example is a customer database used to train new Customer Service Representatives, who need to see the unencrypted sensitive information but, for training purposes, do not need authentic sensitive information.

Therefore, instead of handing the Developer or Trainee a raw copy of the Production data with the real sensitive data exposed, the Production data is masked first. That is, the sensitive data is replaced with fictitious and usually realistic data. In fact, the masking can be so effective that the end user (or data thief) does not even know the data has been masked and is actually looking at fictitious data. For example, the Trainee will see a realistic SSN, but the actual number will be fake.
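The replacement step described above can be sketched in Python against rows that have already been restored from Production. This is a minimal illustration under stated assumptions, not how DataVeil or any particular tool works. The rows are plain dictionaries standing in for a database table, and the column name `ssn` and the helpers `fake_ssn` and `mask_rows_in_place` are hypothetical. The sketch uses a keyed hash (HMAC-SHA256) so the same real SSN always receives the same fake one. That keeps values consistent if an SSN appears in several tables. The key must stay outside the non-production environment.

```python
import hashlib
import hmac
import secrets


def fake_ssn(real_ssn: str, key: bytes) -> str:
    """Map a real SSN to a fictitious but well-formed one.

    HMAC-SHA256 with a secret key makes the mapping repeatable (same input,
    same output) while preventing anyone without the key from rebuilding it
    by hashing every possible nine-digit SSN.
    """
    digits = "".join(c for c in real_ssn if c.isdigit())
    digest = hmac.new(key, digits.encode("ascii"), hashlib.sha256).digest()
    n = int.from_bytes(digest[:8], "big")
    area = n % 898 + 1        # 1..898
    if area >= 666:           # skip 666, giving 001-665 and 667-899
        area += 1
    n //= 898
    group = n % 99 + 1        # 01-99
    n //= 99
    serial = n % 9999 + 1     # 0001-9999
    return f"{area:03d}-{group:02d}-{serial:04d}"


def mask_rows_in_place(rows: list[dict], column: str, key: bytes) -> None:
    """Static masking: permanently overwrite the column in the copied rows."""
    for row in rows:
        if row.get(column):
            row[column] = fake_ssn(row[column], key)


# Hypothetical usage: the key would normally come from a secure store, not code.
customers = [{"id": 1, "ssn": "123-45-6789"}, {"id": 2, "ssn": "987-65-4321"}]
mask_rows_in_place(customers, "ssn", key=secrets.token_bytes(32))
print(customers)
```

Two caveats apply to this approach. A hash-based mapping is not guaranteed to be one-to-one, so two real SSNs can occasionally receive the same fake value. A realistic-looking fake can also coincide with someone’s genuine SSN.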

In short, static data masking substitutes non-sensitive data for sensitive data so that the database can be used in non-secure environments. DataVeil is a static data masking tool.