Masking Multiple Databases

It is easy to mask multiple databases simultaneously from a single DataVeil project.

Add as many database connections to a project as you like and configure the masks for each database as usual.

If you need to mask multiple databases, some considerations favor a single project containing multiple database definitions, while others favor separate projects that each contain only one database definition.

Concurrency & Throughput

A benefit of multiple database connections in a single project is that all of the database schemas are masked simultaneously.

Assuming each of the database schemas has separate resources that do not compete with each other (e.g. each resides on a different server), all of the database schemas are masked in parallel with no degradation in performance compared to masking each individually in sequence. DataVeil will not be a bottleneck because, during masking execution, it behaves mostly as a controller dispatching batch masking commands to each database connection. The actual masking tasks are performed on the DBMS, so DataVeil is mostly idle during the masking process.

For example, if you have 10 database schemas that each take around 5 hours to mask, then running them concurrently from a single project will take around 5 hours in total (i.e. the duration of the longest-running database) to mask all 10 databases. By contrast, masking one database at a time would take approximately 50 hours.
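The arithmetic in this example can be sketched in a few lines of Python. The durations are the illustrative figures from the paragraph above, and the function names are hypothetical. The sketch assumes the databases do not compete for resources, so the concurrent total is simply the longest individual run.

```python
# Illustrative durations only: 10 schemas, each taking about 5 hours to mask.
durations_hours = [5.0] * 10


def sequential_total(durations):
    """Total time when databases are masked one after another."""
    return sum(durations)


def concurrent_total(durations):
    """Total time when all databases are masked in parallel,
    assuming they share no resources: the longest run dominates."""
    return max(durations)


print("sequential:", sequential_total(durations_hours), "hours")
print("concurrent:", concurrent_total(durations_hours), "hours")
```

If the databases do share a server or storage, the concurrent figure would be higher than this simple maximum.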

Clearly there is a very significant throughput advantage in masking multiple databases from a single project.

There are also some good reasons why this may not be a suitable approach for some users. These are described below.


Consistent Data Across Multiple Databases

If data must be masked consistently across multiple databases then the mechanism that DataVeil provides to achieve this is Deterministic mode masking. Putting all the database schemas into a single project for the sake of consistent masking will not make any difference.
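The general idea behind deterministic masking (the same input always produces the same masked output, regardless of which database or project it appears in) can be illustrated with a keyed hash. This is only a conceptual sketch and is not DataVeil's actual algorithm. The function name, the key and the replacement list are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical pool of replacement values.
SURNAMES = ["Jones", "Bunting", "Harper", "Nguyen", "Okafor", "Silva"]


def mask_surname(value: str, key: bytes) -> str:
    """Pick a replacement deterministically from the input value and a key."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(SURNAMES)
    return SURNAMES[index]


# Hypothetical key; the same key must be used wherever consistency is needed.
key = b"shared-masking-key"

# The same original value yields the same masked value in both "databases",
# even though each call is made independently.
crm_masked = mask_surname("Smith", key)
billing_masked = mask_surname("Smith", key)
print(crm_masked == billing_masked)
```

Because the result depends only on the input value and the key, it makes no difference whether the two calls are made from one project or from two.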

In fact, if a single project contains multiple databases and some of them encounter masking errors while others complete without errors, you will end up with inconsistent data across the databases: some will contain masked values and some will still contain the original values. If you were to simply re-run the masking project, the already-masked data would be masked again to yet different values (e.g. "Smith" to "Jones" to "Bunting"), whereas the databases that failed the first time would contain original values masked only once ("Smith" to "Jones").

Therefore, if you need consistency across multiple databases that are all masked from a single project, you need to ensure that all database connections are reliable and that the masking definitions are correct (e.g. they won't occasionally generate duplicates for columns with unique constraints), so that masking completes successfully on every database every time. Otherwise you may need to restore data on some databases, or create new projects that re-attempt masking only on the databases that did not complete successfully and skip those that did.
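The re-run problem can be reproduced with a toy model. The lookup table below stands in for a deterministic mask and reuses the names from the example above. It is an assumption for illustration only and does not reflect how DataVeil chooses replacement values.

```python
# Toy deterministic mask: each value always maps to the same replacement.
MASK_TABLE = {"Smith": "Jones", "Jones": "Bunting", "Bunting": "Harper"}


def mask(value):
    return MASK_TABLE.get(value, value)


db_a = ["Smith"]  # masking succeeds on the first run
db_b = ["Smith"]  # masking fails on the first run, data stays original

# First run: only db_a is actually masked.
db_a = [mask(v) for v in db_a]

# Re-run of the whole project: both databases are masked.
db_a = [mask(v) for v in db_a]  # masks the already-masked value
db_b = [mask(v) for v in db_b]  # masks the original value once

print(db_a, db_b)
```

After the re-run, the two databases hold different values for what was originally the same surname, even though the mask itself is deterministic.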

The alternative is to create separate projects for each database and execute each project one at a time. If an error occurs then the process can be halted and corrected before continuing to execute the remaining projects for the other databases.
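A halt-on-first-error sequence like this could be driven by a small script. The sketch below is hypothetical: each project is represented by a plain Python callable, because how a DataVeil project is actually launched is not covered here. The project names and the simulated failure are assumptions for illustration.

```python
from typing import Callable, Dict, List


def run_projects_in_order(projects: Dict[str, Callable[[], None]]) -> List[str]:
    """Run each project in turn and stop at the first failure.

    Returns the names of the projects that completed successfully.
    """
    completed = []
    for name, run in projects.items():
        try:
            run()
        except Exception as exc:
            # Stop here so the error can be corrected before the
            # remaining databases are touched.
            print(f"Halting: project {name!r} failed: {exc}")
            break
        completed.append(name)
    return completed


def ok():
    pass  # placeholder for a project that masks successfully


def failing():
    raise RuntimeError("duplicate value for unique column")


completed = run_projects_in_order({"crm": ok, "billing": failing, "hr": ok})
print(completed)
```

In this run the third project is never started, so its database still holds original data and can be masked once the failing project has been fixed.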

If data consistency across the databases is not required, you can simply put all the database definitions in a single project and, whenever some databases encounter masking errors while others complete successfully, re-run the project without issues.