One of the most common types of question we receive is 'How can performance be improved?'

This section describes the factors that can most improve the performance of a DataVeil masking project.

In this context, performance shall mean how to execute a masking project in the minimum amount of time.

Use DataVeil version 4.8 or later

DataVeil 4.8 introduced significant performance enhancements. If you are using an earlier version then upgrading to at least 4.8 would be worthwhile in terms of performance improvement under the conditions described below.

The performance gains will depend on several factors and are cumulative. The performance gains can be between 20% and well over 200%. These gains will begin to become noticeable only when masking larger tables, i.e. those with at least a few million rows.

The main factors that will determine performance improvement:

* Larger tables will see greater performance improvements. i.e. A table smaller than around 3 million rows is unlikely to result in much or any improvement. A table with 500 million rows is likely to result in very significant improvement.

* Conditional masks. A column with masks using Where conditions will see further performance improvements. The greater the number of conditional masks in a single column, the greater the performance improvement for that column. Note: This is not to say that adding additional conditional masks will improve performance. It is simply saying that the same project with many conditional masks on large tables will perform much better in version 4.8 or later.

* Preserved rows. A column with preserved rows will perform better. This is especially true on large tables where only a single column is masked.

Use DataVeil Native Functions

If you are masking data on Oracle or SQL Server then you should install the corresponding DataVeil Native Library on the DBMS if at all possible.

This provides a performance boost during the masked value generation phase of approximately 2x for Oracle and 10x for SQL Server. The native libraries have no effect on the table write speeds.

Using the DataVeil native libraries, together with DataVeil version 4.8 or later, is probably the biggest performance boost possible, especially for SQL Server.

Adjust Masking Execution Settings

These settings are described in detail in The Execution Tab.

Maximum Connections

In general, increasing the Maximum Connections will reduce project execution time because it means more masking queries will be executed concurrently. Each masking query uses a separate connection for the duration of the query.

If the number of connections is configured lower than optimal then unused server resources shall sit idle and your project will take longer to execute than optimal.

If the number of connections is configured higher than optimal then resource contention shall occur on your server and the masking project will take longer to execute than optimal.

In general, a good starting point for this setting is 2x the number of available processors for the masking execution. i.e. processors not significantly busy with other tasks. Therefore, if you have a dedicated server with 4 processors then a good starting point for this setting is 8.

Adjust this parameter on subsequent masking executions and observe the performance difference to find the optimal setting for your environment.

Masked Value Generation Batch Size

This represents how many rows of masked values shall be generated in a single query.

The default is 100,000.

Generally, the best performance can be expected when this setting is in the range of 100,000 to 800,000; however, it will vary depending on the DBMS type and version, its configuration and the characteristics of the specific columns and tables being masked.

In performing numerous benchmarks, there have been instances where a setting of 100,000 had twice the throughput of 800,000. In other cases, 800,000 had better throughput.

Therefore, you are encouraged to compare the performance of a masking project in your own specific environment using different settings in the range mentioned above. The easiest way to do this is to perform comparison Preview Runs using the Complete option. This will run the masks on the full range of rows but without overwriting the original data. It is recommended to start with a setting of 100,000. Next, try 400,000 and then 800,000. Compare the masking times of each of these masking executions' reports to determine the optimal setting for your particular masking project.


Table IO

This represents how many rows of the original table are updated in a single query when using the UPDATE IO setting,

The default is 400,000 rows.

A value within the range of 100,000 to 500,000 is generally recommended.

This setting can affect the write performance considerably. In some masking projects, a setting of 100,000 doubled the throughput compared to a setting of 400,000. In other projects, a setting of 500,000 was fastest. You are encouraged to compare write performance of masking runs using 100,000 and 500,000. Compare the write durations for each of these settings in the masking execution reports to determine the optimal setting for your particular masking project.


Available Unique Index in Masked Tables

DataVeil requires an available unique index in any table that it masks.

If a column that is part of a unique index is masked then that index becomes unavailable for DataVeil's purposes of coordinating masking progress.

Therefore, if there are no available unique indexes then DataVeil shall need to create a temporary index. As part of this, DataVeil inserts a temporary column into the table being masked. This is a time-consuming process so it is worthwhile to consider how this can be avoided, such as:

Is a column of an existing unique index being masked that is not necessary to mask without compromising the result? E.g. Occasionally users mask arbitrary numeric ID columns that belong to a primary key that would have been ideal index candidates for DataVeil. Such columns are not normally sensitive, especially if all other sensitive columns have already been masked.

Does a column already exists that contains unique values but does not have a Unique index or constraint? If so then consider defining a unique index or constraint on this column. If it is not appropriate to alter the DBMS schema then you can add a 'create unique index' command as a DataVeil pre-masking script and a 'drop index' as a DataVeil post-masking script. This will avoid DataVeil having to insert and drop a temporary column.

Choice of Appropriate Mask for Long Values

The performance of some masks varies according to the length of the original value being masked.

A good example is the Randomize mask. This mask is not only compute-intensive but it performs in linear time according to the length of the original value being masked. For example, the Randomize mask shall take 5x as long to mask a 50 character value as it does to mask a 10 character value. This is also true for Randomize Hex. It is also true for Redact and Sentences, although these are faster because they do not perform the many computations that the Randomize masks do.

Therefore, if you have large text columns (e.g. thousands of characters wide) then avoid using the Randomize mask as this may take an unacceptably long time (many hours or days instead of minutes). Also avoid the Redact mask on very long values if possible. If you must preserve a realistic length then you may need to use the Redact mask but it would be strongly recommended to use the DataVeil Native Library to mask very long values to significantly reduce execution time.

Alternatives to consider:

Use the Sentences mask but supply a more manageable length of masked values to be generated. E.g. If you have a 5,000 character original value and it is acceptable to replace with a shorter value (say 300 characters) then configure a Sentences mask to generate random lengths of 250 - 350 characters.

Create you own specialized mask. If there is a more effective way for you to generate replacement values than the existing DataVeil masks can offer then you can write the SQL code to do this and either put it directly in a User SQL Value mask or put it in a User Defined Function (i.e. TSQL or PL/SQL) on your DBMS and call it using a User SQL Value mask.

See Large Objects for more examples.


Skip Pre-Write Recovery Snapshot

After all masked values have been generated and before DataVeil starts to overwrite the original data, DataVeil shall make a temporary copy of the original values to be masked (a 'snapshot'). This is done in case the masking run fails so that DataVeil can attempt to restore all the original values to leave the database in a consistent state.

Ordinarily, this does not take a long time because DataVeil copies only the columns being masked. However, some users who are masking very large tables and wide columns may notice a significant delay before the Table Writers start. If you are unsure of which steps are taking a long time then you can enable Log diagnostic messages to log more detailed progress messages.

If it becomes evident that this snapshot step is taking longer than you would like and you are sure that the masking project shall execute successfully (or don't mind performing a DBMS restore of your data if something goes wrong) then you can disable this snapshot under the tab Settings>Database>Recovery.

Unnecessary Use of Synchronize Duplicates

If a column has masks configured only for Deterministic masking then the determinism will implicitly preserve duplicates, i.e. duplicate values shall be masked consistently. Therefore, enabling the Synchronize duplicates option serves no further purpose other than introduce an unnecessary duplicates processing step. In such a case, consider disabling the Synchronize duplicates option for the column.

The Synchronize duplicates option is mainly needed for consistent masking of values within a column if at least one mask for that column is configured for non-deterministic masking.

Installation on Same Server

It is preferable to install DataVeil on the same server as the DBMS where data masking shall occur but it is not necessary and will generally not affect performance.

Data Masking

Installing DataVeil on the same server as the DBMS should have negligible effect on data masking performance. This is because the DataVeil process acts as a controller and only issues masking commands to the DBMS. Functions on the DBMS are executed to perform the actual masking tasks.

Note: It is not recommended to run DataVeil across a network when connecting to Oracle DBMS installed on a Linux/Unix server. Very slow/long latency has been observed in such cases and is believed to be related to Oracle JDBC drivers on *nix systems. This can also be observed when performing a command unrelated to DataVeil, such as a CREATE INDEX command, from an SQL Developer window across the network (the slowness will occur when the command is from SQL Developer because it uses JDBC but not from SQL Plus because it does not use JDBC).


When a Discovery task is executed DataVeil downloads samples of the data for analysis. The data is kept in memory temporarily and discarded. This is necessary because it was not feasible to implement the discovery analysis algorithms to be executed on the DBMS. Therefore, there may be some benefit in running DataVeil on the same server as the DBMS as it will avoid transmitting data across the network; however, the default sample size is small enough to not cause any network issues.