Discovery Search Options

The Discovery Query search options are located at the bottom of the Query panel on the Discovery Search tab.
  

 

Scope

The default is 'All Databases', meaning every database defined in your project (typically one). If your project defines multiple databases, click the 'Scope...' button to select which are searched and which are ignored.
 

Sample

The combo box allows you to specify Limit or All.

Limit

This is the maximum number of rows in a table that DataVeil shall search. The default is 1,000. A range between 1,000 and 10,000 is generally recommended.

All

This will search all rows of a table. Generally, this is not recommended because it can take a very long time and is usually of limited benefit. This is because the primary objective is to identify columns that are sensitive and this can usually be done with a limited sample. Search processing speed varies depending on your system, network and configured Query; however, you can typically expect between 5,000 and 10,000 rows to be scanned per second.
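
As a rough illustration of why a limited sample is usually preferred, the following sketch estimates the scan time for a single table from the throughput range quoted above. The function name and the 200-million-row table are assumptions made for this example only. Actual speed depends on your system, network and Query.

```python
def estimated_scan_seconds(table_rows, sample_limit=None,
                           rows_per_second=(5_000, 10_000)):
    """Return (best_case, worst_case) seconds to scan one table.

    sample_limit=None models the 'All' setting; an integer models 'Limit'.
    rows_per_second is the (slow, fast) throughput range quoted in the text.
    """
    # With 'Limit', at most sample_limit rows of the table are scanned.
    rows_scanned = table_rows if sample_limit is None else min(table_rows, sample_limit)
    slow, fast = rows_per_second
    return rows_scanned / fast, rows_scanned / slow


# Hypothetical table with 200 million rows.
print(estimated_scan_seconds(200_000_000, sample_limit=10_000))  # 'Limit' of 10,000
print(estimated_scan_seconds(200_000_000))                        # 'All'
```

Under these assumptions the limited sample takes roughly 1 to 2 seconds, whereas scanning all rows takes roughly 20,000 to 40,000 seconds (about 5.5 to 11 hours) for that one table.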
 

Search Project Columns

This will select all tables that are defined in your project (shown in the Project explorer tab) for search. This is selected by default.
 

Search DBMS Columns

This will select all remaining tables on the DBMS (i.e. those that are neither defined in your project under the Project explorer tab nor listed under the Excluded tab) for search. This is selected by default.
 

Search Excluded Columns

This will select all tables that you have excluded from your project (shown in the Excluded explorer tab) for search. This is not selected by default because you should only exclude columns that you are certain are not sensitive. Searching excluded columns would therefore defeat the purpose of excluding them.
 

Ignore Columns marked Sensitive

If this option is selected then the search shall ignore all columns already marked Sensitive. This can help you focus on finding only those sensitive columns that are not yet part of your project. The default is not to ignore Sensitive columns.
 

Ignore Columns marked Not Sensitive

If this option is selected then the search shall ignore all columns that are already marked Not Sensitive. This can help you focus on finding only those sensitive columns that are not yet part of your project without the search result being cluttered by columns that you have already marked as Not Sensitive. The default is not to ignore columns marked as Not Sensitive.
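
The way the three 'Search ...' options and the two 'Ignore ...' options combine can be pictured with the sketch below. It is only a model of the behavior described above, not DataVeil's implementation. The `Column` class, the location and marking labels, and the sample column names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    location: str      # "project", "dbms" or "excluded" (hypothetical labels)
    marking: str = ""  # "sensitive", "not_sensitive" or "" when unmarked


def columns_to_search(columns,
                      search_project=True,
                      search_dbms=True,
                      search_excluded=False,
                      ignore_sensitive=False,
                      ignore_not_sensitive=False):
    """Return the columns a search would visit; defaults mirror the documented defaults."""
    enabled = {
        "project": search_project,
        "dbms": search_dbms,
        "excluded": search_excluded,
    }
    selected = []
    for col in columns:
        # Skip columns whose location is not selected for search.
        if not enabled.get(col.location, False):
            continue
        # Skip columns the user has already reviewed, if asked to.
        if ignore_sensitive and col.marking == "sensitive":
            continue
        if ignore_not_sensitive and col.marking == "not_sensitive":
            continue
        selected.append(col)
    return selected


columns = [
    Column("customer.last_name", "project", "sensitive"),
    Column("customer.notes", "project", "not_sensitive"),
    Column("audit.payload", "dbms"),
    Column("lookup.color", "excluded"),
]

print([c.name for c in columns_to_search(columns)])
print([c.name for c in columns_to_search(columns,
                                         ignore_sensitive=True,
                                         ignore_not_sensitive=True)])
```

With the default settings the excluded column is skipped. Enabling both 'Ignore' options leaves only the column that has not yet been reviewed.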
 

Approx row counts (fastest)

If this option is selected then DataVeil shall use alternative lookup or calculation-based methods to determine table row counts. These methods are much faster than performing actual row counts. If a faster alternative method is not available for a table then a normal full count shall be performed. This is assessed on a table-by-table basis.

This option is especially useful if your schema contains very large tables (hundreds of millions of rows or more) because full row counts can take many minutes whereas this option will usually take only a few seconds.

Using this option will not affect the quality of the sensitive data discovery result because all configured scan sample sizes shall still be fully scanned. The only difference is that the table row counts shown in the reports shall be approximate. A message is logged to the report to state that row counts shown are approximate whenever this option is selected.
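
The table-by-table fallback described above can be sketched as follows. This text does not say which lookup methods DataVeil uses, so `fast_estimate` and `full_count` are hypothetical callables, and the dictionaries stand in for database catalog statistics and real table contents.

```python
from typing import Callable, Dict, Optional, Tuple


def row_count(table: str,
              fast_estimate: Callable[[str], Optional[int]],
              full_count: Callable[[str], int]) -> Tuple[int, bool]:
    """Return (count, is_approximate) for one table."""
    estimate = fast_estimate(table)
    if estimate is not None:
        # A fast method is available for this table: use its approximate count.
        return estimate, True
    # No fast method for this table: fall back to a normal full count.
    return full_count(table), False


# Hypothetical data: only 'orders' has a fast estimate available.
catalog_stats: Dict[str, int] = {"orders": 412_000_000}
actual_rows: Dict[str, int] = {"orders": 412_003_117, "lookup_codes": 42}

counts = {
    table: row_count(table, catalog_stats.get, actual_rows.__getitem__)
    for table in actual_rows
}
print(counts)
```

In this example the large table gets an approximate count, while the small table, which has no fast estimate, is counted in full.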
 

Filter False Positives

If this option is selected then DataVeil shall not report a result that it has determined is more likely than not a false match for the patterns configured in the Query, even if a configured pattern appears to match.

For example, if your Query consists of only the Person Last Name system pattern then it would match a column containing the names "Black", "Brown", "Silver" and "White" (e.g. Mr. Brown can refer to a person). However, if the 'Filter False Positives' option is selected then DataVeil would check whether the other rows in this column contain only color names. If so, DataVeil would determine that the match is a false positive and therefore not report the column as a Person Last Name.
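
The color-name example can be modeled with the short sketch below. DataVeil's actual heuristics are not described here, so the word lists and the all-values-are-colors rule are assumptions chosen only to illustrate the idea.

```python
# Hypothetical, deliberately tiny word lists for illustration.
LAST_NAMES = {"black", "brown", "silver", "white", "smith", "jones"}
COLOR_NAMES = {"black", "brown", "silver", "white", "red", "green", "blue"}


def matches_last_name(values):
    """The pattern matches if any value looks like a last name."""
    return any(v.lower() in LAST_NAMES for v in values)


def is_probable_false_positive(values):
    """Assumed rule: every value in the column is also a color name."""
    return all(v.lower() in COLOR_NAMES for v in values)


def report_as_last_name(values, filter_false_positives=True):
    """Decide whether the column is reported as Person Last Name."""
    if not matches_last_name(values):
        return False
    if filter_false_positives and is_probable_false_positive(values):
        return False
    return True


color_column = ["Black", "Red", "Green", "White"]
name_column = ["Brown", "Smith", "Jones"]

print(report_as_last_name(color_column))                                # filtered out
print(report_as_last_name(color_column, filter_false_positives=False))  # reported
print(report_as_last_name(name_column))                                 # reported
```

The column of colors is reported only when the filter is off, while the column that also contains "Smith" and "Jones" is reported either way.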

Note: If you are using your own Custom regular expression patterns then you should consider disabling this option because DataVeil may filter out columns that you may have otherwise expected to appear in the search result.
 

Always Refresh Schema

If this option is selected then DataVeil shall always download a fresh copy of your database schema before it commences a search. The default is not to always refresh schema.

DataVeil will always ensure that the schema has been refreshed at least once since the current project was opened. Therefore, enabling this option is only meaningful if you think that your schema may have changed since your project was opened or since a previous discovery search.
 

Minimum Confidence %

This is the minimum confidence that a result must achieve for it to be included in the discovery search result and reports. The default value of 0 disables this filter and means that all results are included.

For example, if numerous insignificant results below 10% confidence are cluttering the discovery results and you would prefer to exclude them from the result panels and reports, set this parameter to 10.

If any results are excluded as a consequence of this filter then DataVeil shall log a warning message stating the count of columns that were omitted from the result. E.g. "WARNING 31 columns have been omitted from this discovery result because their Confidence score fell below the search query Minimum Confidence parameter of 10%."
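
The filter and its warning can be sketched as below. This is a minimal model that assumes a result is kept when its confidence is at or above the threshold. The function name and the sample results are hypothetical, and the message text follows the example quoted above.

```python
def apply_minimum_confidence(results, minimum_confidence):
    """Filter (column_name, confidence_percent) pairs.

    Returns (kept_results, warning_message_or_None).
    A minimum_confidence of 0 disables the filter.
    """
    if minimum_confidence <= 0:
        return list(results), None

    kept = [r for r in results if r[1] >= minimum_confidence]
    omitted = len(results) - len(kept)

    warning = None
    if omitted:
        warning = (
            f"WARNING {omitted} columns have been omitted from this discovery "
            f"result because their Confidence score fell below the search query "
            f"Minimum Confidence parameter of {minimum_confidence}%."
        )
    return kept, warning


# Hypothetical discovery results.
results = [("customer.last_name", 92), ("orders.ref", 4), ("audit.note", 10)]

kept, warning = apply_minimum_confidence(results, 10)
print(kept)
print(warning)
```

Here one column falls below 10% and is omitted, so a warning with a count of 1 is produced. With the default of 0, all three results are kept and no warning is logged.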

Note: This option is intended as a convenience feature for specific situations, such as when you want a quick, clear and approximate overview of where most sensitive data is likely to be located without having the discovery result cluttered with a great many other minor results.

It is strongly recommended that at some stage you fully review all discovery hits with this Minimum Confidence parameter set to 0. At that time, if there are false-positive hits then it is suggested to mark those columns as 'Excluded' (in the Discovery result, select the column's row, right-click and choose 'Exclude'). On subsequent discovery executions those Excluded columns shall not appear in the discovery result, provided that the 'Search Excluded Columns' option is disabled. You can further minimize the discovery result by enabling the options 'Ignore Columns marked Sensitive' and 'Ignore Columns marked Not Sensitive', because these are columns that you have already appropriately considered.