FileMasker as an AWS Lambda Function
FileMasker can be run as an Amazon AWS Lambda function.
Some of the many benefits of using Lambda functions to perform on-demand masking of files are:
* Very easy to configure and use
* No further provisioning, tuning or reconfiguration required
* Automatic scaling through concurrency
* Very high throughput is possible (many terabytes per hour)
* AWS costs are charged only when Lambdas are actually executing
The steps below describe in detail how to configure a Lambda with a FileMasker executable and how to configure and test request messages that are sent to Lambda functions on startup.
Configure an AWS Lambda Function
From the AWS Management Console, select Lambda under the Compute category and then click on the 'Create function' button.
This will open the Create function panel shown below.
Specify a Function name. In the example above it has been named 'fmLambda8'.
For Runtime choose either Java 8 or Java 8 (Corretto).
Note: Although Java 8 or later is available for FileMasker on local systems, only Java version 8 is compatible with FileMasker on AWS Lambda.
Click on the 'Create function' button.
For Code source, click on 'Upload from' and choose '.zip or .jar file'. The dialog below shall appear:
Click on 'Upload', as shown above.
Navigate to the FileMasker software folder filemasker/modules/ext/ and select the file FileMaskerLib.jar. Click 'Save'.
Note: Do not delete, move or rename the FileMaskerLib.jar file because the FileMasker GUI also uses it and shall expect to find it in that folder.
For Runtime settings click on the 'Edit' button shown above and specify the Handler as com.filemasker.lib.main.FileMasker. This is case sensitive.
It is recommended to adjust the memory and maximum execution time. This can be done in the 'Configuration' tab under 'General Configuration' as shown below:
Memory: 2048 MB is recommended.
Increasing the memory setting beyond 2048 MB generally does not improve throughput. This observation was correct as of October 2019 but could change if AWS changes the Lambda implementation in future. See the topic "Lambda Limitations" below for more information.
2048 MB is adequate to process files up to approximately 5 GB.
Timeout: 15 minutes is recommended.
The maximum execution time of a Lambda function is 15 minutes, which is generally ample for FileMasker to process files of approximately 5 GB. It is best to run a test on your maximum anticipated file sizes and note the actual execution time. If many compute-intensive masks (such as Randomize) are in use and the execution time is close to 15 minutes, consider allocating more memory and then verify whether the execution time is reduced. In that case, also consider reducing the size of input files so that the execution time is always less than 15 minutes. See the topic "Lambda Limitations" below for more information.
If encryption is being used then FileMasker shall automatically use the same SSE encryption scheme & key as the original file that is being masked.
Review all other settings, such as those found under Configuration->Permissions, and adjust as required for your environment.
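The Handler, Memory and Timeout settings above can also be applied from a script rather than the console. The following is a minimal sketch only, assuming the boto3 library is installed and AWS credentials are already configured; the helper names build_configuration and apply_configuration are illustrative and are not part of FileMasker.

```python
# Recommended settings taken from the steps above.
RECOMMENDED_MEMORY_MB = 2048
RECOMMENDED_TIMEOUT_SECONDS = 15 * 60  # the Lambda maximum of 15 minutes
HANDLER = "com.filemasker.lib.main.FileMasker"  # case sensitive


def build_configuration(function_name):
    """Return keyword arguments for Lambda's UpdateFunctionConfiguration call."""
    return {
        "FunctionName": function_name,
        "Handler": HANDLER,
        "MemorySize": RECOMMENDED_MEMORY_MB,
        "Timeout": RECOMMENDED_TIMEOUT_SECONDS,
    }


def apply_configuration(function_name, client=None):
    """Apply the recommended settings to an existing Lambda function."""
    if client is None:
        # Assumption: boto3 is installed; imported lazily so that
        # build_configuration can be used without it.
        import boto3
        client = boto3.client("lambda")
    return client.update_function_configuration(**build_configuration(function_name))
```

For the function created in the example, this would be called as apply_configuration("fmLambda8"). Permissions and any other environment-specific settings still need to be reviewed separately.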
Test Execution of the Lambda
Create a Test Request Message
When a Lambda function starts, it receives a request message from AWS. FileMasker relies on this request message to contain details such as the location of the FileMasker project file, the input file (S3 object) and output location.
To configure a Test request event, click on the arrow in the 'Test' tab.
A form shall appear with a dummy JSON record. Replace this with a FileMasker request JSON and give this request event a name. It has been named 'MyFmRequest' in the example below:
inPath: the S3 path to the original file to be masked
outFolderPath: the S3 folder where the masked version of the original file shall be written. If the folder does not exist then it shall be created automatically. If the file already exists in the out folder then it shall be replaced. The file name shall be the same name as the original file.
projectPath: the S3 path to the FileMasker project file (.fmp) that defines the masking to be performed.
projectKey: the FileMasker project key. If your FileMasker project does not use a key then you can omit this projectKey field.
licensePath: the S3 path to your FileMasker license file (.fml). Your license file can be downloaded from your user account at www.dataveil.com. You will then need to upload your license file into an S3 bucket.
region: the AWS region. This should match the region of the S3 file locations and the region where your KMS keys are configured.
A sample request message has been provided with the FileMasker software in the file aws/request.json.
Click on 'Save changes'.
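If request messages are generated by a script, the fields described above can be assembled as in the sketch below. The field names are those listed in this topic; the build_request helper and the placeholder values are hypothetical. For the exact S3 path syntax, follow the sample in aws/request.json rather than this sketch.

```python
import json


def build_request(in_path, out_folder_path, project_path, license_path,
                  region, project_key=None):
    """Build a FileMasker request message as a dict."""
    request = {
        "inPath": in_path,
        "outFolderPath": out_folder_path,
        "projectPath": project_path,
        "licensePath": license_path,
        "region": region,
    }
    # projectKey may be omitted when the project does not use a key.
    if project_key is not None:
        request["projectKey"] = project_key
    return request


# Hypothetical placeholder values; substitute your own S3 locations and region.
example = build_request(
    in_path="<S3 path to the original file>",
    out_folder_path="<S3 folder for the masked file>",
    project_path="<S3 path to the .fmp project file>",
    license_path="<S3 path to the .fml license file>",
    region="us-east-1",
)
print(json.dumps(example, indent=2))
```

The printed JSON can be pasted into the test event form in place of the dummy record.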
Run the Test Lambda
After configuring the test event as described above, you can test the execution of the FileMasker Lambda function by clicking on the 'Invoke' button.
Upon completion, an AWS 'Execution result' box is shown. Click on the 'Details' label to expand. This will show the FileMasker response message and log messages:
Note: It is normal for AWS Lambda functions to run faster on repeated invocations if the interval between invocations is relatively short. For example, if a Lambda function is executed for the first time it could take 20 seconds or longer but on an immediate successive execution it could take 15 seconds or less. This is because AWS attempts to cache the Lambda's environment for faster re-invocations. After an unspecified time, AWS will clear the environment and so the next invocation will take several seconds longer to execute again as the Lambda's environment is reinitialized.
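The same test can be run outside the console, which also makes it easy to compare the time taken by a first invocation and an immediate repeat. This is a minimal sketch, assuming boto3 is installed and credentials are configured; make_client and invoke_filemasker are illustrative names. The response is returned as raw text because the exact layout of the FileMasker response message is not assumed here.

```python
import json
import time


def make_client(read_timeout_seconds=910):
    """Create a Lambda client that waits long enough for a 15 minute run."""
    # Assumption: boto3/botocore are installed. The default client read
    # timeout is much shorter than 15 minutes, and automatic retries could
    # re-invoke the function, so both are overridden here.
    import boto3
    from botocore.config import Config
    config = Config(read_timeout=read_timeout_seconds,
                    retries={"max_attempts": 0})
    return boto3.client("lambda", config=config)


def invoke_filemasker(client, function_name, request):
    """Invoke the Lambda synchronously; return (response text, elapsed seconds)."""
    started = time.monotonic()
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the function to finish
        Payload=json.dumps(request).encode("utf-8"),
    )
    elapsed = time.monotonic() - started
    return response["Payload"].read().decode("utf-8"), elapsed
```

Calling invoke_filemasker twice in quick succession with the same request message and comparing the two elapsed times gives a rough indication of the difference between a fresh and a cached environment. The elapsed time includes network overhead, so it will be slightly longer than the duration reported by AWS.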
Lambda Limitations
Two basic limits of AWS Lambda effectively limit the size of files (S3 objects) that you can mask. Specifically, these are Memory and Timeout.
AWS Lambda allocates parts of configured Memory for different purposes, such as for heap, meta and cache. The specific breakdown is not clear from AWS documentation.
The critical memory component required by FileMasker is the heap. Our observation in testing is that allocating Memory beyond 2048 MB does not appear to make any significant difference in heap allocation. Therefore, as of October 2019, the 2048 MB setting appears to be optimal for FileMasker as it provides the maximum CPU performance with maximum usable heap space. Allocating beyond this will not increase performance or throughput but will increase Lambda charges billed by AWS.
The Timeout is the maximum time that an AWS Lambda can run before AWS terminates the Lambda. Setting this to the maximum of 15 minutes is recommended.
In general, FileMasker can process files up to between 4 and 6 GB. The actual size depends on how many masks are configured and the type of masks selected. Some masks, such as the Randomize mask, are more computationally intensive and are therefore slower. If a large number of masks are configured then the upper limit of file size may be closer to 4 GB because the total processing time may approach the AWS Lambda hard limit of 15 minutes. If fewer or faster masks are used then the upper limit may be closer to 6 GB. These are general estimates only. Finally, please be aware that the AWS Lambda environment can fluctuate considerably in performance from time to time. For example, a Lambda that sometimes runs in 200 seconds may run closer to 220 seconds or more on other occasions.
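A rough way to turn a measured test run into a file size estimate is sketched below. It assumes that processing time grows in proportion to file size for a given project, which is a simplification, and it reserves a margin for fluctuations of the kind described above. The function name and the 10% default margin are illustrative only, and the memory limit still applies independently of this estimate.

```python
TIMEOUT_SECONDS = 15 * 60  # Lambda maximum execution time


def estimated_max_file_gb(test_file_gb, test_seconds, margin=0.10):
    """Scale a measured run linearly to the largest file that fits in the timeout.

    margin is the fraction of the timeout held back for run-to-run fluctuation.
    """
    throughput_gb_per_second = test_file_gb / test_seconds
    usable_seconds = TIMEOUT_SECONDS * (1.0 - margin)
    return throughput_gb_per_second * usable_seconds


# Hypothetical measurement: a 1 GB test file masked in 200 seconds.
print(round(estimated_max_file_gb(1.0, 200.0), 2))
```

With the hypothetical measurement above, the estimate is about 4 GB. Treat the result as a starting point for testing with real files rather than as a guarantee.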