"This website is not affiliated with Splunk, Inc. and is not an authorized seller of Splunk products or services."

How to anonymize sensitive data fields in Splunk?

Sometimes sensitive information has to be hidden within Splunk itself. If we forward logs that contain sensitive information such as credit card numbers and passwords, anyone who has access to those logs in Splunk can see it. This can be avoided by anonymizing the data in Splunk.
For example, we had a customer who asked us to mask data at both search time and index time. Usually the goal is to hide personally identifiable information for security, compliance, or both. In this post we'll cover several different approaches for doing this in Splunk. The first approach uses the following sample data:

Example Event
-------------
9/12/2014 17:21 Bob the builder 21.00 2013210345537512

I want to mask the middle six digits of the final 16-digit number, like this:
201321######7512

There are different approaches for achieving this goal. All are explained below:

1. Anonymizing data using props.conf and transforms.conf

To mask the data above, add a stanza to transforms.conf containing a regex that rewrites the raw event, then reference that stanza by name in props.conf under the sourcetype to be masked.
The configuration on my heavy forwarder looks like this:
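props.conf
----------
# Apply the [mask] transform to events of sourcetype test-masking
[test-masking]
TRANSFORMS-masking = mask

transforms.conf
---------------
[mask]
# $1 = everything up to and including the first six digits, $2 = the last four digits
REGEX = ^(.*\s\d{6})\d{6}(\d{4})$
FORMAT = $1######$2
DEST_KEY = _raw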
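
Before deploying a masking transform it can help to try the pattern against the sample event outside Splunk. The sketch below is only an approximation: it uses Python's `re` module rather than Splunk's own regex engine, and the pattern, the names `MASK_REGEX`, `MASK_FORMAT`, and `mask_event` are assumptions made for illustration. It captures everything up to the first six digits and the last four digits, and writes six `#` characters between them.

```python
import re

# Hypothetical Python stand-in for a transforms.conf REGEX / FORMAT pair.
# Group 1: everything up to and including the first six digits.
# Group 2: the last four digits. The six digits in between are dropped.
MASK_REGEX = re.compile(r"^(.*\s\d{6})\d{6}(\d{4})$")
MASK_FORMAT = r"\1######\2"  # Splunk's FORMAT would write this as $1######$2


def mask_event(raw: str) -> str:
    """Return the event with the middle six digits of the trailing number masked."""
    return MASK_REGEX.sub(MASK_FORMAT, raw)


event = "9/12/2014 17:21 Bob the builder 21.00 2013210345537512"
masked = mask_event(event)
print(masked)  # 9/12/2014 17:21 Bob the builder 21.00 201321######7512
```

Events that do not end in a 16-digit number are returned unchanged, which is a quick way to confirm the pattern is not over-matching.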


2. Anonymizing data using SEDCMD

Splunk also provides SEDCMD, a props.conf setting that applies a sed-style substitution to the raw event at index time.
Example log:
sample event = "This is an event with a sensitive number in it. SN=111-11-1111.  This number should be masked"
The following configuration change would work:
props.conf
----------
[sourcetype name]
SEDCMD-masking = s/SN=\d+-\d+-(\d+)/SN=###-##-\1/
This masks the number at index time, just like the transforms.conf approach, so the event is stored with SN=###-##-1111. There are a few advantages to this approach:
  • It only involves one configuration file (props.conf) instead of two.
  • The matching expression is simpler and doesn't need to match the entire event like the one in transforms.conf does.
  • Multiple expressions can be chained together.
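
The substitution and the idea of chaining can be tried outside Splunk with a rough Python equivalent. This is a sketch under assumptions, not Splunk's implementation: `SED_RULES` and `apply_sedcmds` are hypothetical names, and the second rule (a password mask) is invented purely to show several expressions being applied in order.

```python
import re

# Each pair mirrors one sed-style s/pattern/replacement/ expression.
# The second rule is a hypothetical extra, included only to show chaining.
SED_RULES = [
    (re.compile(r"SN=\d+-\d+-(\d+)"), r"SN=###-##-\1"),
    (re.compile(r"password=\S+"), "password=########"),
]


def apply_sedcmds(raw: str) -> str:
    """Apply each substitution in order, feeding the result into the next one."""
    for pattern, replacement in SED_RULES:
        # count=1 mimics a sed expression without the trailing g flag,
        # which replaces only the first match.
        raw = pattern.sub(replacement, raw, count=1)
    return raw


event = ("This is an event with a sensitive number in it. "
         "SN=111-11-1111.  This number should be masked")
masked = apply_sedcmds(event)
print(masked)
```

Because the rules run in sequence, a later expression sees the output of the earlier ones, so the order of the rules can matter when patterns overlap.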


3. Anonymizing data using SCRUB

The "scrub" command is an interesting one. It anonymizes data at search time based on configuration that is shipped with Splunk. Note that this command only takes effect at search time, so any sensitive data is still stored on disk, "at rest", on the indexer. The cool part about this feature is that it can use an existing dictionary of terms to anonymize data while keeping the format of the data intact. For example, an e-mail address like "john@mail.com" would become "joe@abc.com". The latter is not a real address, but it retains the format of an e-mail address.
Reference Link: http://docs.splunk.com/Documentation/Splunk/6.2.3/SearchReference/Scrub
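
To make the idea of format-preserving anonymization concrete, here is a toy Python sketch. It is not how the scrub command works internally (scrub relies on the dictionaries shipped with Splunk); the function `scrub_token` and the salt value are assumptions made for illustration. It swaps letters for letters and digits for digits while leaving punctuation such as `@` and `.` in place, so the result still looks like the original kind of value.

```python
import hashlib
import string


def scrub_token(token: str, salt: str = "demo-salt") -> str:
    """Toy format-preserving replacement: same length, same punctuation."""
    # A hash of the salted token gives deterministic pseudo-random bytes,
    # so the same input always maps to the same output.
    digest = hashlib.sha256((salt + token).encode("utf-8")).digest()
    out = []
    for i, ch in enumerate(token):
        b = digest[i % len(digest)]
        if ch.islower():
            out.append(string.ascii_lowercase[b % 26])
        elif ch.isupper():
            out.append(string.ascii_uppercase[b % 26])
        elif ch.isdigit():
            out.append(string.digits[b % 10])
        else:
            out.append(ch)  # keep separators like '@', '.', '-'
    return "".join(out)


masked = scrub_token("john@mail.com")
print(masked)  # still shaped like an e-mail address
```

As with scrub, this kind of replacement only changes what is displayed; it does nothing about data that has already been written to disk.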


4. ENCRYPTION/DECRYPTION

Finally, another option we found interesting is an app on Splunkbase that encrypts and decrypts data. In this approach, existing data in Splunk is encrypted and re-indexed, and is then decrypted at search time.
Link to the app on Splunkbase: https://splunkbase.splunk.com/app/282/
Hopefully this helps shed some light on the options for anonymizing or masking data in Splunk.  Thanks for reading!