When mapping out the requirements for a Data Discovery programme, there are a number of options to consider and many sources to draw on. If you already have a PIM (Personal Information Management) programme in place, you can use its output to help focus the elements of a Discovery service. If you do not have a PIM, this is not a problem: you can still reach the same point by planning and mapping out your requirements with your team.
Discovery programmes usually fall into two categories:
- Discovery of sensitive data in inappropriate storage locations
- E-Discovery and mapping of sensitive data for future referencing
Whichever of these programme types you are looking to run, the next questions to answer are the same for both:
- What content do you actually want to find? For example:
  - Personal Data – PII, SPII, employment information, medical records
  - Sensitive Corporate Information – Intellectual Property, Code, Designs
  - Sensitive Financial Data – banking information, customer financial records, company financial records
- What combinations of data do you hold? Different combinations affect sensitivity levels. For example:
  - Personal Data – name, address, date of birth, bank details
  - Financial Data – Mergers and Acquisitions
  - Company Finance – end of quarter/year results
- Where is the data located? For example:
  - On-premises file storage
  - SharePoint platform
  - MS Exchange folders
  - Document management system
  - Local user drives
  - Cloud data – O365, Dropbox, Box, Salesforce, ServiceNow, Slack, GitHub
- What is the sensitivity level of the data?
  - Do you have classification levels for sensitivities?
  - Are they published within your organisation?
  - Do you have the ability to classify electronic content?
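To illustrate why combinations matter, here is a minimal sketch, assuming a hypothetical four-tier classification scheme and hypothetical combination rules (the level names and rules below are illustrative only; substitute your organisation's published classifications). A file's level is the highest level of any rule whose required data types are all present, so a name alone rates lower than a name found alongside bank details. It requires Python 3.9 or later.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    # Hypothetical classification levels; replace with your organisation's published scheme.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical rules: (data types that must ALL be present, resulting level).
RULES = [
    (frozenset({"name"}), Sensitivity.INTERNAL),
    (frozenset({"name", "address"}), Sensitivity.CONFIDENTIAL),
    (frozenset({"name", "date_of_birth"}), Sensitivity.CONFIDENTIAL),
    (frozenset({"name", "bank_details"}), Sensitivity.RESTRICTED),
    (frozenset({"medical_record"}), Sensitivity.RESTRICTED),
]


def classify(found_types: set[str]) -> Sensitivity:
    """Return the highest level whose required combination is fully present in the file."""
    level = Sensitivity.PUBLIC
    for required, rule_level in RULES:
        if required <= found_types:  # subset test: every required type was detected
            level = max(level, rule_level)
    return level
```

For example, `classify({"name", "bank_details"})` returns `Sensitivity.RESTRICTED`, while `classify({"address"})` stays `Sensitivity.PUBLIC` because an address on its own matches none of the illustrative rules.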
Once you have answered these questions, you can move on to the next stage:
- Where should data be located?
- Where should it not be located?
Although the latter may seem an obvious question, it is an important distinction. For example, scanning for HR content on an HR drive is often counter-productive: this is where you expect to find that type of content, so the scan will generate a lot of detected incidents for little gain. The answers to these questions tie back to our original question: what type of discovery programme do you want to run? A Discovery programme to find sensitive data where it should not be located, or an E-Discovery programme to find where all of your sensitive data is located, so that you can reference and search it later, potentially for DSARs (Data Subject Access Requests) or Right to Be Forgotten requests, also known as Deletion Requests?
The distinction between programme types also matters because the right technology may differ depending on the kind of searching you want to do.
Many DLP platforms, such as Symantec, Forcepoint and Digital Guardian, are good at searching for content, highlighting when that content is found and creating incident tickets against the files discovered. However, if you are running an E-Discovery programme that simply catalogues the data you find and every location it has been found in, a DLP platform may not be the right toolset. It is cumbersome to run DLP scans across all of your storage areas and then try to keep track of the content flagged in incident tickets. Also, on a follow-up scan, DLP platforms are designed to highlight when documents have been removed, not to keep flagging content that is still present. The DLP discovery function is really about highlighting where a problem exists, showing it to the investigator and helping to remove it, so that a follow-up scan shows how much remediation has taken place and how many files have been secured.
Good examples of this use case are:
- Scanning a transfer drive that is meant to be cleared out weekly, to ensure that no sensitive content has been left in it
- Scanning a public file share to ensure that no one has accidentally saved PII there and given the whole organisation access to it
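The "where should it not be located" idea can be sketched as a scan that flags sensitive patterns in a location such as a transfer drive, while skipping locations where that content is expected (for example, the HR share). This is a minimal sketch, not how any DLP product works: the two regular expressions are deliberately simplified and hypothetical, only plain-text files are read, and the `scan_location` name is made up for illustration. It requires Python 3.9 or later for `Path.is_relative_to`.

```python
import re
from pathlib import Path

# Deliberately simplified, hypothetical patterns; real DLP policies are far more precise.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "uk_ni_number": re.compile(r"\b[A-Z]{2}\d{6}[A-D]\b"),
}
# Hypothetical scope: only plain-text files are scanned in this sketch.
TEXT_SUFFIXES = {".txt", ".csv", ".md"}


def scan_location(root: Path, expected: list[Path]) -> list[tuple[Path, str]]:
    """Flag files under `root` containing sensitive patterns, skipping the
    `expected` locations where such content is supposed to live."""
    expected_resolved = [p.resolve() for p in expected]
    findings = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        # Skip anything inside an expected location, e.g. the HR share.
        if any(path.resolve().is_relative_to(p) for p in expected_resolved):
            continue
        text = path.read_text(errors="ignore")
        for name, pattern in PATTERNS.items():
            if pattern.search(text):
                findings.append((path, name))
    return findings
```

Excluding the expected locations is what keeps the incident count focused on genuine problems rather than on content that is exactly where it should be.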
If you are looking to map all data and its locations, a full E-Discovery platform such as Privaci, BigID or Commvault’s Activate is a much stronger option. These platforms specifically scan for all sensitive data in the content, keep a record of where the files are located and create reference indexes, so that data can be searched for and the location records pulled back for the investigator, potentially to fulfil a DSAR or Deletion Request.
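Conceptually, the reference index described above is an inverted index: each detected identifier maps to the set of locations where it was found, so a DSAR or Deletion Request becomes a lookup rather than a fresh scan. The sketch below is a hypothetical, minimal illustration of that idea; it is not the API or internal design of Privaci, BigID, Commvault’s Activate or any other product, and the class, method names and example paths are made up.

```python
from collections import defaultdict


class DiscoveryIndex:
    """Hypothetical, minimal inverted index: normalised identifier -> file locations."""

    def __init__(self) -> None:
        self._locations: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _normalise(value: str) -> str:
        # Simple normalisation so 'Jane.Doe@Example.com ' matches 'jane.doe@example.com'.
        return value.strip().lower()

    def record(self, location: str, identifiers: list[str]) -> None:
        """Store every identifier detected in the file at `location`."""
        for value in identifiers:
            self._locations[self._normalise(value)].add(location)

    def lookup(self, *identifiers: str) -> set[str]:
        """Return every location holding any of a data subject's identifiers."""
        found: set[str] = set()
        for value in identifiers:
            # .get avoids inserting empty entries into the defaultdict.
            found |= self._locations.get(self._normalise(value), set())
        return found


index = DiscoveryIndex()
index.record("//fileserver/finance/q3.xlsx", ["jane.doe@example.com", "AB123456C"])
index.record("sharepoint:/sites/hr/offer.docx", ["Jane.Doe@Example.com"])
print(sorted(index.lookup("jane.doe@example.com")))
```

Unlike DLP incident tickets, the index keeps recording content that is still present, which is what a catalogue used for later searching needs.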
These are just some of the questions to consider when planning the initial stages of your data discovery programme.
Please see our additional articles for the next steps.