How does DLP work?

Understanding the difference between content awareness and contextual analysis is essential to comprehending any DLP solution in its entirety. A useful way to think of the difference: if content is a letter, context is the envelope. Content awareness involves capturing the envelope and peering inside it to analyze the content, whereas context covers external factors such as header, size, and format, that is, anything that doesn't include the content of the letter. The idea behind content awareness is that although we want to use the context to gain more intelligence on the content, we don't want to be restricted to a single context.
Once the envelope is opened and the content processed, there are multiple content analysis techniques that can be used to trigger policy violations, including:

  1. Rule-Based/Regular Expressions: The most common analysis technique used in DLP involves an engine analyzing content for specific rules such as 16-digit credit card numbers, 9-digit U.S. social security numbers, etc. This technique is an excellent first-pass filter since the rules can be configured and processed quickly, although they can be prone to high false positive rates without checksum validation to identify valid patterns.
  2. Database Fingerprinting: Also known as Exact Data Matching, this mechanism looks at exact matches from a database dump or live database. Although database dumps or live database connections affect performance, this is an option for structured data from databases.
  3. Exact File Matching: File contents are not analyzed; instead, the hashes of files are matched against exact fingerprints. This produces few false positives, although the approach does not work for files with multiple similar but not identical versions.
  4. Partial Document Matching: Looks for a complete or partial match on specific files, such as multiple versions of a form that have been filled out by different users.
  5. Conceptual/Lexicon: Using a combination of dictionaries, rules, etc., these policies can alert on completely unstructured ideas that defy simple categorization. It needs to be customized for the DLP solution provided.
  6. Statistical Analysis: Uses machine learning or other statistical methods such as Bayesian analysis to trigger policy violations in secure content. It requires a large volume of data to learn from, the bigger the better; otherwise it is prone to both false positives and false negatives.
  7. Pre-built categories: Pre-built categories with rules and dictionaries for common types of sensitive data, such as credit card numbers/PCI protection, HIPAA, etc.
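
The first technique in the list, together with the checksum validation it mentions, can be sketched roughly as follows. This is a minimal illustration, not how any particular DLP engine works: the pattern, the function names `luhn_valid` and `find_card_numbers`, and the choice of the Luhn check (the check-digit scheme used by payment card numbers) are assumptions made for the example.

```python
import re

# Candidate pattern (an assumption for this sketch): 16 digits, optionally
# split into groups of four by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return the 16-digit candidates in text that also pass the checksum."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


if __name__ == "__main__":
    sample = "Test card 4111 1111 1111 1111, order ref 1234 5678 9012 3456."
    print(find_card_numbers(sample))
```

Both numbers in the sample match the 16-digit pattern, but only the first passes the checksum, so the order reference is not reported. The regular expression is the fast first pass; the checksum trims false positives.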

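The contrast between exact file matching and partial document matching in the list above can be illustrated with a small sketch. It assumes SHA-256 as the file hash and overlapping five-word "shingles" as the partial-match fingerprint. Both are choices made for this example, and the function names are hypothetical; commercial products use their own fingerprinting schemes.

```python
import hashlib


def file_fingerprint(data: bytes) -> str:
    """Exact file matching: hash the bytes without interpreting the content."""
    return hashlib.sha256(data).hexdigest()


def shingles(text: str, k: int = 5) -> set:
    """Split text into overlapping runs of k words (illustrative fingerprint)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap(protected: str, candidate: str, k: int = 5) -> float:
    """Fraction of the protected document's shingles found in the candidate."""
    protected_shingles = shingles(protected, k)
    if not protected_shingles:
        return 0.0
    shared = protected_shingles & shingles(candidate, k)
    return len(shared) / len(protected_shingles)


if __name__ == "__main__":
    secret = "the merger with acme corp will close in the third quarter of next year"
    email = "fyi: " + secret + " please keep this quiet"

    # A change anywhere in the file alters the hash, so the exact match is lost.
    print(file_fingerprint(secret.encode()) == file_fingerprint(email.encode()))

    # The partial match still sees the protected text embedded in the email.
    print(overlap(secret, email))
```

The hash comparison fails as soon as anything is added to the file, while the shingle overlap still recognizes the embedded text. This is why exact file matching gives few false positives but cannot follow a document through edits.
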
There are myriad techniques on the market today that deliver different types of content inspection. One thing to consider is that while many DLP vendors have developed their own content engines, some employ third-party technology that is not designed for DLP. For example, rather than building pattern matching for credit card numbers, a DLP vendor may license technology from a search engine provider to pattern match credit card numbers. When evaluating DLP solutions, pay close attention to the types of patterns each solution detects against a real corpus of sensitive data to confirm the accuracy of its content engine.
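
One way to make such an evaluation concrete is to run a detector over a labeled corpus and compute precision and recall. The sketch below is a hedged illustration: `naive_detector`, `evaluate`, and the four-item corpus are hypothetical stand-ins for a vendor's engine and for real test data.

```python
import re


def naive_detector(text: str) -> bool:
    """Stand-in for a content engine: flags any unbroken run of 16 digits."""
    return re.search(r"\b\d{16}\b", text) is not None


def evaluate(detector, labeled_corpus):
    """Return (precision, recall) for a detector over (text, is_sensitive) pairs."""
    tp = fp = fn = 0
    for text, is_sensitive in labeled_corpus:
        flagged = detector(text)
        if flagged and is_sensitive:
            tp += 1          # correctly flagged
        elif flagged and not is_sensitive:
            fp += 1          # false positive
        elif not flagged and is_sensitive:
            fn += 1          # false negative (missed leak)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical labeled samples; a real evaluation would use far more data.
    corpus = [
        ("Card: 4111111111111111", True),
        ("Card: 4111-1111-1111-1111", True),       # missed: hyphenated format
        ("Order id 1234567890123456", False),      # flagged: not a card number
        ("Lunch at noon?", False),
    ]
    print(evaluate(naive_detector, corpus))
```

On this toy corpus the stand-in detector misses the hyphenated card number and flags the order id, which is the kind of gap that testing against a realistic corpus is meant to expose.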
