PostgreSQL Anonymizer: Data Masking in PostgreSQL

What is PostgreSQL Anonymizer?

PostgreSQL Anonymizer is an extension designed to help mask or replace personally identifiable information (PII) or sensitive data in a PostgreSQL database. This is critical for ensuring data privacy and compliance with regulations such as GDPR. PostgreSQL Anonymizer provides a set of functions and mechanisms to mask sensitive data dynamically or permanently, depending on the use case.

Why Use PostgreSQL Anonymizer?

With data privacy laws becoming stricter, organizations are under pressure to ensure that sensitive information such as PII is protected. PostgreSQL Anonymizer allows developers and database administrators to implement data anonymization strategies directly within the PostgreSQL instance, reducing the risk of data leaks and ensuring that sensitive information is protected.

How PostgreSQL Anonymizer Works

PostgreSQL Anonymizer uses a declarative approach for data masking. This means you define anonymization rules directly in the database schema using PostgreSQL's Data Definition Language (DDL). These rules are enforced within the database, eliminating the need for external tools. The anonymization process can be static (permanent removal of PII) or dynamic (temporary masking for specific users).

Dynamic and Static Masking

PostgreSQL Anonymizer offers both dynamic and static data masking solutions:

  • Static Masking: This method removes PII permanently. Once applied, the original sensitive data is no longer accessible. This is useful when creating datasets for testing or training purposes where sensitive data must be completely eliminated.
  • Dynamic Masking: This hides PII only for specific users while leaving the original data intact. This is useful when certain roles, such as customer support, need access to some data but should not see sensitive information.

Anonymous Dumps

PostgreSQL Anonymizer allows for the creation of anonymized database dumps using the pg_dump_anon utility. This is especially useful when sharing data across teams or with third-party vendors, ensuring that sensitive information is masked before it leaves the secure environment.

Examples of PostgreSQL Anonymizer

Declaring Masking Rules

The following steps show how to declare masking rules using PostgreSQL Anonymizer to protect sensitive data:

Step 1: Install the Anonymizer Extension

To begin, ensure the `anon` extension is installed in your PostgreSQL instance. Run the following commands:


CREATE EXTENSION IF NOT EXISTS anon CASCADE;
SELECT anon.init();
        

Output: The extension will initialize successfully, ready for use.

Step 2: Create a Table

Let's create a table called `customers` with some basic customer information.


CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    full_name TEXT,
    birth DATE,
    phone TEXT
);
        

Output: A table named `customers` is created with the specified columns.

Step 3: Declare Masking Rules for Columns

Now, we will apply masking rules to the `full_name` and `birth` columns to anonymize this sensitive data.


SECURITY LABEL FOR anon ON COLUMN customers.full_name
IS 'MASKED WITH FUNCTION anon.fake_last_name()';

SECURITY LABEL FOR anon ON COLUMN customers.birth
IS 'MASKED WITH FUNCTION anon.random_date_between(''1920-01-01'', now())';
        

Explanation: The `full_name` will be masked with a fake last name, and the `birth` date will be masked with a random date between 1920 and the current date.

Step 4: Insert Data

Insert some sample data into the `customers` table for testing.


INSERT INTO customers (full_name, birth, phone) 
   VALUES ('John Pascal', '1980-05-15', '123-456-7890'),
          ('Jane Smith', '1990-08-22', '987-654-3210');
        

Output: The sample data has been inserted successfully.

Step 5: View the Data

Let's check the original data before applying any masking:


SELECT * FROM customers;

 id  |  full_name  |    birth    |     phone     
-----+-------------+-------------+---------------
  1  | John Pascal | 1980-05-15  | 123-456-7890
  2  | Jane Smith  | 1990-08-22  | 987-654-3210
        

Output: The original data is visible without masking at this stage.

Static Masking Example

You can permanently anonymize data using the anon.anonymize_database() function. This function destroys the original data and replaces it with anonymized values.

Step 1: Apply Static Masking

Run the following command to permanently anonymize the database:


SELECT anon.anonymize_database();
        

Output: The data will be permanently masked based on the declared rules. After this, you cannot recover the original data.

Step 2: Check Anonymized Data

Now, view the anonymized data after static masking:


SELECT * FROM customers;

 id  |  full_name  |    birth    |     phone     
-----+-------------+-------------+---------------
  1  | Smith       | 1965-04-10  | 123-456-7890
  2  | Johnson     | 1973-11-02  | 987-654-3210
        

Explanation: The `full_name` and `birth` columns are now anonymized based on the masking rules. The phone number remains unchanged as no masking rule was applied to it.

Dynamic Masking Example

Dynamic masking hides data from specific user roles but keeps the original data intact for other users. This approach is useful when you want to mask sensitive data for certain users without permanently altering it.

Step 1: Create a Role

Create a new role that will have access to masked data:


CREATE ROLE customer_support LOGIN;
        

Step 2: Define Masking Rules for the Role

Apply masking rules specifically for the `customer_support` role:


SECURITY LABEL FOR anon ON ROLE customer_support IS 'MASKED';

SECURITY LABEL FOR anon ON COLUMN customers.phone
IS 'MASKED WITH FUNCTION anon.partial(phone, 2, ''******'', 2)';
        

Explanation: The `phone` column will be masked for the `customer_support` role, only showing the first two and last two digits.

Step 3: Test with Masked Role

Connect to the database as the `customer_support` user and view the masked data:


\c postgres customer_support;

SELECT * FROM customers;

 id  |  full_name  |    birth    |     phone     
-----+-------------+-------------+---------------
  1  | Smith       | 1965-04-10  | 12******90
  2  | Johnson     | 1973-11-02  | 98******10
        

Output: The phone numbers are partially masked for the `customer_support` role, while the other columns remain visible as is.

Other roles will continue to see the original phone number unless they have specific masking rules applied.

Limitations

While PostgreSQL Anonymizer is a powerful tool, it does have some limitations:

  • Dynamic masking works only with a single schema (by default, the public schema).
  • Anonymous dumps may not always be consistent, especially if the database is subject to frequent changes during the dump process.
  • Performance may be impacted depending on the complexity of the anonymization rules and the size of the database.

Conclusion

PostgreSQL Anonymizer is a valuable extension for any organization that needs to manage sensitive data and ensure compliance with data privacy regulations. Whether you're looking to anonymize data for testing, create anonymized dumps, or enforce dynamic masking for specific roles, PostgreSQL Anonymizer provides the tools necessary to protect sensitive information efficiently.



Rate Your Experience

: 1 : 0


Last updated in November, 2024

Online Tests
Read more

Cloud Technology
Read more

Oracle Database
Read more

MSSQL Database
Read more

PostGres Database
Read more

Linux
Read more

ASP/C#
Read more

Quick Access