LibGuides: Data Anonymisation: Character Masking

Explanation

What is it?

The replacement of characters in a data value with a constant symbol (e.g. "*" or "x"). Masking is typically partial: it is applied to only some of the characters in the attribute.
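As an illustration, partial masking can be sketched as a small Python function. This is a minimal sketch, not a method prescribed by the PDPC guide. The name `mask_characters` and its parameters `keep_start`, `keep_end` and `symbol` are hypothetical. The function keeps a chosen number of leading and trailing characters and replaces everything in between with the masking symbol.

```python
def mask_characters(value: str, keep_start: int = 0, keep_end: int = 0, symbol: str = "*") -> str:
    """Hypothetical helper: keep the first `keep_start` and last `keep_end`
    characters of `value` and replace every other character with `symbol`."""
    if keep_start < 0 or keep_end < 0:
        raise ValueError("keep_start and keep_end must be non-negative")
    if len(symbol) != 1:
        raise ValueError("symbol must be a single character")
    n = len(value)
    # Refuse to return a value in which nothing would actually be masked.
    if keep_start + keep_end >= n:
        raise ValueError("kept characters cover the whole value; nothing would be masked")
    masked_len = n - keep_start - keep_end
    # value[n - keep_end:] is "" when keep_end == 0 (unlike value[-0:], which is the whole string).
    return value[:keep_start] + symbol * masked_len + value[n - keep_end:]


print(mask_characters("100111", keep_start=2, symbol="x"))  # 10xxxx
print(mask_characters("ABCDEFGH", keep_end=3))              # *****FGH
```

Raising an error when no characters would be masked is a deliberate design choice in this sketch. Silently returning the original value would defeat the purpose of anonymisation.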

When to use it?

When the data value is a string of characters and hiding part of it is sufficient to provide the required level of anonymity.

Can you recognize your own data?

In one scenario, character masking deliberately allows data subjects to recognize their own data. For example, when lucky draw results are published, the winners' NRIC numbers are partially masked so that individuals can still recognize themselves as winners.
Generally, however, anonymised data should not be recognizable, even to the data subjects themselves.

Source

PDPC Guide to basic data anonymisation techniques

Example

An online grocery store has a historical dataset that consists of postal code, most frequent delivery time slot, and average number of orders per month. For an efficiency study, the store masked out the last 4 digits of the "Postal Code" and kept only the first 2 digits, which correspond to the sector code within Singapore.

Before anonymisation:

Postal Code	Most Frequent Delivery Time Slot	Average No. of Orders / Month
100111	8 - 9 pm	3
200123	12 noon - 1 pm	9
300456	2 - 3 pm	1

After anonymisation:

Postal Code	Most Frequent Delivery Time Slot	Average No. of Orders / Month
10xxxx	8 - 9 pm	3
20xxxx	12 noon - 1 pm	9
30xxxx	2 - 3 pm	1
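The transformation in the tables above can be reproduced with a short Python sketch. The helper name `mask_postal_code` is hypothetical. The sketch assumes that every postal code is a 6-digit string, as in the example, and it keeps the leading 2 sector digits while masking the remaining 4 with "x". The other columns pass through unchanged.

```python
def mask_postal_code(postal_code: str, keep_digits: int = 2, symbol: str = "x") -> str:
    """Hypothetical helper: keep the leading sector digits of a 6-digit postal
    code and replace the remaining digits with `symbol`."""
    if len(postal_code) != 6 or not postal_code.isdigit():
        raise ValueError(f"expected a 6-digit postal code, got {postal_code!r}")
    return postal_code[:keep_digits] + symbol * (len(postal_code) - keep_digits)


# Rows from the "Before anonymisation" table:
# (postal code, most frequent delivery time slot, average no. of orders / month)
rows = [
    ("100111", "8 - 9 pm", 3),
    ("200123", "12 noon - 1 pm", 9),
    ("300456", "2 - 3 pm", 1),
]

# Mask only the postal code; the other attributes are left as they are.
anonymised = [(mask_postal_code(code), slot, orders) for code, slot, orders in rows]

for code, slot, orders in anonymised:
    print(f"{code}\t{slot}\t{orders}")
```

After masking, all customers in the same sector share the same postal code value, such as "10xxxx". The data can therefore still be analysed by sector without exposing full addresses.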