
Data Anonymisation

This guide aims to create awareness of basic data anonymisation concepts.

Explanation

What is it?

Character masking is the replacement of characters in a data value with a constant symbol (e.g. "*" or "x"). Masking is typically partial: only some of the characters in the attribute are replaced.
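A minimal Python sketch of partial character masking follows. The function name `mask_characters` and its `keep_prefix`/`keep_suffix` parameters are hypothetical and not part of any library; how many leading and trailing characters stay visible is a policy decision, assumed here to be passed in by the caller.

```python
def mask_characters(value: str, keep_prefix: int = 0, keep_suffix: int = 0,
                    symbol: str = "x") -> str:
    """Hypothetical helper: replace every character of `value` except the
    first `keep_prefix` and the last `keep_suffix` characters with `symbol`."""
    if len(symbol) != 1:
        raise ValueError("symbol must be a single character")
    if keep_prefix < 0 or keep_suffix < 0:
        raise ValueError("keep_prefix and keep_suffix must be non-negative")
    if keep_prefix + keep_suffix >= len(value):
        # Nothing would be masked; refuse instead of returning the raw value.
        raise ValueError("masking would leave the value unchanged")
    masked_len = len(value) - keep_prefix - keep_suffix
    # Slice the suffix with an explicit index so keep_suffix=0 yields "".
    return value[:keep_prefix] + symbol * masked_len + value[len(value) - keep_suffix:]


print(mask_characters("100111", keep_prefix=2))  # prints: 10xxxx
print(mask_characters("ABCDEFG", keep_prefix=1, keep_suffix=2, symbol="*"))  # prints: A****FG
```

Raising an error when nothing would be masked is a deliberate choice in this sketch: a misconfigured call fails loudly rather than silently publishing the unmasked value.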

When to use it?

When the data value is a string of characters and hiding part of it is sufficient to provide the required degree of anonymity.

Can you recognize your own data?

In some scenarios, character masking is intended to let data subjects recognize their own data. For example, when lucky draw results are published, the winners' NRIC numbers are partially masked so that individuals can recognize themselves as winners.
In general, however, anonymised data should not be recognizable, even to the data subjects themselves.

Example

An online grocery store has a historical dataset consisting of postal code, most frequent delivery time slot, and average number of orders per month. For the purpose of an efficiency study, the store masked the last 4 digits of the "Postal Code", leaving only the first 2 digits, which correspond to the sector code within Singapore.

Before anonymisation:

Postal Code | Most Frequent Delivery Time Slot | Average No. of Orders / Month
------------|----------------------------------|------------------------------
100111      | 8 - 9 pm                         | 3
200123      | 12 noon - 1 pm                   | 9
300456      | 2 - 3 pm                         | 1

After anonymisation:

Postal Code | Most Frequent Delivery Time Slot | Average No. of Orders / Month
------------|----------------------------------|------------------------------
10xxxx      | 8 - 9 pm                         | 3
20xxxx      | 12 noon - 1 pm                   | 9
30xxxx      | 2 - 3 pm                         | 1
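The postal-code example can be sketched in Python as follows, assuming the dataset is held in memory as a list of dictionaries. The field names (`postal_code`, `delivery_slot`, `avg_orders_per_month`) and the helpers `mask_postal_code` and `anonymise` are hypothetical. Only the "Postal Code" attribute is masked; the other attributes are copied unchanged, as in the example tables.

```python
# Hypothetical in-memory copy of the example dataset.
records = [
    {"postal_code": "100111", "delivery_slot": "8 - 9 pm", "avg_orders_per_month": 3},
    {"postal_code": "200123", "delivery_slot": "12 noon - 1 pm", "avg_orders_per_month": 9},
    {"postal_code": "300456", "delivery_slot": "2 - 3 pm", "avg_orders_per_month": 1},
]

SECTOR_DIGITS = 2  # the first 2 digits (sector code) remain visible


def mask_postal_code(code: str, keep: int = SECTOR_DIGITS, symbol: str = "x") -> str:
    """Keep the first `keep` digits of a 6-digit postal code and mask the rest."""
    if len(code) != 6 or not code.isdigit():
        raise ValueError(f"expected a 6-digit postal code, got {code!r}")
    return code[:keep] + symbol * (len(code) - keep)


def anonymise(rows):
    """Return new rows with the postal code masked; input rows are not modified."""
    return [{**row, "postal_code": mask_postal_code(row["postal_code"])} for row in rows]


for row in anonymise(records):
    # First line printed: 10xxxx | 8 - 9 pm | 3
    print(row["postal_code"], row["delivery_slot"], row["avg_orders_per_month"], sep=" | ")
```

Returning new rows rather than editing `records` in place is a choice made for this sketch; whether the unmasked source data is retained at all is a separate governance decision.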