Data Anonymisation

This guide aims to build awareness of basic data anonymisation concepts.

Explanation

What is it?

Pseudonymisation is the replacement of identifying data with made-up values.
It is also known as coding.
It can be irreversible, where the original values are disposed of.
It can be reversible, where the identity database (the mapping between pseudonyms and original values) is kept securely and not shared.
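The two variants described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed method: the function name `pseudonymise`, the `keep_mapping` flag, and the choice of random 7-digit pseudonyms are all hypothetical. Keeping the returned mapping gives reversible pseudonymisation; discarding it gives the irreversible form.

```python
import secrets

def pseudonymise(records, field, keep_mapping=True):
    """Replace each distinct value of `field` with a random pseudonym.

    Hypothetical helper for illustration. The 7-digit format is an assumption.
    """
    forward = {}   # original value -> pseudonym
    used = set()   # pseudonyms already issued, so each one stays unique
    output = []
    for record in records:
        original = record[field]
        if original not in forward:
            pseudonym = str(secrets.randbelow(9_000_000) + 1_000_000)
            while pseudonym in used:  # retry on the rare collision
                pseudonym = str(secrets.randbelow(9_000_000) + 1_000_000)
            used.add(pseudonym)
            forward[original] = pseudonym
        # Copy the record so the input data is left untouched.
        output.append({**record, field: forward[original]})
    # Reversible: return the identity database (pseudonym -> original).
    # Irreversible: return None so the mapping is not retained.
    identity_db = {p: o for o, p in forward.items()} if keep_mapping else None
    return output, identity_db

records = [
    {"Person": "John Rohit", "Result": "B", "Hours": 25},
    {"Person": "Stella Campbell", "Result": "D", "Hours": 26},
]
pseudonymised, identity_db = pseudonymise(records, "Person")
```

In the reversible case, `identity_db` would need to be stored separately from the pseudonymised data and access to it restricted, since anyone holding both can re-identify the individuals.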

 

When to use it?

When records need to remain uniquely distinguishable from one another without revealing the original identifying values.

Example

The dataset contains the names of persons who obtained their driving licences. Instead of suppressing the "Person" attribute, the organisation replaced it with pseudonyms, because it wanted to be able to reverse the pseudonymisation if necessary.

Before anonymisation:

Person | Pre-Assessment Result | Hours of Lessons Taken
John Rohit | B | 25
Stella Campbell | D | 26
Ming Siew Lee | A | 30
Poh Boon | B | 32
Siva Vasanth | C | 29
Siti Raudhah | A | 25

After anonymisation:

Person | Pre-Assessment Result | Hours of Lessons Taken
4135891 | B | 25
3229873 | D | 26
4398642 | A | 30
783127 | B | 32
583419 | C | 29
983429 | A | 25

For reversible pseudonymisation, the identity database is securely kept in case there is a future legitimate need to identify individuals.

Identity database

Pseudonym | Person
4135891 | John Rohit
3229873 | Stella Campbell
4398642 | Ming Siew Lee
783127 | Poh Boon
583419 | Siva Vasanth
983429 | Siti Raudhah
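To show how the identity database supports reversal, here is a minimal Python sketch that looks up each pseudonym from the example and restores the original name. The function name `reidentify` and the dictionary layout are assumptions made for illustration; the values are taken from the tables in this example.

```python
# Identity database from the example: pseudonym -> person.
identity_db = {
    "4135891": "John Rohit",
    "3229873": "Stella Campbell",
    "4398642": "Ming Siew Lee",
    "783127": "Poh Boon",
    "583419": "Siva Vasanth",
    "983429": "Siti Raudhah",
}

# Pseudonymised dataset from the "After anonymisation" table.
pseudonymised = [
    {"Person": "4135891", "Pre-Assessment Result": "B", "Hours of Lessons Taken": 25},
    {"Person": "3229873", "Pre-Assessment Result": "D", "Hours of Lessons Taken": 26},
    {"Person": "4398642", "Pre-Assessment Result": "A", "Hours of Lessons Taken": 30},
    {"Person": "783127", "Pre-Assessment Result": "B", "Hours of Lessons Taken": 32},
    {"Person": "583419", "Pre-Assessment Result": "C", "Hours of Lessons Taken": 29},
    {"Person": "983429", "Pre-Assessment Result": "A", "Hours of Lessons Taken": 25},
]

def reidentify(records, identity_db, field="Person"):
    """Hypothetical helper: swap each pseudonym back to the original value.

    Raises KeyError if a pseudonym is missing from the identity database.
    """
    return [{**record, field: identity_db[record[field]]} for record in records]

restored = reidentify(pseudonymised, identity_db)
```

Without access to `identity_db`, the pseudonyms in the dataset carry no information about who the individuals are, which is why the identity database must be kept apart from the shared data.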