LibGuides: Data Anonymisation: When to anonymise?

Consider these before you anonymise

Purpose and utility
Anonymisation should be tailored to the specific purpose at hand.
The process of anonymisation removes some of the original information in the dataset and therefore reduces its utility (e.g. clarity, precision). You need to decide on the trade-off between keeping acceptable utility and reducing the risk of re-identification.
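As a rough illustration of this trade-off, the sketch below generalises exact ages into bands of increasing width. Wider bands put more people in each group, which lowers the risk that someone stands out, but each value becomes less precise. The sample ages, the band widths and the helper names (to_band, smallest_group) are assumptions made for illustration and do not come from the PDPC guide.

```python
from collections import Counter

def to_band(age, width):
    """Replace an exact age with the lower bound of its band (37 -> 30 for width 10)."""
    return (age // width) * width

def smallest_group(ages, width):
    """Size of the smallest band; a small value means some people stand out."""
    counts = Counter(to_band(age, width) for age in ages)
    return min(counts.values())

# Hypothetical sample data
ages = [21, 23, 24, 27, 35, 36, 41, 44, 52, 57]

# A width of 1 keeps the exact age; larger widths trade precision for larger groups.
for width in (1, 10, 20):
    print(f"band width {width}: smallest group has {smallest_group(ages, width)} record(s)")
```

With this sample, exact ages leave every record in a group of one, while 20-year bands give groups of at least four. Which band width is acceptable depends on the purpose the data is being shared for.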

Nature and type of data
Different anonymisation techniques are suitable for different types of data.

Anonymisation techniques
Certain techniques may be more suitable for a situation than others.
For example, character masking is usually applied to direct identifiers, and aggregation to indirect identifiers.

The various anonymisation techniques also modify data in significantly different ways.
For example, character masking modifies only parts of an attribute, pseudonymisation replaces the entire attribute with unrelated, but consistent information, and attribute suppression removes the attribute entirely.
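The three techniques named above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function and class names, the sample record and the choice of a random lookup table for pseudonyms are all hypothetical.

```python
import secrets

def mask(value, visible=4, mask_char="*"):
    """Character masking: hide all but the last `visible` characters."""
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[hidden:]

class Pseudonymiser:
    """Pseudonymisation: map each value to a random token, reused on every call."""

    def __init__(self):
        self._table = {}  # original value -> pseudonym

    def __call__(self, value):
        if value not in self._table:
            self._table[value] = secrets.token_hex(8)
        return self._table[value]

def suppress(record, attribute):
    """Attribute suppression: return a copy of the record without the attribute."""
    return {key: val for key, val in record.items() if key != attribute}

# Hypothetical record
record = {"name": "Alice Tan", "id_number": "S1234567A", "notes": "free text"}

pseudonym = Pseudonymiser()
anonymised = suppress(record, "notes")                    # attribute removed entirely
anonymised["id_number"] = mask(anonymised["id_number"])   # only part of the value changed
anonymised["name"] = pseudonym(anonymised["name"])        # whole value replaced
print(anonymised)
```

In this sketch the pseudonym table links each pseudonym back to the original value, so it would need to be stored securely and separately from the anonymised data.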

Inferred information
It may be possible for certain information to be inferred from anonymised data.
For example, masking may hide personal data, but it does not hide the length of the original data in terms of the number of characters.
The anonymisation process must therefore consider what could still be inferred, both before deciding on the techniques to use and after applying them.
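The length example can be made concrete with a short sketch. Both function names and the sample values are hypothetical. The first function masks every character but keeps the original length; the second always emits a fixed-width mask, so the length no longer reveals anything.

```python
def mask_keep_length(value, mask_char="*"):
    """Mask every character; the output is as long as the input."""
    return mask_char * len(value)

def mask_fixed_length(value, width=8, mask_char="*"):
    """Mask to a constant width, regardless of the input length."""
    return mask_char * width

# Hypothetical sample values
names = ["Li", "Alexander"]

for name in names:
    print(mask_keep_length(name), mask_fixed_length(name))
```

With length-preserving masking, the two masked names remain distinguishable by length alone. The fixed-width version produces identical output for both, at the cost of losing that information for legitimate uses as well.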

Expertise with the subject matter
An “identifiability” assessment should be performed before and after anonymisation techniques are applied, and this requires a good understanding of the subject matter to which the data pertains. For example, if the dataset is healthcare data, assessing how unique (i.e. how identifiable) a record is will likely require someone with sufficient healthcare knowledge.
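One simple input to such an assessment is a count of how many records are unique on a chosen combination of indirect identifiers. The sketch below assumes hypothetical healthcare-style records and a hypothetical helper, unique_records. Deciding which attributes count as indirect identifiers is exactly where subject-matter knowledge is needed, so this is only a starting point and not a complete assessment.

```python
from collections import Counter

def unique_records(records, indirect_identifiers):
    """Return the records whose combination of indirect identifiers occurs only once."""
    def key(record):
        return tuple(record[name] for name in indirect_identifiers)

    counts = Counter(key(record) for record in records)
    return [record for record in records if counts[key(record)] == 1]

# Hypothetical sample data
records = [
    {"age_band": "30-39", "district": "North", "diagnosis": "asthma"},
    {"age_band": "30-39", "district": "North", "diagnosis": "diabetes"},
    {"age_band": "70-79", "district": "East", "diagnosis": "condition X"},
]

# Run the same check before and after anonymisation and compare the results.
at_risk = unique_records(records, ["age_band", "district"])
print(f"{len(at_risk)} of {len(records)} records are unique on age band and district")
```

In this sample only the third record is unique on age band and district. If the diagnosis is also treated as an indirect identifier, all three records become unique, which shows how much the result depends on the assessor's judgement.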

Competency in anonymisation process and techniques
Anonymisation is complex. Seek out people who are well-versed in anonymisation techniques and principles.

The recipient
Factors such as the recipients’ expertise with the subject matter play an important role in the choice of anonymisation techniques.
Data released to the public requires a much stronger form of anonymisation than data shared under a contractual arrangement.

Tools
Software tools can be very useful in carrying out anonymisation. Note that even the best tools need adequate inputs and may have limitations.

Source: PDPC Guide to basic data anonymisation techniques
Adapted for non-commercial and educational purposes only

Controlling access/Imposing access restriction

Controlling access and imposing restrictions on access are ways to ensure that anyone who has access to the data agrees not to attempt to re-identify it.

Contracts and agreements are usually applied on top of security measures.

This method can ensure that data is only used for legitimate purposes, and can significantly lower the risk of re-identification. However, overly draconian control may reduce the utility of the data.

Best practices

Plan early in research
Do not collect data that you do not need
Anonymisation helps you share and re-use data, but avoid over- or under-anonymisation
Be aware of disclosure risks, even after anonymisation
If needed, consider anonymisation + obtaining informed consent for data sharing
If needed, consider anonymisation + controlling access/imposing access restriction