
Data Anonymisation

This guide aims to create awareness of basic data anonymisation concepts.

Consider these before you anonymise

  1. Purpose and utility
    Anonymisation should be tailored to the purpose at hand.
    The process of anonymisation reduces the original information in the dataset to some extent, and hence reduces its utility (e.g. clarity, precision). You need to decide on the trade-off between acceptable utility and reduced risk of re-identification.

  2. Nature and type of data
    Different anonymisation techniques are suitable for different types of data.

  3. Anonymisation techniques
    Certain techniques may be more suitable for a situation than others.
    For example, character masking is usually used on direct identifiers, and aggregation on indirect identifiers.

    The various anonymisation techniques also modify data in significantly different ways.
    For example, character masking modifies only parts of an attribute, pseudonymisation replaces the entire attribute with unrelated, but consistent information, and attribute suppression removes the attribute entirely.
     
  4. Inferred information
    It may be possible for certain information to be inferred from anonymised data.
    For example, masking may hide personal data, but it does not hide the length of the original data in terms of the number of characters.
    The anonymisation process must therefore consider what can still be inferred, both before deciding on the techniques and after applying them.

  5. Expertise with the subject matter
    An “identifiability” assessment should be performed before and after anonymisation techniques are applied, and this requires a good understanding of the subject matter which the data pertains to. Hence, if the dataset is healthcare data, it likely requires someone with sufficient healthcare knowledge to assess how unique (i.e. how identifiable) a record is.

  6. Competency in anonymisation process and techniques
    Anonymisation is complex. Involve people who are well-versed in anonymisation techniques and principles.

  7. The recipient
    Factors such as the recipients’ expertise with the subject matter play an important role in the choice of anonymisation techniques.
    Data released to the public requires a much stronger form of anonymisation than data shared under a contractual arrangement.

  8. Tools
    Software tools can be very useful in carrying out anonymisation. Note that even the best tools need adequate inputs and may have limitations.
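
The three techniques contrasted in point 3 can be illustrated with a short Python sketch. This is only a minimal illustration under stated assumptions, not a recommended implementation: the function names (mask, pseudonymise, suppress), the sample record and the secret key are all hypothetical, and a keyed hash is just one possible way to produce "unrelated, but consistent" replacement values.

```python
import hashlib
import hmac


def mask(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Character masking: hide all but the last `visible` characters."""
    if visible <= 0:
        return mask_char * len(value)
    hidden = max(len(value) - visible, 0)
    return mask_char * hidden + value[hidden:]


def pseudonymise(value: str, key: bytes) -> str:
    """Pseudonymisation: replace the whole value with a consistent token.

    Assumption: a keyed hash (HMAC-SHA256) is used so that the same input
    always maps to the same pseudonym. The key must be kept secret.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:12]


def suppress(record: dict, attribute: str) -> dict:
    """Attribute suppression: return a copy without the attribute."""
    return {k: v for k, v in record.items() if k != attribute}


# Hypothetical sample record, for illustration only.
record = {"name": "Alex Tan", "id_number": "S1234567A", "postal_code": "123456"}
key = b"example-secret-key"  # hypothetical key; never hard-code a real one

anonymised = suppress(record, "postal_code")
anonymised["id_number"] = mask(record["id_number"])
anonymised["name"] = pseudonymise(record["name"], key)

print(anonymised["id_number"])  # *****567A
```

Note that the masked value has the same number of characters as the original, which is the kind of inferred information described in point 4.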

 

Source: PDPC Guide to basic data anonymisation techniques
Adapted for non-commercial and educational purposes only

Controlling access/Imposing access restriction

Image by TheDigitalWay from Pixabay

Controlling access and imposing access restrictions are ways to ensure that anyone who has access to the data agrees not to attempt to re-identify it.

Contracts and agreements are usually applied, on top of security measures.

This method can ensure that data is only used for legitimate purposes, and can significantly lower the risk of re-identification. However, overly draconian control may reduce the utility of the data.

Best practices

  • Plan early in research
  • Do not collect data you don't need!
  • Anonymisation helps to share/re-use the data, but avoid over- or under-anonymisation
  • Be aware of disclosure risks, even with anonymisation
  • If needed, consider anonymisation + obtaining informed consent for data sharing
  • If needed, consider anonymisation + controlling access/imposing access restriction
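
One simple way to check for remaining disclosure risk is to count how many records share the same combination of indirect identifiers. The Python sketch below is an illustration only: the function names, the attribute names and the sample records are hypothetical, and which attributes count as indirect identifiers is a judgement that needs subject-matter expertise. The smallest group size is often referred to as k in k-anonymity, a measure the guide above does not itself prescribe.

```python
from collections import Counter


def group_sizes(records, indirect_identifiers):
    """Count records for each combination of indirect identifier values."""
    return Counter(
        tuple(r[attr] for attr in indirect_identifiers) for r in records
    )


def unique_records(records, indirect_identifiers):
    """Return records whose combination of values appears only once."""
    sizes = group_sizes(records, indirect_identifiers)
    return [
        r for r in records
        if sizes[tuple(r[attr] for attr in indirect_identifiers)] == 1
    ]


# Hypothetical anonymised records, for illustration only.
records = [
    {"age_band": "30-39", "postal_sector": "12", "diagnosis": "A"},
    {"age_band": "30-39", "postal_sector": "12", "diagnosis": "B"},
    {"age_band": "60-69", "postal_sector": "45", "diagnosis": "C"},
]
indirect = ["age_band", "postal_sector"]

smallest_group = min(group_sizes(records, indirect).values())
at_risk = unique_records(records, indirect)

print(smallest_group)  # 1
print(len(at_risk))    # 1
```

Here the third record is the only one in its age band and postal sector, so it stands out even though no direct identifier is present. Running the same check before and after anonymisation gives a rough picture of how much the risk has changed.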