How to Mask PII Data: A Guide With Examples

Data masking helps you and your organization improve data security. The term means hiding sensitive information but keeping it useful and realistic, so you can continue to develop and test with it while keeping it private. Many companies underestimate the power and importance of data masking. According to IBM Security, the average cost of a data breach has reached almost $4 million. Therefore, it’s essential to shore up your data security, no matter the size of your organization.

Masking personally identifiable information (PII) can involve several techniques, such as data shuffling or data substitution, that make data unidentifiable. It’s an important aspect of data security that every organization should care about.

This article will help you understand what data masking is and why it matters. Further down the line, we’ll discuss a few data masking techniques—including one that isn’t recommended for most types of sensitive data. But first, let’s discuss data masking in general.

What Is Data Masking for, and Who Needs It?

Data masking means altering information so that others can’t reconstruct or identify the original data.

Larger organizations that deal with vast amounts of data often use data masking. However, as recent data breaches have hit the headlines, smaller enterprises are starting to think about implementing data masking techniques.

Data masking is simple to implement using the techniques in this article. Even a startup can use these methods.

The main reason companies use data masking is to alter PII. A company that leaks raw PII data may face reputation damage, lawsuits, and lost business.

Are you unfamiliar with PII data? Here are a few examples:

Name, mother’s maiden name, address, and age
Email address or phone number
Place of birth
Credit card details
Social Security number
Biometric data (such as fingerprint, iris, or DNA) or medical data
Passport, driver’s license, or bank account information

You’ll want to mask any data that can distinguish an individual’s identity. If you’re unsure about the integrity of your data, consider a data quality audit.

Why Is It Essential to Mask PII Data?

Let’s get into the specifics of why masking personally identifying information is so important.

1. Protect Your Data in Case of a Data Breach

Even organizations that have strong security mechanisms in place can still experience a data breach, and human error is often the cause. To protect your organization’s reputation, mask all data and spend time on data governance.

This means you should mask not only production data (which you need to conduct day-to-day business) but also non-production data (which you use in test environments). Many companies make copies of their production data for analysis or testing but forget to apply the same data security standards for data they use for testing. If there’s a breach, then this non-production data is of great value for the intruder. Why? It often includes real data that might help an intruder identify a person.

Next, let’s look at how to fight data threats.

2. Combat Data Threats From Within Your Company

It might be hard to believe, but many data breaches happen from inside a company. A study by the Open Security Foundation found that 19.5% of data breaches come from insiders. Don’t underestimate the possible risks of employees handling sensitive data.

Even your employees shouldn’t work directly with unmasked data. Always provide masked data to guarantee the protection of your users’ or clients’ information.

Recently, Desjardins Group fell victim to an insider who leaked the data records of 2.7 million users. The Canadian bank reported that an insider gathered this data and shared it with a third-party financial institution.

The Desjardins Group case is clearly an example of a large data breach. However, even a minor data breach can be damaging.

3. Comply With Data Protection Standards, Including GDPR

The European Union’s General Data Protection Regulation (GDPR) took effect in May 2018. The regulation was created in reaction to data breaches, and it aims to protect the personal data of people in the EU.

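Companies that don’t comply with the GDPR rules face heavy fines: up to €10 million or 2% of annual worldwide turnover for some infringements, and up to €20 million or 4% for the most serious ones, whichever amount is higher. It’s better to play safe and protect your users’ data.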

Next, let’s look at five ways to mask data.

5 Simple Data Masking Techniques

Most of these methods are fairly simple to implement so you can level up your data security. I strongly recommend using one or more of these techniques. Let’s take a look!

1. Data Shuffling

The data shuffling technique scrambles information by rearranging the values within each column. Consider a dataset that holds three columns: name, place of birth, and country. Shuffling creates new records by combining a randomly picked name, place of birth, and country from the existing rows.

You can use this technique for testing an application in a staging environment. It allows the testing engineer to test the application with real (but altered) data.
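As a rough illustration, column-wise shuffling can be sketched in a few lines of Python. The function name shuffle_columns, the sample rows, and the seed parameter are assumptions made for this example, not part of any particular masking tool.

```python
import random


def shuffle_columns(rows, columns, seed=None):
    """Return a copy of rows with each listed column shuffled independently."""
    rng = random.Random(seed)
    masked = [dict(row) for row in rows]  # copy so the input stays untouched
    for column in columns:
        values = [row[column] for row in masked]
        rng.shuffle(values)  # permute this column's values in place
        for row, value in zip(masked, values):
            row[column] = value
    return masked


# Hypothetical sample data with the three columns from the example above.
people = [
    {"name": "Alice", "place_of_birth": "Ghent", "country": "Belgium"},
    {"name": "Bob", "place_of_birth": "Lyon", "country": "France"},
    {"name": "Carol", "place_of_birth": "Porto", "country": "Portugal"},
    {"name": "Dave", "place_of_birth": "Turin", "country": "Italy"},
]

masked_people = shuffle_columns(
    people, ["name", "place_of_birth", "country"], seed=42
)
for person in masked_people:
    print(person)
```

Note that a random shuffle can leave some values in their original row by chance, especially in small datasets, so it is worth checking the output before relying on it.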

You can extend the data shuffling technique so it can also handle data links. Let’s take a look at shuffling data groups.

2. Data Shuffling With Data Groups

For some applications, it might be important to retain logical links between data. For example, you’d want to couple a phone number record with a country record because the phone number format works for the specified country only.

Now, imagine you have a dataset that holds the following columns: name, phone number, and country. When randomly shuffling the data set, you’d pick a name and match it with a randomly picked “phone number and country” data group.
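One possible way to sketch this in Python is to shuffle bundles of columns instead of single columns. The helper shuffle_with_groups, the column names, and the phone numbers below are illustrative assumptions.

```python
import random


def shuffle_with_groups(rows, groups, seed=None):
    """Shuffle each group of columns as one unit so linked values stay together."""
    rng = random.Random(seed)
    masked = [dict(row) for row in rows]
    for group in groups:
        # Bundle the grouped values per row, e.g. (phone, country).
        bundles = [tuple(row[column] for column in group) for row in masked]
        rng.shuffle(bundles)
        for row, bundle in zip(masked, bundles):
            row.update(zip(group, bundle))
    return masked


# Hypothetical sample data; the phone numbers are made up.
contacts = [
    {"name": "Alice", "phone": "+32 470 00 00 01", "country": "Belgium"},
    {"name": "Bob", "phone": "+33 6 00 00 00 02", "country": "France"},
    {"name": "Carol", "phone": "+351 910 000 003", "country": "Portugal"},
]

# Names move independently; each phone number travels with its country.
masked_contacts = shuffle_with_groups(
    contacts, groups=[("name",), ("phone", "country")], seed=7
)
for contact in masked_contacts:
    print(contact)
```

Treating "phone number and country" as a single group means a Belgian number never ends up next to a different country, which keeps the masked data valid for format checks.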

3. Interstate Data Shuffling

Interstate data shuffling means shuffling values only between records that match on a shared field.

Let’s say you have a dataset with the following columns: name, place of birth, and country. Next, look for rows that have the same country. This allows you to swap values like the place of birth between records with the same country. In this way, you can create a more realistic shuffled dataset.
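The idea of swapping values only between rows that share a country could look roughly like the sketch below. The function shuffle_within_groups and the sample rows are hypothetical names and data chosen for this example.

```python
import random
from collections import defaultdict


def shuffle_within_groups(rows, key, column, seed=None):
    """Shuffle one column, but only among rows that share the same key value."""
    rng = random.Random(seed)
    masked = [dict(row) for row in rows]

    # Collect the row positions that belong to each key value (e.g. each country).
    indices_by_key = defaultdict(list)
    for index, row in enumerate(masked):
        indices_by_key[row[key]].append(index)

    # Permute the column's values inside each bucket only.
    for indices in indices_by_key.values():
        values = [masked[i][column] for i in indices]
        rng.shuffle(values)
        for i, value in zip(indices, values):
            masked[i][column] = value
    return masked


# Hypothetical sample data.
residents = [
    {"name": "Alice", "place_of_birth": "Ghent", "country": "Belgium"},
    {"name": "Bob", "place_of_birth": "Antwerp", "country": "Belgium"},
    {"name": "Carol", "place_of_birth": "Lyon", "country": "France"},
    {"name": "Dave", "place_of_birth": "Paris", "country": "France"},
    {"name": "Eve", "place_of_birth": "Bruges", "country": "Belgium"},
]

masked_residents = shuffle_within_groups(
    residents, key="country", column="place_of_birth", seed=3
)
for resident in masked_residents:
    print(resident)
```

Every masked place of birth still lies in the row's own country. A country that appears in only one row keeps its original value, so very small groups may need a different technique.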

4. Substitution

The substitution technique focuses on replacing data with realistic-looking data. Instead of using real data, you’re replacing it with other data that has the correct format.

For example, say you want to hide the date of birth of users because it’s PII data. You can create a substitution rule that replaces the original date of birth with a fake generated date of birth.
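A substitution rule for dates of birth might be sketched as follows. The function names, the date_of_birth column, and the 1950 to 2005 range are assumptions picked for illustration; a real rule would choose a range that suits the application.

```python
import random
from datetime import date, timedelta


def random_date(rng, start, end):
    """Pick a random date between start and end, both inclusive."""
    span_days = (end - start).days
    return start + timedelta(days=rng.randint(0, span_days))


def substitute_birth_dates(
    rows,
    column="date_of_birth",
    seed=None,
    start=date(1950, 1, 1),
    end=date(2005, 12, 31),
):
    """Return copies of rows with the birth date replaced by a generated one."""
    rng = random.Random(seed)
    return [{**row, column: random_date(rng, start, end)} for row in rows]


# Hypothetical sample data.
users = [
    {"name": "Alice", "date_of_birth": date(1988, 4, 12)},
    {"name": "Bob", "date_of_birth": date(1975, 11, 3)},
]

masked_users = substitute_birth_dates(users, seed=1)
for user in masked_users:
    print(user["name"], user["date_of_birth"].isoformat())
```

The replacement values are still valid dates, so date parsing, age calculations, and form validation keep working in a test environment.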

5. Nulling

In the nulling technique, you replace a value with a null value. This technique isn’t recommended in most cases because it doesn’t represent real data. However, it’s useful for hiding extremely sensitive data.
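Nulling is the simplest of the five to sketch. The helper null_columns and the ssn column below are hypothetical, and the Social Security number is a placeholder value.

```python
def null_columns(rows, columns):
    """Return copies of rows with the listed columns replaced by None."""
    return [
        {key: (None if key in columns else value) for key, value in row.items()}
        for row in rows
    ]


# Hypothetical sample data with a placeholder Social Security number.
customers = [
    {"name": "Alice", "ssn": "000-00-0000", "country": "Belgium"},
    {"name": "Bob", "ssn": "000-00-0001", "country": "France"},
]

masked_customers = null_columns(customers, {"ssn"})
for customer in masked_customers:
    print(customer)
```

Because the column is now empty, any test that depends on its format or uniqueness will no longer exercise realistic values.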

So, we’ve covered five data masking techniques. Use one or combine several to secure your data.

Data Masking Done Right

Nowadays, it’s important for you and your organization to spend time on data security. Data masking is an important aspect of data security that helps you protect your PII data. Don’t forget to mask both production data and non-production data.

The data shuffling with data groups technique is common because it allows an organization to retain logical links between the data while improving the data security. On top of that, the nulling technique can remove especially sensitive information from a dataset.

Remember, data masking is a continuous process. It isn’t enough to mask data once because you’re gathering new data daily. Good luck as you mask your data and protect sensitive information—your clients’ and your own.

This post was written by Michiel Mulders. Michiel is a passionate blockchain developer who loves writing technical content. Besides that, he loves learning about marketing, UX psychology, and entrepreneurship. When he’s not writing, he’s probably enjoying a Belgian beer!