What Is a Data Analytics Internal Audit & How to Prepare?

No one wants to deal with a data audit. You didn’t invest so much time putting everything together just to have someone else come in and start raising questions. Hopefully, an audit will never happen to you. Still, one could be announced as soon as tomorrow, so this post covers what a data analytics internal audit is and how to prepare for it.

What’s the Point of an Audit?

What the internal auditor (that is, the person running this process) wants is to achieve a level of confidence in the data you have collected. The process can vary from one situation to another, but there are several different things to check. The audit can focus on access to that data, or it might relate more to the reliability and traceability of the information.

The reasons for the audit may not be just internal, though; the internal auditor (IA) might want to check for compliance. Are you serving European customers? GDPR compliance is mandatory. Are you operating in California? Then they’ll check for compliance with the CCPA. While an audit of your internal data can be stressful, it can also bring significant benefits to the organization. So relax.

Proving that you comply with government laws not only increases your confidence, but it can also show your commitment to security policies. Also, as data increases in both quantity and value, it makes today’s companies more reliant on the intelligence coming from said data. It’s essential to be able to trust it.

What’s Going to Happen?

Great question!

It’s easy to become anxious when you hear the word audit. But let me dive a bit deeper into why you actually want an internal audit to happen. A proper audit of the data can help to uncover potential issues. A solid audit process (that is, having a knowledgeable and experienced IA running the process) will make your policies stronger. Moreover, the audit can help you reorganize and improve your data on a deeper level.

The audit methodology won’t change much from one auditor to the next. It’s a checklist-based process. At a very general level, there are a few things that need to be checked:

  • Access to the data
  • Location of the data
  • Condition of the data
  • Usage of the data

All of those items will be the go-to for any auditor, with some auditors prioritizing one point over another. I’ve purposely left out what you might call step zero. The first thing any auditor will do is map out the data, trying to assess what kind of information they’re looking at. Even more critical in this step is classifying and categorizing the data.

A good auditor will also take into consideration the value of the data for your organization. It’s important to understand that the mapping stage sets the tone for the rest of the process. It’s a monster on its own and outside of our scope. In this post, we’ll focus on the other stages of the process: the company side of things.

Pay attention, because this won’t just help you understand what the process looks like. It’s also going to help you know how to prepare. Two birds with one stone!

Access

When an auditor examines “access,” they’re aiming to audit how the data is stored logically (as opposed to where it’s stored physically). They want to know whether it’s in your RDBMS or being aggregated in your fancy ELK stack. Remember that, by now, the auditors already have a good idea of what kind of data you’ve been collecting. Now, the auditor will need to know who has access to this information. They need to know how the data is accessed, when, and by whom.

At this stage, it’s not uncommon for the auditor to conduct surveys with key actors in the organization, such as the engineering manager, the DevOps leader, and the database administrator (if available). Standard questions include how they received access to the information, what kind of data they have, and if they run into any problems accessing and using it.

The outcome of this stage of the process is a chart of the data, the data types, and who has access to each. It’s not uncommon for the auditors to come up with their own checklists to fill out, or even to fall back on an established audit methodology.
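To get a feel for what that chart might look like, here is a minimal sketch in Python. The roles, dataset names, data types, and permission levels are all made up for illustration; in practice you would pull the grants from your database or identity provider rather than hard-code them.

```python
from collections import defaultdict

# Hypothetical access grants: (who, dataset, data type, permission).
# None of these names come from a real system.
GRANTS = [
    ("engineering_manager", "orders_db", "transactional", "read"),
    ("devops_lead", "orders_db", "transactional", "admin"),
    ("devops_lead", "elk_logs", "application logs", "admin"),
    ("dba", "orders_db", "transactional", "admin"),
    ("dba", "customer_db", "personal data", "admin"),
]


def build_access_chart(grants):
    """Group grants by dataset: what kind of data it is and who can touch it."""
    chart = defaultdict(lambda: {"data_type": None, "access": {}})
    for who, dataset, data_type, permission in grants:
        chart[dataset]["data_type"] = data_type
        chart[dataset]["access"][who] = permission
    return dict(chart)


def print_access_chart(chart):
    """Print one line per dataset, listing every holder and their permission."""
    for dataset, row in sorted(chart.items()):
        holders = ", ".join(
            f"{who} ({perm})" for who, perm in sorted(row["access"].items())
        )
        print(f"{dataset:12} | {row['data_type']:16} | {holders}")


chart = build_access_chart(GRANTS)
print_access_chart(chart)
```

Having something like this ready before the auditor asks can shorten the survey round considerably, since it already answers the "who has access to what" question.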

Location

The auditors will want to document where you keep the data. I know it sounds ambiguous, and I meant to keep it that way. The where part refers to both its physical location and the kind of infrastructure you use to keep it safe. Generally, in this stage, we tend to think of databases, but it doesn’t stop there. It extends to cloud storage, IM conversations, that one file you sent via email: everything.

At this point, you should be prepared to work with the auditor on credentials, especially for working with confidential data. In some cases, the auditor will need to sign a non-disclosure agreement, which mostly serves to protect both sides from an information leak. Part of preparing for this stage will require reaching out to your legal department and consulting as needed.

Another aspect of this stage is proving the traceability of data. We’ll expand on this below, but essentially, traceability builds upon the consistency and reliability of the data. In layman’s terms, it’s the ability to track modifications to a piece of information over time.
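As a rough illustration of tracking modifications over time, the sketch below keeps an append-only history next to a value. The `TrackedRecord` class and its field names are hypothetical; real systems usually get this from database audit tables or change data capture rather than application code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TrackedRecord:
    """A value plus an append-only history of every change made to it."""

    value: str
    history: list = field(default_factory=list)

    def update(self, new_value, changed_by, when=None):
        when = when or datetime.now(timezone.utc)
        # Log old value, new value, author, and timestamp before overwriting.
        self.history.append(
            {"old": self.value, "new": new_value, "by": changed_by, "at": when}
        )
        self.value = new_value

    def value_at(self, moment):
        """Return the value as it was at `moment` (history must be chronological)."""
        for change in self.history:
            if change["at"] > moment:
                return change["old"]
        return self.value


# Illustrative data only.
email = TrackedRecord("ana@example.com")
email.update("ana@corp.example", "dba",
             when=datetime(2024, 1, 10, tzinfo=timezone.utc))
email.update("ana.s@corp.example", "support",
             when=datetime(2024, 3, 5, tzinfo=timezone.utc))

print(email.value_at(datetime(2024, 2, 1, tzinfo=timezone.utc)))  # ana@corp.example
```

The point is less the code than the habit: if every change records who made it and when, you can answer an auditor's "what did this look like in February?" without guessing.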

Condition

A step up from the previous section, the next stage of an internal data audit deals with the condition of the data. The IA will try to assess the status and relevance of the data. Checking the integrity of the data is critical. If they haven’t done so already, at this point, the auditor may ask you for summary reports. The audit team will use the reports as the base for the procedure.

Sometimes, these summary reports will be the gateway for the IA to audit the data behind them. This is because the only way to prove good practices, as well as consistency, accuracy, and reliability, is to drill down to raw data. This stage also evaluates your data collection strategy and assesses how organized the information is.
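Drilling down from a summary report to raw data can be as simple as recomputing the reported figures and flagging anything that doesn't match. Here is a small sketch of that idea; the order rows, the monthly totals, and the `reconcile` function are invented for the example and don't reflect any particular auditing tool.

```python
# Hypothetical raw rows and the monthly totals a summary report claims.
RAW_ORDERS = [
    {"id": 1, "month": "2024-01", "amount": 120.0},
    {"id": 2, "month": "2024-01", "amount": 80.0},
    {"id": 3, "month": "2024-02", "amount": 50.0},
    {"id": 4, "month": "2024-02", "amount": None},  # missing value
]
SUMMARY_REPORT = {"2024-01": 200.0, "2024-02": 75.0}


def reconcile(raw_rows, summary, tolerance=0.01):
    """Recompute totals from raw rows and compare them with the report."""
    totals = {}
    incomplete = []
    for row in raw_rows:
        if row["amount"] is None:
            # Rows with missing amounts can't be summed; report them separately.
            incomplete.append(row["id"])
            continue
        totals[row["month"]] = totals.get(row["month"], 0.0) + row["amount"]

    mismatches = {
        month: {"reported": reported, "recomputed": totals.get(month, 0.0)}
        for month, reported in summary.items()
        if abs(reported - totals.get(month, 0.0)) > tolerance
    }
    return mismatches, incomplete


mismatches, incomplete = reconcile(RAW_ORDERS, SUMMARY_REPORT)
print(mismatches)  # {'2024-02': {'reported': 75.0, 'recomputed': 50.0}}
print(incomplete)  # [4]
```

Running a check like this yourself before the audit tells you which reports you can defend and which ones need an explanation.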

This step can sometimes include verifying the knowledge of the users accessing the data. So, be prepared to summarize the critical information and walk the IA through your data sources. The audit team will build on this inventory of your assets, which will help them establish a 360-degree overview of your information management.

Usage

The name of this stage should speak for itself. At this point in the audit, it’s important to assess how the data is being used. This point is particularly important to prove compliance with some of the government policies mentioned above, such as GDPR and CCPA. As an example, one of the most critical factors is to determine how long you keep the data.
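To make the retention question concrete, here is a minimal sketch that flags records kept longer than a retention policy allows. The categories and the retention periods are assumptions made up for this example; neither GDPR nor the CCPA prescribes these specific numbers, so use whatever your own policy and legal team define.

```python
from datetime import date, timedelta

# Hypothetical retention policy per data category (not taken from any regulation).
RETENTION = {
    "support_tickets": timedelta(days=365),
    "web_logs": timedelta(days=90),
}

# Illustrative records with the date each one was collected.
RECORDS = [
    {"id": "t-1", "category": "support_tickets", "collected_on": date(2023, 1, 15)},
    {"id": "t-2", "category": "support_tickets", "collected_on": date(2024, 2, 1)},
    {"id": "w-1", "category": "web_logs", "collected_on": date(2024, 1, 1)},
    {"id": "w-2", "category": "web_logs", "collected_on": date(2024, 5, 1)},
]


def overdue_records(records, policy, today):
    """Return the ids of records held longer than their category allows."""
    return [
        record["id"]
        for record in records
        if today - record["collected_on"] > policy[record["category"]]
    ]


overdue = overdue_records(RECORDS, RETENTION, today=date(2024, 6, 1))
print(overdue)  # ['t-1', 'w-1']
```

Passing `today` in explicitly, instead of reading the clock inside the function, keeps the check repeatable when you need to show the auditor the same result twice.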

As a general rule, you should be prepared to justify the usage of the data you collect. Bear in mind that every piece should serve a purpose. A good IA will have mapped out the value of the assets already, but they may have some questions about their place in the organization.

A common situation that comes to mind is auditors finding duplicated data and data islands. Expanding too much on how to comply with the latest trends is outside of the scope of this post, but check out this post about security compliance if you want to learn some more.
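If you want to look for duplicated data before the auditors do, one simple approach is to fingerprint each record after normalizing it and then see which fingerprints show up in more than one place. The sketch below assumes two made-up data stores ("crm" and "billing") and a naive normalization (trim and lowercase); real data will usually need smarter matching.

```python
import hashlib
from collections import defaultdict

# Hypothetical data stores holding overlapping customer records.
STORES = {
    "crm": [{"email": "Ana@Example.com ", "name": "Ana Silva"}],
    "billing": [
        {"email": "ana@example.com", "name": "ana silva"},
        {"email": "bo@example.com", "name": "Bo Lee"},
    ],
}


def fingerprint(record):
    """Hash a record after trimming and lowercasing every field."""
    normalized = "|".join(
        f"{key}={str(record[key]).strip().lower()}" for key in sorted(record)
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(stores):
    """Return groups of (store, index) locations that hold the same record."""
    seen = defaultdict(list)
    for store, records in stores.items():
        for index, record in enumerate(records):
            seen[fingerprint(record)].append((store, index))
    return [locations for locations in seen.values() if len(locations) > 1]


duplicates = find_duplicates(STORES)
print(duplicates)  # [[('crm', 0), ('billing', 0)]]
```

A record that only exists in one store and nowhere else in your reporting is also worth a look, since that is often how data islands reveal themselves.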

Wrapping Up

A data audit is not something that only big companies will go through. Any organization can task its IA with checking operational performance and data collection. I know it sounds scary, but it will only make your data strategy better, more consistent, and compliant with your security policies and government regulations. An IA works for the company and has your back. Help them help you.

Your auditors will come up with a formal report. Go through the report with your upper management and start implementing the suggestions. Some of them won’t be easy to apply, but in the era of data and business intelligence, everyone will be more confident knowing there are procedures to audit the data collection and intelligence process.

This post was written by Guillermo Salazar. Guillermo is a solutions architect with over 10 years of experience across a number of different industries. While his experience is based mostly in the web environment, he’s recently started to expand his horizons to data science and cybersecurity.