Are you responsible for the security of big data at your organization? Have you ever wondered if big data security is different from regular data security? Would you like to know more about some of the challenges of big data security?
If you answered “yes” to any of these questions, read on. Today, we’ll examine what makes big data unique and special. Then, we’ll focus on the high-level architecture and security challenges that come with big data. In the end, we’ll visit an imaginary car factory and have some fun with big data security. We have lots to cover, so let’s get started.
What Makes Big Data Unique?
The term big data refers to large volumes of unstructured and complex data that we can’t process or analyze using traditional methods. What makes big data unique is that it has some interesting characteristics and behaviors. These are volume, velocity, and variety—or the three Vs for short.
- Volume refers to the huge amount of data generated.
- Velocity refers to speed: data is generated, processed, and analyzed in almost real time.
- Variety describes data complexity and refers to the lack of structure of that data.
There are more Vs to big data, like veracity and volatility, but we’ll leave those for another time and focus on big data architecture. After all, if we want to understand what makes big data security unique and challenging, we should be familiar with the big picture of its architecture.
Big Data Architecture: The Bird’s-Eye View
Our big data journey starts where big data is generated: with a large variety of data sources. Big data comes from social media feeds, emails, videos, audio data, IoT, and pretty much anything else connected to the internet. Next, all that data is sent to some kind of storage service for further processing. You can call it a data lake, a storage blob, or as I like to call it, the big data dumping ground.
Next, the data is processed and transformed through multiple steps, including classification, data enrichment, and cleansing. Following that, the data goes through business analytics. Now we’re in the realm of data mining, prediction models, and mathematical algorithms. The resulting valuable information and insights are sent to fancy dashboards and interactive graphs that allow people to make intelligent and data-driven business decisions.
Remember, big data flows through all these stages in near real time and in large volumes, which means there will be some unique security challenges for your organization.
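To make the stages a bit more concrete, here is a minimal Python sketch of records flowing through cleansing, classification, and enrichment. Everything in it is an assumption made for illustration: the record fields, the keyword rule standing in for a real classifier, and the order of the stages are not taken from any particular big data product.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]
Stage = Callable[[Iterator[Record]], Iterator[Record]]


def cleanse(records: Iterator[Record]) -> Iterator[Record]:
    """Drop records with no text and trim whitespace from the rest."""
    for record in records:
        text = record.get("text", "").strip()
        if text:
            yield {**record, "text": text}


def classify(records: Iterator[Record]) -> Iterator[Record]:
    """Label each record; the keyword rule is a placeholder for a real classifier."""
    for record in records:
        label = "sensitive" if "password" in record["text"].lower() else "general"
        yield {**record, "label": label}


def enrich(records: Iterator[Record]) -> Iterator[Record]:
    """Attach derived metadata, here just the length of the text."""
    for record in records:
        yield {**record, "length": str(len(record["text"]))}


def run_pipeline(source: Iterable[Record], stages: List[Stage]) -> List[Record]:
    """Chain the stages lazily, so each record streams through all of them."""
    stream = iter(source)
    for stage in stages:
        stream = stage(stream)
    return list(stream)


raw = [
    {"source": "email", "text": "  reset my password  "},
    {"source": "iot", "text": "   "},
    {"source": "social", "text": "nice car"},
]
for row in run_pipeline(raw, [cleanse, classify, enrich]):
    print(row)
```

Even in this toy version, a sensitive record is only labeled once it reaches the classification stage, which hints at why every stage it passes through needs its own protection.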
Big Data, Big Security Challenge
First, let’s remind ourselves of the core principles of data security. We have to preserve data confidentiality, integrity, and availability. Let’s apply these principles to big data, with all its characteristics that make it unique. Next, we’ll factor in all the architecture complexities and we’ll end up with a security challenge unlike anything we’ve seen before.
Generating and Sending Big Data
Let’s start with how big data is born. We have a wide variety of data sources, like IoT devices, smart devices, refrigerators, and so on. These sources generate big data with widely varying levels of security capability, and some of them can’t protect their own data.
In addition, big data must be kept secure during its lifecycle, which includes all the stages it flows through. In traditional data security, your data sources (databases, file shares, and so on) have more mature security capabilities, and you have better control over how they operate. This is not always the case with big data, and big data vendors should really step up their security game.
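One way to picture protecting integrity across stages is a keyed checksum that travels with each record. The Python sketch below assumes that a source (or a gateway in front of it) can sign what it sends with HMAC-SHA256 and that a later stage holds the same key. The key, the record fields, and the envelope layout are all hypothetical, and this is only one possible control, not a complete design.

```python
import hashlib
import hmac
import json

# Hypothetical shared key for illustration only; a real deployment would
# load per-device keys from a key management service.
DEVICE_KEY = b"demo-device-key"


def sign_record(record: dict, key: bytes = DEVICE_KEY) -> dict:
    """Wrap a record with an HMAC-SHA256 tag over its canonical JSON form."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_record(envelope: dict, key: bytes = DEVICE_KEY) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(
        key, envelope["payload"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, envelope["tag"])


envelope = sign_record({"sensor": "fridge-1", "temp_c": 4})
print(verify_record(envelope))  # accepted: payload and tag match

tampered = {**envelope, "payload": envelope["payload"].replace("4", "9")}
print(verify_record(tampered))  # rejected: payload changed after signing
```

A check like this covers integrity only. Confidentiality and availability need their own controls, and a device that can’t hold a key safely can’t take part at all.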
Big Data, Big Mess
Remember, big data doesn’t have a defined structure. It can be innocent-looking metadata in log files, or it can be payment card data, health data, or other sensitive personal data.
Hold on, my “compliance and data privacy risk” sense just started tingling! This is a unique challenge.
With traditional data security, you have a lot of ways to protect data, because you have well-defined structures to work with. You could remove, anonymize, or tokenize sensitive data or choose from a wide variety of other security controls. But with big data, removing, anonymizing, or tokenizing data is a lot more complicated because you don’t have a data structure to rely on. This means your security controls have to evolve and adapt to monitor and protect big data.
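To see why the missing structure matters, consider tokenizing payment card numbers. In a database you would tokenize a known column. In free text you first have to find the numbers. The Python sketch below is a rough illustration under stated assumptions: it looks for runs of 13 to 19 digits, keeps only those that pass the Luhn checksum, and replaces them with a keyed token. The key, the token format, and the pattern are hypothetical, and a real detector would need to handle many more formats and false positives.

```python
import hashlib
import hmac
import re

# Hypothetical secret for illustration only; a real system would load it
# from a key store and would likely use a dedicated tokenization service.
SECRET_KEY = b"demo-key-not-for-production"

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def tokenize_cards(text: str) -> str:
    """Replace Luhn-valid card-like numbers in free text with a keyed token."""

    def replace(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group())
        if not luhn_valid(digits):
            return match.group()  # leave order IDs and other numbers untouched
        digest = hmac.new(SECRET_KEY, digits.encode(), hashlib.sha256).hexdigest()
        return f"<card:{digest[:12]}>"

    return CARD_PATTERN.sub(replace, text)


print(tokenize_cards("payment failed for card 4111 1111 1111 1111 at checkout"))
print(tokenize_cards("order 1234567890123456 shipped"))
```

The first line contains a well-known test card number and gets tokenized. The second contains a 16-digit order number that fails the checksum and is left alone. Because the token is keyed, the same card number always maps to the same token, so records can still be correlated without exposing the number.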
Big Data, Complex Processing at Speed
We need to generate, capture, process, and analyze huge data volumes in near real time. This requirement makes big data security unique because data flows through various stages and complex architectures at a breakneck speed. This means a lot of technologies are involved, and if just one of these technologies fails, your security can be compromised.
In comparison, traditional data security and processing are more mature, with a wide variety of technical security controls to choose from. These allow us to monitor, detect, and alert on security incidents. Traditional data flows are well defined, the architecture is more manageable, and we know that as long as the architecture is kept simple, data security is easier.
The Big Data Car Factory Assembly Line
I’m going to use a car factory example to illustrate what makes big data security unique compared with regular data security.
Imagine that every week, car parts arrive at our car factory from a limited number of quality suppliers. Our suppliers have mature security standards and they send quality parts all the time. These materials have well-defined dimensions, and our engineers know exactly how to deal with them. Each part has been classified based on its importance and value. When our factory receives the parts, each component is easily identified and kept secure. Our supply chain is stable, and our engineers know how many parts they’ll get each month.
These car components move through our assembly line at a manageable speed. This allows everyone to monitor each car part closely at each stage and understand what happens next in the car assembly process. When things go wrong, our engineers can intervene and stop the production line at any time. Our engineers are safe, while all car components are secure and always accounted for. As a result, our cars comply with the highest manufacturing quality and security standards, and we are leaders in the car industry.
This car factory assembly line is like traditional data security. The car parts are the data you need to protect all the way until the car is ready to roll out from the factory. Now, let’s see how this analogy translates to big data security. I’ll introduce a few changes from our factory control room. Are you ready?
New Materials Arrive at the Factory
First, our car factory suddenly gets blasted with huge amounts of car parts and other materials from a lot of new suppliers. Those suppliers don’t always follow car manufacturer security standards. Additionally, the components come in many shapes and forms.
Our engineers can’t decide which parts are more valuable than others or how to secure them properly. This means the factory will need to spend a lot of time and money building a new automated system that helps us identify the car components that arrive at a much faster pace.
Car Factory Assembly Lines on Steroids
Until now, our engineers supervised and kept all the car parts secure as they moved through the assembly line at a manageable speed.
Now, we are cranking up the speed because our factory has to keep up with the large number of car parts flowing in. Our engineers can’t really see what exactly is happening with the components anymore; everything happens a lot faster and it all looks kind of blurry.
To make things worse, car parts are starting to fall off the assembly line and it’s hard to keep track of them. This means we have to build a lot more safety and security around the assembly line. Also, we may have to put in some extra hours to track the missing car components.
More Complex Assembly Lines
Next, I’m adding more assembly lines and many more steps to our car manufacturing process, which is now a lot more complicated.
I might outsource some steps and assembly lines to a third party because our factory can’t cope anymore. And I have to realize that this complexity and outsourcing could result in non-compliance with our car manufacturer security standards. This means we have to be even more diligent and do our research before we outsource.
Because I didn’t make a plan before I started making these changes, things are out of control. All of our engineers moved away from the assembly line to a safe distance and put their safety glasses on.
Suddenly, I hear a firm knock on the control room’s door. Our friendly auditor wants to know why car parts are all over the place and why the factory looks like a war zone. We will have to put in long hours to fix this.
Now, our car factory illustrates what makes big data security unique and special. First, we received all kinds of non-standard car parts. These components illustrate the ever-changing and unstructured nature of big data. Our engineers struggled to classify and secure those components, which is similar to the challenges you’ll have with data classification. Next, we cranked up the speed of the assembly line and made the whole production line more complex, which is the nature of big data architecture. As a result, our engineers could not always account for the car parts, which is exactly what organizations have to avoid with valuable big data.
The Car Factory Epilogue
Now you understand the uniqueness of big data security through the car manufacturing example. You also see the consequences of facing complexity without preparation and a plan.
To help you with this challenge, we focused on the characteristics of big data. Then, we touched on big data architecture and its complexities and security challenges. Make sure you tackle these challenges as early as you can, and keep the safety glasses close, just in case.
This post was written by Janos Zold. Janos fell in love with IT while playing the beat ’em up video game Golden Axe as a child. Since then, he’s held various technical roles (engineer, consultant, architect, manager) in Hungary, Ireland and in the UK. He has a passion for information security and cloud (AWS and Azure), and he runs a consultancy that helps financial and telecommunication organizations with information security challenges. He’s an open source (and dog) enthusiast and an OWASP technical contributor who likes researching and sharing knowledge with others.