The Secret to Winning IT Security Roulette
Cybersecurity can often feel like playing roulette. It can also feel like a long night in the casino where the longer you stay, the more likely you are to go home a loser. IT security is much the same way. Your company may be okay for a while but deep down, you know it’s only a matter of time before you get hacked.
What should we be demanding from the security industry?
When we look at this industry today, the number of attacks and breaches is rising continually. We see them nearly every day in the news, varying in size and scope. When I asked our customers, many of them said that what they really need is a window of assurance – a window of time in which a CISO can feel comfortable knowing that if a vulnerability is exposed, the infrastructure is going to be secure. Then, in hushed tones, they also say that they want that window to be around 72 hours; they say it in hushed tones because they know it sounds ridiculous. But if this is what customers want, why aren’t we demanding it of the industry? Further, why isn’t the industry delivering it?
Generally, when I look at history, the circumstances around events are far more complicated than any one answer can provide. Just ask a historian why the Roman Empire fell. People have been trying to explain that for centuries and when they do, things get very complicated, very quickly! In our industry, it’s extremely rare that we find something that’s a definitive “smoking gun”. Well, how’s this for a smoking gun? 99% of infrastructure exploits occur where the security issue is already known to the organization. That’s astonishing.
Where does this widening security gap in our infrastructure originate? Over the last 15 years, our ability to deploy applications, as well as our need to deploy them, has skyrocketed. Whereas organizations once had a team of five deploying five applications, they now have a team of five deploying 50 applications. We can simply deploy applications a lot faster than we used to, which means the size and scope of our infrastructure get bigger and more diverse, and consequently it spreads over multiple environments. We’re well beyond traditional firewalls being sufficient.
Defining the Security Tech Pyramid
Let’s take a step back and walk through the security industry today to see where we’re putting our money and our energy. We start by putting a lot of money into defining policies that we want to enforce. Maybe it’s based on CIS or STIG, then we sprinkle a little NIST in there and next we set up scanners and auditors all throughout our infrastructure.
It takes time and requires buying a lot of products, all of which have to be properly deployed and managed. Once the policies are defined and the scanners are set, the first thing people often discover is that their infrastructure looks like Swiss cheese, with vulnerabilities, compliance issues, and a substantial gap between the secure gold standard and the reality of the infrastructure’s actual security posture. In no time at all, everybody has more security alerts than they know how to deal with. When this happens, teams go out and scan, then try to prioritize which issues actually matter. But there are so many vulnerabilities that it’s downright overwhelming. The sheer diversity of threats is intimidating: compliance over here, insider threats over there, vulnerabilities that way, network issues this way, and before you know it, people are buried in information. To comb through and analyze this mass of information costs even more money.
Once we’ve spent that money and built this fantastic system to deliver visibility into our infrastructure, we’ve consequently enabled the elegant solution with which everyone is familiar: the IT helpdesk ticket. However, the question still remains: How can the systems actually be fixed and made secure at scale? There are many good reasons why the security industry struggles with remediation at scale, but it’s primarily because ops teams are terrified that if they let other teams in, they’ll ruin everything and take the infrastructure down.
This is not a completely unjustified fear. I hear cases all the time where operations says “yes, we’ll enforce these policies”, and something terrible happens. Software gets updated, applications die and the operations people come back and say “now I’m to blame because you’re trying to enforce ridiculous policies!” And while operations people today are far more security-conscious than they were even a few years ago, the problem still exists. A security analyst does not understand the nuances of how an application actually runs inside of an infrastructure. It’s not their job. This disconnect and lack of collaboration are a major obstacle for modern organizations.
Understanding security and IT people
Who are these people and what drives them? Security and operations (DevOps, admins, SREs, etc.) have fundamentally different priorities, motivations, and objectives. Security teams are all about infrastructure security and compliance visibility. The bulk of their work revolves around scans. Their expertise is centered around being able to identify explicit problems.
Consequently, they end up with incredible views of the infrastructure. Security teams start by gathering a significant amount of data, which helps them understand if files have changed in ways they shouldn’t, illustrates compliance with hardening policies, and shows new vulnerabilities. All of this information gets spun into a bonanza of security alerts and, ultimately, their pinnacle is to file a bug report and ask operations to take care of it. It’s no different than taking a problem and throwing it over a fence for someone else to fix.
Operations people are the ones in the trenches, maintaining systems. They’re also the people who are maintaining cloud infrastructure and deploying applications.
In short, security people identify issues. Operations people ensure applications are running and that new applications are deployed. Both teams exist to ensure that developers’ lives are easy, that their code flows smoothly into production, and that a site is reliable.
Do operations people care about security? Absolutely, but are they going to be hurt more today by the infrastructure going down and an app not being deployed, or by a security breach?
Let’s look at it through their eyes for a moment. The game of roulette is already being played because nobody knows when they’re going to be breached. When the breach does happen, where is the blame going to fall? Most likely, it’s going to land on the security executive and their teams. Operations’ motivations are not aligned with the security team’s! Where this problem really stems from is the fundamental disconnect about how both groups think about their jobs, how they address the problems that they’re dealing with and how they deliver results.
When something’s not working you need to try something else
Threat intelligence today is about identifying threats that can be exploited in very specific ways. In practice, that intelligence focuses on breaches that can happen because a system is exposed to the internet.
Therefore, it’s this particular type of vulnerability that receives more attention. However, when teams look at these breaches after the fact, many of them actually come from the inside, such as a DevOps engineer intentionally or accidentally leaving an S3 bucket exposed to external bad actors. Modern threats don’t fit our traditional models, and this should give security professionals pause.
Now that things are happening faster and in greater numbers, it doesn’t take long for issues to pile up. So what do we do? We need to stop burdening our systems with security software. We need to be able to deploy remediations and fixes using the same tools that identify them. We need to be sure that we have a hardened infrastructure. Companies need to be able to go in front of a Congressional committee when a major financial institution gets hacked and give them an answer along the lines of “we acted responsibly”. I think we can all agree we don’t see much of that.
SecOps is the key
This is where the concept of SecOps comes in. Now, SecOps as a phrase has been tossed around a lot, but SecOps as a concept and, more importantly, as a practice, has yet to gain real traction.
One of the main reasons for this is we’re still following that “scan and over the wall” approach. SecOps stems from the philosophies of DevOps. When we go back to the root of that movement, it had less to do with configuration management systems and CI/CD pipelines and more to do with opening up inter-team communication. The question then becomes, how do we enable better team communication, especially between operations and security? How do we make sure that our operations teams can enable security to make changes in a safe, governed way?
Applying SecOps best practices to reality
At SaltStack, we see effective SecOps workflows where security teams and operations teams are tied together. Security is gathering information, which is fed into a new SecOps management tool that we’ve created (plug, plug) called SaltStack SecOps. The goal is one platform that scans and remediates in a single cycle. We need to take the scanning, compliance and vulnerability data and make sure the scan itself is accessible to the operations teams, because they need to have that automation as easily accessible as possible. Sure, there are always going to be nuanced cases that are difficult – that’s life – but teams should have tools that can automate at least 80% of the necessary fixes. The automation routines should be delivered to the operations team so they don’t have to build them every time. Some of you have already started cursing my name for saying that we should have tools that can automate 80%.
Trust me, I know there are many situations where an automation tool is not going to be able to magically present the absolute operational solution. But teams who retain the capability to automate remediation for many (or even most) configuration issues will be far better positioned to treat nuanced problems with the attention they demand, especially compared to competitors who haven’t automated any of the remediation work at all.
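To make the "scan and remediate in a single cycle" idea concrete, here is a minimal Python sketch. It is not how SaltStack SecOps or any other product works internally; the Finding class, the check IDs, and the two remediation routines are all hypothetical names invented for illustration. The point is only the shape of the workflow: findings that match a pre-built routine get fixed automatically, and everything else is escalated to a human.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Finding:
    host: str
    check_id: str  # hypothetical identifier, e.g. "ssh-root-login"
    severity: int  # 1 (low) to 10 (critical)


# Hypothetical pre-built remediation routines. Each takes a host name and
# returns True if the fix was applied. Here they only print what they would do.
def disable_root_ssh(host: str) -> bool:
    print(f"[{host}] would set PermitRootLogin to no")
    return True


def enforce_password_max_age(host: str) -> bool:
    print(f"[{host}] would set PASS_MAX_DAYS to 90")
    return True


# Registry mapping a check ID to the routine that remediates it.
REMEDIATIONS: Dict[str, Callable[[str], bool]] = {
    "ssh-root-login": disable_root_ssh,
    "password-max-age": enforce_password_max_age,
}


def scan_and_remediate(
    findings: List[Finding],
) -> Tuple[List[Finding], List[Finding]]:
    """Fix what has a known routine; return (fixed, needs_human)."""
    fixed: List[Finding] = []
    needs_human: List[Finding] = []
    # Handle the most severe findings first.
    for finding in sorted(findings, key=lambda f: f.severity, reverse=True):
        routine = REMEDIATIONS.get(finding.check_id)
        if routine is not None and routine(finding.host):
            fixed.append(finding)
        else:
            # No routine exists (or it failed): this is the nuanced case.
            needs_human.append(finding)
    return fixed, needs_human


findings = [
    Finding("web01", "ssh-root-login", 8),
    Finding("web01", "custom-app-tls", 9),
    Finding("db01", "password-max-age", 5),
]
fixed, needs_human = scan_and_remediate(findings)
print(f"auto-remediated: {len(fixed)}, escalated: {len(needs_human)}")
```

In this toy run, two of the three findings have a routine and are handled automatically, while the application-specific TLS issue is left for someone who understands the application.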
We also need to consolidate scanning tools. For people working in security, that may sound unattainable because there are so many different agents to do these scans.
With regard to compliance scans, many of the vendors have completely different agents; one may do a CIS compliance scan and another will do a STIG scan. They can’t even make one agent do both! Organizations need to demand more powerful agents or viable agentless alternatives from their vendors that can provide combined scans quickly. Waiting anywhere from 24 hours to a week to execute a scan is no longer acceptable. Teams need to be able to kick off a scan and get something back in minutes or seconds. The security industry needs to focus on the complete picture of how teams deliver secure infrastructure as opposed to focusing simplistically on “threat intelligence.”
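As a rough illustration of what a combined scan could look like, the Python sketch below collects a host's state once and evaluates several benchmarks against that single snapshot. The rule names and the two tiny rule sets are hypothetical stand-ins, not actual CIS or STIG content, and a real agent would gather far more state than two SSH settings.

```python
from typing import Callable, Dict, List

HostState = Dict[str, str]
Check = Callable[[HostState], bool]

# Hypothetical, heavily simplified rule sets. Real benchmarks are far larger
# and these rule names are invented for illustration.
BENCHMARKS: Dict[str, Dict[str, Check]] = {
    "CIS": {
        "root-login-disabled": lambda s: s.get("PermitRootLogin") == "no",
    },
    "STIG": {
        "root-login-disabled": lambda s: s.get("PermitRootLogin") == "no",
        "ssh-protocol-2": lambda s: s.get("Protocol") == "2",
    },
}


def combined_scan(state: HostState) -> Dict[str, List[str]]:
    """Evaluate every benchmark against one collected snapshot of host state.

    Returns a mapping of benchmark name to the rule names that failed.
    """
    return {
        name: [rule for rule, check in rules.items() if not check(state)]
        for name, rules in BENCHMARKS.items()
    }


# One snapshot of host state, gathered once and reused for every benchmark.
state = {"PermitRootLogin": "yes", "Protocol": "2"}
report = combined_scan(state)
print(report)
```

Because the state is collected once, adding another benchmark means adding another rule set, not deploying another agent.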
There are few things in my life that I find more unhelpful than when someone walks into my office and says “Tom, there’s a problem over there” and walks out. Highly effective employees walk into my office and say “we have a problem, here’s what we’re going to do about it”. There’s a discussion, some refinement, a little buy-in and then we go! How helpful is it really if the fire department knocks on your door, tells you your house is on fire and leaves? That’s exactly what many security tools do: they don’t get a bucket or a hose, they just stand panicked on the lawn and yell, “Oh my gosh you’re going to die!” We’ve had enough talk. It’s time to look for security software that acts.