Advancing digital transformation means protecting data in more places, faster and at a larger scale. However, detecting and responding to critical cyber threats at the speed of digital business is particularly challenging if you can’t accelerate your investigations. That’s where data science comes in. Applied properly, it helps us develop high-fidelity security analytics that alert faster and more accurately on what is truly malicious.
Understanding how data science is applied starts with three questions: “how much data do we have,” “can we get it labeled,” and “what is the objective?”
Let’s begin with data. Every security team has it: logs, SIEM events and cloud storage buckets. That’s all great data, but is it ready to serve as a source for advanced analytics? Yes and no. Big data or data lake solutions work differently from those stores. One piece of data, like an event log or alert, might need to be saved multiple times in multiple ways, so that it is readily available in the form each later step needs.
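As a rough illustration of saving one event "multiple times in multiple ways," here is a minimal Python sketch. The `EventStore` class and the `user` and `type` field names are hypothetical, chosen only for this example; a real data lake would use purpose-built storage rather than in-memory structures.

```python
import json
from collections import defaultdict


class EventStore:
    """Hypothetical store that keeps each event in three shapes."""

    def __init__(self):
        self.raw_lines = []                      # original record, append-only
        self.by_user = defaultdict(list)         # grouped for per-user analytics
        self.counts_by_type = defaultdict(int)   # pre-aggregated for trend checks

    def save(self, event):
        # The same event is written once per downstream use.
        self.raw_lines.append(json.dumps(event, sort_keys=True))
        self.by_user[event["user"]].append(event)
        self.counts_by_type[event["type"]] += 1


store = EventStore()
store.save({"user": "alice", "type": "login", "source_ip": "10.0.0.5"})
store.save({"user": "alice", "type": "alert", "source_ip": "10.0.0.5"})
```

Each copy costs extra storage, but an algorithm that needs all of one user's activity can read it directly instead of scanning every raw record.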
The next step, a critical one, is labeling. Algorithms can work with any type of data if they know what data to expect. Interpreting an arbitrary piece of data on the fly is computationally expensive and not feasible at scale. By storing data in ways that suit how it will be used and then labeling it, we give algorithms a predictable input to work from, which makes more creative approaches possible. The art of labeling comes from security experience and may include things like “good and bad,” “specific data fields,” or “is it binary.”
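One way to picture labeling is as a fixed record layout plus an analyst-assigned verdict. The sketch below is only an assumption about what that could look like: `LabeledLogin`, `Verdict` and `training_rows` are hypothetical names, and the fields are illustrative rather than taken from any product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    GOOD = "good"
    BAD = "bad"


@dataclass(frozen=True)
class LabeledLogin:
    # "Specific data fields": the algorithm knows exactly what to expect.
    user: str
    source_ip: str
    success: bool                       # a binary field
    verdict: Optional[Verdict] = None   # None means no analyst label yet


def training_rows(events):
    """Return (features, is_bad) pairs for events that carry a label."""
    return [
        ((e.user, e.source_ip, e.success), e.verdict is Verdict.BAD)
        for e in events
        if e.verdict is not None
    ]


events = [
    LabeledLogin("alice", "10.0.0.5", True, Verdict.GOOD),
    LabeledLogin("bob", "203.0.113.9", False, Verdict.BAD),
    LabeledLogin("carol", "10.0.0.7", True),  # unlabeled, skipped
]
rows = training_rows(events)
```

Because every record has the same declared shape, nothing downstream has to guess at the meaning of a field while it runs.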
The third element of problem-solving is an objective. Most people would say “my objective is to stop the bad guys,” but a useful objective is more specific than that. Objectives in this sense must also come from the real-world insights of experts who know both security operations and threat behavior, or more specifically an attacker’s tools, techniques and procedures. By understanding how attackers work, we can get granular with our insights.
Here’s an example: “We know the adversary would prefer to steal and use the credentials of legitimate users.” Now we can craft a high-level objective such as “identify credentials that may have been stolen.” The next level is to ask, “What are the different ways to determine that a credential is stolen?” Suppose we know there are approaches A, B and C. The next step is to define each approach and make sure the data necessary for each is collected, labeled and optimized for the algorithms’ use. The results of all those approaches are then aggregated and analyzed to determine which has the best accuracy. When analysts start their investigations with the context they need to take action and protect the business, you’ll know you have the right balance of machine and human intelligence in your security operation.
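The comparison of approaches A, B and C can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the three rules (unusual country, off-hours login, repeated failures) are stand-ins for whatever A, B and C would be in practice, the event fields are hypothetical, and the four labeled events are made-up sample data, not real results.

```python
from typing import Any, Callable, Dict, List, Tuple

Event = Dict[str, Any]
Labeled = List[Tuple[Event, bool]]  # (event, credential_was_stolen)


def approach_a(event: Event) -> bool:
    # Hypothetical rule: login from a country the user has not used before.
    return event["country"] not in event["usual_countries"]


def approach_b(event: Event) -> bool:
    # Hypothetical rule: login outside 06:00-22:00.
    return event["hour"] < 6 or event["hour"] >= 22


def approach_c(event: Event) -> bool:
    # Hypothetical rule: five or more failed attempts before the login.
    return event["failed_attempts"] >= 5


def accuracy(detector: Callable[[Event], bool], labeled: Labeled) -> float:
    """Fraction of labeled events where the detector matches the label."""
    correct = sum(1 for event, stolen in labeled if detector(event) == stolen)
    return correct / len(labeled)


def rank_approaches(approaches, labeled):
    """Score every approach on the same labeled data, best first."""
    scores = {name: accuracy(fn, labeled) for name, fn in approaches.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data for illustration only.
LABELED: Labeled = [
    ({"country": "US", "usual_countries": {"US"}, "hour": 10, "failed_attempts": 0}, False),
    ({"country": "RO", "usual_countries": {"US"}, "hour": 3, "failed_attempts": 1}, True),
    ({"country": "US", "usual_countries": {"US"}, "hour": 23, "failed_attempts": 0}, False),
    ({"country": "BR", "usual_countries": {"US"}, "hour": 14, "failed_attempts": 7}, True),
]

ranking = rank_approaches({"A": approach_a, "B": approach_b, "C": approach_c}, LABELED)
# ranking == [("A", 1.0), ("C", 0.75), ("B", 0.5)] on this sample
```

Accuracy alone can mislead when stolen credentials are rare, so a fuller evaluation would likely also look at false positives and missed detections; this sketch keeps to the single measure the text mentions.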
At Secureworks, we apply different data science approaches, or even a combination of them, to the variety of security problems our customers face. I invite you to join one of my Thursday Tours of the Red Cloak™ Threat Detection and Response (TDR) software to see how different approaches can work together to accelerate detection and response against real-world threats.