A Five Year Retrospective on Detection as Code
Five years ago, I co-authored the first public paper on the concept of Detection as Code. While technical peers were reviewing the paper, we learned that a few more advanced security programs were already using this approach, just not talking about it publicly. Our goal was to improve the way security teams manage and deploy detectors by incorporating DevOps principles into detection practices. The methodologies we introduced were inspired by software engineering practices such as Continuous Integration (CI) and Continuous Deployment (CD). In this reflection, I will discuss how Detection as Code has evolved over the past five years, share insights from client implementations, and explore current best practices in this approach.
Detection as Code Does NOT Mean Writing Detection Logic in a Programming Language
One of the biggest misconceptions about Detection as Code is the belief that detection logic must be written in a full-featured programming language. It does not, and limiting your detection logic to languages like Python can be detrimental to your detection engineering team.
Detection as Code means adopting the advantages of codifying processes, such as those found in infrastructure management. This includes benefits like native version control, easy reversion to prior states, and the ability to run automated checks to enforce controls. Most Infrastructure as Code tools, such as Ansible, AWS CloudFormation, and Kubernetes, use the YAML format. Similarly, the most common codified detection standard, the Sigma rule project, also uses YAML.
While some complex detection use cases may require the flexibility of a full programming language, relying solely on languages like Python can unnecessarily increase complexity, make tuning much more difficult, raise the effort required for other detection engineers on the team to understand a rule, and increase the risk of errors and false negatives. It can also exclude skilled detection engineers who may not have programming expertise, making an already difficult-to-fill role even harder to staff.
Detection as Code: More Than Storing Detection Logic in Sigma Format
A common misconception about Detection as Code is that it only involves storing detection logic in Sigma format. For those unfamiliar, Sigma is a generic format for writing detection rules that can be applied across various tools, with the detection logic stored within YAML files. While this is partially true, it overlooks one of the primary benefits of codifying your detection logic: the ability to apply a CI/CD pipeline, which provides tremendous value.
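To make that concrete, here is a minimal sketch of what a codified rule can look like when treated as data. The rule below is a hypothetical Sigma-style example of my own construction, not a rule from the official Sigma repository, and it is simply loaded with PyYAML to show that the logic lives in a structured file rather than in program code.

```python
# A minimal sketch of a Sigma-style rule stored as data rather than code.
# The rule content is illustrative only; field names follow the public Sigma
# specification as I understand it. Requires PyYAML (pip install pyyaml).
import yaml

EXAMPLE_RULE = """
title: Suspicious Use of certutil to Download a File
status: experimental
description: >
  Detects certutil.exe invoked with the -urlcache flag, a common
  living-off-the-land technique for downloading payloads.
logsource:
  category: process_creation
  product: windows
detection:
  selection:
    Image|endswith: '\\certutil.exe'
    CommandLine|contains: '-urlcache'
  condition: selection
level: medium
"""

rule = yaml.safe_load(EXAMPLE_RULE)
print(rule["title"], "->", rule["detection"]["condition"])
```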
I won't reiterate the entire concept (you can read the original paper), but an effective detection creation process typically involves peer review. By incorporating your detection logic into a CI/CD pipeline before deploying it to production, many basic checks can be automated. This allows reviewers to concentrate on the actual detection logic and ensure it accurately matches the intended activity.
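As a sketch of what those automated checks can look like, the script below walks a rules directory, confirms each file parses as YAML, and verifies a handful of required fields before a human reviewer is ever involved. The file layout, required field list, and exit behavior are assumptions for illustration, not a prescribed standard.

```python
# A minimal sketch of an automated pre-review check a CI/CD pipeline could run
# against codified detections before peer review. Requires PyYAML.
import sys
from pathlib import Path

import yaml

REQUIRED_FIELDS = {"title", "description", "logsource", "detection"}


def check_rule(path: Path) -> list[str]:
    """Return a list of problems found in one rule file."""
    try:
        rule = yaml.safe_load(path.read_text())
    except yaml.YAMLError as exc:
        return [f"{path}: invalid YAML ({exc})"]
    if not isinstance(rule, dict):
        return [f"{path}: rule is not a YAML mapping"]
    return [f"{path}: missing required field '{field}'"
            for field in sorted(REQUIRED_FIELDS - rule.keys())]


def main() -> int:
    failures = []
    for path in Path("rules").rglob("*.yml"):  # assumed repository layout
        failures.extend(check_rule(path))
    for line in failures:
        print(line)
    return 1 if failures else 0  # non-zero exit fails the pipeline stage


if __name__ == "__main__":
    sys.exit(main())
```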
The Future of Detection: Embracing Test-Driven Detection (TDD)
A common challenge in detection engineering is that a detection may be written without having been tested. This can happen for various reasons; in my experience, it is primarily due to the lack of a way for a security analyst to safely test the activity they are trying to detect, leaving them to write what they think an event will include.
Typically, the detection engineering process at a high level involves receiving threat intelligence, writing a detection rule, and then creating a test to match the intended detection logic. This process can seem backward, and it often is. There are several reasons for this, such as the difficulty of setting up a safe test environment and the lack of a structured, repeatable way to execute the tests used to trigger a detection. Tooling in this area is only now maturing with open-source projects like MITRE's Caldera and Red Canary's Atomic Red Team, as well as other commercial tools in this space. Sometimes, tests are retroactively built to match the detection logic rather than the source intelligence, which, while similar, isn't necessarily the same. For example, an analyst may misunderstand the behavior described in the intel and build their test to match the misunderstood behavior.
These issues can unintentionally hinder detection engineering teams. However, we can adopt lessons from other disciplines, such as Test-Driven Development, to improve this process. Test-Driven Development is a software development methodology where tests are written before the code itself. Detection teams should adopt the same process whenever possible.
By executing the intended behavior in a safe environment with proper visibility instrumentation and then building the detector based on its output, you can ensure the activity is captured as expected and that your detection logic is accurate. In cases where creating a test is too labor-intensive for the expected impact, those detections can be identified early, allowing for additional levels of review before they go into production.
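One way to express this test-first approach is a pytest-style test that exists before the rule's logic is finalized, replaying a telemetry event captured from running the behavior in an instrumented lab. The file paths and the tiny matcher below are assumptions for this sketch; the matcher only handles the two field modifiers used here and is not a full Sigma evaluation engine.

```python
# A minimal "test first" sketch: the behavior was executed in a lab, one
# resulting telemetry event was saved as JSON, and this test must pass before
# the detection ships. Requires PyYAML and pytest; paths are illustrative.
import json
from pathlib import Path

import yaml


def selection_matches(selection: dict, event: dict) -> bool:
    """Check every field in a simple Sigma-style selection against one event."""
    for key, expected in selection.items():
        field, _, modifier = key.partition("|")
        value = str(event.get(field, ""))
        if modifier == "endswith" and not value.endswith(expected):
            return False
        if modifier == "contains" and expected not in value:
            return False
        if modifier == "" and value != str(expected):
            return False
    return True


def test_certutil_download_is_detected():
    # Event recorded from running the technique on an instrumented lab host.
    event = json.loads(Path("tests/events/certutil_urlcache.json").read_text())
    rule = yaml.safe_load(Path("rules/certutil_urlcache.yml").read_text())
    selection = rule["detection"]["selection"]
    assert selection_matches(selection, event), \
        "detection logic did not match the recorded behavior"
```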
Expanding the Role of Tests Beyond Detection Logic
Good detection logic has several characteristics: it matches the intended intelligence, provides a clear description that gives analysts context, and explains why it triggered and what normal behavior looks like. These elements can also be tested, and it's important for organizations to ensure they are.
For instance, you can test the length and completeness of a detector's description to ensure it's well-documented. A description that is under 50 characters, for example, is very unlikely to be useful. You can also validate the proper formatting of YAML (or whichever format your organization uses) to prevent errors in your detection tooling and enforce any other formatting standards your organization may have.
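A sketch of such metadata-quality tests is below; they parameterize over every rule file in an assumed rules/ directory, and the 50-character floor simply mirrors the rough threshold mentioned above rather than any formal standard.

```python
# A minimal sketch of metadata-quality tests that run alongside logic tests.
# Requires PyYAML and pytest; the rules/ layout and thresholds are assumptions.
from pathlib import Path

import pytest
import yaml

RULE_FILES = sorted(Path("rules").rglob("*.yml"))


@pytest.mark.parametrize("path", RULE_FILES, ids=str)
def test_rule_is_valid_yaml(path):
    yaml.safe_load(path.read_text())  # raises on malformed YAML


@pytest.mark.parametrize("path", RULE_FILES, ids=str)
def test_description_is_substantive(path):
    rule = yaml.safe_load(path.read_text())
    description = (rule.get("description") or "").strip()
    assert len(description) >= 50, f"{path}: description too short to be useful"
```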
Why I Believe Large Language Models Can Excel in a Detection CI/CD Pipeline
Companies today are integrating "AI" and large language models (LLMs) into every application they can, often without considering how well these models perform specific tasks. This trend is particularly evident in cybersecurity, where LLMs are frequently used to describe events, often inaccurately, which can have significant impacts on investigations and legal outcomes.
However, I believe there are areas where LLMs can genuinely enhance a security operations workflow, and auditing the content of codified detectors is one such area. For example, using an LLM to evaluate the descriptions of detectors and offer recommendations for improvements can be highly effective. LLMs can help ensure context accuracy, identify technical errors, provide suggested improvements, and tag detection logic with the relevant MITRE ATT&CK technique IDs. They can even catch potential logic errors, such as improperly written regex patterns.
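A review step like that can slot into the same pipeline as the other checks. In the sketch below, the `complete()` helper is a deliberate placeholder for whichever LLM client your organization uses, the prompt is illustrative only, and the model's output is treated as advisory review comments rather than a gating check.

```python
# A minimal sketch of an LLM-assisted review step in a detection CI/CD
# pipeline. complete() is a placeholder, not a real client library; wire it
# to your own LLM provider. Requires PyYAML.
import yaml

REVIEW_PROMPT = """You are reviewing a codified detection rule.
1. Does the description explain what triggered and what normal looks like?
2. Are there obvious technical errors (e.g., malformed regex) in the logic?
3. Suggest MITRE ATT&CK technique IDs that appear relevant.
Respond as short bullet points.

Rule:
{rule_text}
"""


def complete(prompt: str) -> str:
    """Placeholder: call your LLM provider of choice and return its text reply."""
    raise NotImplementedError("wire this to your LLM client")


def review_rule(path: str) -> str:
    rule_text = open(path).read()
    yaml.safe_load(rule_text)  # only send well-formed rules out for review
    return complete(REVIEW_PROMPT.format(rule_text=rule_text))


if __name__ == "__main__":
    print(review_rule("rules/certutil_urlcache.yml"))  # path is illustrative
```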
While LLMs have limitations in creating detection logic and should not be relied upon for it given their error-prone nature, they can be very useful for lower-stakes tasks like evaluating detector descriptions and identifying potential logic bugs, helping to catch errors before they become significant issues.
Conclusion
Reflecting on the past five years since co-authoring the initial paper on Detection as Code, it's clear that the landscape of detection engineering has evolved and matured significantly. By integrating DevOps principles, we've seen notable improvements in the way detection logic is managed and deployed. Despite common misconceptions, Detection as Code is much more than writing logic in programming languages or storing it in Sigma format—it's about leveraging modern coding practices to enhance the accuracy, efficiency, and reliability of detection mechanisms. In the original whitepaper I co-authored, you can find a real production process that we used to implement some of these capabilities.
The future of detection engineering lies in methodologies like Test-Driven Detection (TDD), where tests are created before the detection logic itself, ensuring robust and reliable detections from the outset. If you can detect it, you can test it; it's just a matter of whether the level of effort scales with the effectiveness of the detection logic. Moreover, expanding the role of tests to include aspects beyond mere detection logic, such as descriptions and contextual accuracy, is crucial for a comprehensive approach.
Large Language Models (LLMs) also offer promising potential in this field. While they may not be perfect for creating detection logic, their ability to audit and improve existing content, identify errors, and ensure contextual accuracy makes them invaluable in a Detection CI/CD pipeline.
By continuing to adopt and refine these practices, we can build more resilient and effective security operations, ultimately staying ahead of evolving threats and ensuring robust protection for organizations.