Five Insights From My Time Building 3 SOCs and Consulting For Over 40 Fortune 500 Companies and Federal Agencies
Throughout my cybersecurity career, I’ve had the opportunity to build three Cyber Security Operations Centers (SOCs) from scratch, including two for Managed Detection and Response (MDR) providers. My time at Palo Alto Networks, where I served as a SOAR (Security Orchestration, Automation, and Response) Architect, was one of the biggest boons to my career, providing insights that are hard to gain elsewhere. During this period, I worked closely with Demisto, now known as XSOAR. This role went beyond implementing technology; it was about acquiring deep technical knowledge of our clients' environments and processes, which enabled me to define and automate SOC processes for the security monitoring programs of several Fortune 500 organizations and major federal agencies.
Every SOC I built was a learning experience. I observed firsthand which operational decisions and playbooks led to success, which were awkward to implement, and which simply failed. These lessons were invaluable and shaped my approach to cybersecurity. My consulting experience at Soteria and my startup further broadened my perspective, allowing me to work with a diverse range of clients and challenges.
One of the crowning achievements of my career was translating these processes and procedures into an award-winning program. The Cogswell Award from the Defense Counterintelligence and Security Agency recognized the overall excellence of the entire security program at the organization where I built my last Security Operations Center, with the new contributions of the SOC being a major contributing factor to this success.
Drawing from this experience, I’ve decided to write about some of the most important takeaways for building and running a successful SOC. These insights are for anyone looking to build a new SOC or improve an existing security operations program.
Defining a Security Operations Center
Throughout this blog post, we'll explore how different organizations perceive and implement a SOC, and the roles they assign within it. For clarity, when I refer to a Security Operations Center (SOC), I mean the team primarily responsible for receiving and investigating alerts. The team that manages the underlying infrastructure may or may not be part of the SOC, depending on the organization.
1. Defend the SOC's Mission at All Costs
If you've heard me speak at a conference about building a SOC, you've heard me emphasize this point: You must defend the SOC's mission at all costs. Before managing or building a Security Operations Center, I didn't see the importance of a mission statement. However, after going through this process, I now find it to be the most crucial element.
As mentioned earlier, organizations and individuals often have varying perceptions of a SOC's responsibilities. Therefore, it's essential to define the SOC's mission from the outset with a clear mission statement, and to secure management's sign-off and support. Crafting a mission statement doesn't have to be difficult. When I first attempted to write one, I followed various online recommendations, such as keeping it short and strategic, which made it surprisingly challenging to create something meaningful. Instead of adhering strictly to these guidelines, focus on defining what you want the SOC to achieve. If a list of bullet points works better, then use that. The goal is to create a mission statement that is practical and genuinely useful, rather than a vague, unused document that conforms to typical internet advice.
A basic example:
The SOC is responsible for defining the organization's security monitoring and logging strategy, responding to suspicious system behavior from end users, investigating cybersecurity incidents and alerts, and triaging these alerts effectively.
When I reference a SOC, assume this is the mission statement of the SOC unless otherwise specified.
Why This Is Important
Unless you've worked as a Security Operations Center (SOC) Analyst and had to triage and investigate alerts, it can be hard to understand how disruptive certain tasks can be. This disruption, often referred to as context switching, is well-understood in the software development world but is equally, if not more, problematic when investigating potential security issues. Despite this, SOCs are commonly overloaded with tasks that interrupt their core mission. A phrase I've heard repeatedly in various organizations, and echoed as a problem from other SOC Managers, is:
"That's security, so send it to the SOC."
Because the term Security Operations Center is so broad, I.T. Operations often attempt to offload any security-related management tasks onto the SOC, even if the SOC lacks the staffing or training to handle these tasks effectively.
For example, early in a SOC's development, I've often seen I.T. Operations teams try to push off operationally intensive and tedious processes, especially those considered "tier 1" break-fix issues (e.g., email security gateways, firewall break-fix operations). These tasks fall outside the SOC's primary mission of investigating and triaging alerts, particularly without the necessary extra headcount. This misalignment is extremely problematic for several reasons:
- Priority Misalignment: Operational issues, even low-priority ones like releasing emails caught in a spam filter, often take precedence because they have immediate operational impacts. Consequently, investigative tasks become secondary, undermining the SOC's core mission of monitoring and investigating security issues.
- Talent Acquisition and Retention: The SOC struggles to hire and retain talent because the day-to-day work is not what cybersecurity professionals typically want to do. The emphasis on engineering to support operational tasks leads to lower-quality investigative output, and those who do acquire the necessary investigative skills often leave for organizations with more mature programs.
- Compounding Issues: The mundane and low-paying nature of the work makes it challenging to attract individuals with the proper investigative background. High churn and skill set misalignment produce low-quality work, which typically gives the SOC a poor internal reputation. That reputation often results in lower pay ceilings for SOC Analysts and perpetuates a cycle of low quality that is difficult to break out of.
This issue is not isolated to new or junior SOCs; even in mature programs, SOC Managers frequently need to push back on tasks that other teams attempt to offload onto them. Once an organization reaches a certain size and has a mature cybersecurity operations program, the SOC often becomes an attractive target for "one-off" requests from various external teams to handle tedious tasks or after-hours issues. While these external teams may have genuine intentions, they often do not realize how their individual requests can collectively add up and burden the SOC. Therefore, it is crucial for SOC Managers to know how to push back and be mindful of these accumulating requests.
One of the best recommendations for an organization starting a security monitoring program that fulfills the traditional investigative and triage roles of a SOC is to name it something other than a Cyber Security Operations Center. This term is fairly vague and can vary widely across companies. Instead, give the group a name that aligns more directly with its function, such as Detection and Response Team or Computer Security Incident Response Team (CSIRT). A name that accurately describes the objective of the organization provides innate protection against the tendency to offload unrelated tasks. For example, it is much more difficult to justify that the Detection and Response Team should maintain any sort of appliance than it is for a more generic name like Security Operations Center, which could imply a wide range of responsibilities. Reserve the use of Security Operations Center until your security monitoring program becomes large enough to contain multiple sub-teams, using the term SOC as an umbrella term referring to all of the teams working in tandem to support the security monitoring program.
2. Not Hiring the Proper Skillsets
I touched on this above to some degree, and I have written about it in the past in an article entitled The Analyst vs The Engineer. One of the biggest issues plaguing junior security monitoring programs is a lack of understanding of which skill sets to hire for. These programs typically overemphasize very senior former systems administrators with little to no investigative experience. While I don't recall this being an issue in any of the larger organizations I worked with, it was the case in 100% of the new programs being built in areas or industries that are typically considered less "tech-forward." It is not an exaggeration to say that there were multiple security monitoring programs I consulted with that did not have a single person on staff who had ever actually worked on a cybersecurity incident from start to finish, and in more than one case, no one on staff had any hands-on incident response experience or former SOC experience whatsoever.
While it isn't mandatory, and there are certainly SOCs out there that are exceptions to this, you will save yourself a lot of extremely painful and stressful lessons when building a new security monitoring program by hiring, early on, someone who has experience working incidents. Security monitoring programs are complex, and the work is typically high stakes. If you have a hard time finding someone with this skill set, it is worth bringing in consultants to help you get started.
3. Manage Your SOC Like a Product
One of the best experiences in my career was working at a startup with fewer than 15 people, led by a group of smart individuals/former NSA Tailored Access Operations members, who provided exceptional mentorship. Due to the small size of the organization, I was not only responsible for building out the MDR service but also became the "Product Owner" of the MDR product we were developing to sell as a service. I was involved in developing and identifying user stories and worked through the entire process from client sales calls to post-sale interactions discussing detections and other services. I have privately shared how I found this to be one of the most valuable experiences for building and managing a SOC. Red Canary recently echoed this sentiment in an outstanding article and webinar entitled Manage Your SOC Like a Product, which advises treating Threat Intelligence as your product manager to help prioritize and manage threats. I not only support their sentiment but also believe in expanding upon it.
Analysts often lose sight of the fact that cybersecurity jobs exist to enable a business to operate securely. Whether your SOC is part of an MDR provider or monitors a single enterprise, you have customers who expect a certain level of service from your SOC. For an MDR provider, the customer is more obvious, and is the end organization that consumes the alerts. In an enterprise setting, the "customer" may be internal technical teams or even non-technical teams such as the C-suite. In either case, you should be in constant contact with your end clients to gather information that can be used to enhance the value your SOC provides.
When interacting with clients or reporting to them, ask if there is anything they need to know to confidently make determinations. Turn these needs into user stories to improve the product output of your SOC. This approach ensures that your SOC is not only constantly improving but also adding value to your end clients. Many SOCs do not provide actionable reports or data that organizations need to make informed decisions. For example, is the number of cases trending up or down, and why? Where are we consistently weak in terms of visibility? These insights can impact decisions about headcount, prioritization of preventative controls, or tooling necessary for visibility.
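As a rough illustration of the "trending up or down" question, here is a minimal Python sketch that counts cases per ISO week and compares the two most recent weeks that had cases. The function names and the sample dates are hypothetical, and a real report would pull case-opened dates from your own case management system and look at more than two data points.

```python
from collections import Counter
from datetime import date


def weekly_case_counts(opened_dates):
    """Count cases per (ISO year, ISO week), sorted chronologically.

    Weeks with zero cases are not represented in the result.
    """
    counts = Counter()
    for opened in opened_dates:
        iso = opened.isocalendar()
        counts[(iso[0], iso[1])] += 1
    return dict(sorted(counts.items()))


def trend(counts):
    """Compare the two most recent weeks that had at least one case."""
    totals = list(counts.values())
    if len(totals) < 2:
        return "insufficient data"
    previous, latest = totals[-2], totals[-1]
    if latest > previous:
        return "up"
    if latest < previous:
        return "down"
    return "flat"


# Hypothetical sample data: two cases in one week, three in the next.
sample = [
    date(2024, 1, 1),
    date(2024, 1, 2),
    date(2024, 1, 8),
    date(2024, 1, 9),
    date(2024, 1, 10),
]
print(weekly_case_counts(sample))
print(trend(weekly_case_counts(sample)))
```

The direction alone does not answer the "why." Pairing the count with a breakdown by detection source or case category is usually what makes the number actionable for the people reading the report.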
4. "Assume Breach" and Emphasizing Detection Does Not Mean Giving Up on Prevention
When the concept of "assuming breach" first became prevalent, there was an emphasis on detecting post-exploitation activity, which can be difficult to evade, rather than relying solely on prevention. This approach was often misinterpreted to mean that prevention is not worth the effort and that detection should take priority, with prevention being an afterthought. While this mindset is less common now, it is surprising how little emphasis is still placed on the effectiveness of strong preventative controls, especially considering that these controls also fundamentally reduce the operational load of the SOC itself.
For example, using brand reputation monitoring services to track DNS registrations and TLS certificate transparency logs allows you to take down malicious infrastructure before it becomes operational and alerts you to active targeting. Restricting outbound ports to only those necessary for business can block many default settings on outbound malware and attack tools. Blocking entire TLDs known to be abused by malware, such as .top or .ru, while allowlisting those needed for business purposes, is another powerful preventative measure. Baseline processes on critical systems, or commonly abused processes with your EDR, and kill any processes that deviate from the baseline. Implementing these preventative controls can go a long way in preventing headaches, and the time you spend on these measures will reduce the number of alerts that come in on a Friday at 5 PM.
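To make the TLD blocking idea concrete, here is a minimal Python sketch of the decision logic: block a domain if its TLD is on the blocklist, unless the domain or one of its parent domains is explicitly allowlisted. The allowlisted domain and the function names are hypothetical. In practice this logic would live in your DNS resolver, proxy, or firewall policy rather than in a script, so treat it only as an illustration of the rule ordering.

```python
# TLDs named in the text as commonly abused.
BLOCKED_TLDS = {"top", "ru"}

# Hypothetical business exception; replace with domains your organization needs.
ALLOWED_DOMAINS = {"partner.example.ru"}


def normalize(domain):
    """Lowercase the name and drop surrounding whitespace and a trailing dot."""
    return domain.strip().rstrip(".").lower()


def is_blocked(domain):
    """Return True if the domain should be blocked by the TLD policy."""
    labels = normalize(domain).split(".")
    # The allowlist wins: an exact match or any subdomain of an allowed domain.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in ALLOWED_DOMAINS:
            return False
    return labels[-1] in BLOCKED_TLDS


for name in ["evil.top", "mail.partner.example.ru", "example.com"]:
    print(name, "blocked" if is_blocked(name) else "allowed")
```

Checking the allowlist before the blocklist keeps business exceptions narrow: only the named domain and its subdomains pass, while everything else under the blocked TLD is still denied.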
5. An Overemphasis on Tools and "Blinky Boxes" Rather than People and Processes
I considered not including this point because it's repeated so often that it's almost a trope. That hesitation made me step back and contemplate why this phrase remains popular, yet the "blinky box" still seems to win the day. Reflecting on my consulting experiences, I noticed a common pattern: despite having extensive security engineering roles like firewall administrators and senior server admins, almost none of the SOC management or senior technical staff in these cases had any background in investigating cybersecurity incidents. Because this overlaps with point #2, I hesitated again, but understanding the underlying issue can help address the problem. Therefore, despite the redundancy, I included it.
The lack of an investigative background is a critical gap, and it leads to numerous issues that strike at the core of the people and process problems in running a SOC:
- People tend to shy away from tasks they are not familiar with.
- Organizations struggle to create playbooks if no one has investigated a cybersecurity incident from start to finish, as they've never done it.
- Historically, SOCs were fairly secretive about their operations, making it difficult to find good information publicly.
- Often, the information that is public is at a strategic level, making it difficult to operationalize without knowing the tactical steps needed for an investigation.
The Way Forward
To address these issues, focus on building a SOC culture that prioritizes continuous learning and hands-on experience. Here are some actionable steps:
- Invest in Comprehensive Training: Ensure that all SOC staff, including management, undergo rigorous training that includes real-world incident simulations. This hands-on experience is invaluable and helps bridge the gap between theoretical knowledge and practical application.
- Perform Frequent Tabletop Exercises: These can be awkward even for very senior teams when they know they are part of the exercise. However, they almost always uncover areas for improvement and help reduce everyone's nerves during a real incident.
- Foster a Culture of Knowledge Sharing: Encourage transparency within the SOC. Create an environment where team members can openly share their experiences and lessons learned from past incidents. This collective knowledge base can significantly improve the team's investigative capabilities. Hold regular lunch-and-learns, even if they occasionally cover redundant topics. The act of teaching helps reinforce knowledge for the instructor and benefits attendees. Topics can include MITRE ATT&CK techniques, reviewing threat intelligence reports as a group, and asking questions.
- Leverage Mentorship: Pair less experienced analysts with seasoned incident responders. This mentorship can accelerate learning and provide practical insights often missing from formal training programs.
- Partner with a Third-Party Incident Response Firm: Early in your SOC lifecycle, if your organization lacks the investigative capability to properly investigate an issue, consider partnering with a third-party incident response firm. Identify a firm before you need it, as a retainer for an incident response firm is often cheaper than engaging them without an existing relationship. Unused retainer hours can often be utilized as consulting time, providing opportunities for hands-on training and custom playbooks for your organization. Establishing this partnership from the outset can be highly beneficial.
By emphasizing the development of your people and processes, you can create a more resilient and effective SOC. Remember, tools are only as good as the people who use them. A well-trained, knowledgeable team can maximize the potential of any tool and ensure robust cybersecurity defenses.
Conclusion
Building and maintaining an effective Security Operations Center is a complex and challenging task, but it is also one of the most rewarding aspects of cybersecurity. By defining a clear mission, hiring the right skill sets, managing your SOC like a product, balancing detection with prevention, and investing in people and processes over tools, you can create a SOC that not only meets but exceeds the expectations of your organization. Remember, a well-run SOC is not just a cost center but a critical component that enables the business to operate securely and efficiently. I hope these insights enable you to be well on your way to building a world-class Security Operations Center.