Apparently we’re not meant to Automate Bonusly…

Everyone has a low-importance, easy-to-forget, process-driven responsibility in their job that always gets forgotten…

…for me that’s Bonusly.

Every month, from my employer, I am allocated 100 ‘Bonusly points’ which I can distribute to employees as a way of thanking them for going above and beyond to help.

Most months I spend them all before the end of the month, but sometimes it slips my mind. This sounds like a job for SOAR:

  • Scheduled for the end of the month
  • Prescriptive in nature (aka tedious and distracting)
  • I need to visualise at the end
  • APIs are available

High Level Needs

I want to only automate unused points, so I will run the process at the end of the month (30th) to consume anything currently unspent.

There is a list of people that help me a LOT on Slack, so I want to distribute any remaining points between that list.

Therefore I will spend 10 points per Appreciation, and repeat until points<10.

There are many ways to build this, but for today I want to keep the design as simple as possible.

We're gonna Need a montage - SouthPark Bad Time meme | Meme Generator
(joke, I can’t afford a montage clip)

Implement: Playbook

Simply, each time it runs it checks if I have at least 10 points to give; if yes, it selects a user and submits the Bonusly request.
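As a rough illustration of that decision, here is a minimal Python sketch. It is not the playbook itself: the function names, the round-robin choice of helper, and the example names are all assumptions made for illustration, and no real Bonusly API call is made.

```python
POINTS_PER_GIVE = 10  # from the article: 10 points per Appreciation


def next_give(balance, helpers, run_index, points_per_give=POINTS_PER_GIVE):
    """Decide what a single scheduled run should do.

    Returns (helper, points) if the balance covers one give, otherwise None.
    Helpers are picked round-robin by run number (an assumption; the article
    does not say how the user is selected).
    """
    if balance < points_per_give or not helpers:
        return None
    return helpers[run_index % len(helpers)], points_per_give


def simulate_runs(balance, helpers, max_runs):
    """Replay the scheduled runs and collect the gives they would make."""
    gives = []
    for run_index in range(max_runs):
        give = next_give(balance, helpers, run_index)
        if give is None:
            break  # fewer than 10 points left, nothing more to spend
        gives.append(give)
        balance -= give[1]
    return gives, balance
```

With a hypothetical balance of 35 and two helpers, `simulate_runs(35, ["alice", "bob"], max_runs=12)` makes three gives and leaves 5 points unspent.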

You can see in this Playbook every task has the “lightning” symbol which shows each task is fully automated, i.e. no manual steps/validation.

Implement: Scheduling

As I only want to spend “unused” points I want to only trigger this at specific times, so I will use the advanced CRON timing.

For anyone that doesn’t know CRON, this essentially says:

  • Run every 30 minutes
  • Between 09:00 and 14:00
  • On the 30th of every month
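In standard five-field cron syntax, a schedule like this would look something like `*/30 9-14 30 * *`. That expression is an assumption (the exact one used is not shown in the text), and the small Python check below is only a sketch of what such a schedule matches.

```python
from datetime import datetime


def is_scheduled(now):
    """Return True if `now` is a slot matched by '*/30 9-14 30 * *'."""
    on_the_30th = now.day == 30            # day-of-month field: 30
    in_window = 9 <= now.hour <= 14        # hour field: 9-14
    on_half_hour = now.minute in (0, 30)   # minute field: */30
    return on_the_30th and in_window and on_half_hour
```

Two things worth knowing about a schedule shaped like this: an hour range of 9-14 also fires at 14:30, and February has no 30th, so nothing runs that month.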

Dashboarding / Visibility

I always love creating dashboards, because with automation it’s easy to lose visibility.

So I have tweaked my incident layout for a quick view on each Bonusly ‘Give’

And built a mini dashboard for trends over time (distribution is sporadic and uneven as I develop the usecase):


Love: helpful people, saying thank you, SOAR

Hate: Wasting time, losing my allowance


Does Homebrew SOAR scale?

I don’t believe it does.

Short answer

“A lazy sysadmin is a good sysadmin”.

For 30 years people have been writing scripts to do our job for us, and it’s still a mess.

  • Different scripts, with different standards/styles
  • Hardcoded cleartext passwords
  • Running on different servers
  • Maintained by different teams
  • With no documentation
  • No error handling
  • No RBAC
  • No reporting/visibility
  • And when that employee leaves the organisation? All knowledge is lost

We’ve had the ability to script for 30 years and we are still in this mess.

Long Answer

Engineers typically design and build “bottom up” (rather than project owners who design “top down”).

“It took only 20 minutes to get product A talking to product B”? High Five!!!

But as you add more technologies the number of integration permutations (not just combinations) grows quadratically: n technologies need n × (n − 1) bits of code. 2 technologies is 2 bits of code (1 each way), 5 platforms becomes 20. You have 20 technologies to integrate? That is 380. Now get ready for RESTful, SOAP, JSON, XML, OAuth, etc.
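The arithmetic behind those numbers is just counting ordered pairs (A talks to B, and B talks to A). A quick sketch, with a function name chosen purely for illustration:

```python
def one_way_integrations(n):
    """Number of ordered pairs (A -> B) among n technologies: n * (n - 1)."""
    return n * (n - 1)


# The examples from the text: 2 technologies, 5 platforms, 20 technologies.
counts = {n: one_way_integrations(n) for n in (2, 5, 20)}
```

Here `counts` comes out as `{2: 2, 5: 20, 20: 380}`, and that is before you count the different protocols and auth schemes each pair may need.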

As you start processing lots of incidents you realise you have lost overall visibility, so you need to engineer in dashboards, reports and alerting, all of which need to be both Engineer friendly and CISO friendly.

Then you realise that to investigate and manage specific incidents you need full case management for particular incidents with chat, attachments, SLA timers, ownership, team members, etc

Then you realise the platform holds API keys (aka keys to the kingdom) so it needs encryption and hardening. Does this now require pentests and code reviews?

You then discover that repeatable playbook design requires UI friendly building and debugging for tasks, conditions, loops, subplaybooks, etc

As the platform grows you realise that integrating a workflow with people is equally crucial (questions over slack, email, questionnaires) so you need to bake in communication tasks, data collection, non repudiation, etc

Then you realise that Threat Intel is a huge part of incident enrichment and decision making, so you need to double up your case management to also represent each Indicator type

The business then wants to realise this huge investment by opening it to more than just SOC, but the interface isn’t overly user friendly. So you have to redesign it.

Now everyone has access you need to retrospectively add RBAC to everything

On a random Sunday night someone updates a key piece of technology without informing you. The vendor API has changed from version 6 to version 7, your playbooks don’t work, and you have to start programming very quickly.

Then, to really annoy you, management strategically changes vendor alliances, so all your API calls need rewriting

…and so much more, starting to get the idea?

None of that is considered when the engineer first puts pen to paper and says “give me 20 minutes to get the basics working”.

We recently won a POC. The customer had been managing a homebrew for years (compared to full SOAR it was tiny) and they simply got exhausted maintaining it. Every time they wanted to tweak anything they had to essentially rewrite huge parts from scratch… when I showed them how SOAR has done all the basics, they realised they were fighting a losing battle.


Best Practice : How to POC/POV SOAR

“Best Practice” : commercial or professional procedures that are accepted or prescribed as being correct or most effective.

Over the last 2-3 years I’ve carried out a LOT of SOAR workshops and POCs. Here are my findings:

Maturity of environment/processes/team

There are no prerequisites to using SOAR… however… SOAR was created to fix the pain. If the team does not suffer pain (yet), the value of SOAR might be hard to convey.

  • Has the prospect performed any process enough times to truly understand it? i.e. Are processes defined and understood?
  • Has the prospect identified which processes cause the most issues, and which issues? analyst burn-out, process deviation, slowness, etc
  • Has the prospect tried to create homebrew automation? And how long did they last before they realised it doesn’t scale (and unless you have the resources of Netflix, your homebrew won’t scale)

POC Success Criteria

Your SOAR project likely needs some executive sign off, and the exec team drivers are typically similar to the technical team’s but positioned differently.

Usually the technical team want to see:

  • Speeding up time to visibility
  • Reduce clicks to resolution
  • Automate challenging processes
  • Time savings / quicker results to business
  • A reduction in stress
  • Reduced errors/deviation
  • Process enforcement
  • Alert consolidation
  • Deliver more with same team size
  • Deputise(/delegate) processes to other teams
  • More examples here

Whereas I often see executive teams have the same concerns but framed differently:

  • Strategically over the next 5 years we need to offer more services (either internal, or to customers) therefore we need to prove that SOAR can automate new business flows allowing expanded/improved service delivery…. aka automate processes
  • To comply with regulation xyz we need certain processes to be enforced, with full and perfect audit records…. aka process deviation and logging
  • The cost of the platform is compared directly against the cost of increasing team size (which is hard due to supply/demand of skilled IT analysts/workers). Therefore for every $xxx of platform cost, you need to automate yyy Hours… aka time savings
  • etc
Interestingly, in 3 years I have never heard a company say "we need to use Automation to decrease this team size to save money".
It's always the opposite: we need to grow but we're simply unable to. It's about doing more with what you have, not doing the same with less.
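The "$xxx of platform cost versus yyy hours" comparison in the list above is a simple break-even calculation. A minimal sketch follows; the function name and every number used with it are hypothetical, not figures from any real deal.

```python
def hours_to_break_even(platform_cost, loaded_hourly_rate):
    """Hours of analyst work that must be automated to offset the platform cost."""
    if loaded_hourly_rate <= 0:
        raise ValueError("loaded_hourly_rate must be positive")
    return platform_cost / loaded_hourly_rate
```

For example, a made-up platform cost of 100,000 against a made-up loaded rate of 50 per hour gives `hours_to_break_even(100_000, 50)` = 2000.0 hours to automate.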

With these high level goals identified we know whether the POC will focus on Automation vs Case Management vs ChatOps vs TIP/TIM vs reporting and visibility vs ….

Preparation and Delays

By far, the biggest delay in any POC is internal preparation. For each technology that SOAR integrates with, we need a password/API key/private cert, and we need network access. It’s not always the security/SOC/IT team who owns every technology, and this often leads to big process delays. The same issue applies to firewall rules, network access, etc. Be prepared to power through.

The good news is I can fall back to other options if integrating with your technology isn’t possible. Here are some examples:

  • If your internal ServiceNow team are unwilling or unable to create a dummy account for testing, create a Developer instance at SNOW direct. Their dev instances only last for 10 days but are great to use and abuse to prove the functionality
  • If you can’t integrate to technology <ABC> directly, consider ingesting logs via existing technology paths, e.g. SIEM
  • If integrating with Active Directory is tough, ask your SOAR vendor to provide a mock integration that generates/consumes fake data as a way to show the workflow processing and completing

Think Blueberries not Watermelons

To begin with I strongly recommend small usecases that save you 10-20 minutes of work, multiple times a day. This quick win is easier to design and deploy, and you’ll notice the impact more. Small and often, think Blueberries.

As SOAR matures, start to consider usecases that are huge but infrequent, there is still value here but it isn’t always as quick to realise. Infrequent and huge, think watermelons.

Think C-3PO not Skynet

You might be ready for automation, but elements of the business might not be. Also I find many companies don’t understand their processes as much as they thought they did.
Therefore when designing a usecase, I don’t plan for SOAR to do 100% of the work and make 100% of the decisions. This would be Skynet, and Arnold Schwarzenegger will teleport from the future to stop you.

Start smaller with SOAR performing the laborious work, but not doing any decision making, use manual checks at every important step. Remember SOAR should take the arduous work away from you, allowing you to merely supervise, authorise and guide workflows.

Think C-3PO

Think Avengers, not Thanos

Each time you acquire a new technology you have to decide which vendor is best. I’ve often seen companies say “product A had 92% detection, product B had 93% detection, therefore B is better we shall buy their warez”

How often do you rate a technology on its ability to work as a team?

In my experience, the best setups emerge when we get multiple technologies working together as a team getting the best out of each other, like the Avengers.

A technology that is 1% stronger than each of its competitors, but can only be operated by a human analyst, will fail when compared to team work, just like Thanos did.

I hope this is helpful?


Deputising vs Delegating in SOAR

Deputization Versus Delegation
Delegating means “do this task and bring it back to me.” Deputizing means “own this process and bring me the results.”

For this article I’m going to look first at a DevOps use case, then a workload use case, and lastly a SOC usecase.

Use Case: DevOps deputising an entire process to Me

As a Sales Engineer a common workload I have is creating many cloud based POC environments. The process is quite long, and involves managing multiple platforms:

  • Deploy Virtual machine instance
  • Patch and update the software image
  • Configure basic settings
  • Allocate IP, synchronise DNS, tag the machine
  • Manage asset management, billing management
  • ….and lots more

It’s a long process, with lots of change management, it includes lots of other platforms. Our DevOps team don’t have the resources to manage this for all the SEs around the world, but also they don’t want to give me access to all the underlying platforms. So how do we manage this?

Answer: Deputise the process to the SE using automation to handle the workflow

Imagine a form that I can complete, that asks some very simple questions:

This data is fed into the “New Investigation form”:

  • Owner: Allows SOAR to automatically find and ask my manager for approval
  • AWS Region: Tells the AWS API calls where to create the POC
  • Company name: Used for billing/asset/inventory control
  • etc

In the next 4-5 minutes the playbook does all the processing for me, configuring, installing, modifying, tracking, etc. As this work is done for me, I can concentrate on more important things elsewhere.

This is a great example of DevOps safely deputising a process to the SE team. I have no access to any equipment, yet 24/7/365 with no advance warning I can spin up my own POC environment in 5 minutes because I run the process but DevOps own it. DevOps only need to be involved where something goes wrong.

Consider your own work environment, are there any services your team would like to offer the business in a quicker/safer way without losing control?

Use Case: Delegate a single task

When running a POC I sometimes need help from an architect to complete 1 step of the process. I don’t want them to own the entire process, but I do desperately need their help for one bit.

A playbook can, automatically or by your choice (as with this example) assign specific tasks to a different team or a specific person.

If I chose “Assign Architect” the playbook takes the Right hand path and assigns the task to the Architect team. The investigation is now assigned to them, and only their user/team/role can progress through this task.

As this task belongs to a different team, I don’t want their workload to affect my SLA, so let’s create a specific SLA for just this task:

With Delegation, I own the overall process, but a particular task can be automatically given to another member of the team (with their RBAC, SLA), or even a completely different team, but the ownership of the workflow, all the graphs/dashboards and reports show that the work is still mine.

Use case: False positives and tuning a signature/policy: Delegate or Deputise?

Imagine a noisy IDS signature creating many False Positives every day, I need the signature/policy improving, and that responsibility might be with a different person/team.

This could easily be built as a simple Delegation, a single task in a playbook assigned to someone else, that is a spur or added on to the main workflow:

Delegation is good, delegation would work.

But if we change this and Deputise it we can get a more controlled process:

As the Analyst, by answering “Yes” the current investigation creates a new investigation, assigns it to someone else, but I can still track the progress from the original ticket, my own ticket.

In the below screenshot, I am “AndyAdmin” and I own the ticket to investigate the alert. From here I can see that a new incident was created to Tune the alert, I can see it was assigned to Laura and I can monitor the status.

This way:

  • The original analyst doesn’t have to wait for tuning before they can close the investigation
  • Monthly reports correctly show 2 separate tickets by the team
  • Each team has cleaner/simple SLA and dashboards
  • We can have 1-many relationships of investigations for tuning
  • Tuning investigation can be reopened cleaner (as not linked to an investigation)
  • etc
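To make the difference concrete, here is a small Python sketch of the deputised model from the list above: the tuning work becomes its own investigation with its own owner and status, linked back to the original. The class, field and function names are illustrative assumptions, not the data model of any SOAR product.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional

_next_id = count(1)  # simple in-memory ID generator for the sketch


@dataclass
class Investigation:
    title: str
    owner: str
    status: str = "open"
    parent_id: Optional[int] = None
    child_ids: List[int] = field(default_factory=list)
    inv_id: int = field(default_factory=lambda: next(_next_id))


def deputise(parent, title, new_owner, store):
    """Create a separate investigation owned by someone else, linked to the parent."""
    child = Investigation(title=title, owner=new_owner, parent_id=parent.inv_id)
    parent.child_ids.append(child.inv_id)  # the original ticket can still track it
    store[child.inv_id] = child
    return child
```

Because the child is a separate record, the original analyst can close their ticket while the tuning ticket stays open, and each one counts separately in reports and SLAs.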

This isn’t going to win a Nobel Prize, but it is an effective way to automate the inefficient way most organisations handle assigning tickets, tasks, delegating, etc across all the different teams.



Auto validating BitCoin Miners with SOAR [Guest Post]

They say the best way to learn is to do and this applies to nothing better than technology. As a relatively new SE at Palo Alto, I knew that if I tried to learn XSOAR by watching hours of videos and reading articles, I would never build that muscle memory that comes from repetition. With that in mind, I decided to find a use case for it and then work out how to make it do that.

Experimenting with automated jobs to block and allow kids’ games taught me the basics of jobs and incidents, and alerting myself when high risk URL categories were accessed taught me how to get log data into XSOAR. After that, I wanted something useful. Something that would increase my security and automate a job that I simply would not have the time to carry out myself.

That Golden Goose came in the form of Bitcoin. Now, I know this sounds like the opening line of almost every crypto shill you’ve ever heard, but rest assured I’m not going to try to sell you unicorns and crypto Rainbows. I run a Bitcoin Node at home. I don’t make any profit from it, but I believe in Bitcoin and running a node to calculate transactions on the Blockchain helps the community and ultimately the currency itself.

The Problem?

I have to open up my firewall to the entire world. Now, because I have a Palo Alto NGFW at home, I can restrict that traffic to ‘Bitcoin’ specifically and only on the default Bitcoin port, but I’m still completely open to any potentially malicious actor. Checking the reputation of IP addresses manually and then blocking those of poor repute is possible, but there are several Bitcoin connections per minute and checking each and every IP would be neither enjoyable nor possible without giving up my day job and becoming a reclusive hermit living in the Jundland wastes.

This is where XSOAR came in. Due to its many integration points, I was able to instruct XSOAR to check the reputation of incoming connections in Palo Alto’s Threat Intel Autofocus and then block further connections from that IP in the future, if said IP’s reputation is poor, by adding the IP address to an externally hosted text file (EDL) that Panorama is configured to pull data from every hour. The reason an EDL is used, rather than pushing IP’s to policies, is simple; no change commit is required from Panorama, making the whole process simpler and more dynamic.

The playbook is as follows:

  • Dedupe Incidents – I noticed many connections are from the same IP address. This sub playbook recognises that fact and closes the case if the source IP has been seen within the last hour.
  • Check for source IP – Allows for a graceful end to the playbook if an IP address is not present in the log that generates the incident.
  • Send IP address to Autofocus.
  • If Reputation is good, close the incident.
  • If Reputation is bad, enrich with IPInfo for investigation, add the IP address to an EDL that Panorama is constantly pulling data from and email me to make me aware that this has happened.
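The steps above can be sketched in plain Python. This is only an outline of the decision flow, not XSOAR code. The reputation lookup and the notification are passed in as placeholder callables rather than real Autofocus or email calls, the EDL is modelled as a local text file, and all names are hypothetical. The sketch checks for the source IP first simply because the dedupe step needs it.

```python
import time

DEDUPE_WINDOW_SECONDS = 3600  # "seen within the last hour"


def handle_connection(incident, last_seen, get_reputation, edl_path, notify, now=None):
    """Process one inbound-connection incident and return the outcome as a string."""
    now = time.time() if now is None else now

    ip = incident.get("src_ip")
    if not ip:
        return "closed: no source IP"  # graceful end, nothing to check

    previous = last_seen.get(ip)
    last_seen[ip] = now
    if previous is not None and now - previous < DEDUPE_WINDOW_SECONDS:
        return "closed: duplicate"

    if get_reputation(ip) != "bad":
        return "closed: good reputation"

    # Bad reputation: append to the text file the firewall pulls as an EDL.
    with open(edl_path, "a", encoding="utf-8") as edl:
        edl.write(ip + "\n")
    notify(ip)
    return "open: blocked via EDL"
```

The enrichment step is left out to keep the sketch short; in this shape it would slot in just before the EDL write.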

But there was a problem. My XSOAR server sounded like it was about to take off after a few minutes of operation. What was happening? I didn’t take into account the sheer number of inbound connections and the processing required just to run the de-dupe playbook. I was looking at hundreds of incidents being opened per hour.

This is where pre-processing saved the day! Rather than have XSOAR spin up docker images every time a dedupe automation was needed, I added a pre-processing rule to take on some of that burden.

This is how my dashboard looked after 12 hours:

For the IP’s that fail reputation checks, the cases stay open and I receive an email notification containing the IP address and Autofocus’ verdict.

The cases are open for easy access and to allow me to investigate further if I wish, but there is nothing required from me, as the IP addresses have already been added to the text file that my Firewall is reading from for IP addresses to block.

No business in its right mind would run a Bitcoin node in its Data Centre, but this playbook could be used for any service that has to be open to the outside world. Mail and Web servers must accept connections on port 25/80/443. They may be in your DMZ but do you know who’s attempting to connect to them? Do you check IP reputation before allowing connections to them? Using the IP enrichment info, you could also block IP ranges based on Geographical location to drastically reduce attack surface if you wish.

But it doesn’t end there. Because of XSOAR’s many integration points, these IP addresses can be shared with other technologies. Have a custom policy within Mimecast or Proofpoint for blocked IP’s? XSOAR can hook into and add to those policies for you. Using cloud security technologies? XSOAR can also have them add the IP’s to their blocked lists. XSOAR is like an Octopus amid all these disparate technologies, holding them together and orchestrating collaboration.


Amazing article, thanks Ben for the concept, your work building it, and time typing it up for everyone to read!


Major Incident Management use case [Guest Post]

Editor’s note: A massive thank you to Patrick Bayle for today’s guest post on using a SOAR playbook to handle major incident management comms.

When is a use case classified as something other than a use case?

When posing the question to customers I engage with, the responses typically come from a small subset of common challenges focused around security incident management. Some typical examples are as follows:

  • triaging SIEM alerts
  • malware investigations
  • Phishing

And so on. It’s hard sometimes not to feel like I am a terrible game show host seeking audience participation but I would much rather this than hear crickets!

I am nothing if not prepared for such engagements and having first hand experience helps when focusing attention to the matters that would bring MOST value to the SOC and one such example is common but rarely thought of in the realms of incident handling. 

In an industry plagued with three-letter acronyms for stuff this one appears to have slipped through the net, for now at least. I have heard it described as “Major Incident Management” (MIM) or “Critical Incident Handling” (CIH). Neither the acronym nor naming convention matters in truth, yet this is one challenge that can have serious repercussions for a business if it is not automated. Pertaining specifically to MIM/CIH, the SOC’s goal is to:

Ensure a consistent methodology is applied during a major/ critical incident and regular communications occur during the investigation of said incident. 

If you’ve ever had to firefight (metaphorically or other) then you will know that communication is often automatically placed to the bottom of the list as the priority is to fight fire (obviously). A SOC analyst has to diagnose and mitigate a threat as quickly as possible and naturally all attention is on performing this duty. Updating management slows the response and detracts from the task at hand so quite simply automation is the only option. 

How about this: a playbook that runs on a schedule and sends an email with a predefined format to a distribution list whenever an incident exists that matches “critical”. That would solve the problem of communication whilst ensuring that the SOC can do their thing and put the fire out as quickly as possible! Well look no further as we have a playbook for that:

The playbook logic is very simple: If there is no critical incident, then no email is sent; if there is a critical incident then the playbook generates the report at a schedule defined by the business (in my case the requirement was to update management at least every thirty minutes but this does vary). Here’s the filter on the search incident task:

And the mail received, with an attachment and a small amount of text with the incident numbers (customisable of course):

The attachment (funnily enough, also easily customised) is designed for management’s perusal. A separate email with much more detail can also be sent to the SOC manager… but this is probably unnecessary as they should be using the default XSOAR dashboard that shows them this 🙂
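The playbook logic described above (no critical incident, no email; otherwise send an update) can be sketched in a few lines of Python. This is an illustration only: the incident fields, the email structure and the function name are assumptions, and the scheduler and mail sending are left out.

```python
def build_update(incidents, recipients):
    """Return an email dict if any critical incident is open, else None."""
    critical = [
        inc for inc in incidents
        if inc.get("severity") == "critical" and inc.get("status") != "closed"
    ]
    if not critical:
        return None  # nothing critical, so no email is sent

    ids = ", ".join(str(inc["id"]) for inc in critical)
    return {
        "to": list(recipients),
        "subject": "Major incident update ({} open)".format(len(critical)),
        "body": "Open critical incidents: " + ids,
    }
```

A scheduled job would call something like this every thirty minutes (or whatever interval the business defines) and only send mail when the result is not `None`.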

I have no doubt that every SOC has this need but maybe they just don’t know it yet?

My closing advice: always think broader than the two or three incidents you work on regularly or the most annoying cases you work on within the SOC. The business should know the value the SOC brings by thwarting attacks in a timely manner, and you should be able to easily demonstrate your value to the organisation.

Awesome post Patrick, I look forward to your next one 🙂


XKCD on SOAR metrics

Is It Worth the Time?

That is to say that… a saving of 5 minutes, against an action that happens 5 times a day… you’re allowed to spend 4 weeks making it and still be in the green.
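The numbers in the comic come from a simple calculation: time saved per occurrence, times how often it happens, across a five-year horizon. A short sketch (the function name is made up for illustration):

```python
def break_even_days(minutes_saved, times_per_day, years=5):
    """Days of effort you can spend before the automation costs more than it saves."""
    total_minutes = minutes_saved * times_per_day * 365 * years
    return total_minutes / (60 * 24)  # minutes -> 24-hour days
```

For the example above, `break_even_days(5, 5)` is a little under 32 days, which the chart rounds to 4 weeks. A weekly 30 minute saving (`times_per_day=1/7`) comes out at just over 5 days.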

I’ve never spent 4 weeks purely on a use-case, that’s insane. Many playbooks I build take about ~1 day (that includes building, testing, case management, reporting, SLA metrics, etc), with some more complicated playbooks taking a few days.

I don’t recall ever needing 5-6 days, but even if I had… that time is justified on a use case that:

  • is once a week 30 min saving (Green)
  • is daily 5 minute saving (Blue)
  • is frequent 1 minute saving (Red)


(Randall you’re awesome, keep writing those books!)

Automated Testing of Defences and Alerting

Yes, defenCe, not defenSe, I’m British darling.

When I was a SOC team leader (before SOAR existed) I tried to build automated processes to confirm technology and process worked as expected. Even though I suffered from scalability problems, my aim was to test:

  • Was existing technology blocking known bad as designed?
  • Were alerts being raised to my Analysts?
  • Was the team reacting quick enough?

Technology Configuration Testing

Over time, policies and allow/block lists get abused by inexperienced staff making unsafe/incorrect changes.

(I once saw “allow encrypted PDF” at the top of a proxy config. #Fail)

Imagine a playbook that could:

  • Test Web policies by downloading an encrypted zip
  • Test AV by downloading Eicar
  • Test firewall policy by connecting inbound HTTP 80 to your DMZ
  • Test SSL policy by connecting to an invalid Certificate

We could run this playbook every 60 minutes, and any test that “fails” can create a Critical Severity incident for the team to investigate WHY it was successful.
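A minimal sketch of that idea in Python: each test is a callable that attempts the "bad" action and reports whether it got through, and anything that gets through becomes a critical incident. The function name, the incident fields, and the choice to flag crashed tests at low severity are assumptions for illustration; the actual attempts (downloads, connections) are deliberately left as placeholders.

```python
def run_defence_tests(tests):
    """Run (name, attempt) pairs; `attempt()` returns True if the bad action succeeded."""
    incidents = []
    for name, attempt in tests:
        try:
            succeeded = attempt()
        except Exception as exc:
            # A crashed test is not proof of a block, so flag it for a look.
            incidents.append({
                "test": name,
                "severity": "low",
                "reason": "test errored: {}".format(exc),
            })
            continue
        if succeeded:
            incidents.append({
                "test": name,
                "severity": "critical",
                "reason": "action was not blocked",
            })
    return incidents
```

An empty result means every control behaved as designed on that run.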

Testing Alert-Workflow

Referencing the “Connect to known C2C” validation above as an example, this should be blocked, but even when it is blocked we can test more:

  • Was the HTTP block logged in your LogStore/DataLake/SIEM?
  • Was this malicious request raised as a new Alert to your analysts?

Can we check this automatically, and check whether the alert creation is happening quickly enough?

This kind of playbook can be left running for weeks, and you only get involved if an alert fails to be created. That’s a lot of peace of mind for a very small amount of effort.
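One way to check "was the alert raised, and quickly enough?" is to poll for it against a deadline. The sketch below assumes a `find_alert` callable that you would supply (for example a SIEM or case-management search); that callable and the parameter names are hypothetical.

```python
import time


def wait_for_alert(find_alert, timeout_seconds, poll_seconds=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until find_alert() is truthy.

    Returns the seconds it took, or None if the timeout passed first.
    `clock` and `sleep` are injectable so the function can be tested
    without real waiting.
    """
    start = clock()
    while True:
        if find_alert():
            return clock() - start
        if clock() - start >= timeout_seconds:
            return None  # alert never appeared in time: raise an incident
        sleep(poll_seconds)
```

The returned duration can be compared against whatever detection SLA you set, and a `None` is the case where the team needs to get involved.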

Alternative Usecases

The list of actual tests is endless, but if I still ran a SOC here is a simple list I would want to create for endless validation:

  • Bruteforce a random account and test if it becomes locked out
  • Is inbound password spraying detected?
  • Test inbound checks of SSL Cert validity and TLS1.0 handshakes
  • Inbound port scans, insecure protocols
  • Add a new account to a sensitive OU (e.g. Domain Admins) and see if anyone notices
  • Run encoded/obfuscated PowerShell against endpoints
  • Probe internal lateral movement to sensitive networks
  • Large file transfer, transmit easily detected PII

What else could you test?


Python Cheatsheet for SOAR

I find that the vast majority of vendor integrations and playbook automations are 90% identical: ingest inbound data (array, object, etc), parse through it whilst validating and extracting, then finally push it out.

This means I use the same code aaaaaall the time. So I thought I would make a little cheatsheet for the basics (to prevent me googling the same things over and over). This list will change/grow over time.
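As a worked example of that ingest / parse / push pattern, here is a small self-contained sketch. The event shape (a JSON array of objects with a "message" field) and the function name are assumptions chosen for illustration, and the regex only finds IPv4-looking strings without validating octet ranges.

```python
import json
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ips(raw):
    """Ingest a JSON string, validate it, extract IPv4-looking strings, emit JSON."""
    data = json.loads(raw)                      # ingest
    if not isinstance(data, list):              # validate the outer shape
        raise ValueError("expected a JSON array of events")

    found = []
    for event in data:                          # parse and extract
        if not isinstance(event, dict):
            continue
        message = event.get("message", "")
        if not isinstance(message, str):
            continue
        for ip in IPV4.findall(message):
            if ip not in found:                 # keep first-seen order, no repeats
                found.append(ip)

    return json.dumps({"ips": found})           # push out
```

Most of the calls used here (`json.loads`, `isinstance`, `.get`, `re.findall`, `json.dumps`) are the same ones listed in the cheatsheet below.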

Not covered in this cheatsheet:

  • Local file handling (open, read, close, etc)
  • HTTP and SSL request/replies
  • Time/date handling

To keep formatting simple, “<<tab>>” represents a real Tab in the code

#Common Lib Imports
import json, re, pprint, time, random, base64
#Basic string
myString = "hi"
myString += " I added a bit"
myNum = int(myString)  # String to number (raises ValueError if the string is not numeric)
myStr = str(myNum)  # Number to string
#Conditions
if varA > 5:  # Basic structure, remember indents are tabs
<<tab>>pass
elif varA > 0:
<<tab>>pass
if myString == "compare me":  # Simple string comparison
if myVar is None:  # None / Null
if not myVar:  # True if the value is None or empty ("", [], {}, 0)
if (a == b) and (not b == c):
if (a < 5 < b):  # If both are true (a < 5 and 5 < b)
if isinstance(myVar, list):  # list, dict, str, int… etc
#Lists (arrays)
myList = []
[output1, output2] = myString.split("character")  # Only works if the split gives exactly two parts
myList = myString.split(" ")
myString = "-".join(myList)
#Python Dict
myDict = {}  # alternative >> myDict = dict()
myVar = myDict.get("key", default_value)  # Extract a value into another var
myVar = myDict["key"]  # Similar to above, but raises KeyError if not found
myDict["key"] = "value"  # Set a new value in the Dict
if myKey in myDict:  # Check if key exists
myList = list(myDict.keys())  # Same for ".values()"
myDict.pop("key", None)  # Safe whether the key exists or not
del myDict["key"]  # Errors if key doesn't exist
#Json (a JSON string is not a Dict, but once loaded it is handled very similarly)
import json  # Import into the code to use the calls
myJson = {}
myJson = json.loads(string)  # Where string is in the form '{"key": "value"}'
myStr = json.dumps(string)  # Wraps a plain string as a JSON string (adds quotes and escaping)
myString = myJson["key"]
myJson["key"] = newValue
myJsonAsString = json.dumps(myJson)
if myKey in myJson:
#Loops
for value in myList:  # Loop a List
while condition == True:  # Simple while
for myInt in range(5):  # myInt will be 0,1,2,3,4
for index in range(len(myList)):  # index will represent the position in the array
for key, value in myJson.items():  # Loop through Json/Dict
for key in jsonObject:  # Alternative to the above
<<tab>>value = jsonObject[key]
break  # Break out of the current loop (1 layer) to next code
continue  # Stop processing this iteration, and go to the next iteration of this loop structure
#Regex
newString = re.sub(r"pattern", "replaceWith", targetString)  # regex substitution
arrayResults = re.findall(r"pattern", targetString)  # regex findall
match = re.match(r"Goodbye", "Hello")  # regex match and test
if match is None:
objectResults = re.search(r"[a-z]+", myVar)  # Returns complex output (a match object, or None)
#Base64
encoded = base64.b64encode(b"Hello World")  # Encode bytes into Base64 (Python 3 needs bytes, not str)
readableData = base64.b64decode(encoded).decode("utf-8")  # Convert Base64 back into the original string
#Try / except
try:  # Remember tab indents
<<tab>>myNum = int(myString)
except Exception:
<<tab>>print("Something went wrong")
finally:
<<tab>>print("The 'try except' is finished")
#Printing
from pprint import pprint
print(myString)  # Simple print
pprint(myJson)  # Print more complicated objects
#Bits and Bobs
myFloat = random.random()  # Float between 0.0 and 1.0
randomInt = random.randint(1, 100)  # Int between x and y (inclusive)
print(type(x))  # To get the Type of a variable