Python Cheatsheet for SOAR

I find that the vast majority of vendor integrations and playbook automations are 90% identical: ingest inbound data (array, object, etc), parse through it whilst validating and extracting, then finally push it out.

This means I use the same code aaaaaall the time. So I thought I would make a little cheatsheet for the basics (to prevent me googling the same things over and over). This list will change/grow over time.

Not covered in this cheatsheet:

  • Local file handling (open, read, close, etc)
  • HTTP and SSL request/replies
  • Time/date handling

To keep formatting simple, “<<tab>>” represents a real Tab in the code

#******
#Common Lib Imports
import json, re, pprint, time, random, base64
#Basic string
myString = "hi"
myString += " I added a bit"
#Convert
myNum = int(myString)    # String to int (raises ValueError if myString is not a number)
myString = str(myNum)    # Int to string
#If
# Basic structure. Remember indents are tabs
if varA > 5:
<<tab>>print("High")
elif varA > 0:
<<tab>>print("Medium")
else:
<<tab>>print("Zero")
if myString == "compare me":    # Simple string comparison
if myVar is None:    # None / Null
if not myVar:    # True when the value is empty or falsy ("", [], {}, 0, None, False)
if (a == b) and (not b == c):    # If both are true
if (a < 5 < b):    # Chained comparison, same as (a < 5) and (5 < b)
if isinstance(myVar, list):    # Type check: list, dict, str, int... etc
#Lists (arrays)
myList = []
myList.append("bob")
[output1, output2] = myString.split('character')    # Only works if the split gives exactly 2 parts
myList = myString.split(" ")
myString = "-".join(myList)
#Python Dict
myDict = {}    # alternative >> myDict = dict()
myVar = myDict.get("key", default_value)    # Extract a value into another var (default_value if key is missing)
myVar = myDict['key']    # Similar to above, but raises KeyError if not found
myDict['key'] = 'value'    # Set a new value in the Dict
if myKey in myDict:    # Check if key exists
myList = list(myDict.keys())    # Same for ".values()"
#Json (JSON text is a string, not a Dict, but once loaded it is handled the same way)
myJson = {}
myJson = json.loads(string)    # Where string is in the form '{"key": value}' (JSON needs double quotes)
myString = myJson['key']
myJson['key'] = newValue
myJsonAsString = json.dumps(myJson)
if myKey in myJson:
#Loops
for value in myList:    # Loop a List
<<tab>>print(value)
while condition == True:    # Simple while
for myInt in range(5):    # myInt will be 0,1,2,3,4
for index in range(len(myList)):    # index is the position number in the array
for key, value in myJson.items():    # Loop through Json/Dict
for key in jsonObject:    # Alternative to the above
<<tab>>value = jsonObject[key]
break    # Break out of the current loop (1 layer) to next code
continue    # Stop processing this iteration, and go to the next iteration of this loop
#Regex
newString = re.sub(r"pattern", "replaceWith", targetString)    # regex substitution
arrayResults = re.findall(r'pattern', targetString)    # regex findall
match = re.match(r"Goodbye", "Hello")    # regex match and test (only matches at the start of the string)
if match is None:
objectResults = re.search("[a-z]+", myVar)    # Returns a Match object (or None), use objectResults.group()
#base64
encoded = base64.b64encode(b'Hello World')    # Encode bytes into Base64 (for a string use myString.encode())
readableData = base64.b64decode(encoded)    # Convert Base64 back into the original bytes (add .decode() for a string)
#Try
# Remember tab indents
try:
<<tab>>print(nonExistentVariable)
except Exception:
<<tab>>print("Something went wrong")
finally:
<<tab>>print("The 'try except' is finished")
#Debug
from pprint import pprint    # Needed before calling pprint() directly
print()    # Simple print
pprint()    # Print more complicated objects
#Bits and Bobs
random.seed()
myFloat = random.random()    # Float between 0.0 and 1.0 (1.0 itself is never returned)
randomInt = random.randint(1,100)    # Int between x and y (both included)
time.sleep(1)    # Pause for 1 second
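
To show how these snippets fit the "ingest, parse, push" pattern from the top of this post, here is a minimal sketch. The alert fields (`src_ip`, `severity`) and the function name `parse_alerts` are made up for illustration and do not come from any particular vendor. The IP check is only a loose shape test.

```python
import json
import re

# Hypothetical inbound payload: a JSON array of alert objects
raw = '[{"src_ip": "10.0.0.5", "severity": "high"}, {"src_ip": "not-an-ip"}, {"severity": "low"}]'

IP_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")  # loose IPv4 shape check only


def parse_alerts(raw_json):
    """Ingest a JSON string, validate each entry, and return the cleaned list."""
    try:
        alerts = json.loads(raw_json)
    except json.JSONDecodeError:
        return []
    if not isinstance(alerts, list):
        return []
    cleaned = []
    for alert in alerts:
        if not isinstance(alert, dict):
            continue
        ip = alert.get("src_ip")
        if not ip or IP_PATTERN.match(ip) is None:
            continue  # skip entries with a missing or malformed IP
        cleaned.append({"ip": ip, "severity": alert.get("severity", "unknown")})
    return cleaned


# "Push it out": here we simply serialise the result back to a JSON string
print(json.dumps(parse_alerts(raw)))
```

Only the first alert survives, because the other two have a malformed or missing IP.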

SOAR helping out Unstable Server/Service

Socops.Rocks is hosted on a WordPress site:

  • Pro – WordPress allows for quick easy deployment
  • Con – WordPress gets attacked a lot, and crashes, needing restarts

The problem can be described:

  • 24/7 monitoring
  • When an outage is found, start a process
  • Require approval from team
  • Fix the situation automatically
  • Full Audit log

SOAR to the rescue! We need:

  • Testing criteria –> GET & HTTP Response Code
  • Automated frequency –> “Jobs”
  • Process approval –> Me
  • Remediation –> Reboot box (/restart service/other)

Job 1 – Configure an SSH integration using a secure SSH Key (i.e. not password auth)

Job 2 – Configure a Task to connect to Linux and issue a reboot

Job 3 – Build a very simple workflow around the ‘Reboot’ task. If we get an HTTP 200, simply close the ticket, ‘else’ ask the sysadmin whether to issue a reboot.

Job 4 – Create a schedule to run this process every 5 minutes

Job 5 – Enjoy life at a socially distanced BBQ

Job 5.1 – If/when needed, approve the process (here I’m using the mobile app… because I’m at the BBQ)

Job 6 – Make it a Dashboard / Report
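
Outside of any SOAR product, the test in Job 3 could look roughly like the sketch below. It uses only the Python standard library. The function names and the returned action strings are my own placeholders, not anything from the platform.

```python
import urllib.error
import urllib.request


def check_site(url, timeout=10):
    """Return the HTTP status code for a GET, or None if the site is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error code (e.g. 500)
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout...


def next_action(status_code):
    """Pick the playbook branch: close on HTTP 200, otherwise ask a human."""
    if status_code == 200:
        return "close_ticket"
    return "ask_sysadmin_to_reboot"


# Example (needs network access, so it is left commented out):
# print(next_action(check_site("https://example.com")))
```

Keeping the decision separate from the HTTP call makes the branch logic easy to test without touching the network.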

There are of course many improvements I can make (and I probably will to squeeze a second blog article out of this….)

  • Reboot, wait 180 seconds, and retest HTTP 200
  • Check the HTML content for unpredicted changes
  • Check SSL cert validity
  • Restart web service instead of a reboot
  • Download last 20 log entries (pass through Threat Intel Platform)
  • Etc
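
The first improvement (reboot, wait, retest) is simple enough to sketch. Here `reboot` and `check` are hypothetical callables standing in for the SSH task and the HTTP test, and the `sleep` parameter exists only so the wait can be swapped out when testing.

```python
import time


def reboot_and_retest(reboot, check, wait_seconds=180, sleep=time.sleep):
    """Run reboot(), wait, then return True if check() reports HTTP 200."""
    reboot()
    sleep(wait_seconds)
    return check() == 200
```

If this returns False the playbook could escalate instead of rebooting again in a loop.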

Without a video it’s hard to show this in action, but I’m happy to say that it works perfectly.

Result

  • With no manual labour, every 5 minutes, if there’s an issue, I get a mobile notification to ask for my authorisation to reboot
  • I can reboot the server from anywhere in the world without needing my SSH keys with me
  • Full audit log, easy to expand
  • Dashboards

Andy

Auto closing tickets based on workload

Last week I had an interesting chat with a security team:

  • Our workload is very unpredictable
  • We want SOAR to intelligently auto-prioritise incidents
  • And when we are ‘busy’ auto close low priority tickets
  • but we still want automated IOC enrichment, full auditing, etc

Coupled with intelligent prioritising, this is a great idea.

Request : “if workload is high, auto close incident”

  • After a new incident workflow enriches, we calculate the current team workload
  • For every open incident: Priority1 = 4 points, Priority2 = 3 points, etc
  • If the total is more than 20 points then auto close the incident with a note “auto closed due to too much workload”
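
The points calculation above can be sketched in a few lines. The values for Priority 1 and 2 and the threshold of 20 come from the request. The values for Priority 3 and 4 are my assumed continuation of the "etc".

```python
# Points per open incident. Priority 1 and 2 are from the request;
# 3 and 4 are an assumed continuation of the "etc".
POINTS = {1: 4, 2: 3, 3: 2, 4: 1}
THRESHOLD = 20


def workload_points(open_priorities):
    """Sum the points for a list of open-incident priorities."""
    return sum(POINTS.get(p, 0) for p in open_priorities)


def should_auto_close(open_priorities):
    """True when the team's total is above the threshold."""
    return workload_points(open_priorities) > THRESHOLD


busy = [1, 1, 1, 2, 2, 3]  # 4+4+4+3+3+2 = 20, not over the limit yet
print(should_auto_close(busy))        # False
print(should_auto_close(busy + [4]))  # True (21 points)
```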

This is great, but I see an improvement. Workloads change very quickly: you might be busy right now, but in 1 hour everything gets resolved and you have no tickets to look at.

My alternative: “create, enrich, wait, auto close”

  • Any low priority incident starts a 3 day timer
  • Incidents are assigned to the team, not an individual
  • If an analyst has capacity, they can self-assign and then own the ticket
  • If the incident isn’t touched in 3 days it is auto closed
  • We create dashboards that look at the incident count per close duration
  • These dashboards show how many incidents per type are closed without being looked at
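
A minimal sketch of the auto close rule follows. It assumes each incident is a dict with `priority`, `owner` and `created` fields (hypothetical names) and treats "not touched" as "nobody has self-assigned it", which is a simplification.

```python
from datetime import datetime, timedelta

AUTO_CLOSE_AFTER = timedelta(days=3)


def should_close(incident, now):
    """Auto close only low priority incidents nobody has picked up in 3 days."""
    if incident["priority"] != "low":
        return False
    if incident["owner"] is not None:
        return False  # an analyst self-assigned it, so leave it alone
    return now - incident["created"] >= AUTO_CLOSE_AFTER


now = datetime(2020, 6, 10, 12, 0)
stale = {"priority": "low", "owner": None, "created": datetime(2020, 6, 7, 9, 0)}
owned = {"priority": "low", "owner": "andy", "created": datetime(2020, 6, 1, 9, 0)}
print(should_close(stale, now))  # True
print(should_close(owned, now))  # False
```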

I’m an ex-analyst, so I know that low quality alerts can contain valuable information. We don’t always have the time to look, but the ticket still needs enrichment in case we need to come back to it for future analysis.

At least using SOAR for automation you ensure that:

  • The incident was logged
  • The details were enriched
  • You were able to reach out to members of the company to validate
  • All information/decisions were auto logged for future audit and reviews
  • The playbook had the option to double check the alert is low priority (and self re-prioritize if not)

…which is significantly more than I was able to control a few years ago 🙁

Andy

Intelligent SLA vs Knock Knock jokes

A man walks into a bar, ouch

This is a quick ‘joke’: it takes 2 seconds to say, every time I tell it. I have no concerns giving an SLA for this joke. On the other hand…

Knock Knock…

Whilst I know *I* can tell this joke in under 5 seconds, I’m entirely relying on the person I’m talking to. Is it representative to apply an SLA to me?

Compare this to a SOAR playbook: we have control over any local task, but it’s not so simple when we wrap a business process around it:

  • Any interaction that involves human input (especially where that person is not part of our team, and we can’t kick them)
  • A query that potentially takes hours to complete
  • Unstable technology we can’t change
  • Technology belonging to another team

So how do we apply an SLA to such playbooks?

SLA for an entire Incident

Pro – Quick to configure. Great for small simple playbooks.

Con – Very inflexible.

A timer starts with the incident. If the ticket takes longer than the limit, we have an SLA breach.

SLA for each individual task

Pro – Finely tuned

Con – Administrative overhead in building and maintaining

Start a timer for each specific task. If that task takes too long we can either alert, skip the task, or take a different playbook route and escalate the process to the senior team.

Timed Section

Pro – Flexible, quick to deploy

Con – none?

E.g. Task 1 starts timer, task 5 pauses it, task 7 resumes it, task 10 closes it.
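
As a rough illustration of a timed section, here is a small pausable timer. The class name `SlaTimer` and its methods are my own invention, not a SOAR product API. Timestamps are passed in as plain seconds so the example is repeatable.

```python
class SlaTimer:
    """Accumulates elapsed seconds only while the timer is running."""

    def __init__(self, limit_seconds):
        self.limit_seconds = limit_seconds
        self.elapsed = 0.0
        self._started_at = None  # None means paused or stopped

    def start(self, now):
        if self._started_at is None:
            self._started_at = now

    def pause(self, now):
        if self._started_at is not None:
            self.elapsed += now - self._started_at
            self._started_at = None

    # Resuming is the same as starting again; closing is a final pause
    resume = start
    close = pause

    def breached(self):
        return self.elapsed > self.limit_seconds


timer = SlaTimer(limit_seconds=600)
timer.start(now=0)      # task 1 starts the timer
timer.pause(now=300)    # task 5 pauses it (e.g. waiting on another team)
timer.resume(now=4000)  # task 7 resumes it
timer.close(now=4200)   # task 10 closes it
print(timer.elapsed, timer.breached())  # 500.0 False
```

The hour spent waiting between task 5 and task 7 is not counted, so only the 500 seconds the team actually controlled are measured against the SLA.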

Knock Knock (including SLA)

  • The “Joke SLA” represents the entire incident
    • Terminology “Incident SLA”
  • The “My Team SLA” stops and starts
    • Terminology “Timer”
  • The “Punchline SLA”
    • “Task specific SLA”

Andy

How SOAR saved my marriage!

After recently getting married, I quickly discovered my wife was cheating on me, with these three:

As soon as I left the house to go shopping, see the family, or work away, my (NOT-SO-)good lady would jump into bed, turn on the TV and spend quality time watching series without me, getting ahead in a series we were watching together!

This is disastrous and would no doubt lead to a playbook on how to get divorced. Urgent actions were needed, so I turned to SOAR!!

(Really this is a blog post about automating the whitelisting/blacklisting of IP/domains, either for a SOC team who are detecting new attacks, or for members of staff managing their own policies. But hey I love drama, pun intended. Read to the end where I discuss “This In Business”)

I need a process that:

  • Automatically detects this activity
  • Automatically remediates this activity
  • Validates whether I’m home
  • Communicates with the culprit (played by my hussy wife)
  • Seeks confirmation from sysadmin (in this post the victim is played by the flawless handsome and honorable me)
  • Audit logs, SLA, etc

Setup

SOAR running on a virtual machine at home

The DNS activity/alert is generated by PassiveDNS on a Raspberry PI on a network sniffer port

My home network has lots of Unifi kit in it. Unifi do amazing kit with a full API available to control the WAP, firewall rules, network config etc. I <3 the Unifi!

Workflow Process

To avoid a SIEM at home, I simply have PassiveDNS forward logs for Netflix DNS requests direct to SOAR.

PreProcessing is then used to make sure that all prior tickets/incidents are closed (i.e. check this is a new situation)

SOAR then queries my Unifi controller to see if my personal mobile phone is connected to the WIFI. If “yes” then I am home, if “no” I’m likely out travelling.

The VICTIM (innocent me) who is likely in a hotel/shop is then either sent an email with a choice to block the activity instantly, or to request a justification….

….or if I’m being an uber admin, I can use the mobile app to decide….

If I choose “EnforceJustification”, a questionnaire is sent to the wicked one!

Answers are forwarded to the sysadmin

Being the benevolent, kind, generous soul I am, I of course decide to Allow this traffic (between you and me, I watched this the other day, so I’m already ahead of her… #guilty)

And thus our marriage is saved. Should I train to become a marriage counsellor?

SOAR Value Realised

I previously talked about the value of SOAR and I think this playbook ticks off many of those:

  • Reduce alert overhead
  • Quicker to act
  • Standardise your workflow
  • Standardise approvals
  • Revitalise legacy/simple tools

This In Business

Many of us are familiar with this phone call from a member of staff who was denied a work related website:

“How dare you block my access to the internet, I need that website to do my job! You’re stopping business! Stop being paranoid! I’m going to report this!”

So let’s adapt the above process:

  • A workflow initiated by the end user (through a ticketing system, SIEM, emails, or other)
  • Playbook asks the user which domain to whitelist, for how long, and for what justification
  • Playbook then checks if this domain was requested before. If denied, ticket can be closed. If approved a few times should we consider a long term whitelist?
  • Playbook then enriches the domain against Threat Intel: Is it known malicious, is it less than a week old, is it an inappropriate category of site
  • Playbook enriches with ActiveDirectory to determine that user’s manager
  • Playbook emails manager for approval with all the details and buttons of “yes approve” or “no, block”
  • If the manager approves, the domain is whitelisted
  • After the correct amount of time, the playbook automatically removes the policy change to clean up.
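
The decision steps in this process could be sketched as below. Everything specific here is an assumption for illustration: the enrichment keys (`malicious`, `age_days`, `inappropriate`), closing the ticket on a Threat Intel hit, treating an unknown domain age as brand new, and counting 3 approvals as "a few times".

```python
from datetime import datetime, timedelta


def review_request(history, intel):
    """Return the next playbook step for a whitelist request.

    history: past decisions for this domain ("approved" / "denied")
    intel:   enrichment results (hypothetical keys)
    """
    if "denied" in history:
        return "close_ticket"
    # Assumption: any Threat Intel hit closes the request; unknown age counts as new
    if intel.get("malicious") or intel.get("age_days", 0) < 7 or intel.get("inappropriate"):
        return "close_ticket"
    if history.count("approved") >= 3:  # assumed meaning of "a few times"
        return "suggest_long_term_whitelist"
    return "ask_manager"


def expiry_time(approved_at, hours):
    """When the temporary whitelist entry should be removed again."""
    return approved_at + timedelta(hours=hours)


print(review_request(["approved"], {"malicious": False, "age_days": 400}))  # ask_manager
print(review_request(["denied"], {"age_days": 400}))                        # close_ticket
print(expiry_time(datetime(2020, 1, 1, 9, 0), hours=24))                    # 2020-01-02 09:00:00
```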

We now have a process that is operational 24/7, works at the speed of the affected staff (not the huge workload of the security team), takes no effort from your team, does threat enrichment and sanity checking, cleans up after itself, has SLA and RBAC, and is fully audited. All the while, no member of staff was given access to any security tool!

Useful?

Andy

Keeping Control during Automation in SOAR

I previously posted “but don’t forget that over-automating can lead to reduced visibility“.  Machines do what we tell them (to a fault), how do we retain some control?

Example – At Demisto, when you ask for access to our help-center, the email is processed by a SOAR playbook to validate the request, manage access, and respond to the user, like self-service. 

However, recently it took a wrong turn, so I had to step in, take control and override our usual logic.  This was easy as each incident (which the automation belongs to) is tracked like case management, so we simply “re-open”, open up the playbook to find the issue, and correct it.

So what best practices can we use to keep control over SOAR?

Human Checks – Before any critical steps (e.g. pushing IP to firewall) you might want to ask a human analyst to verify (either using ticket management, email/slack question, or using the Mobile App)

Playbook Design – Many playbooks have forks based upon automated decisions. Consider the chain of events in a ticket if it is restarted from a certain point and takes a different action. Do any original steps have to be undone? A good design allows users to quickly make changes and walk away.

Human Notifications – If anything slightly odd is seen (e.g. too much data in a reply) then notify a human analyst of the observation with a direct link “click here to see the playbook in operation”

Summary Page – All key data and decisions should be in the ticket’s summary page, so any analyst/team leader having a quick view can see the key points of the ticket (e.g. User not found resulting in playbook taking a specific course of action)
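
To make the Human Checks idea concrete, here is a tiny sketch of an approval gate with an audit trail. `ask_human` is a hypothetical callable standing in for whatever asks the analyst (ticket, email/slack, mobile app), and `audit_log` is just a list here.

```python
audit_log = []


def run_critical_step(action, description, ask_human):
    """Run action() only if a human approves, and log the decision either way."""
    approved = bool(ask_human(description))
    audit_log.append((description, "approved" if approved else "rejected"))
    if approved:
        action()
    return approved


blocked_ips = []
run_critical_step(
    action=lambda: blocked_ips.append("203.0.113.7"),
    description="Push 203.0.113.7 to firewall block list",
    ask_human=lambda question: True,  # stand-in for a real approval prompt
)
print(blocked_ips)  # ['203.0.113.7']
print(audit_log)
```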

Andy

SOAR in Banks

Great article
https://biztechmagazine.com/article/2019/07/how-security-orchestration-and-automation-make-banks-cyber-resilient?utm_source=linkedin.com&utm_medium=referral

“Visibility is critical in all contexts: network, endpoint, DNS, email, web and, most importantly, the hybrid cloud, where monitoring workloads and accessibility presents a big challenge.”

Agreed. In a way SOAR isn’t a security tool, it orchestrates them, so make sure you have visibility. Even a free/open/cheap tool with SOAR can show value if it’s integrated into a complete workflow.

Andy

best-practices-for-the-soc-team

Lots of great points in the article, I’ve pulled a few out below.
https://www.infosecurity-magazine.com/opinions/best-practices-for-the-soc-team/

“Organizations are being forced to hire Tier 1 analysts with little or no experience, and spread their Tier 2 analysts too thin”

To help your Tier1 team, either hire anyone in IT and have them just do data collection, or use Automation:

” …if there is no judgment to be made, you don’t need a human analyst – you need to automate. “

To help your Tier2 team

“Analysts should be equipped with tools that can help them automatically investigate incident “

Andy

Bruce Schneier Talk

I recently attended a talk by Bruce Schneier about Automation and his new book “Click Here to Kill Everybody” (charming)

Though the talk wasn’t specific to SOAR, it’s still relevant to IT Security and I think it borders on similar concepts to SOAR, so here are my personal notes from the talk

  • There will always be vulnerabilities as all software is crap…. we want it cheap, fast and now
  • All computers are platforms and therefore extensible by design
  • Bigger means complex, more attack surface, more insecure
  • Putting 2 Systems together which were not designed together, creates a vulnerability, it’s no one’s fault but it’s there
  • Security hasn’t changed in 10 years but computer platforms are changing a lot (IoT etc)
  • Stealing blood type information from a hospital is bad, changing blood type information is worse (integrity vs confidentiality)
  • Computers break at scale, all at once, think contact-less hotel door, once a vulnerability is discovered every door is ‘broken’ in the same moment
  • It’s a style of failure we’re just not experienced in dealing with
  • Best way to patch legacy kit is simply to throw it away and rebuild
  • Replace a phone battery yearly, replace a fridge every decade
  • The world will be swamped with unpatched devices soon
  • We are moving further away from “thing to person auth” and even more to “thing to thing” auth, we don’t know how to do this on the scale needed
  • Imagine a city of 1,000,000 cars needing “thing to thing” auth to inform and talk to each other
  • Cyber skills gap, so we need to automate more
  • How do you build something secure, on top of insecure parts?

The talk concluded with the need for regulation from the govt.

  • Regulation is the only answer. You trust a restaurant won’t poison you, and that the building you’re in won’t collapse on you as it’s regulated. Regulation isn’t perfect but it works all around us quite well.
  • Regulate in one place, and every territory should benefit. i.e. companies don’t want 2 code bases, it’s simpler and cheaper to work it out for the area with the highest standards then use this in other locations

Some food for thought?

Andy