OCR in a playbook

I recently hosted a demo where a Playbook analysed a phishing email, and amongst other things (e.g. enriching using Active Directory, detonating files in many sandboxes, interacting with the end user…) we compared IOCs against ThreatIntel.

I was asked “but what happens if the phishing link is actually an image of a URL?”

Keeping in mind that SOAR is not an anti-phishing platform, and that our demo was more about automating and orchestrating across many different solutions…

…I accepted the challenge.

OCR.Space

First I had to find a platform that did OCR for me. You might have an enterprise instance in-house; however, I needed a free cloud API, and OCR.Space does exactly that [1].

Next was to build the integration, which in this case is lovely and simple (the full file is attached, but here are the two real lines of code):

    encoded = base64.b64encode(open(demisto.getFilePath(entryID)['path'], 'rb').read())
    r = urllib2.urlopen(url, urllib.urlencode({"apikey":apikey, "base64Image": "data:image/png;base64," + encoded }))
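
Those two lines are Python 2 (urllib2). For anyone on Python 3, here is a minimal sketch of the same request. The apikey and base64Image parameters and the ParsedText field come from the lines above and the output shown later; the endpoint URL, the ParsedResults wrapper, and the function names build_ocr_request and parsed_text are my assumptions, so check them against the OCR.Space documentation.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumed endpoint; confirm it in the OCR.Space documentation.
OCR_URL = "https://api.ocr.space/parse/image"


def build_ocr_request(image_bytes, apikey, url=OCR_URL):
    """Build (but do not send) the POST request for a PNG image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    form = urllib.parse.urlencode({
        "apikey": apikey,
        "base64Image": "data:image/png;base64," + encoded,
    })
    return urllib.request.Request(url, data=form.encode("ascii"), method="POST")


def parsed_text(response_body):
    """Join the ParsedText of every result (assumed response layout)."""
    results = json.loads(response_body).get("ParsedResults") or []
    return "\n".join(r.get("ParsedText", "") for r in results)
```

Sending it would then be something like urllib.request.urlopen(build_ocr_request(data, key)), with the response body passed to parsed_text.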

Now I need an image of a URL

Now I need a playbook. It should allow you to upload an image, decode the image, then set the output to something useful.

We can see the output of the OCR

And see how that ParsedText is then Set to the Details section of the Incident 🙂

Other uses

For my demo I actually integrated this into the main phishing playbook for links.

How about tactical testing of data exfiltration?

How about swapping out OCR Space for a de-steganography tool?

All data decoded should also go through IOC extraction (hash, IP, domain, machine name, custom regex structures) and be indexed against other tickets.
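
As a rough idea of what that extraction step does, here is a small sketch using deliberately simple regular expressions. The patterns and the extract_iocs name are mine for illustration, not what any SOAR platform actually ships; real extractors also cope with defanged indicators, IPv6, TLD validation and so on.

```python
import re

# Deliberately simple patterns for illustration only.
PATTERNS = {
    "ip": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    # SHA-256, SHA-1 or MD5 sized hex strings.
    "hash": re.compile(r"\b(?:[a-fA-F0-9]{64}|[a-fA-F0-9]{40}|[a-fA-F0-9]{32})\b"),
    "domain": re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE),
}


def extract_iocs(text):
    """Return a dict of indicator type -> sorted unique matches."""
    return {name: sorted(set(rx.findall(text))) for name, rx in PATTERNS.items()}
```

The OCR output (ParsedText) would be passed straight in as text, and the resulting indicators compared against other tickets.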

[1] Note – As I do not have a contract with OCR.Space I would never send any images that potentially contain sensitive data.

No API? No Problem!

As we know the future holds great things. Flying cars, tasty 0% Beer, and every platform offers a complete API.

But for now how do we interact with those odd solutions that don’t provide a good API?

All solutions must have an input/output, otherwise they would be a brick (though maybe a UI in Java is the only way), and often we can explore this.

Use SSH Commands in a playbook

Create a shell script on a Linux box. The script modifies the system (files, commands…) and then restarts a service so it picks up the new config. The playbook then simply calls this script over SSH.

Example: Squid Proxy.  I wanted to add bad domains to Squid blacklists, but Squid has no official API.  So my SOAR playbook called, through SSH, a script that added the domain, checked the config, then reloaded the service.  Squid now has near-real-time data.
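
My script was a shell script; the same three steps (add the domain, check the config, reload) look roughly like this in Python. Treat it as a sketch: the add_bad_domain name and the blacklist file path are hypothetical, and it assumes your squid.conf already has an acl that reads that file. The two squid -k calls are the standard parse and reconfigure signals.

```python
import subprocess


def add_bad_domain(domain, blacklist_path, run=subprocess.run):
    """Append a domain to a Squid blacklist file, validate, then reload."""
    domain = domain.strip().lower()
    with open(blacklist_path, "a+", encoding="utf-8") as handle:
        handle.seek(0)
        known = {line.strip() for line in handle}
        if domain in known:
            return False  # already listed, nothing to reload
        handle.write(domain + "\n")
    # "squid -k parse" checks the configuration; check=True raises on failure.
    run(["squid", "-k", "parse"], check=True)
    # "squid -k reconfigure" makes the running service re-read its config.
    run(["squid", "-k", "reconfigure"], check=True)
    return True
```

The run parameter is only there so the function can be exercised without Squid installed; in real use you would leave it at the default.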

Update static .txt files

Modify a remote file (HTTP PUT, SCP, FTP, other…) that the other solution is known to use.  The end application knows to re-ingest that file every x minutes even though it doesn’t understand SOAR.

Example – Bluecoat Proxy can read in txt files and use them in ACLs. So a playbook only needs to update http://10.0.0.1/bad_domains.txt and no API is needed.
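
If the file sits on a web server that accepts HTTP PUT (an assumption; SCP or FTP would do the same job), the playbook step could be as small as this sketch. The build_blocklist_put name is hypothetical.

```python
import urllib.request


def build_blocklist_put(url, domains):
    """Build a PUT request that replaces the remote list with `domains`."""
    # One domain per line, de-duplicated and sorted for stable diffs.
    body = "\n".join(sorted(set(domains))) + "\n"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )
```

Passing the result to urllib.request.urlopen uploads the file; the proxy picks it up on its next scheduled re-read.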

HTTP scraping and parse

Maybe your target solution provides a webpage with the data you want. The data is there and it’s structured, so we just need SOAR to download the HTML body and extract the elements.

Example – We’ve all tried to curl a webpage and then parse the HTML looking for the contents of <div id="username"> because we wanted info from that page.
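
Rather than grepping the raw HTML, the standard library parser can pull out that element. A minimal sketch follows; the ElementText and text_by_id names are mine, and it assumes the target element contains no unclosed void tags such as a bare <br>.

```python
from html.parser import HTMLParser


class ElementText(HTMLParser):
    """Collect the text inside the first element with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0      # > 0 while inside the target element
        self.done = False   # True once the target element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif not self.done and dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_by_id(html, element_id):
    """Return the stripped text content of the element with this id."""
    parser = ElementText(element_id)
    parser.feed(html)
    return "".join(parser.chunks).strip()
```

Calling text_by_id(page, "username") on the downloaded body gives the value to store in the incident.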

Use an Expect-script

Expect scripts mimic a user typing keys on a keyboard and can even listen to what the end application has prompted for (e.g. when it sees “password>” it knows to type in ‘123456’).  In this case we really are pretending to be an actual person!

Example – Connecting to an old Cisco switch via telnet
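
In practice you would use expect itself (or a library such as pexpect), but the core loop is simple enough to sketch: read until the prompt shows up, then type the reply. The expect_dialogue name, prompts and replies below are hypothetical, and the sketch works on any pair of text streams rather than a real telnet session.

```python
def expect_dialogue(reader, writer, steps):
    """For each (prompt, reply): read until the prompt appears, then send reply."""
    for prompt, reply in steps:
        seen = ""
        while not seen.endswith(prompt):
            char = reader.read(1)
            if not char:
                raise EOFError("stream closed before prompt %r" % prompt)
            seen += char
        # Pretend to be the person at the keyboard.
        writer.write(reply + "\n")
        writer.flush()
```

With a real connection, reader and writer would be the two ends of the session to the switch.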

I’m sure there are other ways for other problems, but hopefully this gives an idea and helps you work around problems using old solutions.

Andy

Levels of De-Duplication

  • Do you have an external service provider that sends in the same alert multiple times?
  • Or a ‘Smart Next Gen’ device that repeats alerts as it isn’t actually that smart?
  • Or users that submit a ticket twice if you don’t reply within 1 minute?

These are great use-cases for ticket de-duplication, however the definition of duplication might vary, e.g.

  • Some duplicated alerts have the same time stamp (e.g. a resend)
  • Some duplicated alerts have a different timestamp as the service is not stateful: “it’s 13:00 and I’m still seeing this issue”.
  • The unique key attribute might be a compound, IP + CVE (e.g. vulnerability mapping)
  • The unique attributes might be auth login name + service (e.g. VPN brute force attempt from different source IP)
  • Time bound grouping, so a 5 min break signifies a new unique alert
  • …Or other
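
Two of those definitions, the compound key and the time-bound grouping, can be sketched together. The field names, the 5 minute default and the function names are illustrative assumptions, not how any particular platform implements dedupe.

```python
from datetime import datetime, timedelta


def dedupe_key(alert, fields):
    """Compound key, e.g. fields=("ip", "cve") or ("login", "service")."""
    return tuple(str(alert.get(f, "")).lower() for f in fields)


def group_alerts(alerts, fields, gap=timedelta(minutes=5)):
    """Group alerts sharing a key; a quiet period longer than `gap` starts a new group."""
    groups = []
    latest = {}  # key -> (index into groups, time of last alert seen)
    for alert in sorted(alerts, key=lambda a: a["time"]):
        key = dedupe_key(alert, fields)
        if key in latest and alert["time"] - latest[key][1] <= gap:
            index = latest[key][0]
            groups[index].append(alert)
        else:
            index = len(groups)
            groups.append([alert])
        latest[key] = (index, alert["time"])
    return groups
```

Each group then becomes one ticket, with the extra alerts handled by whichever strategy you pick from the list that follows.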

There are many ways to handle different types of duplication, each with different benefits.  I’ve listed some examples here, from most aggressive duplication removal to the most passive.

Drop new ticket completely, no evidence saved (aggressive dedupe)

Simplest and quickest, great for low-severity ticket noise.

Drop the new ticket but add a comment to existing ticket

Helps with tracking frequency of events and timings for audit, but without the extra workload of extra tickets for your team to analyse and close

Create a new ticket as a child ticket

Each new ticket is logged as an independent ticket but becomes a child of the first ticket so they can share data, though this ‘child’ gets its own workplan and playbook execution.

Create new ticket and link them

Each alert is an independent ticket that requires closing, however the tickets are linked in the database allowing analysts to track the relations and quickly visit them all

Create a new ticket and add a note in old ticket (with a quick-button ‘click here to close’)

Or maybe create the new ticket and just add a comment to the first to say “hey I might be related, but I’m not sure”.  The playbook would encourage the analyst to check off all potential links before closing the ticket.

Create the ticket regardless (minimal dedupe)

This scenario actually has no dedupe functionality and tickets are not checked.  However the platform proactively suggests similar tickets (using Machine Learning to look for IP addresses, IOC, email addresses, etc.) that an analyst might want to check out.
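
The platform does this with Machine Learning; as a much simpler stand-in, just to show the idea, here is a sketch that ranks existing tickets by how many indicators they share with the new one. The ticket layout and the suggest_similar name are hypothetical.

```python
def suggest_similar(new_ticket, open_tickets, minimum_shared=1):
    """Rank existing tickets by how many indicators they share with the new one."""
    wanted = set(new_ticket["indicators"])
    scored = []
    for ticket in open_tickets:
        shared = wanted & set(ticket["indicators"])
        if len(shared) >= minimum_shared:
            scored.append((len(shared), ticket["id"], sorted(shared)))
    # Most overlap first; ticket id breaks ties for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(ticket_id, shared) for _, ticket_id, shared in scored]
```

Nothing is merged or dropped here; the analyst just gets a short list of tickets worth a look.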

So there you go, many ways to make automation even more automated !!

Andy

Machine Learning Fails

twitter.com/mogwai_poet/status/1060286856493813760

  • “A robot arm with a purposely disabled gripper found a way to hit the box in a way that would force the gripper open”

OMG Skynet is born

  • “Agent kills itself at the end of level 1 to avoid losing in level 2”

No it’s fine, we’re safe

People complain that computers don’t do what they are told. The truth is the opposite: they do exactly what they are told.  The real problem is that we as humans set the environment/parameters/questions badly.

Which is why a fire-and-forget SOAR approach isn’t always best. Consider adding interactive steps:

  • SOC/CIRT analyst guiding the playbook via the WebUI
  • Comms over email/slack/sms/other
  • Interact with non SOC/CIRT user, e.g. let a business owner control the playbook flow

Look at the following two approaches, and decide which is safer.

Option 1 – Automatically find the alert, auto extraction, auto enrich, auto decision making, auto block


Option 2 – Automatically find the alert, auto extraction, auto enrich, auto decision making, but ask a user (email, slack, sms) to validate
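
In code terms, Option 2 is the same pipeline with one human gate before the block. A minimal sketch, where decide, ask_user and block are hypothetical callables standing in for the enrichment logic, the email/Slack/SMS prompt and the blocking integration:

```python
def handle_alert(alert, decide, ask_user, block):
    """Option 2: automated decision, but a human validates before blocking."""
    verdict = decide(alert)  # automated extraction, enrichment and decision
    if verdict != "malicious":
        return "closed"
    # Ask a person (email, Slack, SMS...) before doing anything disruptive.
    if ask_user("Block %s?" % alert["indicator"]):
        block(alert["indicator"])
        return "blocked"
    return "escalated"  # the human disagreed: hand over to an analyst
```

Option 1 is what you get if ask_user always answers yes.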


 

Andy

the-devopsification-of-security

https://medium.com/lenny-for-your-thoughts/the-devopsification-of-security-e62604203adc

 

A point on automation

Automation: Talk to any CISO and they’ll tell you that hiring and retaining qualified security personnel is their greatest challenge. Couple that with the fact that the average large enterprise has deployed anywhere from 50 to 70 disparate security products. The result is ”

 

But also

“Built inside-out, not outside-in […] Enterprise security is depicted as having “a hard, crunchy shell, with a gooey interior,” […] Most importantly, it requires a cognitive shift away from prevention and towards control and response”

I find it amusing and disappointing that some solutions hit the market with no API or a horrific API.  In my personal experience, these bad-API offenders are typically one-trick-pony solutions that “work in our way and only our way” and are the least flexible.  Contrast that with the solutions I’ve worked with where the WebUI uses the API itself!! Now that is a solution I like.  Anything the vendor can do, we can do too, which means we can make that little box sing and dance like a pro.

 

I’ve heard a few times recently about RFIs that specifically ask: “provide details on your API, the functionality allowed, and its maturity”.  We need to see more customers demanding this type of professionalism from products!

 

Andy