OCR in a playbook

I recently hosted a demo in which a Playbook analysed a phishing email. Amongst other things (enriching with Active Directory, detonating files in multiple sandboxes, interacting with the end user…), we compared its IOCs against threat intelligence.

I was asked “but what happens if the phishing link is actually an image of a URL?”

Keeping in mind that SOAR is not an anti-phishing platform, and that our demo was more about automating and orchestrating across many different solutions…

…I accepted the challenge.

OCR.Space

First, I had to find a platform that does OCR. You might have an enterprise instance in-house; however, I needed a free cloud API, and OCR.Space provides exactly that [1].

Next was to build the integration, which in this case is lovely and simple (the full file is attached, but here is the core of it):

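    import base64, urllib, urllib2  # Python 2 standard library (urllib2 does not exist in Python 3)
    with open(demisto.getFilePath(entryID)['path'], 'rb') as f:  # resolve the file entry ID to a local path; closed automatically
        encoded = base64.b64encode(f.read())
    r = urllib2.urlopen(url, urllib.urlencode({"apikey": apikey, "base64Image": "data:image/png;base64," + encoded}))  # passing data makes this a POST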
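The lines above are Python 2. As a sketch of the same request under Python 3, one option is to split building the form body from sending it, so the encoding can be checked offline. This assumes OCR.Space's `https://api.ocr.space/parse/image` endpoint and the same `apikey`/`base64Image` form fields the integration sends; the function names are hypothetical, and `path` stands in for what `demisto.getFilePath(entryID)['path']` returns inside the integration.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumed OCR.Space endpoint; confirm against the current API documentation.
OCR_SPACE_URL = "https://api.ocr.space/parse/image"


def build_ocr_request_body(image_bytes, apikey, mime_type="image/png"):
    """Form-encode an image as a base64 data URI plus the API key."""
    # b64encode returns bytes in Python 3, so decode before building the string
    encoded = base64.b64encode(image_bytes).decode("ascii")
    fields = {
        "apikey": apikey,
        "base64Image": "data:%s;base64,%s" % (mime_type, encoded),
    }
    return urllib.parse.urlencode(fields).encode("ascii")


def ocr_image_file(path, apikey, url=OCR_SPACE_URL, timeout=30):
    """POST a local image file to the OCR service and return the parsed JSON reply."""
    with open(path, "rb") as f:  # the with-block closes the file handle
        body = build_ocr_request_body(f.read(), apikey)
    # Supplying data= turns urlopen into a form-encoded POST
    with urllib.request.urlopen(url, data=body, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping the MIME type as a parameter also makes it easy to stop hard-coding `image/png` when the uploaded image is actually a JPEG.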

Then I needed an image of a URL to test with.

Next, I needed a playbook. It should let you upload an image, run OCR on it, then put the extracted text somewhere useful.

We can see the output of the OCR.

And see how that ParsedText is then written to the Details section of the incident 🙂
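Getting from the OCR reply to the Details field mostly means pulling `ParsedText` out of the JSON. A minimal sketch, assuming an OCR.Space-style response with an `IsErroredOnProcessing` flag and a `ParsedResults` list holding one `ParsedText` per page (check the API documentation for the exact shape); `extract_parsed_text` is a hypothetical helper, not part of the attached integration:

```python
def extract_parsed_text(response):
    """Join the ParsedText of every page in an OCR.Space-style JSON reply.

    Assumed shape: {"IsErroredOnProcessing": bool,
                    "ParsedResults": [{"ParsedText": str}, ...]}
    """
    if response.get("IsErroredOnProcessing"):
        raise RuntimeError("OCR failed: %s" % response.get("ErrorMessage"))
    pages = response.get("ParsedResults") or []
    # OCR output often carries trailing spaces and CRLFs; trim each page
    return "\n".join(page.get("ParsedText", "").strip() for page in pages).strip()


if __name__ == "__main__":
    sample = {
        "IsErroredOnProcessing": False,
        "ParsedResults": [{"ParsedText": "http://example.com/login \r\n"}],
    }
    print(extract_parsed_text(sample))  # http://example.com/login
```

The returned string is what the playbook would then place in the incident's Details.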

Other uses

For my demo I actually integrated this into the main phishing playbook for links.
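Inside a phishing playbook, the OCR text only becomes useful once the URLs in it are pulled out and checked. A simplified sketch, in which a plain set of lowercase domains (`bad_domains`, hypothetical) stands in for a real threat-intelligence lookup and the URL regex is deliberately naive:

```python
import re
from urllib.parse import urlparse

# Deliberately simple: an http(s) scheme followed by non-whitespace characters.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def urls_from_ocr_text(text):
    """Return URLs found in OCR output, with trailing sentence punctuation removed."""
    return [u.rstrip(".,;:)]") for u in URL_RE.findall(text)]


def flag_known_bad(urls, bad_domains):
    """Return (url, hostname) pairs whose hostname or any parent domain is in bad_domains.

    bad_domains is a hypothetical stand-in for a threat-intel lookup and
    is expected to contain lowercase domain names.
    """
    hits = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        labels = host.split(".")
        # e.g. "login.evil.test" -> {"login.evil.test", "evil.test", "test"}
        candidates = {".".join(labels[i:]) for i in range(len(labels))}
        if candidates & bad_domains:
            hits.append((url, host))
    return hits
```

Checking parent domains means a listing for `evil.test` also flags `login.evil.test`. OCR mistakes, such as a misread `l`/`1` or a space inserted into the hostname, will still slip past a regex this simple.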

How about tactical testing of data exfiltration?

How about swapping out OCR.Space for a steganography decoder?
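Swapping OCR for a steganography decoder only changes the decode step; the rest of the playbook can stay as it is. As an illustration only, here is a naive least-significant-bit (LSB) extractor, assuming the simplest possible scheme: one hidden bit in the lowest bit of each channel byte, most-significant bit first, with the message terminated by a NUL byte. Real tools support many schemes, and the channel bytes would first have to be decoded from the image (for example with Pillow's `Image.tobytes()`). `lsb_embed` exists only to produce test input.

```python
def lsb_extract(channel_bytes):
    """Recover a NUL-terminated message hidden in the lowest bit of each byte."""
    out = bytearray()
    current = 0
    for i, value in enumerate(channel_bytes):
        current = (current << 1) | (value & 1)  # append the next hidden bit
        if i % 8 == 7:  # eight bits collected -> one hidden byte
            if current == 0:  # assumed NUL terminator
                break
            out.append(current)
            current = 0
    return bytes(out)


def lsb_embed(cover, message):
    """Inverse of lsb_extract, for building test input (message must not contain NUL)."""
    payload = message + b"\x00"
    bits = [(byte >> (7 - k)) & 1 for byte in payload for k in range(8)]
    if len(bits) > len(cover):
        raise ValueError("cover too small for message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(stego)
```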

All decoded data should also go through IOC extraction (hashes, IPs, domains, machine names, custom regex patterns) and be indexed against other tickets.
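That extraction and indexing step can be sketched with plain regular expressions plus a reverse index from each indicator to the tickets it has appeared in. The patterns below are simplified assumptions (a domain pattern this loose also matches file names such as `invoice.pdf`), `IocIndex` is a hypothetical in-memory stand-in for the platform's own indicator store, and the `WS-` machine-name convention is a hypothetical example of a custom regex:

```python
import re
from collections import defaultdict

# Simplified patterns; production extractors handle many more edge cases.
IOC_PATTERNS = {
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "sha1": re.compile(r"\b[a-fA-F0-9]{40}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "ipv4": re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"),
    "domain": re.compile(r"\b(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,}\b"),
}


def extract_iocs(text, custom_patterns=None):
    """Return {kind: sorted unique matches}; custom patterns must not use capturing groups."""
    patterns = {**IOC_PATTERNS, **(custom_patterns or {})}
    found = {}
    for kind, pattern in patterns.items():
        matches = sorted(set(pattern.findall(text)))
        if matches:
            found[kind] = matches
    return found


class IocIndex:
    """Map each (kind, value) indicator to the set of ticket IDs it appeared in."""

    def __init__(self):
        self._seen = defaultdict(set)

    def add(self, ticket_id, iocs):
        """Record a ticket's IOCs and return the other tickets sharing any of them."""
        related = set()
        for kind, values in iocs.items():
            for value in values:
                key = (kind, value.lower())  # case-insensitive match for hashes/domains
                related |= self._seen[key]
                self._seen[key].add(ticket_id)
        related.discard(ticket_id)
        return sorted(related)


if __name__ == "__main__":
    text = "Beacon to 203.0.113.7 and evil.test from WS-0042"
    iocs = extract_iocs(text, {"machine": re.compile(r"\bWS-\d{4}\b")})
    index = IocIndex()
    index.add("T1", iocs)
    print(index.add("T2", {"ipv4": ["203.0.113.7"]}))  # ['T1']
```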

[1] Note: as I do not have a contract with OCR.Space, I would never send it any images that might contain sensitive data.