Levels of De-Duplication

  • Do you have an external service provider that sends in the same alert multiple times?
  • Or a ‘Smart Next Gen’ device that repeats alerts because it isn’t actually that smart?
  • Or users who submit a ticket twice if you don’t reply within 1 minute?

These are great use cases for ticket de-duplication; however, the definition of duplication might vary, e.g.

  • Some duplicated alerts have the same time stamp (e.g. a resend)
  • Some duplicated alerts have a different timestamp as the service is not stateful “it’s 13:00 and I’m still seeing this issue”.
  • The unique key attribute might be a compound, IP+CVE (e.g. vulnerability mapping)
  • The unique attributes might be auth login name + service (e.g. VPN brute force attempt from different source IP)
  • Time-bound grouping, where a 5-minute break signifies a new unique alert
  • …Or other
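Most of these definitions boil down to picking a dedupe key (possibly compound) plus a time window. A minimal sketch in Python of that idea (the field names and the 5-minute window are illustrative, not from any specific product):

```python
from datetime import datetime, timedelta

def dedupe_key(alert):
    # Compound key example: source IP + CVE (hypothetical field names)
    return (alert["src_ip"], alert["cve_id"])

def is_duplicate(alert, last_seen, window=timedelta(minutes=5)):
    """Treat an alert as a duplicate if the same key fired within the window.

    `last_seen` maps dedupe keys to the timestamp of the most recent alert;
    a break longer than `window` makes the next alert count as new.
    """
    key = dedupe_key(alert)
    now = alert["timestamp"]
    previous = last_seen.get(key)
    last_seen[key] = now
    return previous is not None and (now - previous) <= window
```

Swapping `dedupe_key` for login-name+service, or the timestamp comparison for exact-match, gives you the other flavours listed above.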

There are many ways to handle different types of duplication, each with different benefits. I’ve listed some examples here, from the most aggressive duplicate removal to the most passive.

Drop new ticket completely, no evidence saved (aggressive dedupe)

Simplest and quickest, great for low-severity ticket noise

Drop the new ticket but add a comment to existing ticket

Helps with tracking frequency of events and timings for audit, but without the extra workload of more tickets for your team to analyse and close

Create a new ticket as a child ticket

Each new ticket is logged as an independent ticket but becomes a child of the first ticket so they can share data, though this ‘child’ gets its own workplan and playbook execution.

Create new ticket and link them

Each alert is an independent ticket that requires closing; however, the tickets are linked in the database, allowing analysts to track the relations and quickly visit them all

Create a new ticket and add a note in old ticket (with a quick-button ‘click here to close’)

Or maybe create the new ticket and just add a comment to the first saying “hey, I might be related, but I’m not sure”. The playbook would encourage the analyst to check off all potential links before closing the ticket.

Create the ticket regardless (minimal dedupe)

This scenario actually has no dedupe functionality and tickets are not checked. However, the platform proactively suggests similar tickets (using Machine Learning to look for IP addresses, IOCs, email addresses, etc.) that an analyst might want to check out.

So there you go, many ways to make automation even more automated!





A point on automation

Automation: “Talk to any CISO and they’ll tell you that hiring and retaining qualified security personnel is their greatest challenge. Couple that with the fact that the average large enterprise has deployed anywhere from 50 to 70 disparate security products. The result is …”


But also

“Built inside-out, not outside-in […] Enterprise security is depicted as having “a hard, crunchy shell, with a gooey interior” […] Most importantly, it requires a cognitive shift away from prevention and towards control and response”

I find it amusing and disappointing that some solutions hit the market with no API, or a horrific one. In my personal experience, these bad API offenders are typically one-trick-pony solutions that “work in our way and only our way” and are the least flexible. Contrast that with the solutions I’ve worked with where the WebUI itself uses the API! Now that is a solution I like. Anything the vendor can do, we can do too, which means we can make that little box sing and dance like a pro.


I’ve heard a few times recently about RFIs that specifically ask “provide details on your API, the functionality allowed, and its maturity”. We need to see more customers demanding this type of professionalism from products!




TLS1.3 and SOAR?

I was asked recently if SOAR works with TLS1.3.  Some of you might already know this, but for those that might not:

TLS1.3 is commonly used for encrypting data as it travels over a network, or put another way “encryption on the wire” as opposed to “encryption on disk”. When the data reaches its destination this encryption is removed and the recipient can see the original traffic.

SOAR is not an inline tool; by that I mean it does not observe and regulate moving network traffic (like a firewall/IPS does), so this “in transit” encryption never touches the SOAR solution. SOAR is in fact out-of-band and communicates with the tools that are themselves inline; the problem of TLS visibility remains with them, not with SOAR.

Example – A SOAR playbook that is triggered by a user visiting “https://banking.com” will only trigger if the Proxy or IPS that is handling that request is inspecting inside SSL/TLS and can therefore inform SOAR/SIEM of the request.

Of course, SOAR platforms talk to other solutions (IPS, Firewall, Case Management, etc.) and these communications are encrypted; and yes, SOAR should be able to use TLS1.3 itself.


Shameless link – before starting SOCOPS.ROCKS I wrote another little blog for self-learning, including an article on TLS1.3 (a previous focus of mine); there is much more information there:



Build your own Machine Learning inside a SOAR Playbook

*bbbbzzzz* this blog has already failed the buzzword test.

  • Maybe your ticketing system doesn’t support Machine Learning **bbzzzzzz** classification
  • Maybe you want to apply some Machine Learning **bbzzzzzz** to an old piece of technology on your network
  • Maybe you just want to apply Machine Learning **bbzzzzzz** using a dataset you can control and tweak

A real POC use case: 

An MSSP is limited to 1 mailbox, and receives 150+ human-written free-form emails a day covering all ticket types.  An analyst who takes 1 minute to read+log+classify+prioritise+assign+move each original email is wasting two and a half hours a day (whilst suffering eye fatigue/burnout?).   Or put another way, 70 hours a month!!
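For what it’s worth, the arithmetic checks out (assuming roughly 28 days of mailbox coverage a month):

```python
emails_per_day = 150
minutes_per_email = 1  # read + log + classify + prioritise + assign + move

hours_per_day = emails_per_day * minutes_per_email / 60.0
hours_per_month = hours_per_day * 28  # assuming ~28 days of coverage

print(hours_per_day)    # 2.5
print(hours_per_month)  # 70.0
```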

So how about a playbook to do the time wasting labour:

  1. Monitor an inbox near realtime
  2. Ingest all emails, and start a playbook I will call “ReCategorise”
  3. The email body passes through a learned dataset
    1. If >80% confidence match is made by FastText we have an answer
    2. If <80% we try simple keyword matching
    3. If nothing still matches we run a default playbook
  4. With this decision, we can reprocess the email into the correct playbook type
  5. Humans can correct any wrong prediction
  6. All these predictions/corrections are logged, so at the end of the month we can analyse/tweak any issues
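The decision logic in steps 3-4 can be sketched as a small function. The 80% threshold comes from above; the `ml_classify` callable and the keyword map are stand-ins for the FastText call and whatever keywords you choose, purely illustrative:

```python
def categorise(body, ml_classify, keyword_map, threshold=0.8):
    """Return (category, method) for an inbound email body.

    ml_classify(body) -> (label, confidence) stands in for the FastText
    prediction; keyword_map maps lowercase keywords to categories.
    """
    label, confidence = ml_classify(body)
    if confidence > threshold:
        return label, "fasttext"          # step 3.1: confident ML match
    lowered = body.lower()
    for keyword, category in keyword_map.items():
        if keyword in lowered:
            return category, "keyword"    # step 3.2: simple keyword match
    return "Default", "fallback"          # step 3.3: default playbook
```

The returned `(category, method)` pair is what step 4 would use to re-process the email into the correct playbook, and logging `method` gives you the audit trail for step 6.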

The steps needed for this:

  1. Monitor an inbox to see the initial email
  2. Create a dataset for ML **bbbzzzzz** to read
  3. Write an integration that compares email to dataset
  4. Build this into a playbook workflow

#1 Monitor an inbox

This is easy: just create a standard Mail Listener that reads new emails every x seconds.
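If your platform doesn’t ship a Mail Listener, a bare-bones equivalent using Python’s standard imaplib looks something like this (the host/credentials are placeholders, and `handler` stands in for kicking off the playbook):

```python
import email
import imaplib
import time

def parse_message(raw_bytes):
    """Extract subject and plain-text body from a raw RFC822 message."""
    msg = email.message_from_bytes(raw_bytes)
    body = ""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            body += part.get_payload(decode=True).decode(errors="replace")
    return msg["Subject"], body

def poll_inbox(host, user, password, interval=60, handler=print):
    """Poll an IMAP inbox every `interval` seconds; pass each unseen
    message to `handler` (a stand-in for starting the playbook)."""
    while True:
        with imaplib.IMAP4_SSL(host) as conn:
            conn.login(user, password)
            conn.select("INBOX")
            _, data = conn.search(None, "UNSEEN")
            for num in data[0].split():
                _, msg_data = conn.fetch(num, "(RFC822)")
                handler(parse_message(msg_data[0][1]))
        time.sleep(interval)
```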

#2 Create a learned dataset

For a real POC I would download thousands of old categorised tickets (done by humans) and label these with the confirmed ticket type. However for this blog post I will create a simple model myself in “data.txt”.

__label__<Category> <text>

__label__Phishing account locked
__label__Phishing invoice.pdf
__label__Phishing account closed
__label__Phishing your purchase
__label__Phishing your delivery
__label__Phishing netflix
__label__Phishing bank account
__label__Phishing payment transfer
__label__DeviceLost I lost my phone on a bus
__label__DeviceLost my tablet was stolen
__label__DeviceLost I can’t find my laptop
__label__DeviceLost my desktop was stolen by aliens
__label__Enrichment have you seen this url before?
__label__Enrichment can you check this file hash
__label__Enrichment here is a suspicious IP address
__label__Enrichment this ioc looks bad
__label__Enrichment please enrich this attribute

Then set up the environment: Linux, Python and FastText (I chose the pyFastText implementation)

yum -y install gcc gcc-c++ redhat-rpm-config python-devel
curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py"
python get-pip.py
pip install --trusted-host pypi.python.org cython argparse
CFLAGS="-Wp,-U_FORTIFY_SOURCE" pip install cysignals
pip install pyfasttext

This next Python script uses FastText to compile a binary file (which we use) and a vector file (which this use case won’t use). The key line is the call to model.supervised().

#!/usr/bin/env python
from pyfasttext import FastText

dataSourceText = './data.txt'
dataOutModel = './model'
model = FastText(label='__label__')
model.supervised(input=dataSourceText, output=dataOutModel, epoch=100, lr=0.7)

When we execute this:

# ./myfasttext.py
Read 0M words
Number of words: 50
Number of labels: 3
Progress: 100.0% words/sec/thread: 962300 lr: 0.000000 loss: 0.735017 eta: 0h0m
# ls -lt
-rwxrwxrwx. 1 root root 45346 Oct 22 23:27 model.vec
-rwxrwxrwx. 1 root root 22165 Oct 22 23:27 model.bin

#3 Using Python, build a SOAR wrapper Integration that loads that .bin file and compares new emails against the trained dataset

Though I try to stay vendor-neutral, this is written in Python for Demisto.  Full yml file attached.  The important coding bit is:

model = FastText('./model.bin')
modeloutput = model.predict_proba_single(email.body + '\n', k=2)[0]
EntryContext = {'fasttext': {'classification': modeloutput[0], 'confidence': modeloutput[1]}}

#4 Test the Integration manually

Pass the text “account locked” into FastText and check the category decided, and the confidence of this decision. Here the FastText Integration correctly guesses Phishing, with 99.6% confidence (though keep in mind, for this blog post my learning dataset is hilariously small).

#5 Build a playbook that uses FastText as a primary comparison

“Analyse Email in FastText” (task #31) represents our above integration “ft-analyse”.

“Over 80% confidence” (task #33) asks whether FastText is confident in its prediction.

If it is over 80% confident, “What was the outcome” (task #34) looks at what this prediction was and takes the appropriate workflow path (click to enlarge):

#6  Run test emails through the playbook

First test – I have simulated a user forwarding a real phishing email to this playbook, containing the lines:

Is this real or phishing?

Click here to verify your account<http://<removed>/membershipkey=343688408873184732/>

Failure to complete the validation process will result in a suspension of your netflix membership.

Netflix Support Team

And the playbook took this route (click to enlarge)….

Which in-turn automatically re-processed the ticket as phishing

Success!! That’s 1 minute saved on classifying the email (…fine, and another 60 minutes saved on actually processing a Phishing email… but today is about ML **bbbzzz** and not SOAR as a whole; don’t take this away from me, I’m still proud of my 1 minute!).

Second test – I have created a small simple email simulating a lost device

I lost my laptop on the bus

Stupid bus

And the playbook took this route (click to enlarge)….

Which in-turn automatically re-processed the ticket as a lost device

Then to add a cherry on top, let’s assign the ticket to an analyst based on some other criteria (I chose time of day).
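As a sketch of what “assign by time of day” might look like (the shift boundaries and analyst names below are made up for illustration):

```python
def assign_analyst(hour, shifts):
    """Pick an analyst by hour of day.

    `shifts` maps (start_hour, end_hour) tuples to analyst names;
    anything outside a defined shift goes to the on-call analyst.
    """
    for (start, end), analyst in shifts.items():
        if start <= hour < end:
            return analyst
    return "on-call"
```

The same lookup pattern works for any other assignment criteria (ticket type, customer, severity) by swapping the key.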

The time taken to put this all together was 1-2 days (plus another 1-2 days to get my head around FastText and pyFastText), but remember: every day we save 2 hours 30 mins of tedious work.

Whilst this playbook classifies inbound emails, we can have multiple datasets running at the same time, classifying anything you have a dataset for. I’m sure you the reader could think of other use cases…




Great article, it covers the pains, and ultimately says “trust your SOC guys”. Reminds me of the saying “A good sysadmin is a lazy sysadmin”. Most IT people know how to make their lives easier; we’re logical people ultimately.