A man walks into a bar, ouch
This is a quick ‘joke’. It takes 2 seconds to say, every time I tell it. I have no concerns giving an SLA for this joke. On the other hand…
With a knock-knock joke, whilst I know *I* can tell my lines in under 5 seconds, I’m entirely reliant on the person I’m talking to for the rest. Is it reasonable to hold me to an SLA?
Compare this to a SOAR playbook: setting an SLA is easy for any local task we control, but it’s not so simple when the business process wrapped around it includes:
- Any interaction that involves human input (especially where that person is not part of our team, and we can’t kick them)
- A query that potentially takes hours to complete
- Unstable technology we can’t change
- Technology belonging to another team
So how do we apply an SLA to playbooks like these?
SLA for an entire Incident
Pro – Quick to configure. Great for small simple playbooks.
Con – Very inflexible.
A timer starts with the incident. If the ticket takes longer than the agreed time, we have an SLA breach.
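As a rough illustration, an incident-wide SLA boils down to one comparison. This is a minimal Python sketch, not any SOAR product's API: the 4-hour limit and the function name `incident_sla_breached` are made up for the example.

```python
from datetime import datetime, timedelta

# Hypothetical SLA: the whole incident must be closed within 4 hours.
INCIDENT_SLA = timedelta(hours=4)


def incident_sla_breached(opened_at: datetime, now: datetime,
                          sla: timedelta = INCIDENT_SLA) -> bool:
    """Return True if the incident has been open longer than the SLA."""
    return now - opened_at > sla


opened = datetime(2024, 1, 1, 9, 0)
print(incident_sla_breached(opened, datetime(2024, 1, 1, 12, 0)))  # 3 hours open
print(incident_sla_breached(opened, datetime(2024, 1, 1, 14, 0)))  # 5 hours open
```

The clock never stops here, which is exactly why this approach is inflexible: time spent waiting on someone else counts against us too.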
SLA for each individual task
Pro – Finely tuned
Con – Administrative overhead to build and maintain
Start a timer for each specific task. If that task takes too long, we can alert, skip the task, or take a different playbook route and escalate the process to the senior team.
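To make the idea concrete, here is a small Python sketch of per-task SLAs, each with its own breach action. The task names, time limits, and the `check_task` helper are all hypothetical; a real SOAR platform would configure this on the task itself.

```python
from datetime import timedelta
from enum import Enum
from typing import Optional


class OnBreach(Enum):
    ALERT = "alert"
    SKIP = "skip"
    ESCALATE = "escalate"


# Hypothetical per-task SLAs: task name -> (time allowed, action on breach)
TASK_SLAS = {
    "enrich_indicator": (timedelta(minutes=5), OnBreach.ALERT),
    "long_running_query": (timedelta(hours=2), OnBreach.SKIP),
    "await_user_reply": (timedelta(hours=1), OnBreach.ESCALATE),
}


def check_task(task: str, elapsed: timedelta) -> Optional[OnBreach]:
    """Return the breach action for a task, or None if it is within its SLA."""
    allowed, action = TASK_SLAS[task]
    return action if elapsed > allowed else None


print(check_task("enrich_indicator", timedelta(minutes=2)))
print(check_task("await_user_reply", timedelta(minutes=90)))
```

Every task needs its own entry in the table, which is where the administrative overhead comes from.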
SLA using timers
Pro – Flexible. Quick to deploy.
Con – none?
E.g. task 1 starts the timer, task 5 pauses it, task 7 resumes it, and task 10 closes it.
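The start, pause, resume, close sequence can be sketched as a small stopwatch that only counts the time our team is responsible for. This is an illustrative Python sketch under assumed names (`SlaTimer` and its methods are hypothetical), with timestamps passed in explicitly so the example runs instantly.

```python
from datetime import datetime, timedelta
from typing import Optional


class SlaTimer:
    """A pausable timer: only time spent 'running' counts toward the SLA."""

    def __init__(self, sla: timedelta):
        self.sla = sla
        self.elapsed = timedelta(0)
        self.running_since: Optional[datetime] = None

    def start(self, at: datetime) -> None:
        # Ignore a start request if the timer is already running.
        if self.running_since is None:
            self.running_since = at

    def pause(self, at: datetime) -> None:
        # Bank the time accumulated since the last start or resume.
        if self.running_since is not None:
            self.elapsed += at - self.running_since
            self.running_since = None

    def resume(self, at: datetime) -> None:
        self.start(at)

    def close(self, at: datetime) -> None:
        self.pause(at)

    def breached(self) -> bool:
        return self.elapsed > self.sla


timer = SlaTimer(sla=timedelta(hours=1))
timer.start(datetime(2024, 1, 1, 9, 0))     # task 1: we begin work
timer.pause(datetime(2024, 1, 1, 9, 20))    # task 5: waiting on another team
timer.resume(datetime(2024, 1, 1, 13, 0))   # task 7: their answer arrives
timer.close(datetime(2024, 1, 1, 13, 25))   # task 10: done

print(timer.elapsed)     # 20 minutes + 25 minutes of our own time
print(timer.breached())
```

The incident was open for well over four hours on the wall clock, but only 45 minutes count against a one-hour SLA, because the wait in the middle was outside our control.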
Knock Knock (including SLA)
- The “Joke SLA” represents the entire incident
  - Terminology: “Incident SLA”
- The “My Team SLA” stops and starts
  - Terminology: “Timer”
- The “Punchline SLA” applies to one specific task
  - Terminology: “Task specific SLA”