#Detection Engineering #Security Monitoring

Make some noise - a note on detections

The big idea

I know that most detection engineers will already know what I’m about to write about. This post is for companies that likely don’t have a dedicated detection engineer, or that depend on external MSPs/MSSPs for their security.

Now, let’s get some things clear.

The noise I’m referring to is actual usage. The idea is that if you test a detection in an ideal scenario (a vacuum), you’ll likely get detections. Imagine a test environment where no one uses the storage accounts for a whole year.

```mermaid
xychart-beta
    title "Signals"
    x-axis [jan, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec]
    y-axis "Requests" 0 --> 10
    bar [0,0,0,0,0,0,0,0,0,0,0,0]
```

Then you want to test an enumeration detection for storage accounts, like I demonstrated in a previous blog post. Basically, you create the resources you need for the test, set up logging, and run an enumeration tool to generate the data. The rule you created (or maybe you are testing a template) works!

But it’s not that simple. You introduce it to your environment thinking you are safe, but the rule doesn’t actually work.

The TL;DR is as follows:

Actual traffic to the monitored storage accounts while enumeration takes place makes the rule output nothing.

That’s not good.

On signals and noise

So how does this look when we develop the detection? Let’s imagine we have two storage accounts and we query them using something like cloud_enum.py:

```mermaid
xychart-beta
    title "Signals"
    x-axis [sa1,sa2]
    y-axis "Requests" 0 --> 10
    bar [5,5]
```

Alright, so in this scenario the same number of queries is run across both. The logic of the detection I showed in the blog post I referenced above was simple:

number of requests == number of unique paths

So if cloud_enum always runs X requests and visits X unique paths, that’s a solid signal that someone is enumerating.
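To make that concrete, here’s a minimal sketch of the logic in Python, assuming a simplified log record of (storage account, request path) pairs. The schema and the `naive_enumeration_check` helper are made up for illustration; this is not the actual query from the earlier post.

```python
from collections import defaultdict

def naive_enumeration_check(logs):
    """Flag accounts where every request in the window hit a unique path.

    `logs` is a list of (storage_account, request_path) tuples -- a stand-in
    for whatever your real log schema looks like.
    """
    paths_per_account = defaultdict(list)
    for account, path in logs:
        paths_per_account[account].append(path)

    alerts = []
    for account, paths in paths_per_account.items():
        # The rule above: total request count == unique path count.
        if len(paths) == len(set(paths)):
            alerts.append(account)
    return alerts

# In the vacuum test env, only the enumeration tool touches the accounts:
vacuum_logs = [(sa, f"/container-{i}") for sa in ("sa1", "sa2") for i in range(5)]
print(naive_enumeration_check(vacuum_logs))  # ['sa1', 'sa2'] -- the rule fires
```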

What this looks like in production, however, is a bit different:

```mermaid
xychart-beta
    title "Signals"
    x-axis [sa1,sa2]
    y-axis "Requests" 0 --> 50
    bar [25,25]
    line [37,45]
```

Imagine the line as the actual traffic and the bar as the enumeration. The numbers are as follows:

| Storage account | Normal users | Tool |
| --- | --- | --- |
| sa1 | 37 | 25 |
| sa2 | 45 | 25 |

Users don’t usually behave like scanners, so maybe there’s actually only one path they visit on the account. Let’s assume that and calculate for account sa1.

So we look first at the total requests for sa1:

37+25=62

Now let’s look at the unique paths in those requests:

1+25=26

If we apply the previous formula of number of requests == number of unique paths to this, we can see that the numbers don’t match.

62 != 26

This is why the detection rule fails in an env with noise.
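Feeding sa1’s production numbers from the table into the same hypothetical check from the sketch above shows the failure:

```python
# Same hypothetical check as before, now fed sa1's production traffic:
# 37 normal requests against a single path plus the tool's 25 unique paths.
noisy_logs = [("sa1", "/the-one-path-users-visit")] * 37 \
           + [("sa1", f"/container-{i}") for i in range(25)]

# 62 total requests vs 26 unique paths -> 62 != 26, so nothing is flagged.
print(naive_enumeration_check(noisy_logs))  # []
```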

It’s very logical, simple really, but it should tell you something about why we want to adapt rules to our environments.

Rules that are developed in “perfect” envs tend to give no hits or waaaay too many hits. This leads to a false sense of security (“we have that covered through this detection”) and alert fatigue (“this detection creates so many alerts we can’t go through all of them manually”).
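To give one hypothetical example of what adapting the rule could look like (an illustration only, not the fix from the earlier post): partition traffic by caller before comparing, and trade the strict equality for thresholds so normal repeat visits don’t mask the scan. The caller field and the threshold values below are assumptions.

```python
from collections import defaultdict

def adapted_enumeration_check(logs, min_unique_paths=20, unique_ratio=0.9):
    """Illustrative adaptation only: flag a caller whose traffic against an
    account consists almost entirely of unique paths.

    `logs` is a list of (storage_account, caller_ip, request_path) tuples --
    again an assumed schema, not a real detection.
    """
    per_caller = defaultdict(list)
    for account, caller, path in logs:
        per_caller[(account, caller)].append(path)

    alerts = []
    for (account, caller), paths in per_caller.items():
        unique = len(set(paths))
        # Strict equality becomes "many unique paths, few repeats", so the
        # enumeration stands out even with normal user noise on the account.
        if unique >= min_unique_paths and unique / len(paths) >= unique_ratio:
            alerts.append((account, caller))
    return alerts
```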

Is this relevant?

So you ask, is this prevalent? Or maybe you work as a detection engineer and think “no way this happens in real life”. I can assure you, it does.

I have seen many examples of environments managed by a few people or a giant external MSP where the strategy seems to be:

  1. Log everything (no matter the cost)
  2. Enable all the template rules we can

That’s it. No adaptation. The worst I’ve seen is over 1000 incidents daily. Is someone handling that, or is the idea that when something happens you can say “well, we actually detected it” or something?

Similarly, in small environments the deployments are usually more tailored towards what you know can be relevant, but a lack of experience and know-how prevents you from properly adapting the rules so they work.

*I’ve previously written extensively on the data aspect of Security Monitoring and developing good use cases, so I won’t go into more detail on that here.

The specific topic of enabling everything (data, detections, AI) is something I’ve seen far too much of, and I’ve also written a specific post on it in my field notes for security strategy.

So what can we do? Well, we test in prod!

This is not generic advice. Testing in prod is something I only advise for detections. You can mark a rule with tags to make sure it’s not treated as “live”, but the data you need to build, tune and verify detections against actual production usage is in prod.

I don’t advise against developing detections in an environment where you can freely run tests, just be aware of what you’re looking for in your rules.

```mermaid
flowchart LR
    A[New threat] -->|Requires detection| C{Detection Engineering}
    C --> D["1. Information gathering"]
    C --> E["2. Develop rule"]
    C --> F["3. Deploy"]
    D -.-> |Informs| E
    E -.-> |In| Dev
    F --> |To| P
    P[Prod]
    Dev["Dev/test"]
```

The idea here is that even if you develop and test your rule in a dev env, you should deploy it to prod knowing it’s likely not going to work as-is. Treat the deployment to prod as a test in itself, and ideally test the rule to the same extent there, with the noise pollution that’s normally in your env.

Sometimes you might not be able to replicate the data you need in dev, so you deploy directly to prod. The same applies here: have a system that allows you to test and tag these alerts so that everyone understands they’re in testing.

So that’s it. If you’re reading this and thinking “yeah, duh” - maybe this wasn’t for you, but thanks anyway.

We ball.

We outtie!