The KQL Playbook (Play #2): Mastering the Matching Game

The KQL Playbook (Play #2): Mastering the Matching Game

Alright, class. Welcome back to the KQL playbook.

In our last session, we covered the four core moves of KQL. You learned how to grab a table, filter it with where, count things with summarize, and tidy it all up with project. You've officially moved from staring at a blinking cursor in terror to having a real conversation with your data. That’s a huge win, and you should be proud of it.

But today, we need to talk about the next great wall every analyst hits. It’s a subtle but infuriating barrier that separates the rookies from the pros.

It’s the moment you know the data is there. You saw the alert, you have the user's name, you have the malicious process. You write a query that looks logically flawless, a thing of beauty. You confidently hit "Run," lean back in your chair, and prepare to be showered in results.

And you get… nothing.

A single line: "Query returned no results."

You check for typos. You run it again. Still nothing. You start to question everything. Is the data connector broken? Did the attacker use magic to delete the logs? Is the entire Azure platform just an elaborate prank designed to make you look foolish in front of your boss?

The platform is fine. The logs are there. The problem is that your query was just a little bit sloppy. And in the world of security operations, the difference between a sloppy query and a precise one is the difference between writing an incident report and updating your resume.

Today's play is all about mastering the art of comparison. We’re going to turn your blunt instrument of a where clause into a surgeon’s scalpel. We will dissect the crucial differences between the operators you think you know, like ===~has, and contains. This isn’t just about syntax; it’s about mindset. Mastering this is the foundation of every effective detection rule and every successful threat hunt you will ever conduct.

Case-Sensitive vs. Case-Insensitive: == vs. =~

Let's start with the classic trap, the one that bites every new analyst without fail. You're looking for actions taken by a specific system or application, but you forget that computers are infuriatingly literal and have no appreciation for nuance.

The Scenario: You're investigating a series of unexpected configuration changes in your Azure environment. You suspect they are automated, and you've narrowed down the potential source to actions initiated by the "Windows Azure Service Management API". Your task is to find every single action initiated by this specific service in the last 24 hours.

Based on your initial exploration of the AuditLogs table, you build the following query. The rookie move is to jump right in with the double equals ==, because it logically means "is this thing equal to that thing?"

// The Rookie Mistake - A Recipe for Blind Spots
AuditLogs
| where TimeGenerated > ago(24h)
| extend initiatedByDisplayName = tostring(InitiatedBy.app.displayName)
| where initiatedByDisplayName == 'Windows Azure Service Management API'
| project TimeGenerated, OperationName, Result, initiatedByDisplayName

You run the query. You get a handful of results back. "Great," you think, "I've scoped the activity. I can see the changes it made." You start writing up your report based on this limited data set.

But here’s the problem you can't see: some logs, particularly from different Azure services or older APIs, might record that same display name with different capitalisation. It might be logged as windows azure service management api (all lowercase) or even Windows Azure Service Management Api (inconsistent case). All of those logs, which could contain the critical evidence of the actual unauthorised change, were completely invisible to your query. You've just scoped an investigation based on partial evidence.

Why? Because == is the strictest, most pedantic comparison operator in KQL. It demands an exact, case-sensitive match. To ==W is not the same as wAPI is not the same as Api. Your query did exactly what you asked, but you asked the wrong question.

The Professional Fix: The Tilde Equal (=~)

This operator is your new default for string equality. Think of the tilde (~) as a relaxed shrug. The =~ operator means "equals, but I honestly don't give a damn about the case."

Let's fix our query and see what we were missing:

// The Professional Approach - Seeing the Whole Picture
AuditLogs
| where TimeGenerated > ago(24h)
| extend initiatedByDisplayName = tostring(InitiatedBy.app.displayName)
| where initiatedByDisplayName =~ 'Windows Azure Service Management API' // <-- The only change that matters
| project TimeGenerated, OperationName, Result, initiatedByDisplayName, InitiatedBy

You run this revised query. Boom. Suddenly, your results have doubled, maybe tripled. You’re now seeing the same actions logged with different capitalisation schemes. You have the full picture. You just went from partially blind to fully sighted with a single character change.

This isn't just a hypothetical. Log sources are notoriously inconsistent. Relying on == for anything that a human or another system has typed (like display names, user agents, or even hostnames) is a surefire way to create blind spots in your investigation.

Another Real-World Example: Catching Malicious Inbox Rules

This principle isn't just for system logs; it's critical for threat detection logic. Consider a common attacker technique: after compromising a user's mailbox, they create an inbox rule to delete any incoming emails that might warn the user automatically. They'll look for keywords like "phishing," "suspicious," "hacked," or "helpdesk" and move those messages straight to the trash.

Here is a version of a detection rule designed to catch this exact behaviour:

let Keywords = dynamic(["helpdesk", " alert", " suspicious", "fake", "malicious", "phishing", "spam", "do not click", "do not open", "hijacked", "Fatal"]);
OfficeActivity
| where OfficeWorkload =~ "Exchange" 
| where Operation =~ "New-InboxRule" and (ResultStatus =~ "True" or ResultStatus =~ "Succeeded")
| where Parameters has "Deleted Items" or Parameters has "Junk Email"  or Parameters has "DeleteMessage"
| extend Events=todynamic(Parameters)
| parse Events  with * "SubjectContainsWords" SubjectContainsWords '}'*
| parse Events  with * "BodyContainsWords" BodyContainsWords '}'*
| parse Events  with * "SubjectOrBodyContainsWords" SubjectOrBodyContainsWords '}'*
| where SubjectContainsWords has_any (Keywords)
 or BodyContainsWords has_any (Keywords)
 or SubjectOrBodyContainsWords has_any (Keywords)
| extend ClientIPAddress = case( ClientIP has ".", tostring(split(ClientIP,":")[0]), ClientIP has "[", tostring(trim_start(@'[[]',tostring(split(ClientIP,"]")[0]))), ClientIP )
| extend Keyword = iff(isnotempty(SubjectContainsWords), SubjectContainsWords, (iff(isnotempty(BodyContainsWords),BodyContainsWords,SubjectOrBodyContainsWords )))
| extend RuleDetail = case(OfficeObjectId contains '/' , tostring(split(OfficeObjectId, '/')[-1]) , tostring(split(OfficeObjectId, '\\')[-1]))
| summarize count(), StartTimeUtc = min(TimeGenerated), EndTimeUtc = max(TimeGenerated) by  Operation, UserId, ClientIPAddress, ResultStatus, Keyword, OriginatingServer, OfficeObjectId, RuleDetail
| extend AccountName = tostring(split(UserId, "@")[0]), AccountUPNSuffix = tostring(split(UserId, "@")[1])
| extend OriginatingServerName = tostring(split(OriginatingServer, " ")[0])

Look closely at the first two where statements:

  • | where OfficeWorkload =~ "Exchange"
  • | where Operation =~ "New-InboxRule"

Why is the tilde (=~) so crucial here? Because you cannot guarantee how the log source will capitalise these strings. Today, the log might say "Exchange". Tomorrow, after a Microsoft 365 update, it might change to "exchange". The operation might be logged as "New-InboxRule" or "new-inboxrule".

If this detection rule was written with a strict ==, it would be brittle and unreliable. A simple, benign change in log formatting could completely blind a critical security control. By using =~, the rule becomes resilient. It says, "I don't care how you capitalise it, if someone is creating a new inbox rule within the Exchange workload, I need to know about it." It's the difference between a rule that works today and a rule that works every day.

The Sniper Rifle vs. The Shotgun: has vs. contains

Alright, now for the main event. Both of these operators are designed to find a small piece of text inside a larger block of text, like a command line or a URL. But how they do it is fundamentally different, and choosing the right one will dramatically affect your query's speed and accuracy.

Think of it like this: has is a sniper rifle, fast, precise, and targets known weak points. contains is a shotgun, powerful, wide-ranging, and can make a bit of a mess if you're not careful.

The Sniper Rifle: has
The has operator is built for pure, unadulterated speed. It doesn't scan every single character in a string. Instead, it leverages KQL's full text index. It looks for a whole term a sequence of characters separated by spaces or punctuation (like -_./).

The Scenario: You're hunting for attackers using the infamous credential dumping tool mimikatz. You want to find any command line across your entire fleet that mentions the tool.

// Using the sniper rifle, 'has'
DeviceProcessEvents
| where ProcessCommandLine has "mimikatz"

This query will execute in the blink of an eye, even across millions of events. It will instantly find command lines like:

  • C:\temp\mimikatz.exe sekurlsa::logonpasswords
  • powershell -c "invoke-mimikatz"
  • rundll32.exe C:\payloads\mimikatz.dll,_some_function

It works because "mimikatz" is a distinct, whole term in all these examples, separated by spaces, slashes, or dots.

The Shotgun: contains
The contains operator is your brute-force tool. It will find a substring anywhere, even if it’s squished in the middle of other characters. It’s more flexible, but it comes at the cost of performance because it has to perform a slow, character-by-character scan of the entire field.

The Scenario: You're investigating a web attack. You suspect an attacker is using a specific obfuscated SQL injection string, and you want to find any URL requests that contain it. The string you're looking for is Char(115)Char(101).

Let's try our sniper rifle first:
| where Url has "Char(115)Char(101)" -> This will fail. Why? Because that string is not a "whole term." It's surrounded by other characters in the URL string, like ?id=1;UNION%20ALL%20SELECT%20Char(115)Char(101). The has operator can't isolate it.

Now, let's bring out the shotgun:

// Using the shotgun, 'contains'
AzureDiagnostics
| where OperationName == "ApplicationGatewayAccess"
| where requestUri_s contains "Char(115)Char(101)"

This works perfectly, contains doesn't care about word boundaries; it just finds that exact sequence of characters buried deep inside the larger string.

The Specialists: startswith and endswith
Don't forget these incredibly useful specialists. They do exactly what their names imply and are highly optimized for their tasks.

  • startswith: Perfect for finding processes launched from a specific untrusted directory. A query like | where ProcessCommandLine startswith "C:\\Windows\\Temp" is a classic hunt for malware execution.
  • endswith: Your go-to for finding specific file types or hunting for ransomware extensions. A query like | where FileName endswith ".locked" is a simple but powerful detection for active encryption.

Putting It All Together: A Real-World Detection Rule

Nothing illustrates the has vs. contains debate better than seeing them used together in a real detection rule. Let's look at the logic behind a rule designed to detect "Office Policy Tampering", an action an attacker might take to weaken your security controls. The goal is to find any admin activity that disables or removes critical security policies like ATP Safe Links, DLP, or Anti Phishing rules.

Here's a logic used to identify the malicious operations:

let opList = OfficeActivity 
| summarize by Operation
//| where Operation startswith "Remove-" or Operation startswith "Disable-"
| where Operation has_any ("Remove", "Disable")
| where Operation contains "AntiPhish" or Operation contains "SafeAttachment" or Operation contains "SafeLinks" or Operation contains "Dlp" or Operation contains "Audit"
| summarize make_set(Operation, 500);
OfficeActivity
// Only admin or global-admin can disable/remove policy
| where RecordType =~ "ExchangeAdmin"
| where UserType in~ ("Admin","DcAdmin")
// Pass in interesting Operation list
| where Operation in~ (opList)
| extend ClientIPOnly = case( 
ClientIP has ".", tostring(split(ClientIP,":")[0]), 
ClientIP has "[", tostring(trim_start(@'[[]',tostring(split(ClientIP,"]")[0]))),
ClientIP
)  
| extend Port = case(
ClientIP has ".", (split(ClientIP,":")[1]),
ClientIP has "[", tostring(split(ClientIP,"]:")[1]),
ClientIP
)
| summarize StartTimeUtc = min(TimeGenerated), EndTimeUtc = max(TimeGenerated), OperationCount = count() by Operation, UserType, UserId, ClientIP = ClientIPOnly, Port, ResultStatus, Parameters
| extend AccountName = tostring(split(UserId, "@")[0]), AccountUPNSuffix = tostring(split(UserId, "@")[1])

Let's break down why this query is so smart:

  1. | where Operation has_any ("Remove", "Disable")
    The analyst who wrote this chose has_any for the first filter. Why? Because Exchange PowerShell command names (which are logged in the Operation field) are structured like Verb-Noun, for example, Set-MalwareFilterPolicy or Disable-SafeLinksRule. The "verb" part, like Remove or Disable, is a distinct, whole token at the beginning of the string, often separated by a dash. Using has_any here is incredibly fast and efficient because it can use the KQL index to find every operation that starts with these common "destructive" verbs.
  2. | where Operation contains "AntiPhish" or Operation contains "SafeAttachment"...
    For the second filter, the analyst switched to contains. Why not has? Because terms like AntiPhish or SafeLinks might not be perfectly separated tokens. The full operation name could be Remove-AntiPhishRule or Disable-SafeLinksRule. If they had used | where Operation has "AntiPhish", the query might fail because the term in the index is "AntiPhishRule", not just "AntiPhish". The contains operator provides the necessary flexibility to find that keyword anywhere within the operation name, ensuring that no variations are missed.

This is a masterclass in choosing the right tool for the job. The analyst used the fast, indexed has_any for the clean, predictable part of the string (the verb) and the flexible, brute-force contains for the variable part (the noun). This creates a query that is both performant and robust.

The Power of "Not": Your Exclusion Toolkit

Great threat hunting isn't just about finding evil; it's about efficiently filtering out the overwhelming amount of good. Every comparison operator we've discussed has a negative counterpart, and they are the key to building high-fidelity alerts and reducing analyst fatigue.

The Scenario: You’ve built an analytics rule to detect when a new user is added to a privileged group. It works great, but it keeps firing every time your identity management system's service account, IDM_Sync@itprofessor.cloud, performs its legitimate, hourly synchronisation task. This is a classic "benign positive," and it’s clogging up your incident queue.

The goal is to see all privilege escalations EXCEPT the ones performed by this legitimate service account.

AuditLogs
| where OperationName == "Add member to group"
| where InitiatedBy.user.userPrincipalName !~ "IDM_Sync@itprofessor.cloud" // The magic is here
| extend TargetUser = tostring(TargetResources[0].userPrincipalName)
| extend GroupName = tostring(TargetResources[0].displayName)
| project TimeGenerated, TargetUser, GroupName, Actor = InitiatedBy.user.userPrincipalName

The !~ (not equal, case-insensitive) is the hero here. It tells the query to process everything and then, at the last moment, throw away any events initiated by our sync account.

The full exclusion toolkit includes:

  • != (Not equal, case-sensitive)
  • !~ (Not equal, case-insensitive)
  • !contains (Does not contain the substring)
  • !has (Does not have the whole term)
  • !in (The big one for lists, which we'll cover just in a moment)

Another Real-World Example: Filtering Out Localhost Noise

Negation isn't just for filtering business-as-usual activity like logins from corporate IPs. It's also a critical tool for cleaning up noisy system level logs and focusing on legitimate network traffic.

Let's examine a detection rule for a specific AD FS attack where a threat actor tries to connect to the server to extract configuration settings. The rule correlates AD FS authentication events with Windows Filtering Platform (WFP) logs, which record network connections. The goal is to spot suspicious incoming network traffic on the AD FS server.

Here's the query that analyzes the network connection events (Event ID 5156):

// Adjust this to use a longer timeframe to identify ADFS servers
//let lookback = 0d;
// Adjust this to adjust detection timeframe
//let timeframe = 1d;
// SamAccountName of AD FS Service Account. Filter on the use of a specific AD FS user account
//let adfsuser = 'adfsadmin';
// Identify ADFS Servers
let ADFS_Servers = (
    SecurityEvent
    //| where TimeGenerated > ago(timeframe+lookback)
    | where EventSourceName == 'AD FS Auditing'
    | distinct Computer
);
SecurityEvent
    //| where TimeGenerated > ago(timeframe)
    | where Computer in~ (ADFS_Servers)
    // A token of type 'http://schemas.microsoft.com/ws/2006/05/servicemodel/tokens/SecureConversation'
    // for relying party '-' was successfully authenticated.
    | where EventID == 412
    | extend EventData = parse_xml(EventData).EventData.Data
    | extend InstanceId = tostring(EventData[0])
| join kind=inner
(
    SecurityEvent
    //| where TimeGenerated > ago(timeframe)
    | where Computer in~ (ADFS_Servers)
    // Events to identify caller identity from event 412
    | where EventID == 501
    | extend EventData = parse_xml(EventData).EventData.Data
    | where tostring(EventData[1]) contains 'identity/claims/name'
    | extend InstanceId = tostring(EventData[0])
    | extend ClaimsName = tostring(EventData[2])
    // Filter on the use of a specific AD FS user account
    //| where ClaimsName contains adfsuser
)
on $left.InstanceId == $right.InstanceId
| join kind=inner
(
    SecurityEvent
    | where EventID == 5156
    | where Computer in~ (ADFS_Servers)
    | extend EventData = parse_xml(EventData).EventData.Data
    | mv-expand bagexpansion=array EventData
    | evaluate bag_unpack(EventData)
    | extend Key = tostring(column_ifexists('@Name', "")), Value = column_ifexists('#text', "")
    | evaluate pivot(Key, any(Value), TimeGenerated, Computer, EventID)
    | extend DestPort = column_ifexists("DestPort", ""),
          Direction = column_ifexists("Direction", ""),
          Application = column_ifexists("Application", ""),
          DestAddress = column_ifexists("DestAddress", ""),
          SourceAddress = column_ifexists("SourceAddress", ""),
          SourcePort = column_ifexists("SourcePort", "")
    // Look for inbound connections from endpoints on port 80
    | where DestPort == 80 and Direction == '%%14592' and Application == 'System'
    | where DestAddress !in ('::1','0:0:0:0:0:0:0:1')
)
on $left.Computer == $right.Computer
| project TimeGenerated, Computer, ClaimsName, SourceAddress, SourcePort
| extend HostName = tostring(split(Computer, ".")[0]), DomainIndex = toint(indexof(Computer, '.'))
| extend HostNameDomain = iff(DomainIndex != -1, substring(Computer, DomainIndex + 1), Computer)
| extend AccountName = tostring(split(ClaimsName, @'\')[1]), AccountNTDomain = tostring(split(ClaimsName, @'\')[0])

The logic here is trying to find inbound connections (Direction == '%%14592') to the AD FS server (Computer in~ (ADFS_Servers)) over a specific port. But the analyst who wrote this rule added a crucial line:

| where DestAddress !in ('::1','0:0:0:0:0:0:0:1')

Why is this so important? The IP addresses ::1 (IPv6) and 0:0:0:0:0:0:0:1 (an unusual representation of IPv6 localhost) are loopback addresses. They represent the machine talking to itself. Many legitimate system processes and health checks will generate network traffic on localhost. These events are pure noise. They are expected, benign, and completely uninteresting from a threat hunting perspective.

Without this line, the detection rule would fire constantly on the AD FS server's own internal health checks, creating a storm of false positives. By using !in to explicitly exclude these localhost addresses, the analyst carves away all that noise. The rule is now surgically focused on what it was meant to find: connections originating from other machines on the network, which is exactly the attacker behaviour it's designed to detect.

This is a perfect example of using negation not just to filter business logic, but to clean up the inherent "chattiness" of low-level system logs.

Mastering Lists with has_any and in

As your skills grow, you'll stop hunting for single items and start hunting for entire categories of evil. KQL has amazing operators for working with lists of items, and they are far more efficient than writing a dozen painful or statements.

The Scenario: You want to run a hunt for a whole family of known malicious hacking tools and reconnaissance commands.

The Rookie Way (The Long, Painful Way):

DeviceProcessEvents
| where ProcessCommandLine has "mimikatz" or ProcessCommandLine has "procdump" or ProcessCommandLine has "psexec" or ProcessCommandLine has "adfind" or ProcessCommandLine has "whoami"

This is painful to write, painful to read, and a nightmare to update when you want to add a new tool to your list.

The Professional Way (The has_any Operator):
First, we define our list of badness using the dynamic keyword. Then, we use has_any to check against it.

let knownHackingTools = dynamic(['mimikatz', 'procdump', 'psexec', 'adfind', 'whoami', 'net group']);
DeviceProcessEvents
| where TimeGenerated > ago(7d)
| where ProcessCommandLine has_any (knownHackingTools)

Look how clean and manageable that is! has_any will check if the ProcessCommandLine contains any of the whole terms from our list. We can easily add or remove items from the knownHackingTools list without rewriting the entire query.

The Other List Operator: in
So what's in for? in is like a case-sensitive == for lists. It looks for an exact, case-sensitive match against any item in the list. It's perfect for things with a fixed format, like Event IDs.

The Scenario: Your incident response plan requires you to investigate all successful logons, failed logons, process creation events, and account lockouts.

let criticalEventIDs = dynamic([4624, 4625, 4688, 4740]);
SecurityEvent
| where EventID in (criticalEventIDs)

This is the standard, most efficient way to filter for a specific set of Event IDs. And, of course, its counterpart !in is the absolute best way to work with exclusion lists, like filtering out your corporate IP ranges from a Watchlist. Don’t forget about in~, which is the case-insensitive version of in perfect for lists of usernames or process names where case might vary.

Mastering these matching operators is a fundamental step. It's the difference between fumbling in the dark and operating with the precision of a surgeon. Your homework is to find a query you wrote last week, maybe in an analytics rule or a hunting query, and refactor it. Can you replace a slow contains with a fast has? Can you swap a long or statement with a clean has_any list? Can you make a == more robust by changing it to =~?

This is how you level up. This is how you go from just writing queries to engineering detections.

Class dismissed.

Consent Preferences