Evaluating and Onboarding a Microsoft MSSP (Without the Chaos)
All right, class.
If you read lesson one, you already know how to define your scope, lock down your log retention, and write an RFP that forces vendors to be honest. You have set the trap. Now we see who falls into it.
When you get to the evaluation stage, you are not having a friendly chat over coffee. You are conducting a job interview. You are about to hire strangers to sit inside your core security boundary and manage your most sensitive data.
Here is exactly what you need to ask them, how to test them, and how to survive the onboarding process without breaking your production environment.
Evaluating Providers Like You Are Hiring a Team (Because You Are)
When you get to actual conversations, treat them like job interviews. You are choosing people who will sit inside your Sentinel and Defender estate, read your logs, touch your incidents and make decisions about your security posture. This is not a catering contract.
Also, do your own due diligence outside the formal process. Nobody tells you this, but check Glassdoor and similar sites where current and former staff talk about the company. If you see two stars with multiple people saying analysts leave constantly because management is chaotic, expect that instability to land directly on your service. High SOC analyst turnover means the person handling your critical incidents has been there for three weeks, has no context about your environment, and is already interviewing elsewhere.
A provider can have the shiniest sales deck in the industry. If the people doing the actual work are miserable and leaving, your service quality will reflect that.
Interrogating Providers Like You Mean It
Sales decks are useless. Every MSSP says they have highly trained analysts, cutting edge threat intelligence, and seamless integration. Your job is to poke holes in the marketing until you find the actual technical capability underneath.
Why should we choose you?
You are giving them an open goal to differentiate themselves. If the best they can offer is "we have experience" and "we are trusted by many leading organisations," think very hard about whether there is anything behind the marketing. A strong provider will give you something concrete (detection engineering, custom rules, cost analytics, for example). A weak one will give you a paragraph that could apply to any company on the planet.
What experience and credentials do your analysts have?
This is a loaded question and one of the most revealing you can ask. You are not just checking CVs. You are finding out whether this company invests in its people or treats analysts as disposable shift workers.
Do their analysts actually write KQL? Do they build Sentinel content or just deploy templates? Does the company pay for training and expect staff to gain Microsoft security certifications, or is professional development something analysts do in their own time if they feel like it?
The way they answer this tells you how their SOC actually runs. High turnover, no training budget and no career progression means the person investigating your critical incident at 2 am is someone who started three weeks ago and is already looking for another job.
What threat intelligence sources do you use?
You are trying to work out if they just consume the free Microsoft Threat Intelligence feed and call it a day, or if they have an actual TI programme.
If they say they rely entirely on what Microsoft provides, ask what they add themselves. Do they push IOCs discovered during investigations back into Sentinel Threat Intelligence or watchlists? Do they enrich incidents with context from their own sources? Or are they passengers, waiting for Microsoft to do the work for them? A provider with a mature TI function will be able to explain their process without hesitation. One that does not have a programme will talk around the question.
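If you want to pressure test the answer, ask them to demonstrate their indicators actually landing in your workspace during a proof of concept. As a rough illustration, here is a minimal sketch (using the standard ThreatIntelligenceIndicator schema; the lookback windows are arbitrary placeholders) of matching provider supplied IP indicators against sign-in activity:

```kusto
// Minimal sketch: match active IP indicators against recent sign-ins.
// Lookback windows are illustrative, not a recommendation.
let indicators = ThreatIntelligenceIndicator
    | where TimeGenerated > ago(14d)
    | where Active == true and isnotempty(NetworkIP)
    | distinct NetworkIP;
SigninLogs
| where TimeGenerated > ago(1d)
| where IPAddress in (indicators)
| project TimeGenerated, UserPrincipalName, IPAddress, AppDisplayName, ResultType
```

If their analysts cannot sketch something like this on a whiteboard, the TI programme is a feed subscription, not a capability.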
Explain how you perform alert tuning
This question is gold. It exposes how detection engineering actually works inside their operation. If they say they close noisy alerts or tune things when they get annoying, they are amateurs.
You want them to talk about modifying the KQL in analytics rules. You want them talking about using Sentinel watchlists to build dynamic exclusion lists. Ask how often they review false positives. Ask how they record tuning decisions. Ask how many new analytics rules they typically create for a client over a year. Ask how they track detection coverage and identify gaps. Ask whether they map analytics to a framework or just react to noise.
If they only rely on the built in Microsoft templates and never write their own KQL, they add zero extra value to your deployment. If the answer is "we tune things when they get noisy," that is not a mature practice. That is firefighting dressed up as a service.
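To make the distinction concrete, here is a minimal sketch of the watchlist pattern (the watchlist name and the example detection are hypothetical). Instead of hardcoding exclusions into the rule, the rule reads them from a watchlist the analysts maintain:

```kusto
// Sketch: dynamic exclusions via a watchlist rather than hardcoded hostnames.
// Assumes a watchlist named "TuningExclusions" keyed on hostname.
let exclusions = _GetWatchlist('TuningExclusions') | project SearchKey;
SecurityEvent
| where EventID == 4625  // failed logons, as an example detection
| where Computer !in (exclusions)
| summarize FailedLogons = count() by Computer, Account, bin(TimeGenerated, 1h)
```

Tuning then becomes a watchlist update, not a rule redeployment, and every exclusion is visible and auditable.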
How do you reduce log noise and optimise ingestion costs?
This is important, and it is where your Sentinel bill either stays reasonable or spirals into something that makes finance send angry emails every month. Azure consumption billing is ruthless. If your MSSP is not actively protecting your wallet, they are a liability.
A competent MSSP should have a clear methodology for this. Not just "we tune alerts." That is the output. You want to understand the process.
On the noise reduction side, ask them exactly how they reduce log noise before it hits your Sentinel workspace. If they are not using things like XPath queries to filter out useless Windows Event noise, or dropping junk network traffic at the KQL level, they are intentionally inflating your Azure bill because it is easier for them. The same goes for the Data Lake: are they telling you up front that certain logs should be ingested to the cheaper tier? (They should.) Do they review which Sentinel analytics rules generate the most false positives and systematically tune or rewrite them?
On cost optimisation, ask how they decide what goes into the Analytics tier versus the Data Lake. Ask if they monitor your commitment tier and flag when you are consistently over or under. Ask whether they have ever recommended to a client that a data source should be disconnected because the security value did not justify the ingestion cost.
If they look at you blankly when you ask about data collection rules, transformation pipelines or Data Lake, they are not managing your Sentinel costs. They are just letting the meter run.
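As a point of reference, ingestion-time filtering lives in data collection rule transformations, which are themselves written in KQL against a virtual `source` table. A hedged sketch (the event IDs are illustrative noise candidates, not a tuning recommendation):

```kusto
// DCR transformKql sketch: drop noisy Windows events before they are
// ingested and billed. Event IDs here are illustrative only.
source
| where EventID !in (4634, 4647, 5156)
```

A provider who genuinely manages ingestion costs will be able to show you transformations like this already running for other clients.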
How do you collect logs from my environment?
This tells you what level of access they expect and what plumbing they will build. In principle, you want log collection through standard Microsoft patterns with least privilege access.
Not a provider who demands Contributor or Owner across every Azure subscription because it is easier for them. I have personally seen MSSPs with read/write permissions over an entire customer Azure environment because someone set it up that way during onboarding and nobody ever reviewed it. That is not a partnership. That is negligence.
Push for Azure Lighthouse based access with tightly scoped roles. If they push back on least privilege, that tells you everything about how they think about security, which is ironic given what you are paying them to do.
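Once you are live, you can verify the access story yourself. A minimal sketch for watching Lighthouse delegations being created or changed, assuming the Azure Activity connector feeds your workspace:

```kusto
// Sketch: surface Lighthouse delegation changes from the Activity log.
AzureActivity
| where OperationNameValue has "Microsoft.ManagedServices/registrationAssignments"
| project TimeGenerated, OperationNameValue, Caller, ActivityStatusValue
| order by TimeGenerated desc
```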
Walk me through your automation and SOAR capability
If an MSSP cannot articulate how they use automation in a Microsoft SOC, they are running a manual operation, and manual operations do not scale. A provider that does not leverage Sentinel automation rules and Logic Apps playbooks (or any other form of automation) is waving a red flag.
A room full of junior analysts manually clicking through Entra ID brute force alerts is not a modern SOC. It means all of the work is being dumped on the analysts, which is a recipe for burnout. Do not expect thorough checks of your alerts if that is the case.
In the Microsoft ecosystem, automation lives in a few places. Sentinel automation rules handle basic actions like changing incident severity, assigning incidents to owners, or running a playbook when specific conditions are met. Logic Apps playbooks do the heavier work, things like enriching an incident with user details from Entra ID, checking an IP against threat intelligence (some of those can also be achieved using KQL within the analytic rule itself), isolating a device through Defender for Endpoint, posting to a Teams channel, or creating a ticket in ServiceNow.
Ask them to describe the automations that would run in your environment. Before a human analyst even looks at an incident, a Logic App or a KQL enrichment should already have queried the user's manager, checked their recent MFA locations, and appended that data to the incident comments. If they do manual enrichment, they are wasting the time you are paying for.
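Much of that enrichment can live directly in the analytics rule query. A hedged sketch of the sign-in context piece (the account is a placeholder; a real rule would join on the entities the alert produced, and the manager lookup needs a Graph call from a Logic App):

```kusto
// Sketch: pull recent sign-in context for a flagged account so the
// analyst sees locations and apps without manual digging.
let suspect = "user@contoso.com";  // placeholder UPN
SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName =~ suspect
| summarize SignIns = count(),
            Locations = make_set(Location),
            Apps = make_set(AppDisplayName)
    by UserPrincipalName
```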
Ask whether they maintain and update playbooks over time or just deploy them at the start and forget about them. Logic Apps break. Connectors expire. APIs change. A playbook that worked six months ago might be silently failing today if nobody is watching it. Ask how they monitor playbook health and how quickly they fix failures.
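Playbook health itself is queryable. If Logic Apps diagnostics are routed into the workspace, something along these lines (a sketch, assuming the standard AzureDiagnostics schema for Logic Apps) will surface silent failures:

```kusto
// Sketch: count failed Logic App runs per playbook per day.
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.LOGIC"
| where Category == "WorkflowRuntime"
| where status_s == "Failed"
| summarize Failures = count() by Resource, bin(TimeGenerated, 1d)
| order by Failures desc
```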
The Dead Log Source Test
You need to know how they handle reality when things break. Ask them what happens if a critical log source stops reporting to Sentinel.
Do they have a KQL heartbeat query watching the Syslog table for your firewall? Ask if they will actually call you when your Azure Firewall stops dropping logs into your workspace, or if they will wait until your monthly service review to mention it.
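The heartbeat itself is a few lines of KQL. A minimal sketch (the hostname and the one hour threshold are placeholders; the right threshold depends on how chatty the source normally is):

```kusto
// Sketch: fire when the firewall goes quiet in the Syslog table.
Syslog
| where Computer == "fw-edge-01"  // hypothetical firewall hostname
| summarize LastSeen = max(TimeGenerated)
| where LastSeen < ago(1h)
```

Scheduled as an analytics rule, this turns a dead log source into an incident instead of a surprise at the monthly service review.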
Show me your detection coverage map
You already know you should require this in the RFP, but the evaluation conversation is where you push harder.
Ask them to show you an actual MITRE ATT&CK coverage map from a current client engagement, anonymised obviously. You want to see which techniques they have analytics rules for, which data sources feed those detections, and how many gaps exist. If they say they align with MITRE but cannot produce a visual map, they are paying lip service to the framework without doing the work.
Ask how often the map gets updated. Ask what happens when a new technique is added to ATT&CK or when Microsoft adds a new data connector that enables detection of something previously invisible. Ask whether findings from threat hunts feed back into the coverage map as new detections.
A mature provider will walk you through this confidently and explain the trade offs. Some techniques are expensive to detect because they require endpoint telemetry you might not be collecting. Some are nearly impossible to detect reliably without producing mountains of false positives. A provider who admits these limitations is more trustworthy than one who claims they cover everything.
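You can also sanity check their map against reality. This sketch does not measure true coverage, only which tactics have actually fired alerts, but it is a useful cross reference against the map they show you:

```kusto
// Sketch: tactics observed in fired alerts over the last 90 days.
SecurityAlert
| where TimeGenerated > ago(90d)
| mv-expand Tactic = split(Tactics, ",")
| summarize FiringRules = dcount(AlertName), Alerts = count() by tostring(Tactic)
| order by Alerts desc
```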
Do you use an official framework like MITRE ATT&CK?
This ties directly into the coverage mapping, but the question is broader. You are checking whether the framework is genuinely embedded in how they work or just mentioned in their marketing.
Do they classify incidents by ATT&CK technique? Do their hunt reports reference specific tactics and techniques? Do they use the framework to prioritise which detections to build next? If your internal team also uses ATT&CK, shared language makes everything smoother. If they do not use any framework at all, ask how they measure and communicate coverage. If the answer is vague, expect their reporting to be equally vague.
Describe your threat hunting programme
This is different from "do you do threat hunting" which will always get a yes. You want the detail.
How many dedicated hunting hours do they allocate per client per month? Is it the same analysts who triage alerts all day or do they have specialists? What does a hunt look like in practice? Do they start with a hypothesis, run queries, document findings and produce a report? Or do they occasionally poke around in the data when things are quiet and call it hunting?

Ask for a sample hunt report from a previous engagement. A real one will show a clear hypothesis, the KQL queries or data analysis used, what was found or not found, and whether the hunt resulted in new detections being created. A fake one will be a page of waffle with no technical substance.
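For comparison, a real hunt query tends to look something like this hedged sketch: a hypothesis (encoded PowerShell launched by unusual parent processes), a query, and a rarity angle that gives the analyst something concrete to investigate:

```kusto
// Sketch hunt: encoded PowerShell, stacked by parent process so the
// rare parents float to the top for review.
DeviceProcessEvents
| where TimeGenerated > ago(30d)
| where FileName =~ "powershell.exe"
| where ProcessCommandLine contains "-enc"  // catches -enc and -EncodedCommand
| summarize Executions = count(),
            Devices = dcount(DeviceName)
    by InitiatingProcessFileName
| order by Executions asc
```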
The rest of the list
These are not throwaway questions. Every one of them reveals something.
- How long have you been providing managed security services?
- What services do you actually deliver, broken down in detail?
- Can we run a one month proof of concept before committing?
- Can you provide client references?
- Can you share your operational metrics, such as initial response time and mean time to resolve?
- Does your SOC hold any compliance certifications?
- Have you suffered any breaches in recent years, and if so, how were they handled?
- Describe how your analysts investigate and escalate incidents in Sentinel.
- Can you design and build Sentinel for us or is it something we need to do?
- What happens if an incident fires at 3 am on Christmas Day?
- How do your analysts cope with limited knowledge of a customer's environment during an active investigation?
- Can you provide sample escalation messages and incident reports so we can see the quality of your communication?
- Where does our data reside, and can we choose the Azure region? (If you need data in UK South or West Europe for compliance, that needs to be confirmed before you sign anything, not discovered after deployment).
Onboarding Without the Slow Burn Disaster
After selection, onboarding is where you either set this up for success or trigger a slow burn disaster. Do not let them rush this just so they can start billing you.
Define key contacts on both sides
On your side, this is usually the SOC lead, security manager, Sentinel engineer and platform owner. On their side, the service delivery manager, SOC manager and technical lead. You want actual names and a direct phone line to the SOC team. If your critical incident escalation path amounts to "email the shared inbox and we will get back to you within 7 working days", you know something is wrong.
Verify compliance requirements are genuinely met
Check that monitoring hours, retention periods, encryption standards and data residency settings actually match your obligations. Not what the provider said in the proposal. What is actually configured.
If your policy mandates 24/7/365 monitoring but the MSSP only staffs Monday to Friday with an on call pager at weekends, you do not have 24/7 monitoring. You have a weekday service with a hope and prayer bolted on. Fix that now, not after a weekend breach.
Provide asset lists and documentation
The more context you hand the MSSP, the less time they waste guessing. Network diagrams. Azure subscription layouts. Entra ID tenant structure. Previous SIEM rules that still matter. Lists of high risk applications. Known gaps in logging. Anything that helps them understand your environment faster. Give them what they need to do the job. A good MSSP will arrive with their own list of questions designed to build visibility into and understanding of your environment.
Identify your crown jewels
Decide which assets matter most and tell the MSSP to prioritise visibility and detection tuning around them. Databases holding client data. Identity infrastructure. Payment processing systems. Executive mailboxes. Whatever represents the highest impact if compromised.
Deploy and configure Sentinel and supporting tools
Whether that means building Sentinel from scratch or cleaning up an existing deployment, this is where data connectors get wired up, workspaces get configured, Defender XDR integration is confirmed, retention policies are applied, and initial analytics rules and automation playbooks are deployed.
This is also where ingestion cost controls should be established from day one. Data collection rules configured to filter noise at the source. Transformation rules stripping unnecessary fields before data hits the workspace. Tables assigned to the correct tier based on query frequency and compliance need. If the provider deploys everything into the Analytics tier by default because it is easier, your first monthly bill will explain why that was a mistake.
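A quick way to keep everyone honest from day one is the Usage table. This sketch shows billable volume per table over the last month, which is the starting point for every tiering conversation:

```kusto
// Sketch: billable ingestion per table over the last 30 days, in GB.
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| summarize TotalGB = round(sum(Quantity) / 1024, 2) by DataType
| order by TotalGB desc
```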
Test alert criteria
Trigger test events or synthetic alerts and verify that Sentinel incidents are created correctly, routed to the MSSP, and handled within the agreed SLA times. Do not assume it works because someone said it does. Prove it. If a test critical incident takes forty five minutes to get a response when the SLA says twenty, you have found the problem before it matters. Fix it now.
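You can measure this instead of arguing about it. A hedged sketch of time to first response, assuming the MSSP moves incidents out of "New" when triage begins (SecurityIncident logs a row per modification, which is what makes this measurable):

```kusto
// Sketch: minutes from incident creation to first status change.
SecurityIncident
| summarize Created = min(CreatedTime),
            FirstTouch = minif(LastModifiedTime, Status != "New")
    by IncidentNumber, Title
| extend ResponseMinutes = datetime_diff("minute", FirstTouch, Created)
| order by ResponseMinutes desc
```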
Test escalation protocols
Do not wait for a real breach to find out if their phone system works. Trigger a synthetic alert. Run a simulated credential dumping script on a test server to intentionally trigger Microsoft Defender for Endpoint. Watch exactly how long it takes for the incident to appear in Sentinel. Confirm who gets called, through what channel, how quickly, and what information they receive, and read what the ticket looks like. If the escalation path has holes, find them during testing, not during an actual breach.
Test tool integrations
Validate that everything connected to the workflow actually functions. Ticketing system integration. Is the data from that third party application you asked about actually flowing? Can you see the workbooks in Sentinel? Can you access Sentinel at all?
Define reporting cadence and meeting schedule
Agree how often you meet and what you see. Monthly metrics reports. Quarterly strategy reviews covering detection coverage, tuning activity, hunt outcomes, automation effectiveness and recommendations. Make this part of the contract. If reporting is informal and optional, it will stop within two months.
Provide staff training
Your own people still need to understand what Sentinel and Defender are doing, how to read incidents, how to query the data, and critically, how to challenge the MSSP when something looks wrong. If your internal team cannot question the provider's work, you have outsourced accountability, not just operations. You can often ask the provider to run a few hours of training covering how Sentinel and Defender are set up and how to navigate them. Have someone review incidents from time to time to make sure they are up to your standard.
The Final Pre-Flight Checklist
Before you sign any contract, verify these exact items are explicitly written into the agreement. Verbal promises from a sales engineer do not count.
- Access is strictly via Azure Lighthouse, delegated with least privilege role assignments. Nobody gets permanent standing access to your tenant. No local admin accounts.
- Workspace design and retention tiers are defined. The statement of work must explicitly define what data goes into the hot Analytics tier and what drops to the Data Lake, so you do not get a surprise Azure bill in month two.
- Proactive hunting is measured. The frequency of threat hunting exercises and the creation of custom KQL detections must be written into the contract as measurable deliverables.
- You own all the intellectual property. This is the big one. If you decide to fire them in two years, every custom KQL analytics rule, Logic App playbook and workbook they built in your Sentinel workspace should remain yours. There will always be pushback on intellectual property, so find a workable middle ground (for example, all rules and Logic Apps tailored for your environment stay in your environment).
Before you send that first email to a provider, use this to double check everything across both parts:
- Have you documented exactly what services you need, from SIEM design through to ongoing monitoring, response, hunting and automation?
- Have you defined your log retention requirements, including which tables sit in Analytics, Basic or Archive tiers?
- Have you mapped your compliance obligations, including client contractual requirements, not just regulatory ones?
- Have you agreed internally on SLA targets you are willing to fight for during negotiation?
- Have you written the RFP with enough technical detail that a competent provider can give you a realistic, priced proposal?
- Have you included a requirement for MITRE ATT&CK detection coverage mapping?
- Have you defined what proactive threat hunting looks like, including cadence, deliverables and how findings feed back into detection engineering?
- Have you included questions about log noise reduction and ingestion cost optimisation in your evaluation criteria?
- Have you prepared your evaluation questions and decided what good answers look like before you hear any answers?
- Have you checked the provider's reputation from the employee perspective, not just the sales perspective?
- Have you checked what kind of reporting you will be getting? Is there anything custom you would like to see, and can they provide it at a later date if needed?
If you walk through all of this and a provider still looks good, you probably have someone worth working with. If they stumble on half of it, you just saved your company from a very expensive mistake.

