
Sentinel Data Lake is getting “operational”: how to choose KQL vs Async vs Jobs vs Notebooks (and why a small schema change can break big automations)

Over the last week, Microsoft’s Sentinel content has been unusually practical: less “vision deck,” more “here’s how to actually run investigations at scale without timeouts and without bankrupting your Analytics tier.” Two official posts focus on the Sentinel data lake and how you should think about queries and compute (KQL modes and notebooks). In parallel, we have a near-term footgun: an Advanced Hunting schema change landing Feb 25 that can quietly break downstream scripts and SOAR playbooks if you treat booleans as integers.

If you’re running Sentinel alongside Defender XDR (which is increasingly the default posture), these updates matter because they affect the two places teams most commonly feel pain:

  • Long lookback hunting (weeks/months) without repeatedly timing out or copy/pasting “temporary” queries into permanent rules.
  • Operationalizing discoveries (turning a hunt into a detection, dashboard, or automation) without over-hydrating raw telemetry into expensive hot analytics storage.

1) Sentinel Data Lake: stop treating retention as “cold storage you can’t really use”

The Sentinel data lake story is shifting from “cheap place to keep data” to “a tier you can meaningfully compute on.” Microsoft’s notebook post is the strongest articulation so far of why: the data lake becomes the place where you can do heavier analysis — Spark-based transformations, baselining, ML feature engineering, enrichment pipelines — while keeping the Analytics tier focused on real-time detection and day-to-day SOC operations.

What’s new (in practice): native integration with Jupyter Notebooks (Spark runtime) so analysts and engineers can use Python + Spark directly against lake data. The post calls out common SOC patterns that are awkward in “KQL-only, analytics-tier-only” approaches:

  • Extended investigations beyond 90 days (slow-moving campaigns, persistence, repeated low-frequency auth failures).
  • Behavioral baselining with time-series modeling and more flexible statistics than most teams implement in pure KQL.
  • Retrospective hunting when a new IOC drops and you need to look back across months quickly.
  • ML-powered enrichment that produces curated outputs your detection rules can later consume.
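To make the baselining pattern concrete, here is a deliberately minimal sketch of the core logic in plain Python. In a lake notebook the same computation would typically run as a PySpark aggregation over lake tables; the data shape, field names, and z-score threshold below are illustrative assumptions, not real Sentinel schema.

```python
from statistics import mean, stdev

# Hypothetical per-user daily auth-failure counts, e.g. aggregated from
# months of lake data. Names and values are invented for illustration.
history = {
    "alice": [2, 1, 0, 3, 2, 1, 2, 0, 1, 2],
    "svc-backup": [0] * 19 + [41],
}

def anomalous_days(counts, z=3.0):
    """Flag day indexes where the count exceeds mean + z * stdev of the series."""
    if len(counts) < 2:
        return []
    mu, sigma = mean(counts), stdev(counts)
    threshold = mu + z * max(sigma, 1e-9)  # avoid a zero threshold on flat series
    return [i for i, c in enumerate(counts) if c > threshold]

for user, counts in history.items():
    flagged = anomalous_days(counts)
    if flagged:
        print(user, "anomalous day indexes:", flagged)
```

The point is less the statistics than the workflow: this kind of per-entity, long-window baselining is exactly what is awkward in pure KQL on the hot tier and natural in a notebook over lake data.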

My take: even if you don’t have “data science” people in your SOC, notebooks matter because they give detection engineering a better place to do the messy part — transforming raw, high-volume logs into a stable, queryable signal that you can then operationalize. If your teams currently “solve” this by exporting logs to an external data platform, notebooks reduce friction and (often) governance complexity.

2) Three ways to run KQL on the lake: Interactive vs Async vs Jobs

The companion post on KQL execution modes is basically a field guide to avoiding the two most common SOC anti-patterns:

  1. Trying to do month-long lookbacks in interactive queries until you hit timeouts and rage-quit.
  2. Hydrating everything into Analytics tier “just so we can query it,” then discovering the bill.

Microsoft breaks KQL-on-lake into three modes:

Interactive queries

  • Best for: ad-hoc triage, hypothesis checks, small/moderate datasets.
  • Reality: you want answers in minutes; you accept that results don’t persist beyond the session.
  • Rule of thumb: if an analyst must wait ~7–8 minutes and it times out, you picked the wrong mode.

Async queries

  • Best for: larger datasets / longer lookbacks during an active incident where you’ll revisit results repeatedly.
  • Key behavior: runs up to ~1 hour; results cached (hot cache) for ~24 hours so you don’t have to re-run the expensive query while pivoting.
  • Why practitioners should care: this is the “incident war room” mode. One analyst runs the big query; the team reuses the output while making decisions.

KQL Jobs

  • Best for: operationalization. Turning repeated queries into scheduled jobs that hydrate outputs into the Analytics tier so detections/dashboards can use them reliably.
  • Typical use cases: IoC matching, summarization, anomaly scoring tables, data quality checks.
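For flavor, the “IoC matching” job pattern reduces to joining observed indicators against a feed and persisting the hits for detections to consume. A minimal Python sketch of that core step (all indicator values, field names, and table shapes here are invented for illustration, not the job API itself):

```python
# Sketch of the matching step a scheduled job would perform: intersect
# observed indicators with a threat feed and emit rows a detection rule
# could later query from the Analytics tier. All values are invented.
ioc_feed = {"198.51.100.7", "203.0.113.99", "evil.example.net"}

observed_events = [
    {"timestamp": "2025-02-20T10:01:00Z", "indicator": "198.51.100.7"},
    {"timestamp": "2025-02-20T10:02:00Z", "indicator": "10.0.0.5"},
    {"timestamp": "2025-02-20T10:03:00Z", "indicator": "evil.example.net"},
]

def match_iocs(events, feed):
    """Return only the events whose indicator appears in the feed."""
    return [e for e in events if e["indicator"] in feed]

hits = match_iocs(observed_events, ioc_feed)
# Each hit would then be hydrated into a curated table for rules to consume.
```

The job's value is the persistence, not the lookup: the output lands somewhere stable that analytics rules and dashboards can query without re-scanning the lake.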

My take: treat these modes like a maturity path for a signal:

  • Investigate: Interactive KQL (fast feedback)
  • Collaborate: Async KQL (shareable, revisitable)
  • Operationalize: KQL Jobs (persist output, feed detections)

Notebooks (Spark) sit alongside this as the “heavy compute / feature engineering” lane — particularly when you need joins, transformations, or ML workflows that are painful in KQL alone.

3) Analytics rules, hunting, and the Sentinel + Defender XDR reality

Most enterprise deployments are now “Sentinel + Defender XDR” whether they planned it or not. That matters because workflows span:

  • Defender XDR hunting (Advanced Hunting) for endpoint/identity/email signals
  • Sentinel detections (analytics rules) for cross-domain correlation + SOAR
  • Data lake retention for long lookbacks and cost control

The practical question is: where do you run the investigation and where do you store the output?

A solid pattern emerging from these posts looks like this:

  1. Use Defender/Sentinel detections to surface candidate signals quickly.
  2. Use Async KQL (lake) when an incident demands long lookbacks but you don’t want to permanently hydrate data.
  3. Use Notebooks when you’re creating a new signal class (baseline, scoring model, enrichment pipeline).
  4. Promote the stabilized output into Analytics tier via KQL Jobs, then build analytics rules that consume it.

This keeps your “hot” tier focused on what must be fast and always-available, while the lake handles the big, messy history.

4) Don’t get bitten: Feb 25 boolean schema change for Advanced Hunting exports

The SOC Automators note is short but important: on Feb 25, boolean fields in Advanced Hunting exports change from numeric (1/0) to textual (True/False). The key nuance: this is about downstream consumers of exported results. Your KQL and custom detection rules are expected to handle it automatically, but your scripts/playbooks might not.

What to do this week:

  • Inventory anything that consumes Advanced Hunting results outside of Microsoft tooling (Logic Apps parsing, Functions, custom Python/PowerShell, SIEM forwarders, ticketing bridges).
  • Search for comparisons like == 1 / == 0 against boolean-ish fields.
  • Update parsers to accept both representations (ideally defensively: treat 1, true, True as truthy; same for falsy).
  • Add a regression test that fails loudly if schema output changes again.
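The defensive parsing above can be a few lines. This sketch (the function name is mine, not from any Microsoft SDK) accepts both the old numeric and new textual forms, and fails loudly on anything else, which doubles as the regression tripwire:

```python
# Defensive boolean normalization for consumers of Advanced Hunting exports.
# Accepts both the old numeric (1/0) and new textual (True/False) forms,
# plus lowercase variants, so the Feb 25 change becomes a non-event.
TRUTHY = {"1", "true"}
FALSY = {"0", "false"}

def parse_ah_bool(value):
    """Normalize an exported boolean-ish field to a Python bool."""
    if isinstance(value, bool):
        return value
    text = str(value).strip().lower()
    if text in TRUTHY:
        return True
    if text in FALSY:
        return False
    # Fail loudly instead of silently misclassifying: if the schema
    # shifts again, you want an error, not a wrong routing decision.
    raise ValueError(f"Unexpected boolean representation: {value!r}")
```

The same idea ports directly to PowerShell or Logic Apps expressions: normalize at the boundary once, and stop comparing raw field values to 1/0 everywhere downstream.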

Link: Possible breaking change for Sentinel | Docs: Advanced Hunting schema changes

5) What I’d prioritize next (a practical mini-roadmap)

  • If you’re piloting Sentinel data lake: pick one painful investigation scenario (e.g., password spray with long dwell time) and implement it end-to-end with Async queries + a notebook-based baseline.
  • If you already have lake data: formalize a promotion pipeline: hunt → async → notebook → KQL job → analytics rule. Write it down as an internal playbook so your SOC doesn’t improvise every time.
  • If you have automations around Advanced Hunting exports: fix the boolean parsing now, before it becomes a mystery outage.
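The password-spray pilot scenario mentioned above has a simple core heuristic: one source failing sign-ins against many distinct accounts over a window. A plain-Python sketch of that logic (field names, sample IPs, and the threshold are illustrative assumptions; in practice this would run over months of lake sign-in data via async queries or a notebook):

```python
from collections import defaultdict

# Illustrative failed-sign-in events: one spraying IP hits 30 distinct
# accounts; a second IP repeatedly fails against a single account.
failed_signins = [
    {"source_ip": "203.0.113.5", "account": f"user{i}"} for i in range(30)
] + [
    {"source_ip": "192.0.2.10", "account": "alice"},
    {"source_ip": "192.0.2.10", "account": "alice"},
]

def spray_candidates(events, min_accounts=20):
    """Return source IPs that failed against at least min_accounts distinct accounts."""
    accounts_by_ip = defaultdict(set)
    for e in events:
        accounts_by_ip[e["source_ip"]].add(e["account"])
    return sorted(ip for ip, accts in accounts_by_ip.items() if len(accts) >= min_accounts)
```

Here only the spraying IP qualifies; the repeated-failure IP does not, which is the distinction that makes the long-lookback version of this hunt worth piloting end-to-end.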
Published in: Defender, M365, Sentinel
