Thermal9
A real-time thermal admission engine for GPU data centers. Millisecond-scale ALLOW/DENY decisions (about 1.2 ms median) on every workload placement, each with a full decision proof. Production-ready, deterministic, fail-closed.
Thermal9 is an HTTP service that sits inline with a customer's scheduler (Slurm, Kubernetes, or a neocloud stack). Before each workload placement, the scheduler asks Thermal9, "Is this thermally safe?" The engine returns ALLOW or DENY with a structured decision proof, in under 1.2 milliseconds at the median. On ALLOW, the workload runs; on DENY, the scheduler tries another node. There is no machine learning and no probabilistic safety call: the same telemetry, the same calibrated profile, and the same candidate always yield the same decision. Deterministic, auditable, fail-closed.
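From the scheduler's side, the ALLOW/DENY contract reduces to a short loop over candidate nodes. The sketch below assumes a hypothetical `evaluate(workload, node)` callable that wraps the engine call and returns the decision string; it treats any error, timeout, or unexpected value as DENY, which is one way a scheduler can preserve fail-closed behavior on its side of the call.

```python
from typing import Callable, Iterable, Optional

# Hypothetical engine wrapper: evaluate(workload, node) -> "ALLOW" or "DENY".
Evaluate = Callable[[dict, str], str]


def choose_node(workload: dict, candidates: Iterable[str], evaluate: Evaluate) -> Optional[str]:
    """Return the first candidate the engine ALLOWs, or None to defer the workload."""
    for node in candidates:
        try:
            decision = evaluate(workload, node)
        except Exception:
            # Fail closed: a timeout, crash, or transport error counts as DENY.
            continue
        if decision == "ALLOW":
            return node
        # DENY, or any value other than an explicit ALLOW, moves on to the next node.
    return None


# Illustrative stand-in for the engine (not real Thermal9 behavior).
def fake_evaluate(workload: dict, node: str) -> str:
    if node == "node-a":
        return "DENY"
    if node == "node-b":
        raise TimeoutError("engine unreachable")
    return "ALLOW"


print(choose_node({"gpus": 8}, ["node-a", "node-b", "node-c"], fake_evaluate))  # node-c
```

Returning `None` rather than forcing a placement mirrors the "denied (deferred)" outcome: the workload waits instead of landing on an unverified node.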
Numbers, with a way to check them.
Every figure below is reproducible against a public reference. The mechanism is proprietary; the results are not.
1,000 sequential evaluate calls. Single-threaded. Commodity x86.
Measured against a 32-node DGX H100 SuperPOD reference profile on a commodity x86 desktop. Each request runs the full engine: 7 admission gates, headroom math, and decision proof generation.
- min · 917 μs (fastest call)
- p50 · 1,174 μs (median)
- p90 · 2,138 μs
- p95 · 2,323 μs
- p99 · 3,527 μs
- mean · 1,368 μs
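Summaries like the one above are computed from the raw per-call latencies. The percentile method behind these figures is not stated; the Python sketch below assumes the nearest-rank definition and is shown on synthetic samples, not on the measured data.

```python
import statistics
from typing import Dict, Sequence


def nearest_rank(sorted_samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    n = len(sorted_samples)
    rank = max(1, (pct * n + 99) // 100)  # integer ceil(pct * n / 100)
    return sorted_samples[rank - 1]


def summarize(latencies_us: Sequence[float]) -> Dict[str, float]:
    """min / p50 / p90 / p95 / p99 / mean over per-call latencies in microseconds."""
    s = sorted(latencies_us)
    return {
        "min": s[0],
        "p50": nearest_rank(s, 50),
        "p90": nearest_rank(s, 90),
        "p95": nearest_rank(s, 95),
        "p99": nearest_rank(s, 99),
        "mean": statistics.fmean(s),
    }


# Synthetic samples 1..100 us, for illustration only.
print(summarize(list(range(1, 101))))
```

With 1,000 samples, p99 under this definition is the 990th-smallest call.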
A typical scheduling decision has a 50 ms end-to-end latency budget. The measured p99 of 3.5 ms uses 7% of that budget, leaving more than 46 ms for the network round trip and the customer's own scheduler logic. At these latencies, the admission call is unlikely to be the bottleneck in a production GPU scheduler.
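The budget arithmetic can be checked directly. A small Python helper, assuming the 50 ms budget stated above, that returns the share of the budget a given latency consumes and the headroom left over:

```python
from typing import Tuple


def budget_share(latency_us: float, budget_ms: float = 50.0) -> Tuple[float, float]:
    """Return (% of budget consumed, remaining headroom in ms)."""
    budget_us = budget_ms * 1000
    return 100 * latency_us / budget_us, (budget_us - latency_us) / 1000


share, headroom = budget_share(3527)  # measured p99
print(f"{share:.2f}% used, {headroom:.2f} ms left")  # 7.05% used, 46.47 ms left
```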
Same workload trace. Same SuperPOD. Played twice.
A 1-hour workload trace on a 256-GPU DGX H100 SuperPOD configuration (32 systems, 4 racks, liquid cooling per NVIDIA reference architecture), replayed under industry-baseline scheduling and again under Thermal9 admission control:
- Workloads admitted · baseline: 52, Thermal9: 40
- Workloads denied (deferred) · baseline: 0, Thermal9: 12
- Thermal limit violations · baseline: 1,104, Thermal9: 0
- Throttle events · baseline: 6, Thermal9: 0
- Total thermal events prevented · 1,110 (100% reduction)
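The prevented-events row is the baseline's violations plus throttle events, minus the same sum under Thermal9; the admitted and denied rows should also account for the same 52 workloads in both runs. A small Python check of that arithmetic, using only the figures from the table:

```python
from typing import Dict, Tuple

# Rows that count as "thermal events" in the table above.
EVENT_KEYS = ("violations", "throttles")


def compare_runs(baseline: Dict[str, int], thermal9: Dict[str, int]) -> Tuple[int, float]:
    """Return (events prevented, % reduction) for two replays of the same trace."""
    if baseline["admitted"] + baseline["denied"] != thermal9["admitted"] + thermal9["denied"]:
        raise ValueError("runs do not cover the same set of workloads")
    base_events = sum(baseline[k] for k in EVENT_KEYS)
    t9_events = sum(thermal9[k] for k in EVENT_KEYS)
    prevented = base_events - t9_events
    if base_events == 0:
        return prevented, 0.0
    return prevented, 100.0 * prevented / base_events


baseline = {"admitted": 52, "denied": 0, "violations": 1104, "throttles": 6}
thermal9 = {"admitted": 40, "denied": 12, "violations": 0, "throttles": 0}
print(compare_runs(baseline, thermal9))  # (1110, 100.0)
```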
Validated against public NVIDIA data.
Reference profiles are derived directly from public NVIDIA datasheets and cross-checked against the independent mlco2/impact public GPU TDP dataset (MIT-licensed; 49 GPU entries that cite NVIDIA datasheets directly). Anyone can reproduce the check with the bundled kit and one CLI command.
- DGX H100 SuperPOD · profile 700 W, public 700 W — +0.0% PASS
- DGX H200 SuperPOD · profile 700 W, public 700 W — +0.0% PASS
- DGX A100 SuperPOD · profile 400 W, public 400 W — +0.0% PASS
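Each row compares the per-GPU power figure in the reference profile with the public TDP and reports the signed percentage deviation. A Python sketch of that comparison; the ±1% pass tolerance is a hypothetical placeholder, since the kit's actual threshold is not stated here.

```python
def check_profile(profile_w: float, public_w: float, tolerance_pct: float = 1.0) -> str:
    """Signed deviation of a profile TDP from the public figure, plus a PASS/FAIL verdict.

    tolerance_pct is a hypothetical threshold, not the kit's documented value.
    """
    deviation = 100.0 * (profile_w - public_w) / public_w
    verdict = "PASS" if abs(deviation) <= tolerance_pct else "FAIL"
    return f"{deviation:+.1f}% {verdict}"


# Per-GPU TDP figures from the rows above: (name, profile W, public W).
rows = [
    ("DGX H100 SuperPOD", 700, 700),
    ("DGX H200 SuperPOD", 700, 700),
    ("DGX A100 SuperPOD", 400, 400),
]
for name, profile_w, public_w in rows:
    print(f"{name} · {check_profile(profile_w, public_w)}")  # e.g. "... · +0.0% PASS"
```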
All headline numbers reproduce against an independent, industry-standard reference. No NDA required for the verification path.
NVIDIA datasheets + mlco2/impact (MIT)
One HTTP call per placement decision.
Thermal9 is an HTTP service. Your scheduler sends a POST /evaluate request containing the workload and the candidate node. The engine answers in under 1.2 ms at the median with a JSON response containing the decision (ALLOW or DENY), the gate trace, an admission score, and the minimum headroom.
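A minimal client sketch using only the Python standard library. The request body keys (`workload`, `candidate_node`), the response keys (`decision`, `gate_trace`, `admission_score`, `min_headroom`), the host name, and the timeout are hypothetical placeholders, not the documented Thermal9 schema.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class EvaluateResult:
    decision: str                      # "ALLOW" or "DENY"
    gate_trace: List[Dict[str, Any]]   # per-gate entries (shape assumed)
    admission_score: float
    min_headroom: float


def build_evaluate_request(base_url: str, workload: Dict[str, Any], node: str) -> urllib.request.Request:
    """Build a POST /evaluate request; body field names are hypothetical."""
    body = json.dumps({"workload": workload, "candidate_node": node}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/evaluate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_evaluate_response(raw: bytes) -> EvaluateResult:
    """Parse the JSON reply; reject anything that is not an explicit ALLOW or DENY."""
    doc = json.loads(raw)
    decision = doc.get("decision")
    if decision not in ("ALLOW", "DENY"):
        raise ValueError(f"unexpected decision: {decision!r}")
    return EvaluateResult(
        decision=decision,
        gate_trace=doc.get("gate_trace", []),
        admission_score=float(doc["admission_score"]),
        min_headroom=float(doc["min_headroom"]),
    )


def evaluate(base_url: str, workload: Dict[str, Any], node: str, timeout_s: float = 0.05) -> EvaluateResult:
    """One round trip; timeout_s is an assumed value, not a documented default."""
    req = build_evaluate_request(base_url, workload, node)
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return parse_evaluate_response(resp.read())
```

In a scheduler, any exception raised by `evaluate` (timeout, connection error, unexpected decision) would be treated as DENY to keep the fail-closed behavior end to end.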
- Resource footprint · ~20 MB RAM, one CPU core, no GPU required
- Throughput · about 730 sequential placements per second per core at the measured 1,368 μs mean latency
- Restart safety · profile loads in <100 ms at startup
- Network · same datacenter network as scheduler; not exposed to public internet
- Schedulers · drop-in for Slurm, Kubernetes, and neocloud stacks
- Latency budget impact · p99 of 3.5 ms uses 7% of typical 50 ms budget
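With one request in flight at a time, as in the sequential benchmark above, throughput per worker is bounded by the inverse of the mean per-call latency (Little's law with exactly one request in the system). This is an upper bound for that setup only; how the engine handles concurrent requests is not stated here.

```python
def sequential_throughput(mean_latency_us: float, workers: int = 1) -> float:
    """Requests/second when each worker has exactly one request in flight."""
    return workers * 1_000_000 / mean_latency_us


print(f"{sequential_throughput(1368):.0f} placements/s per core")  # 731 placements/s per core
```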
Who buys this.
Hyperscale data centers, neocloud GPU operators, AI training facilities, HPC centers, and on-prem AI infrastructure teams running DGX H100 / H200 / A100 SuperPODs — or comparable thermally-dense GPU clusters — that need deterministic admission control inline with their scheduler.
Engagement options
- 30-day shadow assessment (free, read-only) · capture 14–30 days of nvidia-smi or DCGM telemetry; no agents, no scheduler integration. You receive a full HTML/JSON report identifying every thermally unsafe placement during the window, plus stranded-capacity dollar figures.
- Production deployment · inline with your Slurm / Kubernetes / neocloud scheduler. Discussed under NDA after assessment review.
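The shadow assessment only needs periodic GPU telemetry. One way to collect it is nvidia-smi's CSV query mode; the chosen fields, the one-second interval, and the file name below are illustrative assumptions, not the assessment's required schema, and DCGM is an alternative source. A minimal Python parser for that output, shown on synthetic sample lines:

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, Optional

# Assumed capture command (fields and file name are illustrative):
#   nvidia-smi --query-gpu=timestamp,index,temperature.gpu,power.draw \
#              --format=csv,noheader,nounits -l 1 >> gpu_telemetry.csv


@dataclass
class Sample:
    timestamp: str
    gpu_index: int
    temp_c: Optional[float]
    power_w: Optional[float]


def _num(field: str) -> Optional[float]:
    """nvidia-smi prints bracketed values such as '[N/A]' for unavailable fields."""
    field = field.strip()
    return None if field.startswith("[") else float(field)


def parse_samples(text: str) -> Iterator[Sample]:
    """Yield one Sample per well-formed CSV row; skip blank or malformed lines."""
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 4:
            continue
        ts, idx, temp, power = row
        yield Sample(ts.strip(), int(idx), _num(temp), _num(power))


demo = "2024/05/01 12:00:00.123, 0, 54, 312.45\n2024/05/01 12:00:00.123, 1, 61, [N/A]\n"
for s in parse_samples(demo):
    print(s)
```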