CQL · CANADA QUANT LABS · EST. 2026

Canada
Quant Labs.

Canada's open-weight model lab. We train, quantize, and deploy sovereign AI on Canadian Blackwell silicon — for the regulated industries that can't run on someone else's API.

Operates on: NVIDIA DGX B300
Headquarters: Victoria, BC · Canada
Serving: Legal · Medical · Defence · Finance
I   /   THESIS

The country is investing.
No one is building the model.

Public capital is flowing through every layer of the Canadian AI stack — data centres, compute subsidies, defence platforms, sovereign cloud. The one layer that's empty is the one where the IP actually lives.

Canada is investing $2 billion in sovereign AI. The country has no model lab.

$300M
AI Compute Access Fund. Oversubscribed in its first round.
ISED · MAY 2026
$705M
Sovereign Compute Infrastructure Program. A national supercomputer.
BUDGET 2024–25
$900M
Defence Industrial Strategy injection. Drone hub, NRC, quantum.
DND · MAR 2026
$4B
BDC Defence Platform. Up to $6B in cheques to Canadian defence companies.
BDC · 2026
II   /   LANDSCAPE

Where the Canadian stack actually sits.

Every dollar today flows through closed APIs, telco-hosted metal, or thin wrapper layers. Cohere just open-weighted Command A+ — a Canadian foundation worth building on. The layer above it — vertical-specialized, audited, customer-owned — is still unoccupied.

APPLICATION
Workday · ServiceNow · SAP integrations · North agents
MODEL
Cohere Command A+ (open-weight, frontier-general · Apache-2.0, May 2026)
INFERENCE / SERVING
Featherless · vLLM & SGLang derivatives
INFRASTRUCTURE
Bell AI Fabric · Telus Sovereign Factory · CoreWeave (US)
THE GAP

Open-weight,
sovereign,
vertical models.

Trained in Canada.
Quantized in Canada.
Deployed in Canada.
Owned by the customer.
III   /   WHAT WE DO

Three disciplines.
One toolchain.

We're not a wrapper. Not a hosting platform. Not a fine-tuning API. We take frontier open base models and produce shippable, quantized, sovereign deployments — with the IP and audit evidence customers actually own.

01   TRAIN

Train.

Post-training on open base models. SFT, DPO, GRPO, RLAIF. Customer data stays sovereign throughout the run.

  • TRL · Axolotl · DeepSpeed
  • Custom GRPO trainer
  • Stable FP8 / BF16 training
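For readers new to GRPO, the core idea fits in a few lines: several completions are sampled for one prompt, and each reward is scored against its own group rather than against a learned value model. The sketch below is illustrative only, not the custom trainer named above; the function name, the `eps` term, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each reward against the mean and spread of its own group.

    Assumption: population standard deviation plus a small eps to avoid
    dividing by zero when every completion gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt, rewarded 1.0 if correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # correct completions come out positive, incorrect ones negative
```

When every completion in a group earns the same reward, all advantages are zero and that prompt contributes no gradient signal, which is why reward design and group size matter in practice.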
02   QUANTIZE

Quantize.

Production W4A16, NVFP4, MXFP4 recipes — with MTP draft heads preserved so speculative decoding survives end-to-end. Patches land upstream in vLLM and llm-compressor.

  • llm-compressor · GPTQ · AWQ
  • SmoothQuant · QuaRot
  • Compressed-tensors format
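As a rough picture of what a block-scaled 4-bit float recipe does, the sketch below rounds weights to the FP4 E2M1 value grid in groups of 16, with one scale per group. It is a simplification under stated assumptions: the scale stays a plain Python float here, whereas real NVFP4 stores it in FP8 E4M3 alongside a per-tensor scale, and the function names are hypothetical. It is not the production recipe and does not use llm-compressor.

```python
# Magnitudes representable in FP4 E2M1 (sign is handled separately).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_block(block):
    """Quantize one group: a shared scale plus one grid value per weight."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the group onto the top grid value (6.0).
    scale = amax / E2M1_GRID[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        codes.append(-mag if x < 0 else mag)
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate weights from a scale and its grid values."""
    return [scale * c for c in codes]


def quantize(weights, group_size=16):
    """Split a flat weight list into groups and quantize each one."""
    return [
        quantize_block(weights[i:i + group_size])
        for i in range(0, len(weights), group_size)
    ]


weights = [i / 10 - 1.6 for i in range(32)]
for scale, codes in quantize(weights):
    print(round(scale, 4), codes)
```

Because the grid is coarser near its top end, the worst-case error inside a group is one scale unit, which is one reason small groups with their own scales help preserve accuracy.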
03   DEPLOY

Deploy.

Air-gapped, on-prem, sovereign cloud. With audit trails, eval evidence, and model risk documentation packaged with the build.

  • vLLM · FlashAttention-3
  • CUTLASS · custom kernels
  • lm-eval-harness pipelines
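Eval evidence is easier to audit when the error bars can be reproduced from the headline score. A minimal sketch, assuming the usual standard error of a mean over 0/1 scores with sample (n - 1) variance, and assuming the standard test-set sizes of 1,319 problems for GSM8K and 164 for HumanEval; the function name is hypothetical:

```python
from math import sqrt


def accuracy_stderr(accuracy, n):
    """Standard error of the mean of n 0/1 scores, using sample (n - 1) variance."""
    return sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# Scores as published on the model cards; n values are the assumed test-set sizes.
gsm8k = accuracy_stderr(0.9371, 1319)
humaneval = accuracy_stderr(0.8476, 164)
print(f"GSM8K: 93.71% +/- {100 * gsm8k:.2f}")
print(f"HumanEval: 84.76% +/- {100 * humaneval:.2f}")
```

The smaller the benchmark, the wider the interval, so a two-point gap on HumanEval carries far less weight than the same gap on GSM8K.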
IV   /   VERTICALS

Four regulated industries.
One model factory.

We're starting wide. The verticals will narrow as contracts land, not on our guesses. Each one has a clear buyer set, a clear corpus, and a clear wedge against frontier general models.

LEGAL

Lower hallucination rates than frontier general models on Canadian case law.

Buyers: BigLaw · in-house counsel · courts · Department of Justice
Corpora: Canadian case law · statutes · CIPO · provincial regulation
MEDICAL

On-prem deployable, PHIPA/PIPEDA-clean, evidence-cited outputs.

Buyers: Provincial health authorities · hospital networks · pharma R&D
Corpora: PubMed · MIMIC-IV · CIHI · Health Canada
DEFENCE

Air-gapped, classified-ready, doctrine-aware, Five Eyes interoperable.

Buyers: DND · CSE · Shared Services Canada · prime contractors
Corpora: OSINT · NATO doctrine · allied technical material
FINANCE

OSFI-aligned, MRM-documented, audit-ready, on Canadian soil.

Buyers: Big Six banks · credit unions · insurers · pension funds
Corpora: Public filings · regulatory guidance · market data · client corpora
V   /   WHY NOW

Six forces.
One window.

Sovereign AI in Canada has shifted from policy paper to active deployment in under twelve months. The first lab to ship audited, vertical, open-weight models with defence contracts behind them owns the category.

POLICY
Federal AI Strategy launching · Sovereign AI Compute Strategy in active deployment phase.
PUBLIC CAPITAL
$2B sovereign-AI program plus $4B BDC Defence Platform writing cheques today.
DEFENCE
SSC + DND + CSE jointly building a "made-in-Canada AI tool" — with an explicit company-partner ask.
GEOPOLITICS
US trade volatility has Canadian buyers demanding sovereign alternatives by procurement mandate.
REGULATION
AIDA, EU AI Act extraterritorial reach, OSFI MRM: compliance-grade AI is now table stakes.
INCUMBENT GAP
Cohere's Command A+ is now open-weight — a base, not a finish. Bell hosts metal. Featherless wraps. Nobody ships vertical-specialized, audited, customer-owned deployments on top.
VI   /   CAPABILITIES

Hardware Canada
doesn't have yet.

Frontier-tier open models — 600B+ parameter MoEs, long-context dense models — need real Blackwell silicon to train and serve at production scale. We operate it here.

Hardware

  • PLATFORM: NVIDIA DGX B300 · 8× Blackwell
  • MEMORY: 2.3 TB HBM3e total, NVLink Switch fabric
  • PRECISION: Native FP4 compute for both training and inference
  • LOCATION: Hosted at Equinix Vancouver · sovereign by default
  • THROUGHPUT: 10–50× cloud-API throughput per dollar at steady state

Toolchain

  • BASE: Cohere Command A+ · DeepSeek V4 · Llama · Qwen (open-weight foundations)
  • TRAINING: TRL · Axolotl · DeepSpeed · custom GRPO trainer
  • QUANTIZATION: llm-compressor · GPTQ · AWQ · SmoothQuant · QuaRot
  • INFERENCE: vLLM · FlashAttention-3 · CUTLASS kernels
  • EVALUATION: lm-eval-harness · custom domain suites
  • UPSTREAM: Active contributors to vLLM and llm-compressor
VII   /   MODELS

Open weights,
shipped and shipping.

What we've put on Hugging Face under open licenses — and what we're calibrating next. Plus an invitation: if there's a model your team needs quantized in the open, we'd like to hear about it.

SHIPPED · 22 MAY 2026 · MIT

DeepSeek-V4-Pro
NVFP4-FP8-MTP.

The first NVFP4-FP8 quant of DeepSeek-V4-Pro with the MTP draft head preserved for vLLM speculative decoding. A byte-deterministic conversion of V4-Pro's native MXFP4+FP8 source to NVFP4 on 8× B300 SXM6 — shipped alongside the upstream vLLM patch that makes the flashinfer_trtllm NVFP4 MoE backend load it, merged the same day.

+41.1%
batched throughput vs native MXFP4 · c=16
94.1%
GSM8K 8-shot · full n=1319, 0 truncations
#42209
vLLM NVFP4 MoE PR · merged 22 May
MIT
open weights · same license as the base
Batched throughput
572.8 tok/s vs 405.9 native MXFP4 @ c=16 · 8× B300 SXM6 TP=8 + EP
Single-stream + MTP
75.3 tok/s with MTP n=2 vs 69.8 baseline · +7.9%
GSM8K 8-shot
94.09% strict-match (full n=1319) · 95.67% on matched n=300 set
AIME 2024 pass@1
76.0% (19/25) · 0 truncations
MTP draft acceptance
91.21% vs 90.92% MXFP4 on focused probe · 92.83% cumulative (40,225 drafts)
Architecture
1,598.84B total · 49.60B active per token · MoE · NVFP4 experts (g=16, E4M3) + FP8_BLOCK attention & shared experts · MTP retained at BF16
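The headline percentages on this card follow directly from the raw tok/s figures. A short check, using only the numbers published above (the helper name is an assumption for the example):

```python
def relative_gain(new, baseline):
    """Fractional improvement of `new` over `baseline`."""
    return (new - baseline) / baseline


batched = relative_gain(572.8, 405.9)  # NVFP4 vs native MXFP4 at c=16
single = relative_gain(75.3, 69.8)     # MTP n=2 vs baseline, single stream

print(f"batched: {batched:+.1%}")              # +41.1%
print(f"single-stream + MTP: {single:+.1%}")   # +7.9%
```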
SHIPPED · 21 MAY 2026 · MIT

DeepSeek-V4-Flash
W4A16-FP8-MTP.

A W4A16 + FP8_BLOCK quant of DeepSeek-V4-Flash that keeps the MTP draft head at BF16 — the first build on this base where speculative decoding survives end-to-end through GPTQ calibration, transformers save, and vLLM load on Hopper. The reproduction repo documents three upstream silent-drop bugs and the fixup pipeline that routes around them.

1.49×
decode speedup with spec-decode · bs=1, k=1
69.94%
MTP draft acceptance · 21,024 / 30,058 tokens
93.71%
GSM8K 8-shot strict · ± 0.67
MIT
open weights · same license as the base
Decode TPOT
6.02 ms with spec-decode vs 8.93 ms without · 8× H200 SXM5 TP=2
GSM8K 8-shot
93.71% ± 0.67 strict-match
MMLU 5-shot
86.88% ± 0.27
HumanEval pass@1
84.76% ± 2.82
ToolCall15
24 / 30 (80%)
Architecture
284B total · 13B active per token · MoE · W4A16 GPTQ experts (g=128) + FP8_BLOCK attention · MTP retained at BF16 · 159 GB / 4 shards
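Draft acceptance sets a ceiling on how much speculative decoding can help. The sketch below derives the acceptance rate from the published token counts and estimates the average number of tokens emitted per verification step. It assumes each draft token is accepted independently with the same probability, which is a simplification; the function name is hypothetical.

```python
def expected_tokens_per_step(acceptance, k):
    """Average tokens emitted per target-model step with k draft tokens.

    Assumption: each draft token is accepted independently with
    probability `acceptance`, giving 1 + a + a^2 + ... + a^k.
    """
    return sum(acceptance ** i for i in range(k + 1))


acceptance = 21_024 / 30_058  # accepted / drafted, from the card above
ceiling = expected_tokens_per_step(acceptance, k=1)

print(f"acceptance: {acceptance:.2%}")
print(f"tokens per step at k=1: {ceiling:.2f}")
```

The reported 1.49× decode speedup sits below this ceiling, as expected: running the draft head and verifying its proposals both cost time that the idealized count ignores.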
SHIPPED · MAY 2026 · MIT

DeepSeek-V4-Flash
NVFP4-FP8-MTP.

Datacenter-Blackwell NVFP4 build of DeepSeek-V4-Flash with the MTP draft head retained on-disk, so vLLM serves it with --speculative-config method=mtp end-to-end. Same quant math as the comparable peer; the structural difference is that MTP survives calibration instead of being stripped at load time.
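A minimal sketch of how such a launch command might be assembled. The repository name is a placeholder rather than the published repo ID, and the tensor-parallel size and `num_speculative_tokens` value are illustrative assumptions; only `method=mtp` comes from the description above.

```python
import json
import shlex


def vllm_serve_command(model, tensor_parallel_size, num_speculative_tokens):
    """Build an argv list for `vllm serve` with MTP speculative decoding.

    The speculative settings are passed as a JSON string, which is how
    vLLM's --speculative-config option takes its value.
    """
    spec = {"method": "mtp", "num_speculative_tokens": num_speculative_tokens}
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--speculative-config", json.dumps(spec),
    ]


# Placeholder repo name (hypothetical); substitute the real model ID.
cmd = vllm_serve_command("canada-quant/<model-repo>", 4, 1)
print(shlex.join(cmd))
```

Building the command as a list and quoting it with `shlex.join` keeps the JSON argument intact when it is pasted into a shell.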

3.5×
compression: ~600 GB BF16 → 172 GB
96.0%
AIME 2024 non-truncated · matches BF16+MTP
88%
MTP draft acceptance · flat c=1 → c=16
MIT
open weights · same license as the base
AIME 2024 pass@1
96.0% non-truncated · 83.3% raw (65K cap) · 4× B300 SXM6 TP=4
GSM8K 8-shot
91.81% strict-match · 95.15% flexible-extract
HumanEval pass@1
91.5% EvalPlus · 84.8% EvalPlus+
IFEval prompt-strict
85.4%
Coding throughput
278.7 tok/s @ c=1 · 1,577 tok/s @ c=16 · 4× B300 TP=4
Architecture
284B base + ~17B MTP · 13B active per token · MoE · NVFP4 experts + FP8_BLOCK attention · MTP retained
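The size figures on this card can be sanity-checked from parameter counts alone. A back-of-the-envelope sketch, counting weights only (quantization scales and any layers kept at higher precision are ignored, and the helper name is an assumption):

```python
def weights_gb(params_billions, bits_per_param):
    """Weight storage in GB: billions of parameters times bytes per parameter."""
    return params_billions * bits_per_param / 8


bf16_gb = weights_gb(284 + 17, 16)  # base + MTP head at 16 bits each
fp4_floor_gb = weights_gb(284 + 17, 4)  # if every weight were 4-bit
ratio = 600 / 172  # published BF16 and quantized sizes

print(f"BF16 estimate: {bf16_gb:.0f} GB")
print(f"4-bit floor: {fp4_floor_gb:.1f} GB")
print(f"compression: {ratio:.1f}x")
```

The gap between the 4-bit floor and the published 172 GB is consistent with per-group scales and FP8 attention layers adding to the total.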
SHIPPED · MAY 2026 · MIT

DeepSeek-V4-Flash
W4A16-FP8.

A W4A16 + FP8_BLOCK quant of DeepSeek-V4-Flash with a vLLM serving recipe. Runs on Hopper (H200) and two Blackwell SKUs (DGX Spark, RTX PRO 6000), built alongside the upstream patches it depends on. Open weights, no token required.

7,161
HF downloads · 16 days from publish
3.8×
compression: 543 GB → 143 GB
1 M
token context · dual-Spark TP=2
MIT
open weights · same license as the base
GSM8K 8-shot
95.37% Spark TP=2 · 94.99% RTX PRO 6000
HumanEval pass@1
80.49% Spark TP=2 · 78.05% RTX PRO 6000
ToolCall15
92% Spark · 90% RTX PRO 6000
Long-context NIAH
PASS — 256K × 2 concurrent · 500K × 1
Decode throughput
84 tok/s aggregate · 2× RTX PRO 6000 @ concurrency=2
Architecture
284B total · 13B active per token · MoE · W4A16 experts + FP8_BLOCK attention
IN FLIGHT
Planning
Reasoning & agentic distill
Distilling frontier reasoning and agentic coding into a deployable open-weight quant.
Prototyping
Autonomous coding cluster
Cluster-scale autonomous coding framework across our Blackwell fabric.
Designing
CanLegal-Bench
Open eval suite for Canadian common-law and civil-code reasoning, with citation verification and bilingual EN/FR consistency.

Each lands on Hugging Face under the base model's license when it's ready. Status updates: partnerships@cql.ca.

OPEN QUANTS

Have a model you want
quantized in the open?

We ship one or two community quants per cycle. We're looking for:

  • OPEN-WEIGHT BASE: Apache-2.0, MIT, Llama-compatible; nothing research-only.
  • REAL DEMAND: At least one team waiting to deploy it.
  • RECIPE-FIT: W4A16, NVFP4, or MXFP4 + vLLM target.
  • REGULATABLE: No use-restrictions that conflict with regulated-industry deployment.

First-look review within two weeks. Selected requests get a public reproduction repo and a model card under canada-quant/.

VIII   /   CONTACT

Sovereign AI
is a build, not a brand.

If your organization needs a model it can audit, deploy on-prem, and keep under Canadian jurisdiction — we should talk. We respond to every serious inquiry within 48 hours.

Get in touch. Build with us.


Or email partnerships@cql.ca · press@cql.ca directly.