Huawei's openPangu 2.0 Is Now Open-Source
Trained without a single NVIDIA GPU — 512K context, full training pipeline open source

On June 30, 2026, Huawei delivered on the promise made at HDC 2026: openPangu-2.0-Flash weights, inference code, and training operators went live on GitCode. This is the first frontier-scale open-source LLM trained entirely on Huawei Ascend 910B NPUs — no A100, no H100, no NVIDIA silicon anywhere in the pipeline. Two MoE variants share a 512K context window: Pro at 505B total / 18B active and Flash at 92B total / 6B active. Huawei plans to open-source seven components including pre-training and post-training code — a level of transparency almost unheard of at this scale. This guide covers the release timeline, architecture innovations (mHC, Muon, ModAttn, DSA+SWA), Ascend training metrics, competitor comparison tables, ModelArts API and GitCode deployment steps, hardware requirements, sovereign AI implications, and the openPangu License.

01

openPangu 2.0 release timeline, specs, and seven-component roadmap

Richard Yu unveiled openPangu 2.0 at HDC 2026 in Dongguan on June 12, 2026. Flash weights shipped eighteen days later. Pro follows in July. The rest of the training stack rolls out through H2 2026.

Release timeline

DateMilestone
2026-06-12HDC 2026 keynote: openPangu 2.0 officially announced
2026-06-30Flash weights, base inference code, and training operators live on GitCode
2026-07 (planned)Pro model weights and inference code
H2 2026 (planned)Pre-training code, post-training code (SFT/RLHF), additional training operators

Pro vs Flash specifications

SpecopenPangu 2.0 ProopenPangu 2.0 Flash
Total parameters505B92B
Active parameters18B6B
Sparsity ratio~28:1~15:1
Context window512K tokens512K tokens
StatusJuly 2026 (planned)Live since June 30
LicenseopenPangu License (permissive commercial use, royalty-free)

512K context is roughly eight full-length novels in a single prompt — entire codebases, contract bundles with appendices, or hours of transcript data without chunking.

Seven open-source components

Most labs release weights and inference code. Huawei is shipping the full stack in seven parts:

#ComponentStatus
1Model architectureReleased June 30
2Model weights (Flash)Released June 30
3Technical reportReleased June 30
4Inference code + training operatorsReleased June 30
5Model weights (Pro)Planned July 2026
6Pre-training codePlanned H2 2026
7Post-training code (SFT/RLHF)Planned H2 2026

Items 6 and 7 are the differentiator. When they ship, researchers can reproduce the full training pipeline on Ascend hardware and enterprises can run domain-specific pre-training from scratch — not just fine-tune released weights.

Architecture: mHC, Muon, ModAttn, DSA+SWA

openPangu 2.0 is a Mixture-of-Experts architecture with four headline innovations:

  • mHC (Multi-Head Combinatorial) routing: Improved expert routing that reduces load imbalance — a chronic MoE failure mode where a few experts absorb most tokens.
  • Muon optimizer: A second-order momentum optimizer from Microsoft research, adapted for large-scale training stability on Ascend.
  • ModAttn (Modular Attention): Modular attention blocks designed for ultra-long context without proportional compute blowup.
  • DSA+SWA ultra-sparse attention (Flash): Flash-only sparse attention achieving the ~15:1 sparsity ratio. At 6B active out of 92B total, each token activates roughly 6.5% of parameters — inference cost closer to a dense 6B model with access to a 92B knowledge pool.

If you are evaluating openPangu 2.0 against NVIDIA-trained alternatives, these six misconceptions will skew your decision:

  1. 01

    "Open weights" equals "open training": DeepSeek, Qwen, and Kimi ship weights and inference. openPangu 2.0 will also ship pre-training and post-training code — a qualitatively different transparency level.

  2. 02

    Context window is a checkbox: 512K is not marginally better than 128K. It is 4x DeepSeek and Qwen, 2x Kimi — and changes what you can fit in a single Agent context without retrieval hacks.

  3. 03

    Active parameters predict everything: Flash activates 6B per token. Pro activates 18B. DeepSeek V4 Pro activates ~200B. Raw reasoning depth still favors higher activation counts for the hardest tasks.

  4. 04

    NVIDIA-free means CUDA-free inference only: The training pipeline also ran on Ascend 910B. Sovereignty covers the full stack — not just deployment on alternative hardware.

  5. 05

    MoE train/inference mismatch is inevitable: openPangu reports >99% train/inference consistency — a metric that matters when routing and expert activation patterns diverge in production.

  6. 06

    Benchmark leaderboards are current: Independent third-party benchmarks were not available as of July 1, 2026. Architecture-based expectations place Pro in the dense-70B class for general tasks, with clear long-context advantages. Treat leaderboard claims as provisional until Hugging Face Open LLM Leaderboard and LiveBench publish results.

02

Ascend 910B training: 2x throughput and >99% consistency

Every other frontier open-source model — DeepSeek V4 Pro, Qwen 3.7 Max, Kimi K2.7, Llama 4 — was trained on NVIDIA hardware. openPangu 2.0 is the outlier: the entire 505B MoE pre-training run completed on Huawei Ascend 910B NPUs.

Huawei developed the Ascend 910B under US export controls that restricted China's access to A100 and H100 clusters. openPangu 2.0 is evidence that a frontier-quality training pipeline is now viable without CUDA.

Reported training and inference metrics

MetricResultWhy it matters
Single-card throughput2x vs mainstream open-source models on AscendNative Ascend affinity — models not designed for NPU run slower
Hypernode training efficiency+30%Multi-node scaling on Ascend clusters
512K long-sequence training+50% throughputLong-context training is usually the bottleneck; this is the headline feature
Train/inference consistency>99%MoE models often degrade when expert routing diverges at inference time
Inference latency1.2x better than comparable modelsLower time-to-first-token on Ascend-optimized paths
Flash-Int8 quantization40% less memory, <10% quality lossW4A8 quantization for constrained deployments

Software stack: CANN + torch_npu

Ascend runs on CANN (Huawei's CUDA-equivalent stack, open-sourced in late 2025) and torch_npu — a PyTorch adapter that lets standard PyTorch code target Ascend NPUs with a single import. Edge deployment adds a 30B embedded variant with 50% faster inference and 20% lower memory, running offline on Kirin mobile chips.

"In my dictionary, there is no second place — only first. We will go from number one in China to number one in the world."

Richard Yu's HDC 2026 framing positions openPangu 2.0 as a sovereignty statement as much as a product launch — proof that export controls did not halt frontier-scale AI development.

info

Citable hard numbers: 505B total / 18B active (Pro), 92B total / 6B active (Flash), 512K context on both variants, 2x Ascend throughput, >99% train/inference consistency, seven planned open-source components.

03

openPangu 2.0 vs DeepSeek, Qwen, Kimi, and Llama

English-language coverage of Chinese AI models often stops at DeepSeek. openPangu 2.0 occupies a different niche: sovereign hardware, extreme context, and full-stack openness. Here is how it compares on paper.

Parameter and licensing comparison

ModelTotal paramsActive paramsContextLicenseTraining HWOpen depth
openPangu 2.0 Pro505B18B512KopenPanguAscend NPUFull stack (7 components)
openPangu 2.0 Flash92B6B512KopenPanguAscend NPUFull stack (7 components)
DeepSeek V4 Pro1.6T~200B128KMITNVIDIAWeights + inference
Qwen 3.7 Max~400B+varies128KApache 2.0NVIDIAWeights + inference + partial training
Kimi K2.71T32B256KModified MITNVIDIAWeights + inference
Llama 4 405B405B128KLlama LicenseNVIDIAWeights + inference

Capability matrix (architecture-based)

DimensionopenPangu 2.0 ProDeepSeek V4 ProQwen 3.7 MaxKimi K2.7
Code generationStrongBest in classVery strongVery strong
Complex reasoningGoodBest in classBest in classVery strong
Tool use / AgentsVery strongVery strongVery strongBest in class (MCP ecosystem)
Ultra-long contextLeader (512K)Moderate (128K)Moderate (128K)Strong (256K)
Inference efficiencyLeader on AscendModerateModerateStrong
Sovereign / domestic HWOnly optionLowLowLow
Full training pipeline openLeader (planned)PartialPartialPartial

Selection guide

Your primary needRecommended modelReason
Code generation / complex reasoningDeepSeek V4 Pro~200B active parameters, current open-source leader
Agent / multi-tool orchestrationKimi K2.7MCP ecosystem maturity
Documents >256K tokensopenPangu 2.0 Pro512K context — no chunking required
Sovereign AI / no NVIDIA dependencyopenPangu 2.0Only frontier model trained entirely off NVIDIA
Ascend / Huawei Cloud deploymentopenPangu 2.0Native optimization, 2x throughput on Ascend
Edge / mobile (HarmonyOS)openPangu Embedded (30B)Kirin chip offline inference
Low-cost local inferenceopenPangu 2.0 Flash6B active, ~96GB unified memory
warning

Benchmark disclaimer: Independent third-party benchmarks were not available as of July 1, 2026. Capability ratings above are architecture-based inferences. This article will be updated when Hugging Face Open LLM Leaderboard, LiveBench, or peer-reviewed evaluations publish verified scores.

04

How to deploy openPangu 2.0: ModelArts API, GitCode, and hardware requirements

Three paths: cloud API for zero hardware, GitCode self-deploy on Ascend, or PyTorch + torch_npu for custom integration. Follow these steps in order.

  1. 01

    Choose your tier: Flash is live now for low-latency, high-concurrency workloads. Pro ships in July 2026 for maximum long-document quality. Flash-Int8 cuts memory by 40% with under 10% quality loss.

  2. 02

    Cloud API (fastest): Register at huaweicloud.com, open ModelArts, navigate to AI Gallery, search "openPangu 2.0", subscribe to Flash or Pro, and copy your API endpoint and auth token.

  3. 03

    Call the ModelArts Chat Completions API:

curl -X POST "https://modelarts.${REGION}.myhuaweicloud.com/v1/infers/openpangu-2-flash/chat/completions" \
  -H "Content-Type: application/json" \
  -H "X-Auth-Token: ${TOKEN}" \
  -d '{
    "model": "openpangu-2.0-flash",
    "messages": [
      {"role": "user", "content": "Explain MoE architecture in simple terms"}
    ],
    "max_tokens": 1024,
    "temperature": 0.7
  }'
  1. 04

    Self-deploy from GitCode: Clone from Ascend Tribe. Key repos: openPangu-2.0-Flash, openPangu-2.0-Flash-Int8, openPangu-2.0-Infer, openPangu-2.0-Op (Ascend custom operators).

  2. 05

    Run Flash inference on a single Ascend 910B:

python inference.py \
  --model_path ./openPangu-Flash \
  --device npu:0 \
  --context_length 512000 \
  --precision bf16

For the Int8 quantized build:

python inference.py \
  --model_path ./openPangu-Flash-Int8 \
  --device npu:0 \
  --quantization int8
  1. 06

    Run Pro distributed inference (when weights ship):

python distributed_inference.py \
  --model_path ./openPangu-Pro \
  --num_devices 8 \
  --context_length 512000
  1. 07

    Domain fine-tuning with LoRA:

python finetune.py \
  --model_path ./openPangu-Pro \
  --data_path ./domain_data \
  --output_dir ./fine_tuned_model \
  --method lora \
  --lora_rank 16
  1. 08

    PyTorch + torch_npu integration:

import torch
import torch_npu  # Switch PyTorch backend to Ascend

model = load_openpangu("./openPangu-Flash")
model = model.to("npu:0")

output = model.generate(
    input_ids.to("npu:0"),
    max_new_tokens=512,
    temperature=0.7
)

Hardware requirements

VersionRecommended hardwareMinimum configNotes
Flash (6B active)Single Ascend 910B~96GB unified memoryCommunity tests on large-memory systems
Flash-Int8Single Ascend Atlas A2~48GB VRAMW4A8 quantization, <10% quality loss
Pro (18B active)4+ Ascend 910B cardsMulti-card clusterWeights available July 2026
Embedded (edge)Kirin mobile SoC30B on-device model50% faster inference, 20% less memory vs prior gen

Pair this deployment guide with the OpenRouter June 2026 rankings if you are building a multi-model routing layer alongside Ascend-hosted inference.

05

Strategic significance: sovereign AI, HarmonyOS Agent, and the openPangu License

Geopolitical and industry context

US export controls have restricted China's access to NVIDIA's most advanced GPUs since 2022. Huawei responded by building the Ascend stack — CANN, torch_npu, and now a 505B MoE model trained entirely on domestic silicon. openPangu 2.0 is not an incremental release; it is the first proof that frontier-scale open-source AI can exist outside the CUDA ecosystem.

For enterprises under data-sovereignty, domestic procurement, or "信创" (information technology innovation) compliance requirements, openPangu 2.0 is currently the only frontier open-weight model with zero NVIDIA dependency across training and deployment.

Full-stack open source as ecosystem strategy

Releasing pre-training and post-training code — not just weights — serves three goals:

  • Academic reproducibility: Researchers can study MoE pre-training at 505B scale on documented Ascend infrastructure.
  • Enterprise verticalization: Companies in medical, legal, and finance can run domain-specific pre-training without reverse-engineering a black box.
  • Ascend adoption: Lowering the barrier to Ascend development expands Huawei's AI hardware ecosystem against NVIDIA's CUDA moat.

HarmonyOS Agent era

openPangu 2.0 is the AI backbone for Huawei's broader platform strategy, not a standalone model release:

  • HarmonyOS 7 enters the Agent era with openPangu 2.0 as the native inference engine for on-device and cloud Agent tasks.
  • HarmonyOS Agent Framework 2.0 reports >90% success rate on complex multi-step tasks, powered by openPangu routing.
  • 30B embedded model runs locally on Kirin phones without network access — relevant for privacy-sensitive edge Agents.

openPangu License terms

Released under the openPangu License on GitCode:

  • Commercial use permitted — deploy in production without royalty payments.
  • Royalty-free — no per-token or per-deployment fees to Huawei.
  • Non-exclusive — no lock-in to Huawei Cloud; self-host on your own Ascend cluster.
  • Specific usage terms apply — review the license file in each GitCode repository before redistribution or fine-tune commercialization.

What is coming in H2 2026

Pro weights in July. Pre-training and post-training code in H2. Each release is an opportunity to update routing policies, benchmark scores, and deployment playbooks. Watch GitCode Ascend Tribe and the HDC developer portal for component drops.

warning

Disclaimer: Benchmark and capability assessments in this article are architecture-based inferences as of July 1, 2026. Independent third-party test results will replace provisional ratings when published. Last updated: July 1, 2026.

06

Where NodeMini fits: execution layer for long-context Agents

openPangu 2.0 solves the model and inference layer — especially for 512K document Agents and Ascend-native deployments. It does not solve the execution environment for developers who also run iOS CI, CLI coding Agents, and multi-hour SSH sessions alongside model API calls.

A personal MacBook thermal-throttles during 12-hour Agent loops. A cheap Linux VPS cannot run xcodebuild, notarytool, or Keychain-isolated signing. Chasing sovereign inference on Ascend while running Agents on unstable local hardware optimizes the wrong layer.

Teams building long-context document Agents on openPangu — or routing between openPangu, DeepSeek, and Claude via OpenRouter — need a fixed SSH execution node for the workloads models cannot run: iOS builds, notarization, and persistent CLI Agent sessions.

NodeMini Mac Mini cloud rental is the execution-layer complement: swap ModelArts endpoints or GitCode deployments in your gateway while SSH nodes and CI labels stay fixed. When Pro weights ship in July or pre-training code drops in H2, you update inference config — not rebuild a laptop environment.

For stable iOS CI/CD and AI Agent automation alongside openPangu or multi-model stacks, NodeMini's Mac Mini cloud rental is usually the better fit than personal hardware or API-only setups. See the help center for access setup and rental rates for M4 Pro and M4 Max tiers.

FAQ

Frequently asked questions

Yes. The entire training pipeline ran on Ascend 910B NPUs with no A100 or H100 involvement. Inference uses CANN and torch_npu for native Ascend deployment. Flash weights are live on GitCode for self-hosted Ascend inference today.

DeepSeek V4 Pro leads on coding and complex reasoning with ~200B active parameters versus openPangu Pro's 18B. openPangu 2.0 wins on 512K context (4x most rivals), 2x Ascend throughput, planned full training-pipeline open source, and sovereign AI compliance with zero NVIDIA dependency. Route by task: DeepSeek for hardest reasoning, openPangu for ultra-long documents and Ascend environments.

Flash (6B active): single Ascend 910B or roughly 96GB unified memory. Flash-Int8: about 48GB VRAM on Ascend Atlas A2. Pro (18B active): multi-card Ascend 910B cluster when weights ship in July 2026. Compare infrastructure costs against NodeMini rental rates if you are splitting inference (Ascend) and Agent execution (cloud Mac).

Use Huawei Cloud ModelArts for API access without hardware, or self-deploy from GitCode on Ascend NPUs. Pair 512K document Agents with a dedicated cloud Mac for iOS CI and long-session CLI Agents. Start with the help center for provisioning and the six-step deployment guide in Section 04 above.