Huawei's openPangu 2.0 Is Now Open-Source
Trained without a single NVIDIA GPU — 512K context, full training pipeline open source

On June 30, 2026, Huawei delivered on the promise made at HDC 2026: openPangu-2.0-Flash weights, inference code, and training operators went live on GitCode. This is the first frontier-scale open-source LLM trained entirely on Huawei Ascend 910B NPUs — no A100, no H100, no NVIDIA silicon anywhere in the pipeline. Two MoE variants share a 512K context window: Pro at 505B total / 18B active and Flash at 92B total / 6B active. Huawei plans to open-source seven components including pre-training and post-training code — a level of transparency almost unheard of at this scale. This guide covers the release timeline, architecture innovations (mHC, Muon, ModAttn, DSA+SWA), Ascend training metrics, competitor comparison tables, ModelArts API and GitCode deployment steps, hardware requirements, sovereign AI implications, and the openPangu License.

openPangu 2.0 release timeline, specs, and seven-component roadmap

Richard Yu unveiled openPangu 2.0 at HDC 2026 in Dongguan on June 12, 2026. Flash weights shipped eighteen days later. Pro follows in July. The rest of the training stack rolls out through H2 2026.

Release timeline

Date	Milestone
2026-06-12	HDC 2026 keynote: openPangu 2.0 officially announced
2026-06-30	Flash weights, base inference code, and training operators live on GitCode
2026-07 (planned)	Pro model weights and inference code
H2 2026 (planned)	Pre-training code, post-training code (SFT/RLHF), additional training operators

Pro vs Flash specifications

Spec	openPangu 2.0 Pro	openPangu 2.0 Flash
Total parameters	505B	92B
Active parameters	18B	6B
Sparsity ratio	~28:1	~15:1
Context window	512K tokens	512K tokens
Status	July 2026 (planned)	Live since June 30
License	openPangu License (permissive commercial use, royalty-free)	openPangu License (permissive commercial use, royalty-free)

A 512K-token context holds several full-length novels' worth of text in a single prompt: entire codebases, contract bundles with appendices, or hours of transcript data without chunking.
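A quick way to sanity-check whether a document set fits in the 512K window is to estimate its token count before sending it. The sketch below assumes roughly four characters per token, which is a common rule of thumb for English text and not a measured property of the openPangu tokenizer. The function names and the reserved output budget are illustrative.

```python
# Rough context-budget check. CHARS_PER_TOKEN is a rule-of-thumb assumption,
# not a property of the openPangu tokenizer.
CONTEXT_WINDOW_TOKENS = 512_000  # "512K" taken as 512,000; the exact limit may differ
CHARS_PER_TOKEN = 4.0


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return int(len(text) / CHARS_PER_TOKEN) + 1


def fits_in_context(documents: list[str], reserved_for_output: int = 4_096) -> bool:
    """True if the estimated prompt size still leaves room for the reply."""
    prompt_tokens = sum(estimate_tokens(doc) for doc in documents)
    return prompt_tokens + reserved_for_output <= CONTEXT_WINDOW_TOKENS
```

For production use, replace the heuristic with a count from the model's own tokenizer, since the ratio varies by language and content type.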

Seven open-source components

Most labs release weights and inference code. Huawei is shipping the full stack in seven parts:

#	Component	Status
1	Model architecture	Released June 30
2	Model weights (Flash)	Released June 30
3	Technical report	Released June 30
4	Inference code + training operators	Released June 30
5	Model weights (Pro)	Planned July 2026
6	Pre-training code	Planned H2 2026
7	Post-training code (SFT/RLHF)	Planned H2 2026

Items 6 and 7 are the differentiator. When they ship, researchers can reproduce the full training pipeline on Ascend hardware and enterprises can run domain-specific pre-training from scratch — not just fine-tune released weights.

Architecture: mHC, Muon, ModAttn, DSA+SWA

openPangu 2.0 is a Mixture-of-Experts architecture with four headline innovations:

mHC (Multi-Head Combinatorial) routing: Improved expert routing that reduces load imbalance — a chronic MoE failure mode where a few experts absorb most tokens.
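Muon optimizer: A momentum optimizer that orthogonalizes weight-matrix updates with a Newton-Schulz iteration. It originated in open community research (Keller Jordan and collaborators) and is adapted here for large-scale training stability on Ascend.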
ModAttn (Modular Attention): Modular attention blocks designed for ultra-long context without proportional compute blowup.
DSA+SWA ultra-sparse attention (Flash): Flash-only sparse attention, paired with the ~15:1 total-to-active parameter ratio that comes from MoE routing. At 6B active out of 92B total, each token activates roughly 6.5% of parameters, so per-token compute is closer to a dense 6B model while the model draws on a 92B parameter pool.
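The sparsity figures quoted for both variants follow directly from the total and active parameter counts in the specification table. A small sketch of that arithmetic (the dictionary layout and function name are illustrative):

```python
# Total and active parameter counts, in billions, as reported in the spec table.
VARIANTS = {
    "Pro": {"total_b": 505, "active_b": 18},
    "Flash": {"total_b": 92, "active_b": 6},
}


def sparsity(total_b: float, active_b: float) -> tuple[float, float]:
    """Return (total-to-active ratio, active share of parameters in percent)."""
    return total_b / active_b, 100.0 * active_b / total_b


for name, spec in VARIANTS.items():
    ratio, active_pct = sparsity(spec["total_b"], spec["active_b"])
    print(f"{name}: ~{ratio:.0f}:1, {active_pct:.1f}% of parameters active per token")
# Pro: ~28:1, 3.6% of parameters active per token
# Flash: ~15:1, 6.5% of parameters active per token
```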

If you are evaluating openPangu 2.0 against NVIDIA-trained alternatives, these six misconceptions will skew your decision:

01
"Open weights" equals "open training": DeepSeek, Qwen, and Kimi ship weights and inference. openPangu 2.0 will also ship pre-training and post-training code — a qualitatively different transparency level.
02
Context window is a checkbox: 512K is not marginally better than 128K. It is 4x DeepSeek and Qwen, 2x Kimi — and changes what you can fit in a single Agent context without retrieval hacks.
03
Active parameters predict everything: Flash activates 6B per token. Pro activates 18B. DeepSeek V4 Pro activates ~200B. Raw reasoning depth still favors higher activation counts for the hardest tasks.
04
NVIDIA-free means CUDA-free inference only: The training pipeline also ran on Ascend 910B. Sovereignty covers the full stack — not just deployment on alternative hardware.
05
MoE train/inference mismatch is inevitable: openPangu reports >99% train/inference consistency — a metric that matters when routing and expert activation patterns diverge in production.
06
Benchmark leaderboards are current: Independent third-party benchmarks were not available as of July 1, 2026. Architecture-based expectations place Pro in the dense-70B class for general tasks, with clear long-context advantages. Treat leaderboard claims as provisional until Hugging Face Open LLM Leaderboard and LiveBench publish results.
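The source does not say how the >99% train/inference consistency figure is computed. One plausible reading, and purely an assumption here, is the share of tokens for which the experts selected at inference match those selected during training. A minimal sketch of that hypothetical metric:

```python
def routing_agreement(train_routes, infer_routes):
    """Fraction of tokens whose selected expert sets match exactly.

    Each argument holds one entry per token; each entry is the collection
    of expert indices the router selected for that token.
    """
    if len(train_routes) != len(infer_routes):
        raise ValueError("both runs must cover the same tokens")
    if not train_routes:
        return 1.0
    matches = sum(set(t) == set(i) for t, i in zip(train_routes, infer_routes))
    return matches / len(train_routes)


# Hypothetical top-2 routing decisions for four tokens.
train = [(0, 3), (1, 2), (4, 5), (0, 1)]
infer = [(3, 0), (1, 2), (4, 6), (0, 1)]
print(routing_agreement(train, infer))  # 0.75
```

Comparing sets ignores the order in which experts are listed, so only a change in which experts fire counts as a mismatch.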

Ascend 910B training: 2x throughput and >99% consistency

Every other frontier open-source model — DeepSeek V4 Pro, Qwen 3.7 Max, Kimi K2.7, Llama 4 — was trained on NVIDIA hardware. openPangu 2.0 is the outlier: the entire 505B MoE pre-training run completed on Huawei Ascend 910B NPUs.

Huawei developed the Ascend 910B under US export controls that restricted China's access to A100 and H100 clusters. openPangu 2.0 is evidence that a frontier-quality training pipeline is now viable without CUDA.

Reported training and inference metrics

Metric	Result	Why it matters
Single-card throughput	2x vs mainstream open-source models on Ascend	Native Ascend affinity — models not designed for NPU run slower
Hypernode training efficiency	+30%	Multi-node scaling on Ascend clusters
512K long-sequence training	+50% throughput	Long-context training is usually the bottleneck; this is the headline feature
Train/inference consistency	>99%	MoE models often degrade when expert routing diverges at inference time
Inference latency	1.2x better than comparable models	Lower time-to-first-token on Ascend-optimized paths
Flash-Int8 quantization	40% less memory, <10% quality loss	W4A8 quantization for constrained deployments

Software stack: CANN + torch_npu

Ascend runs on CANN (Huawei's CUDA-equivalent stack, open-sourced in late 2025) and torch_npu, a PyTorch adapter: importing it registers the npu device type, so standard PyTorch code can target Ascend NPUs by moving models and tensors to an npu device. Edge deployment adds a 30B embedded variant with 50% faster inference and 20% lower memory, running offline on Kirin mobile chips.
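Because torch_npu is only present on Ascend hosts, portable code usually guards the import. The sketch below falls back to CPU when the adapter or its runtime is missing; the function name is illustrative, and the fallback policy is an assumption rather than anything the openPangu repos prescribe.

```python
def pick_device() -> str:
    """Return "npu:0" when torch_npu and an Ascend device are usable, else "cpu"."""
    try:
        import torch
        import torch_npu  # noqa: F401  (importing registers the "npu" device type)
    except (ImportError, OSError):
        # Package not installed, or installed without a working CANN runtime.
        return "cpu"
    return "npu:0" if torch.npu.is_available() else "cpu"


print(pick_device())
```

The returned string can be passed straight to `model.to(...)` and `tensor.to(...)`, which keeps the rest of the code identical on Ascend and non-Ascend machines.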

"In my dictionary, there is no second place — only first. We will go from number one in China to number one in the world."

Richard Yu's HDC 2026 framing positions openPangu 2.0 as a sovereignty statement as much as a product launch — proof that export controls did not halt frontier-scale AI development.

Citable hard numbers: 505B total / 18B active (Pro), 92B total / 6B active (Flash), 512K context on both variants, 2x Ascend throughput, >99% train/inference consistency, seven planned open-source components.

openPangu 2.0 vs DeepSeek, Qwen, Kimi, and Llama

English-language coverage of Chinese AI models often stops at DeepSeek. openPangu 2.0 occupies a different niche: sovereign hardware, extreme context, and full-stack openness. Here is how it compares on paper.

Parameter and licensing comparison

Model	Total params	Active params	Context	License	Training HW	Open depth
openPangu 2.0 Pro	505B	18B	512K	openPangu	Ascend NPU	Full stack (7 components)
openPangu 2.0 Flash	92B	6B	512K	openPangu	Ascend NPU	Full stack (7 components)
DeepSeek V4 Pro	1.6T	~200B	128K	MIT	NVIDIA	Weights + inference
Qwen 3.7 Max	~400B+	varies	128K	Apache 2.0	NVIDIA	Weights + inference + partial training
Kimi K2.7	1T	32B	256K	Modified MIT	NVIDIA	Weights + inference
Llama 3.1 405B	405B	405B (dense)	128K	Llama 3.1 Community License	NVIDIA	Weights + inference

Capability matrix (architecture-based)

Dimension	openPangu 2.0 Pro	DeepSeek V4 Pro	Qwen 3.7 Max	Kimi K2.7
Code generation	Strong	Best in class	Very strong	Very strong
Complex reasoning	Good	Best in class	Best in class	Very strong
Tool use / Agents	Very strong	Very strong	Very strong	Best in class (MCP ecosystem)
Ultra-long context	Leader (512K)	Moderate (128K)	Moderate (128K)	Strong (256K)
Inference efficiency	Leader on Ascend	Moderate	Moderate	Strong
Sovereign / domestic HW	Only option	Low	Low	Low
Full training pipeline open	Leader (planned)	Partial	Partial	Partial

Selection guide

Your primary need	Recommended model	Reason
Code generation / complex reasoning	DeepSeek V4 Pro	~200B active parameters, current open-source leader
Agent / multi-tool orchestration	Kimi K2.7	MCP ecosystem maturity
Documents >256K tokens	openPangu 2.0 Pro	512K context — no chunking required
Sovereign AI / no NVIDIA dependency	openPangu 2.0	Only frontier model trained entirely off NVIDIA
Ascend / Huawei Cloud deployment	openPangu 2.0	Native optimization, 2x throughput on Ascend
Edge / mobile (HarmonyOS)	openPangu Embedded (30B)	Kirin chip offline inference
Low-cost local inference	openPangu 2.0 Flash	6B active, ~96GB unified memory
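For a multi-model gateway, the selection guide can be encoded as a small routing function. This is a sketch that transcribes the table rows; the need labels and function name are illustrative, and the 256K threshold simply mirrors the "Documents >256K tokens" row.

```python
# Routing table transcribed from the selection guide; keys are illustrative labels.
ROUTES = {
    "code": "DeepSeek V4 Pro",
    "reasoning": "DeepSeek V4 Pro",
    "agents": "Kimi K2.7",
    "sovereign": "openPangu 2.0",
    "ascend": "openPangu 2.0",
    "edge": "openPangu Embedded (30B)",
    "local": "openPangu 2.0 Flash",
}
LONG_CONTEXT_THRESHOLD = 256_000  # tokens; above this the guide points to openPangu 2.0 Pro


def pick_model(need: str, prompt_tokens: int = 0) -> str:
    """Choose a model by prompt size first, then by primary need."""
    if prompt_tokens > LONG_CONTEXT_THRESHOLD:
        return "openPangu 2.0 Pro"
    try:
        return ROUTES[need]
    except KeyError:
        raise ValueError(f"unknown need: {need!r}") from None
```

Checking prompt size before task type reflects the table's logic: once a document exceeds 256K tokens, the other listed models cannot take it in a single prompt regardless of task.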

Benchmark disclaimer: Independent third-party benchmarks were not available as of July 1, 2026. Capability ratings above are architecture-based inferences. This article will be updated when Hugging Face Open LLM Leaderboard, LiveBench, or peer-reviewed evaluations publish verified scores.

How to deploy openPangu 2.0: ModelArts API, GitCode, and hardware requirements

Three paths: cloud API for zero hardware, GitCode self-deploy on Ascend, or PyTorch + torch_npu for custom integration. Follow these steps in order.

01
Choose your tier: Flash is live now for low-latency, high-concurrency workloads. Pro ships in July 2026 for maximum long-document quality. Flash-Int8 cuts memory by 40% with under 10% quality loss.
02
Cloud API (fastest): Register at huaweicloud.com, open ModelArts, navigate to AI Gallery, search "openPangu 2.0", subscribe to Flash or Pro, and copy your API endpoint and auth token.
03
Call the ModelArts Chat Completions API:

curl -X POST "https://modelarts.${REGION}.myhuaweicloud.com/v1/infers/openpangu-2-flash/chat/completions" \
  -H "Content-Type: application/json" \
  -H "X-Auth-Token: ${TOKEN}" \
  -d '{
    "model": "openpangu-2.0-flash",
    "messages": [
      {"role": "user", "content": "Explain MoE architecture in simple terms"}
    ],
    "max_tokens": 1024,
    "temperature": 0.7
  }'

04
Self-deploy from GitCode: Clone from Ascend Tribe. Key repos: openPangu-2.0-Flash, openPangu-2.0-Flash-Int8, openPangu-2.0-Infer, openPangu-2.0-Op (Ascend custom operators).
05
Run Flash inference on a single Ascend 910B:

python inference.py \
  --model_path ./openPangu-Flash \
  --device npu:0 \
  --context_length 512000 \
  --precision bf16

For the Int8 quantized build:

python inference.py \
  --model_path ./openPangu-Flash-Int8 \
  --device npu:0 \
  --quantization int8

06
Run Pro distributed inference (when weights ship):

python distributed_inference.py \
  --model_path ./openPangu-Pro \
  --num_devices 8 \
  --context_length 512000

07
Domain fine-tuning with LoRA:

python finetune.py \
  --model_path ./openPangu-Pro \
  --data_path ./domain_data \
  --output_dir ./fine_tuned_model \
  --method lora \
  --lora_rank 16

08
PyTorch + torch_npu integration:

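import torch
import torch_npu  # Importing registers the "npu" device type with PyTorch

# load_openpangu and input_ids are placeholders: substitute the model loader
# and the tokenized prompt tensor provided by the inference code you deploy.
model = load_openpangu("./openPangu-Flash")
model = model.to("npu:0")

# Inputs must live on the same device as the model.
output = model.generate(
    input_ids.to("npu:0"),
    max_new_tokens=512,
    temperature=0.7
)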
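The curl call in step 03 can also be expressed with Python's standard library. This is a sketch under the same assumptions as that step: the endpoint path, model identifier, and X-Auth-Token header are copied from the curl example and should be checked against your ModelArts subscription. The function name and the example region are illustrative, and the request is only built, not sent.

```python
import json
import urllib.request


def build_chat_request(region: str, token: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completions request for openPangu 2.0 Flash."""
    url = (
        f"https://modelarts.{region}.myhuaweicloud.com"
        "/v1/infers/openpangu-2-flash/chat/completions"
    )
    payload = {
        "model": "openpangu-2.0-flash",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
        "temperature": 0.7,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


# "cn-north-4" is only an example region; use the one your endpoint lives in.
request = build_chat_request("cn-north-4", "<your-token>", "Explain MoE architecture in simple terms")
print(request.full_url)
# To send it: urllib.request.urlopen(request, timeout=60)
```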

Hardware requirements

Version	Recommended hardware	Minimum config	Notes
Flash (6B active)	Single Ascend 910B	~96GB unified memory	Community tests on large-memory systems
Flash-Int8	Single Ascend Atlas A2	~48GB VRAM	W4A8 quantization, <10% quality loss
Pro (18B active)	4+ Ascend 910B cards	Multi-card cluster	Weights available July 2026
Embedded (edge)	Kirin mobile SoC	30B on-device model	50% faster inference, 20% less memory vs prior gen
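When provisioning, the memory figures in the table can drive a simple pre-flight check. The sketch below uses only the approximate Flash and Flash-Int8 numbers listed above; the names are illustrative, and real headroom depends on context length and batch size, which this does not model.

```python
# Approximate memory figures (GB) taken from the hardware table, largest first.
MEMORY_NEEDED_GB = [
    ("Flash", 96),
    ("Flash-Int8", 48),
]


def largest_variant_that_fits(available_gb: float):
    """Return the first variant whose stated requirement fits, or None."""
    for name, needed_gb in MEMORY_NEEDED_GB:
        if available_gb >= needed_gb:
            return name
    return None


print(largest_variant_that_fits(64))  # Flash-Int8
```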

Pair this deployment guide with the OpenRouter June 2026 rankings if you are building a multi-model routing layer alongside Ascend-hosted inference.

Strategic significance: sovereign AI, HarmonyOS Agent, and the openPangu License

Geopolitical and industry context

US export controls have restricted China's access to NVIDIA's most advanced GPUs since 2022. Huawei responded by building the Ascend stack — CANN, torch_npu, and now a 505B MoE model trained entirely on domestic silicon. openPangu 2.0 is not an incremental release; it is the first proof that frontier-scale open-source AI can exist outside the CUDA ecosystem.

For enterprises under data-sovereignty, domestic procurement, or Xinchuang ("信创", information technology application innovation) compliance requirements, openPangu 2.0 is currently the only frontier open-weight model with zero NVIDIA dependency across training and deployment.

Full-stack open source as ecosystem strategy

Releasing pre-training and post-training code — not just weights — serves three goals:

Academic reproducibility: Researchers can study MoE pre-training at 505B scale on documented Ascend infrastructure.
Enterprise verticalization: Companies in medical, legal, and finance can run domain-specific pre-training without reverse-engineering a black box.
Ascend adoption: Lowering the barrier to Ascend development expands Huawei's AI hardware ecosystem against NVIDIA's CUDA moat.

HarmonyOS Agent era

openPangu 2.0 is the AI backbone for Huawei's broader platform strategy, not a standalone model release:

HarmonyOS 7 enters the Agent era with openPangu 2.0 as the native inference engine for on-device and cloud Agent tasks.
HarmonyOS Agent Framework 2.0 reports >90% success rate on complex multi-step tasks, powered by openPangu routing.
30B embedded model runs locally on Kirin phones without network access — relevant for privacy-sensitive edge Agents.

openPangu License terms

Released under the openPangu License on GitCode:

Commercial use permitted — deploy in production without royalty payments.
Royalty-free — no per-token or per-deployment fees to Huawei.
Non-exclusive — no lock-in to Huawei Cloud; self-host on your own Ascend cluster.
Specific usage terms apply — review the license file in each GitCode repository before redistribution or fine-tune commercialization.

What is coming in H2 2026

Pro weights in July. Pre-training and post-training code in H2. Each release is an opportunity to update routing policies, benchmark scores, and deployment playbooks. Watch GitCode Ascend Tribe and the HDC developer portal for component drops.

Disclaimer: Benchmark and capability assessments in this article are architecture-based inferences as of July 1, 2026. Independent third-party test results will replace provisional ratings when published. Last updated: July 1, 2026.

Where NodeMini fits: execution layer for long-context Agents

openPangu 2.0 solves the model and inference layer — especially for 512K document Agents and Ascend-native deployments. It does not solve the execution environment for developers who also run iOS CI, CLI coding Agents, and multi-hour SSH sessions alongside model API calls.

A personal MacBook thermal-throttles during 12-hour Agent loops. A cheap Linux VPS cannot run xcodebuild, notarytool, or Keychain-isolated signing. Chasing sovereign inference on Ascend while running Agents on unstable local hardware optimizes the wrong layer.

Teams building long-context document Agents on openPangu — or routing between openPangu, DeepSeek, and Claude via OpenRouter — need a fixed SSH execution node for the workloads models cannot run: iOS builds, notarization, and persistent CLI Agent sessions.

NodeMini Mac Mini cloud rental is the execution-layer complement: swap ModelArts endpoints or GitCode deployments in your gateway while SSH nodes and CI labels stay fixed. When Pro weights ship in July or pre-training code drops in H2, you update inference config — not rebuild a laptop environment.

For stable iOS CI/CD and AI Agent automation alongside openPangu or multi-model stacks, NodeMini's Mac Mini cloud rental is usually a better fit than personal hardware or API-only setups. See the help center for access setup and rental rates for M4 Pro and M4 Max tiers.

Frequently asked questions

Can openPangu 2.0 run without NVIDIA GPUs?

Yes. The entire training pipeline ran on Ascend 910B NPUs with no A100 or H100 involvement. Inference uses CANN and torch_npu for native Ascend deployment. Flash weights are live on GitCode for self-hosted Ascend inference today.

How does openPangu 2.0 compare to DeepSeek V4 Pro?

DeepSeek V4 Pro leads on coding and complex reasoning with ~200B active parameters versus openPangu Pro's 18B. openPangu 2.0 wins on 512K context (4x most rivals), 2x Ascend throughput, planned full training-pipeline open source, and sovereign AI compliance with zero NVIDIA dependency. Route by task: DeepSeek for hardest reasoning, openPangu for ultra-long documents and Ascend environments.

What hardware do I need to self-host openPangu 2.0 Flash?

Flash (6B active): single Ascend 910B or roughly 96GB unified memory. Flash-Int8: about 48GB VRAM on Ascend Atlas A2. Pro (18B active): multi-card Ascend 910B cluster when weights ship in July 2026. Compare infrastructure costs against NodeMini rental rates if you are splitting inference (Ascend) and Agent execution (cloud Mac).

How do I integrate openPangu 2.0 into an Agent development stack?

Use Huawei Cloud ModelArts for API access without hardware, or self-deploy from GitCode on Ascend NPUs. Pair 512K document Agents with a dedicated cloud Mac for iOS CI and long-session CLI Agents. Start with the help center for provisioning and the eight-step deployment guide above.
