Qwen3-30B-A3B (MoE)

Mixture-of-Experts model — 30B total parameters, 3B active (128 experts), BF16. Validated on Ascend 910B3 at TP=2 and TP=4 (with expert parallel) using both vLLM-Ascend and MindIE.

Model identity

| Field | Value |
|---|---|
| Publisher | Qwen (Alibaba) |
| Architecture | Qwen3 MoE-A3B, 30B total / 3B active (128 experts) |
| Parameters | 30B (3B activated) |
| Native dtype | BF16 |
| Hugging Face | https://huggingface.co/Qwen/Qwen3-30B-A3B |
| ModelScope | https://www.modelscope.cn/models/Qwen/Qwen3-30B-A3B |
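As a rough sizing aid, the BF16 weight footprint follows from the figures above: 2 bytes per parameter times the nominal 30B total parameters. The sketch below assumes the nominal count is exact and that weights divide evenly across tensor-parallel ranks. It ignores KV cache, activations, and runtime overhead, and expert parallelism may place expert weights differently. The function name is illustrative, not part of any library.

```python
def weight_footprint_gb(total_params: float, bytes_per_param: int, tp: int) -> tuple:
    """Return (total_gb, per_device_gb) for model weights only.

    Assumption: weights shard evenly across `tp` devices. KV cache,
    activations, and runtime overhead are not included.
    """
    total_gb = total_params * bytes_per_param / 1e9
    return total_gb, total_gb / tp


# Nominal figures from the table above: 30B total parameters, BF16 (2 bytes each).
for tp in (2, 4):
    total_gb, per_device_gb = weight_footprint_gb(30e9, 2, tp)
    print(f"TP={tp}: ~{total_gb:.0f} GB total, ~{per_device_gb:.0f} GB per NPU")
```

All 30B parameters must be resident in device memory even though only about 3B are active per token, so the total count, not the active count, drives the weight footprint.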

Validated hardware × stack

| Platform | Engine | Version | Status |
|---|---|---|---|
| Ascend 910B3 × 2 / × 4 (TP=2/TP=4 + EP) | MindIE | 2.2.RC1 (ATB, auto BF16) | ✅ 4-scenario open-loop perf |
| Ascend 910B3 × 2 / × 4 (TP=2/TP=4 + EP) | vLLM-Ascend | v0.18.0-openeuler (EP + --enforce-eager) | ✅ 4-scenario open-loop perf |

NOTE

This model has also been validated as W8A8 with multi-node Prefill/Decode (PD) disaggregation on 910B4 (cross-node KV transfer) — a separate, more complex topology. The recipes on this page are the single-node aggregated deployments.

Deploy

Self-contained ServingRuntime + InferenceService YAMLs:

| Config | File |
|---|---|
| MindIE, TP=2 (rate-1 chat ITL P90 30.8 ms) | qwen3-30b-a3b-mindie-tp2.yaml |
| MindIE, TP=4 | qwen3-30b-a3b-mindie-tp4.yaml |
| vLLM-Ascend, TP=4 | qwen3-30b-a3b-vllm-ascend-tp4.yaml |

base=https://raw.githubusercontent.com/alauda/aml-docs/master/docs/en/inference_guide/assets/qwen3-30b-a3b
# edit namespace / image tag / storageUri first, then:
kubectl apply -f $base/qwen3-30b-a3b-mindie-tp2.yaml

WARNING

MindIE requires root and a writable model volume. Keep serving.kserve.io/readonly: "false" on the InferenceService and the root securityContext in the asset. See the Modelcar permission modes in Extend Inference Runtimes.

Benchmark results

Open-loop load test, measured per replica with replica=1. The table reports saturation capacity in RPS per replica for each workload. A value below 1 means the workload cannot sustain one request per second per replica on this hardware and stack.

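| Workload (prompt/output tokens) | MindIE TP=2 | MindIE TP=4 | vLLM TP=2 | vLLM TP=4 |
|---|---|---|---|---|
| Chat 512/256 | 5.90 | 6.48 | 2.68 | 2.69 |
| Code 1024/1024 | 1.16 | 1.71 | 0.13 | 0.12 |
| RAG 4096/512 | 1.07 | 1.69 | 0.66 | 1.24 |
| Long RAG 10240/1536 | 0.02 | 0.14 | 0 | 0 |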
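To make the capacity figures easier to act on, the following sketch normalizes them per NPU and estimates a replica count for a target load. It assumes throughput scales linearly with replica count, which this page does not validate (all runs use replica=1). The 20 RPS target and the helper names are purely illustrative.

```python
import math

# Saturation capacity (RPS/replica) copied from the table above.
CAPACITY = {
    ("MindIE", 2): {"chat": 5.90, "code": 1.16, "rag": 1.07, "long_rag": 0.02},
    ("MindIE", 4): {"chat": 6.48, "code": 1.71, "rag": 1.69, "long_rag": 0.14},
    ("vLLM-Ascend", 2): {"chat": 2.68, "code": 0.13, "rag": 0.66, "long_rag": 0.0},
    ("vLLM-Ascend", 4): {"chat": 2.69, "code": 0.12, "rag": 1.24, "long_rag": 0.0},
}


def capacity_per_npu(engine: str, tp: int, workload: str) -> float:
    """Capacity divided by the number of NPUs one replica occupies."""
    return CAPACITY[(engine, tp)][workload] / tp


def replicas_needed(target_rps: float, engine: str, tp: int, workload: str) -> int:
    """Replica count for a target load, assuming linear scaling across replicas."""
    cap = CAPACITY[(engine, tp)][workload]
    if cap <= 0:
        raise ValueError("no sustained throughput for this configuration")
    return math.ceil(target_rps / cap)


# Illustrative target: 20 RPS of Chat traffic on MindIE.
for tp in (2, 4):
    n = replicas_needed(20, "MindIE", tp, "chat")
    per_npu = capacity_per_npu("MindIE", tp, "chat")
    print(f"MindIE TP={tp}: {per_npu:.2f} RPS/NPU, {n} replicas ({n * tp} NPUs)")
```

With the table's numbers, MindIE Chat works out to 2.95 RPS per NPU at TP=2 versus 1.62 at TP=4, so the higher per-replica capacity at TP=4 comes at a lower per-NPU yield for this workload.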

Rate-1 latency snapshot, Chat workload (TP=2):

| Engine | Achieved RPS | TTFT p90 / mean (ms) | ITL p90 / mean (ms) | TPS mean (tok/s) |
|---|---|---|---|---|
| MindIE | 0.97 | 115 / 104 | 30.8 / 29.0 | 250 |
| vLLM-Ascend | 0.83 | 587 / 510 | 195.2 / 193.2 | 215 |
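The snapshot figures can be sanity-checked against the end-to-end latencies in the full sweep. For streamed generation, E2E latency is approximately TTFT plus one inter-token gap for each output token after the first. The sketch below applies that approximation to the mean values above with the 256-token Chat output. The function name is illustrative, and the approximation ignores variation in gaps between tokens.

```python
def estimate_e2e_ms(ttft_ms: float, itl_ms: float, output_tokens: int) -> float:
    """Approximate E2E latency: first token, then (N - 1) inter-token gaps."""
    return ttft_ms + (output_tokens - 1) * itl_ms


# (TTFT mean, ITL mean, measured E2E mean) in ms, rate 1, Chat 512/256, TP=2.
# The measured E2E means come from the rate-1 rows of the full sweep tables.
SNAPSHOT = {
    "MindIE": (104, 29.0, 7493),
    "vLLM-Ascend": (510, 193.2, 49771),
}

for engine, (ttft, itl, measured) in SNAPSHOT.items():
    estimate = estimate_e2e_ms(ttft, itl, 256)
    print(f"{engine}: estimated {estimate:.0f} ms, measured {measured} ms, "
          f"per-stream decode ~{1000 / itl:.1f} tok/s")
```

Both estimates land within about 0.1% of the measured E2E means, which suggests decode time (ITL), not prefill (TTFT), accounts for nearly all of the latency gap between the two engines on this workload.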

Full open-loop data (all 22 columns)

The sweep covers every rate level (1–9) for all four workloads, both engines, and TP=2/TP=4, reporting TTFT, E2E, ITL, and TPS at p90, p95, p99, and mean.

Full 22-column open-loop sweep — 910B3 × 2 · Qwen3-30B-A3B (MoE) · vllm-ascend v0.18.0-openeuler (EP+eager)

Chat 512/256 (capacity < 1 RPS/replica) — units: TTFT / E2E / ITL in ms, TPS in tok/s. * = past saturation (achieved RPS < target rate).

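| rate | replica | Total RPS | RPS/rep | Mean In | Mean Out | TTFT p90 | TTFT p95 | TTFT p99 | TTFT mean | E2E p90 | E2E p95 | E2E p99 | E2E mean | ITL p90 | ITL p95 | ITL p99 | ITL mean | TPS p90 | TPS p95 | TPS p99 | TPS mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1* | 1 | 1 | 1 | 520 | 256 | 587 | 606 | 633 | 510 | 50,263 | 50,486 | 50,624 | 49,771 | 195.2 | 195.8 | 196.4 | 193.2 | 446.9 | 619.7 | 1,192.2 | 214.7 |
| 2* | 1 | 2 | 2 | 520 | 256 | 594 | 608 | 638 | 513 | 50,271 | 50,339 | 50,460 | 49,857 | 195.1 | 195.3 | 195.6 | 193.5 | 918.0 | 1,279.5 | 2,449.9 | 426.7 |
| 3* | 1 | 3 | 3 | 520 | 256 | 883 | 906 | 954 | 721 | 75,376 | 75,449 | 75,568 | 72,377 | 292.4 | 292.8 | 293.1 | 281.0 | 1,220.7 | 1,692.6 | 3,160.7 | 576.6 |
| 4* | 1 | 4 | 4 | 520 | 256 | 23,826 | 23,927 | 24,059 | 9,849 | 99,240 | 99,330 | 99,433 | 83,105 | 295.7 | 295.9 | 296.5 | 287.3 | 1,583.4 | 2,290.7 | 4,445.5 | 686.3 |
| 5* | 1 | 5 | 5 | 520 | 256 | 51,640 | 51,932 | 59,937 | 24,236 | 127,980 | 128,316 | 136,019 | 98,731 | 299.8 | 300.0 | 300.3 | 292.1 | 1,504.4 | 2,163.1 | 4,169.3 | 683.0 |
| 6* | 1 | 6 | 6 | 520 | 256 | 33,556 | 33,624 | 33,731 | 16,269 | 107,807 | 107,881 | 107,962 | 90,077 | 293.3 | 293.6 | 293.8 | 289.4 | 1,539.0 | 2,223.9 | 4,341.9 | 680.3 |
| 7* | 1 | 7 | 7 | 520 | 256 | 846 | 894 | 931 | 720 | 74,545 | 74,575 | 74,686 | 73,668 | 289.1 | 289.2 | 289.4 | 286.1 | 994.4 | 1,627.6 | 3,777.0 | 362.5 |
| 8* | 1 | 8 | 8 | 520 | 256 | 878 | 913 | 943 | 746 | 74,749 | 74,784 | 74,878 | 74,403 | 289.8 | 289.9 | 290.0 | 288.8 | 1,559.8 | 2,730.7 | 6,610.4 | 615.5 |
| 9* | 1 | 9 | 9 | 520 | 256 | 860 | 882 | 919 | 741 | 73,841 | 73,897 | 73,959 | 73,337 | 286.4 | 286.6 | 286.7 | 284.7 | 982.3 | 1,901.3 | 4,888.5 | 362.7 |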

Code 1024/1024 (capacity < 1 RPS/replica) — units: TTFT / E2E / ITL in ms, TPS in tok/s. * = past saturation (achieved RPS < target rate).

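| rate | replica | Total RPS | RPS/rep | Mean In | Mean Out | TTFT p90 | TTFT p95 | TTFT p99 | TTFT mean | E2E p90 | E2E p95 | E2E p99 | E2E mean | ITL p90 | ITL p95 | ITL p99 | ITL mean | TPS p90 | TPS p95 | TPS p99 | TPS mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1* | 1 | 1 | 1 | 1032 | 1024 | 680 | 690 | 691 | 585 | 259,677 | 260,236 | 260,690 | 253,700 | 253.2 | 253.9 | 254.2 | 247.4 | 254.7 | 341.6 | 639.3 | 133.9 |
| 2* | 1 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3* | 1 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4* | 1 | 4 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5* | 1 | 5 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6* | 1 | 6 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7* | 1 | 7 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8* | 1 | 8 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9* | 1 | 9 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |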

RAG 4096/512 (capacity < 1 RPS/replica) — units: TTFT / E2E / ITL in ms, TPS in tok/s. * = past saturation (achieved RPS < target rate).

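| rate | replica | Total RPS | RPS/rep | Mean In | Mean Out | TTFT p90 | TTFT p95 | TTFT p99 | TTFT mean | E2E p90 | E2E p95 | E2E p99 | E2E mean | ITL p90 | ITL p95 | ITL p99 | ITL mean | TPS p90 | TPS p95 | TPS p99 | TPS mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1* | 1 | 1 | 1 | 4104 | 512 | 1,114 | 1,150 | 1,264 | 1,003 | 113,578 | 113,639 | 113,723 | 112,403 | 220.1 | 220.3 | 220.5 | 218.0 | 668.0 | 914.8 | 1,710.9 | 319.6 |
| 2* | 1 | 2 | 2 | 4104 | 512 | 1,095 | 1,124 | 1,164 | 988 | 113,069 | 113,135 | 121,106 | 112,836 | 219.3 | 219.4 | 235.0 | 218.9 | 751.1 | 1,029.5 | 1,933.8 | 344.8 |
| 3* | 1 | 3 | 3 | 4104 | 512 | 1,327 | 1,327 | 1,327 | 1,118 | 111,138 | 111,138 | 111,138 | 110,866 | 214.9 | 214.9 | 214.9 | 214.8 | 44.9 | 59.2 | 81.2 | 27.4 |
| 4* | 1 | 4 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5* | 1 | 5 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6* | 1 | 6 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7* | 1 | 7 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8* | 1 | 8 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9* | 1 | 9 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |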

Long RAG 10240/1536 (capacity < 1 RPS/replica) — capacity 0: all 9 rate levels errored (requests queue past the TTFT timeout). No sustained throughput.

Full 22-column open-loop sweep — 910B3 × 2 · Qwen3-30B-A3B (MoE) · MindIE 2.2.RC1

Chat 512/256 — units: TTFT / E2E / ITL in ms, TPS in tok/s. * = past saturation (achieved RPS < target rate).

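| rate | replica | Total RPS | RPS/rep | Mean In | Mean Out | TTFT p90 | TTFT p95 | TTFT p99 | TTFT mean | E2E p90 | E2E p95 | E2E p99 | E2E mean | ITL p90 | ITL p95 | ITL p99 | ITL mean | TPS p90 | TPS p95 | TPS p99 | TPS mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 520 | 256 | 115 | 121 | 211 | 104 | 7,987 | 8,321 | 8,740 | 7,493 | 30.8 | 32.2 | 33.8 | 29.0 | 433.4 | 567.4 | 1,024.8 | 249.7 |
| 2 | 1 | 2 | 2 | 520 | 256 | 120 | 122 | 128 | 106 | 11,189 | 11,355 | 11,868 | 10,775 | 43.4 | 44.1 | 46.1 | 41.8 | 922.2 | 1,237.3 | 2,259.9 | 494.5 |
| 3 | 1 | 3 | 3 | 520 | 256 | 128 | 131 | 138 | 110 | 14,937 | 15,003 | 15,154 | 14,680 | 58.1 | 58.4 | 59.0 | 57.1 | 1,442.3 | 1,962.7 | 3,560.5 | 730.2 |
| 4 | 1 | 4 | 4 | 520 | 256 | 140 | 142 | 148 | 118 | 20,797 | 20,811 | 20,907 | 20,463 | 81.1 | 81.1 | 81.5 | 79.8 | 2,034.6 | 2,826.4 | 5,200.6 | 954.4 |
| 5 | 1 | 5 | 5 | 520 | 256 | 150 | 155 | 161 | 123 | 28,816 | 28,851 | 29,006 | 27,948 | 112.5 | 112.6 | 113.2 | 109.1 | 2,300.8 | 3,093.1 | 5,377.3 | 1,159.8 |
| 6* | 1 | 6 | 6 | 520 | 256 | 166 | 171 | 182 | 133 | 83,254 | 83,674 | 84,376 | 58,071 | 326.1 | 327.5 | 330.3 | 227.2 | 2,216.9 | 2,962.1 | 5,130.6 | 1,109.9 |
| 7* | 1 | 7 | 7 | 520 | 256 | 1,580 | 2,216 | 3,417 | 582 | 73,924 | 74,667 | 76,623 | 68,379 | 288.5 | 292.3 | 300.0 | 265.9 | 2,809.3 | 4,148.7 | 9,104.9 | 1,366.0 |
| 8* | 1 | 8 | 8 | 520 | 256 | 2,221 | 3,051 | 4,035 | 865 | 72,612 | 73,017 | 73,606 | 67,506 | 284.3 | 285.8 | 288.0 | 261.3 | 3,259.0 | 5,053.4 | 11,726.9 | 1,504.1 |
| 9* | 1 | 9 | 9 | 520 | 256 | 2,480 | 3,376 | 5,083 | 934 | 74,598 | 74,979 | 75,381 | 68,848 | 292.1 | 293.5 | 295.1 | 266.3 | 3,287.1 | 5,053.4 | 11,275.0 | 1,508.0 |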
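
The MindIE TP=2 Chat rows above also give a rough view of concurrency. By Little's law, the mean number of in-flight requests equals the arrival rate times the mean time each request spends in the system. The sketch below applies this to the pre-saturation rows (rates 1 to 5), taking the target rate as the arrival rate. That is an assumption that holds only while achieved RPS keeps up with the target, and the function name is illustrative.

```python
def in_flight_requests(arrival_rps: float, e2e_mean_ms: float) -> float:
    """Little's law: L = lambda * W, with W converted from ms to seconds."""
    return arrival_rps * e2e_mean_ms / 1000.0


# (target rate in RPS, E2E mean in ms) for MindIE TP=2, Chat 512/256, rates 1-5.
ROWS = [(1, 7493), (2, 10775), (3, 14680), (4, 20463), (5, 27948)]

for rate, e2e_mean_ms in ROWS:
    concurrent = in_flight_requests(rate, e2e_mean_ms)
    print(f"rate {rate}: ~{concurrent:.0f} requests in flight")
```

At rate 5 this works out to roughly 140 requests in flight at once. The growing decode batch is a plausible reason mean ITL rises from 29.0 ms at rate 1 to 109.1 ms at rate 5, though the page does not report batch sizes directly.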

Code 1024/1024 (capacity < 1 RPS/replica) — units: TTFT / E2E / ITL in ms, TPS in tok/s. * = past saturation (achieved RPS < target rate).

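| rate | replica | Total RPS | RPS/rep | Mean In | Mean Out | TTFT p90 | TTFT p95 | TTFT p99 | TTFT mean | E2E p90 | E2E p95 | E2E p99 | E2E mean | ITL p90 | ITL p95 | ITL p99 | ITL mean | TPS p90 | TPS p95 | TPS p99 | TPS mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1* | 1 | 1 | 1 | 1032 | 1024 | 171 | 174 | 195 | 149 | 67,941 | 68,012 | 68,275 | 63,188 | 66.3 | 66.3 | 66.6 | 61.6 | 1,561.7 | 2,088.7 | 3,706.9 | 793.2 |
| 2* | 1 | 2 | 2 | 1032 | 1024 | 206 | 446 | 1,842 | 219 | 124,495 | 125,095 | 126,046 | 110,642 | 121.3 | 121.7 | 122.1 | 107.9 | 2,487.0 | 3,344.7 | 5,809.3 | 1,193.9 |
| 3* | 1 | 3 | 3 | 1032 | 1024 | 196 | 206 | 225 | 156 | 133,332 | 133,843 | 134,927 | 116,007 | 130.1 | 130.7 | 131.7 | 113.2 | 2,458.6 | 3,338.6 | 5,847.4 | 1,136.6 |
| 4* | 1 | 4 | 4 | 1032 | 1024 | 173 | 179 | 193 | 150 | 117,735 | 117,926 | 118,148 | 113,941 | 114.9 | 115.1 | 115.4 | 111.2 | 1,901.3 | 2,559.4 | 4,476.3 | 941.3 |
| 5* | 1 | 5 | 5 | 1032 | 1024 | 153 | 154 | 154 | 138 | 115,931 | 116,024 | 116,024 | 115,189 | 113.2 | 113.3 | 113.3 | 112.5 | 266.4 | 352.4 | 623.8 | 154.5 |
| 6* | 1 | 6 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7* | 1 | 7 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8* | 1 | 8 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9* | 1 | 9 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |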

RAG 4096/512 — units: TTFT / E2E / ITL in ms, TPS in tok/s. * = past saturation (achieved RPS < target rate).

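| rate | replica | Total RPS | RPS/rep | Mean In | Mean Out | TTFT p90 | TTFT p95 | TTFT p99 | TTFT mean | E2E p90 | E2E p95 | E2E p99 | E2E mean | ITL p90 | ITL p95 | ITL p99 | ITL mean | TPS p90 | TPS p95 | TPS p99 | TPS mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 4104 | 512 | 342 | 344 | 353 | 321 | 36,934 | 36,977 | 37,012 | 35,559 | 71.6 | 71.7 | 71.8 | 69.0 | 851.8 | 1,160.3 | 2,138.3 | 452.2 |
| 2* | 1 | 2 | 2 | 4104 | 512 | 36,932 | 37,407 | 46,715 | 18,209 | 94,430 | 94,902 | 104,125 | 75,324 | 114.3 | 114.8 | 115.3 | 111.8 | 1,158.4 | 1,736.1 | 4,211.1 | 556.5 |
| 3* | 1 | 3 | 3 | 4104 | 512 | 357 | 360 | 31,980 | 1,467 | 60,538 | 60,709 | 90,946 | 60,881 | 117.7 | 117.9 | 118.4 | 116.2 | 837.0 | 1,159.0 | 2,171.0 | 354.9 |
| 4* | 1 | 4 | 4 | 4104 | 512 | 535 | 756 | 758 | 490 | 58,697 | 59,699 | 60,452 | 50,958 | 114.2 | 116.2 | 117.7 | 98.8 | 1,296.9 | 1,773.5 | 3,219.0 | 655.3 |
| 5* | 1 | 5 | 5 | 4104 | 512 | 2,597 | 2,757 | 3,008 | 1,714 | 58,122 | 58,926 | 59,525 | 51,933 | 112.5 | 114.4 | 115.9 | 98.3 | 1,441.3 | 2,097.2 | 4,241.0 | 665.5 |
| 6* | 1 | 6 | 6 | 4104 | 512 | 4,817 | 5,014 | 5,366 | 2,883 | 58,316 | 58,984 | 59,486 | 53,161 | 112.1 | 114.4 | 115.8 | 98.4 | 1,443.8 | 2,029.2 | 3,848.0 | 666.0 |
| 7* | 1 | 7 | 7 | 4104 | 512 | 6,416 | 6,703 | 7,209 | 3,771 | 58,034 | 58,604 | 59,035 | 53,613 | 110.8 | 113.5 | 114.9 | 97.5 | 1,469.1 | 2,130.2 | 4,262.5 | 671.1 |
| 8* | 1 | 8 | 8 | 4104 | 512 | 7,629 | 8,067 | 8,500 | 4,458 | 58,536 | 59,040 | 59,415 | 54,673 | 112.1 | 113.4 | 115.7 | 98.3 | 1,479.2 | 2,134.5 | 4,104.0 | 666.8 |
| 9* | 1 | 9 | 9 | 4104 | 512 | 8,557 | 9,018 | 9,476 | 4,951 | 57,987 | 58,432 | 58,765 | 54,555 | 110.8 | 112.2 | 114.4 | 97.1 | 1,481.9 | 2,071.3 | 3,968.1 | 674.1 |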

Long RAG 10240/1536 (capacity < 1 RPS/replica) — units: TTFT / E2E / ITL in ms, TPS in tok/s. * = past saturation (achieved RPS < target rate).

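| rate | replica | Total RPS | RPS/rep | Mean In | Mean Out | TTFT p90 | TTFT p95 | TTFT p99 | TTFT mean | E2E p90 | E2E p95 | E2E p99 | E2E mean | ITL p90 | ITL p95 | ITL p99 | ITL mean | TPS p90 | TPS p95 | TPS p99 | TPS mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1* | 1 | 1 | 1 | 10248 | 1536 | 772 | 772 | 772 | 760 | 119,253 | 119,253 | 119,253 | 119,211 | 77.2 | 77.2 | 77.2 | 77.2 | 25.9 | 25.9 | 25.9 | 25.7 |
| 2* | 1 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3* | 1 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4* | 1 | 4 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5* | 1 | 5 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6* | 1 | 6 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7* | 1 | 7 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8* | 1 | 8 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9* | 1 | 9 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

Full 22-column open-loop sweep — 910B3 × 4 · Qwen3-30B-A3B (MoE) · vllm-ascend v0.18.0-openeuler (EP+eager)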

Chat 512/256 (capacity < 1 RPS/replica) — units: TTFT / E2E / ITL in ms, TPS in tok/s. * = past saturation (achieved RPS < target rate).

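| rate | replica | Total RPS | RPS/rep | Mean In | Mean Out | TTFT p90 | TTFT p95 | TTFT p99 | TTFT mean | E2E p90 | E2E p95 | E2E p99 | E2E mean | ITL p90 | ITL p95 | ITL p99 | ITL mean | TPS p90 | TPS p95 | TPS p99 | TPS mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1* | 1 | 1 | 1 | 520 | 256 | 598 | 614 | 639 | 510 | 50,724 | 50,892 | 51,050 | 50,123 | 196.9 | 197.4 | 198.2 | 194.6 | 436.0 | 601.4 | 1,122.7 | 214.3 |
| 2* | 1 | 2 | 2 | 520 | 256 | 590 | 610 | 646 | 513 | 50,437 | 50,491 | 50,576 | 50,093 | 195.8 | 196.0 | 196.2 | 194.4 | 896.2 | 1,267.2 | 2,465.8 | 426.2 |
| 3* | 1 | 3 | 3 | 520 | 256 | 874 | 897 | 949 | 714 | 75,338 | 75,542 | 76,015 | 71,897 | 292.3 | 293.3 | 294.9 | 279.1 | 1,247.9 | 1,748.0 | 3,320.9 | 574.1 |
| 4* | 1 | 4 | 4 | 520 | 256 | 22,564 | 22,849 | 23,067 | 9,149 | 97,950 | 98,149 | 98,338 | 81,977 | 295.7 | 295.9 | 296.2 | 285.6 | 1,553.7 | 2,223.9 | 4,284.3 | 689.9 |
| 5* | 1 | 5 | 5 | 520 | 256 | 48,963 | 49,159 | 59,766 | 23,222 | 124,432 | 124,804 | 135,866 | 96,727 | 295.8 | 297.1 | 298.4 | 288.2 | 1,557.5 | 2,232.2 | 4,350.9 | 690.2 |
| 6* | 1 | 6 | 6 | 520 | 256 | 33,290 | 33,358 | 33,456 | 15,773 | 107,308 | 107,385 | 107,463 | 88,985 | 292.3 | 292.5 | 292.6 | 287.1 | 1,495.3 | 2,143.2 | 4,140.5 | 681.8 |
| 7* | 1 | 7 | 7 | 520 | 256 | 887 | 907 | 976 | 753 | 74,655 | 74,758 | 74,826 | 74,010 | 289.8 | 289.9 | 290.0 | 287.3 | 1,400.9 | 2,100.8 | 4,204.8 | 590.0 |
| 8* | 1 | 8 | 8 | 520 | 256 | 900 | 916 | 970 | 764 | 74,423 | 74,451 | 74,494 | 74,127 | 288.5 | 288.6 | 288.6 | 287.7 | 1,002.0 | 1,650.7 | 3,782.1 | 360.9 |
| 9* | 1 | 9 | 9 | 520 | 256 | 884 | 915 | 962 | 762 | 75,017 | 75,058 | 75,121 | 74,478 | 290.9 | 290.9 | 291.0 | 289.1 | 1,536.9 | 2,392.6 | 5,102.6 | 635.7 |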

Code 1024/1024 (capacity < 1 RPS/replica) — units: TTFT / E2E / ITL in ms, TPS in tok/s. * = past saturation (achieved RPS < target rate).

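| rate | replica | Total RPS | RPS/rep | Mean In | Mean Out | TTFT p90 | TTFT p95 | TTFT p99 | TTFT mean | E2E p90 | E2E p95 | E2E p99 | E2E mean | ITL p90 | ITL p95 | ITL p99 | ITL mean | TPS p90 | TPS p95 | TPS p99 | TPS mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1* | 1 | 1 | 1 | 1032 | 1024 | 687 | 716 | 758 | 592 | 261,870 | 262,405 | 262,776 | 256,676 | 255.4 | 256.0 | 256.3 | 250.3 | 235.1 | 316.9 | 584.3 | 123.9 |
| 2* | 1 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3* | 1 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4* | 1 | 4 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5* | 1 | 5 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6* | 1 | 6 | 6 | 1032 | 1024 | 883 | 883 | 883 | 883 | 186,360 | 186,360 | 186,360 | 186,360 | 181.3 | 181.3 | 181.3 | 181.3 | 5.5 | 5.5 | 5.5 | 5.5 |
| 7* | 1 | 7 | 7 | 1032 | 1024 | 944 | 944 | 944 | 944 | 186,188 | 186,188 | 186,188 | 186,188 | 181.1 | 181.1 | 181.1 | 181.1 | 5.5 | 5.5 | 5.5 | 5.5 |
| 8* | 1 | 8 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9* | 1 | 9 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |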

RAG 4096/512 (capacity < 1 RPS/replica) — units: TTFT / E2E / ITL in ms, TPS in tok/s. * = past saturation (achieved RPS < target rate).

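| rate | replica | Total RPS | RPS/rep | Mean In | Mean Out | TTFT p90 | TTFT p95 | TTFT p99 | TTFT mean | E2E p90 | E2E p95 | E2E p99 | E2E mean | ITL p90 | ITL p95 | ITL p99 | ITL mean | TPS p90 | TPS p95 | TPS p99 | TPS mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1* | 1 | 1 | 1 | 4104 | 512 | 1,120 | 1,172 | 1,329 | 1,010 | 113,572 | 113,636 | 113,755 | 112,093 | 220.3 | 220.3 | 220.5 | 217.4 | 660.7 | 903.2 | 1,688.5 | 321.3 |
| 2* | 1 | 2 | 2 | 4104 | 512 | 1,163 | 1,191 | 1,271 | 1,052 | 113,131 | 113,163 | 113,227 | 112,576 | 219.2 | 219.3 | 219.4 | 218.2 | 1,391.5 | 1,938.9 | 3,646.6 | 639.9 |
| 3* | 1 | 3 | 3 | 4104 | 512 | 24,790 | 26,757 | 28,182 | 12,671 | 133,221 | 135,206 | 136,637 | 119,699 | 212.2 | 212.2 | 212.3 | 209.4 | 1,296.9 | 1,797.4 | 3,360.8 | 601.9 |
| 4* | 1 | 4 | 4 | 4104 | 512 | 11,696 | 12,269 | 12,903 | 6,657 | 118,269 | 118,963 | 119,721 | 112,638 | 208.6 | 208.8 | 209.0 | 207.4 | 580.9 | 831.5 | 1,664.4 | 285.6 |
| 5* | 1 | 5 | 5 | 4104 | 512 | 3,832 | 4,042 | 4,254 | 2,360 | 108,559 | 108,822 | 109,022 | 106,352 | 204.9 | 205.0 | 205.0 | 203.5 | 121.3 | 187.8 | 401.9 | 49.7 |
| 6* | 1 | 6 | 6 | 4104 | 512 | 1,121 | 1,121 | 1,121 | 1,121 | 93,526 | 93,526 | 93,526 | 93,526 | 180.8 | 180.8 | 180.8 | 180.8 | 5.5 | 5.5 | 5.5 | 5.5 |
| 7* | 1 | 7 | 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 8* | 1 | 8 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9* | 1 | 9 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |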

Long RAG 10240/1536 (capacity < 1 RPS/replica) — capacity 0: all 9 rate levels errored (requests queue past the TTFT timeout). No sustained throughput.

Full 22-column open-loop sweep — 910B3 × 4 · Qwen3-30B-A3B (MoE) · MindIE 2.2.RC1

Chat 512/256 — units: TTFT / E2E / ITL in ms, TPS in tok/s. * = past saturation (achieved RPS < target rate).

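| rate | replica | Total RPS | RPS/rep | Mean In | Mean Out | TTFT p90 | TTFT p95 | TTFT p99 | TTFT mean | E2E p90 | E2E p95 | E2E p99 | E2E mean | ITL p90 | ITL p95 | ITL p99 | ITL mean | TPS p90 | TPS p95 | TPS p99 | TPS mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 520 | 256 | 115 | 146 | 234 | 94 | 10,248 | 10,335 | 10,539 | 9,169 | 39.8 | 40.1 | 41.0 | 35.6 | 433.3 | 568.9 | 1,033.1 | 248.4 |
| 2 | 1 | 2 | 2 | 520 | 256 | 101 | 108 | 132 | 87 | 11,180 | 11,307 | 11,505 | 9,874 | 43.4 | 44.0 | 44.8 | 38.4 | 903.4 | 1,199.7 | 2,137.8 | 497.6 |
| 3 | 1 | 3 | 3 | 520 | 256 | 105 | 112 | 162 | 89 | 12,980 | 13,192 | 13,594 | 11,785 | 50.5 | 51.5 | 52.9 | 45.9 | 1,368.9 | 1,817.3 | 3,193.2 | 737.7 |
| 4 | 1 | 4 | 4 | 520 | 256 | 107 | 112 | 145 | 89 | 15,199 | 15,666 | 15,779 | 13,890 | 59.3 | 61.1 | 61.5 | 54.1 | 1,834.8 | 2,437.1 | 4,221.7 | 979.2 |
| 5 | 1 | 5 | 5 | 520 | 256 | 110 | 114 | 134 | 92 | 17,006 | 17,136 | 17,383 | 16,456 | 66.3 | 66.8 | 67.8 | 64.2 | 2,347.8 | 3,137.1 | 5,398.1 | 1,210.2 |
| 6 | 1 | 6 | 6 | 520 | 256 | 121 | 128 | 216 | 99 | 21,815 | 22,390 | 23,040 | 20,760 | 85.1 | 87.5 | 90.0 | 81.0 | 2,774.0 | 3,685.7 | 6,269.5 | 1,431.9 |
| 7 | 1 | 7 | 7 | 520 | 256 | 125 | 131 | 172 | 101 | 27,191 | 27,602 | 28,048 | 25,442 | 106.2 | 107.8 | 109.6 | 99.4 | 3,164.3 | 4,177.6 | 6,990.5 | 1,635.2 |
| 8* | 1 | 8 | 8 | 520 | 256 | 135 | 148 | 199 | 111 | 67,578 | 67,939 | 68,496 | 54,263 | 264.6 | 266.0 | 268.2 | 212.4 | 3,156.0 | 4,211.1 | 7,206.7 | 1,583.5 |
| 9* | 1 | 9 | 9 | 520 | 256 | 148 | 156 | 186 | 120 | 65,754 | 66,780 | 67,058 | 60,393 | 257.4 | 261.4 | 262.6 | 236.4 | 3,368.9 | 4,559.0 | 8,004.4 | 1,659.7 |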

Code 1024/1024 — units: TTFT / E2E / ITL in ms, TPS in tok/s. * = past saturation (achieved RPS < target rate).

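| rate | replica | Total RPS | RPS/rep | Mean In | Mean Out | TTFT p90 | TTFT p95 | TTFT p99 | TTFT mean | E2E p90 | E2E p95 | E2E p99 | E2E mean | ITL p90 | ITL p95 | ITL p99 | ITL mean | TPS p90 | TPS p95 | TPS p99 | TPS mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1032 | 1024 | 135 | 140 | 266 | 117 | 46,186 | 46,355 | 46,474 | 44,522 | 45.0 | 45.2 | 45.3 | 43.4 | 1,691.3 | 2,267.0 | 3,975.9 | 874.8 |
| 2* | 1 | 2 | 2 | 1032 | 1024 | 147 | 153 | 198 | 122 | 76,956 | 76,995 | 77,032 | 72,959 | 75.1 | 75.1 | 75.2 | 71.2 | 3,080.0 | 4,096.0 | 6,887.2 | 1,525.5 |
| 3* | 1 | 3 | 3 | 1032 | 1024 | 158 | 165 | 196 | 127 | 187,558 | 187,711 | 187,750 | 114,584 | 183.2 | 183.4 | 183.4 | 111.9 | 2,621.4 | 3,625.6 | 6,459.1 | 1,151.7 |
| 4* | 1 | 4 | 4 | 1032 | 1024 | 154 | 161 | 196 | 125 | 205,162 | 209,423 | 212,868 | 161,776 | 200.5 | 204.6 | 208.0 | 158.0 | 2,504.1 | 3,394.6 | 5,942.9 | 1,182.3 |
| 5* | 1 | 5 | 5 | 1032 | 1024 | 155 | 161 | 184 | 123 | 219,823 | 222,283 | 226,664 | 180,215 | 214.8 | 217.1 | 221.5 | 176.0 | 2,659.1 | 3,599.2 | 6,267.7 | 1,248.1 |
| 6* | 1 | 6 | 6 | 1032 | 1024 | 151 | 156 | 181 | 122 | 224,514 | 232,688 | 234,988 | 187,325 | 219.3 | 227.4 | 229.6 | 183.0 | 2,818.6 | 3,813.5 | 6,589.0 | 1,324.9 |
| 7* | 1 | 7 | 7 | 1032 | 1024 | 152 | 159 | 189 | 123 | 238,503 | 240,573 | 241,737 | 198,131 | 233.0 | 235.0 | 236.2 | 193.6 | 2,933.1 | 3,953.3 | 6,813.4 | 1,393.0 |
| 8* | 1 | 8 | 8 | 1032 | 1024 | 158 | 182 | 455 | 135 | 243,807 | 244,264 | 244,630 | 207,207 | 238.2 | 238.6 | 239.0 | 202.4 | 3,153.6 | 4,227.9 | 7,194.3 | 1,516.2 |
| 9* | 1 | 9 | 9 | 1032 | 1024 | 150 | 155 | 188 | 125 | 242,101 | 242,203 | 242,339 | 211,232 | 236.5 | 236.6 | 236.8 | 206.4 | 3,271.7 | 4,428.0 | 7,623.4 | 1,423.0 |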

RAG 4096/512 — units: TTFT / E2E / ITL in ms, TPS in tok/s. * = past saturation (achieved RPS < target rate).

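| rate | replica | Total RPS | RPS/rep | Mean In | Mean Out | TTFT p90 | TTFT p95 | TTFT p99 | TTFT mean | E2E p90 | E2E p95 | E2E p99 | E2E mean | ITL p90 | ITL p95 | ITL p99 | ITL mean | TPS p90 | TPS p95 | TPS p99 | TPS mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 4104 | 512 | 256 | 262 | 274 | 239 | 23,982 | 24,299 | 24,435 | 22,520 | 46.4 | 47.1 | 47.4 | 43.6 | 874.0 | 1,164.0 | 2,088.8 | 473.8 |
| 2* | 1 | 2 | 2 | 4104 | 512 | 268 | 274 | 292 | 244 | 55,037 | 55,345 | 55,494 | 50,869 | 107.2 | 107.9 | 108.1 | 99.1 | 1,646.1 | 2,211.7 | 3,887.2 | 837.9 |
| 3* | 1 | 3 | 3 | 4104 | 512 | 459 | 715 | 7,806 | 413 | 123,457 | 123,631 | 127,791 | 108,562 | 240.5 | 240.9 | 241.4 | 211.6 | 1,818.1 | 2,464.3 | 4,378.2 | 870.6 |
| 4* | 1 | 4 | 4 | 4104 | 512 | 21,857 | 22,294 | 22,617 | 5,192 | 142,170 | 144,016 | 146,093 | 117,326 | 240.6 | 242.2 | 247.0 | 219.4 | 1,771.6 | 2,441.4 | 4,490.7 | 829.5 |
| 5* | 1 | 5 | 5 | 4104 | 512 | 278 | 287 | 316 | 251 | 118,977 | 119,424 | 120,140 | 108,327 | 232.3 | 233.2 | 234.6 | 211.5 | 1,190.2 | 1,636.5 | 3,030.6 | 546.6 |
| 6* | 1 | 6 | 6 | 4104 | 512 | 540 | 545 | 554 | 425 | 98,071 | 98,722 | 99,383 | 92,079 | 191.2 | 192.5 | 193.9 | 179.3 | 885.6 | 1,199.1 | 2,240.5 | 459.5 |
| 7* | 1 | 7 | 7 | 4104 | 512 | 1,450 | 1,531 | 1,664 | 1,008 | 94,012 | 94,727 | 95,298 | 88,159 | 182.8 | 184.7 | 186.1 | 170.5 | 819.5 | 1,162.8 | 2,328.9 | 346.9 |
| 8* | 1 | 8 | 8 | 4104 | 512 | 2,825 | 2,951 | 3,164 | 1,740 | 94,203 | 94,631 | 95,115 | 89,652 | 183.0 | 184.4 | 185.7 | 172.0 | 972.7 | 1,345.6 | 2,563.8 | 478.8 |
| 9* | 1 | 9 | 9 | 4104 | 512 | 3,935 | 4,158 | 4,425 | 2,351 | 94,260 | 94,663 | 95,114 | 90,226 | 182.9 | 183.8 | 185.7 | 172.0 | 958.7 | 1,334.5 | 2,641.2 | 484.0 |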

Long RAG 10240/1536 (capacity < 1 RPS/replica) — units: TTFT / E2E / ITL in ms, TPS in tok/s. * = past saturation (achieved RPS < target rate).

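| rate | replica | Total RPS | RPS/rep | Mean In | Mean Out | TTFT p90 | TTFT p95 | TTFT p99 | TTFT mean | E2E p90 | E2E p95 | E2E p99 | E2E mean | ITL p90 | ITL p95 | ITL p99 | ITL mean | TPS p90 | TPS p95 | TPS p99 | TPS mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1* | 1 | 1 | 1 | 10248 | 1536 | 566 | 569 | 576 | 539 | 187,723 | 188,821 | 190,020 | 173,963 | 122.0 | 122.7 | 123.4 | 113.0 | 410.7 | 552.6 | 1,012.0 | 210.9 |
| 2* | 1 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3* | 1 | 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4* | 1 | 4 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5* | 1 | 5 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6* | 1 | 6 | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7* | 1 | 7 | 7 | 10248 | 1536 | 1,149 | 1,149 | 1,149 | 1,149 | 48,718 | 48,718 | 48,718 | 48,718 | 31.0 | 31.0 | 31.0 | 31.0 | 32.3 | 32.3 | 32.3 | 32.3 |
| 8* | 1 | 8 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9* | 1 | 9 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |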