Open-Source AI Model

Qwen 3

Developed by Alibaba Cloud (Qwen Team)


Key Capabilities

  • Hybrid thinking: toggle between deep reasoning and fast response
  • 119-language multilingual support
  • MoE variants for efficient large-scale inference
  • Strong math, coding, and reasoning benchmarks
  • Agentic tool use and MCP support
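Qwen 3's hybrid thinking can be toggled per turn with the model's soft switches, which append `/think` or `/no_think` to a user message. A minimal sketch of that routing, assuming those two tag names (check your model card for the exact switch strings):

```python
def with_thinking_switch(prompt: str, think: bool) -> str:
    """Append Qwen 3's soft switch to a user turn.

    Assumes the documented /think and /no_think tags; deep reasoning
    costs more tokens and latency, so reserve it for hard queries.
    """
    return f"{prompt} {'/think' if think else '/no_think'}"

# Route a math question to deep reasoning, small talk to fast response:
with_thinking_switch("Prove the sum of two odd numbers is even.", think=True)
with_thinking_switch("What's your name?", think=False)
```

When serving via Hugging Face `transformers`, the same toggle is also exposed as the `enable_thinking` argument to the chat template, which disables reasoning for the whole conversation rather than per turn.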

VRAM Requirements by Quantization

Choose the right GPU based on your performance and quality needs.

Model / Quantization     VRAM Required
8B FP16                  16GB
32B FP16                 64GB
32B Q4                   20GB
235B MoE FP16            470GB
235B MoE Q4              130GB
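The FP16 rows in the table follow directly from parameter count times bytes per weight (16 bits = 2 bytes per parameter). A rough weights-only estimate, as a sketch:

```python
def weight_vram_gb(params_billion: float, bits_per_weight: int) -> float:
    """Weights-only VRAM footprint in GB: parameters x bytes per weight.

    KV cache, activations, and quantization metadata add headroom on
    top of this, which is why the Q4 figures in the table run higher
    than the raw weight size.
    """
    return params_billion * bits_per_weight / 8

weight_vram_gb(32, 16)   # 64.0 GB, matching the 32B FP16 row
weight_vram_gb(235, 16)  # 470.0 GB, matching the 235B MoE FP16 row
```

For Q4, the raw weights of the 235B model come to about 117.5GB; the table's 130GB figure budgets the extra for runtime overhead.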

Use Cases

Qwen 3 is available in 0.6B, 1.7B, 4B, 8B, 14B, and 32B dense variants, plus 30B-A3B and 235B-A22B MoE variants. It can be deployed for enterprise AI applications including document processing, code generation, data analysis, and conversational AI. License: Apache 2.0.

Run Qwen 3 with Petronella

PTG deploys Qwen 3 for organizations that need multilingual AI (119 languages) with efficient MoE inference. The Apache 2.0 license permits commercial deployment and modification without usage restrictions.

Recommended Hardware

Model Size         Recommended GPU
8B                 RTX 5080 (16GB)
32B                RTX 5090 (32GB) or RTX PRO 5000 (48GB)
235B-A22B MoE      DGX Spark (128GB) or 2x RTX PRO 6000 (192GB)
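Combining the two tables, GPU selection reduces to picking the smallest card whose VRAM covers the model's quantized footprint. A sketch, assuming the per-card VRAM figures listed above (96GB per RTX PRO 6000):

```python
# Single-GPU options from the recommendation table above (VRAM in GB)
GPUS = [
    ("RTX 5080", 16),
    ("RTX 5090", 32),
    ("RTX PRO 5000", 48),
    ("RTX PRO 6000", 96),
]

def smallest_gpu(required_gb: float):
    """Return the smallest single GPU that fits the model's VRAM need,
    or None if the model requires multi-GPU or a larger system."""
    for name, vram in sorted(GPUS, key=lambda g: g[1]):
        if vram >= required_gb:
            return name
    return None

smallest_gpu(20)   # 32B Q4 -> "RTX 5090"
smallest_gpu(130)  # 235B MoE Q4 -> None: needs 2x RTX PRO 6000 or DGX Spark
```

The 235B-A22B MoE row returns None because no single card in the lineup holds 130GB, which is why the table recommends a dual-GPU or DGX Spark configuration.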

Deploy Qwen 3 On-Premises

Our team builds GPU-accelerated systems configured and optimized for Qwen 3. Private, secure, and fully under your control.