Open-Source AI Model
Qwen 3
Developed by Alibaba Cloud (Qwen Team)
Local AI Deployment Experts
24+ Years IT Infrastructure
GPU Hardware In Stock
Key Capabilities
- Hybrid thinking: toggle between deep reasoning and fast response
- 119-language multilingual support
- MoE variants for efficient large-scale inference
- Strong math, coding, and reasoning benchmarks
- Agentic tool use and MCP support
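The hybrid-thinking toggle is typically exposed as a chat-template flag when Qwen 3 is served behind an OpenAI-compatible endpoint. A minimal sketch of building such a request payload — the `chat_template_kwargs.enable_thinking` field follows vLLM's convention for Qwen 3 and may differ on other serving stacks:

```python
import json

def build_chat_request(prompt: str, thinking: bool) -> dict:
    """Build an OpenAI-compatible chat payload for a Qwen 3 server.

    The chat_template_kwargs.enable_thinking flag follows vLLM's
    convention for toggling Qwen 3's deep-reasoning mode; other
    serving stacks may expose a different switch.
    """
    return {
        "model": "Qwen/Qwen3-32B",
        "messages": [{"role": "user", "content": prompt}],
        # True = deep reasoning; False = fast response (skips the think phase)
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

payload = build_chat_request("Summarize this contract clause.", thinking=True)
print(json.dumps(payload, indent=2))
```

The same payload works for both modes; only the flag changes, so applications can switch per request between deep reasoning and low-latency replies.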
VRAM Requirements by Quantization
Choose the right GPU based on your performance and quality needs.
| Model / Quantization | VRAM Required |
|---|---|
| 8B FP16 | 16GB |
| 32B FP16 | 64GB |
| 32B Q4 | 20GB |
| 235B MoE FP16 | 470GB |
| 235B MoE Q4 | 130GB |
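As a rule of thumb, the weight footprint is parameter count × bytes per weight; the table's larger quantized figures also include KV-cache and runtime overhead. A rough estimator — the overhead multiplier is our assumption for planning, not a vendor figure:

```python
def weight_vram_gb(params_billion: float, bits_per_weight: int,
                   overhead: float = 1.0) -> float:
    """Estimate VRAM (GB) needed for model weights.

    params_billion:  parameter count in billions (e.g. 8 for Qwen3-8B)
    bits_per_weight: 16 for FP16, 4 for Q4 quantization
    overhead:        multiplier for KV cache / activations
                     (assumed ~1.1-1.3x in practice; tune per stack)
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return round(bytes_total * overhead / 1e9, 1)

print(weight_vram_gb(8, 16))    # 8B FP16   -> 16.0 GB, matches the table
print(weight_vram_gb(235, 16))  # 235B FP16 -> 470.0 GB
print(weight_vram_gb(235, 4, overhead=1.1))  # Q4 + ~10% overhead -> ~130 GB
```

This is why the 235B MoE Q4 row lands near 130GB rather than the raw 117.5GB of weights: the difference is cache and runtime overhead.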
Use Cases
Qwen 3 ships in dense sizes from 0.6B to 32B (0.6B, 1.7B, 4B, 8B, 14B, 32B) plus two MoE variants (30B-A3B and 235B-A22B), and can be deployed for enterprise AI applications including document processing, code generation, data analysis, and conversational AI. License: Apache 2.0.
Run Qwen 3 with Petronella
PTG deploys Qwen 3 for organizations needing multilingual AI (119 languages) with efficient MoE inference. The Apache 2.0 license permits commercial deployment with no usage fees or copyleft obligations.
Recommended Hardware
| Model Size | Recommended GPU |
|---|---|
| 8B | RTX 5080 (16GB) |
| 32B | RTX 5090 (32GB) or RTX PRO 5000 (48GB) |
| 235B-A22B MoE | DGX Spark (128GB) or 2x RTX PRO 6000 (192GB) |
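For sizing scripts or quote tooling, the table above can be encoded as a simple lookup. The mapping below just mirrors the recommendations and is illustrative:

```python
# Mirrors the "Recommended Hardware" table above; illustrative only.
RECOMMENDED_GPU = {
    "8B": "RTX 5080 (16GB)",
    "32B": "RTX 5090 (32GB) or RTX PRO 5000 (48GB)",
    "235B-A22B": "DGX Spark (128GB) or 2x RTX PRO 6000 (192GB)",
}

def recommend_gpu(model_size: str) -> str:
    """Return the recommended GPU configuration for a Qwen 3 size."""
    try:
        return RECOMMENDED_GPU[model_size]
    except KeyError:
        known = ", ".join(RECOMMENDED_GPU)
        raise ValueError(
            f"No recommendation for {model_size!r}; known sizes: {known}"
        )

print(recommend_gpu("32B"))  # RTX 5090 (32GB) or RTX PRO 5000 (48GB)
```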
Deploy Qwen 3 On-Premises
Our team builds GPU-accelerated systems configured and optimized for Qwen 3. Private, secure, and fully under your control.