
AI Training Optimized
HPC GPU Servers
Extreme performance systems optimized for AI training, large language models, and scientific simulation.
Deploy the most powerful GPU infrastructure available. With 8x NVIDIA H100 GPUs interconnected via NVLink 4.0, these servers deliver unmatched performance for training the largest AI models.
Performance Highlights
1,979
TFLOPS FP16/BF16 Tensor (per GPU, with sparsity)
3,958
TFLOPS FP8 Tensor (per GPU, with sparsity)
4.9
TB/s Memory Bandwidth
640
GB Total GPU Memory
Detailed Specifications
GPU Configuration
- 8x NVIDIA H100 Tensor Core GPUs
- 80GB HBM3 memory per GPU
- NVLink 4.0 interconnect (900 GB/s per GPU)
- 4.9 TB/s aggregate memory bandwidth
- 67 TFLOPS FP64 Tensor performance per GPU
- 989 TFLOPS TF32 Tensor per GPU (with sparsity)
- 1,979 TFLOPS FP16/BF16 Tensor per GPU (with sparsity)
- 3,958 TOPS INT8 Tensor per GPU (with sparsity)
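The per-GPU figures above scale across the 8-GPU node. A quick sketch of the system-level totals, assuming the H100 figures listed (sparsity-accelerated throughput; dense math is roughly half):

```python
# Rough system-level totals for an 8x H100 node, derived from
# the per-GPU figures above. Sparsity numbers assume 2:4
# structured sparsity; dense throughput is roughly half.
NUM_GPUS = 8
HBM3_PER_GPU_GB = 80
FP16_TFLOPS_SPARSE = 1979  # per GPU, with sparsity

total_memory_gb = NUM_GPUS * HBM3_PER_GPU_GB
total_fp16_pflops = NUM_GPUS * FP16_TFLOPS_SPARSE / 1000

print(f"Total GPU memory: {total_memory_gb} GB")            # 640 GB
print(f"Peak FP16 Tensor: {total_fp16_pflops:.1f} PFLOPS")  # ~15.8 PFLOPS
```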
CPU & System Memory
- 2x AMD EPYC™ 9654 processors
- 192 cores total (384 threads)
- Base frequency: 2.4 GHz
- Max boost: 3.7 GHz
- 1TB DDR5 ECC system memory
- 4800 MT/s memory speed
- 128 PCIe 5.0 lanes
Cooling System
- Liquid cooling ready (direct-to-chip)
- PUE rating: 1.08
- Operating temp: 10-35°C
- Redundant cooling loops
- Hot-swappable fans
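PUE (power usage effectiveness) is the ratio of total facility power to IT load, so a 1.08 rating means cooling and distribution add only 8% overhead. A sketch, assuming the node draws its full rated 12 kW:

```python
# PUE = total facility power / IT equipment power.
# At PUE 1.08, cooling and distribution overhead is 8% of IT load.
pue = 1.08
it_load_kw = 12.0  # assumed: full rated PSU capacity

facility_kw = it_load_kw * pue
overhead_kw = facility_kw - it_load_kw
print(f"Facility draw: {facility_kw:.2f} kW, overhead: {overhead_kw:.2f} kW")
```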
Power Supply
- 4x 3000W redundant PSUs
- N+1 redundancy
- 80 Plus Titanium efficiency
- Total capacity: 12kW
- Hot-swappable power supplies
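With four 3000W units in an N+1 arrangement, the system must be able to run on three supplies if one fails, which caps the sustained power budget below the raw total:

```python
# N+1 redundancy: total capacity counts all PSUs, but the system
# must still carry its load on N (here 3) if one PSU fails.
psu_count = 4
psu_watts = 3000

total_capacity_w = psu_count * psu_watts              # 12 kW as listed
usable_with_failure_w = (psu_count - 1) * psu_watts   # 9 kW with one PSU down
print(total_capacity_w, usable_with_failure_w)
```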
Storage
- 12x NVMe Gen5 hot-swap bays
- Up to 30.72TB per server
- 14GB/s sequential read
- 1.5M IOPS random read
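A practical consequence of these numbers: streaming the full flash capacity at peak sequential read takes a little over half an hour, which bounds how quickly a training job can cold-load its dataset:

```python
# Time to stream the full flash capacity at peak sequential read.
capacity_tb = 30.72
read_gb_per_s = 14  # GB/s sequential read

seconds = capacity_tb * 1000 / read_gb_per_s
print(f"{seconds / 60:.1f} minutes")  # ~36.6 minutes
```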
Networking
- 2x 100GbE QSFP28 ports
- 4x 25GbE SFP28 ports
- RDMA over Converged Ethernet (RoCE)
- Latency: < 2ms
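Summing the port counts above gives the aggregate line rate per server, useful for sizing multi-node training fabrics:

```python
# Aggregate line rate across all ports, in Gb/s and GB/s.
ports = [(2, 100), (4, 25)]  # (count, Gb/s): QSFP28 + SFP28 from the spec

total_gbps = sum(count * rate for count, rate in ports)
print(total_gbps, total_gbps / 8)  # 300 Gb/s, 37.5 GB/s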
Ideal Use Cases
Large Language Model Training
Train GPT-scale models with billions of parameters. Optimized for transformer architectures and distributed training.
- Multi-node distributed training
- ZeRO optimization support
- Mixed precision training (FP16/BF16)
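The ZeRO and mixed-precision support mentioned above is typically driven by a DeepSpeed configuration file. A minimal illustrative fragment, assuming ZeRO stage 2 with BF16; the values are placeholders, not tuned recommendations:

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```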
Computer Vision & Image Generation
Train diffusion models, GANs, and vision transformers for image generation and analysis.
- Stable Diffusion training
- DALL-E style models
- Video generation models
Scientific Computing
Run complex simulations, molecular dynamics, and computational fluid dynamics workloads.
- Molecular dynamics simulations
- Climate modeling
- Quantum chemistry calculations
High-Performance Inference
Deploy production inference endpoints for large models with high throughput requirements.
- Batch inference processing
- Real-time model serving
- Multi-model deployment
Pre-installed Software Stack
Deep Learning Frameworks
- PyTorch 2.1+ with CUDA 12.1
- TensorFlow 2.15+
- JAX with GPU support
- Hugging Face Transformers
Distributed Training
- DeepSpeed
- FSDP (PyTorch)
- Horovod
- NCCL optimized
Development Tools
- CUDA Toolkit 12.1
- cuDNN 8.9
- Jupyter Lab
- Docker & Kubernetes
Availability & Pricing
Regions Available
- US-East-1 (Northern Virginia)
- US-West-2 (Silicon Valley)
- EU-West-1 (Coming Q2 2025)
Pricing
- Pay-as-you-go: $4.00/hour per GPU
- Reserved instances: Up to 60% discount
- Enterprise: Custom pricing
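At the listed on-demand rate, a full 8-GPU node works out as follows; the 730-hour month is a common billing convention, assumed here:

```python
# On-demand cost for a full 8-GPU node, and the effect of the
# maximum reserved-instance discount listed above.
gpu_hourly_usd = 4.00
num_gpus = 8
hours_per_month = 730  # assumed billing convention

on_demand_monthly = gpu_hourly_usd * num_gpus * hours_per_month
reserved_monthly = on_demand_monthly * (1 - 0.60)  # up to 60% off
print(f"${on_demand_monthly:,.0f} on-demand vs ${reserved_monthly:,.0f} reserved per month")
```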