Runtime
The MIND runtime provides deterministic execution of compiled models with minimal overhead. It supports multiple deployment targets, from embedded devices to cloud servers.
Architecture
```
┌───────────────────────────────────────┐
│              Application              │
├───────────────────────────────────────┤
│          Runtime API (C/Rust)         │
├───────────────────────────────────────┤
│    Executor    │    Memory Manager    │
├────────────────┼──────────────────────┤
│  CPU Backend   │     GPU Backends     │
│    (Stable)    │CUDA|Metal|ROCm|WebGPU│
└────────────────┴──────────────────────┘
```
GPU Backend: A production CUDA 12.8+ backend is available under the Enterprise license. All four GPU backends (CUDA, Metal, ROCm, WebGPU) are production-ready.
GPU Runtime (Enterprise)
The Enterprise GPU runtime provides production-grade GPU acceleration across four backends:
- cuBLAS/cuDNN: TF32 Tensor Cores for matmul, auto-tuned convolutions
- Memory Allocator: CachingAllocator achieves 8.3M allocs/sec (180x faster than cudaMalloc)
- Tensor Cores: TF32, FP16, FP8 (Ada Lovelace+) with PTX mma.sync
- Async Streams: 8 streams (6 compute, 2 transfer) for overlapped execution
- Supported GPUs: SM_80+ (Ampere, Ada Lovelace, Hopper)
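The CachingAllocator's speedup comes from recycling freed blocks rather than returning them to the driver on every free. A minimal CPU-side sketch of the technique (the struct and its fields are illustrative, not the Enterprise implementation; in the GPU case the fallback path would call `cudaMalloc` instead of allocating a `Vec`):

```rust
use std::collections::HashMap;

/// Illustrative free-list cache: freed blocks are bucketed by size and
/// reused by later allocations of the same size, avoiding a round trip
/// to the underlying allocator.
struct CachingAllocator {
    free_lists: HashMap<usize, Vec<Vec<u8>>>, // size -> cached blocks
    cache_hits: usize,
}

impl CachingAllocator {
    fn new() -> Self {
        CachingAllocator { free_lists: HashMap::new(), cache_hits: 0 }
    }

    /// Reuse a cached block of exactly `size` bytes if one exists,
    /// otherwise fall back to a fresh allocation.
    fn alloc(&mut self, size: usize) -> Vec<u8> {
        if let Some(block) = self.free_lists.get_mut(&size).and_then(|v| v.pop()) {
            self.cache_hits += 1;
            block
        } else {
            vec![0u8; size]
        }
    }

    /// Freed blocks go back into the cache, not to the system.
    fn free(&mut self, block: Vec<u8>) {
        self.free_lists.entry(block.len()).or_default().push(block);
    }
}

fn main() {
    let mut a = CachingAllocator::new();
    let b1 = a.alloc(1 << 20); // fresh allocation
    a.free(b1);                // cached, not released
    let _b2 = a.alloc(1 << 20); // served from the cache
    println!("cache hits: {}", a.cache_hits); // prints "cache hits: 1"
}
```

Because inference workloads allocate the same tensor sizes every step, almost every allocation after warm-up is a cache hit, which is where the quoted 180x speedup over `cudaMalloc` comes from.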
Execution Modes
| Mode | Use Case | Characteristics |
|---|---|---|
| AOT (Ahead-of-Time) | Production deployment | Fastest startup, smallest binary |
| JIT (Just-in-Time) | Development, dynamic shapes | Flexible, runtime optimization |
| Interpreter | Debugging, conformance | Reference implementation |
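The trade-offs in the table above could be surfaced in code roughly like this (the `ExecMode` enum and `describe` function are hypothetical illustrations, not part of the MIND API):

```rust
/// Hypothetical sketch: the three execution modes as a runtime choice.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ExecMode {
    Aot,         // precompiled plan: fastest startup, smallest binary
    Jit,         // compile at first call: flexible, dynamic shapes
    Interpreter, // reference implementation: debugging, conformance
}

fn describe(mode: ExecMode) -> &'static str {
    match mode {
        ExecMode::Aot => "ahead-of-time: fastest startup, smallest binary",
        ExecMode::Jit => "just-in-time: flexible, runtime optimization",
        ExecMode::Interpreter => "interpreter: reference implementation",
    }
}

fn main() {
    // Production deployments would typically pick AOT.
    println!("{}", describe(ExecMode::Aot));
}
```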
Memory Management
- Static allocation: Memory planned at compile time for AOT
- Arena allocator: Fast bump allocation for intermediate tensors
- Buffer reuse: Automatic sharing of memory between non-overlapping tensors
- Device memory: Unified API for CPU and GPU memory
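Arena (bump) allocation is simple enough to sketch: a cursor advances through one preallocated slab, and all intermediate tensors are released at once by resetting the cursor. This is an illustrative sketch of the technique, not the runtime's actual allocator:

```rust
/// Illustrative bump allocator: hands out offsets into a preallocated
/// slab; freeing is O(1) because reset() releases everything at once.
struct Arena {
    buf: Vec<u8>,
    cursor: usize,
}

impl Arena {
    fn with_capacity(bytes: usize) -> Self {
        Arena { buf: vec![0u8; bytes], cursor: 0 }
    }

    /// Bump-allocate `size` bytes aligned to `align`; returns the offset
    /// of the new region, or None if the slab is exhausted.
    fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        let start = (self.cursor + align - 1) / align * align; // round up
        if start + size > self.buf.len() {
            return None;
        }
        self.cursor = start + size;
        Some(start)
    }

    /// Release all intermediate tensors at once (e.g. after a forward pass).
    fn reset(&mut self) {
        self.cursor = 0;
    }
}

fn main() {
    let mut arena = Arena::with_capacity(4096);
    let a = arena.alloc(100, 64).unwrap(); // offset 0
    let b = arena.alloc(100, 64).unwrap(); // offset 128 (rounded up past 100)
    println!("a = {}, b = {}", a, b);
    arena.reset(); // all intermediates freed in one step
}
```

Because intermediate tensors in a forward pass all die at the same time, a single reset replaces per-tensor frees, which is what makes bump allocation fast.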
Determinism Guarantees
The runtime provides strong determinism guarantees:
```rust
// Create runtime with deterministic mode (default)
let rt = Runtime::new(RuntimeConfig {
    deterministic: true, // IEEE 754 strict, no threading non-determinism
    seed: 42,            // RNG seed for reproducibility
    ..Default::default()
});

// Same inputs always produce same outputs
let out1 = model.forward(&input);
let out2 = model.forward(&input);
assert_eq!(out1, out2); // Guaranteed
```
Resource Limits
```rust
let config = RuntimeConfig {
    max_memory_mb: 1024,    // Memory limit
    max_threads: 4,         // Thread pool size
    timeout_ms: Some(5000), // Execution timeout
    ..Default::default()
};
let rt = Runtime::new(config);
```
Profiling
```rust
// Enable profiling
let rt = Runtime::new(RuntimeConfig {
    profile: true,
    ..Default::default()
});
model.forward(&input);

// Get profile data
let profile = rt.get_profile();
for op in profile.operations {
    println!("{}: {}ms", op.name, op.duration_ms);
}
```
Learn More
See the full runtime specification at mind-spec/runtime.md and the runtime repository at mind-runtime.