NVIDIA just cut the official release of – and while hardware gets the headlines (looking at you, next-gen Rubin GPUs), this software update might be the real performance unlock for the next 18 months.
Instead of writing raw PTX or using third-party wrappers, you can now write: cuda release news
If you’re building HPC simulations, training LLMs, or optimizing edge inference, here’s what changed, what broke (sorry, legacy Kepler devs), and what to benchmark first. The biggest quality-of-life shift: cuda.compile and cuda.execute are now built into the core driver API. NVIDIA just cut the official release of –
import cuda @cuda.kernel def vec_add(a, b, c): idx = cuda.thread_idx.x + cuda.block_idx.x * cuda.block_dim.x if idx < a.size: c[idx] = a[idx] + b[idx] vec_add[blocks, threads](a, b, c) import cuda @cuda
April 14, 2026 Reading time: 4 min