Neural Shaders
Blackwell Neural Shaders (Unified AI and Traditional Shaders)
AI is embedded into parts of the traditional rendering pipeline, paving the path towards full neural shading.
Enhanced Tensor Cores are now accessible to graphics shaders and, combined with scheduling optimizations in SER 2.0 (Shader Execution Reordering), they allow AI graphics with neural filtering features and AI models, including generative AI, to run concurrently in next-generation games.
NVIDIA has worked with Microsoft to create the new Cooperative Vectors API. Combined with differentiable shading language features in Slang, Cooperative Vectors unlocks the ability for game developers to use neural techniques in their games, including neural texture compression, which provides up to seven-to-one VRAM compression over block-compressed formats, as well as other techniques such as RTX Neural Materials, Neural Radiance Cache, RTX Skin, and RTX Neural Faces.
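To get a feel for what the seven-to-one figure means, here is a quick back-of-envelope VRAM calculation. The BC7 rate (1 byte per texel) is standard; the neural numbers are illustrative, derived only from the ratio stated above, not from any published format details.

```python
# Back-of-envelope VRAM math for the "up to 7:1 over block compression"
# claim. BC7 stores 1 byte per texel; the neural figure below is purely
# illustrative, derived from the stated best-case ratio.
texels = 4096 * 4096                 # one 4K x 4K material layer, no mips
bc7_bytes = texels * 1               # BC7: 8 bits per texel
neural_bytes = bc7_bytes / 7         # best-case 7:1 over BC7

print(bc7_bytes / 2**20, "MiB (BC7)")      # 16.0 MiB
print(neural_bytes / 2**20, "MiB (neural)")  # roughly 2.3 MiB
```

For a material with several such layers (albedo, normal, roughness), the savings compound, which is where the headline VRAM reductions come from.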
Neural shaders let us train neural networks to learn efficient approximations of complex algorithms: calculating how light interacts with surfaces, decompressing textures stored in video memory in supercompressed form, predicting indirect lighting from limited ground-truth data, and approximating subsurface light scattering, all contributing to a more immersive gaming experience. The potential applications of neural shaders are not yet fully explored, which means more exciting features for faster and more realistic (or stylized) real-time rendering lie ahead.
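The core idea of "a network learns an efficient approximation of a shading formula" can be sketched in a few lines. This is a conceptual NumPy toy, not NVIDIA's implementation: a tiny MLP is trained to approximate an analytic specular term, standing in for the kind of small per-pixel network a neural shader would evaluate via Cooperative Vectors.

```python
import numpy as np

# Conceptual sketch (not NVIDIA's method): a tiny MLP learns to approximate
# an analytic specular term, illustrating the "neural shader" idea of
# replacing a hand-written shading formula with a small learned network.
rng = np.random.default_rng(0)

def specular(cos_theta, shininess=32.0):
    """Ground-truth shading term the network will approximate."""
    return np.maximum(cos_theta, 0.0) ** shininess

# Training data: cosine of the half-angle, a single shader input.
x = rng.uniform(0.0, 1.0, size=(1024, 1))
y = specular(x)

# Two-layer MLP with ReLU hidden units, trained by plain gradient descent.
W1 = rng.normal(0, 0.5, (1, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 1)); b2 = np.zeros(1)

def forward(x):
    h = np.maximum(x @ W1 + b1, 0.0)   # hidden activations
    return h, h @ W2 + b2              # prediction

_, pred0 = forward(x)
loss_before = float(np.mean((pred0 - y) ** 2))

lr = 0.05
for _ in range(2000):
    h, pred = forward(x)
    grad_out = 2.0 * (pred - y) / len(x)        # dL/dpred (MSE)
    gW2 = h.T @ grad_out; gb2 = grad_out.sum(0)
    grad_h = (grad_out @ W2.T) * (h > 0)        # backprop through ReLU
    gW1 = x.T @ grad_h; gb1 = grad_h.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

_, pred = forward(x)
loss_after = float(np.mean((pred - y) ** 2))
print(loss_before, "->", loss_after)
```

In a real neural shader the training happens offline and only the cheap forward pass runs per pixel, with the matrix-vector products mapped onto the Tensor Cores.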
Then explanations follow for RTX Neural Materials, Neural Texture Compression, Neural Radiance Cache, RTX Skin, and RTX Neural Faces.
Shader Execution Reordering (SER) 2.0
First introduced in the Ada architecture, SER on Blackwell is enhanced by several innovations to both hardware and software that further improve the feature’s effectiveness.
The core reorder logic of SER on Blackwell is twice as efficient, reducing reordering overhead and increasing its precision. The higher precision results in smarter coherence extraction and lets developers provide more application-specific knowledge to reorder operations, in turn increasing overall workload performance.
SER is fully controlled by applications through a small API, allowing developers to easily apply reordering where workloads benefit most. The API additionally introduced new flexibility around the invocation of ray tracing shaders to the programming model, enabling more streamlined ways to structure renderer implementations while taking advantage of reordering. Several game titles that feature path tracing, as well as a number of production rendering packages, already take advantage of SER. These applications will benefit directly from the Blackwell SER enhancements without any code changes.
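A toy model shows why reordering pays off. In the sketch below (my own illustration, not the actual SER mechanism or API), divergent ray hits are grouped by material so that threads in the same 32-wide warp run the same hit shader; the "sort by material ID" key stands in for the application-specific coherence hints the real API lets developers supply.

```python
import random

# Conceptual sketch of what reordering buys (not the real SER mechanism):
# grouping rays by hit material means each warp executes fewer distinct
# hit shaders serially.
random.seed(0)
hits = [random.randrange(8) for _ in range(1024)]  # material ID per ray

def shader_passes(hits, warp=32):
    """A warp serially executes one pass per distinct shader it holds."""
    return sum(len(set(hits[i:i + warp])) for i in range(0, len(hits), warp))

unsorted_passes = shader_passes(hits)          # incoherent: ~8 per warp
sorted_passes = shader_passes(sorted(hits))    # coherent: ~1-2 per warp
print(unsorted_passes, "->", sorted_passes)
```

The hardware does this reordering on the fly with far more sophistication, but the payoff is the same: fewer divergent shader executions per warp.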
AI Management Processor (AMP)
The role of AMP is to take over GPU task scheduling from the system CPU, which is often a bottleneck for game performance. Allowing the GPU to manage its own task queue can lower latency because there is less back-and-forth communication between the GPU and CPU. This allows smoother frame rates in games and better multitasking in Windows, because the CPU is less burdened.
Essentially, AMP coordinates work, schedules it fairly, and ensures a smoother gaming experience without performance drops. With LLMs it does this by reducing the time to first response; with games it prioritizes work from the game engine to prevent stuttering. By delivering work at more predictable times, AMP can significantly improve quality of service, depending on the workload.
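The latency benefit of prioritization is easy to see in a toy serial-execution model. This is my own assumption of the general idea, not NVIDIA's actual scheduler: latency-critical tasks (priority 0) are dispatched ahead of bulk work, which the GPU can decide on its own without a CPU round trip. Task names and costs are invented for illustration.

```python
# Toy model of AMP-style prioritization (an assumption for illustration,
# not NVIDIA's actual scheduler): latency-critical work jumps the queue.
tasks = [  # (name, priority, cost in ms) -- all values invented
    ("llm_prefill", 1, 8),
    ("texture_upload", 1, 6),
    ("frame_render", 0, 2),
    ("llm_decode", 0, 1),
]

def completion_times(order):
    """Serial execution: each task finishes at the running total cost."""
    t, done = 0, {}
    for name, _, cost in order:
        t += cost
        done[name] = t
    return done

fifo = completion_times(tasks)                              # CPU-fed FIFO
prio = completion_times(sorted(tasks, key=lambda t: t[1]))  # AMP-style
print(fifo["frame_render"], "->", prio["frame_render"])     # 16 -> 2 ms
```

In the FIFO case the frame waits behind 14 ms of bulk work; with priority scheduling it completes in 2 ms, which is the kind of quality-of-service win described above.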
=================================================================================
Those are the main points I found interesting. I didn't go into the new MFG since that's already well understood and has been detailed elsewhere. Among other things, it seems Blackwell has upgraded gating compared to Ada for power-state latencies and frequency switching.
It seems that neural shaders don't have a specific hardware requirement beyond TOPS; Blackwell's new Tensor Cores and SER 2.0 should help according to the paper, but they will run on all RTX cards.
The biggest surprise is Mega Geometry being embedded in the RT core; it's not software-only. That means the triangle-cluster intersection/compression engine for Mega Geometry mesh clusters is effectively accelerated in dedicated hardware. It could perhaps still run on older RTX cards, but I fear the performance cost of not having it supported by the hardware RT core is going to be tough. I'm surprised they even said the Alan Wake 2 Mega Geometry patch is coming to all RTX cards. It will be interesting to benchmark that.