
RTX Blackwell Whitepaper Released

Buggy Loop

Member
Additions over the Ada architecture for Tensor Cores & RT Cores

Blackwell 5th Generation Tensor Cores
Tensor Cores are specialized high performance compute cores that are tailored for the matrix
multiply and accumulate math operations that are used in AI and HPC applications. Tensor Cores
provide groundbreaking performance for the matrix computations that are critical for both deep
learning neural network training and inference operations.

Like NVIDIA Ada GPU Tensor Cores, the RTX Blackwell Tensor Cores support FP16, BF16, TF32,
INT8, INT4, and Hopper’s FP8 Transformer Engine. RTX Blackwell adds new support for FP4 and
FP6 Tensor Core operations, and the new Second-Generation FP8 Transformer Engine, similar to
our datacenter-class Blackwell GPUs.

FP4 Support
Generative AI models have improved in capabilities since the first ones released in 2022. But the
improvements have often come with an increase in parameters and size. As models grow in both
compute and memory requirements, it can be difficult to run such models even on the latest
hardware.

The GeForce RTX 50 Series includes support for the FP4 data format in its new Tensor Cores to
help address this issue. FP4 provides a lower quantization method, similar to file compression,
which decreases model sizes. Compared with FP16 precision — the default method that most
models publish with — FP4 requires less than half of the memory, and 50 Series GPUs provide

6yISGoW.jpeg
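For a rough sense of the memory claim in that excerpt, here is a minimal Python sketch that counts weight storage only. The 12-billion-parameter model size and the `weight_bytes` helper are assumptions for illustration, not figures from the whitepaper. Real FP4 checkpoints also store scaling factors, and the runtime still needs activation memory, which is presumably why the paper only claims "less than half".

```python
def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Bytes needed for the weights alone (no scale factors, no activations)."""
    return num_params * bits_per_weight // 8


PARAMS = 12_000_000_000  # hypothetical 12B-parameter model, not from the paper

fp16_gb = weight_bytes(PARAMS, 16) / 1e9
fp4_gb = weight_bytes(PARAMS, 4) / 1e9

print(f"FP16 weights: {fp16_gb:.0f} GB")  # 24 GB
print(f"FP4 weights:  {fp4_gb:.0f} GB")   # 6 GB
```

So on weights alone FP4 is a quarter of FP16. The overheads mentioned above eat into that in practice.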
 

Buggy Loop

Member
Blackwell 4th Generation RT Cores

Ray-triangle intersection testing is a computationally expensive operation that is performed at
high frequency when rendering a ray-traced scene. The Fourth-Generation RT Core in the
Blackwell architecture provides double the throughput for Ray-Triangle Intersection Testing over
Ada.


hx4LOI6.jpeg


In addition to the above-specified functions, the RT Cores found in both Ada and Blackwell GPUs include a dedicated unit known as the Opacity Micromap Engine. The Opacity Micromap Engine evaluates Opacity Micromaps and directly alpha-tests geometry to significantly reduce shader-based alpha computations. New Mega Geometry technology provides RTX-accelerated ray tracing of triangle cluster-level structures. The new Blackwell RT Core includes a Triangle Cluster Intersection Engine, which further accelerates ray tracing of Mega Geometry, while also including standard ray-triangle intersection testing. Blackwell also adds Linear Swept Spheres as a hardware-accelerated path to ray trace fine geometry like hair. All are described below.

Mega Geometry

Mega Geometry is a new RTX technology aimed at dramatically increasing the geometric detail that is possible in ray-traced applications. In particular, Mega Geometry enables game engines such as Epic’s Unreal Engine 5, which employ modern level-of-detail (LOD) systems like Nanite, to ray trace their geometry at full fidelity. Falling back to low-resolution proxies for ray-traced effects is no longer needed, enabling new levels of quality for shadows, reflections, and indirect illumination. Mega Geometry also helps bring techniques previously reserved for production rendering, such as displaced subdivision surfaces, to the domain of real-time ray tracing.

^^ This is the dragon demo they showed at CES with the crazy geometry displacement

I won't post the details here, but Mega Geometry is well explained in the paper: it covers cluster-based LOD updates, high object counts, and subdivision surfaces.

This is tailor-made for path tracing Unreal Engine 5 games.
 

Buggy Loop

Member
Neural Shaders
Blackwell Neural Shaders
(Unified AI and Traditional Shaders)
AI is embedded into parts of the traditional rendering pipeline, paving the path towards full neural shading. Enhanced Tensor Cores are now accessible to graphics shaders and are combined with scheduling optimizations in SER 2.0 (Shader Execution Reordering), so that AI graphics with neural filtering features and AI models including generative AI can be run concurrently in next-generation games.
NVIDIA has worked with Microsoft to create the new Cooperative Vectors API. When combined with differentiable shading language features in Slang, Cooperative Vectors unlock the ability for game developers to use neural techniques in their games, including neural texture compression, which provides up to seven-to-one VRAM compression over block compressed formats, and other techniques such as RTX Neural Materials, Neural Radiance Cache, RTX Skin, and RTX Neural Faces.

oSZImik.jpeg


Neural shaders allow us to train neural networks to learn efficient approximations of complex algorithms that calculate how light interacts with surfaces, efficiently decompress textures that are stored in video memory in supercompressed form, predict indirect lighting based on limited ground truth data, and approximate subsurface light scattering, all contributing to a more immersive gaming experience. The potential applications for neural shaders are not yet fully explored, which means more exciting features for faster and more realistic (or stylized) real-time rendering lie ahead.

The paper then explains RTX Neural Materials, neural texture compression, Neural Radiance Cache, RTX Skin, and RTX Neural Faces.

Shader Execution Reordering (SER) 2.0
First introduced in the Ada architecture, SER on Blackwell is enhanced by several innovations to both hardware and software that further improve the feature’s effectiveness. The core reorder logic of SER on Blackwell is twice as efficient, reducing reordering overhead and increasing its precision. The higher precision results in smarter coherence extraction and lets developers provide more application-specific knowledge to reorder operations, in turn increasing overall workload performance.

SER is fully controlled by applications through a small API, allowing developers to easily apply reordering where workloads benefit most. The API additionally introduced new flexibility around the invocation of ray tracing shaders to the programming model, enabling more streamlined ways to structure renderer implementations while taking advantage of reordering. Several game titles that feature path tracing, as well as a number of production rendering packages, already take advantage of SER. These applications will benefit directly from the Blackwell SER enhancements without any code changes.
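To make the coherence idea concrete, here is a toy Python model of why reordering helps. It is only a sketch under stated assumptions, not NVIDIA's implementation or the SER API: rays are reduced to a hypothetical "hit shader ID", a warp is 32 consecutive threads, and the "reorder" is a plain sort by shader ID. The metric is how many different shaders each warp would have to run.

```python
import random

WARP_SIZE = 32  # threads that execute together in one warp


def distinct_shaders_per_warp(shader_ids, warp_size=WARP_SIZE):
    """For each warp-sized chunk, count how many different shaders it must run."""
    return [
        len(set(shader_ids[i:i + warp_size]))
        for i in range(0, len(shader_ids), warp_size)
    ]


def reorder_by_shader(shader_ids):
    """Toy stand-in for reordering: group threads that hit the same shader."""
    return sorted(shader_ids)


rng = random.Random(0)                            # fixed seed for repeatability
hits = [rng.randrange(8) for _ in range(1024)]    # 1024 rays, 8 hypothetical shaders

before = distinct_shaders_per_warp(hits)
after = distinct_shaders_per_warp(reorder_by_shader(hits))

print("shader switches summed over warps, before:", sum(before))
print("shader switches summed over warps, after: ", sum(after))
```

After sorting, almost every warp runs a single shader. Only the warps that straddle a boundary between two shader groups still diverge.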

AI Management Processor (AMP)
The role of AMP is to take over the responsibility of the CPU’s scheduling of GPU tasks, reducing dependency on the system CPU, which is often a bottleneck for game performance. In fact, allowing the GPU to manage its own task queue can lead to lower latency because of less back-and-forth communication between the GPU and CPU. This allows smoother frame rates in games, and better multitasking in Windows because the CPU is less burdened.

Essentially, AMP is used to coordinate, schedule fairly, and ensure a smoother gaming experience without performance drops. With LLMs, it does this by reducing the time to first response, and with games, it prioritizes work with the game engine to prevent stuttering. By delivering work at more predictable times, AMP can significantly improve quality of service depending on workloads.


=================================================================================
=================================================================================


Those are the main points I found interesting. I didn't go into the new MFG (Multi Frame Generation), as that's already understood and was detailed before. Among other things, it seems Blackwell has upgraded gating compared to Ada, for power-state latencies and frequency switching.

It seems that neural shaders don't have a specific requirement other than TOPS. According to the paper, Blackwell's new Tensor Cores and SER 2.0 should help, but neural shaders will run on all RTX cards.

The biggest surprise is Mega Geometry being embedded in the RT core. It's not software only. That means the triangle cluster intersection / compression engine for Mega Geometry mesh clusters is effectively accelerated by ASIC-like fixed-function hardware in the RT cores. It could maybe run on older RTX cards, but I fear the performance cost of not having it supported by the RT hardware is going to make it tough to run. I'm surprised they even said that the Alan Wake 2 Mega Geometry patch is coming to all RTX cards. It will be interesting to benchmark that.
 

Buggy Loop

Member
Another interesting tidbit

Note that the number of possible INT32 integer operations in Blackwell are doubled compared to Ada, by fully unifying them with FP32 cores, as depicted in Figure 6 below. However, the unified cores can only operate as either FP32 or INT32 cores in any given clock cycle. Figure 6 below shows how the SM architecture evolved between Ada and Blackwell.

vfLdsDo.jpeg


New SM features built for Neural Shading - New RT Core and Tensor Core features
described below enhance and accelerate neural rendering capabilities. The NVIDIA RTX
Blackwell SM provides a doubling of integer math throughput per clock cycle compared to
NVIDIA Ada GPUs, which can increase the performance of address generation workloads
that are crucial for neural shading.

^^^^^

This is a funny one

Those that follow the GPU architectures will maybe see the irony here.

Pascal was FP32/INT32 → Turing was FP32 + INT32 (split) → Ampere/Ada was FP32 + FP32/INT32 → Blackwell back to FP32/INT32

A full circle

I guess the Turing / Ampere / Ada solutions were meant to save silicon back then: the extra CUDA cores were probably smaller on die to make room for RT/ML hardware, so they optimized the CUDA core area.

Seems like we're back to full CUDA cores, 100%, no cut corners. It means even more compute in the pipeline. As the excerpt above says, the Blackwell SM doubles integer math throughput, which increases the performance of the address generation workloads that are crucial for neural shading. So we'll have to wait and see how that performs whenever a game supports it.
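A quick way to see the "doubling" is to count lanes per SM by type. This Python sketch is my own simplification: the per-SM lane counts (64 + 64 for Turing, 64 FP32 + 64 shared for Ampere/Ada, 128 shared for Blackwell) are assumptions based on the usual public SM diagrams, and the `SMLayout` class is hypothetical. A shared lane can issue FP32 or INT32 in a given clock, but not both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLayout:
    name: str
    fp32_only: int   # lanes that can only issue FP32
    int32_only: int  # lanes that can only issue INT32
    unified: int     # lanes that issue either FP32 or INT32 in a given clock

    def peak_fp32(self) -> int:
        """Best case per clock when every capable lane issues FP32."""
        return self.fp32_only + self.unified

    def peak_int32(self) -> int:
        """Best case per clock when every capable lane issues INT32."""
        return self.int32_only + self.unified


# Assumed per-SM lane counts, for illustration only.
turing = SMLayout("Turing", fp32_only=64, int32_only=64, unified=0)
ada = SMLayout("Ampere/Ada", fp32_only=64, int32_only=0, unified=64)
blackwell = SMLayout("Blackwell", fp32_only=0, int32_only=0, unified=128)

for sm in (turing, ada, blackwell):
    print(f"{sm.name:<11} peak FP32/clk: {sm.peak_fp32():>3}  "
          f"peak INT32/clk: {sm.peak_int32():>3}")
```

Under those assumptions the peak FP32 rate per SM is unchanged from Ada to Blackwell, while the peak INT32 rate goes from 64 to 128. The two peaks can't both be hit in the same clock, since the lanes are shared.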
 

3liteDragon

Member
The biggest surprise is Mega Geometry being embedded in the RT core. It's not software only. That means the triangle cluster intersection / compression engine for Mega Geometry mesh clusters is effectively accelerated by ASIC-like fixed-function hardware in the RT cores. It could maybe run on older RTX cards, but I fear the performance cost of not having it supported by the RT hardware is going to make it tough to run. I'm surprised they even said that the Alan Wake 2 Mega Geometry patch is coming to all RTX cards. It will be interesting to benchmark that.
I assumed this was already known, since NVIDIA mentioned it on their site and it was shown in one of the slides when the first embargo lifted. But yeah, Mega Geometry will work on all RTX cards, but it's hardware accelerated on Blackwell only, meaning the performance delta across the 20 to 50 series will widen in future games that use it.

I believe Ada had a similar hardware feature embedded within its RT cores called DMM (Displaced Micro-Mesh), but it had issues with certain mesh types, which I'm guessing the Blackwell cluster solution fixes. We'll know how big a performance delta to expect across RTX cards when the Alan Wake 2 update drops tomorrow and people run comparisons, since it has Mega Geometry support. My guess is there'll be a small delta between the 40 and 50 series, but for anything below Ada the delta will be significant.
 

Buggy Loop

Member

Didn't see it in the kit

At least everywhere I look on NVIDIA's site, it says it accelerates BVH building for cluster-based geometry systems, but there's no sign of it being supported in hardware on Blackwell. If you find it, please share.

Do we have any solid news on Alan Wake 2? At CES they said the patch would drop roughly at Blackwell launch.
 

3liteDragon

Member
I just meant the site on Jan 6th, when the cards were announced: they were hinting that Neural Shaders and Mega Geometry had architectural support on Blackwell. DF went over the RT/Tensor core changes when the CES embargo lifted. 19 slides were shown to media outlets, and the cluster-based system was covered in them. I just didn't say anything on here until the whitepaper was officially released.

These were the CES slides that were revealed by outlets on January 15th when that embargo lifted, so this was before the whitepaper release today:

architecture-06.jpg

architecture-07.jpg

architecture-09.jpg

architecture-10.jpg

architecture-11.jpg

architecture-13.jpg

Full link: https://www.techpowerup.com/review/nvidia-geforce-rtx-50-technical-deep-dive/3.html
AW2's getting a new ULTRA RT preset:
Alan Wake 2’s forthcoming update also adds a new Ultra quality level Ray Tracing preset, adding fully ray-traced refractions, fully ray-traced transparent reflections, and higher quality fully ray-traced indirect lighting. This new Ultra quality ray tracing preset, combined with the all-new transformer-based DLSS models and DLSS Multi Frame Generation, allows Alan Wake 2 to be explored with even better image quality, and heightened realism and immersion. All of these enhancements will be available for free, as part of Remedy Entertainment’s next Alan Wake 2 update. Simply download the update from the Epic Games Store and your experience will be instantly upgraded!
I'm guessing this update drops tomorrow since that's when the 5090 launches alongside the new NVIDIA app update & drivers with support for the override feature.
 

3liteDragon

Member
About the DF video from CES: that picture of the dragon you saw on the RTX Kit site where they talk about Mega Geometry is from a demo with over 500M triangles (Cyberpunk has 10-50M for comparison). The DF video shows how the dragon's meshes deform in real time, and EVERY single triangle is fully ray-traced. So Blackwell is the starting point for NVIDIA's vision of path tracing on complex geometry and neural rendering.

I'm on a 4070S right now and I'm waiting to see what the 5070S looks like (hopefully at least 16GB of VRAM). Raster probably won't be that big of an improvement, but the main reason for me to upgrade will be all these features, since it's more about future-proofing for the long term. Whatever software-side optimizations they make to neural rendering and other things for the NEXT generation of GPUs will probably come to the 50 series as well, since Blackwell's the FIRST generation to support this new era of rendering they're talking about.

 

mclaren777

Member
I'm really hoping these new Tensor cores speed up AI-accelerated tasks in Adobe software.

That's the main reason I'm planning to buy a 5070 next month.
 

Buggy Loop

Member



Yea that demo is insane.

I wish they would revisit all the path-traced games they've supported in past years and bring them up to the Alan Wake 2 standard. Give Mega Geometry to Cyberpunk / Black Myth / Indiana Jones, etc. Black Myth, which uses Nanite, should be the easiest transition and the biggest beneficiary.

Could Doom: The Dark Ages be using the new tech? Its path tracing implementation is more recent.
 

Buggy Loop

Member
838 INT8 TOPS. I thought it was 4,000.

The AI TOPS figure always uses the smallest data format available, with sparsity. In this case that's FP4 tensor TFLOPS: 3,352.

The full GB202 die is the baseline at 4,000. That's the "Blackwell" figure in the CES slide. The 5090 gets a binned variant, GB202-300-A1, which has fewer Tensor Cores, RT Cores, SMs, ROPs, etc.

It's like that every generation when you compare the gaming GPU with the whitepaper, or the full-die specs on TechPowerUp with the variant you're looking at.
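The two numbers in this exchange line up if you assume two doublings, which is how I read the spec tables: halving the operand width (INT8 to FP4) doubles the rate, and sparsity doubles it again. Both factors are my assumptions, not something stated in this thread. A quick Python check of the arithmetic:

```python
INT8_DENSE_TOPS = 838   # the 5090 figure quoted above

FP4_OVER_INT8 = 2       # assumption: half the operand width, twice the rate
SPARSITY_FACTOR = 2     # assumption: sparsity doubles the rate again

fp4_dense = INT8_DENSE_TOPS * FP4_OVER_INT8
fp4_sparse = fp4_dense * SPARSITY_FACTOR

print(fp4_dense, fp4_sparse)  # 1676 3352
```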
 