
AMD RDNA 4 GPUs To Incorporate Brand New Ray Tracing Engine, Vastly Different Than RDNA 3

KeplerL2

Neo Member
With bad occupancy, most of the units go unused. And that is a waste of die space.
Nvidia already made a good step with SER. But there is still a lot to do to improve prediction and scheduling for RT units.
I bet that Nvidia will have a SER 2.0 and more tech to address this in their 5000 series.

Even Nvidia's hardware suffers from low occupancy in Cyberpunk 2077's path tracing render path.

And how do you think occupancy issues get resolved? Through prefetching, prediction and out of order execution. All things that have been done in CPUs for decades.

The problem is doing that in an area efficient way. Zen5 has a god-tier branch predictor, but putting that in an RDNA CU would almost double its size.
 
I don't know why you are quoting me and saying what I have already said a few posts above. But ok.
I mean fair enough.
But the point still stands. Obviously Nvidia is a moving target; however, AMD hasn't ever really gone "all-in" on matrix hardware for "AI" or RT hardware. They have very area-efficient solutions, which are less performant but good enough.
 

winjer

Gold Member
And how do you think occupancy issues get resolved? Through prefetching, prediction and out of order execution. All things that have been done in CPUs for decades.

The problem is doing that in an area efficient way. Zen5 has a god-tier branch predictor, but putting that in an RDNA CU would almost double its size.

Now you understand the need for GPUs to have better prediction and re-ordering.
The bottleneck really is in knowing what is going to be needed to execute the next RT calculation.
I doubt we'll see a front end in a GPU as complex as we have in a modern CPU. But the fact remains that this is where Nvidia, AMD and Intel have to invest.
Intel and Nvidia already have some features for RT instruction reordering. But a lot more needs to be done.
Some will have to be done in hardware. But I guess that some of it also can be done with improved compilers.
The next generation is going to be interesting.
 

SolidQ

Member
N44 is smaller than N33
[Image: amd-navi-44-leak-4.png]
 

llien

Member
Well say you are targeting 60 FPS, so have a 16.6 ms per frame budget. Imagine on Nvidia hardware that the ray tracing part takes 3ms, and the remainder of the frame takes 13.6ms. Then you're hitting your target. Now imagine that on AMD hardware, the ray tracing part takes 6ms instead. Then despite being at a big disadvantage during the ray tracing portion, you can still equal Nvidia if you can complete the rest of the frame in 10.6 ms. So a ray tracing deficit is "compensated" by extra rasterization performance. Obviously however the greater the frame time "deficit" due to ray tracing, the less time you will have to complete the rest of the frame in time, and the more performance you will need.
Ok, let me apply this to GPU + CPU bottlenecks, shall I?

If you use a Celeron with the most overpriced high-end GPU on the market, when your Celeron cannot do jack shit and the game is CPU-starved, that overpriced GPU "somehow helps" by rendering frames faster, so that the Celeron has "more milliseconds" to render... :D :D :D
 

llien

Member
Because even NVIDIA realizes dedicating a large amount of silicon for RT is not a good idea yet.
Intersection tests, something done by "hardwahr RT", are just one step out of the multiple steps needed to do the "RT".
The rest is done "old school". A lot of it can be optimized per vendor.

And that is how you have TPU showing a 15% RT difference on average between the 7900XTX and the 4080, but still get people cherry-picking and playing the "but in this game" card.

Perf is just all over the place, depending on how much time devs spent optimizing for a vendor.

Which highlights another so far failed promise of "hardwahr RT": that it will make lives of developers easier.
 

FireFly

Member
Ok, let me apply this to GPU + CPU bottlenecks, shall I.

If you use a Celeron with the most overpriced high-end GPU on the market, when your Celeron cannot do jack shit and the game is CPU-starved, that overpriced GPU "somehow helps" by rendering frames faster, so that the Celeron has "more milliseconds" to render... :D :D :D
If your Celeron can send the required data to the GPU 5 times per second, say, then the GPU has 200ms to render each frame. In that case the total rendering time would already need to exceed 200ms for any GPU time "savings" to become apparent in the form of increased performance. However, once you are bound by the GPU time taken to complete a given frame, any reduction in the time taken to complete a part of the rendering pipeline will result in a lower total frame time and a higher frame rate.

In case this is not clear, let me employ a suitably silly analogy. Say you and I are working at McDonald's and I am in charge of completing each burger, you are in charge of cooking and serving the fries, and my colleague is in charge of cooking the burgers. Most of the delay in getting the burger done is the cooking part, but if I decrease the time taken to complete the burger, the rate at which finished burgers are made will still increase. This means that if the fries and drink are ready, the rate at which completed meals leave the kitchen will also increase. However, if there is a hold up with finishing the fries because there is something wrong with the fryer or you weren't trained how to use it properly, then I will complete the burgers faster, but the meals won't be produced any quicker, because they will still be waiting on the fries. In the example you gave, this is like the GPU being ready for the next frame, but the CPU not being ready to provide it yet.

Your claim seems to be that ray tracing is also like making fries, in that it's a parallel task that must be completed "in time" so as not to delay the other tasks. However, my understanding is that at least some of the ray tracing work is a sequential part of the overall pipeline, so it's more like the cooking of the burger. If it takes longer to cook the burger, then I can still save time by preparing it more quickly. So once we get a new fryer (you replace your CPU), we will be able to get meals out faster (higher frame rate) due to my increased burger preparation speed (finishing the rasterisation part faster).
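To put numbers on that, here is a minimal sketch in plain Python (the fps helper and all timings are invented for illustration, not measurements of any real hardware): the delivered frame rate is set by whichever of the CPU and GPU takes longer, so shaving GPU milliseconds only becomes visible once you are GPU-bound.

```python
# Toy frame-time model: the slower of CPU prep and GPU rendering sets the
# frame rate. All numbers are illustrative only.

def fps(cpu_ms: float, gpu_ms: float) -> float:
    """Delivered frame rate when CPU preparation and GPU rendering overlap."""
    return 1000.0 / max(cpu_ms, gpu_ms)

# "Celeron" case: the CPU can only prepare 5 frames per second (200 ms each).
print(fps(cpu_ms=200.0, gpu_ms=16.6))  # ~5 fps: GPU savings are invisible
print(fps(cpu_ms=200.0, gpu_ms=8.0))   # still ~5 fps: CPU-bound

# Fast CPU: now the GPU time matters, and saving GPU milliseconds shows up.
print(fps(cpu_ms=5.0, gpu_ms=16.6))    # ~60 fps
print(fps(cpu_ms=5.0, gpu_ms=13.6))    # ~74 fps
```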
 

gatti-man

Member
I want to believe but I don’t. AMD is getting absolutely reamed by Nvidia. And I know Nvidia is insanely expensive but the product experience with them is pretty much world class. DLSS has progressed to the point of being magic. It’s just shitty you have to spend 1200+ to see it.

4k plus RT is face melting I hope everyone can experience it soon.
 
Last edited:

Loxus

Member
I want to believe but I don’t. AMD is getting absolutely reamed by Nvidia. And I know Nvidia is insanely expensive but the product experience with them is pretty much world class. DLSS has progressed to the point of being magic. It’s just shitty you have to spend 1200+ to see it.

4k plus RT is face melting I hope everyone can experience it soon.
DLSS isn't magic anymore.

Have you not seen the PSSR vs DLSS comparisons?

What makes PSSR interesting is that both Sony and AMD collaborated in working on PSSR, which means FSR4 (which is AI-driven) is most likely similar in quality to PSSR.
 

llien

Member
Most of the delay in getting the burger done is the cooking part, but if I decrease the time taken to complete the burger, the rate at which finished burgers are made will still increase.
You assume them to be sequential steps.
Which is not the case.

but the product experience with them
I've got a laptop with a 6800M, another one with a pathetic piece of silicon garbage 3050, and a desktop 6600XT.
When people tell me how amazing the experience with green silikon is, my polite reaction is: STFU.

Green shit is so arrogant, the pieces of shit even ask you to log the f*ck in to automatically update goddamn drivers.

progressed to the point of being magic
Oh my...
 

FireFly

Member
You assume them to be sequential steps.
Which is not the case.
Nvidia's Turing documentation says:

"The RT Cores in Turing can process all the BVH traversal and ray-triangle intersection testing, saving the SM from spending the thousands of instruction slots per ray, which could be an enormous amount of instructions for an entire scene. The RT Core includes two specialized units. The first unit does bounding box tests, and the second unit does ray-triangle intersection tests. The SM only has to launch a ray probe, and the RT core does the BVH traversal and ray-triangle tests, and return a hit or no hit to the SM. The SM is largely freed up to do other graphics or compute work. "


Largely freed implies some proportion of the CUDA cores are still used to facilitate ray tracing.

On AMD hardware, the TMUs are used to perform the ray/box and ray/triangle intersection via special "texture instructions", so that would limit the ability of the TMU to run other texture instructions. In addition, the BVH traversal is performed by the shader cores.

"Raytracing acceleration is accessed via a couple of new texture instructions. Obviously these instructions don’t actually do traditional texture work, but the texture unit is a convenient place to tack on this extra functionality. The new instructions themselves don’t do anything beyond an intersection test. Regular compute shader code deals with traversing the BVH. It also has to calculate the inverse ray direction and provide that to the texture unit, even though the texture unit has enough info to calculate that by itself. AMD probably wanted to minimize the hardware cost of supporting raytracing, and figured they had enough regular shader power to get by with such a solution."


So in both cases there will be a frame time cost from using the shader cores for ray tracing, which can be compensated for by being able to complete the rest of their work faster.
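To illustrate that split, here is a toy CPU-side sketch in plain Python (not vendor code; the node layout and function names are made up for illustration): the traversal loop and its stack bookkeeping are ordinary code of the kind that lands on the CUs/SM, while ray_box_hit and ray_tri_hit stand in for the intersection tests that dedicated RT cores, or AMD's texture-unit instructions, accelerate.

```python
# Toy BVH traversal. The outer loop is plain code (the "shader side" work),
# while the box/triangle tests stand in for hardware-accelerated intersection.
# Purely illustrative; not how any driver or GPU actually organises this.

def ray_box_hit(orig, inv_dir, box_min, box_max):
    """Slab test (t >= 0): does the ray hit the axis-aligned box?"""
    t_near, t_far = 0.0, float("inf")
    for o, inv, lo, hi in zip(orig, inv_dir, box_min, box_max):
        t1, t2 = (lo - o) * inv, (hi - o) * inv
        t_near, t_far = max(t_near, min(t1, t2)), min(t_far, max(t1, t2))
    return t_near <= t_far

def ray_tri_hit(orig, d, v0, v1, v2, eps=1e-7):
    """Moeller-Trumbore ray/triangle test; returns hit distance or None."""
    e1 = [v1[i] - v0[i] for i in range(3)]
    e2 = [v2[i] - v0[i] for i in range(3)]
    p = [d[1]*e2[2]-d[2]*e2[1], d[2]*e2[0]-d[0]*e2[2], d[0]*e2[1]-d[1]*e2[0]]
    det = sum(e1[i]*p[i] for i in range(3))
    if abs(det) < eps:
        return None
    s = [orig[i] - v0[i] for i in range(3)]
    u = sum(s[i]*p[i] for i in range(3)) / det
    q = [s[1]*e1[2]-s[2]*e1[1], s[2]*e1[0]-s[0]*e1[2], s[0]*e1[1]-s[1]*e1[0]]
    v = sum(d[i]*q[i] for i in range(3)) / det
    t = sum(e2[i]*q[i] for i in range(3)) / det
    if u < 0 or v < 0 or u + v > 1 or t < eps:
        return None
    return t

def traverse(nodes, root, orig, d):
    """Stack-based traversal: this bookkeeping is what runs on the CUs/SM."""
    inv_dir = [1.0 / c if c != 0 else float("inf") for c in d]
    stack, closest = [root], None
    while stack:
        node = nodes[stack.pop()]
        if not ray_box_hit(orig, inv_dir, node["min"], node["max"]):
            continue
        if "tri" in node:                      # leaf: run the triangle test
            t = ray_tri_hit(orig, d, *node["tri"])
            if t is not None and (closest is None or t < closest):
                closest = t
        else:                                  # inner node: push the children
            stack.extend(node["children"])
    return closest

# Tiny two-leaf BVH: triangles in the z=5 and z=9 planes.
nodes = {
    0: {"min": (-1, -1, 4), "max": (1, 1, 10), "children": [1, 2]},
    1: {"min": (-1, -1, 4), "max": (1, 1, 6),
        "tri": ((-1, -1, 5), (1, -1, 5), (0, 1, 5))},
    2: {"min": (-1, -1, 8), "max": (1, 1, 10),
        "tri": ((-1, -1, 9), (1, -1, 9), (0, 1, 9))},
}
print(traverse(nodes, 0, orig=(0.0, 0.0, 0.0), d=(0.0, 0.0, 1.0)))  # 5.0
```

The only point of the sketch is that the loop around the intersection tests is real work which has to execute somewhere, and on RDNA 2/3 that somewhere is the CUs.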
 
Last edited:

llien

Member
...Largely freed implies some proportion of the CUDA cores are still used to facilitate ray tracing...
That's a long way to say "'hardwahr RT' relies on non-RT cores doing some work".

Yes it does. And there are a number of steps involved that are not about intersection at all (e.g. preparing / amending the said BVH trees). That is why the promise of "one day, when we will have 'enough hardwahr RT', there will be no perf drop" is a lie.

In addition, the BVH traversal is performed by the shader cores.
This allows the use of various structures, and I haven't seen any proof that it is a significantly slower approach, but we've drifted from the original context of the discussion quite a bit.

And it was about "since gap with raster is bigger, AMD is somehow slower at RT, even when it is faster than NV". A rather peculiar thought, cough.
 

FireFly

Member
That's a long way to say "'hardwahr RT' relies on non-RT cores doing some work".

Yes it does. And there are a number of steps involved that are not about intersection at all (e.g. preparing / amending the said BVH trees). That is why the promise of "one day, when we will have 'enough hardwahr RT', there will be no perf drop" is a lie.


This allows the use of various structures, and I haven't seen any proof that it is a significantly slower approach, but we've drifted from the original context of the discussion quite a bit.

And it was about "since gap with raster is bigger, AMD is somehow slower at RT, even when it is faster than NV". A rather peculiar thought, cough.
The question was whether having more rasterisation resources would provide a frame rate advantage in games that use ray tracing. Your argument seemed to be that it would not, because the rest of the GPU would simply be waiting for the ray tracing units to complete their work. That's where the Celeron example came in.

And my counter to that was that ray tracing still uses general purpose resources, so there is still a general frame time cost. If on AMD hardware it takes 2ms to do the BVH traversal using the CUs, then that's 2 ms that has to come from somewhere. And therefore that cost can be "recovered" by being able to render the rest of the frame 2ms faster. I think your argument only works if ray tracing is a black box process that exclusively relies on dedicated units.
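A back-of-the-envelope version of that compensation argument, with invented numbers and a made-up frame_ms helper, assuming the RT portion is largely sequential with the rest of the frame:

```python
# Invented timings: a 2 ms slower BVH traversal on the CUs can be "recovered"
# if the rest of the frame finishes 2 ms faster, since both draw on the same
# general-purpose resources.
def frame_ms(rt_ms: float, rest_ms: float) -> float:
    return rt_ms + rest_ms

print(frame_ms(rt_ms=3.0, rest_ms=13.6))  # 16.6 ms -> ~60 fps
print(frame_ms(rt_ms=5.0, rest_ms=11.6))  # 16.6 ms -> ~60 fps, deficit recovered
```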
 
Last edited:

gatti-man

Member
DLSS isn't magic anymore.

Have you not seen the PSSR vs DLSS comparisons?

What makes PSSR interesting is that both Sony and AMD collaborated in working on PSSR, which means FSR4 (which is AI-driven) is most likely similar in quality to PSSR.
Are you serious? Show me the demos of Wukong running at ultra settings with RT at 60fps with PSSR.

Sony and AMD cumulatively couldn't hold Nvidia's jock strap when it comes to AI or DLSS. It's nice you are excited about what they are doing, but you're kidding yourself if you think it's in any way comparable.
 
Last edited:

gatti-man

Member
You assume them to be sequential steps.
Which is not the case.


I've got a laptop with a 6800M, another one with a pathetic piece of silicon garbage 3050, and a desktop 6600XT.
When people tell me how amazing the experience with green silikon is, my polite reaction is: STFU.

Green shit is so arrogant, the pieces of shit even ask you to log the f*ck in to automatically update goddamn drivers.


Oh my...
Neither of your cards qualifies under my post. I said $1200, not $200. You bought piece-of-crap hardware and expect a miracle. I literally said it's a shame it's so expensive that you've got to spend $1200+. I knowingly left out the sucker cards at the bottom because they are terrible. AMD owns the bottom end. There is no reason for Nvidia to even care about the bottom end; there is no profit there.
 
Last edited:

Loxus

Member
Are you serious? Show me the demos of Wukong running at ultra settings with RT at 60fps with PSSR.

Sony and AMD cumulatively couldn't hold Nvidia's jock strap when it comes to AI or DLSS. It's nice you are excited about what they are doing, but you're kidding yourself if you think it's in any way comparable.
Come on dude, seriously?

There's a whole thread with detailed comparisons.
 

gatti-man

Member
Come on dude, seriously?

There's a whole thread with detailed comparisons.
Yeah, and it's completely full of hopes and dreams. I care about results and actuality, not woulda coulda shoulda. For example, check out the actual tests of games using PSSR. It's literally not comparable to DLSS at all in reality. It reminds me of first-gen DLSS (we are on the 3rd gen, about to be 4th).

Like I said, show me a game running at 4K 60fps ultra settings with RT. It's pretty simple: DLSS can do it, PSSR can't. That's where the magic is.
 
Last edited:

Zathalus

Member
Yeah, and it's completely full of hopes and dreams. I care about results and actuality, not woulda coulda shoulda. For example, check out the actual tests of games using PSSR. It's literally not comparable to DLSS at all in reality. It reminds me of first-gen DLSS (we are on the 3rd gen, about to be 4th).

Like I said, show me a game running at 4K 60fps ultra settings with RT. It's pretty simple: DLSS can do it, PSSR can't. That's where the magic is.
I think you're mixing up what DLSS actually does. That 4K 60fps Ultra setting with RT that you're on about is due to a card's power, not due to the upscaler. DLSS/PSSR/FSR is just responsible for the final image quality, in which PSSR is quite close to DLSS, only lacking in temporal stability.

Spending $1k+ on a GPU then using glorified upscaling is a hilarious thing to do.
No.
 

PandaOk

Member
And it was about "since gap with raster is bigger, AMD is somehow slower at RT, even when it is faster than NV". A rather peculiar thought, cough.
[wait what GIF]


Don't RDNA2/3 cards lose more performance than Nvidia cards once you turn on ray tracing? How do you account for the relatively larger loss in performance if not for a comparable inefficiency somewhere in the process that triggers a bottleneck?
 
Last edited:

kevboard

Member
[wait what GIF]


Don't RDNA2/3 cards lose more performance than Nvidia cards once you turn on ray tracing? How do you account for the relatively larger loss in performance if not for a comparable inefficiency somewhere in the process that triggers a bottleneck?

there are extreme examples of that, where an RTX 2060 outperforms any AMD card if you enable path tracing in Cyberpunk, for example.

so far on AMD, it seems the more complex the BVH is, the worse the game runs.

RDNA4 will apparently try to solve this.
 

winjer

Gold Member
AMD is working on a technique similar to Nvidia's Ray-Reconstruction.
Probably it will be available on RDNA4 GPUs and the PS5 Pro.
The question is about RDNA2 and 3....


Reconstructing pixels in noisy rendering​

Denoising is one of the techniques to address the problem of the high number of samples required in Monte Carlo path tracing. It reconstructs high-quality pixels from a noisy image rendered with low samples per pixel. Often, auxiliary buffers like albedo, normal, roughness, and depth, which are available in deferred rendering, are used as guiding information. By reconstructing high-quality pixels from a noisy image in a much shorter time than full path tracing takes, denoising becomes an essential component of real-time path tracing.

Existing denoising techniques fall into two groups, offline and real-time denoisers, depending on their performance budget. Offline denoisers focus on production film-quality reconstruction from a noisy image rendered with higher samples per pixel (e.g., more than 8). Real-time denoisers target denoising noisy images rendered with very few samples per pixel (e.g., 1-2 or less) within a limited time budget.

It is common to take noisy diffuse and specular signals as inputs to denoise them separately with different filters and composite the denoised signals to a final color image to better preserve fine details. Many real-time rendering engines include separated denoising filters for each effect like diffuse lighting, reflection, and shadows for quality and/or performance. Since each effect may have different inputs and noise characteristics, dedicated filtering could be more effective.

Neural Denoising​

Neural denoisers [3,4,5,6,7,8] use a deep neural network to predict denoising filter weights in a process of training on a large dataset. They are achieving remarkable progress in denoising quality compared to hand-crafted analytical denoising filters [2]. Depending on the complexity of the neural network and how it cooperates with other optimization techniques, neural denoisers are getting more attention for use in real-time Monte Carlo path tracing.

A unified denoising and supersampling [7] takes noisy images rendered at low resolution with low samples per pixel and generates a denoised as well as upscaled image to target display resolution. Such joint denoising and supersampling with a single neural network gives an advantage of sharing learned parameters in the feature space to efficiently predict denoising filter weights and upscaling filter weights. Most performance gain is obtained from low resolution rendering as well as low samples per pixel, giving more time budget for neural denoising to reconstruct high quality pixels.
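As a rough sketch of the "auxiliary buffers as guides" idea described above (this is not AMD's denoiser or any shipping filter, just a toy cross-bilateral blur in NumPy that weights neighbours by normal and depth similarity on albedo-demodulated radiance):

```python
# Toy guided denoiser: cross-bilateral box filter over albedo-demodulated
# radiance, weighting neighbours by normal and depth similarity. Illustrative
# only; real-time and neural denoisers are far more involved.
import numpy as np

def denoise(noisy, albedo, normal, depth, radius=2, sigma_n=0.1, sigma_d=0.05):
    irr = noisy / np.maximum(albedo, 1e-4)            # demodulate albedo
    out = np.zeros_like(irr)
    wsum = np.zeros(irr.shape[:2] + (1,))
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            sh_irr = np.roll(irr, (dy, dx), axis=(0, 1))
            sh_n = np.roll(normal, (dy, dx), axis=(0, 1))
            sh_d = np.roll(depth, (dy, dx), axis=(0, 1))
            w_n = np.exp(-np.sum((normal - sh_n) ** 2, -1, keepdims=True) / sigma_n)
            w_d = np.exp(-((depth - sh_d) ** 2)[..., None] / sigma_d)
            w = w_n * w_d                             # edge-stopping weights
            out += sh_irr * w
            wsum += w
    return (out / wsum) * albedo                      # re-apply albedo

# Random stand-in buffers, just to show the call shape.
rng = np.random.default_rng(0)
noisy = rng.random((64, 64, 3)).astype(np.float32)
albedo = np.full((64, 64, 3), 0.5, np.float32)
normal = np.tile(np.array([0.0, 0.0, 1.0], np.float32), (64, 64, 1))
depth = np.ones((64, 64), np.float32)
print(denoise(noisy, albedo, normal, depth).shape)    # (64, 64, 3)
```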
 

Bojji

Member
AMD is working on a technique similar to Nvidia's Ray-Reconstruction.
Probably it will be available on RDNA4 GPUs and the PS5 Pro.
The question is about RDNA2 and 3....


I almost bought a 7900XTX a few months ago (it had a good price), but it will probably be abandoned by AMD when it comes to new ML features (FSR4 and this).
 

ap_puff

Member
I almost bought a 7900XTX a few months ago (it had a good price), but it will probably be abandoned by AMD when it comes to new ML features (FSR4 and this).



Mind you I wouldn't buy an XTX myself due to the power draw (actually matters to me)
 

Bojji

Member



Mind you I wouldn't buy an XTX myself due to the power draw (actually matters to me)


I was using a 3080 Ti, so the 7900XTX offered the same power draw but with better performance. But of course the lack of DLSS was the main factor in why I didn't buy it.

I can see this working on RDNA3 but RDNA2? It has no ML at all AFAIK...
 

ap_puff

Member
I was using a 3080 Ti, so the 7900XTX offered the same power draw but with better performance. But of course the lack of DLSS was the main factor in why I didn't buy it.

I can see this working on RDNA3 but RDNA2? It has no ML at all AFAIK...
They will do what they always do: make the regular shader cores do it. Which means it will be very slow on RDNA2, which likely isn't powerful enough anyway. Maybe at 1080p.
 

Bojji

Member
They will do what they always do: make the regular shader cores do it. Which means it will be very slow on RDNA2, which likely isn't powerful enough anyway. Maybe at 1080p.

At this point they might just not support it; RDNA2 is already slow with RT, and this would make it even slower.
 

ap_puff

Member
In raster it's still a very decent GPU. XeSS and FSR3 extended the life of this piece of hardware.
Yeah, I just upgraded to a 4K main monitor and AFMF is saving my bacon. I can still tell it's 60fps underneath, but the additional motion clarity of 120fps, even if half of them are "fake frames", is really nice.
 

gatti-man

Member
I think you're mixing up what DLSS actually does. That 4K 60fps Ultra setting with RT that you're on about is due to a card's power, not due to the upscaler. DLSS/PSSR/FSR is just responsible for the final image quality, in which PSSR is quite close to DLSS, only lacking in temporal stability.


No.
No I am not. PSSR might do something close to DLSS, but not to the extent DLSS does it. Which is my entire point. You won't see 60fps RT ultra even on lower settings. It's more like first-gen DLSS. Even then, there are blatant IQ issues with retail finished examples of PSSR. People won't ever see it because most of this forum is filled with console-only gamers. The amount of disinformation shouted here is pretty bad when it comes to anything PC.

DLSS will literally triple my frames on high-end games that would be a slideshow with PSSR. Yes, people here say the PS5 Pro is similar to a 4070. Look at a 4070 with DLSS vs the PS5 Pro with PSSR. It's a laughable comparison.
 
Last edited:

winjer

Gold Member
Lack of a good DLSS alternative from AMD is their problem. Ray tracing in mid-range hardware will not make any difference.

FSR 3.1 already made a big jump in quality. Especially with games that do a proper implementation.
But yes, AMD really needs to catch up with an ML pass for their temporal upscaler.
And this is something that should have already happened around the launch of RDNA3.
 

Sanepar

Member
FSR 3.1 already made a big jump in quality. Especially with games that do a proper implementation.
But yes, AMD really needs to catch up with an ML pass for their temporal upscaler.
And this is something that should have already happened around the launch of RDNA3.
And if their strategy is focused on the mid-tier GPU segment, a good upscaler will do the most to help sell those GPUs.
 

64bitmodels

Reverse groomer.
DLSS isn't magic anymore.

Have you not seen the PSSR vs DLSS comparisons?

What makes PSSR interesting is that both Sony and AMD collaborated in working on PSSR, which means FSR4 (which is AI-driven) is most likely similar in quality to PSSR.
Which is cool and bodes well for any future versions of FSR (too early to call it FSR4, it could just end up being FSR 3.5)

we need to see if they apply that added expertise to the Raytracing segment. If they can do that they'll have some good shit on their hands
 

OverHeat

« generous god »
Which is cool and bodes well for any future versions of FSR (too early to call it FSR4, it could just end up being FSR 3.5)

we need to see if they apply that added expertise to the Raytracing segment. If they can do that they'll have some good shit on their hands
lol Loxus making a statement with a YouTube video 😂
 

Zathalus

Member
No I am not. PSSR might do something close to DLSS, but not to the extent DLSS does it. Which is my entire point. You won't see 60fps RT ultra even on lower settings. It's more like first-gen DLSS. Even then, there are blatant IQ issues with retail finished examples of PSSR. People won't ever see it because most of this forum is filled with console-only gamers. The amount of disinformation shouted here is pretty bad when it comes to anything PC.

DLSS will literally triple my frames on high-end games that would be a slideshow with PSSR. Yes, people here say the PS5 Pro is similar to a 4070. Look at a 4070 with DLSS vs the PS5 Pro with PSSR. It's a laughable comparison.
DLSS increases FPS, as do other upscalers like FSR and XeSS, and the difference in performance between them all is rather close, with the only really major difference being upscaling quality. If Sony released PSSR on PC and Nvidia allowed it to be accelerated on the Tensor cores, PSSR and DLSS would offer a very similar increase in performance on Nvidia GPUs, with the only difference being the quality of the rendered frames. DLSS still has a small advantage on that front.

The reason you are not seeing PT games on the PS5 Pro isn't because of PSSR, it's because the Pro simply isn't powerful enough for them, not because PSSR has some sort of performance deficit vs DLSS.
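A rough way to see why the raw performance uplift is similar across upscalers (invented numbers and a made-up helper; GPU pixel cost is assumed to scale with internal resolution, ignoring fixed per-frame and upscaling overheads):

```python
# Any upscaler (DLSS, FSR, XeSS, PSSR) gains performance mainly by rendering
# fewer internal pixels; image quality is where the techniques really differ.
def upscaled_fps(native_fps: float, native_res, internal_res) -> float:
    scale = (native_res[0] * native_res[1]) / (internal_res[0] * internal_res[1])
    return native_fps * scale  # crude: pixel-bound work only

print(upscaled_fps(30.0, (3840, 2160), (2560, 1440)))  # 67.5, "quality"-style mode
print(upscaled_fps(30.0, (3840, 2160), (1920, 1080)))  # 120.0, "performance"-style
```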
 

Rosoboy19

Member
I hope they can come through with massive RT performance gains. If this generation has taught us anything, it’s that RT is only really noticeable and worth the cost when you use gobs and gobs of it or just full path tracing. And in games with heavy RT/path tracing AMD gets destroyed.

 