
PS5 Pro Technical Seminar with Mark Cerny

Bojji

Member
So what you're saying is that nvidia TOPS performance doesn't really matter at all "because you can just run those things on a normal GPU without any ML"?
How is it not important? Even a simple thing like framegen would benefit greatly from better performance, especially at higher base fps.

I'm not sure what you're getting at here anyway. I just said your claim that there is "no other use for ai in games other than Super resolution" is blatantly false. Your subjective opinion about the "importance" of these other uses isn't something I particularly care about.

There IS use for it, but it's not something that will change gaming world before next gen. This stuff takes years:

- mesh shaders were introduced in 2018, first games using it: 2023? There are like 2 games using it.
- how many games use PS5 I/O hardware? Even some Sony games aren't.
- number of games with competent RT? Not that big. Games requiring hardware RT? Few titles?

Developers are very slow with this stuff, so in my opinion the TOPS power of both Nvidia GPUs and the Pro GPU is irrelevant for gaming at the moment (and for 2+ years?).
 

truth411

Member
Cerny literally stated the secondary motive. The primary is to improve machine graphics (which has implicit retaining hardcore fans), but the secondary is about technology development and iteration towards the next generation. Which is both internal/partner and dev driven. Ie: not primarily about the consumer in the case of the Pro.

I mean you could almost swap them around, because looking forward it sounds like they didn't have confidence executing their PS6 vision without the PS5 Pro, with the mid-gen improvements being a nice bonus. Perhaps it started one way and has ended up the other. Regardless, I don't think you can overstate the importance of having physical hardware at this point in the cycle that already integrates conceptual elements of the next generation.
THIS!!!
That is a huge reason for the PS5 Pro's existence, assuming PS6 comes out holiday 2028. That gives them 4 years of experience with ML upscaling, instead of taking a first crack at it at PS6 launch. Cerny literally said that.
 

sachos

Member
Does that mean Ps6 will also be 9 x more processing power than Ps5?
Raw performance? Most likely not. But the compounding effect of the hardware and ML/algorithm improvements over the next 4-5 years might get there. That is what Amethyst is about, I think.
Basically I envision a future where PS5/PS6 cross-gen games have basic rendering on PS5 but full RT, even PT, on PS6. At least that's what I hope for lol.
 

Boglin

Member
Forgive me for a diary style of post, but I absolutely envy Mark Cerny and his team's job. I have a background with doing house renovations along with retrofitting electronics for different use cases, and the puzzle of figuring out how to modify, adapt or otherwise better utilize an existing part or product in conjunction with accommodating it into my own designs to create something custom is so rewarding.

So regardless of feelings about Playstation or the consumer reception of the PS5 Pro, I just love seeing these types of deep dives and seeing the complex problems and logic involved with solving them. The people who have claimed consoles are simply repackaging off the shelf parts are selling the process so damn short it should be criminal.
 

sachos

Member
Interesting answer to that Oliver question about PSSR being able to upgrade games without a patch when the model is updated. Cerny was quite vague to it but seemed open to the idea, at least to let the user decide. That would be awesome.
He was also kinda vague on the question about how much PSSR will keep on improving. I would have loved to get a clear answer there, but he started talking about frame generation at that point.
 

FrankWza

Member
He was also kinda vague on the question about how much PSSR will keep on improving. I would have loved to get a clear answer there, but he started talking about frame generation at that point.
I think he said that it would be labeled differently and not be PSSR. Almost like PSSR is this version and next will be a different name vs PSSR 2.0
 

Panajev2001a

GAF's Pleasant Genius
Both of these were in the original RDNA2 (even Xbox has them), but for some reason omitted from the PS5 GPU.
Which was my point: compared to PS5 RDNA2 it was upgraded (it was not in PS5 as Cerny’s team is very strict / pragmatic with their budget and they are not shy about asking AMD to redesign the FPU of Zen 2 if it helps them in their goals). Then again, other stuff aside from RT and ML/AI was also updated beyond RDNA2, as Cerny briefly touched upon (geometry engine, vertex/triangle setup and rasterisation, etc…). They just did not bring in anything that would force devs to recompile all their shaders.

Then again in some cases, if you want the best RT performance, you need to opt-in to get BVH8 support as well as the new automatic HW accelerated stack management and BVH traversal. Otherwise you still do it in compute shaders like before. Quick PS5 Pro patches would not extract all their performance improvements potential. One thing is transparent BC which they achieved and another thing is optimal HW utilisation which may take a bit more time.
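To give a rough feel for why opting in to BVH8 can matter, here is a minimal sketch of how node width affects tree depth. It assumes an idealised, perfectly balanced tree, a 4-wide baseline for comparison, and a made-up leaf count; none of these come from the talk, and real traversal cost depends on much more than depth.

```python
def bvh_depth(num_leaves: int, branching: int) -> int:
    """Levels needed for a full tree with `branching` children per node
    to hold at least `num_leaves` leaves (idealised, perfectly balanced)."""
    depth, capacity = 0, 1
    while capacity < num_leaves:
        capacity *= branching
        depth += 1
    return depth


# Hypothetical scene size, chosen only for illustration.
LEAVES = 1_000_000

for width in (4, 8):
    print(f"BVH{width}: {bvh_depth(LEAVES, width)} levels for {LEAVES:,} leaves")
```

Fewer levels means fewer dependent node fetches per ray, which is one plausible reason the wider structure helps, on top of whatever the hardware does per node.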
 

Panajev2001a

GAF's Pleasant Genius
I really doubt that there is a 'secret sauce' that makes RT performance increase exponentially. What we have already seen is probably what it is. After all, if RDNA4 has significant improvements, they should be reflected in all the games already released; neither Intel nor Nvidia have had to adapt anything in the games already released to perform better than AMD in RT, so why should it be any different?
It is not secret nor sauce: they explained what it is, and that beyond basic binary compatibility and having more CUs you need to opt in to these new features and change your code (you have seen plenty of partnerships nVIDIA does with devs to get them to run things “better”, no? Nothing new).

XSX and PS5 were considerably behind nVIDIA, Apple/PowerVR, and Intel in their RT implementation and what was done in HW and what was done in CS. Now PS5 Pro has Sony and AMD collaborate on a new RT core (extending what they had before), but to take full advantage of it you need to spend more time in adapting your engine and shaders.
 

winjer

Gold Member
Yes, but that applies to everything in AI today - including NVidia GPUs.
Eg. for a 4090, the bandwidth situation is even worse, as it does a bit more than double the TOPs (660) and has less than double the bandwidth (about 1TB/s) of a PS5 Pro.
Ultimately, the memory bandwidth is over 3 orders of magnitude removed from compute throughput. So for every memory access, you need to perform at least 1000 operations, otherwise you're wasting compute waiting on memory.

This is also why slower caches (like infinity) are not even making a dent in this problem - with 2TB/s - we're still in the 3 orders of magnitude too slow range.
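The ratio being described can be sketched with a few lines of arithmetic. The 4090 figures (660 TOPS, about 1TB/s) are the ones quoted above; the 300 TOPS for the Pro is the figure discussed in this thread, while the 0.576TB/s Pro bandwidth and the 4-byte access size are assumptions added here for illustration.

```python
def ops_per_byte(tops: float, bandwidth_tb_s: float) -> float:
    """Operations the chip can issue in the time it takes to fetch one byte."""
    return (tops * 1e12) / (bandwidth_tb_s * 1e12)


def ops_per_access(tops: float, bandwidth_tb_s: float, bytes_per_access: int = 4) -> float:
    """Same ratio per memory access; the 4-byte access size is an assumption."""
    return ops_per_byte(tops, bandwidth_tb_s) * bytes_per_access


# (name, peak TOPS, memory bandwidth in TB/s)
gpus = [
    ("RTX 4090", 660.0, 1.0),      # figures as quoted in the post
    ("PS5 Pro", 300.0, 0.576),     # bandwidth is an assumed figure
]

for name, tops, bw in gpus:
    print(f"{name}: {ops_per_byte(tops, bw):.0f} ops/byte, "
          f"{ops_per_access(tops, bw):.0f} ops per 4-byte access")
```

Under these assumptions both chips need on the order of thousands of operations per memory access to stay busy, which is the point of the "1000 operations" remark.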

There are a few things to consider. For one, the Pro is using something similar to WMMA. So these are instructions integrated into the Shader pipeline. While the Tensor units in Ada, are dedicated units.
Another thing to consider is that VRAM bandwidth usage in Ada is much more efficient than in RDNA2. So in practical terms, a 4090 will have much more memory bandwidth than the Pro.
The only thing that RDNA2 still does better than Ada is L2 caches. That helps, but it's not a magic bullet. And to make things worse, the Pro doesn't have any L3.

https://substack-post-media.s3.amazonaws.com/public/images/035631c1-4f63-4d46-8a5a-28bdcfa95af9_1169x549.png
 
THIS!!!
That is a huge reason for the PS5 Pro's existence, assuming PS6 comes out holiday 2028. That gives them 4 years of experience with ML upscaling, instead of taking a first crack at it at PS6 launch. Cerny literally said that.
I like the honesty here. It's logical but they are essentially saying that people who buy a PS5 Pro are beta testers for the PS6 which can be taken either way.
 

Fafalada

Fafracer forever
For one, the Pro is using something similar to WMMA. So these are instructions integrated into the Shader pipeline. While the Tensor units in Ada, are dedicated units
Taken at face value (ie. suggesting that the units operate completely independently from shader compute) that would only make bandwidth situation worse, not better.
That said - my understanding is that execution resources are still shared anyway even in Ada - so we're not looking at parallel execution of both types of compute. But that just brings us back to the same 1000:1 ratio we started with.

Another thing to consider is that VRAM bandwidth usage in Ada is much more efficient than in RDNA2. So in practical terms, a 4090 will have much more memory bandwidth than the Pro.
This I explained in the post above (and Mark did in his talk even more): existing caches are still several orders of magnitude too slow, even if you get 100% utilisation. Eg. Ada best case nets you 5TB/s with a 4090, which sure, is a 5x improvement over its raw memory access, but that's still over 200x slower than where we want to be.
Or to be specific - Mark's example of 3% utilisation would become 15% - but that still leaves 85% of performance on the table. The whole point why they went to registers as cache is that nothing else in the system runs at the speeds needed, not even L0.
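The 3% to 15% step is just linear scaling of a bandwidth-bound workload. A minimal sketch, assuming utilisation scales directly with effective bandwidth until it hits 100% (a simplification; real kernels are messier):

```python
def bandwidth_bound_utilisation(base_utilisation: float, bandwidth_gain: float) -> float:
    """Utilisation of a purely bandwidth-bound kernel after scaling bandwidth,
    capped at 100%. Linear scaling is an assumption of this sketch."""
    return min(1.0, base_utilisation * bandwidth_gain)


def gain_needed_for_full_utilisation(base_utilisation: float) -> float:
    """How many times faster memory would have to be to saturate compute."""
    return 1.0 / base_utilisation


base = 0.03  # the 3% example from the talk, as relayed above
print(f"5x bandwidth: {bandwidth_bound_utilisation(base, 5.0):.0%} utilisation")
print(f"gain needed for 100%: {gain_needed_for_full_utilisation(base):.1f}x")
```

So even a 5x faster cache leaves most of the compute idle in this model, which is why the argument lands on keeping data in registers.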

The one positive with all of the above is that for a workload like upscaling - managing memory in/out is relatively trivial, as tiling can predictably access each block of memory in succession and give us that 1000x speedup on a per tile basis, as long as model doesn't need to jump around memory beyond that. But it shows the need for more fast on-chip memory.
It's analogous to the use cases Cell was originally created for (or Larrabee/EE VUs), ie. stream processors that have register-speed memory coupled to them. The main change is that the bandwidth gap is wider than ever now.
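The predictable access pattern mentioned above can be sketched as a simple tile walk over the frame. The 64x64 tile size and the 4K frame are assumptions for illustration, and the sketch ignores the overlapping border a convolutional network would need around each tile.

```python
from typing import Iterator, Tuple

Tile = Tuple[int, int, int, int]  # (x, y, width, height)


def tiles(frame_w: int, frame_h: int, tile: int) -> Iterator[Tile]:
    """Yield non-overlapping tiles in row-major order, clipping at the edges,
    so every pixel is visited exactly once and in a predictable sequence."""
    for y in range(0, frame_h, tile):
        for x in range(0, frame_w, tile):
            yield x, y, min(tile, frame_w - x), min(tile, frame_h - y)


# Hypothetical numbers: a 4K output frame and 64x64 tiles.
all_tiles = list(tiles(3840, 2160, 64))
covered = sum(w * h for _, _, w, h in all_tiles)
print(f"{len(all_tiles)} tiles cover {covered:,} pixels")
```

Each tile is loaded once, processed entirely from fast on-chip storage, then written out, so the slow memory is only touched at tile boundaries.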
 

winjer

Gold Member
Taken at face value (ie. suggesting that the units operate completely independently from shader compute) that would only make bandwidth situation worse, not better.
That said - my understanding is that execution resources are still shared anyway even in Ada - so we're not looking at parallel execution of both types of compute. But that just brings us back to the same 1000:1 ratio we started with.


This I explained in the post above (and Mark did in his talk even more): existing caches are still several orders of magnitude too slow, even if you get 100% utilisation. Eg. Ada best case nets you 5TB/s with a 4090, which sure, is a 5x improvement over its raw memory access, but that's still over 200x slower than where we want to be.
Or to be specific - Mark's example of 3% utilisation would become 15% - but that still leaves 85% of performance on the table. The whole point why they went to registers as cache is that nothing else in the system runs at the speeds needed, not even L0.

The one positive with all of the above is that for a workload like upscaling - managing memory in/out is relatively trivial, as tiling can predictably access each block of memory in succession and give us that 1000x speedup on a per tile basis, as long as model doesn't need to jump around memory beyond that. But it shows the need for more fast on-chip memory.
It's analogous to the use cases Cell was originally created for (or Larrabee/EE VUs), ie. stream processors that have register-speed memory coupled to them. The main change is that the bandwidth gap is wider than ever now.

A 4090, even with a heavy use case of ChatGPT3 running on it, has a Tensor usage rate of 50-60%. It's mostly bandwidth starved, but it's not as bad as the pro.
Having dedicated Tensor units means, these units have their own L1 and registers. And don't have to contend with shader units.
 

Ashamam

Member
I like the honesty here. It's logical but they are essentially saying that people who buy a PS5 Pro are beta testers for the PS6 which can be taken either way.
Not really. It is after all a 1.0 release for the hardware and PSSR etc. But it's conceptually in line with what's planned for PS6. It's not Sony's fault that people, despite all the information available, have unrealistic expectations as to what the Pro can do. It has the capability to do everything they said it could. We are in a bit of a strange place right now because almost everything we have seen is a retrofit, which is muddying the water, so I'm not entirely unsympathetic to the negative narrative, but to believe it you either need tunnel vision or need to read wider, ie not just warrior comments.

Basically it can simultaneously be a release product, act internally as a test platform of sorts and a dev facing teaching tool. These roles aren't mutually exclusive.
 

Fafalada

Fafracer forever
A 4090, even with a heavy use case of ChatGPT3 running on it, has a Tensor usage rate of 50-60%. It's mostly bandwidth starved, but it's not as bad as the pro.
Ok we're done here.
sh8Fp3y.gif

Once you reach the destination (if there's one) we can pickup a discussion.

Having dedicated Tensor units means, these units have their own L1 and registers. And don't have to contend with shader units.
Yes - so it's useful for async-execution (basically to increase occupancy), but we're talking single digit % efficiencies here - again, operating several orders of magnitude away from the problem described.
It might become interesting if we move past specialised workloads like upscaling into more general usage - but that's not what PS5 Pro was built for anyway.
 

PaintTinJr

Member


@Kaiserstark summary

  • Introduction to PS5 Pro
    • Mark Cerny explains the PS5 Pro's focus on improving GPU performance and addressing mid-generation technological advances.
    • Unlike generational leaps (e.g., PS3 → PS4), the Pro model optimizes existing hardware for better game performance without exclusive games.
  • Key Goals for PS5 Pro
    • Minimize workload for game developers while achieving significant improvements and make games like Dustborn and Concord quicker to complete.
    • Focus on enhancing GPU capabilities for noticeable gameplay improvements.
  • The "Big Three" Improvements
    • Larger GPU with 67% more workgroup processors for faster rendering.
    • Advanced ray tracing features using future AMD RDNA technologies.
    • AI-driven upscaling via PlayStation Spectral Super Resolution (PSSR).
  • Hybrid GPU Architecture
    • Combines AMD RDNA 2 and RDNA 3 technologies for easier developer adoption.
    • Focused improvements include faster vertex processing and new ray-tracing structures.
  • Improved Memory System
    • Increased memory bandwidth (28% higher than PS5).
    • Extra memory added using DDR5 to support ray tracing, upscaling, and higher resolutions, including potential 8K.
  • Ray Tracing Enhancements
    • Doubling ray intersection speeds using a new BVH8 acceleration structure.
    • Improved performance consistency with hardware-based stack management to reduce divergence issues.
  • Machine Learning Integration
    • PSSR uses a custom lightweight convolutional neural network (CNN) to enhance image quality and resolution.
    • Focus on efficient memory use to process frames quickly, enabling 4K upscaling with minimal system bottlenecks.
  • Developer-Friendly Design
    • PSSR supports variable upscaling ratios and integrates seamlessly with existing game engines.
    • Maintains compatibility with PS5 for ease of game development.
  • Future of Machine Learning in Gaming
    • SIE aims to develop generalized machine learning architectures for broader applications in gaming.
    • Goals include fully fused networks, improved ray tracing, and richer game graphics.
  • PS5 Pro's Legacy and Vision
    • The advancements in PS5 Pro lay the groundwork for future technologies, including enhanced machine learning and graphics processing innovations.
    • Collaboration with AMD and internal R&D aims to revolutionize game design and player experiences.





I suspect I'm misremembering what I said about PSSR and the Pro setup for ML acceleration, but watching it last night and him kicking off with saying about holes in image being filled, the PSSR being slightly re-entrant, 10k ops per pixel and the enhanced WGP being the source of the capability, after he confirmed 1ms of run time (so 300TOPs at 120fps, with 1ms of the 8.3ms frame giving 12% utilisation, or 36TOPs, consistent with DLSS utilisation numbers), all feels like a bit of a 'bingo!', but I'm sure someone will remind me it wasn't. :)
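For anyone who wants to check that arithmetic, here is a minimal sketch using only the numbers from the paragraph above (300 TOPS peak, 1ms of PSSR run time, 120fps). Treating the result as an "average TOPS over the frame" is this post's interpretation, not something stated in the talk.

```python
def frame_time_ms(fps: float) -> float:
    """Frame budget in milliseconds at a given frame rate."""
    return 1000.0 / fps


def frame_fraction(run_time_ms: float, fps: float) -> float:
    """Share of each frame spent in a pass that takes `run_time_ms`."""
    return run_time_ms / frame_time_ms(fps)


def average_tops(peak_tops: float, run_time_ms: float, fps: float) -> float:
    """Peak throughput averaged over the whole frame."""
    return peak_tops * frame_fraction(run_time_ms, fps)


fraction = frame_fraction(1.0, 120.0)
print(f"frame budget: {frame_time_ms(120.0):.1f} ms")
print(f"share of frame: {fraction:.0%}")
print(f"averaged over the frame: {average_tops(300.0, 1.0, 120.0):.0f} TOPS")
```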


It was interesting that the double RDNA3 flops ('flopflation') was directly discussed, and how it doesn't apply because game code between Pro and PS5 remains mostly the same, meaning it couldn't be factored in whether or not it was able to be harnessed. But then, after dispelling the 33TF figure, the TOPS for 8bit was 300TOPS, but the 16bit integer TOPs was 66TOPS which is a 16 x 2(RPM) x 2(Flopflation) calculation, showing that for AI it is completely exploitable for the 10K ops per pixel, which feels like another little 'bingo'.
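The multiplication above is easy to reproduce. This sketch takes the 16.7 TFLOPs figure quoted elsewhere in this thread as the base and applies the two 2x factors; whether those factors really correspond to RPM and dual issue is this post's reading and is not confirmed by the talk.

```python
BASE_FP32_TFLOPS = 16.7   # figure quoted elsewhere in this thread
RPM_FACTOR = 2            # packed 16-bit math: two values per 32-bit lane
DUAL_ISSUE_FACTOR = 2     # the 'flopflation' doubling, as interpreted in the post


def sixteen_bit_throughput(base_tflops: float) -> float:
    """Base rate scaled by the two 2x factors from the post's calculation."""
    return base_tflops * RPM_FACTOR * DUAL_ISSUE_FACTOR


print(f"{sixteen_bit_throughput(BASE_FP32_TFLOPS):.1f} trillion 16-bit ops/s")
```

That lands within rounding of the 66 figure, which is consistent with the calculation but does not prove where the hardware number actually comes from.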


Anyway, all great technical stuff, that probably just confirms what I thought about the Pro missing the mark in the UK, and this chat release being to re-energise hype for an £800 console with a fit-your-own disc drive, not covered by a unified warranty, with games that are failing to justify the cost of the excellent AI/RT tech inside.

I still think for a 720P PS5 image with the PSSR taking x5 (5ms) a superior 1440p isn't out of the equation
 

Fafalada

Fafracer forever
the TOPS for 8bit was 300TOPS, but the 16bit integer TOPs was 66TOPS which is a 16 x 2(RPM) x 2(Flopflation) calculation
Mark's explanation is pretty explicit here. The 8bit and 16bit extensions are AI specific instructions they added (so not flopflation) - note he makes a specific point they could have gone higher for 16bit but didn't see the practical use for it (or 32bit).

kicking off with saying about holes in image being filled, the PSSR being slightly re-entrant
That was one of those 'blink and you miss it' moments - but the distinction he makes is important. The difference from simply 'rendering lower resolution' is that screenspace math differs (especially the interpolants). Ie. if you remember some games that added DLSS and ended up with comically low-resolution textures (until patches fixed it) - that's an example of the difference between lowres+upscale to 'sparse sampling+hole reconstruction'.
 

twilo99

Member
- 30 WGPs
- architecture is "between RDNA2 and RDNA3", called RDNA2.x
- e.g. geometry pipeline is RDNA3 based
- BVH8 acceleration structure for RT
- no double-FP like RDNA3
- 16.7 TFlops

What is the reasoning behind not implementing full RDNA3 dual issue?
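As a sanity check on the bullet points above, the 16.7 TFlops figure can be related to the 30 WGPs with the usual peak-rate formula. The sketch assumes 2 CUs per WGP, 64 FP32 lanes per CU, and 2 flops per lane per clock (one fused multiply-add), which is the standard RDNA layout but is not stated in this thread; the clock is derived, not quoted.

```python
CUS_PER_WGP = 2        # assumption: standard RDNA layout
LANES_PER_CU = 64      # assumption: FP32 lanes per CU, no dual issue
FLOPS_PER_LANE = 2     # one fused multiply-add counts as two flops


def flops_per_clock(wgps: int) -> int:
    """Peak FP32 operations per clock cycle across the whole GPU."""
    return wgps * CUS_PER_WGP * LANES_PER_CU * FLOPS_PER_LANE


def implied_clock_ghz(tflops: float, wgps: int) -> float:
    """Clock speed that would produce `tflops` at peak rate."""
    return tflops * 1e12 / flops_per_clock(wgps) / 1e9


print(f"{flops_per_clock(30)} flops per clock")
print(f"implied clock: {implied_clock_ghz(16.7, 30):.2f} GHz")
```

With dual issue counted, the same formula would double, which is presumably where the 33TF figure discussed in the thread comes from.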

We made a mid ass unbalanced upgrade for 920 euros.
But hey, ps6 is probably gonna be THE shit, thanks for betatesting our upscaling solution.

Please and thank you.


(People can add to that if i missed something)


Selling overpriced hardware as a testbed for future developments is a solid decision from a business standpoint.
 

PaintTinJr

Member
Mark's explanation is pretty explicit here. The 8bit and 16bit extensions are AI specific instructions they added (so not flopflation) - note he makes a specific point they could have gone higher for 16bit but didn't see the practical use for it (or 32bit).
I was only pre-empting with his own term, while demonstrating with the multipliers that it was derived from the RDNA3 dual issue.

Good catch about him saying they could have gone higher. I didn't absorb that point when I watched, probably because I was distracted by the 66TOPs numbers.
That was one of those 'blink and you miss it' moments - but the distinction he makes is important. The difference from simply 'rendering lower resolution' is that screenspace math differs (especially the interpolants). Ie. if you remember some games that added DLSS and ended up with comically low-resolution textures (until patches fixed it) - that's an example of the difference between lowres+upscale to 'sparse sampling+hole reconstruction'.
Yeah, definitely such a fleeting description he was focusing on the 1% with that comment and the 99% with what followed. You and I briefly discussed this in a PS2/Dreamcast thread IIRC as I hadn't realised the importance of AI/ML still needing quality base textures.

I also think someone might have disagreed with me about the maths being different with sparse sampling at native versus lower native, but it is just excellent that he talked about the solution in such intricate detail, including mentioning PSSR was partially re-entrant, which I originally hypothesised would be the case if they could infer errors from predicting non-holes from the inferred data, effectively allowing them to recursively improve the inference of the holes. So I don't know if that does happen, or if the CNNs are gaining bias in other ways from the data they process. I would have assumed the former, as it would be the same for every user no matter the games they played.
 

Loxus

Member
But did the speed per pin increase? Last time I saw hbm3e was still slower per pin than GDDR6. And I know Gddr7 is 30% faster and 6X.

If anything I see PS6 going with GDDR7X and the 40-48 gbps per pin
One single stack of HBM4 can give 1.5TB/s of bandwidth.
9EFyWVW.jpeg


HBM is probably never going to happen anyway, but it's still my dream.
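For comparing the per-pin numbers, total bandwidth is just pin rate times bus width. A minimal sketch: the 40-48 Gbps per pin range is from the quote above, while the 256-bit GDDR bus and the 2048-bit HBM4 stack interface are assumptions added here, not figures from this thread.

```python
def bandwidth_gb_s(gbps_per_pin: float, bus_width_bits: int) -> float:
    """Total bandwidth in GB/s from per-pin data rate and bus width."""
    return gbps_per_pin * bus_width_bits / 8


def pin_rate_needed(target_gb_s: float, bus_width_bits: int) -> float:
    """Per-pin rate in Gbps required to hit a target bandwidth."""
    return target_gb_s * 8 / bus_width_bits


GDDR_BUS_BITS = 256    # assumption: a console-style 256-bit bus
HBM4_BUS_BITS = 2048   # assumption: interface width of one HBM4 stack

for rate in (40, 48):
    print(f"GDDR at {rate} Gbps/pin: {bandwidth_gb_s(rate, GDDR_BUS_BITS):.0f} GB/s")

print(f"HBM4 stack at 1.5TB/s needs about "
      f"{pin_rate_needed(1500, HBM4_BUS_BITS):.1f} Gbps/pin")
```

Under these assumptions HBM can be much slower per pin and still match or beat GDDR overall, because the interface is so much wider.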
 

truth411

Member
I like the honesty here. It's logical but they are essentially saying that people who buy a PS5 Pro are beta testers for the PS6 which can be taken either way.
But there is no downside though. The PS5 Pro is still the most powerful console plus ML upscaling. Thus the best console to play Multiplats and exclusive games to boot.
 

GymWolf

Member
- 30 WGPs
- architecture is "between RDNA2 and RDNA3", called RDNA2.x
- e.g. geometry pipeline is RDNA3 based
- BVH8 acceleration structure for RT
- no double-FP like RDNA3
- 16.7 TFlops

What is the reasoning behind not implementing full RDNA3 dual issue?




Selling overpriced hardware as a testbed for future developments is a solid decision from a business standpoint.
I do admire their asshole, i mean their hassle.
 

DeepEnigma

Gold Member
What is the reasoning behind not implementing full RDNA3 dual issue?
He explained it. This is not a new gen device. This is doing far more advanced things with ML and their RT customizations than RDNA3 is.

For the GPU they didn't adopt full RDNA3 feature set because it would force developers to have different compilers, patches and executables. They don't want that burden for developers for a mid gen upgrade.
 
- 30 WGPs
- architecture is "between RDNA2 and RDNA3", called RDNA2.x
- e.g. geometry pipeline is RDNA3 based
- BVH8 acceleration structure for RT
- no double-FP like RDNA3
- 16.7 TFlops

What is the reasoning behind not implementing full RDNA3 dual issue?




Selling overpriced hardware as a testbed for future developments is a solid decision from a business standpoint.
It's not overpriced: it's got a custom RX 6800 GPU, more memory, and a 2 TB SSD. It's a mid-gen refresh, and the reasoning for not including an RDNA 3 GPU was simply that it would need a separate build of each game to be compiled for the Pro, rather than just building once for PS5/Pro.
 

Interfectum

Member
For a 700 “pro” console with all the technical bullshit that cerny spouts, higher frames should be the fucking minimum.
Not even an Nvidia 5090 can control shit decisions by shitty developers. Just as Cerny said in the video, it requires a rethinking of how developers approach this. Some are excelling right away, some are just tossing in a shit ton of raytracing and hoping for the best (see Alan Wake 2).
 

yogaflame

Member
RT for me is still overrated for this generation: it consumes a lot of resources and it's not yet perfect even with expensive video cards this generation. RT focus must be reserved for the PS6 era, which will surely have better and more powerful hardware and matured AI upscaling. This generation should focus more on targeting 4k/60 fps.
 

diffusionx

Gold Member
PS5 Pro main expectation was to get borderline unplayable games (image quality/frame rate) over the line. Fidelity/frame rate improvements to already technically excellent games (eg. Sony 1st party) are more on the enthusiast side. With the mixed results so far on the former.
No, the PS5 Pro's main expectation was to put out 30fps quality mode type of graphics at 60fps. The first party games generally have done this. The third party ones have been hit and miss.

Again, the Cerny 8 minute intro video. He spells it out. At no point does he say to spend money on this machine so games running at 52fps can go to 60fps.
 

SweetTooth

Gold Member
- 30 WGPs
- architecture is "between RDNA2 and RDNA3", called RDNA2.x
- e.g. geometry pipeline is RDNA3 based
- BVH8 acceleration structure for RT
- no double-FP like RDNA3
- 16.7 TFlops

What is the reasoning behind not implementing full RDNA3 dual issue?




Selling overpriced hardware as a testbed for future developments is a solid decision from a business standpoint.

You didn't bother to watch the video?

Also I like how the "PS5 was RDNA1.x" talk disappeared 🤣
 

twilo99

Member
You didn't bother to watch the video?

No, that's why I asked.

He explained it. This is not a new gen device. This is doing far more advanced things with ML and their RT customizations than RDNA3 is.

Surely someone already answered this, but why is this not a problem on Windows?

It's not overpriced: it's got a custom RX 6800 GPU, more memory, and a 2 TB SSD. It's a mid-gen refresh, and the reasoning for not including an RDNA 3 GPU was simply that it would need a separate build of each game to be compiled for the Pro, rather than just building once for PS5/Pro.

It's very much an overpriced product, but then again, that's really subjective...

You see a 2 year old GPU architecture and a 5 year old CPU as a "refresh" worth $800 and all I see is overpriced relics wrapped in a plastic box that shouldn't cost more than $600.

Why is compiling a separate game required? There is no way this happens on PC, the devs would go insane.
 