
PS5 Pro Specs Leak are Real, Releasing Holiday 2024(Insider Gaming)

Mr.Phoenix

Member
For both inference in addition to training?

And if so is that because that V_DUAL_DOT2ACC_F32_BF16 capability has been brought into WMMA on the RDNA4 ISA?

From reading, I assumed that in RDNA3 those two dual-issue, 16-bit (RPM) capable instructions couldn't be used with WMMA, going by what the article stated. That would mean inference on RDNA3 still gets better performance from V_DUAL_DOT2ACC_F32_BF16, which is both dual issue and RPM, than from RPM alone via WMMA instructions. Or did I misunderstand, or is the article incorrect about dual issue with RPM being restricted to those two V_DUAL instructions?

For comparison, the PS5 GoW Ragnarok ML AI inference upscaler on pages 48-49 of the PDF does a performance optimisation that sounds very similar to a manual implementation of the dot2acc.
Well... if you really wanna go down this rabbit hole.... read this too.

BF16 is one of the supported data types of RDNA3, and that falls right into their WMMA feature set. INT8 and INT4 are also supported.
 

Mr.Phoenix

Member
The 16-bit 67 TF from the leak doesn't match either, which would have been 2.18GHz.

But it's like you said, likely to protect sources.
It could also be that that white paper is old. If the PS5pro was scheduled to launch last year, it would (like the slim) no doubt have been on 6nm. Going from 6nm to 5nm or even 4nm can make a world of difference.

Secondly, I have never taken the leak as gospel, more as something just pointing us in the general direction. E.g., with the OG PS5 leaks, not a single one gave an accurate clock of 2.2GHz. Instead, we just got 2GHz. I don't know why we expect everything said in this leak to be 100% accurate.
 
I tried putting a PS5 "pro" PC build together and no matter what I do I can't get it under ~$900, so if they can come in at around $600 that would be quite good.

Part List - AMD Ryzen 7 5700X, Radeon RX 7700 XT, Montech AIR 903 BASE ATX Mid Tower - PCPartPicker

Now, of course the PC is a much better proposition overall, but you always have to pay a bit more.

7700 XT is not a good proxy. Based on leaks, Pro should have much better RTRT performance. If PSSR matches XeSS, then it will blow FSR3 out of the water as well. You're more likely looking at ~7900 GRE.
 

James Sawyer Ford

Gold Member
It could also be that that white paper is old. If the PS5pro was scheduled to launch last year, it would (like the slim) no doubt have been on 6nm. Going from 6nm to 5nm or even 4nm can make a world of difference.

Secondly, I have never taken the leak as gospel, more as something just pointing us in the general direction. E.g., with the OG PS5 leaks, not a single one gave an accurate clock of 2.2GHz. Instead, we just got 2GHz. I don't know why we expect everything said in this leak to be 100% accurate.

So are you thinking we could get a spec increase if it's on a smaller node?

Most people thought PS5 was going to be like 9TF or whatever based on the earlier leaks
 

nnytk

Member
I tried putting a PS5 "pro" PC build together and no matter what I do I can't get it under ~$900, so if they can come in at around $600 that would be quite good.

Now, of course the PC is a much better proposition overall, but you always have to pay a bit more.

I beg to differ. I've been playing on PC since 1999, and to me there's been a steady decline in the PC gaming experience, for various reasons you probably know or understand.

Consoles are far from perfect but at least they are fluid and deliver the goods without too much hassle. Aka, gaming.
 
So are you thinking we could get a spec increase if it's on a smaller node?

Most people thought PS5 was going to be like 9TF or whatever based on the earlier leaks

That "early days" information was probably true until the XSX's specs got leaked, and Sony decided to bump the PS5's specs to 10.28TF. But TF isn't a good metric anymore, and certainly not for the PS5 Pro.
 

PaintTinJr

Member
Well... if you really wanna go down this rabbit hole.... read this too.

BF16 is one of the supported data types of RDNA3, and that falls right into their WMMA feature set. INT8 and INT4 are also supported.
I take it from my previous responses you agree that dual issue with Rapid Packed Math (RPM) FP16 is actually possible on RDNA3 for a useful PSSR inference implementation on the Pro to get the peak performance of 67TF/s (assuming custom RDNA3), yes?

I had looked at that RDNA3 WMMA article, but it doesn't mention dual issue from what I gleaned, and your previously linked article says dual issue and RPM at the same time is only possible with those two instructions. So unless dual issue gets added for WMMA in the custom RDNA3 of the Pro and in RDNA4, a DLSS equivalent (inference) on AMD GPUs on RDNA3 and 4 would have the least performance cost using either of the V_DUAL_DOT2ACC instructions IMO, because they do dual issue and RPM at the same time. Do you agree with that assessment, or have I missed something?
 

PaintTinJr

Member
That "early days" information was probably true until the XSX's specs got leaked, and Sony decided to bump the PS5's specs to 10.28TF. But TF isn't a good metric anymore, and certainly not for the PS5 Pro.
Based on how power efficient PS5 is with the subsystems it has and the boost clock it has, and the time Sony spent developing its liquid metal thermal interface material - effectively planned delidding of the AMD APUs - and how there's no value in them leaving more power headroom in the under-250-watt target range the product sits in, the idea that they reacted to the XSX specs doesn't make sense IMO.
The liquid metal took them 4-5 years of testing IIRC, and the deterministic variable clock was born out of the problem Cerny explained: how they had to design the PS4 Pro and didn't get it quite right in terms of estimating power draw at full utilisation.
 

Loxus

Member
It could also be that that white paper is old. If the PS5pro was scheduled to launch last year, it would (like the slim) no doubt have been on 6nm. Going from 6nm to 5nm or even 4nm can make a world of difference.

Secondly, I have never taken the leak as gospel, more as something just pointing us in the general direction. E.g., with the OG PS5 leaks, not a single one gave an accurate clock of 2.2GHz. Instead, we just got 2GHz. I don't know why we expect everything said in this leak to be 100% accurate.
This leak is more in line with the PS4 Pro.

With the PS5, if I remember correctly, the 2GHz came from an AMD intern who mistakenly put a lot of info on GitHub.

MLiD and RGT got hardly anything right, which is why I'm still kinda scratching my head that MLiD actually got PS5 Pro documents.
 
This leak is more in line with the PS4 Pro.

With the PS5, if I remember correctly, the 2GHz came from an AMD intern who mistakenly put a lot of info on GitHub.

MLiD and RGT got hardly anything right, which is why I'm still kinda scratching my head that MLiD actually got PS5 Pro documents.

Those documents were on the developer portal and many people had access to it - all it took was for a developer who wanted some quick cash to leak some confidential info - MLiD would in return get the leak, and more importantly credibility, even though his track record hasn't been great. As for RGT, he wasn't even in the game; dude got like 10% of the information right.
 

Radical_3d

Member
At native res sure, but with PSSR? I think it's possible.
So, there are people in GAF that have no idea how games work.
The Wire Reaction GIF
 
I'm curious to see how the consoles will navigate the feature set for a mid-gen refresh. I know Sony will go for a strong focus on ray-tracing, and likely hardware accelerated upsampling. Similar to what they did with PS4 Pro, but fortunately next-gen upscaling techniques like DLSS 2/3 and FSR 2/3 are significantly better than their predecessors, so I expect some pretty cool things in that regard.

Hey everyone - I'm a "lEakeR"
 
So, there are people in GAF that have no idea how games work.
The Wire Reaction GIF

Yes, you clearly. No game since Crysis has been CPU limited to the extent you seem to think modern games are. Yes, you might get a 5/10/15% difference in the 1% lows, but it's not going to be the difference between a game hitting 60fps and 30fps with a modern GPU and an 8-core, 16-thread CPU.
 

Loxus

Member
I take it from my previous responses you agree that the dual issue with rapid pack maths (RPM) FP16 is actually possible on RDNA3 for a useful PSSR inference implementation on the Pro to get the peak performance of 67TF/s (assuming custom RDNA3), yes?

I had looked at that RDNA3 WMMA article, but it doesn't mention dual issue from what I gleaned, and your previously linked article says dual issue and RPM at the same time is only possible with those two instructions. So unless dual issue gets added for WMMA in the custom RDNA3 of the Pro and in RDNA4, a DLSS equivalent (inference) on AMD GPUs on RDNA3 and 4 would have the least performance cost using either of the V_DUAL_DOT2ACC instructions IMO, because they do dual issue and RPM at the same time. Do you agree with that assessment, or have I missed something?
I could be wrong, but it might be a case where the AI Accelerators need to utilize the SIMD32 or vice versa to work.

Which is where Dual-Issue comes in.
Each Vector Unit now has two SIMD32s, compared to the one in RDNA1 & 2.

SIMD32 (Float/ INT / Matrix)
SIMD32 (Float/ Matrix)

I figure that's the meaning of Dual-Issue.
One SIMD32 running normal Float operations and the other, Matrix operations at the same time.

When one or both SIMD32s are doing Matrix operations, they utilize the AI Accelerators for higher throughput.

You can look at the RDNA3 CU and Vector Unit diagrams for a better understanding.
 

PaintTinJr

Member
I could be wrong, but it might be a case where the AI Accelerators need to utilize the SIMD32 or vice versa to work.

Which is where Dual-Issue comes in.
Each Vector Unit now has two SIMD32s, compared to the one in RDNA1 & 2.

SIMD32 (Float/ INT / Matrix)
SIMD32 (Float/ Matrix)

I figure that's the meaning of Dual-Issue.
One SIMD32 running normal Float operations and the other, Matrix operations at the same time.

When one or both SIMD32s are doing Matrix operations, they utilize the AI Accelerators for higher throughput.

You can look at the RDNA3 CU and Vector Unit diagrams for a better understanding.
In the first article Mr.Phoenix linked


The text associated with the diagram 2/5 of the way down the page does the formal comparison between RDNA2 and RDNA3 on why RPM and dual issue aren't straightforward even on RDNA3, and gives insight into the limitations of using dual issue on RDNA3 even without the benefits of RPM.

rdna3_vopd_issue.png


"A RDNA 3 VOPD instruction is encoded in eight bytes, and supports two sources and one destination for each of the two operations. That excludes operations that require three inputs, like the generic fused multiply add operation. Dual issue opportunities are further limited by available execution units, data dependencies, and register file bandwidth."
 

Radical_3d

Member
So if I put a Ryzen 7 3700X in my rig, A Plague Tale will run at 30fps on my 4090? What about Cyberpunk? You're wrong; Crysis was a unique title in that it couldn't utilise multiple threads and cores on a CPU.
No. But if you put a Ryzen 3600 (which is better than what the PS5 mounts) yes. So I’m not quite wrong. Timestamped for your viewing pleasure:

 
No. But if you put a Ryzen 3600 (which is better than what the PS5 mounts) yes. So I’m not quite wrong. Timestamped for your viewing pleasure:



The PS5 CPU's desktop equivalent is the Ryzen 7 3700X. A rising tide lifts all boats: a weaker CPU will give you lower 1% lows, but even when you're CPU limited, it's never the case that you're ALWAYS CPU limited in modern games. The overall FPS will still rely on the GPU 99% of the time. It's not going to be a disaster to have the PS5 CPU and a better GPU in the PS5 Pro; you will get hugely better performance.

Also, Alex is a moron. You have to be underutilised on the GPU in a console. Do you know how much power GPUs use? It can't fit into the TDP of a console if you are expecting 100% GPU utilisation in a GPU roughly equivalent to a 2080 Super.

PS5 Pro will be power limited before it's CPU limited.
 

winjer

Gold Member
No. But if you put a Ryzen 3600 (which is better than what the PS5 mounts) yes. So I’m not quite wrong. Timestamped for your viewing pleasure:



I played that game with a 2070S and a 3700X, and I had significantly better performance in that area, at over 70 fps.
DF's machine has a memory latency of 90ns, which is insanely high for a Zen 2 CPU. And it really hurts performance.
 

Loxus

Member
In the first article Mr.Phoenix linked


The text associated with the diagram 2/5 of the way down the page does the formal comparison between RDNA2 and RDNA3 on why RPM and dual issue aren't straightforward even on RDNA3, and gives insight into the limitations of using dual issue on RDNA3 even without the benefits of RPM.

rdna3_vopd_issue.png


"A RDNA 3 VOPD instruction is encoded in eight bytes, and supports two sources and one destination for each of the two operations. That excludes operations that require three inputs, like the generic fused multiply add operation. Dual issue opportunities are further limited by available execution units, data dependencies, and register file bandwidth."
This is from the same article.

Now, lets talk about that 123TFLOP FP16 number that AMD claims. While this is technically correct, there are significant limitations on this number. Looking at the RDNA3 ISA documentation, there is only one VOPD instruction that can dual issue packed FP16 instructions along with another that can work with packed BF16 numbers.

image-1.png
These are the 2 VOPD instructions that can used packed math.

This means that the headline 123TF FP16 number will only be seen in very limited scenarios, mainly in AI and ML workloads although gaming has started to use FP16 more often.



In the article, it states Dual-Issue can work with FP16 & BF16 instructions.
Below we can see WMMA utilizing FP16, BF16, & Int8 instructions.
4mE7aBR.jpeg


From my understanding, it seems the SIMD32s utilize the AI Accelerators for higher throughput when doing Matrix operations, but the AI Accelerators aren't dedicated in the same way as the CDNA2 Matrix Cores. Maybe this changes in RDNA4.

You can read more on RDNA4 here.
Examining AMD’s RDNA 4 Changes in LLVM
 

welshrat

Member
I played that game with a 2070S and a 3700X, and I had significantly better performance in that area, at over 70 fps.
DF's machine has a memory latency of 90ns, which is insanely high for a Zen 2 CPU. And it really hurts performance.
I could be wrong here, but isn't the latency higher on PS5 due to it using GDDR memory? Conversely, though, this gives higher bandwidth?
 

winjer

Gold Member
I could be wrong here, but isn't the latency higher on PS5 due to it using GDDR memory? Conversely, though, this gives higher bandwidth?

Yes, it's at 140ns. And it's not exactly because it's using GDDR6. It's because the memory controller is tweaked for high bandwidth, not latency.
To make things worse, the Zen 2 CPUs on consoles only have 4+4MB of L3, while the desktop version has 16+16MB.
So cache misses are more frequent on consoles, causing more memory accesses to that slow memory.

Though consoles have other advantages that help to claw back some performance.
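To put rough numbers on the latency and L3 point above, here is a minimal sketch of the usual average-access-time estimate: L3 hit time plus miss rate times DRAM latency. The 90ns and 140ns figures are the ones quoted in this thread; the 10ns L3 hit time and both miss rates are made-up placeholders, not measurements of any console or desktop part.

```python
def avg_access_ns(l3_hit_ns: float, l3_miss_rate: float, dram_ns: float) -> float:
    """Average latency of an access that reaches L3: hit cost plus expected miss cost."""
    return l3_hit_ns + l3_miss_rate * dram_ns


# Hypothetical inputs: only the DRAM latencies (90ns, 140ns) come from the thread.
L3_HIT_NS = 10.0          # assumed L3 hit latency
BIG_L3_MISS_RATE = 0.05   # assumed miss rate with 16+16MB of L3
SMALL_L3_MISS_RATE = 0.15 # assumed miss rate with 4+4MB of L3

desktop = avg_access_ns(L3_HIT_NS, BIG_L3_MISS_RATE, 90.0)
console = avg_access_ns(L3_HIT_NS, SMALL_L3_MISS_RATE, 140.0)

print(f"desktop-like: {desktop:.1f} ns, console-like: {console:.1f} ns")
```

The two effects multiply: a smaller L3 raises the miss rate, and each miss then pays the higher DRAM latency.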
 

welshrat

Member
Yes, it's at 140ns. And it's not exactly because it's using GDDR6. It's because the memory controller is tweaked for high bandwidth, not latency.
To make things worse, the Zen 2 CPUs on consoles only have 4+4MB of L3, while the desktop version has 16+16MB.
So cache misses are more frequent on consoles, causing more memory accesses to that slow memory.

Though consoles have other advantages that help to claw back some performance.
Would be nice if there was a cache increase on PS5 Pro as well, as I assume this would give some performance increase without any compatibility issues?
 

kikonawa

Member
I paid 1149 guilders back in November 2000. If you take inflation into account, that PS2 cost 900 euro in today's money. 😭

BTW for reference:

PS3 - 600 euro in 2007 - 891 euro in 2024
PS4 - 400 euro in 2013 - 525 euro in 2024
PS5 - 500 euro in 2020 - 593 euro in 2024
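For anyone who wants to check the arithmetic in that list, a quick sketch of the implied cumulative inflation factor per console (2024 price divided by launch price). The only number not taken from the post is the fixed guilder-to-euro conversion rate of 2.20371, used to turn the 1149 guilders into euros.

```python
GUILDERS_PER_EURO = 2.20371  # fixed conversion rate set when the euro replaced the guilder

# (launch price in euro, quoted 2024 equivalent in euro), as listed in the post
quoted = {
    "PS3": (600, 891),
    "PS4": (400, 525),
    "PS5": (500, 593),
}


def implied_factor(launch_price: float, price_2024: float) -> float:
    """Cumulative inflation multiplier implied by the two quoted prices."""
    return price_2024 / launch_price


ps2_launch_eur = 1149 / GUILDERS_PER_EURO

for name, (then, now) in quoted.items():
    print(f"{name}: x{implied_factor(then, now):.3f}")
print(f"PS2: {ps2_launch_eur:.2f} euro at launch, x{implied_factor(ps2_launch_eur, 900):.3f}")
```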
I know right. But hey, we could play DVDs.
 

Mr.Phoenix

Member
So are you thinking we could get a spec increase if it's on a smaller node?

Most people thought PS5 was going to be like 9TF or whatever based on the earlier leaks
Most definitely... well, one of two things. Either they increase the clocks of the APU, since going from 6nm down to 4nm will definitely give them the headroom while keeping the cooling solution they originally designed for the Pro at 6nm the same, or they keep the clocks the same and drop the size of the cooler. These console peeps tend to always go with the latter. So we can only hope.

Then again, we do not know for certain what the actual specs are, so we won't even know if it was increased or kept the same.

That 2.35GHz clock though, is just napkin math we are all doing based on leaked TF + CU numbers.
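For reference, the napkin math can be sketched like this: peak TFLOPS = CUs x 64 lanes x 2 FLOPs per FMA x dual-issue factor x packed factor x clock. The base PS5 figures (36 CUs, 2.23GHz) are the known ones. The 60 CU count used for the Pro is an assumption for illustration, since this thread only quotes the 67 TF and 2.18GHz figures, and the function names are made up for the example.

```python
LANES_PER_CU = 64   # shader lanes per CU on RDNA
FLOPS_PER_FMA = 2   # one fused multiply-add counts as two FLOPs


def peak_tflops(cus: int, clock_ghz: float, dual_issue: int = 1, packed: int = 1) -> float:
    """Theoretical peak TFLOPS for a given CU count and clock."""
    flops_per_clock = cus * LANES_PER_CU * FLOPS_PER_FMA * dual_issue * packed
    return flops_per_clock * clock_ghz / 1000.0


def clock_for(tflops: float, cus: int, dual_issue: int = 1, packed: int = 1) -> float:
    """Clock in GHz needed to reach a given peak TFLOPS figure."""
    flops_per_clock = cus * LANES_PER_CU * FLOPS_PER_FMA * dual_issue * packed
    return tflops * 1000.0 / flops_per_clock


# Base PS5: 36 CUs, no dual issue.
print(f"PS5 at 2.00GHz: {peak_tflops(36, 2.00):.2f} TF")  # the early "9TF" leaks
print(f"PS5 at 2.23GHz: {peak_tflops(36, 2.23):.2f} TF")  # the final spec

# Pro: ASSUMED 60 CUs, dual issue (x2) and packed 16-bit (x2).
print(f"Clock for 67 TF 16-bit: {clock_for(67, 60, dual_issue=2, packed=2):.2f} GHz")
```

Under the 60 CU assumption, the 67 TF 16-bit figure lands on the 2.18GHz mentioned earlier in the thread. A different CU count gives a different clock, which is why none of this should be read as a confirmed spec.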
I take it from my previous responses you agree that the dual issue with rapid pack maths (RPM) FP16 is actually possible on RDNA3 for a useful PSSR inference implementation on the Pro to get the peak performance of 67TF/s (assuming custom RDNA3), yes?

I had looked at that RDNA3 WMMA article, but it doesn't mention dual issue from what I gleaned, and your previously linked article says dual issue and RPM at the same time is only possible with those two instructions. So unless dual issue gets added for WMMA in the custom RDNA3 of the Pro and in RDNA4, a DLSS equivalent (inference) on AMD GPUs on RDNA3 and 4 would have the least performance cost using either of the V_DUAL_DOT2ACC instructions IMO, because they do dual issue and RPM at the same time. Do you agree with that assessment, or have I missed something?
I agree in theory, but have my reservations. From my understanding of the doc, RDNA3's implementation of this tech is just convoluted. While there are two pairs of 32 ALUs in each CU, the two are not identical in what data types they can handle. Then there is the thing of dual-issue operations seemingly only being limited to FP16, BF16, INT8, etc. Basically, as it stands now, you can do one FP32 operation simultaneously with as many as 2 FP16/BF16 or 4 INT8 instructions on the only other 32 ALUs in the CU that support all those data types.

In theory...

But in reality, you will be bottlenecked by the VGPR and data banks.

So yes, you should be able to use it for inference, but they have to do something about VGPR and data banks. Maybe that has been improved some in RDNA4/PS5pro. I have to believe it has been. And don't forget, RDNA3 technically does have dedicated AI accelerators that have nothing to do with the ALU cores.
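The pairing limits from the article quote earlier in the thread (two sources and one destination per operation, no data dependencies) can be sketched as a toy check. This is a simplified model with made-up names, not the real VOPD rules: it ignores the restricted opcode list, the execution-unit limits and the VGPR bank constraints mentioned above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Op:
    """A hypothetical vector operation: one destination, N source registers."""
    name: str
    dst: str
    srcs: Tuple[str, ...]


def can_pair(first: Op, second: Op) -> bool:
    """Toy check: could `first` and `second` share one dual-issue slot?"""
    # The quoted encoding allows at most two sources per operation.
    if len(first.srcs) > 2 or len(second.srcs) > 2:
        return False
    # The second op must not consume the first op's result (data dependency).
    if first.dst in second.srcs:
        return False
    # Each op needs its own destination register.
    if first.dst == second.dst:
        return False
    return True


mul = Op("mul", "v0", ("v1", "v2"))
add = Op("add", "v3", ("v4", "v5"))
dep = Op("add", "v6", ("v0", "v5"))         # reads the result of `mul`
fma = Op("fma", "v7", ("v8", "v9", "v10"))  # three inputs

print(can_pair(mul, add))  # independent two-source ops
print(can_pair(mul, dep))  # blocked by the dependency on v0
print(can_pair(mul, fma))  # blocked by the three-source limit
```

Even in this stripped-down form, a lot of ordinary shader code fails the check, which is the gist of why the headline dual-issue numbers are hard to hit.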
 

IDWhite

Member
Yes, it's at 140ns. And it's not exactly because it's using GDDR6. It's because the memory controller is tweaked for high bandwidth, not latency.
To make things worse, the Zen 2 CPUs on consoles only have 4+4MB of L3, while the desktop version has 16+16MB.
So cache misses are more frequent on consoles, causing more memory accesses to that slow memory.

Though consoles have other advantages that help to claw back some performance.

Cache misses are not only dependent on hardware performance and configuration. Game code and the compiler are also part of the equation for reaching a good cache hit ratio.

On consoles, even in the worst case, you can compensate for latency with bandwidth and highly optimised code, and in the case of the PS5 you now have dedicated hardware to manage coherency and a good amount of SRAM integrated on the SoC.

Of course more cache would have been ideal, but not so much in the CPU as in the GPU. In the end, console CPUs don't need to run a huge OS with lots of background processes that have nothing to do with the game.
 

Bojji

Member
Keppler said PS5 Pro is like AMD 7700 with better RT performance.

Sounds good enough.

I was saying this from day one, yet people are going 4070... 4070S... 4080 next week?

7700 XT is not a good proxy. Based on leaks, Pro should have much better RTRT performance. If PSSR matches XeSS, then it will blow FSR3 out of the water as well. You're more likely looking at ~7900 GRE.

It will be above the 7700 XT when RT comes into play, but 95% of console games so far have no RT at all (other than software), so in the majority of games it will be around that level.

PSSR also doesn't change performance, "only" image quality.
 

Saberus

Member
I was saying this from day one, yet people are going 4070... 4070S... 4080 next week?



It will be above the 7700 XT when RT comes into play, but 95% of console games so far have no RT at all (other than software), so in the majority of games it will be around that level.

PSSR also doesn't change performance, "only" image quality.
But using PSSR, if they decide to run the game at a lower resolution and then use PSSR to upscale it, wouldn't that extra headroom allow higher frame rates?
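A rough way to reason about that question, as a sketch only: assume GPU frame time scales linearly with the number of rendered pixels (real games don't scale this cleanly) and that the upscaling pass adds a fixed cost. The 2ms upscale cost and the 36ms native frame time are made-up placeholders, not PSSR measurements.

```python
from typing import Tuple

Resolution = Tuple[int, int]


def pixels(width: int, height: int) -> int:
    """Pixel count of a frame."""
    return width * height


def frame_ms(native_ms: float, render_res: Resolution, output_res: Resolution,
             upscale_ms: float) -> float:
    """Estimated frame time when rendering at render_res and upscaling to output_res."""
    scale = pixels(*render_res) / pixels(*output_res)
    return native_ms * scale + upscale_ms


NATIVE_4K_MS = 36.0      # hypothetical frame time at native 3840x2160
UPSCALE_COST_MS = 2.0    # hypothetical fixed cost of the upscaling pass

t = frame_ms(NATIVE_4K_MS, (2560, 1440), (3840, 2160), UPSCALE_COST_MS)
print(f"1440p -> 4K: {t:.1f} ms per frame ({1000.0 / t:.0f} fps)")
```

So the upscaler itself costs frame time, and the frame-rate gain comes entirely from rendering fewer pixels before it runs.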
 