AMD UDNA will be interesting. The CDNA3 architecture is still based on GCN-style scheduling, where a wave64 instruction issues over 4 cycles. RDNA can issue every cycle and exposes instruction latency, so the scheduler dynamically issues or stalls instructions. RDNA is much closer to Nvidia GPUs.
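The cadence difference falls out of plain lane arithmetic. A minimal sketch, assuming the commonly documented widths (GCN-style: wave64 on a 16-lane SIMD; RDNA: wave32 on a 32-lane SIMD). The function name is made up for illustration, and this models only issue cadence, not real instruction latency:

```python
def cycles_per_wave_instruction(wave_size: int, simd_lanes: int) -> int:
    """Cycles one SIMD needs to push a single vector instruction through every lane of a wave."""
    # Ceiling division: a wave wider than the SIMD is processed in several passes.
    return -(-wave_size // simd_lanes)


# Assumed widths, see the note above.
gcn_style = cycles_per_wave_instruction(wave_size=64, simd_lanes=16)
rdna_wave32 = cycles_per_wave_instruction(wave_size=32, simd_lanes=32)
rdna_wave64 = cycles_per_wave_instruction(wave_size=64, simd_lanes=32)

print(gcn_style, rdna_wave32, rdna_wave64)
```

With a fixed 4-cycle cadence, a simple ALU result is typically ready by the time the same wave issues again, so the scheduler can stay simple. With 1-cycle issue, latency becomes visible and the scheduler has to track dependencies.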
CDNA has wide matrix cores and other improvements for wide compute workloads, which AMD wants to bring to UDNA. It also has multi-chip scaling. Rumors say RDNA4 will finally bring matrix cores to the consumer space. It seems AMD is integrating matrix cores into the RDNA lineup early.
My expectation is that the UDNA compute unit will be an RDNA4 descendant rather than a CDNA3 descendant. They definitely need 1-cycle, low-latency scheduling in the consumer space, and Nvidia does well with it in the AI space too. I don't see them going back to a GCN-style design for UDNA.
Integrating matrix cores into the RDNA4 compute unit would be an iterative step towards UDNA. They could iterate on that compute unit further to suit AI workloads. I would expect UDNA to borrow a lot of CDNA3 tech for caches, memory controllers and chiplet connectivity.
There are rumors that RDNA4 will be a small iterative improvement, since they are ditching the RDNA arch a year later. But I'd say it's entirely possible that RDNA4 is an iterative step towards UDNA: the first time they merge the new RDNA ALU pipes with CDNA matrix units. Could be a big step.
It's worth considering how important AI is for AMD. The RDNA architecture was initially designed for AMD's most important market segment (gaming), while CDNA got the old GCN design. Now AI is the most important segment for them. They want their newest arch for professional AI chips too.
One might argue that simple GCN-style scheduling is still good for AI workloads, but lower-latency scheduling copes better with register pressure. And it's often better for caches to avoid running too wide a workload on every compute unit. An RDNA/Nvidia-style arch is just better.
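One reading of the register pressure point, as rough arithmetic. The numbers are illustrative assumptions (256 vector registers per lane and at most 10 resident waves per SIMD, roughly GCN-like) and the function name is hypothetical. The more registers a shader needs, the fewer waves fit on a SIMD, so a design that needs fewer waves in flight to stay busy loses less when shaders get fat:

```python
def resident_waves(vgprs_per_thread: int,
                   vgprs_per_lane: int = 256,   # assumed register file depth per lane
                   max_waves: int = 10) -> int:  # assumed hardware cap per SIMD
    """How many waves fit on one SIMD for a given per-thread register budget."""
    return min(max_waves, vgprs_per_lane // vgprs_per_thread)


for vgprs in (24, 64, 128):
    print(vgprs, resident_waves(vgprs))
```

A shader using 128 registers leaves room for only two waves under these assumptions, which is thin cover if the design relies on switching between many waves to hide latency.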
UDNA makes perfect sense. AMD today wants matrix units for AI-based upscaling (and other client AI workloads) in the consumer space. And they want their latest compute unit architecture for AI chips. AI is now a priority. The GCN-style arch is outdated. Nvidia/RDNA-style is better.