What are the specifications of the APU installed in the Steam Deck?

Valve's portable gaming PC, the Steam Deck, uses 'AMD Custom APU 0405', a custom APU built for the Steam Deck on AMD's 'Zen 2' CPU and 'RDNA 2' GPU architectures. Chips and Cheese, a hardware analysis blog, has published a detailed look at this APU.

Van Gogh, AMD's Steam Deck APU – Chips and Cheese


The launch of Zen 2 was a defining moment for AMD. AMD's single-threaded performance could finally compete head-on with Intel's best, and AMD showed it could bring up to 16 cores to desktop CPUs, giving consumers very strong multi-threaded performance.

Zen 2 is also flexible and works well in devices that demand low power consumption. Even after its successor, Zen 3, was released in the second half of 2020, Zen 2 continued to appear in many products, and one of the most notable of them is the Steam Deck. This is what the Steam Deck looks like with its back cover removed.

The APU, which combines the 'RDNA 2' GPU architecture with the 'Zen 2' CPU architecture and is manufactured on TSMC's 7nm process, is commonly known by the codename 'Van Gogh'. The variant of Van Gogh that AMD supplies for the Steam Deck is named 'AMD Custom APU 0405'.

The Steam Deck has 16GB of LPDDR5 made up of two 8GB Samsung chips. The memory is arranged as four 32-bit channels running at 5500MT/s, so the theoretical bandwidth should be 88GB/s. The motherboard, called 'Valve Jupiter', connects the APU to an x4 M.2 slot and provides x1 PCIe links for the microSD card controller and the Realtek 8822CE Wi-Fi card.
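The 88GB/s figure follows from bus width times transfer rate. The sketch below spells out that arithmetic; the dual-channel DDR4-3200 line (two 64-bit channels) is an assumed reference configuration added for comparison, not a figure from the article.

```python
def theoretical_bandwidth_gbs(channels: int, channel_width_bits: int, transfers_mts: float) -> float:
    """Peak DRAM bandwidth in GB/s (decimal units, 1 GB = 10**9 bytes)."""
    bus_bytes = channels * channel_width_bits / 8  # total bus width in bytes per transfer
    return bus_bytes * transfers_mts / 1000        # MT/s * bytes -> MB/s, then /1000 -> GB/s

# Steam Deck figures from the text: four 32-bit LPDDR5 channels at 5500 MT/s
steam_deck_peak = theoretical_bandwidth_gbs(4, 32, 5500)
# Assumed reference point: dual-channel (2 x 64-bit) DDR4-3200
ddr4_3200_peak = theoretical_bandwidth_gbs(2, 64, 3200)

print(f"Steam Deck LPDDR5: {steam_deck_peak:.1f} GB/s")  # 88.0 GB/s
print(f"DDR4-3200 dual-channel: {ddr4_3200_peak:.1f} GB/s")  # 51.2 GB/s
```

These are on-paper peaks only; real workloads never reach them, which is what the later bandwidth measurements are about.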

Power to the APU is supplied by three VRM stages controlled by Monolithic Power Systems' MP2845. According to Chips and Cheese, the VRM is probably split into one two-stage rail and one single-stage rail. That makes the VRM fairly weak, but given the APU's 16W cap, it is not a big deal. The power budget is allocated flexibly between the CPU and GPU. In a GPU-bound scene, for example, the GPU draws more than 10W while the CPU draws 2-3W and runs below its base clock. In a CPU-bound scene, the split is reversed. The area around the Steam Deck's VRM looks like this.

Power consumption while playing a simulation game shows that a large share of the power is allocated to the GPU.

This flexible power allocation works well when a game is either CPU-bound or GPU-bound. However, workloads that try to maximize throughput by loading the CPU and GPU together will lose performance, because both have to share the same power budget. Renderers and photo processing apps typically behave this way, but the Steam Deck rarely runs such software, so this is not a problem except in games that load both the CPU and GPU heavily at the same time.
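To see why loading both sides at once hurts, here is a toy model of a shared power budget. It is a hypothetical sketch, not Valve's or AMD's actual power-management algorithm: it simply assumes that when combined demand exceeds the cap, both the CPU and GPU are scaled back by the same factor. The 16W cap comes from the text; the demand values are illustrative.

```python
def split_power(budget_w: float, cpu_demand_w: float, gpu_demand_w: float) -> tuple[float, float]:
    """Hypothetical proportional split of a shared APU power budget (watts)."""
    total = cpu_demand_w + gpu_demand_w
    if total <= budget_w:
        return cpu_demand_w, gpu_demand_w  # both sides get everything they ask for
    scale = budget_w / total               # over budget: throttle both by the same factor
    return cpu_demand_w * scale, gpu_demand_w * scale

# GPU-bound game: the GPU takes most of the budget and everything fits
print(split_power(16, 3, 12))   # (3, 12)
# Illustrative case where both sides want a lot: neither gets its full demand
print(split_power(16, 10, 12))
```

In the first case nothing is lost; in the second, both the CPU and GPU run below what they asked for, which is the throughput loss described above.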

Van Gogh packs four Zen 2 cores into one cluster (CCX), with a boost clock of 3.5GHz and a base clock of 2.8GHz. Desktop and server Zen 2 chips have 16MB of L3 cache per CCX, which insulates the cores from slow memory and boosts performance. Van Gogh's CCX, on the other hand, has only 4MB of L3 cache.
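A rough way to see what the smaller L3 means is to ask which cache level a given working set fits into. The sketch below uses the standard Zen 2 per-core sizes (32KB L1D, 512KB L2) plus the L3 sizes from the text, and deliberately ignores associativity, inclusion, and the fact that the L3 is shared by four cores, so it is only a first-order approximation.

```python
KIB = 1024
MIB = 1024 * KIB

def level_for(working_set_bytes: int, l3_bytes: int,
              l1_bytes: int = 32 * KIB, l2_bytes: int = 512 * KIB) -> str:
    """Smallest cache level that can hold the working set (first-order model only)."""
    if working_set_bytes <= l1_bytes:
        return "L1"
    if working_set_bytes <= l2_bytes:
        return "L2"
    if working_set_bytes <= l3_bytes:
        return "L3"
    return "DRAM"

for ws in (256 * KIB, 2 * MIB, 8 * MIB):
    print(f"{ws // KIB} KiB: {level_for(ws, 4 * MIB)} on Van Gogh (4MB L3), "
          f"{level_for(ws, 16 * MIB)} on desktop Zen 2 (16MB L3)")
```

The 8 MiB case is the interesting one: it stays in L3 on a desktop CCX but spills to DRAM on Van Gogh.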

In Chips and Cheese's tests, the Steam Deck's L1 and L2 caches performed as expected for a Zen 2 CPU, but its L3 capacity fell well short of other machines. In the cache and memory latency results, the red line shows Linux with the default schedutil governor, the dotted line shows the performance governor, and the green line shows Windows 11.
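Cache latency tests of this kind are commonly built as pointer chases: each load's address comes from the previous load, so the CPU cannot overlap them, and the time per step approximates load latency. The sketch below shows that access pattern using Sattolo's algorithm to build a single random cycle. It is an illustration of the general technique, not Chips and Cheese's actual test code, and in Python the interpreter overhead dominates the timing, so a real measurement would be written in a language like C over a flat array.

```python
import random
import time

def make_cycle(n: int, seed: int = 0) -> list[int]:
    """Sattolo's algorithm: a random permutation that forms one cycle through all n slots."""
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)              # j < i (never j == i) guarantees a single cycle
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt

def chase(nxt: list[int], steps: int) -> int:
    """Follow the chain; every index depends on the one loaded before it."""
    idx = 0
    for _ in range(steps):
        idx = nxt[idx]
    return idx

def ns_per_step(n: int, steps: int = 1_000_000) -> float:
    """Average time per dependent load for a working set of n entries."""
    nxt = make_cycle(n)
    start = time.perf_counter()
    chase(nxt, steps)
    return (time.perf_counter() - start) / steps * 1e9
```

Sweeping `n` from small to large and plotting `ns_per_step` produces the staircase shape seen in latency charts, with steps where the working set outgrows each cache level.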

The L3 looks better in the bandwidth tests, with Windows and Linux giving similar results and over 200GB/s of L3 bandwidth under a full-thread load. Chips and Cheese judges L3 bandwidth to be fine, although it is slightly lower than on other Zen 2 chips because of clock speed differences. LPDDR5, however, was disappointing. Various factors make it hard for any DRAM configuration to reach its theoretical bandwidth, but the 25GB/s shown by the green line is quite low.

The gap is obvious when compared with Renoir, AMD's other Zen 2 APU: Renoir's DDR4-3200 setup (orange line) comes out well ahead of Van Gogh's setup (green line).

From the CPU side, Van Gogh's LPDDR5 setup provides bandwidth comparable to a DDR4 setup from late 2015, while also imposing extra memory latency on the CPU. Chips and Cheese said, "This isn't a huge step up from a good DDR3 setup. The CPU's smaller L3 cache exacerbates the problem, because the cores are less insulated from memory than in desktop and server Zen 2 implementations."

Memory bandwidth usage in Cyberpunk 2077 reveals more. With ray tracing turned off and the frame rate around 100 FPS, the required memory bandwidth climbs over the elapsed time and soon exceeds 25GB/s. A smaller L3 means even higher memory bandwidth demands, and Van Gogh is clearly not optimized to get the most out of its CPU cores.
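Two small calculations put this in perspective. The first converts the 25GB/s observed at about 100 FPS into traffic per frame. The second is a simplified model of why less L3 raises DRAM demand: whatever the L3 fails to catch goes to memory. The 100GB/s core demand and the hit rates are hypothetical values chosen for illustration, and the model ignores prefetching and writebacks.

```python
def bytes_per_frame(bandwidth_gbs: float, fps: float) -> float:
    """Average DRAM traffic per rendered frame, in bytes."""
    return bandwidth_gbs * 1e9 / fps

def dram_traffic_gbs(core_demand_gbs: float, l3_hit_rate: float) -> float:
    """Traffic that misses in L3 and must be served by DRAM (simplified model)."""
    return core_demand_gbs * (1.0 - l3_hit_rate)

print(bytes_per_frame(25, 100) / 1e6, "MB per frame")  # 250.0 MB per frame

# Hypothetical: the same core demand with a larger vs smaller L3 (higher vs lower hit rate)
print(dram_traffic_gbs(100, 0.8), dram_traffic_gbs(100, 0.7))
```

A drop of ten points in L3 hit rate raises DRAM traffic by half in this example, which is the mechanism behind the article's point about Van Gogh's 4MB L3.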

CPUs generally do not run at their maximum clock all the time. This is especially true of mobile devices, which raise clocks in response to load. Ramping up can take some time, but most devices try to reach their maximum clock as quickly as possible to stay responsive.

The Steam Deck does not. Its clock starts at 1.4GHz and reaches 1.7GHz within 0.27ms. That is a good start and shows that the APU can change clocks fairly quickly, but after a few hundred milliseconds at 1.7GHz, it ramps up gradually and reaches its maximum clock only after almost a second.
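The cost of a slow ramp can be estimated by integrating clock speed over time, which gives the number of cycles a core actually completes. The sketch below models the described behavior as a piecewise-linear curve. The 1.4GHz start, 1.7GHz at 0.27ms, and 3.5GHz maximum come from the text; holding 1.7GHz until 300ms and ramping linearly to 3.5GHz at 1000ms are assumptions standing in for "a few hundred milliseconds" and "almost a second".

```python
from bisect import bisect_right

# (time_ms, clock_ghz). The 300 ms hold point and the linear ramp after it are ASSUMPTIONS.
RAMP = [(0.0, 1.4), (0.27, 1.7), (300.0, 1.7), (1000.0, 3.5)]

def clock_at(t_ms: float, ramp=RAMP) -> float:
    """Clock in GHz at time t_ms, linearly interpolated between breakpoints."""
    if t_ms >= ramp[-1][0]:
        return ramp[-1][1]
    i = bisect_right([t for t, _ in ramp], t_ms)
    (t0, f0), (t1, f1) = ramp[i - 1], ramp[i]
    return f0 + (f1 - f0) * (t_ms - t0) / (t1 - t0)

def cycles_until(t_ms: float, ramp=RAMP) -> float:
    """Millions of cycles completed in the first t_ms (GHz * ms = 10**6 cycles)."""
    total = 0.0
    for (t0, _), (t1, _) in zip(ramp, ramp[1:]):
        if t_ms <= t0:
            break
        end = min(t_ms, t1)
        # Trapezoid rule is exact here because each segment is linear
        total += (clock_at(t0, ramp) + clock_at(end, ramp)) / 2 * (end - t0)
    if t_ms > ramp[-1][0]:
        total += ramp[-1][1] * (t_ms - ramp[-1][0])
    return total

# Share of work done in the first 100 ms compared with a core that jumps straight to 3.5 GHz
print(round(cycles_until(100) / (3.5 * 100), 2))
```

Under these assumptions, a short burst of work in the first 100ms gets under half the cycles an instantly boosting core would deliver, which is why such behavior feels sluggish.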

Chips and Cheese pointed out, "Such boost behavior is the worst for a client device." Compared to other Zen 2 systems, the Steam Deck should feel quite unresponsive. The behavior appears to be intentional, most likely to extend battery life at the expense of responsiveness.

The Steam Deck's GPU is named 'AMD Custom GPU 0405'. It is derived from RDNA 2 and appears to have 512 FP32 lanes, or 4 WGPs, with a maximum clock of around 1.6GHz, which is very low for an RDNA 2 GPU.
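The lane count and clock give a peak FP32 throughput. The sketch below assumes the usual convention that a fused multiply-add counts as two floating-point operations, and that an RDNA 2 WGP holds two CUs of 64 FP32 lanes each, which matches the text's 512 lanes for 4 WGPs.

```python
LANES_PER_WGP = 128  # RDNA 2: 2 CUs per WGP x 64 FP32 lanes per CU

def peak_fp32_tflops(lanes: int, clock_ghz: float, flops_per_lane_per_clock: int = 2) -> float:
    """Peak FP32 throughput in TFLOPS; an FMA per lane per clock counts as 2 FLOPs."""
    return lanes * flops_per_lane_per_clock * clock_ghz / 1000

van_gogh_tflops = peak_fp32_tflops(4 * LANES_PER_WGP, 1.6)
print(f"{van_gogh_tflops:.2f} TFLOPS")  # 1.64 TFLOPS
```

This is a theoretical ceiling at the maximum clock; sustained clocks under the APU's power cap may be lower.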

Compared with Renoir's Vega iGPU, the AMD Custom GPU 0405 uses an RDNA-style cache setup that adds an extra cache level. Its 16KB first-level vector and scalar caches are backed by a 128KB L1, and like Renoir, Van Gogh uses a disproportionately large 1MB L2 cache to insulate the GPU from DRAM. At the same L2-to-compute ratio as AMD's RX 6900 XT, a 4-WGP GPU would have less than 512KB of L2.
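The "less than 512KB" figure can be checked by scaling the RX 6900 XT's L2 down to Van Gogh's size. The sketch uses the 6900 XT's 4MB L2 and 80 CUs (40 WGPs); these specs are not stated in the article and are included here as known reference values.

```python
RX6900XT_L2_KIB = 4 * 1024  # 4MB L2 (not counting the separate Infinity Cache)
RX6900XT_WGPS = 40          # 80 CUs = 40 WGPs
VAN_GOGH_WGPS = 4
VAN_GOGH_L2_KIB = 1024

# L2 a 4-WGP GPU would get at the 6900 XT's L2-per-WGP ratio
scaled_l2_kib = RX6900XT_L2_KIB * VAN_GOGH_WGPS / RX6900XT_WGPS
print(f"{scaled_l2_kib:.1f} KiB at 6900 XT ratio vs {VAN_GOGH_L2_KIB} KiB actual")  # 409.6 KiB ...
print(f"Van Gogh has about {VAN_GOGH_L2_KIB / scaled_l2_kib:.1f}x as much L2 per WGP")
```

That works out to roughly 2.5 times the L2 per WGP, which is what "disproportionately large" refers to.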

In latency tests, the AMD Custom GPU 0405 (green line) showed the strength of the RDNA architecture, with vector memory access latency far better than Vega's. Scalar cache access latency is roughly comparable, with Vega more competitive on the scalar side, but since both iGPUs have about the same L2 latency, RDNA's 128KB L1 should still give Van Gogh the upper hand. As Chips and Cheese put it, "Van Gogh is great because you get the benefit of a 128KB L1 mid-level cache while maintaining the same L2 latency."

In the GPU bandwidth test, the LPDDR5 controller finally redeemed itself, achieving close to its on-paper performance. The Custom GPU 0405 gets over 70GB/s of bandwidth, a commanding lead over Renoir's iGPU. This is consistent with Van Gogh's positioning as a gaming-focused product.
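Putting the measured figures against the 88GB/s theoretical peak makes the CPU/GPU contrast concrete. The sketch uses the article's numbers: about 25GB/s from the CPU side and over 70GB/s from the GPU side (70GB/s is used as a lower bound).

```python
PEAK_GBS = 88.0  # four 32-bit LPDDR5 channels at 5500 MT/s

def efficiency(measured_gbs: float, peak_gbs: float = PEAK_GBS) -> float:
    """Fraction of theoretical DRAM bandwidth actually achieved."""
    return measured_gbs / peak_gbs

for name, measured in (("CPU side", 25.0), ("GPU side", 70.0)):
    print(f"{name}: {efficiency(measured):.0%} of theoretical")
# CPU side: 28% of theoretical
# GPU side: 80% of theoretical
```

The same memory controller delivers under a third of its peak to the CPU but around four fifths to the GPU.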

Integrated GPUs like Van Gogh's are usually built into CPU-focused chips and often suffer from bandwidth limitations. The Steam Deck instead uses LPDDR5 to reach a compute-to-bandwidth ratio that rivals consoles. As Chips and Cheese points out, this means enough bandwidth is available to meet the GPU's demands even when it shares the memory bus with the CPU.
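One way to express compute-to-bandwidth is FLOPs available per byte of DRAM bandwidth. The sketch compares Van Gogh's theoretical numbers (512 lanes x 2 FLOPs x 1.6GHz, 88GB/s) with the PlayStation 5's publicly stated 10.28 TFLOPS and 448GB/s; the PS5 figures are not from the article and are added only as a console reference point. Theoretical peaks are used on both sides so the comparison is like for like.

```python
def flops_per_byte(tflops: float, bandwidth_gbs: float) -> float:
    """Peak FP32 operations available per byte of peak DRAM bandwidth."""
    return tflops * 1000 / bandwidth_gbs

van_gogh = flops_per_byte(512 * 2 * 1.6 / 1000, 88)  # about 18.6 FLOP/byte
ps5 = flops_per_byte(10.28, 448)                     # about 22.9 FLOP/byte (reference only)

print(f"Van Gogh: {van_gogh:.1f} FLOP/byte, PS5: {ps5:.1f} FLOP/byte")
```

A lower ratio means more bandwidth per unit of compute, so by this rough measure Van Gogh is at least as well fed as a current console GPU.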

Van Gogh also has excellent transfer speeds between the CPU and GPU. It is much faster than Renoir, which is limited by DDR4 bandwidth, and faster than the RX 6900 XT, which is limited by PCIe 4.0. However, PCIe bandwidth does not significantly affect gaming performance except in extremely slow configurations, so this matters little for a gaming platform. Fast transfers help compute applications that offload work to the GPU, process the results on the CPU, and repeat, but Van Gogh is not designed for that.
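For context on the RX 6900 XT's limit, the raw PCIe 4.0 x16 link rate follows from 16GT/s per lane with 128b/130b encoding. The sketch computes that per-direction ceiling; it ignores packet and protocol overhead, so real transfers come in somewhat lower.

```python
def pcie_gbs(gt_per_s: float, lanes: int, payload_bits: int = 128, total_bits: int = 130) -> float:
    """Per-direction PCIe bandwidth in GB/s after line encoding (no protocol overhead)."""
    return gt_per_s * payload_bits / total_bits / 8 * lanes

print(round(pcie_gbs(16, 16), 1), "GB/s per direction for PCIe 4.0 x16")  # 31.5
```

A discrete card must move data across this link, while an APU like Van Gogh copies between CPU and GPU buffers within the same memory.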

Chips and Cheese said, "AMD's custom APU is an interesting example of a very small console chip. Like the chips in the PlayStation 5 and Xbox Series X, its CPU suffers from low clocks, little cache, and high memory latency. Although its CPU performance is weak compared to other Zen 2 implementations, Van Gogh's CPU is still quite capable on its own."

in Hardware, Game, Posted by log1p_kr