Something to look forward to: AMD has published its first patent for a chiplet GPU design. In typical AMD fashion, the company is trying not to rock the boat. Chiplet GPUs are only just beginning to appear. Intel has been open about its development process and has confirmed the use of chiplets in its first-generation discrete GPUs. Nvidia, while coy about details, has published a number of research papers on the subject. AMD was the last holdout, which only adds to the intrigue.
Chiplets, as the name suggests, are smaller, less complex chips that are meant to work together to form more powerful processors. They are arguably the inevitable future of all high-performance components, and in some cases they are already the successful present.
In the new patent, dated December 31, AMD outlines a chiplet design that is built to mimic a monolithic design as closely as possible. Its hypothetical model uses two chiplets joined by a high-speed passive interposer that AMD calls a crosslink.
The crosslink sits between the L2 and L3 caches in the memory hierarchy. Everything below it, such as the cores and the L1 and L2 caches, is aware that it is separate from the other chiplet. Everything above it, including the L3 cache and GDDR memory, is shared between the chiplets.
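To make the split concrete, here is a minimal Python sketch that labels each level of the hierarchy as private to one chiplet or shared across chiplets, based only on where the article places the crosslink. The names LEVELS, CROSSLINK_BELOW, and scope() are hypothetical illustrations, not anything taken from AMD's patent.

```python
# Memory hierarchy ordered from closest-to-the-cores to farthest, as the article describes it.
LEVELS = ["L1", "L2", "L3", "GDDR"]

# Hypothetical constant: the crosslink sits just below this level (between L2 and L3).
CROSSLINK_BELOW = "L3"


def scope(level: str) -> str:
    """Return 'private' for levels below the crosslink, 'shared' for levels above it."""
    if LEVELS.index(level) < LEVELS.index(CROSSLINK_BELOW):
        return "private"  # each chiplet has its own copy and knows it is separate
    return "shared"  # reached through the crosslink by every chiplet


for level in LEVELS:
    print(f"{level}: {scope(level)}")
```

Running it prints L1 and L2 as private and L3 and GDDR as shared, which is the whole point of placing the crosslink where AMD does.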
This design is beneficial because it is conventional. AMD claims that compute units can access low-level cache on the other chiplet almost as quickly as they can access their local low-level cache. If that turns out to be true, software will not need to be updated.
The same cannot be said of Intel's and Nvidia's designs. Intel intends to use two new technologies, EMIB (Embedded Multi-die Interconnect Bridge) and Foveros. The latter is an active interposer that uses through-silicon vias, something AMD explicitly says it will not use. Intel's design lets the GPU house a system-accessible cache that powers a new memory fabric.
Nvidia has not revealed everything, but it has hinted at a few directions it may take. A 2017 research paper describes a four-chiplet design with a NUMA-aware (non-uniform memory access) and locality-aware architecture. It also experiments with a new L1.5 cache, which holds only remote data accesses and is bypassed for local memory accesses.
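The L1.5 idea can be illustrated with a toy model: a cache that only stores lines whose home is another GPU module and lets local accesses pass straight through. The sketch below is hypothetical. The class name, the LRU replacement policy, the capacity, and the assumption that each module owns a contiguous address range are all choices made for illustration, not details from Nvidia's paper.

```python
from collections import OrderedDict


class RemoteOnlyCache:
    """Toy L1.5-style cache that holds only data fetched from remote modules.

    Hypothetical sketch: capacity, LRU replacement, and the address-to-module
    mapping are illustrative assumptions.
    """

    def __init__(self, capacity: int, local_module: int, lines_per_module: int):
        self.capacity = capacity
        self.local_module = local_module
        self.lines_per_module = lines_per_module
        self.lines = OrderedDict()  # cached addresses, oldest first (LRU order)
        self.stats = {"local_bypass": 0, "remote_hit": 0, "remote_miss": 0}

    def home_module(self, address: int) -> int:
        # Assumption: each module owns a contiguous block of addresses.
        return address // self.lines_per_module

    def access(self, address: int) -> str:
        if self.home_module(address) == self.local_module:
            # Local data never enters the L1.5; the access bypasses it.
            self.stats["local_bypass"] += 1
            return "local: bypass"
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            self.stats["remote_hit"] += 1
            return "remote: hit"
        # Remote miss: fetch over the inter-module link and keep a copy.
        self.stats["remote_miss"] += 1
        self.lines[address] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return "remote: miss (filled)"


cache = RemoteOnlyCache(capacity=2, local_module=0, lines_per_module=100)
for addr in [5, 150, 150, 250, 350, 150]:
    print(addr, cache.access(addr))
print(cache.stats)
```

Keeping local data out of the structure means its full capacity goes to the expensive remote traffic, which is the motivation the article attributes to the design.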
AMD's approach may be the least imaginative of the three, but it also sounds practical. And if history has proven anything, it's that developer-friendliness is a big advantage.
Below are several diagrams from the patent.
Figure 2 is a cross-section that runs from the two chiplets down to the circuit board. The two chiplets (106-1 and 106-2) are mounted on top of the passive crosslink (118) and use dedicated conductor structures to reach the crosslink's traces (206), through which they communicate with each other. Conductor structures that are not attached to the crosslink (204) connect to the circuit board for power and other signaling.
Figure 3 shows the cache hierarchy. WGPs (workgroup processors) (302), which are collections of shader cores, and GFXs (fixed-function units) (304), which are dedicated processors for specific tasks, connect directly to a channel's L1 cache (306). Each chiplet contains several L2 cache (308) banks that are individually addressable and coherent within a single chiplet. Each chiplet also contains several L3 cache (310) banks that are coherent across the whole GPU.
The GDF (graphics data fabric) (314) connects the L1 cache banks to the L2 cache banks. The SDF (scalable data fabric) (316) combines the L2 cache banks and connects them to the crosslink (118). The crosslink connects to the SDFs on all chiplets, as well as to the L3 cache on all chiplets. The GDDR memory interfaces (labeled Memory PHY) (312) connect to the L3 cache banks.
For example, if a WGP on one chiplet needed data from a GDDR bank attached to the other chiplet, that data would be sent to an L3 cache bank, then over the crosslink to an SDF, then to an L2 bank, and finally through a GDF to an L1 bank.
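That path can be reproduced by treating the blocks in Figure 3 as nodes in a graph and searching for a route between them. The Python sketch below is a simplification: it assumes one node per block type per chiplet (the real design has many WGPs and cache banks), and it uses breadth-first search purely as an illustration. The node names and the functions build_topology() and route() are hypothetical.

```python
from __future__ import annotations

from collections import deque


def build_topology(num_chiplets: int) -> dict[str, list[str]]:
    """Undirected adjacency list of the blocks described for Figure 3.

    Simplification (assumption): each chiplet's WGPs and cache banks are
    collapsed into a single node per block type.
    """
    graph: dict[str, list[str]] = {}

    def link(a: str, b: str) -> None:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)

    for c in range(num_chiplets):
        link(f"WGP{c}", f"L1_{c}")    # WGPs attach directly to L1
        link(f"L1_{c}", f"GDF{c}")    # GDF joins L1 banks to L2 banks
        link(f"GDF{c}", f"L2_{c}")
        link(f"L2_{c}", f"SDF{c}")    # SDF joins L2 banks to the crosslink
        link(f"SDF{c}", "crosslink")
        link("crosslink", f"L3_{c}")  # crosslink reaches every chiplet's L3
        link(f"L3_{c}", f"PHY{c}")    # GDDR Memory PHY sits behind L3
    return graph


def route(graph: dict[str, list[str]], src: str, dst: str) -> list[str]:
    """Breadth-first search for the fewest-hop path from src to dst."""
    previous: dict[str, str | None] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    if dst not in previous:
        return []  # no route exists
    path = []
    node: str | None = dst
    while node is not None:  # walk back from dst to src
        path.append(node)
        node = previous[node]
    return path[::-1]


topology = build_topology(num_chiplets=2)
# Data from GDDR on chiplet 1 headed for an L1 bank on chiplet 0.
print(" -> ".join(route(topology, "PHY1", "L1_0")))
```

With two chiplets, the printed route runs PHY1, L3_1, crosslink, SDF0, L2_0, GDF0, L1_0, in the same order as the walkthrough above.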
Figure 4 is a bird's-eye view of one chiplet. It shows more precisely where the different components might sit and how large they might be. The HBX controller (404) manages the crosslink, to which the chiplet connects through HBX PHY (406) conductors. The small square in the lower-left corner (408) is a potential additional connection to the crosslink for attaching more chiplets.