(Image credit: AMD)

As spotted by patent sleuth @Underfox3, AMD has filed a patent for a technique that speeds the transfer of threads between high-performance cores and smaller low-performance cores in a big.LITTLE-esque hybrid computing architecture. From the abstract of AMD's recently filed patent application:

[A] hybrid architecture may employ one or more small interconnect memory elements (IMEs) coupled to a core and a high-performance data operation unit (HPDU) coupled to other IMEs. In operation, the HPDU includes one or more modules ...

A big.LITTLE device identifies and transfers data between different types of shaders in a system while only using a small portion of the overall power budget …

It also identifies the types of shaders present on the die and specifies:

the locations of shader components, the type of unit each shader component operates on and the relative distribution of schedulers or cross-processing unit sets for lead and shadow VUs

"Big.LITTLE" here refers to a division of labor, conceptually. That is, the large, high-performance cores execute demanding instruction streams while the small cores handle lighter, low-power work. A migrating thread's state can then be conveyed between the two core types through shared caches, platform memories and memory controllers, with the assistance of the interconnect memory elements the patent describes.

(Image credit: AMD)

Now, theoretically, there are good reasons to prefer hybrid designs to increase core density and utilization. That is, if a larger number of small cores can cover the parallel workloads, fewer resources need be dedicated to the big cores themselves, rather than concentrating everything in one core or core pair.

However, it seems more likely that hybrid (HMP) architectures here are aimed at faster execution of system load-balancing tasks, i.e., migrating threads quickly, rather than simply at bolting more cores onto already well-optimized desktop and portable designs.

What conclusions can we draw from the patent filing alone? In it, AMD describes interconnect memory elements on the die, along with I/O lanes and a system interface. The filing does not commit to a specific type of interconnect memory, however, and semiconductor vendors might well dispute which type suits a planar interconnect, so we shouldn't assume any particular implementation is being worked toward.