Hexagon arch
12/21/2023

The Hexagon processor is a hardware multi-threaded, variable-instruction-length VLIW processor architecture developed for efficient execution of control and signal processing code at the low power levels needed for mobile platforms. A public description of this architecture was recently presented at the 2013 edition of the Hot Chips conference at Stanford University in August. Competitive data from applications such as H.264, AMR-WB and AAC+ will be presented. The processor has been benchmarked against relevant DSP processors by Berkeley Design Technologies Incorporated (BDTI), the leading independent company that analyzes digital signal processors. Results of this benchmarking can be found at BDTI's website.

The inherent latency tolerance afforded by multi-threading enabled ISA optimizations that would not otherwise be practical. The ISA features a VLIW-style static grouping of instructions, but Hexagon goes beyond conventional VLIW and allows grouping of both independent and many forms of dependent instructions. As an example, the common load-compare-branch idiom can be expressed in a single Hexagon instruction packet. Such techniques enable extraction of high instruction parallelism even from irregular control-code applications.

Multi-threading and VLIW are complementary technologies. Multi-threading hides pipeline latencies, which makes instruction latencies appear low. The perception of low instruction latencies allows the compiler to utilize the VLIW packets more effectively.

The Hexagon ISA is a hybrid DSP/CPU design that features a 4-issue VLIW comprising dual load/store slots and dual 64-bit vector execution slots. All instructions operate on a shared 32-entry per-thread register file. Vector operations use register pairs from the general register file. The ISA features a rich set of DSP arithmetic support including 16-bit and 32-bit fractional and complex data types, 32-bit floating-point, and full 64-bit integer arithmetic support.
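To make the "16-bit fractional and complex data types" mentioned above more concrete, here is a minimal sketch of fixed-point fractional arithmetic in Python. It assumes the conventional Q15 layout (one sign bit, 15 fraction bits) with round-to-nearest and saturation. The text does not specify the format or the rounding behavior, so these are illustrative assumptions, and the function names are hypothetical rather than Hexagon instructions or intrinsics.

```python
# Illustrative Q15 fractional arithmetic (assumed format, not Hexagon's ISA).
# A Q15 value is a 16-bit signed integer `raw` representing raw / 2**15.

Q15_ONE = 1 << 15          # scale factor between real value and raw integer
Q15_ROUND = 1 << 14        # half of one LSB after the 15-bit shift


def sat16(x: int) -> int:
    """Clamp an integer to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def to_q15(x: float) -> int:
    """Convert a real number in [-1, 1) to its Q15 raw integer, saturating."""
    return sat16(int(round(x * Q15_ONE)))


def q15_mul(a: int, b: int) -> int:
    """Fractional multiply: full product, round, rescale by 2**15, saturate."""
    return sat16((a * b + Q15_ROUND) >> 15)


def q15_cmul(a: tuple, b: tuple) -> tuple:
    """Complex fractional multiply on (real, imag) pairs of Q15 values."""
    ar, ai = a
    br, bi = b
    re = sat16((ar * br - ai * bi + Q15_ROUND) >> 15)
    im = sat16((ar * bi + ai * br + Q15_ROUND) >> 15)
    return re, im


half = to_q15(0.5)
quarter = q15_mul(half, half)                    # 0.5 * 0.5
product = q15_cmul((half, half), (half, -half))  # (0.5+0.5j) * (0.5-0.5j)
```

The saturation step matters for the corner case -1.0 × -1.0, whose exact result (+1.0) is not representable in Q15 and is clamped to the largest positive value instead of wrapping around.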
Unlike most architectures, the Hexagon instruction set originated and evolved assuming the existence of a multi-threaded implementation. All versions of the Hexagon DSP core are hardware multi-threaded to enable the superior concurrency needed in mobile applications. Implementations have evolved from simple Interleaved Multi-Threading (IMT) to more advanced prioritized scheduling that aims to fill as many execution slots as possible. The number of hardware threads has changed over the generations to meet various product and application needs. The initial Hexagon V1 core supported six threads, but the most recent version of the Hexagon DSP, Hexagon V5, features three threads.

Rather than pushing performance through MHz, the designs strive for high levels of work per cycle, but at a reduced clock speed. Keeping the speed targets low allows the implementation to avoid many of the power-costly design methods that are typical of high-speed design. One of the challenges with multi-threading is to have the power scale with the number of threads running. Through carefully orchestrated hierarchical clock gating, near-perfect power scaling is achieved. Hexagon cores use a semi-custom physical design methodology with customizations oriented to power reduction.

To the programmer, these hardware threads can be considered as separate processor cores with shared memory, and are programmed using conventional software threading. The programmer does not need to focus on the threading since the RTOS maps user software threads onto the processor's hardware threads. These hardware threads share the entire memory hierarchy, including L1. Thus, it is very beneficial for the software to employ threads that cooperate on shared data. To facilitate this, a very fast RTOS kernel has been designed for Hexagon. The RTOS globally schedules the highest-priority runnable software threads and always directs interrupts to the lowest-priority hardware thread.
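The RTOS policy described above (run the highest-priority runnable software threads on the hardware threads, and send interrupts to the lowest-priority hardware thread) can be sketched as a toy model. This is not the Hexagon RTOS kernel or its API. The names `SoftwareThread`, `schedule` and `interrupt_target` are hypothetical, larger numbers are assumed to mean higher priority, and treating an idle hardware thread as the preferred interrupt target is an added assumption.

```python
# Toy model of the scheduling policy described in the text (not real RTOS code).
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SoftwareThread:
    name: str
    priority: int          # assumption: larger value = higher priority
    runnable: bool = True


def schedule(threads: List[SoftwareThread],
             num_hw_threads: int) -> List[Optional[SoftwareThread]]:
    """Place the highest-priority runnable software threads on hardware threads.

    Returns one entry per hardware thread; None marks an idle hardware thread.
    """
    runnable = [t for t in threads if t.runnable]
    chosen = sorted(runnable, key=lambda t: t.priority, reverse=True)
    chosen = chosen[:num_hw_threads]
    return chosen + [None] * (num_hw_threads - len(chosen))


def interrupt_target(slots: List[Optional[SoftwareThread]]) -> int:
    """Pick the hardware thread that should take an interrupt.

    Idle hardware threads come first (assumption); otherwise choose the one
    running the lowest-priority software thread.
    """
    def cost(i: int):
        t = slots[i]
        return (0, 0) if t is None else (1, t.priority)
    return min(range(len(slots)), key=cost)


threads = [
    SoftwareThread("audio", 30),
    SoftwareThread("modem", 50),
    SoftwareThread("ui", 10),
    SoftwareThread("logger", 5, runnable=False),
    SoftwareThread("codec", 20),
]
slots = schedule(threads, num_hw_threads=3)   # three hardware threads, as in V5
target = interrupt_target(slots)
```

With three hardware threads, the model runs `modem`, `audio` and `codec`, leaves `ui` waiting, and directs an interrupt to the hardware thread running `codec`, so the most important work is disturbed last.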
Qualcomm Technologies began development of a new DSP processor architecture and high-performance implementation in the fall of 2004. In 2011 the Hexagon Access program was started to allow customers to program the DSP and thus exploit the power and performance benefits of offloading the ARM cores for performance, reduced power dissipation, or concurrency requirements. As of 2012, multiple Hexagon cores form the processing engine behind virtually every commercially shipping 4G LTE modem by Qualcomm Technologies. The "Hexagon DSP" core is now in its 5th generation and is integrated inside all recent Qualcomm Technologies modem and application chips. At Uplinq 2013, we released the first publicly available development environment for the Hexagon DSP, the Hexagon SDK.

Hexagon cores are optimized for both high performance and energy efficiency. Energy efficiency is often the more critical metric.