In recent years the industrial community has focused strongly on embedded vision technologies, aiming to incorporate computer vision capabilities into a wide range of embedded systems. Typical applications include gesture tracking, smart video surveillance, advanced driver assistance systems (ADAS), and augmented reality (AR). In this context, accelerating computer vision functions is increasingly considered a must, since many of these applications are computationally intensive and have real-time requirements.
Adrenaline is a novel tool for hardware/software design space exploration of many-core accelerators, with a strong focus on embedded vision applications. Adrenaline consists of two main components: an embedded platform simulator and an OpenVX-based programming model. OpenVX is a cross-platform standard for imaging and vision, based on a generic set of primitives meant to run across multiple platforms. Device vendors can provide efficient implementations of these primitives that leverage specific hardware features (e.g., parallel cores).
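To make the graph-of-primitives idea concrete, the following Python sketch models a tiny vision graph in which each node applies a primitive and edges carry data dependencies. It is a toy illustration only, not the OpenVX API (a C API whose graphs are built and executed through calls such as `vxCreateGraph` and `vxProcessGraph`); the `Graph` class and the `blur3`/`threshold` primitives operating on 1-D lists are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


# Hypothetical primitives operating on 1-D "images" (lists of ints).
def blur3(img):
    """3-tap box blur with edge clamping."""
    n = len(img)
    return [(img[max(i - 1, 0)] + img[i] + img[min(i + 1, n - 1)]) // 3
            for i in range(n)]


def threshold(img, t=4):
    """Binary threshold: 255 where the pixel is >= t, else 0."""
    return [255 if v >= t else 0 for v in img]


class Graph:
    """Toy vision graph: nodes are primitives, edges are data dependencies."""

    def __init__(self):
        self.nodes = {}  # node name -> (function, list of input names)

    def add_node(self, name, func, inputs):
        self.nodes[name] = (func, inputs)

    def process(self, sources):
        data = dict(sources)
        deps = {name: set(ins) for name, (_, ins) in self.nodes.items()}
        # Run nodes in dependency order; source images are not nodes.
        for name in TopologicalSorter(deps).static_order():
            if name in self.nodes:
                func, ins = self.nodes[name]
                data[name] = func(*(data[i] for i in ins))
        return data


g = Graph()
g.add_node("blurred", blur3, ["input"])
g.add_node("mask", threshold, ["blurred"])
out = g.process({"input": [0, 3, 9, 9, 3, 0]})
```

Because the whole graph is known before execution, a runtime is free to decide where and how each node runs, which is what makes the host/accelerator partitioning described below possible.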
The Adrenaline simulator models a SoC composed of a host processor and a many-core accelerator.
Host and accelerator instruction set simulators are synchronized to provide coherent timing across the two domains. The accelerator is modeled after the PULP (Parallel Ultra-Low Power) platform, a configurable clustered many-core computing fabric.
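One common way to keep two instruction set simulators coherent in time is quantum-based lockstep: each simulator runs up to a shared simulated-time barrier, and both synchronize there before advancing. The sketch below illustrates that scheme under stated assumptions; the `ISS` class, the clock frequencies, and the fixed quantum are hypothetical, and Adrenaline's actual synchronization mechanism may differ. It only advances cycle counters rather than executing instructions.

```python
from dataclasses import dataclass


@dataclass
class ISS:
    """Hypothetical instruction set simulator that advances in clock cycles."""
    name: str
    freq_mhz: int
    cycles: int = 0

    def run_until(self, t_ns):
        # A real ISS would execute instructions here; this sketch only moves
        # the cycle counter to the last cycle boundary not beyond t_ns.
        target = t_ns * self.freq_mhz // 1000
        self.cycles = max(self.cycles, target)

    def time_ns(self):
        return self.cycles * 1000 // self.freq_mhz


def cosimulate(host, accel, quantum_ns, end_ns):
    """Run both simulators in lockstep, synchronizing every quantum_ns."""
    t = 0
    sync_points = []
    while t < end_ns:
        t = min(t + quantum_ns, end_ns)
        host.run_until(t)
        accel.run_until(t)
        # At this barrier both simulators have reached the same simulated
        # time, so host/accelerator interactions are timestamped coherently.
        sync_points.append((t, host.cycles, accel.cycles))
    return sync_points


host = ISS("host", freq_mhz=800)
accel = ISS("accel", freq_mhz=400)
points = cosimulate(host, accel, quantum_ns=1000, end_ns=3000)
```

A smaller quantum tightens timing accuracy at the cost of more frequent synchronization, which is the usual trade-off in this kind of co-simulation.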
Each cluster features a parametric number of Processing Elements (PEs) based on an optimized OpenRISC core.
The PEs within a cluster share a multi-banked, multi-ported L1 tightly coupled data memory (TCDM), and multiple clusters can be interconnected via a scalable communication medium (e.g., a NoC). The PULP fabric is integrated into a SoC featuring an L2 memory shared among all clusters, and a fast DMA engine enables flexible and efficient communication.
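A multi-banked TCDM lets several PEs access L1 in the same cycle as long as they hit different banks. The sketch below assumes word-level interleaving of 32-bit words across banks and one request served per bank per cycle, with conflicting requests serialized; `bank_of` and `access_cycles` are hypothetical names, and the model ignores the arbitration details of the real interconnect.

```python
from collections import Counter

WORD_BYTES = 4  # assumed 32-bit words


def bank_of(addr, n_banks):
    """Word-interleaved mapping: consecutive 32-bit words go to consecutive banks."""
    return (addr // WORD_BYTES) % n_banks


def access_cycles(addrs, n_banks):
    """Cycles needed to serve one access per PE, assuming each bank serves a
    single request per cycle and conflicting requests are serialized."""
    per_bank = Counter(bank_of(a, n_banks) for a in addrs)
    return max(per_bank.values())


# 8 PEs accessing a 16-bank TCDM.
unit_stride = [4 * pe for pe in range(8)]   # consecutive words: banks 0..7
bad_stride = [64 * pe for pe in range(8)]   # 16-word stride: every PE hits bank 0
```

With 16 banks, the unit-stride pattern is served in one cycle, while the 64-byte stride maps every PE to the same bank and serializes all 8 requests; this is why data layout matters when tuning the number of cores and banks.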
Our simulator provides several knobs to tune the accelerator architecture, for example: enabling or disabling the hardware FPU, changing the L1/L2 memory sizes, changing the number of cores, and setting the DMA bandwidth and latency.
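These knobs can be thought of as a single configuration record. The sketch below is hypothetical: the field names and default values are illustrative and are not Adrenaline's actual parameters. It also shows a first-order DMA cost model (fixed latency plus transfer size divided by bandwidth) and an L1 capacity check, to make visible how the DMA and memory-size knobs affect a workload.

```python
from dataclasses import dataclass


@dataclass
class AcceleratorConfig:
    """Hypothetical knob set; defaults are illustrative only."""
    n_cores: int = 8
    hw_fpu: bool = True
    l1_kib: int = 64
    l2_kib: int = 512
    dma_bytes_per_cycle: int = 8
    dma_latency_cycles: int = 50


def dma_cycles(cfg, n_bytes):
    """First-order DMA cost: fixed setup latency plus ceil(size / bandwidth)."""
    return cfg.dma_latency_cycles + -(-n_bytes // cfg.dma_bytes_per_cycle)


def fits_in_l1(cfg, tile_bytes, buffers=2):
    """Check whether `buffers` tiles (2 = double buffering) fit in L1."""
    return buffers * tile_bytes <= cfg.l1_kib * 1024


cfg = AcceleratorConfig()
wide_dma = AcceleratorConfig(dma_bytes_per_cycle=16)
frame = 320 * 240  # one 8-bit QVGA frame, in bytes
```

Under this model, doubling `dma_bytes_per_cycle` roughly halves the cost of a large transfer, while for small transfers the fixed latency dominates; a full frame also does not fit in the assumed 64 KiB L1, so it would have to be processed in tiles.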
On the software side, we provide an OpenVX run-time with specific optimizations for the PULP many-core platform.
With our solution, an OpenVX graph can be partitioned to execute partly on the host and partly on the accelerator.
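Choosing which nodes run on the accelerator trades faster node execution against the cost of moving data across the host/accelerator boundary. The brute-force sketch below makes that trade-off explicit under simplifying assumptions: nodes execute one after another, every edge that crosses devices costs a fixed `transfer_cycles`, and the per-node cycle counts are made-up inputs. All names, including the node names, are hypothetical; a real partitioner would use profiled costs and a smarter search than enumerating all 2^n placements.

```python
from itertools import product


def crossing_edges(edges, placement):
    """Edges whose producer and consumer sit on different devices; each one
    implies a host<->accelerator data transfer."""
    return [(s, d) for s, d in edges if placement[s] != placement[d]]


def partition_cost(edges, placement, exec_cycles, transfer_cycles):
    # Assumes nodes run one after another (no host/accelerator overlap).
    compute = sum(exec_cycles[n][dev] for n, dev in placement.items())
    return compute + transfer_cycles * len(crossing_edges(edges, placement))


def best_partition(nodes, edges, exec_cycles, transfer_cycles, pinned):
    """Exhaustively try every host/accel assignment of the movable nodes."""
    best = None
    for choice in product(("host", "accel"), repeat=len(nodes)):
        placement = {**pinned, **dict(zip(nodes, choice))}
        cost = partition_cost(edges, placement, exec_cycles, transfer_cycles)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best


# Hypothetical edge-detection chain; input/output are pinned to the host.
edges = [("input", "gauss"), ("gauss", "sobel"), ("sobel", "mag"), ("mag", "output")]
exec_cycles = {
    "input": {"host": 0},
    "output": {"host": 0},
    "gauss": {"host": 900, "accel": 200},
    "sobel": {"host": 1200, "accel": 300},
    "mag": {"host": 400, "accel": 150},
}
cost, placement = best_partition(["gauss", "sobel", "mag"], edges, exec_cycles,
                                 transfer_cycles=500,
                                 pinned={"input": "host", "output": "host"})
```

With these made-up numbers, offloading the whole chain wins (1650 cycles versus 2500 on the host alone), whereas offloading only `sobel` costs 2600 cycles because its two boundary crossings outweigh the compute savings.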
When executed on the accelerator, an OpenVX graph node can be further parallelized among available PEs.
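Within a single node, a common strategy is to split the image into contiguous bands of rows, one per PE. The sketch below uses static block partitioning, with Python threads standing in for PEs; `row_chunks` and `parallel_threshold` are hypothetical names, and the threshold kernel is chosen because it is point-wise. Stencil kernels such as 3x3 filters would additionally need each band to read a halo of neighboring rows.

```python
from concurrent.futures import ThreadPoolExecutor


def row_chunks(height, n_pes):
    """Static block partitioning: PE k gets a contiguous band of rows;
    the first (height % n_pes) PEs get one extra row."""
    base, extra = divmod(height, n_pes)
    chunks, start = [], 0
    for pe in range(n_pes):
        size = base + (1 if pe < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


def parallel_threshold(image, t, n_pes):
    """Binary threshold of a 2-D image (list of rows), one row band per worker."""
    out = [None] * len(image)

    def work(band):
        lo, hi = band
        for y in range(lo, hi):  # each worker writes only its own rows
            out[y] = [255 if v >= t else 0 for v in image[y]]

    with ThreadPoolExecutor(max_workers=n_pes) as pool:
        list(pool.map(work, row_chunks(len(image), n_pes)))  # wait, surface errors
    return out


img = [[10 * y + x for x in range(3)] for y in range(5)]
result = parallel_threshold(img, t=25, n_pes=4)
```

Because the bands are disjoint, workers never write the same output row, so no synchronization is needed beyond waiting for all bands to finish.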
Our tool is intended to provide design exploration capabilities to a wide range of end users:
- Hardware designers and platform architects can explore different hardware configurations for a target application
- Application developers and software engineers can explore different partitioning solutions (host vs accelerator) for different applications
- SDK vendors and researchers can explore various optimizations, scheduling policies and algorithms for the implementation of the OpenVX support layer