WP6-13: Difference between revisions

From COMP4DRONES

Revision as of 10:13, 10 October 2022

OODK: Onboard Overlay Development Kit

ID: WP6-13
Contributor: UNIMORE
Levels: Tool
Require: OODK configuration, HW accelerators and FPGA-based System-on-Chip
Provide: Automated deployment of HW accelerators in the companion computer platform
Input:
  • OODK configuration file
  • HW Accelerator RTL designs
Output:
  • OODK system ready to be deployed on FPGA-based System-on-Chip
  • SW stack for offloading computation from the host processor side to the hardware-specific accelerators
C4D tooling: n.a.
TRL: 5/6
License: Open-source

Detailed Description

Although FPGA technology can satisfy the performance, energy and predictability requirements of drone systems and applications, FPGA development is a notoriously complex task. This component is a methodology to ease the deployment of application-specific accelerators - or Hardware Processing Units (HWPU) - in the companion computer platform.

It allows for:

  • Automated deployment of HW accelerators in the companion computer platform.
  • Generation of an application-tailored FPGA overlay. The overlay is a HW/SW abstraction layer that integrates and supports HW accelerators and is instantiated in the FPGA-based System-on-Chip.
  • SW stack support for streamlined offloading of computation from the host processor side to the hardware-specific accelerators.

Target drone application(s) will then run on a Heterogeneous System-on-Chip where:

  • The host CPU is an industry-standard, hard-macro multi-core CPU. It executes full-fledged operating systems and other legacy software (e.g., ROS).
  • The FPGA overlay consists of several clusters, each grouping a small number of RISC-V-based proxy cores that control the operation of one or more HW accelerators.
  • Applications start on the host CPU, and their compute-intensive parts can then be offloaded to the FPGA overlay on the programmable logic.
  • The target SoC supports interaction with the autopilot, ground station, sensors, and/or other I/O.

Contribution and Improvements

In the context of the C4D contributions, this component had to demonstrate the requirements associated with UC5-DEM10-DTC-04 and UC5-DEM10-DTC-05:

  • Computation offloading is enabled by OpenMP 4 offloading support. The approach is single-source and relies on compiler-assisted code generation driven by pragmas, which streamlines offloading between the host CPU and the HW accelerators.
  • OODK enables the automatic integration of HW accelerators through a single configuration file. Users can therefore integrate accelerators without writing HDL and without being expert HW designers.
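
The single-source, pragma-driven offloading style described above can be sketched as follows. This is a minimal illustration using only the standard OpenMP 4.0 `target` construct and `map` clause; the function name `xor_buffer_offload` and the XOR kernel are hypothetical placeholders for a compute-intensive region and are not the interface of the AES accelerator or of any OODK library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel (an assumption for illustration): XOR every byte of
 * a buffer with a key byte. It only stands in for a compute-intensive
 * region; it is NOT AES and NOT the OODK accelerator API. */
void xor_buffer_offload(uint8_t *data, size_t n, uint8_t key)
{
    if (n == 0) {
        return; /* nothing to map or compute */
    }
    assert(data != NULL);

    /* OpenMP 4.0 target region: the compiler outlines this block for the
     * device and derives the host<->device transfers from the map clause.
     * If the compiler has no offloading support, or no device is
     * available, the same source simply runs on the host. */
    #pragma omp target map(tofrom: data[0:n])
    {
        for (size_t i = 0; i < n; ++i) {
            data[i] ^= key;
        }
    }
}
```

The host code calls `xor_buffer_offload` like any other function, so host and device code stay in a single source file. With a stock Clang, offloading is normally enabled with `-fopenmp -fopenmp-targets=<triple>`; the exact triple and toolchain setup used by OODK are not stated here.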

In WP4, synthetic and realistic applications have been compared, showing that only a minor implementation effort is required in terms of lines of code (LOC) written for the application.

Interoperability with other C4D tools

OODK interoperability graph

The original Onboard Overlay Development Kit also includes a tool for integrating Xilinx Vivado HLS accelerators into the overlay template. Within the C4D project, this component has been extended to automatically integrate and configure coarse-grained reconfigurable HW accelerators implemented with the more complete and powerful Multi-Dataflow Composer (MDC, WP6-15 component) tool developed by UNISS.

In the context of UC5-D1, the application providers designed a Lightweight Cryptography (AES) HW accelerator using MDC, and OODK has been used to instantiate the Onboard Overlay Compute Platform as the fabric that hosts it. OODK integrates the MDC-generated dataflow graph (DF Network) into the Onboard Overlay Compute Platform and then synthesizes the resulting system for a Xilinx FPGA. The high-level programming interfaces for the AES accelerator are implemented following the OpenMP 4.0 specification. The figure above shows the dependency/interoperability between OODK and MDC within UC5-D1.

Current Status

OODK has been tested in the context of UC5-D1 to evaluate the improvement over state-of-the-art (SoA) FPGA-based acceleration flows (e.g., Xilinx SDSoC). With respect to the main metrics:

  • The code required for the HW/SW integration of the Lightweight Cryptography accelerator in the FPGA overlay is automatically generated. A total of 46 KLOC (thousand lines of code) is replaced by the definition of just 11 parameters describing the system micro-architecture plus 33 for the accelerator wrapper. Overall, this reduces LOC by a factor of 1049X.
  • For heterogeneous applications, the reduction in the number of lines of code is application-specific. In the UC5-D1 use case, the main program consists of 927 lines of code, written manually in the convenient OpenMP coding style. In addition, 1709 lines of code are required to extend the firmware’s semantics for the specific accelerators in the platform. In the proposed methodology, the latter are also generated automatically from the same 11+33 parameters used to define the architecture. This reduces the number of lines of code by a factor of 2.7X.
  • The execution time speedup of the HW version of the Lightweight Cryptography layer over the SW version is approximately 2X. This does not translate directly into an equivalent ratio in energy savings, because the host CPU remains active while the accelerator operates and therefore contributes to the overall energy spent. The energy reduction when using the accelerated version of the Lightweight Cryptography layer amounts to 48%.
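
The figures above can be cross-checked with a short calculation. The sketch below is one reading that is consistent with the reported numbers, not the authors' stated method. It assumes that the 11 + 33 = 44 parameters each count as one written line, that the speedup is exactly 2X, and that energy is average power times execution time; the symbols L, E, P and t are introduced here for illustration only.

```latex
% Integration code: 46 KLOC replaced by 11 + 33 = 44 parameters.
% The rounded 46 KLOC gives about 1045X; the reported 1049X presumably
% follows from the unrounded line count.
\[
  \frac{L_{\text{generated}}}{L_{\text{parameters}}}
  \approx \frac{46\,000}{11 + 33} \approx 1045
\]

% Application code: without the tool, 927 + 1709 lines are written by hand;
% with it, the 927-line main program plus the 44 parameters.
\[
  \frac{927 + 1709}{927 + 44} = \frac{2636}{971} \approx 2.7
\]

% Energy: E = \bar{P}\,t. A 48% reduction with a 2X speedup implies
% a slightly higher average power in the accelerated run.
\[
  \frac{E_{\text{HW}}}{E_{\text{SW}}} = 1 - 0.48 = 0.52, \qquad
  \frac{t_{\text{HW}}}{t_{\text{SW}}} \approx \frac{1}{2}
  \;\Longrightarrow\;
  \frac{\bar{P}_{\text{HW}}}{\bar{P}_{\text{SW}}}
  \approx \frac{0.52}{0.5} = 1.04
\]
```

Under these assumptions, the average power drawn during the accelerated run is roughly 4% higher than in the SW-only run, which is consistent with the host CPU remaining active while the accelerator operates.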

Design and Implementation

Onboard Overlay Development Kit Components

The snippet above shows the only configuration file that the user has to fill in to integrate one or more HW Application-Specific Accelerators with OODK. For the application design, the tool provides a single-source, OpenMP 4.5-enabled programming interface. Supporting the OpenMP 4.5 Accelerator Model requires an OpenMP 4.5-enabled compiler that targets both the host ISA and the RISC-V ISA, together with a runtime system implementing the OpenMP standard. Thus, the OODK collection contains:

  • Clang/LLVM Compiler. Compiler configured to support OpenMP offloading from the AArch64 ISA to the RISC-V ISA.
  • Overlay Runtime Libraries. Host and overlay communication and runtime libraries.
  • Overlay Rootfs and Linux OS Generator. Automated scripts for creating a Linux-based rootfs for the host.