WP6-15

From COMP4DRONES

Revision as of 15:00, 3 October 2022

Multi-Dataflow Composer

ID: WP6-MDC
Contributor: UNISS
Levels: Tool
Require: Application definition and FPGA-based System-on-Chip
Provide: Ready-to-use reconfigurable HW accelerator
Input:
  • Dataflow application specification(s)
  • HDL actor definition(s)
  • Communication protocol
  • Target architecture
Output:
  • Multi-dataflow network
  • Coarse-grained reconfigurable accelerator RTL
  • Co-processor RTL
  • Programming tables
C4D tooling: n.a.
TRL: 5/6
License: Open-source

Detailed Description

Although FPGA technology has the potential to satisfy the many performance, energy, and predictability requirements of drone systems and applications, FPGA development is a notoriously complex task.

To address this problem, the baseline feature of this component is the composition of coarse-grained reconfigurable HW accelerators (CGRAs) from a set of dataflow applications. It involves two main components:

  • Multi-Dataflow Generator (MDG): merges different dataflows into a single reconfigurable multi-dataflow by inserting switching modules. Two merging algorithms are currently supported: empiric and Moreano. The former is better suited to non-recursive dataflows but less optimized than the latter.
  • Platform Composer (PC): derives the RTL description of the CGRA from the multi-dataflow. The user must supply the hardware communication protocol between actors (XML) and the RTL descriptions of the actors involved in the dataflows (HDL Components Library, HCL).
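
The idea of merging dataflows by inserting switching modules can be illustrated with a minimal Python sketch. It is neither the empiric nor the Moreano algorithm: it is a naive merge that assumes actors with the same name are shared, that each actor has a single input and a single output, and that a switch is placed wherever the dataflows disagree on an actor's successor or predecessor. The names `merge_dataflows`, `sbox_out_*` and `sbox_in_*` are hypothetical and do not come from the tool.

```python
from collections import defaultdict


def merge_dataflows(dataflows):
    """Naively merge dataflows given as {name: [(src, dst), ...]}.

    Returns (merged_edges, config), where config[name][switch] is the
    actor that the switch must select to reproduce that dataflow.
    """
    succs = defaultdict(set)  # actor -> distinct successors over all dataflows
    preds = defaultdict(set)  # actor -> distinct predecessors over all dataflows
    for edges in dataflows.values():
        for src, dst in edges:
            succs[src].add(dst)
            preds[dst].add(src)

    def out_node(actor):
        # A 1-to-N switch is needed when the dataflows diverge after this actor.
        return f"sbox_out_{actor}" if len(succs[actor]) > 1 else actor

    def in_node(actor):
        # An N-to-1 switch is needed when the dataflows converge on this actor.
        return f"sbox_in_{actor}" if len(preds[actor]) > 1 else actor

    merged = set()
    config = {name: {} for name in dataflows}
    for name, edges in dataflows.items():
        for src, dst in edges:
            o, i = out_node(src), in_node(dst)
            if o != src:
                merged.add((src, o))
                config[name][o] = dst
            if i != dst:
                merged.add((i, dst))
                config[name][i] = src
            merged.add((o, i))
    return sorted(merged), config


# Two hypothetical dataflows sharing the actors "in", "f" and "out".
edges, config = merge_dataflows({
    "A": [("in", "f"), ("f", "g"), ("g", "out")],
    "B": [("in", "f"), ("f", "h"), ("h", "out")],
})
```

In this sketch the per-dataflow `config` dictionary plays a role loosely comparable to the programming tables listed among the tool outputs: it records how each switch must be set to select one of the original dataflows.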

This component also provides automatic co-processor generation, which embeds the generated CGRA into a ready-to-use Xilinx IP. The user can choose among different options:

  • Processor: soft-core (MicroBlaze) or hard-core (ARM)
  • Processor-Coprocessor coupling: memory-mapped or FIFO-based
  • Direct Memory Access (DMA) module: enabled or disabled
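
The option space above can be written down as a small configuration type. The following Python sketch is only illustrative: the class and field names are hypothetical, not part of the tool, and it assumes that every combination of the three options is allowed, which the description does not state.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Processor(Enum):
    MICROBLAZE = "soft-core"
    ARM = "hard-core"


class Coupling(Enum):
    MEMORY_MAPPED = "memory-mapped"
    FIFO_BASED = "fifo-based"


@dataclass(frozen=True)
class CoprocessorOptions:
    """One choice for each of the three co-processor generation options."""
    processor: Processor
    coupling: Coupling
    use_dma: bool


def all_option_sets():
    # Assumption: the three options are independent of each other.
    return [
        CoprocessorOptions(p, c, d)
        for p, c, d in product(Processor, Coupling, (False, True))
    ]
```

Under that independence assumption there are 2 x 2 x 2 = 8 possible configurations.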

Contribution and Improvements

As its contribution to C4D, this component will be extended to automatically generate plug-and-play coarse-grained reconfigurable HW accelerators that can be used by the WP6-13 component.

In the specific case of UC5-D1, the MDC tool is used to model the application to be accelerated on an FPGA in order to meet real-time response requirements: an AES encryption/decryption block provided by RO Technologies. To do so, the application is divided into sub-blocks (called actors) that are automatically interconnected by the tool's code generation. Each actor has been implemented in Verilog/SystemVerilog. Additionally, because the tool automatically connects the actors using First-In-First-Out (FIFO) blocks (already available in the tool repository), pipelining is enabled transparently within the accelerator.
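
Why FIFO-connected actors yield pipelining can be seen in a small cycle-level simulation. The Python sketch below is a software model only, not generated hardware: the function `run_pipeline` is hypothetical, the FIFOs are unbounded, each actor consumes and produces one token per cycle, and the three stage functions are arithmetic placeholders rather than AES steps.

```python
from collections import deque


def run_pipeline(stages, tokens):
    """Simulate a chain of actors connected by FIFOs.

    Returns (outputs, trace), where trace[c] lists the indices of the
    actors that fired during cycle c.
    """
    # fifos[i] feeds stage i; fifos[-1] collects the final outputs.
    fifos = [deque(tokens)] + [deque() for _ in stages]
    trace = []
    while any(fifos[:-1]):
        active = []
        # Visit actors from last to first so a token advances one stage per cycle.
        for idx in reversed(range(len(stages))):
            if fifos[idx]:
                fifos[idx + 1].append(stages[idx](fifos[idx].popleft()))
                active.append(idx)
        trace.append(sorted(active))
    return list(fifos[-1]), trace


# Placeholder actors; a real accelerator would implement AES sub-blocks in HDL.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
outputs, trace = run_pipeline(stages, [1, 2, 3, 4])
```

With three actors and four tokens, the trace shows all three actors firing in the same cycle once the pipeline is full, so the run takes 3 + 4 - 1 = 6 cycles instead of the 12 a strictly sequential execution would need.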

Interoperability with other C4D tools

MDC has been extended with a new backend compatible with the OODK tool (component WP6-13). The backend generates a wrapper around the accelerator, enabling a direct connection from MDC to OODK.

Current Status

MDC has been tested in the context of UC5-D1 to model the application to be accelerated on an FPGA in order to meet real-time response requirements: an AES encryption/decryption block provided by RO Technologies. It achieved a 2x performance improvement.

Design and Implementation

Considering the specific context of C4D, the tool output is directly connected to the OODK toolchain. Consequently, the design and implementation flow works as follows:

  • Define the three required inputs:
    • Implement the task(s) to be accelerated using a dataflow approach.
    • Define the HDL version of the actors composing the tasks (manually or with HLS tools).
    • Define the communication protocol to be used inside and outside the accelerator.
  • If more than one task has been specified, use the MDG functionality to merge the tasks to be accelerated into a reconfigurable multi-dataflow.
  • Depending on the target architecture, select the files to be generated. These can include the accelerator RTL description and wrapper logic around the accelerator that lets the user either 1) use the coarse-grained reconfigurable accelerator as a co-processor or 2) plug the accelerator into an FPGA overlay, as in this project.
  • Run the automatically generated scripts to port the code to Vivado.
  • Synthesize and implement it on the target FPGA device.
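
The conditional parts of this flow (merging only when several tasks exist, and choosing the wrapper according to the target) can be summarized in a short Python sketch. It is a planning aid under stated assumptions, not an interface of the tool: the function `plan_flow`, the step labels and the target names `"coprocessor"` and `"overlay"` are hypothetical, and the sketch assumes that both the accelerator RTL and one wrapper are always generated.

```python
def plan_flow(tasks, target):
    """Return the ordered flow steps for the given tasks and wrapper target."""
    if not tasks:
        raise ValueError("at least one task is required")
    if target not in ("coprocessor", "overlay"):
        raise ValueError(f"unknown target: {target}")

    steps = [
        "define dataflow task(s)",
        "define HDL actors",
        "define communication protocol",
    ]
    # Merging is only needed when more than one task has been specified.
    if len(tasks) > 1:
        steps.append("merge tasks with MDG")
    steps.append("generate accelerator RTL")
    steps.append(f"generate {target} wrapper")
    steps.append("run generated scripts for Vivado")
    steps.append("synthesize and implement on FPGA")
    return steps
```

For example, `plan_flow(["encrypt", "decrypt"], "overlay")` includes the MDG merge step, while `plan_flow(["encrypt"], "overlay")` skips it.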