Used in conjunction with a Xilinx Spartan-3A DSP 3400 FPGA and the Xilinx ISE and EDK Tools
About AutoESL AutoPilot
AutoESL’s AutoPilot is a high-level synthesis (“HLS”) tool that takes C, C++, or SystemC as its input and generates device-specific RTL for FPGAs or ASICs. BDTI used AutoPilot in conjunction with Xilinx’s ISE and EDK tool chain to implement two example applications (“workloads”) on a Xilinx Spartan-3A DSP 3400 FPGA. AutoPilot was used to go from C code to RTL, and Xilinx’s ISE and EDK tool chain was used to convert the resulting RTL to the final bitstream that was used to program the FPGA.
AutoESL was acquired by Xilinx in 2011 and the functionality of AutoPilot was incorporated into the Xilinx Vivado tool suite.
For details of the methodology of our certification program, please visit our BDTI High-Level Synthesis Tool Certification Program page.
Quality of Results Metrics
The tables below show the BDTI High-Level Synthesis Tools Certification Program quality of results metrics for the AutoESL AutoPilot HLS tool used in combination with the Xilinx RTL tools to target a Spartan-3A DSP 3400 FPGA. (Usability metrics are covered in the next section.)
The workload implementations used to obtain the quality of results metrics were developed as follows:
- The AutoPilot-based FPGA workload implementations were developed by AutoESL. The process for obtaining BDTI Certified™ results is as follows: The HLS tool vendor licenses the specification from BDTI, implements the workloads according to the specification, and these implementations are then reviewed and certified by BDTI. As part of its analysis BDTI spends multiple engineer-months using the HLS tool, performing independent designs and reviewing the vendor-developed implementations in detail.
- The DSP processor workload implementation was developed by BDTI. A team of experienced DSP engineers spent multiple engineer-months designing, optimizing, and testing the implementation. The DSP implementation of the Optical Flow workload is very highly optimized.
- The traditional hand-coded RTL workload implementation was developed by Xilinx and BDTI. Based on guidance from experienced FPGA users in a variety of industries, a team of experienced RTL design engineers developed an implementation consistent with industry norms. The traditional RTL FPGA implementation is not meant to be extremely optimized but is optimized to a degree representative of industry practices.
Quality of Results Metrics for the BDTI Optical Flow Workload
There are two Operating Points associated with the BDTI Optical Flow Workload, each of which uses the same algorithm but is optimized for a different metric:
- Operating Point 1 is a fixed workload defined as processing video with 720p resolution (1280×720 progressive scan) at 60 frames per second. The objective for Operating Point 1 is to achieve the required throughput while minimizing resource utilization. Resource utilization refers to the percentage of total processing engine resources required to implement the workload.
- Operating Point 2 is defined as the maximum throughput capability of the workload implementation on the target device (measured in frames per second) for 720p resolution (1280×720 progressive scan). The objective for Operating Point 2 is to maximize the throughput (measured in frames per second) using all available device resources.
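As a rough illustration of what the two Operating Points mean in terms of raw data rate, the following sketch converts a frame rate at 720p into a pixel rate. The function name `pixels_per_second` is hypothetical and introduced only for this example; the resolution and the 60 frames per second figure come from the definitions above.

```python
# Frame geometry from the operating-point definitions (720p progressive scan).
WIDTH, HEIGHT = 1280, 720


def pixels_per_second(frames_per_second: float) -> float:
    """Pixel throughput implied by a given frame rate at 1280x720.

    Hypothetical helper for illustration; not part of the BDTI specification.
    """
    return WIDTH * HEIGHT * frames_per_second


if __name__ == "__main__":
    # Operating Point 1 fixes the frame rate at 60 frames per second.
    op1 = pixels_per_second(60)
    print(f"Operating Point 1: {op1 / 1e6:.3f} Mpixels/s")
```

For Operating Point 2 the same function can be applied to whatever maximum frame rate an implementation achieves, which makes the two operating points directly comparable as pixel rates.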
Platform | Chip Unit Cost (USD, Quantity 10,000) | Chip Resource Utilization (Lower is Better) |
---|---|---|
AutoESL AutoPilot plus Xilinx RTL tools targeting the Xilinx XC3SD3400A FPGA | $26.65 | 39% |
Texas Instruments software development tools targeting the TMS320DM6437 DSP processor | $21.25 | N/A (a minimum of 12 DSPs would be required to meet this operating point) |
Table 1: BDTI High-Level Synthesis Tool Certification Program Results, BDTI Optical Flow Workload Operating Point 1: Fixed Throughput (1280x720 Progressive Scan, 60 Frames per Second)
Platform | Chip Unit Cost (USD, Quantity 10,000) | Maximum Frames per Second (FPS) | Cost per FPS (Lower is Better) |
---|---|---|---|
AutoESL AutoPilot plus Xilinx RTL tools targeting the Xilinx XC3SD3400A FPGA | $26.65 | 183 | $0.14 |
Texas Instruments software development tools targeting the TMS320DM6437 DSP processor | $21.25 | 5.1 | $4.20 |
Table 2: BDTI High-Level Synthesis Tool Certification Program Results, BDTI Optical Flow Workload Operating Point 2: Maximum Throughput (1280x720 Progressive Scan)
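The derived figures in Tables 1 and 2 can be approximately reproduced from the published chip costs and frame rates. The sketch below is an illustration only: the helper names are hypothetical, and the chip count assumes that throughput scales perfectly across multiple DSPs, which the source does not state. Dividing the rounded table entries gives roughly $0.146 and $4.17 per FPS, close to but not identical with the published $0.14 and $4.20, which were presumably computed from unrounded measurements.

```python
import math

# Values copied from Tables 1 and 2.
FPGA_COST_USD = 26.65   # XC3SD3400A, quantity 10,000
DSP_COST_USD = 21.25    # TMS320DM6437, quantity 10,000
FPGA_MAX_FPS = 183.0
DSP_MAX_FPS = 5.1
TARGET_FPS = 60.0       # Operating Point 1


def cost_per_fps(chip_cost_usd: float, max_fps: float) -> float:
    """Chip cost divided by maximum frame rate (lower is better)."""
    return chip_cost_usd / max_fps


def chips_needed(target_fps: float, max_fps_per_chip: float) -> int:
    """Minimum chip count to reach target_fps.

    Assumption (not from the source): throughput scales linearly with
    the number of chips, with no partitioning overhead.
    """
    return math.ceil(target_fps / max_fps_per_chip)


if __name__ == "__main__":
    print(f"FPGA cost per FPS: ${cost_per_fps(FPGA_COST_USD, FPGA_MAX_FPS):.3f}")
    print(f"DSP cost per FPS:  ${cost_per_fps(DSP_COST_USD, DSP_MAX_FPS):.3f}")
    print(f"DSPs needed for 60 fps: {chips_needed(TARGET_FPS, DSP_MAX_FPS)}")
```

Under the linear-scaling assumption, the chip count agrees with the "minimum of 12 DSPs" noted in Table 1.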
Quality of Results Metrics for the BDTI DQPSK Receiver Workload
The BDTI DQPSK Receiver is a fixed workload with a single Operating Point defined as processing an input stream of complex modulated data at 18.75 Msamples/second with the receiver chain clocked at 75 MHz. The corresponding DQPSK demodulated output bitstream is 4.6875 Mbits/second. The objective for this workload is to minimize the resource utilization required to achieve the specified throughput.
Platform | Chip Resource Utilization (Lower is Better) |
---|---|
AutoESL AutoPilot plus Xilinx RTL tools targeting the Xilinx XC3SD3400A FPGA | 5.6% |
Hand-written RTL code using Xilinx RTL tools targeting the Xilinx XC3SD3400A FPGA | 5.9% |
Table 3: BDTI DQPSK Receiver Workload, Fixed Throughput (18.75 Msamples/second Input Data with 75 MHz Clock Speed)
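The rates quoted for the DQPSK Receiver operating point imply a few simple ratios, sketched below. The clock, sample rate, and output bit rate come from the workload description; the constant names are hypothetical. The samples-per-symbol figure additionally relies on the general property that DQPSK carries two bits per symbol and assumes the output bitstream contains no framing overhead, which the source does not state.

```python
# Rates from the DQPSK Receiver workload definition.
CLOCK_HZ = 75e6                 # receiver chain clock
INPUT_SAMPLE_RATE_HZ = 18.75e6  # complex input samples per second
OUTPUT_BIT_RATE_HZ = 4.6875e6   # demodulated output bits per second

# General property of DQPSK modulation (not taken from the source text).
BITS_PER_SYMBOL = 2

# Clock cycles available to process each input sample.
cycles_per_sample = CLOCK_HZ / INPUT_SAMPLE_RATE_HZ

# Input samples consumed for each demodulated output bit.
samples_per_bit = INPUT_SAMPLE_RATE_HZ / OUTPUT_BIT_RATE_HZ

# Implied oversampling, assuming no overhead in the output bitstream.
samples_per_symbol = samples_per_bit * BITS_PER_SYMBOL

if __name__ == "__main__":
    print(f"Clock cycles per input sample: {cycles_per_sample:g}")
    print(f"Input samples per output bit:  {samples_per_bit:g}")
    print(f"Input samples per symbol:      {samples_per_symbol:g}")
```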
Usability Metrics
The following tables show usability metrics results for AutoPilot from the BDTI High-Level Synthesis Tool Certification Program. The usability metrics assess ease of use and productivity aspects of the AutoESL AutoPilot high-level synthesis tool used in combination with the Xilinx RTL tools, in comparison with the DSP processor software development tools. These are assessed in a qualitative manner, and for each aspect one of the following scores is assigned by BDTI:
- Excellent
- Very Good
- Good
- Fair
- Poor
For the DSP processor, since a single TI-provided tool chain is used, a single score is provided for each usability metric. In contrast, since AutoPilot was used with the Xilinx RTL tools, three scores are provided:
- An overall score for the entire flow (AutoPilot plus Xilinx RTL tools)
- A score for only AutoPilot (the first score in parentheses)
- A score for only the Xilinx RTL tools (the second score in parentheses)
In its usability analysis, BDTI considers the overall design process for a complete project flow starting with a C language algorithm reference implementation and ending with a real-time implementation on a processing platform (either an FPGA or DSP processor). The scores provided are primarily based on BDTI’s experience in implementing the BDTI Optical Flow Workload on the DSP and FPGA platforms.
As part of BDTI’s evaluation of AutoPilot, a team of experienced DSP software engineers spent multiple engineer-months learning and using AutoPilot, designing an independent BDTI Optical Flow Workload implementation from the ground up, integrating the BDTI Optical Flow application into an FPGA (including connecting to external memory and video I/O), and testing the independent implementation. The BDTI engineers who worked with AutoPilot are experienced “hardware-aware” DSP software engineers (meaning that they understand hardware architecture concepts such as pipelining and latency), but they have limited knowledge of FPGA RTL design and no previous experience with HLS tools for FPGAs. In other words, the BDTI engineers who worked on the AutoPilot FPGA design had to learn to use AutoPilot by taking AutoESL-provided training and reading the documentation.
BDTI engaged an experienced FPGA engineer to assist with the aspects of the AutoPilot-based FPGA implementation involving the Xilinx FPGA tools and RTL design (e.g., using the RTL output generated by the HLS tool and integrating it onto the FPGA). Because an FPGA engineer experienced in both RTL design and Xilinx tools completed this effort, BDTI did not assess learning to use the Xilinx tools as part of the usability metrics. However, the Xilinx tools installation was performed by a DSP software engineer, so Out-of-the-Box Experience was assessed and given a score.
Similarly, because experienced DSP software engineers developed the DSP processor implementation, BDTI did not assess learning to use the DSP tool chain as part of the usability metrics.
Usability metrics related to the effort required to implement the application and modification of the reference code will vary by application (sometimes significantly). In particular, the BDTI Optical Flow Workload requires unusually extensive structural modifications for optimized performance on the DSP processor. The choice of application will have the most impact on the following metrics:
- Efficiency of Design Methodology - Design and Implementation (Final Optimized Version), and
- Extent of Modifications Required to the Reference Code
A description of the individual usability metrics follows.
Required Skills
In addition to assigning scores for the usability metrics listed below, BDTI also identifies the skills required to effectively work with each tool chain. No score is provided for this metric.
Out-of-the-Box Experience
This includes all activities from unpacking the box to getting everything set up and installed so that the user can start the design process. The assessment includes items such as clarity of documentation, smoothness of the installation process, the time required to perform the installation, and the helpfulness of tutorials or demo applications. Note: For the Xilinx tools the Out-of-the-Box Experience was assessed by DSP software engineers rather than an experienced FPGA designer.
Ease of Use
This is an assessment of how easy it is to use the features provided by the tool chain. It is not meant to identify missing features (this is addressed as part of the “completeness of capabilities” metric). The assessment includes items such as the intuitiveness and user-friendliness of the user interface, responsiveness (i.e., whether the tool chain was slow to complete actions), reliability (did the tools crash or hang?), and clarity of on-line help.
Completeness of Capabilities
This is an assessment of the extent to which the tool chain includes all capabilities necessary to enable a user to efficiently complete the implementation of the workloads.
Quality of Documentation and Support
Throughout the certification process BDTI assesses the documentation supplied with the tool chain. This includes the quality of the getting started guide, tutorials, and the ease of finding answers to specific questions (including technical support).
Efficiency of the Overall Design Methodology
This is primarily an assessment of user productivity. It includes assessing the extent to which a user of the high-level tool is abstracted from the underlying architecture (RTL for FPGA implementations and the DSP processor core and chip architecture for DSP processor implementations). Since this is a broad category, it is broken up into the following sub-categories:
- Learning to Use the Tool:
- For FPGA designs: Learning to efficiently use the high-level synthesis tool. As mentioned above, no score is provided for learning the Xilinx RTL tools because an experienced BDTI FPGA engineer worked with them.
- For DSP designs: As mentioned above, no score is provided for learning the DSP tools because experienced BDTI DSP engineers worked with them.
- First Compiling Version:
- The effort required to create an initial functional implementation of the application. For the FPGA this includes only the HLS tool—use of the RTL tools to support integration into the FPGA is not included. The first compiling version is not expected to be optimized in terms of performance or resource utilization, but rather an initial implementation based on which the optimization process can begin.
- Final Optimized Version:
- The effort required to take the application from the first compiling version to a final optimized implementation (not including completion of interfaces required to run on an actual chip). For the FPGA, this does not include final integration with the video and memory interfaces, but does include adding C language code required to interact with external memory. For the DSP processor, this includes testing via file I/O, but not integration with external video ports.
- Platform Infrastructure Development:
- This category includes integration of platform components that must be incorporated into a design so that it will run on a physical chip (i.e., a DSP or FPGA). This includes, but is not limited to:
- For FPGA implementations: Complete integration of the memory controller, external memory and video I/O.
- For DSP processor implementations: Installation and configuration of drivers and libraries, and interfacing to video I/O.
- Extent of Modification to the Original Reference Code
- This is an assessment of how closely the code used to generate the final optimized BDTI Optical Flow Workload implementation matches the original C language reference code provided by BDTI. This is not a simple count of the number of lines of code that have been changed, but rather reflects the complexity and effort involved in making the necessary changes. Typically changes are made for one of the following reasons:
- Structural changes: Changes to the overall data flow and code structure required to map the application to the underlying device architecture
- Changes required because of tool restrictions or limitations
- Timing and resource optimizations on individual blocks
- Interfacing to peripherals and other external modules
- BDTI considers factors such as: the number of changes, the extent to which the tools automate implementing these changes, the level of difficulty for the developer in incorporating the changes and the level of difficulty in debugging and testing the changes.
Platform | Required Skill Set |
---|---|
AutoESL AutoPilot High-Level Synthesis Tool |
|
Xilinx RTL tools |
|
TI DSP tools |
|
Table 4: BDTI High-Level Synthesis Tool Certification Program Results: Skills Required.
The table above summarizes the skills required to effectively use the AutoPilot high-level synthesis tool and Xilinx RTL tools for FPGA implementations, and the DSP tools for DSP processor implementation.
Platform | Out-of-Box Experience | Ease of Use | Completeness of Capabilities | Quality of Documentation and Support |
---|---|---|---|---|
Combined AutoESL AutoPilot plus Xilinx tools rating [1] (AutoESL AutoPilot rating / Xilinx rating) | Fair (Very Good / Poor) | Good (Very Good / Fair) | Good (Good / Good) | Good (Fair / Very Good) |
Texas Instruments software development tools rating [2] | Good | Very Good | Very Good | Very Good |
Table 5.1: BDTI High-Level Synthesis Tool Certification Program Results: Usability Metrics
[2] Texas Instruments software development tools targeting a TMS320DM6437 DSP processor
The above table provides the qualitative usability metric scores for the AutoESL AutoPilot high-level synthesis tool. Note that the AutoESL AutoPilot entries include an overall score (in bold) followed in parentheses by:
- The score for AutoESL AutoPilot only (the first score in parentheses)
- The score for the Xilinx RTL tools only (the second score in parentheses)
Since the DSP processor implementation used a single tool chain throughout the entire flow, a single score is provided.
Platform | Learning to Use the Tool | Design & Implementation (First Compiling Version) | Design & Implementation (Final Optimized Version) | Platform Infrastructure Development | Extent of Modifications Required to Reference Code |
---|---|---|---|---|---|
Combined AutoESL AutoPilot plus Xilinx tools rating [1] (AutoESL AutoPilot rating / Xilinx rating) | Very Good (Very Good / NA [2]) | Very Good (Very Good / NA) | Good (Good / Good) | Good (Good / Good) | Good (Good / NA) |
Texas Instruments software development tools rating [3] | NA | Excellent | Good | Good | Fair |
Table 5.2: High-Level Synthesis Tool Certification Program Results: Usability Metrics
[2] NA = Not Applicable
[3] Texas Instruments software development tools targeting a TMS320DM6437 DSP processor
The above table provides the qualitative productivity metric scores for the AutoESL AutoPilot high-level synthesis tool. Note that the AutoESL AutoPilot entries include an overall score (in bold) followed in parentheses by:
- The score for AutoESL AutoPilot only (the first score in parentheses)
- The score for the Xilinx RTL tools only (the second score in parentheses)
Since the DSP processor implementation used a single tool chain throughout the entire flow, a single score is provided.
Target Platforms
The Certification Program uses the Xilinx XC3SD3400A (Spartan-3A DSP 3400) FPGA combined with Xilinx ISE and EDK tools version 10.1.3 and the Xilinx XtremeDSP Video Starter Kit. The AutoESL AutoPilot high-level synthesis tool version 10.ft.d_Linux was used in conjunction with the Xilinx ISE and EDK tools to target this FPGA platform.
Spartan-3A DSPs are based on Xilinx’s low-cost Spartan-3A family, but have a number of enhancements to accelerate digital signal processing. Spartan-3A DSP chips have double the block RAM (“BRAM”) memory of other Spartan devices and incorporate hard-wired DSP data paths, called “DSP48A slices.” Each DSP48A slice contains an 18×18 multiplier with pre-adders and an accumulator, among other features. The XC3SD3400A includes 126 DSP48A slices that can be clocked at up to 250 MHz, and roughly 54,000 logic cells.
The target DSP platform was a Texas Instruments DM6437 DSP processor combined with TI Code Composer Studio tools suite version 3.3.83.13, Code Generation Tool version 6.1.9 and the TI Digital Video Evaluation Module. The Texas Instruments DM6437 includes a 594 MHz TMS320C64x+ DSP core along with a video processing subsystem (i.e., video hardware accelerators). The hardware accelerators would not be of benefit in implementing the BDTI Optical Flow Workload, and thus were not used by BDTI.
For More Information
For more information on the BDTI High-Level Synthesis Tool Certification program or to have your HLS tool certified, please call BDTI at +1 925 954 1411 or contact us via the web.