Intelligent Systems Lab Project: A framework for FPGA accelerated audio/video processing in intelligent systems

Participants

Supervisors

Motivation

Application Scenario

Many applications require high-performance image processing on a very limited energy budget.
One example is mobile mini robots such as the AMiRo. Object detection as the basis for navigation and exploration requires real-time vision processing with optical flow detection.

Objectives

The project goals are

Description

The rapid prototyping platform RAPTOR-X64 (developed in the Cognitronics and Sensor Systems Group) enables the prototypical implementation of complex microelectronic circuits based on FPGAs.
In our demonstrator the DB-V4 extension module with a Virtex-4 FPGA is used. A video stream is provided to the RAPTOR system by a webcam or, optionally, by a local video file via the host computer. A Qt-based application running on the host computer displays the original video data and streams this data to the RAPTOR board for hardware-accelerated image processing. Once the FPGA has finished processing, the manipulated image is transferred back to the application and displayed beside the original image.

System construction

Our application lets the user select a filter to be applied to the video stream. Filters can be applied to the source image as well as the target image, which makes it possible to compare the hardware image filter with a software-based version. The source image can also be scaled to any resolution without frame drops.
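The comparison between a hardware filter and its software counterpart can be illustrated with a small sketch. It is not taken from the project's Qt application. The function name `count_mismatches`, the pixel representation (a flat list of RGB tuples), and the `tolerance` parameter are all assumptions made for illustration.

```python
def count_mismatches(reference, candidate, tolerance=0):
    """Count pixels whose channels differ by more than `tolerance`.

    Both images are assumed to be flat lists of (r, g, b) tuples.
    """
    if len(reference) != len(candidate):
        raise ValueError("images must contain the same number of pixels")
    mismatches = 0
    for ref_px, cand_px in zip(reference, candidate):
        # A pixel counts as a mismatch if any channel is off by more than tolerance.
        if any(abs(a - b) > tolerance for a, b in zip(ref_px, cand_px)):
            mismatches += 1
    return mismatches


# Hypothetical example data: the second pixel differs by one in the red channel.
software_result = [(10, 0, 0), (200, 0, 0)]
hardware_result = [(10, 0, 0), (201, 0, 0)]
```

A small tolerance can be useful when the two implementations round differently, for example with fixed-point arithmetic on the FPGA.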

Graphical user interface

Results


The video shows a short demonstration of the system at work.
Note that the frame rate shown in the application is reduced by the recording.

Discussion and Conclusion

Outlook

In the next term, real hardware filters will be implemented, as well as an integration for code generated through high-level synthesis.
Furthermore, in later projects the system can be implemented on a mini robot, for example the AMiRo,
or extended to work with faster RAPTOR systems, for example the RAPTOR-XPress, in which the data-transfer bottleneck is reduced.

Term 2

Having designed the system architecture and the GUI and implemented some image filters during the first term,
we implemented an AXI Stream interface in the second term to communicate with image filters generated through high-level synthesis.
Additionally, four different image filters were implemented using high-level synthesis (HLS), and their performance was compared to the filters implemented in the previous term.

Complete system architecture

In our project we use the DB-V4 module of the RAPTOR system, which contains a Xilinx Virtex-4 FPGA and a DDR2-SDRAM with 2(4) GBytes that temporarily stores the images.
The main part of the system on the FPGA was built with the Embedded Development Kit (EDK).
A Local-Bus connects the system with the host computer and is used to transfer data between them. On the FPGA a Processor Local Bus (PLB) connects all components, so a bridge translates between the Local-Bus and the PLB. A Multi Port Memory Controller connects to the DDR, making it possible to write data to it and read data from it. After the data reaches the DDR, it is sent to the AXI Stream interface via LocalLink and the AXIBridge. There the image data is processed by the HLS image filter, sent back to the DDR, and then returned to the host computer. The system is shown in detail below.
System architecture based on the EDK System.

Image filter Algorithms

The four image filters we implemented during this term were a red filter, an inversion filter, a grayscale filter, and a sepia filter.
With the red filter, only the red color channel remains in the result image; all other channels are set to zero.
With the inversion filter, each color channel in the result image is the inverse of the corresponding input channel.
The grayscale filter creates a grayscale version of the input image by calculating the mean value of all color channels.
The sepia filter calculates every output color channel as a weighted sum of all input color channels.
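The per-pixel behavior of the four filters can be sketched in a few lines of Python. This is a software illustration only, not the HLS source used in the project. It assumes 8-bit RGB pixels, and the sepia weights are the commonly used coefficients, chosen here as an assumption because the project page does not state the weights that were actually used.

```python
def red_filter(pixel):
    """Keep only the red channel; green and blue are set to zero."""
    r, _, _ = pixel
    return (r, 0, 0)


def invert_filter(pixel):
    """Invert every 8-bit channel."""
    r, g, b = pixel
    return (255 - r, 255 - g, 255 - b)


def grayscale_filter(pixel):
    """Replace every channel with the (integer) mean of all channels."""
    r, g, b = pixel
    mean = (r + g + b) // 3
    return (mean, mean, mean)


# Assumed sepia weights (widely used values, not taken from the project).
SEPIA_WEIGHTS = (
    (0.393, 0.769, 0.189),  # output red
    (0.349, 0.686, 0.168),  # output green
    (0.272, 0.534, 0.131),  # output blue
)


def sepia_filter(pixel):
    """Each output channel is a weighted sum of all inputs, clamped to 255."""
    r, g, b = pixel
    return tuple(
        min(255, int(wr * r + wg * g + wb * b))
        for wr, wg, wb in SEPIA_WEIGHTS
    )


def apply_filter(image, pixel_filter):
    """Apply a per-pixel filter to an image given as a list of RGB tuples."""
    return [pixel_filter(px) for px in image]
```

All four filters operate on one pixel at a time without needing neighboring pixels, which is why they map naturally onto a streaming interface.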

Results

The frame rates without any active filters have not changed since the first term, but compared to the image filters running on the embedded microprocessor,
the HLS image filters provide a significant performance improvement:
PPC (embedded microprocessor): ~0.2 fps
HLS (high-level synthesis): ~5 fps

Because of the PCI interface used by the RAPTOR system, the achievable frame rate is limited to 5 to 6 fps, which means the HLS filters already reach the current fps limit.

For a 64 x 64 pixel image the calculation time can be approximated by:

64 x 64 pixels = 4096 pixels = 16 KByte

Writing to the DDR     : 294 us
HLS filter calculation :  73 us
Reading from the DDR   : 524 us
----------------------------------
Sum                    : 891 us

Total time (start of write to end of read back): 972 us
Overhead               : 972 us - 891 us = 81 us

--> approx. 1 ms per image with a size of 16 KByte
--> processing rate of approx. 16 MByte/s

An image with a resolution of 1280 x 720 pixels (921600 pixels) has a size of about 3 MByte
--> a maximum of about 5 fps is possible
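The estimate above can be reproduced with a short calculation. The timing values are the measured numbers from this page; the function names and the `bytes_per_pixel` parameter are assumptions for illustration. Note that 16 KByte for 4096 pixels corresponds to 4 bytes per pixel, while "about 3 MByte" for 921600 pixels is closer to 3 bytes per pixel, so the sketch leaves the pixel size configurable rather than guessing which one applies.

```python
# Measured times from the 64 x 64 pixel test, in microseconds.
WRITE_US = 294
FILTER_US = 73
READ_US = 524
TOTAL_US = 972

stage_sum_us = WRITE_US + FILTER_US + READ_US
overhead_us = TOTAL_US - stage_sum_us


def image_bytes(width, height, bytes_per_pixel=4):
    """Size of an uncompressed image in bytes."""
    return width * height * bytes_per_pixel


def throughput_bytes_per_s(n_bytes, time_us):
    """Bytes processed per second, given a processing time in microseconds."""
    return n_bytes / (time_us * 1e-6)


def max_fps(throughput, width, height, bytes_per_pixel=4):
    """Upper bound on the frame rate for a given data throughput."""
    return throughput / image_bytes(width, height, bytes_per_pixel)


# Throughput derived from the measured round trip of one 64 x 64 image.
rate = throughput_bytes_per_s(image_bytes(64, 64), TOTAL_US)

print(f"stage sum: {stage_sum_us} us, overhead: {overhead_us} us")
print(f"throughput: {rate / 1e6:.1f} MByte/s")
print(f"1280 x 720 at 4 bytes/pixel: {max_fps(rate, 1280, 720, 4):.1f} fps")
print(f"1280 x 720 at 3 bytes/pixel: {max_fps(rate, 1280, 720, 3):.1f} fps")
```

Under either assumption the bound lands between roughly 4.5 and 6 fps, consistent with the observed limit of about 5 fps.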


The video shows a short demonstration of the improved system at work.

Summary of the project time

In the first term we built a basic system architecture based on the EDK system. Additionally, a graphical user interface was created to control the system and to observe its response. The implemented image filters were a further software component.
On the hardware side, the main task was extending the system with an AXI Stream interface. We generated image filters using high-level synthesis (HLS). Finally, the generated VHDL code was evaluated against handwritten code.

Outlook

In the future, the performance of the system could be improved, e.g. by using a different RAPTOR system with a faster interface.
Additionally, more complex filters such as Sobel or motion detection filters could be supported.