Digital signal processing; Blackfin DSP; uC Linux; code performance
Embedded applications using digital signal processors are becoming increasingly popular. One of the important criteria when deciding on the hardware platform for a given design is the price and availability of development tools. An accessible platform for implementing signal processing tasks is the Blackfin family of signal processors. The manufacturer of these processors (Analog Devices, Inc.) offers a complete integrated development environment for software development. The high price of this environment, however, can be a limiting factor in budget-sensitive applications. This article analyses the possibilities of an alternative, GNU-based tool for developing applications for the Blackfin processor, with emphasis on a comparison of the commercial development tool and its GNU alternative. The work examines, from the point of view of optimization and performance of the resulting code, the suitability of the GNU tools for image processing applications.
digital image processing; Blackfin DSP; uC Linux
Designers of embedded applications face the problem of selecting an appropriate microprocessor platform for their design. In the field of embedded signal and image processing, the Blackfin® family of digital signal processors appears advantageous considering its performance, RISC-like (Reduced Instruction Set Computing) architecture and integrated peripherals. The factors limiting an implementation are usually not only performance and unit price, but also the cost of the development tools for the given architecture. Analog Devices, Inc., the manufacturer of the Blackfin® processor family, offers an optimized integrated development environment. For price-sensitive implementations (prototype development, special customer or private solutions), where the initial or projected costs must be minimized, an alternative to the standard commercial development tools has to be used.
One possible alternative to commercial software is offered by the so-called GNU projects. GNU is a recursive acronym for "GNU's Not Unix"; the gnu (wildebeest), a grazing animal that lives in herds on open grassland, is the mascot of the GNU project. It follows from the roots of GNU that its projects and software products are issued under the GPL (General Public License) and that their use is free of charge. Consequently, the set of GNU tools for Blackfin® processors (the so-called GNU Blackfin® tool-chain) is also free. In most cases the term GNU is associated with the Linux operating system. This does not restrict users to compiling code that will run under an operating system; the tools can also be used to compile stand-alone code.
The following text presents the possible limitations, restrictions and performance test results for the compilation of stand-alone code (code running without an operating system). The conclusions presented in this paper should help designers decide between the native development tools (VisualDSP++) and the alternative GNU tools when developing a given application.
The Blackfin® GNU tool-chain is a set of software tools (running primarily under the Linux operating system) that allows users to compile and link code for the Blackfin® processor. Although most applications compiled with these tools are intended to run under an operating system, the tools can also compile software that runs without an operating system as well as software that is itself an operating system or boot loader (e.g. the Linux kernel, U-Boot). This last use of the GNU compiler is of particular interest to developers of applications that do not need an operating system. In the following text it is assumed that the target application is stand-alone and runs without an operating system.
The Blackfin® GNU tool-chain consists of three groups of software tools:
Each of the above mentioned tool groups involves following programs:
Let us assume that our project involves more than one source file and that the target architecture is the Blackfin® BF-532 processor with no operating system. The source files are written in the C language (with regard to the given processor architecture). Because the code is usually compiled and built many times during development, the best way to save time is to prepare a so-called "makefile". This file serves as the input script for the "make" utility. Make is a powerful utility that generates a series of commands to be executed by the shell. More about make and makefiles can be found in [1]. Example content of a makefile is written below.
The syntax of the previous listing follows the makefile script standards [1]. For the final stand-alone application it is necessary to use the bfin-elf tool set. The CFLAG definition consists of a set of compiler parameters, mostly code optimization settings (in this case optimization for maximum speed). The LFLAG definition consists of parameters for the linker; the file bftiny.x contains the linker script for the BF532 tiny memory model. The final product of the linker, a binary file (with the extension *.bin), must be converted into a loader file (*.ldr) that can run directly on the hardware (without an operating system). The LDR utility can be found as a part of the U-Boot project [2]. There is one more source file in the final code compilation: startup.asm. This assembler file contains the settings of the basic registers, the clock frequency settings, the interrupt settings and other required startup code (depending on the target application). Finally, startup.asm also includes the jump to the starting point of the C code, the main() function. According to the rules written in the linker script, startup.o is always linked as the first piece of code, placed at the initial program counter address (L1 program memory).
Use of the standard library
When the project is compiled to run under the Linux operating system, it is assumed that all standard libraries will be linked dynamically. When the code is compiled as a stand-alone application with no operating system, the libraries must be linked statically, which affects the final compiled code size. If the code size is critical and the use of the standard library cannot be avoided, it is advisable to compile a custom build of this library from its source code [3]. This allows the user to omit the parts that are not used in the given project and to keep the final code size under control.
To compare the commercial compiler VisualDSP++ (version 5.0, Analog Devices, Inc.) with the GNU Blackfin® tool-chain, a set of code performance tests was carried out. During these tests both compilers had equivalent conditions: the same source code and the same target hardware. A simple embedded design (evaluation board) with the BF-532-SBST400 processor was used as the target hardware.
Simple evaluation board for Blackfin®
A very simple evaluation board with a Blackfin® processor was used to compare the performance of the native and the alternative compiler. Figure 1 shows a block diagram of the basic Blackfin® processor connection with only the external peripherals necessary for its function. The processor's capabilities are extended with external SRAM and a USB communication interface. The application code is stored in an external SPI EEPROM and booted into the processor after reset. The core is a Blackfin® BF532-SBST400 processor running at 400 MHz.
Fig. 1 Simple Blackfin evaluation board block diagram
Fig. 2 Developed evaluation board
Benchmark source code
The authors of this contribution work on embedded image processing applications and therefore used several basic image processing algorithms to test the efficiency of the final compiled code.
The first tested algorithm is a simple pattern generator that fills a 2D array of 664x504 bytes with a sequence of numbers from 0 to 255. If this array is displayed as an image, it shows a set of vertical strips in 256 shades of grey. The output data of this algorithm are used as the input (image data) for all the following test algorithms.
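The authors' benchmark source is not reproduced in the paper, so the following is only a minimal sketch of such a pattern generator. It assumes that the grey value depends only on the column index (which is what produces vertical strips) and that the image is stored row by row in a flat buffer; the function name `generate_pattern` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill a row-major 8-bit image with a horizontal 0..255 ramp.
 * Assumption: the value depends only on the column index, so the
 * ramp restarts every 256 columns and every row is identical,
 * which shows up as vertical strips. */
void generate_pattern(uint8_t *img, int width, int height)
{
    for (int y = 0; y < height; y++) {
        uint8_t *row = img + (size_t)y * width;
        for (int x = 0; x < width; x++) {
            row[x] = (uint8_t)(x & 0xFF);
        }
    }
}
```

For the array size given in the text, the call would be `generate_pattern(buffer, 664, 504)` on a buffer of 664 * 504 bytes.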
The second algorithm is a simple image histogram calculation. For every pixel it increments one cell of a 1D array whose 256 cells represent the 256 bins of the histogram, one bin per possible pixel value.
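A histogram calculation of this kind can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the bins are 32-bit counters (a byte-wide counter would overflow on a 664x504 image), one bin is allocated for each of the 256 possible pixel values, and the names `compute_histogram` and `HIST_BINS` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HIST_BINS 256  /* assumption: one bin per 8-bit pixel value */

/* Count how many pixels take each grey value.
 * img      : pixel data, n_pixels bytes
 * hist     : output array of HIST_BINS counters, cleared here */
void compute_histogram(const uint8_t *img, size_t n_pixels,
                       uint32_t hist[HIST_BINS])
{
    memset(hist, 0, HIST_BINS * sizeof hist[0]);
    for (size_t i = 0; i < n_pixels; i++) {
        hist[img[i]]++;  /* the pixel value selects the bin */
    }
}
```

The inner loop is a single data-dependent increment per pixel, so its speed is dominated by memory access rather than arithmetic.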
The third chosen algorithm is simple image thresholding. Each value is compared with a given threshold level: if the value is below the threshold, the new value is 0; if it is above the threshold, the new value is 255.
The fourth tested algorithm is a simple centroid (center of gravity) calculation [5], in which the pixel coordinates are weighted by I(x,y), the value of the input array at position x, y (pixel brightness).
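Both operations can be sketched in a few lines of C. The sketch below is not the authors' implementation. It assumes that a value equal to the threshold maps to 255 (the text does not specify this case) and it uses the standard intensity-weighted centroid, x_c = Σ x·I(x,y) / Σ I(x,y) and y_c = Σ y·I(x,y) / Σ I(x,y), with both sums taken over all pixels. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* In-place thresholding: values below `level` become 0, all others 255.
 * Assumption: a value equal to the threshold is treated as "above". */
void threshold_image(uint8_t *img, size_t n_pixels, uint8_t level)
{
    for (size_t i = 0; i < n_pixels; i++) {
        img[i] = (img[i] < level) ? 0 : 255;
    }
}

/* Intensity-weighted centroid of a row-major 8-bit image.
 * Returns 0 on success, -1 if all pixels are zero (centroid undefined). */
int compute_centroid(const uint8_t *img, int width, int height,
                     double *xc, double *yc)
{
    uint64_t sum = 0, sum_x = 0, sum_y = 0;

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            uint64_t v = img[(size_t)y * width + x];
            sum   += v;
            sum_x += (uint64_t)x * v;
            sum_y += (uint64_t)y * v;
        }
    }
    if (sum == 0) {
        return -1;
    }
    *xc = (double)sum_x / (double)sum;
    *yc = (double)sum_y / (double)sum;
    return 0;
}
```

The sums are accumulated in integers and divided only once at the end, which avoids floating-point work in the inner loop; whether the original benchmark did the same is not stated in the paper.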
The last algorithm used for the compiler efficiency comparison was an implementation of an edge detector [4] (enhanced 2D linear filtration, i.e. 2D convolution) that detects rising and falling edges oriented horizontally and vertically, based on the basic 2D convolution of the image with a filter kernel.
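The paper does not give the kernel coefficients, so the sketch below only illustrates the technique. It assumes 3x3 Sobel kernels for the two orientations, the discrete convolution g(x,y) = Σ_s Σ_t w(s,t)·I(x-s, y-t) with s and t running from -1 to 1, an output equal to |g_x| + |g_y| clipped to 255 (so that rising and falling edges both respond), and border pixels set to 0. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* 3x3 convolution at an interior pixel (x, y) of a row-major image.
 * k[t + 1][s + 1] holds the kernel weight w(s, t). */
static int convolve3x3(const uint8_t *img, int width, int x, int y,
                       const int k[3][3])
{
    int acc = 0;
    for (int t = -1; t <= 1; t++) {
        for (int s = -1; s <= 1; s++) {
            acc += k[t + 1][s + 1] * img[(size_t)(y - t) * width + (x - s)];
        }
    }
    return acc;
}

/* Edge strength from horizontal and vertical gradients.
 * Assumption: Sobel kernels; the paper's kernels are not specified. */
void detect_edges(const uint8_t *in, uint8_t *out, int width, int height)
{
    static const int kx[3][3] = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };
    static const int ky[3][3] = { {-1, -2, -1}, {0, 0, 0}, {1, 2, 1} };

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            size_t idx = (size_t)y * width + x;
            if (x == 0 || y == 0 || x == width - 1 || y == height - 1) {
                out[idx] = 0;  /* no full neighbourhood at the border */
                continue;
            }
            int gx = convolve3x3(in, width, x, y, kx);
            int gy = convolve3x3(in, width, x, y, ky);
            int mag = abs(gx) + abs(gy);  /* responds to both edge polarities */
            out[idx] = (uint8_t)(mag > 255 ? 255 : mag);
        }
    }
}
```

The input and output buffers must not overlap, because each output pixel depends on the unmodified neighbours of the corresponding input pixel.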
The same code was compiled using the makefile described in section 2 (GNU tool-chain) and using the native VisualDSP++ development tool. Both builds ran on the same hardware under the same conditions. The processor was operated at a core frequency of 400 MHz with the external data memory running at 100 MHz. The results obtained (execution times of the described algorithms) are listed below.
Fig. 3 Benchmark results in non OS environment
The GNU tool-chain is mainly used to compile code that runs under the Linux operating system, so it is also of interest to test the performance under these conditions.
Target hardware and operating system
A Blackfin® video processing evaluation board was used as the target hardware for the benchmarks under an operating system (Figure 4). The hardware is based on the ADSP-BF532 DSP. The main board is equipped with 32Mb SDRAM, a USB 2.0 controller, a 10/100 Mbit Ethernet controller and an SD/MMC expansion slot. The core clock is 400 MHz and the system clock 133 MHz. The operating system used was uCLinux version 2.6.19-ADI-2007R1.1. The code compiled using VDSP++ was tested on the same board and configuration. The benchmark results are presented below.
Fig. 4 uCLinux video processing evaluation board block diagram
Benchmark results
Fig. 5 Image processing code running under OS uCLinux vs. VDSP++ code execution times
According to the benchmark results (Figure 5), the code compiled using the GNU tool-chain and running under uCLinux seems to be faster than the code compiled with VisualDSP++. A possible interpretation is that uCLinux uses the internal cache memory by default, so access to external memory is more efficient and consequently faster. The code compiled under VisualDSP++ was not written with cache support, and the delays caused by its external memory accesses are much longer than the delays caused by operating system services running in the multi-threaded uCLinux.
The main goal of this work was to find an alternative (free) solution for compiling code for Blackfin® processors. One possible alternative is the GNU tool-chain, which is issued under the GPL license and is free. The main question, whether GNU-compiled code needs an operating system to run, was answered, and a way to use the tool-chain without the uCLinux operating system was presented.
The second important parameter in signal processing applications is the code execution time. A comparison of the final code efficiency of the commercial product (the native compiler VisualDSP++, Analog Devices, Inc.) and the suggested GNU tool-chain was presented in section 3 (Figure 3). The general conclusion of these performance tests is that, for the chosen algorithms, the GNU tool-chain produces less efficient code than VisualDSP++ when running in a non-OS environment. One very important factor affecting this result is the absence of support for hardware loops in the GNU tool-chain.
In general, the GPL alternative tools (GNU tool-chain) can be used to develop stand-alone Blackfin® applications that do not run under any operating system. If the application is time critical, it is better to use the native development tools. The GNU tools are optimized for execution under an operating system, so it is better to first implement an appropriate version of uCLinux for the developed application. An operating system in an embedded application extends its capabilities (the price paid in higher hardware complexity is reasonable) and the GNU tools work more effectively. For example, the benchmark algorithms presented in section 3 were compiled for execution under uCLinux and run on simple Blackfin® embedded hardware (section 4). The operating conditions were almost the same (processor model, core frequency). The execution times of several algorithms were fully comparable with the results obtained for code compiled using VDSP++ (Figure 5). This can be explained by the use of the processor's internal cache: the operating system and all applications running under it benefit from the cache automatically, whereas the cache was not used in the presented tests of the VDSP++ code.
Based on all the results acquired, the benchmarked GNU tools can in general be used for the development of signal processing applications not only in OS environments, but also in non-OS environments (with lower performance of the compiled code).
The research described was supported by research program No. MSM6840770015 “Research Methods and Systems for Measurement of Physical Quantities and Measured Data Processing” of the CTU in Prague, sponsored by the Ministry of Education, Youth and Sports of the Czech Republic, and by Czech Agency Grant GA 102/09/H082. Development tools donated by Analog Devices, Inc. were used in this work.
[1] main [Blackfin Linux Docs], [Internet] 28.3.2009, http://docs.blackfin.uclinux.org/
[2] Das U-Boot for the Blackfin Processor, [Internet] 28.3.2009, http://blackfin.uclinux.org/gf/project/u-boot
[3] uClibc, [Internet] 28.3.2009, http://www.uclibc.org/
[4] R. C. Gonzalez, R. E. Woods, “Digital Image Processing”, 3rd ed., Prentice Hall, 2007
[5] M. Egmont-Petersen, J. H. C. Reiber, “Accurate object localization in gray level images using the center of gravity measure: Accuracy versus precision”, IEEE Transactions on Image Processing, 2002, Vol. 11, Issue 12, pp. 1379-1384, doi: 10.1109/TIP.2002.806250