An incredibly large number of
everyday products employ DSP
technology. The seemingly insatiable demand for digital wireless handsets leads the list, followed by networking
infrastructures, consumer electronics,
voice-over-IP products, industrial and
automotive control systems, and hard
disk controllers. Market pressures are
forcing makers of these products to
reduce cost, size, and power consumption, while continuing to add value and
differentiate their products.
Many of today's high-end multiprocessor
systems are built with backplane-based
architectures, such as VMEbus. Future
embedded multiprocessing architectures
will also employ hybrid system-on-chip
(SOC) architectures, which will incorporate both DSP and MPU cores. How
will these complex hybrid systems be
debugged?
Many of these products include a RISC
or CISC processor, in addition to the
DSP. According to the 1999 Embedded
Systems Study conducted by Beacon
Technology Partners [1], the average number of different processor architectures per
embedded system has been rising for several years, and is now between two and
three. (See Figure 1).
Hardware integration and
software complexity
The design of today's embedded systems
is driven by stringent time-to-market
requirements. However, increased silicon
integration has the undesirable effect of
working against faster time-to-market by
complicating the software development
and debug process.
An even more difficult problem is on the horizon. According to the sample taken in the same Beacon Technology Partners study, the percentage of high-end multiple-processor systems is rapidly increasing: over 30% of respondents use three or more processors per embedded design, and over 10% of the embedded systems designed in 1999 employed eight or more processor chips! (See Figure 2)
Figure 1: The number of different processor architectures per embedded system is consolidating around 2-3 different embedded processor architectures used per design
Multiprocessing increases the complexity
of hardware and software design and
debugging. When moving from a single
processor architecture to a multiple
processor architecture, a whole new set of
design factors must be taken into account
in order to make the new system as efficient as possible.
There are significant benefits to using DSP chips in combination with CISC and RISC chips. However, board designers face difficult problems during the debug phase of the project. DSP and CISC/RISC technologies have evolved independently, and they are supported by different sets of development and debugging tools.
The SHARC DSP processor
In order to explore the problems encountered when designing multiprocessing
architectures, we will use the development
of a SHARC DSP multiprocessor system
as an example. However, many of the
same issues face developers designing
SOC-based systems.
DSP Engineering / Winter 2000 / I I
Figure 2: Number of processors per embedded design
The SHARC DSP chip has many features that are specifically designed to support multiprocessing. For example, it can support:
point-to-point architectures
shared bus architectures
Point-to-point multiprocessing architectures
In point-to-point multiprocessing architectures, a dedicated communication channel is provided between each pair of processors, using the SHARC DSP's link ports. In most multiple-processor systems each processing node will need to have multiple point-to-point connections, and therefore multiple communication ports.
Shared bus multiprocessing architectures
A shared bus multiprocessing architecture uses the SHARC DSP's external port to provide a connection to a single shared bus that interconnects all of the processors. The external port provides a simple glueless connection between up to six SHARC DSPs and a host processor.
The SHARC also has an internal bank of shared memory that can be accessed through this external port. When several SHARCs are interconnected on a shared bus, the resulting unified address space allows direct interprocessor accesses between each other's internal shared memory, as well as access to any external shared memory that resides on the shared bus.
The complexity of these systems provides new challenges to software engineers. The tools provided with the SHARC DSP do support multiprocessor code development, including a linker that supports the creation of executables for multiple processors, and for shared memory. However, the debugging of SHARC-based systems is often difficult.
Debugging often requires multiple tool sets
Even more difficult is the debugging of heterogeneous systems that also include a RISC or CISC processor. Software and hardware engineers are forced to use separate (and incompatible) debuggers and tools, provided for each of the processor architectures. In spite of this, they must find some way to:
debug the interaction between the processors
meet the real-time performance requirements
Higher levels of integration have driven a trend toward systems that integrate:
multiple processors
large amounts of memory
complex peripheral communication links
System-on-chip debugging is especially difficult due to:
limited visibility into the flow of the program
limited control over internal operations
Internal visibility is vital
Engineers need to have visibility into the internals of the processors, including the contents of control registers, memory locations, and peripheral registers. However, with the very high functional densities of today's silicon devices, there are only a few pins available to support test and debug. This means that system debugging must be done with a scan-based emulator. The SHARC DSP (like many other processor chips) uses the IEEE 1149.1 (JTAG) standard to scan information out of (or into) the processor chip, without adding significantly to the pin count of the silicon device.
The application code is often written in mixed languages
To further complicate the debug process, DSP software is often written using both assembly code and C or C++ code. The most time-critical functions are written in assembly language for maximum efficiency, while the remaining portion of the application software might be written in C or C++.
Version control is vital
Due to the complexity of multiple processor systems, the project team typically includes several people with complementary technical backgrounds, including:
software engineers
silicon engineers
application experts
Frequently the team members collaborate from different locations, and even different time zones. With these concurrently running work locations, formal hardware and software version control is essential to efficient development.
The target hardware is often unavailable
Due to stringent time-to-market requirements, software development often begins prior to the availability of target hardware, regardless of whether that hardware employs discrete processor chips or SOC technology. Concurrent software and hardware debugging can be an incredible iterative challenge. Everything should be done to test each software component as it is developed, in order to minimize uncertainties with respect to proper operation and adequate performance.
A debugger that has been integrated with an instruction set simulator allows testing and performance benchmarking of the software in the absence of the target hardware. Clearly, an instruction set simulator cannot test the system at full speed, or in real-time. However, it can provide useful information about performance and about the correctness of the program's algorithm. When the target hardware eventually becomes available, engineers can begin integrating the hardware and software components, and testing the complete system in real-time.
Software bugs are often
difficult to localize
In multiprocessor systems, it's often difficult to track down the source of software
bugs, since the processors interact with
each other. For example, if one processor
sends an erroneous message to another, it
might precipitate an error that is exhibited
in another processor at a later time.
In order to use DSP code generation tools along with RISC/CISC code generation tools, software development environments should:
be open and able to seamlessly use tools from various suppliers.
reduce the learning required, and eliminate costly errors by providing
If a host-resident IDE is equipped to
send commands to target-resident debug
monitors (and if it can handle the monitor's particular communication protocols)
a software engineer can use the same code
generation and debugging tools as those
used for scan-based debugging.
If the target has a multitasking operating
system, the debug monitor should run as
an independent task, executing concurrently with the application software that is
being debugged. Because it runs concurrently with the application software, it can
process commands from the host-resident
debugger without halting the processor, or
the application program.
When the host tells the target-resident debug monitor that it wants a breakpoint set at a particular point in the application program, the debug monitor copies and retains the machine language instruction stored at the corresponding memory location. Then it overwrites that memory location with an illegal instruction. When the illegal instruction is fetched, the application program execution is temporarily halted, and an exception handling routine is invoked.
When the host tells the debug monitor to resume execution of the application program, the monitor replaces the illegal instruction with the original instruction, and then resumes execution at the restored instruction.
Scan-based emulation using a TAP can be extremely helpful. It allows the host-based debugger to discover (and control) the internal state of the processor core. Should the target system crash, the host system can still collect data from the target system, to perform a post-mortem analysis. It can then restart the target.
The primary disadvantage of the typical JTAG on-chip debug implementation is its intrusiveness on the real-time execution of the application software, since it stops the processor each and every time it needs to access the processor's internal state.
Debugging systems that don't have on-chip debug hardware
Some target processors do not have a TAP. Other processors do have a TAP, but it's only designed to verify the operation of internal hardware during the chip's manufacture, and does not support software debugging. In either case, host-to-target access (using scan-based emulation) is not an option.
However, target access and control can still be achieved by using a target-resident program called a debug monitor, which communicates with the host computer and allows the host to control the execution of downloaded application software. Debug monitors have been widely used for many years, and a properly designed monitor consumes only a small percentage of the target processing power and memory.
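As a rough illustration of the breakpoint mechanism that a debug monitor uses (retain the original instruction, overwrite it with an illegal instruction, and restore it on resume), here is a minimal Python sketch. It is a toy model, not any vendor's monitor: the `DebugMonitor` class, the `ILLEGAL_OPCODE` value, and the list-based program memory are all assumptions made for the example.

```python
ILLEGAL_OPCODE = 0xFFFFFFFF  # hypothetical encoding that the toy CPU treats as illegal


class DebugMonitor:
    """Toy model of a target-resident monitor that plants software breakpoints."""

    def __init__(self, memory):
        self.memory = memory      # program memory: a list of instruction words
        self.saved = {}           # breakpoint address -> original instruction

    def set_breakpoint(self, addr):
        # Copy and retain the original instruction, then overwrite it.
        if addr not in self.saved:
            self.saved[addr] = self.memory[addr]
            self.memory[addr] = ILLEGAL_OPCODE

    def resume(self, addr):
        # Restore the original instruction so execution can continue from it.
        self.memory[addr] = self.saved.pop(addr)
        return addr


def run(memory, pc, trace):
    """Fetch instructions until an illegal one is fetched; return the halt address."""
    while pc < len(memory):
        if memory[pc] == ILLEGAL_OPCODE:
            return pc             # an exception handler would hand control to the monitor
        trace.append(memory[pc])  # "execute" the instruction by recording it
        pc += 1
    return None


program = [0x10, 0x20, 0x30, 0x40]
monitor = DebugMonitor(program)
monitor.set_breakpoint(2)

trace = []
halted_at = run(program, 0, trace)                        # stops at the breakpoint
finished = run(program, monitor.resume(halted_at), trace)  # continues to the end
```

Note that the application never sees the illegal instruction once it resumes: the original word is back in memory before execution restarts at that address.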
Requirement #1: Combining DSP and RISC/CISC tools
Today's sophisticated hardware design and simulation capabilities, licensable processor cores, multiprocessing-enabled devices, and advanced silicon fabrication methods allow the design of powerful, cost-effective hardware. However, the system software still must be developed and debugged, in order to make such systems deliver on their performance promises. Without tools to develop, integrate, and optimize embedded software, producing such a product within cost and time-to-market constraints is unlikely, if not impossible.
In summary, designing and integrating software into a multiprocessor embedded system based upon DSP and microprocessor technology can be quite complex and challenging.
Debugging systems that have on-chip debug hardware
Many DSP and RISC/CISC cores have an on-chip serial Test Access Port (TAP) which is compatible with the IEEE 1149.1 JTAG specification. Through the use of a scan-based emulator connected to this TAP, the host-resident debugger can temporarily halt a processor at any point in the execution of the application software. Each time the execution is halted, the debugger can access the contents of the processor's on-chip hardware resources (such as its registers, memory, or peripherals) through the TAP.
As SOC designs integrate more functions
into a single silicon chip, the buses between what were previously separate
functional units get integrated into the silicon, and the bus signals might no longer
be accessible at the package pins. This
prevents the engineer from monitoring
these signals for test and debug purposes.
Special design accommodations can be
made to provide better internal signal visibility. However, these accommodations
are often limited by the number of available pins on the SOC package.
Requirement #2:
Target access and control
Embedded target hardware resources are
typically limited to those needed to
support the application. As a result, most
development and debugging is done
with cross development tools, where the
Integrated Development Environment
(IDE) executes on a host computer system
(usually a PC or a UNIX workstation)
which is connected to the target system
through a communications link.
When an error is detected it is often helpful to capture the state of the entire system
- or to pause the system operation. This
allows the state of the system to be analyzed. If all of the processor cores are
equipped with compatible on-chip debug
support, this can be done with a scan-based
emulation tool. However, in heterogeneous
systems this might not be the case, and
customized bridging hardware might be
needed to synchronously control the operation of the various processor cores.
the development team with a simple, intuitive, graphical user interface to manipulate and coordinate all project files.
include a powerful programming language editor that has been designed specifically for writing software.
provide interactive compilation and editing, to facilitate the location and the correction of compilation errors.
provide an open interface to industry standard version control systems.
include built-in network support, to make local and remote team development practical, as well as efficient.
Disadvantages to using a
target-resident debug monitor
There are a couple of disadvantages to
using a target debug monitor:
The debug monitor shares target resources with the application program, using a portion of the target memory and processing power.
Since the monitor executes on the same processor as the application program, the application program might wipe out the monitor when it crashes. If the monitor is wiped out, the host loses access to the target, thereby making it practically impossible to determine the cause of the crash.
Debugging without
prototype hardware
In most embedded development projects,
the target hardware is not available to the
software engineer during the early stages
of the project. However, time-to-market
demands often dictate that software development begin before the hardware prototype is available.
Most simulators model only the processor core - they don't simulate the peripherals of a highly integrated device or system. However, they do provide instruction-accurate or cycle-accurate fidelity, and are thus very useful for validating logic, verifying algorithms, and for measuring the processing resources consumed by the execution of the application software.
Requirement #3: Provide visibility into
timers
communication chips
Requirement #4:
Evaluate and optimize
system performance
To paraphrase an old saying, "A correct
answer that is not provided within the
required time limit is the wrong answer".
This is true for real-time software. It is
best to discover software inefficiencies
as early in the development cycle as
possible.
include a debugger architecture that could be used to debug a wide range of processor types, using either commercially-available tools or in-house-developed tools - all under a common graphical interface.
allow a software engineering team to create, debug, and manage revisions of their software application.
provide simulators to allow application software development to proceed without target hardware.
allow access and control of target hardware, using either ROM monitors or emulators.
allow application programmers to evaluate and optimize the performance of their software.
ASPEX: An open, multi-core IDE
One example of such an open, multi-core IDE is ASPEX by Allant Software Corporation. Its design is based on experience gained from developing three earlier generations of tools for embedded RISC and CISC microprocessors.
ASPEX supports the debug of a wide range of processors, including:
DSPs from Analog Devices
DSPs from DSP Group
DSPs from Motorola
DSPs from Texas Instruments
ARM/Thumb processors
StrongARM processors
Using DSP and RISC/CISC tools
The ASPEX open environment facilitates integration of a wide range of code generation tools and utilities. Yet, it still provides a uniform, graphical interface that tightly integrates the tools, for ease of use. All of the following tools are easily accessible from the main debugger window, and operate together seamlessly:
Typically, such analysis is an iterative
process that attempts to locate the portions of the application code that use the
largest portions of the processor time.
When a "hot spot" is identified, the application programmer optimizes that code to
"cool it down". This process can be iterated to progressively reduce the processing time needed to execute the application
program.
A debugger should facilitate this process
by allowing the programmer to capture performance data without having
to modify the application source code
or having to rebuild the application program. The debugger should also be able
to capture performance data when the
application program is running on a
simulator. It should also be able to import
and use performance data from logic
analyzers.
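To make the "hot spot" search concrete, the sketch below shows one common way performance data can be reduced to a ranked list: program-counter samples are attributed to functions and counted. This is only an illustration under stated assumptions. The symbol table, the addresses, and the sample values are invented, and nothing here describes how a particular debugger gathers its data.

```python
from collections import Counter
import bisect

# Hypothetical symbol table: (start address, function name), sorted by address.
SYMBOLS = [(0x1000, "main"), (0x1040, "fir_filter"), (0x10C0, "fft"), (0x1200, "idle")]
STARTS = [start for start, _ in SYMBOLS]


def function_for(pc):
    """Map a sampled program-counter value to the enclosing function."""
    index = bisect.bisect_right(STARTS, pc) - 1
    return SYMBOLS[index][1] if index >= 0 else "<unknown>"


def hot_spots(pc_samples):
    """Return (function, share of samples) pairs, hottest first."""
    counts = Counter(function_for(pc) for pc in pc_samples)
    total = len(pc_samples)
    return [(name, n / total) for name, n in counts.most_common()]


# Invented samples, as might be captured periodically from a simulator or target.
samples = [0x1044, 0x1050, 0x10C4, 0x1048, 0x1004, 0x1060, 0x10D0, 0x1070]
report = hot_spots(samples)
```

The function at the top of `report` is the first candidate to optimize. After the code is changed, the same measurement is repeated to see which function is now the hottest.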
A proposed solution: An open, multi-core IDE
How would you meet the four challenges outlined above if you were developing a complex multiprocessor embedded system? Ideally you would like to have an open, multi-core IDE that integrates an otherwise inconsistent and incompatible collection of software development and debugging tools. This IDE would:
integrated peripherals
Highly integrated SOC architectures (as
well as designs using discrete processors)
include peripherals that are integral to
the proper functioning of the product,
such as:
However, most software debuggers provide little or no visibility into these peripheral devices. During the early phases of application software development the application programmer must either ignore them, or deal with them as "black boxes". It would be highly beneficial if application programmers could see how the peripherals interact with their application software.
To allow this, a debugger should allow configuration of the debugging environment to match the target system. For example, it should allow the application programmer to define and display:
external memory blocks
external memory-mapped registers
bit fields within external registers
enumerations for external register bit fields
It should also assist the programmer in defining the required values to write into the internal chip configuration registers, based on the desired configuration of the internal chip resources.
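The kind of register description discussed here (memory-mapped registers, bit fields within them, and enumerations for the field values) can be pictured with a small data model. The sketch below is hypothetical throughout: the `BitField` class, the UART control register, its address, and its field layout are invented for illustration and do not describe any real device or debugger file format.

```python
from dataclasses import dataclass, field


@dataclass
class BitField:
    name: str
    lsb: int                                   # position of the least significant bit
    width: int                                 # number of bits in the field
    enums: dict = field(default_factory=dict)  # raw value -> symbolic name

    def decode(self, raw):
        """Extract this field from a raw register value, using an enum name if defined."""
        value = (raw >> self.lsb) & ((1 << self.width) - 1)
        return self.enums.get(value, value)

    def encode(self, raw, value):
        """Return the raw register value with this field replaced by `value`."""
        mask = ((1 << self.width) - 1) << self.lsb
        return (raw & ~mask) | ((value << self.lsb) & mask)


# Hypothetical memory-mapped UART control register at a made-up address.
UART_CTRL_ADDR = 0x40001000
UART_CTRL_FIELDS = [
    BitField("ENABLE", lsb=0, width=1, enums={0: "OFF", 1: "ON"}),
    BitField("PARITY", lsb=1, width=2, enums={0: "NONE", 1: "ODD", 2: "EVEN"}),
    BitField("BAUD_DIV", lsb=4, width=8),
]


def display(raw):
    """Decode every field of the register into a name -> value mapping."""
    return {f.name: f.decode(raw) for f in UART_CTRL_FIELDS}


decoded = display(0x1A5)
odd_parity = UART_CTRL_FIELDS[1].encode(0x1A5, 1)   # value to write for ODD parity
```

`decode` supports the display side, while `encode` corresponds to helping the programmer work out the value to write for a desired configuration.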
The development and testing of application software can still proceed if the IDE
includes a target simulator that uses the
same code generation and debugging tools
that will eventually be used when the
target hardware becomes available.
special I/O chips
memory chips
ASIC chips
FPGA chips
the editor
the debugger
the code generation tools
the trace analysis
the project manager
the visualization tools
Figure 3 shows the Code Window (the main debugger window) with two files open for editing. Editing is done in the debugger window, and can be done during debugging sessions.
Figure 3
Figure 4 shows the Project Settings Window, from which a user can graphically specify the project's build settings. This eliminates potential errors, since it saves the user from having to learn cryptic command string settings to specify tool options. The consistent graphical interface of the Project Settings Window is particularly helpful when using code generation tools from different manufacturers.
The ASPEX build facility knows about the interrelationships of the files in the project and it automatically determines which files need to be recompiled, due to editing of the source code since the last compile. A project build is launched from the Tools menu of the Code Window. The Build tab of the Output Logging Window shows the progress of the build as it progresses, and any errors that occur during the build process are displayed. Figure 5 shows the results of the build in the Output Logging Window.
If an error message appears in the Output Logging Window during the build process, the user double-clicks on it and the cursor is placed at the location of that error in the source code of the Code Window. The code can then be fixed, in preparation for a new build. Figure 6 shows the cursor positioned to correct a compile error.
After achieving a successful build of the application software, the user can load the resulting file and then debug it, by stepping through the source code in the same window in which it was edited.
The ASPEX debugger supports the debug of multiple-processor, heterogeneous systems from a single debugger. It also provides extensive controls over program execution of hybrid C/C++ and assembly language programs.
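The article does not say how the ASPEX build facility decides which files are out of date. A common approach in build tools generally is a make-style timestamp comparison, and the sketch below illustrates that idea only. The file names, the dependency table, and the integer timestamps are all invented for the example.

```python
def needs_rebuild(source_mtime, object_mtime):
    """A file is stale if it has never been built or was edited after the last build."""
    return object_mtime is None or source_mtime > object_mtime


def stale_sources(sources, dependencies, source_mtimes, object_mtimes):
    """Return the sources to recompile, including those whose headers changed."""
    stale = []
    for src in sources:
        built = object_mtimes.get(src)
        inputs = [src] + dependencies.get(src, [])
        if any(needs_rebuild(source_mtimes[name], built) for name in inputs):
            stale.append(src)
    return stale


# Hypothetical project: timestamps are plain integers for clarity.
sources = ["main.c", "filter.c", "uart.c"]
dependencies = {"main.c": ["filter.h"], "filter.c": ["filter.h"]}
source_mtimes = {"main.c": 10, "filter.c": 10, "uart.c": 10, "filter.h": 25}
object_mtimes = {"main.c": 20, "filter.c": 30, "uart.c": 20}

to_compile = stale_sources(sources, dependencies, source_mtimes, object_mtimes)
```

Here only `main.c` is rebuilt: its object file predates the edit to `filter.h`, whereas `filter.c` was compiled after that edit and `uart.c` does not depend on the header at all.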
Files can be opened by dragging them
from Windows Explorer and dropping
them on the Code Window. To open a second editor window for the same file, a file
tab can be dragged and dropped from the
Code Window to the Windows desktop.
The editor provides language sensitive editing - source code is displayed using color coding to distinguish keywords, text, and comments. This allows the user to quickly spot problems during an editing session. Compatibility is provided for vi, Brief, and Visual C++. A user-supplied editor can be easily integrated into
To support the development of large-scale embedded programs, ASPEX includes easy-to-use source code navigation features, including browsers that quickly access program variables, as well as displaying their type, value, or definition. It also allows the user to set breakpoints on program variables, which display all of this information.
Figure 4
Figure 5
Figure 6
Single emulator and multi-emulator debugging
The ASPEX debugger supports both multiple processor systems that have all processors on the same JTAG scan path, and systems that mix JTAG and other forms of on-chip debugging. It can simultaneously debug multiple DSPs and RISC processors from a single instance of the debugger, whether they are connected to single or multiple emulators. It can also synchronize the starting and stopping of all processors during the debug process, within the limits of the on-chip debug support. Figure 7a shows a Code window for a RISC processor, and Figure 7b shows a Code window for a DSP. (Figure 7a also shows the Register window.) Note that the title bar of each window identifies the processor.
Debugging systems with JTAG ports
JTAG-based core debugging typically uses a 5-wire interface to serially shift data in and out. (See Figure 8) The role of the JTAG TAP during the debug process is to provide access to the processor core's pipeline. This allows the debugger to:
insert instructions for reading or writing to registers
insert instructions for reading or writing to memory
start execution
set watchpoints for tracing
stop and reset the core
Figure 8: JTAG interface (5 wires). Some signals are available to the TAP via the periphery boundary; some signals pass through the TAP.
Under this model, the TAP also has access to some of the bus signals in the core, including:
the address bus
the data bus
the control bus
Further, the TAP can be used to read the core's debug state, and to place the core in the debug state. It is also common to use the TAP to control the core clock, by changing the divider, becoming the clock source, or by synchronizing with the clock.
The 5-wire JTAG interface can also be daisy chained with other compatible cores to enable tightly-coupled multiple processor synchronized debugging.
Figure 7a
Figure 7b
Debugging systems with on-chip monitors and non-JTAG ports
On-chip debug monitors typically reside in the target memory, and communicate with the host computer using a serial or Ethernet link. These monitors are often used on low-cost evaluation boards, to provide a debug capability without requiring an emulator.
However, not all target debug monitors use a serial or Ethernet link to communicate with the host computer. Some DSPs include an on-chip monitor that stores its data in memory on the external bus. The external bus is then connected (via a multi-pin connector) to dual-ported memory on a debug interface module (DIM). (See Figure 9)
Figure 9: The DIM detects and traps reads and writes to external memory.
Each time the monitor writes to (or reads from) this external RAM, the DIM detects and traps this operation, notifies the Host Debug Card (HDC) and then passes the operation to the host through dual-ported memory. The debugger (running on the host) reads and writes to this same dual-ported memory.
The cable between the HDC card and the DIM has signals that allow the HDC card to stop and hold the core in debug mode. Whenever this happens, the monitor in the DSP core copies all register values to the external RAM. Then it writes a message to a mailbox location (indicating that it has finished saving the DSP state) and enters a loop, waiting for another mailbox location to change. Eventually the debugger writes a request to this mailbox location, and the monitor then processes that request.
The on-chip monitor and external RAM model described above cannot be daisy chained with JTAG-compatible devices. This presents a challenge to engineers who need tightly-coupled, synchronized debugging. Bridging hardware is needed to provide tightly-coupled synchronized multiprocessor debugging.
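The mailbox handshake between the on-chip monitor and the host debugger, as described for the external RAM model, can be pictured with a small simulation. This is a sketch under assumptions: the `SharedRam` layout, the status and request codes, and the function names are invented, and a real monitor would poll hardware memory locations rather than Python attributes.

```python
class SharedRam:
    """Stand-in for the dual-ported memory visible to both monitor and debugger."""

    def __init__(self):
        self.saved_registers = {}
        self.status_mailbox = 0     # written by the monitor
        self.request_mailbox = 0    # written by the debugger
        self.reply = None


STATE_SAVED = 1                     # hypothetical status code
NO_REQUEST, READ_REG = 0, 1         # hypothetical request codes


def monitor_enter_debug(ram, registers):
    """Monitor side: save the DSP state, then announce it via the status mailbox."""
    ram.saved_registers = dict(registers)
    ram.status_mailbox = STATE_SAVED


def monitor_poll(ram):
    """One pass of the monitor's wait loop; returns True if a request was handled."""
    if ram.request_mailbox == NO_REQUEST:
        return False
    code, argument = ram.request_mailbox
    if code == READ_REG:
        ram.reply = ram.saved_registers[argument]
    ram.request_mailbox = NO_REQUEST    # mark the request as consumed
    return True


def debugger_read_register(ram, name):
    """Debugger side: check that the state has been saved, then post a request."""
    if ram.status_mailbox != STATE_SAVED:
        raise RuntimeError("monitor has not saved the DSP state yet")
    ram.request_mailbox = (READ_REG, name)


ram = SharedRam()
monitor_enter_debug(ram, {"r0": 0x1234, "r1": 0x0002})
idle = monitor_poll(ram)              # nothing posted yet, monitor keeps waiting
debugger_read_register(ram, "r0")
handled = monitor_poll(ram)           # monitor sees the mailbox change and replies
```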
Debugging systems with on-chip monitors and JTAG ports
The on-chip monitor approach described above has been modified for some DSPs, allowing the use of JTAG as a communication scheme. In this model the JTAG TAP serves as the external data memory for the on-chip monitor. (See Figure 10)
Whenever the monitor writes data to (or reads data from) the external memory in the JTAG TAP, the TAP withholds the data transfer acknowledge signal, thus keeping the DSP in wait states. When the JTAG controller finishes scanning the data written by the monitor (or read by the monitor) in the DSP, it sends the data transfer acknowledge signal to the DSP, and execution resumes. Thus, instead of the JTAG TAP controlling the execution state of the core, the JTAG TAP can only stop/hold the core when the monitor writes or reads to the TAP.
Figure 10: The TAP accepts external memory access requests from the core, and then withholds the data transfer acknowledge signal until the debugger (via JTAG) shifts the value out (or in).
The on-chip monitor and connector wiring model conform to the 5-wire JTAG interface. Thus, this model allows the DSP core to be daisy chained with other JTAG-compatible devices, enabling tightly-coupled multi-processor synchronized debugging. However, the on-chip monitor with JTAG communications model used in some DSP cores requires that the JTAG controller "peek" at the DR (data register) in the TAP. This means that there can be only one such DSP core in the scan chain, and it must be the last device in the scan chain.
Synchronized debugging
Synchronized debugging is the ability to have one processor control the execution of another during the debugging process. This can be useful for several reasons:
Code can be executed on multiple processors, in lock-step fashion.
It allows the entire system to be paused, allowing examination of the state of each processor.
It prevents data from being processed (or lost) while examining the state of one of the processors.
ASPEX supports several different kinds of synchronization between processors. The tightness of the synchronization varies, depending on whether the debugging is being done on the target hardware or a target simulator, and it also varies somewhat from processor to processor. Functions that can be synchronized include:
Start
Stop
Single-stepping
Cross-triggering breakpoints
Figure 11 shows how processors are selected for synchronization in the Debug Manager.
Figure 11
Tightly-coupled processors
When debugging the application software on the target hardware, ASPEX uses tightly-coupled synchronization, if it is supported by the underlying processor and emulator/monitor. This means (as discussed above) that the TAPs of the various processors must be daisy chained together. A major advantage that JTAG provides is that all devices on a scan chain can be started and stopped together. The tightness of the synchronization might vary from nanoseconds to microseconds, depending on the JTAG model used by the processor.
Using the JTAG scan chain, the Start, Stop, and Single-step operations can be closely coordinated between the processors. To achieve this level of synchronization, ASPEX independently "sets up" each processor to perform the desired action, and then sends a final "execute" sequence to all of the processors. Since the "execute" is processed independently by each processor on the JTAG scan path (daisy chain), there will be a few nanoseconds of delay (skew) between the start-up on each processor,
DSP core due to clock synchronization. If heterogeneous processors are connected together, the skew might be as large as 100 nanoseconds, due to differences in the start-up times.
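The two-phase sequence described for tightly-coupled processors (set up each processor first, then send a single final "execute" to all of them) can be sketched as follows. The model is deliberately simple and rests on assumptions: the `Processor` class and the per-core start-up delays are invented numbers, chosen only to show where the skew comes from.

```python
class Processor:
    """Toy processor that must be armed before it honours an "execute" broadcast."""

    def __init__(self, name, startup_delay_ns):
        self.name = name
        self.startup_delay_ns = startup_delay_ns   # hypothetical per-core latency
        self.armed_action = None
        self.started_at_ns = None

    def set_up(self, action):
        self.armed_action = action                 # phase 1: prepare, do not run yet

    def execute(self, broadcast_time_ns):
        if self.armed_action is None:
            raise RuntimeError(f"{self.name} was not set up")
        self.started_at_ns = broadcast_time_ns + self.startup_delay_ns
        action, self.armed_action = self.armed_action, None
        return action


def synchronized(processors, action, broadcast_time_ns=0):
    """Arm every processor first, then send one final "execute" to all of them."""
    for cpu in processors:
        cpu.set_up(action)
    for cpu in processors:
        cpu.execute(broadcast_time_ns)
    starts = [cpu.started_at_ns for cpu in processors]
    return max(starts) - min(starts)               # skew between first and last start


chain = [Processor("dsp0", 3), Processor("dsp1", 5), Processor("risc0", 60)]
skew_ns = synchronized(chain, "start")
```

Because the slow set-up work happens before the broadcast, the remaining skew depends only on how quickly each core reacts to the final "execute". That is why a heterogeneous chain shows a larger skew than a chain of identical cores.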
When debugging the application software
on multiple target simulators, a special
plug-in (called SimBridge) can be used to
couple the simulators. This allows tight
coupling of:
• shared memory access
• synchronization of processor clocks
• simulation of other operations
Each simulator runs independently,
in its own host thread. However, the
sequencing of all the simulators is
synchronized through the SimBridge
module. The worst-case skew between the
operations on the different processors is
somewhat dependent on the type of simulator
(phase-accurate, cycle-accurate, or
instruction-accurate). However, it is typically
on the order of nanoseconds, or less.
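The coupling described above can be pictured with a small model. The following is a minimal Python sketch, not SimBridge itself: each toy simulator runs in its own host thread, and a shared barrier (an assumption standing in for the coupling module) keeps any simulator from starting its next step until all of them have finished the current one. The names run_lockstep and simulator are hypothetical.

```python
import threading


def run_lockstep(num_sims, num_steps):
    """Run num_sims toy simulators in lock step and return the execution log."""
    log = []                                  # (step, sim_id) in the order executed
    log_lock = threading.Lock()
    barrier = threading.Barrier(num_sims)     # stand-in for the coupling module

    def simulator(sim_id):
        for step in range(num_steps):
            with log_lock:
                log.append((step, sim_id))    # "execute" one simulated step
            barrier.wait()                    # wait until every simulator finished this step

    threads = [threading.Thread(target=simulator, args=(i,))
               for i in range(num_sims)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Within a step the simulators may run in any order, but no entry for step k+1 can appear in the log before every entry for step k. That is the sense in which the threads are independent yet sequenced.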
Multiple processor
synchronized debugging
ASPEX seamlessly supports debugging of
multi-processor systems because it was
designed from the ground up to do this. It
is designed to manage multiple processors
from a single instance of the debugger,
and thus can provide the necessary synchronization and execution management.
To keep from getting the computer's display too cluttered with debugger windows
when debugging multiple processors,
ASPEX allows the user to create attached
or unattached windows - or a combination of the two.
Cross-triggered breakpoints
In many processors, an output signal
(called a breakout signal) changes when
the processor stops or hits a breakpoint.
In many processors, this signal can
be programmed to assert or not to
assert on a break or stop. Many processors also have an input trigger signal
that will stop the processor (and put it
into debug mode) when asserted. In some
cases the trigger is under program control
as well.
the selected processors, the execution controls (buttons and commands) of a processor's Code Window automatically control
all of the processors within that synchronization group.
If attached to a processor (or to a thread) a
window displays only information about
that particular processor (or thread). If
unattached, a window can display information about any processor (or thread) by
clicking on the Increment Connection
(Conn+) button in the Code window.
When multiple cores (or chips) are cross-wired correctly, it is possible to have one
core (or chip) stop the others. For example, when one processor hits a breakpoint
(or a watchpoint) it can stop all the other
processors. With the proper hardware support in the cross connections, the debugger can be made to control two or more
processors - configuring them to stop
each other as desired. In some cases, an
external mechanism (such as a TAP) can
be used to program this triggering and
control.
For example, given an array of eight
SHARC DSP processors, the user may
want to stop processors #3 and #7 when
processor #1 hits a breakpoint, while
allowing the other processors to continue
running. The registers and the matrix logic
defined by the board or chip designer
should allow this flexibility.
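One way to picture such matrix logic is as one routing mask per processor. The sketch below is a Python model under stated assumptions, not a description of any real board: the class CrossTriggerMatrix, its methods, and the bit layout (bit n set means "drive the trigger input of processor n") are all hypothetical.

```python
NUM_PROCESSORS = 8


class CrossTriggerMatrix:
    """Toy model of a breakout-to-trigger routing matrix (hypothetical layout)."""

    def __init__(self, count=NUM_PROCESSORS):
        self.count = count
        # route[src] is a bitmask of the processors stopped by src's breakout signal.
        # Processors are numbered 1..count, so index 0 is unused.
        self.route = [0] * (count + 1)

    def connect(self, source, targets):
        """Route the breakout signal of `source` to the trigger input of each target."""
        for target in targets:
            self.route[source] |= 1 << target

    def stopped_by(self, source):
        """List the processors that halt when `source` asserts its breakout signal."""
        mask = self.route[source]
        return [p for p in range(1, self.count + 1) if mask & (1 << p)]

    def on_breakpoint(self, source, running):
        """Return the set of processors still running after `source` hits a breakpoint."""
        halted = {source, *self.stopped_by(source)}
        return running - halted
```

With matrix.connect(1, [3, 7]), a breakpoint on processor #1 halts #3 and #7 and leaves #2, #4, #5, #6, and #8 running, which is the scenario described above.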
Software requirements for
cross-triggering of breakpoints
If the control registers are memory-mapped, the user can identify/control
them using ASPEX's Extended Target
Visibility features, which are discussed
below. The user can then set the appropriate or desired values before executing the
application code.
an input signal pin to request that the
processor enter emulation mode
ASPEX Extended Target
Visibility (ETV)
ASPEX provides Extended Target
Visibility (ETV) by allowing board
designers (or SOC designers) to configure
Figure 12 (Advanced Interrupt Controller defined through ETV, with interrupt pending and IRQ mask bit fields)
Whether tightly-coupled, or loosely-coupled at the hardware (or simulator) level,
when a user tells ASPEX to synchronize
In either case, the signals to (or from)
each processor should be tied together by
a matrix that can be configured by memory-mapped control registers. These registers should allow the user to specify
which processor signals which other
processors.
As there are communication delays
between ASPEX and each processor
(whether real or simulated) there will be
unavoidable delays (skew) in the synchronization between the processors. These
delays are usually on the order of milliseconds.
Hardware requirements for
cross-triggering of breakpoints
Most processors have:
As discussed above, to set up the target
hardware for the tightly-coupled scheme
to work, these pins must be cross connected between the various processors in
the array. When processor chips are used
in the target hardware, this cross connection is best accomplished with a CPLD or
FPGA. In a core-based SOC design this
external hardware can be avoided by
adding extra on-chip control logic.
Loosely coupled processors
If the processors cannot be tightly coupled
with cross-wired signals, and if the
SimBridge is not available to synchronize
the particular simulators that are used,
ASPEX can still use a loosely-coupled
scheme for debugging multiple processors.
It does this by sequentially issuing the
appropriate commands to Start, Stop, or
Step each processor. When one processor
hits a breakpoint, ASPEX sends a "stop"
command to all the other processors.
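The loosely-coupled scheme amounts to a loop in the debugger. The following is a minimal Python sketch of that idea, assuming hypothetical Processor and LooseSyncGroup classes; it does not reflect ASPEX internals. Because the commands are issued one at a time, each processor stops slightly later than the one before it, which is the source of the skew.

```python
class Processor:
    """Stand-in for one target (real or simulated) reachable by the debugger."""

    def __init__(self, name):
        self.name = name
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False


class LooseSyncGroup:
    """Debugger-side group that starts and stops its members sequentially."""

    def __init__(self, processors):
        self.processors = list(processors)

    def start_all(self):
        for p in self.processors:          # one command per processor, in turn
            p.start()

    def on_breakpoint(self, hit):
        """Stop every other running processor; return the order of "stop" commands."""
        hit.running = False                # the processor at the breakpoint is already halted
        order = []
        for p in self.processors:
            if p is not hit and p.running:
                p.stop()
                order.append(p.name)
        return order
```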
Using this method, synchronization
delays are reduced to picoseconds. This is
extremely useful for tracking down hard-to-find interaction problems within multiple processor systems.
Using unattached windows can conserve
display space, and make the debugging of
multiple processors (or threads) less confusing. In cases where it is necessary to
see information for two or more processors simultaneously, attached windows
can be used. A user can also combine
these techniques, since individual windows can either be attached or unattached.
an output signal pin to indicate when
the processor has entered emulation
mode
the ASPEX debugging environment with
detailed, hardware-specific information
about the target being debugged.
Information such as memory mapping and
wait states are specified graphically, and
then saved in an ETV file, which can then
be distributed to other users of the same
type of target. Distribution of ETV files is
especially useful when multiple RAM-based prototype targets are built prior to
the production of a final ROM-based
target. (ETV definitions also work with
simulated targets.)
The ETV feature allows a user to:
When connecting to a target, the memory map is updated, based on the built-in knowledge about the standard part
and the definitions specified in the
ETV for the target.
When an executable image is loaded to
the target, the memory map is checked
to confirm that the memory locations
specified in the executable are valid. The
executable is then loaded to target memory and then read back, to verify a successful load as well as the presence of
working memory. The "auto" sections
of the memory map are also updated to
reflect the type of memory, based on the
information in the executable.
The memory map determines how
memory is displayed (colored) in the
Memory window.
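The check-then-load-then-read-back sequence described above can be sketched as follows. This is an illustrative Python model only: Block, check_image, and load_and_verify are hypothetical names, target memory is modeled as a bytearray, and a section that straddles two adjacent blocks is (as a simplification) reported as invalid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One entry of a memory map (hypothetical representation)."""
    start: int
    size: int
    kind: str

    def contains(self, addr, length):
        return self.start <= addr and addr + length <= self.start + self.size


def check_image(memory_map, sections):
    """Return the (addr, data) sections that do not fit inside any mapped block."""
    return [(addr, data) for addr, data in sections
            if not any(b.contains(addr, len(data)) for b in memory_map)]


def load_and_verify(memory, sections):
    """Write each section to target memory, then read it back and compare."""
    for addr, data in sections:
        memory[addr:addr + len(data)] = data
    return all(bytes(memory[addr:addr + len(data)]) == bytes(data)
               for addr, data in sections)
```

On real hardware the read-back step is what reveals missing or faulty memory; in this model it always succeeds because a bytearray never fails to store a value.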
• define and display external memory
blocks
• define access widths and wait states
for external memory blocks
• define external peripheral control
blocks
• define external memory-mapped
registers, and bit fields within those
registers
termined by the application program. Basic
memory map editing can be done directly
through the Memory tab on the Managers
window. More complex definitions and
mappings can be specified using ETV.
Cross triggering registers
If the external control registers that
control cross-triggering of breakpoints
between processors are memory-mapped,
then the user can specify them using the
ETV facility. The user can then write the
desired values into them before executing
code.
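To make the idea of registers with named bit fields concrete, here is a minimal Python sketch of reading and writing one field within a register value. The ETV file format is not shown in this article, so the BitField class and the field names used in the usage note are purely hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BitField:
    """A named run of bits within a memory-mapped register (hypothetical)."""
    name: str
    lsb: int      # position of the least significant bit
    width: int    # number of bits in the field

    @property
    def mask(self):
        return ((1 << self.width) - 1) << self.lsb


def read_field(value, field):
    """Extract the field from a register value."""
    return (value & field.mask) >> field.lsb


def write_field(value, field, new):
    """Return the register value with the field replaced by `new`."""
    if new >> field.width:
        raise ValueError(f"{new} does not fit in {field.width} bits")
    return (value & ~field.mask) | (new << field.lsb)
```

For example, with an assumed one-bit ENABLE field at bit 0 and a three-bit SOURCE field at bits 4 to 6, writing ENABLE = 1 and SOURCE = 5 into a cleared register gives 0x51.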
Figure 12 shows that the user has defined
an Advanced Interrupt Controller as a target peripheral (AIC in the left pane) and
has defined the values and layout that will
be dynamically added to the Register
window.
ASPEX uses the Analysis window to
show trace information that can be collected from any of a variety of sources.
The information can be viewed in raw
form, or in a higher level view. The
Profiling view and the function entry/exit
pairs are especially useful for performance analysis.
Memory map
ASPEX also has built-in knowledge about
the internal memory map for standard processor chips. It is aware of what internal
memory blocks exist, and generally treats
external or undefined memory as being de-
Figure 13
Registers
As shown in Figure 13, ASPEX has built-in knowledge about the core registers and
other internal registers found in standard
processor chips. These are displayed in
the Register window. When additional
registers are defined using ETV, they can
also be added to the Register window,
where they can be displayed or updated,
just like standard registers.
Program Flow Trace
ASPEX provides a trace facility that correlates instruction-level execution with the
source code. Collection and display of
trace information is triggered by a notation inserted into the source code display,
using the GUI. Once the trace has been
acquired, the user can step through the
trace, either at assembly code level, or at
the source code level. It is also possible to
step backward through a trace, to identify
the root cause of a problem.
When connecting to a target that has
Extended Target Visibility definitions, the
Register Window, Memory Map, Memory
Window, and ASPEX internal access
methods are all enhanced to include this
information.
The Analysis window allows the user to
view the source code in the Code window,
and the disassembled trace data (representing its execution by the target system)
in the Analysis window. The user can then
step forward or backward through the
source code, simultaneously viewing the
corresponding disassembled trace data.
Trace data can be derived from several
sources, including:
• Instruction set simulators, which can
record the program counter content for
each instruction as the simulation runs.
The simulator can also record the cycle
count. A trace trigger can be used to
control when to start recording, and a
ring buffer can be used to show the last
"n" instructions when the buffer fills.
• Hardware logic analyzers, which can
be attached to the processor that is
being debugged.
• Breakpoint-based tracing, which is
intrusive, but allows for profiling of
algorithms, and also allows the execution flow to be examined.
• On-chip trace buffers, which can show
the last "n" branches, or can be intrusively set up to record all branch flow.
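The trigger-plus-ring-buffer arrangement mentioned for simulators can be sketched in a few lines. This is a minimal Python illustration, assuming a hypothetical TraceBuffer class that a simulator would call once per executed instruction; it is not taken from any particular simulator.

```python
from collections import deque


class TraceBuffer:
    """Ring buffer of (pc, cycle) records with an optional start trigger."""

    def __init__(self, depth, trigger_pc=None):
        self.entries = deque(maxlen=depth)     # oldest records drop off when full
        self.trigger_pc = trigger_pc
        self.recording = trigger_pc is None    # no trigger: record from the start

    def record(self, pc, cycle):
        """Called once per executed instruction."""
        if not self.recording and pc == self.trigger_pc:
            self.recording = True              # trigger address reached
        if self.recording:
            self.entries.append((pc, cycle))

    def last(self):
        """Return the retained records, oldest first."""
        return list(self.entries)
```

A depth of "n" therefore yields the last "n" instructions executed after the trigger fired, regardless of how long the simulation runs.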
The Analysis window also allows filtering
by data, access type, and name (for profiling).
The data can also be sorted. The find
menus allow searching for entries based
on criteria such as time (and time range),
Conclusion:
An IDE that is designed specifically for
embedded systems that employ multiple
DSPs and microprocessors can provide
engineers with a single, easy-to-learn and
easy-to-use tool set. However, debug support
must be designed to adapt to available
hardware resources, and to maximize
debug fidelity when debugging multiple
processors. Customized visibility into the
target system exposes more of the system,
and streamlines the debugging process. A
trace and analysis capability simplifies
debugging, and helps the application programmer optimize real-time performance.
An open architecture allows developers to
use best-of-class tools, as well as support
their own custom tools. With access to a
comprehensive toolkit, application programmers can focus on developing product features that differentiate their product
from the competition while meeting stringent time-to-market requirements.
References:
[1] Beacon Technology Partners, 4B Damonmill
Square, Concord, MA 01742, Tel: 978-371-3262,
Fax: 978-371-3288
Dan Jaskolski is
CEO and co-founder
of Allant Software
Corporation. Allant
specializes in providing debugging
solutions for embedded DSP, RISC and
SOC systems. Prior to founding Allant,
Jaskolski was with embedded tools supplier Microtec Research for 13 years as
Executive Vice President and Chief
Operating Officer. He holds a Bachelor's
degree in mathematics from La Salle
College and an MBA from the University
of Pittsburgh.
If you have questions about this article,
or if you would like to know more
about Allant's products, you can contact
Dan at:
Allant Software Corporation
1280 Civic Drive, Suite 206
Walnut Creek, CA 94596
Tel: 925-944-9690
Fax: 925-944-9612
Email: danj@allant.com
Web: www.allant.com
Figure 14
By using the Trace capability of ASPEX,
a user can see the exact calling sequence
of all functions, whether they are still on
the stack or not. By clicking on the Func
tab of the Analysis window (See Figure
15) you can view all function entries that
occurred during the trace. You can then go
to the source code of any such function
just by clicking on it.
Function tracing
Debuggers can provide the calling
sequence of all nested functions that have
not finished executing, because their calling address still resides on the stack.
However, once the function completes,
its calling address is no longer on the
stack.
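The difference between the live stack and a trace can be shown with a small model. The Python sketch below assumes a trace expressed as hypothetical ("enter" or "exit", function) events; the event format and function names are illustrative and are not the ASPEX trace format.

```python
def live_stack(events):
    """Functions that have been entered but have not yet returned."""
    stack = []
    for kind, func in events:
        if kind == "enter":
            stack.append(func)
        elif kind == "exit":
            stack.pop()
    return stack


def call_sequence(events):
    """Every function entry in the trace, with its nesting depth."""
    depth = 0
    calls = []
    for kind, func in events:
        if kind == "enter":
            calls.append((depth, func))
            depth += 1
        elif kind == "exit":
            depth -= 1
    return calls
```

For a trace in which main calls init, init returns, and main then calls filter, live_stack reports only main and filter, while call_sequence still lists init, because the trace retains calls that have already completed.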
• the raw trace
• code and data buffer elements
• a disassembly view
• a function entry/exit view
• an execution profile
Performance profiling
Typically, a very large percentage of a program's execution time is consumed by a small percentage of its code.
Thus, the ability to identify "hot spots"
can be very helpful when a program's
performance must be improved. Figure
14 shows how a user can view performance profile information, such as a histogram of the percentage of execution
time for each function.
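As a rough illustration of how such a histogram can be derived from trace data, the following Python sketch charges elapsed cycles to whichever function is executing. It assumes hypothetical (cycle, "enter" or "exit", function) events in time order; the names profile and histogram are illustrative, not part of ASPEX.

```python
from collections import Counter


def profile(events):
    """Return {function: percent of traced cycles spent in that function itself}."""
    self_cycles = Counter()
    stack = []
    last = None
    for cycle, kind, func in events:
        if stack:
            self_cycles[stack[-1]] += cycle - last   # charge time to the running function
        if kind == "enter":
            stack.append(func)
        else:
            stack.pop()
        last = cycle
    total = sum(self_cycles.values())
    if total == 0:
        return {}
    return {f: 100.0 * c / total for f, c in self_cycles.items()}


def histogram(percentages, width=40):
    """Render the profile as text bars, largest consumer first."""
    rows = sorted(percentages.items(), key=lambda kv: -kv[1])
    return "\n".join(f"{func:<10} {'#' * round(pct * width / 100)} {pct:.1f}%"
                     for func, pct in rows)
```

Time spent in a called function is charged to the callee rather than the caller, so the "hot spots" are the functions whose own code consumes the cycles.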
The Analysis window uses tabs for quick access to different views of the data, including:
address (and address range), type of
access, etc. Clicking on an entry will show
that location in the Code window.
Figure 15
28 / DSP Engineering / Winter 2000
Not Licensed for distribution. Visit opensystems-publishing.com/reprints for copyright permissions.
© 2008 OpenSystems Publishing.