How to Bridge the Abstraction Gap in System Level Modeling and Design

A. Bernstein, Intel Corp., Israel
M. Burton, ARM, UK
F. Ghenassia, STMicroelectronics, France

Abstract— As more and more processors and subsystems are integrated in a single system, the verification bottleneck is driving designers away from RTL and RTL-like strategies for verification and design towards higher abstraction levels. Increasing system complexity, on the other hand, requires much faster simulation and analysis tools. This is leading to new standards and tools around transaction level modeling. Languages such as SystemC and SystemVerilog are rich in behavioral and structural constructs which enable modeling designs at different levels of abstraction without imposing a top-down or bottom-up design flow. In fact, most design flows are iterative, and modules at different levels of abstraction have to be considered. A more abstract model is very useful for increasing simulation speed and improving formal verification. SystemC and SystemVerilog stress the importance of verification support for complex SoCs, covering hardware verification as well as the verification of hardware-dependent software. In today's design flows, software development can often only start after the hardware is available, which causes unacceptable delays. The idea of transaction level modeling (TLM) is to provide transaction level models of the hardware in an early phase of the hardware development. A sufficiently fast simulation environment based on these TLMs is then the foundation for developing hardware and hardware-dependent software in parallel. The expectation is that such transaction level models run at several tens to hundreds of thousands of transactions per second, which should be fast enough for system level modeling and verification.

I. INTRODUCTION

Developing an Integrated Circuit today is a totally different experience compared to the same task a decade ago. It used to be that the specification of an IC could be defined adequately in terms of an input-output relationship. The relationship might have been deterministic or statistical in nature, but it was self-contained. The development team could design and validate an IC based on these concise requirements. Designers were mainly concerned with executing the "RTL to GDSII" flow. With ever finer silicon geometries and densities this flow continues to be challenging today; however, it now represents only a small portion of a bigger picture. In today's systems an IC can rarely be designed in isolation from the rest of the system. For example, in the world of cost-sensitive cellular and handheld products it is critical to tune the architecture to meet performance, cost and power requirements. Incorporating RF, processor, graphics and memory subsystems, and running user applications on top of complex software stacks and an operating system, the real challenge is to optimize the "Antenna to Application" flow. Spreadsheets or hand calculations are not sufficient to cope with the complex hardware and software interactions. The system is so complicated that end-to-end modeling has become essential for assessing performance and effectiveness prior to design completion.

II. DESIGN FLOW

The ideal design flow would encompass product development from preliminary requirements definition to system performance measurements.
Ideally the product requirement specifications should be captured in an executable form (a "program") and then continuously refined into the final product design. Such a product would be correct by construction: its performance would be formally traceable to the initial requirement specifications, and its quality would be limited only by the quality of the tool managing the flow. In recent years this holistic approach has become usable in certain limited cases, one notable example being processor design [1]. For the general case, however, it is still not feasible today. The best known methods still apply different tools to different steps of the product development flow and do not formally guarantee end-to-end equivalence. Understanding the gaps in the capabilities of today's flows is the first step in solving the problem. Examples from ARM, Intel (especially Intel's Cellular and Handheld Group) and STMicroelectronics will show different aspects of this ideal flow and relate it to the current status of design flows. Before the different applications are described, Transaction Level Modeling (TLM) will be introduced.

III. TRANSACTION LEVEL MODELING

Transaction Level Modeling (TLM) addresses a number of practical system level design problems. The OSCI Transaction Level Working Group (TLMWG) has identified a number of abstraction levels at which modeling can take place [3, 5]:

AL: Algorithmic. At the algorithmic level, no distinction is made between hardware and software.

SW: A Software View. At this level, a division is made between hardware and software. The model is at least suitable for programmers to develop their software (of course, not every system will have programmable elements). This level is also referred to as the Architectural view.

HW: A Hardware View. Finally, this level carries enough information for hardware engineers to develop both the device itself and/or the devices surrounding the device being modeled. It may not have the fidelity of the RTL, but enough for the hardware designer. This level is also referred to as the Micro-Architectural view.

Orthogonal to this, two technologies have been suggested that people are using to build models. Models are built either in a "parallel" style or a "sequential" style (we have avoided the terms blocking and non-blocking, as they mean different things to different people!).

Parallel models break functionality down into units that can execute at the same time. Communication between these units is often achieved using a "FIFO"-style interface. The expectation on this interface is that the initiator-to-target path and the target-to-initiator response path will be separate.

In a sequential model, functionality is broken down into sub-functional blocks which call each other in sequence. An initiator calls a target for a "service" and receives a response directly. Communication between these units is often achieved using a function call interface. The function call typically transports information both from the initiator to the target and back from the target to the initiator, so the initiator-to-target path is combined with the target-to-initiator response path. The sketch below illustrates the two styles. (Of course it is entirely possible to use either technology to replicate the other, but that serves no purpose other than to hide the problem!)
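To make the distinction concrete, the following is a minimal SystemC sketch of the two styles. The interface and module names are invented for illustration, and the XOR "behavior" is a stand-in: in the sequential style one function call carries both request and response, while in the parallel style they travel on two separate FIFO channels.

```cpp
#include <systemc.h>

// Sequential style: one function call carries the request and
// returns the response on the same path.
struct seq_target_if : virtual sc_interface {
    virtual unsigned read(unsigned addr) = 0;
};

struct seq_target : sc_module, seq_target_if {
    SC_CTOR(seq_target) {}
    // Request and response combined in a single call.
    virtual unsigned read(unsigned addr) { return addr ^ 0xFFFFu; } // stand-in behavior
};

// Parallel style: separate initiator-to-target and
// target-to-initiator FIFO paths.
SC_MODULE(par_target) {
    sc_fifo_in<unsigned>  req;   // initiator-to-target path
    sc_fifo_out<unsigned> rsp;   // separate target-to-initiator path

    void serve() {
        for (;;) {
            unsigned addr = req.read();   // blocks until a request arrives
            rsp.write(addr ^ 0xFFFFu);    // response sent back independently
        }
    }
    SC_CTOR(par_target) { SC_THREAD(serve); }
};
```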
In addition to these abstraction levels, several "types" of model have been described that operate over one or more abstraction levels and use one (or, for PVT, both) model technologies. In order to visualize them, it is convenient to describe a modeling space (Fig. 1). The horizontal axis indicates how fine- or coarse-grained the functionality of the model is split; the vertical axis indicates the level of abstraction that the model attempts to represent. The types that have been identified are:

1. AL : Algorithmic
2. CP : Communicating Processes
3. PV : Programmers View
4. PVT : PV + Timing
5. TA : (Interconnect) Transaction Accurate
6. CC : Cycle Callable
7. RT : RTL level

Fig. 1: TLM space (model types arranged by abstraction versus granularity, spanning algorithm exploration, software development, hardware/software partitioning and benchmarking, and hardware development down to RTL implementation; AL and CP use the parallel style, PV and PVT the sequential style)

IV. TLM BRIDGING THE GAP

Since similar technology can be used at many different abstraction levels, the issue is not how to bridge the abstraction gap, but how, or whether, to mix simulations of different technologies. Using one technology, it is possible to write a model at the algorithmic level and move all the way down to a hardware view. However, the combined initiator-target-initiator path of a function call paradigm is fundamentally very different from the separate initiator-to-target and target-to-initiator dual paths of a FIFO-style interface. Attempting to mix PV models with TA models is therefore a tricky problem (a sketch of one possible adapter is given at the end of this section). One conclusion is to use only (say) a FIFO-style interface, but there are difficulties with this approach too.

At this point it is worth re-visiting the key reasons that modeling is used to reduce time to market. There are three principal areas in which this can be achieved: early embedded software development, performance analysis and functional verification. Possibly the single biggest effect can be achieved by commencing embedded software development while the hardware is still unavailable. For models of systems to be suitable for software engineers, the software programmer has two principal considerations: speed (ideally "real time"), and the ability of the environment to ensure that the software written will work on the hardware.

At the heart of a model used for software development will be a programmable device. For these, using today's technology, the speed of a model encapsulating the same (software view) information can be significantly higher when written using a function call paradigm than when using a FIFO-style interface. Hence, today, the majority of models used for this task are based on function call technology.

The second of the two requirements is interesting in its own right. It is often confused and misinterpreted as a requirement to exactly replicate the hardware. In fact, in some cases that is an inappropriate choice, as it may be very difficult to find hardware/software interaction bugs on the hardware (or on an exact replication of it). A model can potentially do "better" than the hardware for this task by specifically stressing the software. This will be explained in more detail in the example from STMicroelectronics in the following section.
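As an illustration of why mixing the two technologies is awkward rather than impossible, the following hypothetical adapter (all names invented) wraps a FIFO-based target behind a function call interface. The single blocking call must be split across the two FIFO paths, and whatever timing detail the TA side carries is hidden inside the suspended call.

```cpp
#include <systemc.h>

// Hypothetical PV-style interface (not from any standard library).
struct pv_if : virtual sc_interface {
    virtual unsigned transport(unsigned addr) = 0;
};

// Adapter: turns one blocking function call into a request/response
// pair on separate FIFO paths. Must be invoked from an SC_THREAD,
// because the FIFO accesses may suspend the caller.
struct pv_to_fifo_adapter : sc_module, pv_if {
    sc_fifo_out<unsigned> req;  // forward path towards the TA-style target
    sc_fifo_in<unsigned>  rsp;  // return path from the target

    SC_CTOR(pv_to_fifo_adapter) {}

    virtual unsigned transport(unsigned addr) {
        req.write(addr);        // issue the request on the forward path
        return rsp.read();      // block until the split response arrives
    }
};
```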
V. TLM BASED DESIGN AND VERIFICATION FLOW AT STMICROELECTRONICS

Multi-million gate circuits currently under design with the latest CMOS technologies do not only include hardwired functionality but also embedded software, running most often on more than one processor. In fact, they embed so much functionality of the system they are designed for that they have become systems in their own right! This extended scope is driving the need for extensions of the traditional RTL-to-layout design and verification flow. These extensions must address three key topics: early embedded software development, performance analysis and functional verification.

The embedded software accounts for more than half of the total expected functionality of the circuit, and most of the modifications that occur during the design of a chip based on an existing platform are software updates. An obvious consequence is that the critical path for the development of such a circuit is the software, not the hardware. Enabling software development to start very early in the development cycle is therefore of paramount importance to reduce the time-to-market. At the same time, adding a significant amount of functionality to an existing core platform may have a significant impact on the real-time behavior of the circuit, and many applications that these chips are used in have strong real-time constraints (e.g. automotive, multimedia, telecom). It is therefore equally important to be able to analyze the impact of adding new functionality to a platform with respect to the expected real-time behavior. This latter activity relates to the performance analysis of the defined architecture. The functional verification of the IPs that compose the system, as well as of their integration, has also become crucial. The design flow must support an efficient verification process to reduce the development time and to avoid silicon re-spins that could jeopardize the return on investment of the product under design.

At STMicroelectronics, one direction to address the above issues is to extend the CAD solution proposed to product divisions, known as Unicad, beyond the RTL entry point; this extension is referred to as the System-to-RTL flow [4] (Fig. 2). As the current ASIC flow mainly relies on three implementation views of a design, namely the layout, gate and RTL levels, the extended flow adds two new views: TLM and algorithmic. The algorithmic view models the expected behavior of the circuit without taking into account how it is implemented.

Fig. 2: Unicad System-to-RTL flow (from the customer system specification through algorithmic, TLM and RTL views, supporting embedded software development, SoC performance analysis and SoC verification)

The SoC architecture view (i.e. the SoC TLM platform) captures all information required to program the embedded software of the circuit, using SystemC [5]; a minimal sketch of what such a programmer's-view component looks like is given below. The SoC micro-architecture view (i.e. the SoC RTL platform) captures all information that enables cycle-accurate simulation. Most often it is modeled at the register-transfer level (RTL) using VHDL or Verilog. Such models are almost always available because they are used as input for logic synthesis.
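To give a flavor of a SoC TLM platform component at this level, here is a minimal programmer's-view sketch of a memory-mapped peripheral. The bus interface and the timer register map are invented for illustration and are not taken from the Unicad flow; the point is that only what firmware can observe (address-mapped registers, no timing) is modeled.

```cpp
#include <systemc.h>

// Hypothetical PV bus interface: address-mapped read/write only.
struct pv_bus_if : virtual sc_interface {
    virtual void     write(unsigned addr, unsigned data) = 0;
    virtual unsigned read(unsigned addr) = 0;
};

// A PV model of a simple timer peripheral (register map invented).
struct timer_pv : sc_module, pv_bus_if {
    enum { CTRL = 0x0, LOAD = 0x4, VALUE = 0x8 };
    unsigned ctrl, load, value;

    SC_CTOR(timer_pv) : ctrl(0), load(0), value(0) {}

    virtual void write(unsigned addr, unsigned data) {
        switch (addr) {
        case CTRL: ctrl = data; break;               // firmware enables/disables here
        case LOAD: load = data; value = data; break; // reload value, also resets counter
        }
    }
    virtual unsigned read(unsigned addr) {
        switch (addr) {
        case CTRL: return ctrl;
        case LOAD: return load;
        default:   return value;                     // VALUE and anything else
        }
    }
};
```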
VI. DESIGN FLOW FOR HANDHELD MOBILE TERMINAL AT INTEL

A modern handheld mobile terminal includes two major elements, the communication subsystem and the application subsystem. The communications subsystem – sometimes referred to as the "modem" – handles the data flow between the antenna and the digital bit stream. The application subsystem can best be described as a general purpose computer platform running user applications. Historically, the communications subsystem has been the more challenging to develop and has received more attention. Its performance, being easily measurable, is dictated by international standards which are strictly enforced by powerful mobile carriers. Following many years of academic research and industrial experience, the art of modem design and validation has progressed to a stage where its performance can be specified, modeled and verified to sub-dB resolution. Starting from a floating-point model and moving to a fixed-point model, the modem performance is simulated in the presence of a noisy channel. The fixed-point model is then manually transformed into an implementation, partially DSP firmware and partially dedicated digital logic circuitry. The fixed-point model is bit-exact, i.e. at the same abstraction level as the actual implementation (a sketch of the quantization involved is given at the end of this section). Written in plain C and running on a small Linux computer farm, the simulation speed is adequate for the development team's needs.

The application subsystem environment is quite different. Its processing needs are not as regular and predictable, and are heavily influenced by compound end-user usage scenarios. There are no pre-set minimum performance targets governed by a regulatory body, nor are there any established benchmarks. Historically the approach was to bundle an available CPU core and memory subsystem and accept the resulting performance ("you get what you get"). While this was sufficient for the first generation of data-enabled phones, it is no longer adequate for modern 3G devices with their heavy multimedia workloads. As vendors differentiate themselves by optimizing in multiple dimensions (power, cost, speed), a proper modeling infrastructure becomes essential. Such an infrastructure includes the following basic elements: modeling engine, model, workload collection and results analysis. The modeling engine is sometimes referred to as the "simulator".

Modeling of silicon can be done at different abstraction levels. RTL modeling is very detailed and allows direct synthesis to gates and layout. RTL simulation is useful for validation at the module level and above; however, the slow speed makes it practically impossible to use a complete chip RTL for system simulations. At the other end of the possible range of abstractions is functional simulation, accurately modeling the instruction execution flow of a processor. This is much faster and useful for software development, but not for performance analysis. In between these alternatives lie Transaction Level Models (TLM), which include timing information. Mixed-mode modeling is useful to allow the inclusion of RTL models into the system model: although very slow, including an RTL unit allows cross-verification of the two models.
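To illustrate the floating-point to fixed-point step, here is a plain C++ sketch of Q1.15 quantization; the format is chosen arbitrarily for the example, and the function names are invented. The bit-exact property means that exactly this arithmetic, and no other, defines the model's behavior.

```cpp
#include <cstdint>

// Saturating conversion of a double in [-1, 1) to Q1.15 fixed point.
int16_t to_q15(double x) {
    double scaled = x * 32768.0;              // scale by 2^15
    if (scaled >=  32767.0) return  32767;    // saturate on positive overflow
    if (scaled <= -32768.0) return -32768;    // saturate on negative overflow
    return static_cast<int16_t>(scaled);      // truncate towards zero
}

// Q1.15 multiply with rounding, as DSP firmware or dedicated logic
// would perform it. (The -32768 * -32768 corner case would need
// saturation in a real implementation.)
int16_t q15_mul(int16_t a, int16_t b) {
    int32_t p = static_cast<int32_t>(a) * b;             // full-precision product
    return static_cast<int16_t>((p + (1 << 14)) >> 15);  // round, then rescale
}
```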
VII. SYSTEMC APPLICATION AT CHG

Intel's Cellular and Handheld Group (CHG) has chosen SystemC as its standard modeling language [2]. The technology of developing a functional model is familiar to most engineers and has been an established practice in the world of CPU design for decades. Likewise, RTL modeling is well understood and used for silicon design. C is typically used to develop functional models, and converting such a model to SystemC is a simple exercise. Converting Verilog or VHDL RTL to SystemC is also simple (or can be avoided by mixed-mode simulation). Unfortunately, neither the functional nor the RTL level is the right abstraction level for effective system modeling. It is the TLM level that brings the most rewards, but it also poses the most challenges in model development. Design engineers are not used to thinking at this abstraction level, and trade-offs exist between development speed, accuracy and runtime speed. The required accuracy level, lower than 100%, has to be derived empirically. Lower-accuracy models are obviously quicker to develop and run faster, but when validation and maintenance costs are considered the decision becomes more complicated. TLM technology is evolving and there are no agreed classifications as yet. CHG developed its own TLM, which can be characterized as "cycle counting accurate". This TLM provides higher simulation speed at the cost of slightly lower accuracy; it complements a commercially available "cycle accurate" TLM.

Having settled on the simulator and the model, the next step is to develop the input stimuli (workload). In the past, system models were rarely constructed. Only very simple benchmarks, like Dhrystone or MPEG decode, were run on simple CPU models to assess CPU performance. Since small benchmarks typically fit inside the instruction cache, the impact of external program memory can be completely overlooked. To verify that a cellular handset will not drop a call or that a graphics-intensive game will perform properly, it is necessary to port real-world software and operating systems to the model. This places a lower bound on the simulation speed (around 1 MHz); an operating system boot should not take more than a few minutes, otherwise software debug becomes impractical. Porting large software packages must also be supported by a proper software debugging environment.

Finally, having the system running, processing workloads and producing output (for example a graphics frame image) enables performance optimization. To do this, peeking into internal system nodes and resources is necessary. Typically, information about interconnect and buffer usage is needed, and runtime statistics have to be collected (a sketch of such instrumentation is given at the end of this section). Analysis of this data allows locating bottlenecks or identifying redundant resources. Since the SystemC model, at this point, is just a compiled C program, it is tempting to think that generic C code debuggers can do the job. While true to some extent, a generic C debugger is not aware of the higher-level SystemC constructs. Dedicated, commercial SystemC-aware analysis tools provide significant value at this step.
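As an illustration of such instrumentation, the following sketch accumulates transaction counts and an assumed per-transfer cost for a modeled interconnect; all names and the 10 ns figure are invented for the example.

```cpp
#include <systemc.h>
#include <iostream>

// A monitor fed by a dedicated notification FIFO: the interconnect
// model pushes one token per transfer, keeping the observation
// mechanism out of the functional data path.
SC_MODULE(bus_monitor) {
    sc_fifo_in<unsigned> traffic;   // tap on the modeled interconnect

    unsigned long transactions;
    sc_time       busy;             // accumulated transfer time

    void count() {
        for (;;) {
            traffic.read();               // consume one observed transaction
            ++transactions;
            busy += sc_time(10, SC_NS);   // assumed cost per transfer
        }
    }

    // Called by the testbench after sc_start() returns.
    void report() const {
        std::cout << name() << ": " << transactions
                  << " transactions, " << busy << " busy" << std::endl;
    }

    SC_CTOR(bus_monitor) : transactions(0), busy(SC_ZERO_TIME) {
        SC_THREAD(count);
    }
};
```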
VIII. CONCLUSION

The success criterion for any modeling-related investment is the impact it has on the product architecture. To have an impact, the analysis results must be available early enough in the product development process, when trade-offs can still be made. In practice the chip design team will not wait for modeling results and will make ad-hoc decisions as needed to meet the chip design and tape-out schedule. Timely delivery depends on proper schedule planning, modeling tool ("engine") selection and model development strategy. Schedule planning means the modeling activity has to start early: the point at which the design team is already struggling with system performance issues is too late to start and will not result in a real architectural impact. Therefore the ability to develop models in-house is essential; the turnaround for custom model development by third parties is such that models quickly become stale. Attempts to bring the modeling engine development in-house were abandoned because the large investment could not be justified once SystemC tools became available externally. An early project engagement during the first requirements collection phase is supported by a standard SystemC-based tool infrastructure and in-house model development for non-standard IP modules.

TLM, especially in connection with SystemC, is a new modeling and design style to support this process. The remaining problem is, as described above, the mixing of different modeling technologies, which is always very hard. Especially for the purposes of performance analysis and verification, lower-level models with detailed timing are very important. One aspect of both verification and performance analysis is replacing components written at a higher level of abstraction with those written at a lower level, and finally with RTL. For this to work, it is clearly preferable to use a model written using similar technology, such as a FIFO-based interface model. What becomes important from an IP provider's perspective is the ability to support all the possible combinations of design flows and use models. Fortunately, the TLM working group has identified only two basic technologies: a function call paradigm suitable for "PV" and "PVT" models, and a FIFO interface paradigm. IP providers like ARM can then satisfy most of the people most of the time by providing two basic classes of models. For EDA vendors, the challenges are not only to provide means by which models can be progressively refined through different abstraction levels, but also to provide means by which models of different technologies can be deployed together.

REFERENCES

[1] A. Hoffmann, H. Meyr and R. Leupers, Architecture Exploration for Embedded Processors with LISA. Kluwer Academic Publishers, 2002.
[2] T. Groetker, S. Liao, G. Martin and S. Swan, System Design with SystemC. Kluwer Academic Publishers, 2002.
[3] W. Müller, W. Rosenstiel and J. Ruf (Eds.), SystemC: Methodologies and Applications. Kluwer Academic Publishers, 2003.
[4] A. Clouard, K. Jain, F. Ghenassia, L. Maillet-Contoz and J. P. Strassen, "Using transactional level models in a SoC design flow", in [3].
[5] SystemC 2.0 Functional Specification, Open SystemC Initiative, 2000.