ECE 448 Lecture 21
FPGA Platforms
High Level Language (HLL) Design Flows
ECE 448 – FPGA and ASIC Design with VHDL
George Mason University

Resources
• USB: http://en.wikipedia.org/wiki/USB
• PCI: http://en.wikipedia.org/wiki/PCI_Local_Bus
• PCI-X: http://en.wikipedia.org/wiki/PCI-X
• PCIe: http://en.wikipedia.org/wiki/PCI_Express

Resources
• Clive "Max" Maxfield, The Design Warrior's Guide to FPGAs, Chapter 11: C/C++ etc.-Based Design Flows
• T. El-Ghazawi, K. Gaj, D. Buell, D. Pointer, Reconfigurable Supercomputing, tutorial at the Supercomputing 2005 conference, http://hpcl.seas.gwu.edu/openfpga/tutorial_html/index.html

FPGA Device Capacity Trends
Xilinx device complexity, 1985 to 2006 (chart data: family, clock frequency, gate count; families listed roughly in order of introduction):
• XC2000: 50 MHz, 1K gates
• XC3000: 85 MHz, 7.5K gates
• XC4000: 100 MHz, 250K gates
• XC5200: 50 MHz, 23K gates
• Spartan: 80 MHz, 40K gates
• Virtex: 200 MHz, 1M gates
• Virtex-E: 240 MHz, 4M gates
• Spartan-II: 200 MHz, 200K gates
• Virtex-II: 450 MHz, 8M gates
• Virtex-II Pro: 450 MHz, 8M gates*
• Spartan-3: 326 MHz, 5M gates
• Virtex-4: 500 MHz, 16M gates*
• Virtex-5: 550 MHz, 24M gates*
Source: http://class.ece.iastate.edu/cpre583/lectures/Lect-01.ppt

Prices of the Most Recent Families of Xilinx FPGAs
Low-cost:
• Spartan-3: < $130*
• Spartan-3E: < $35*
High-performance:
• Virtex-II, Virtex-II Pro: < $3,000*
• Virtex-4, Virtex-5: < $3,000*
* Approximate per-unit cost of the largest device in the family, for a batch of 10,000 units.

FPGA Families
Xilinx:
• Low-cost: Spartan-3, Spartan-3E, Spartan-3A, Spartan-3AN, Spartan-3A DSP, Spartan-6
• High-performance: Virtex-4 LX/SX/FX, Virtex-5 LX/LXT/SXT/FXT, Virtex-6
Altera:
• Low-cost: Cyclone II, Cyclone III, Arria, Arria II
• High-performance: Stratix II, Stratix II GX, Stratix III L/E, Stratix IV E/GX/GT

Virtex-4 (figure; source: Xilinx, Inc.)
Virtex-5 Family Platforms (figure)

FPGA Boards

General Architecture of an FPGA-Based Board (figure)
Components: processing elements PE#0, PE#1, ..., PE#N-1, each with its own local memory; a common memory / interconnect network; a bus interface controller on the host bus; a clock (CLK); and an I/O card.

Reconfigurable Computing Boards
• Boards may have one or several interconnected FPGA chips.
• Boards support different bus standards, e.g., PCI, PCI-X, PCIe, USB.
• Boards may have direct real-time data I/O through a daughter board.
• Boards may have local on-board memory (OBM) to handle large data sets while avoiding the system bus (e.g., PCI) bottleneck.

Reconfigurable Computing Boards (cont.)
• Many boards per node can be supported.
• A host program (e.g., in C) interfaces the user (and the microprocessor, μP) with the board through the board's API.
• Driver API functions may include Reset, Open, Close, Set Clocks, DMA, Read, Write, Download Configurations, Interrupt, and Readback.

Universal Serial Bus (USB)
USB supports three data rates:
• Full speed: 12 Mb/s (1.5 MB/s), defined by USB 1.0.
• Low speed: 1.5 Mb/s, also defined by USB 1.0. Very similar to full-speed operation, except that each bit takes 8 times as long to transmit. Devices that use low speed include keyboards, mice, and joysticks.
• High speed: 480 Mb/s (60 MB/s), defined by USB 2.0.

Digilent: BASYS
• FPGA: Spartan-3E (XC3S100E/XC3S250E) in a TQ144 package
• Price: $59 to $69
• Interfaces: USB port
• Memory: XCF02 Platform Flash ROM
• Ethernet: none
• Configuration: through JTAG via the JTAG3 parallel cable, or through USB using the Digilent Adept Suite software
• Applications: academic use, as a teaching aid in digital logic design courses
• URL: http://www.digilentinc.com/Products/Detail.cfm?Prod=BASYS&Nav1=Products&Nav2=Programmable

Digilent: Spartan-3E Starter Board
• FPGA: Spartan-3E (XC3S500E)
• Price: $149
• Interfaces: USB port
• Memory: XCF04 Platform Flash for storing FPGA configurations, 16 Mb serial Flash, 128 Mb StrataFlash, 256 Mb DDR SDRAM
• Ethernet: 10/100 Ethernet PHY
• Configuration: JTAG programming via the on-board USB port; JTAG and SPI Flash programming with a parallel or USB JTAG cable
• Applications: general prototyping
• URL: http://www.digilentinc.com/Products/Detail.cfm?Prod=S3EBOARD&Nav1=Products&Nav2=Programmable

Xilinx: Spartan-3A Starter Kit
• FPGA: Spartan-3A (XC3S700A-FG484)
• Price: $189
• Interfaces: JTAG USB download board
• Memory: 256 MB DDR2 SDRAM, 32 Mb parallel Flash, 4 Mb Platform Flash PROM, 2 to 16 Mb SPI Flash devices
• Ethernet: 10/100 Ethernet PHY
• Configuration: via JTAG using the USB port, from the Platform Flash PROM, or from SPI Flash memory
• Applications: general prototyping
• URL: http://www.xilinx.com/products/devkits/HW-SPAR3A-SK-UNI-G.htm

Common Interface: PCI
PCI = Peripheral Component Interconnect
(figure: 32-bit and 64-bit PCI buses)

Evolution of the PCI Interface (figure)

Disadvantages of PCI & PCI-X:
• All PCI devices in the system share a bus of fixed width.
• No data prioritization: important data can get caught in the bottleneck.
• Interference and signal degradation are common in parallel connections.
• Poor materials and crosstalk from nearby wires translate into noise, which slows the connection down.

PCI Express (PCIe):
• Not a bus like PCI or PCI-X; communication is based on the concept of lanes.
• A lane is a serial, bi-directional, point-to-point connection.
• Lanes are full duplex.
• A single lane transfers one bit per cycle in each direction.
• Link widths: x1, x2, x4, x8, x16, x32 lanes.
• Prioritization of data, which allows the system to move the most important data first and helps prevent bottlenecks.
• Improvements in the physical materials used to make the connections.
• Better handshaking and error detection.
• Better methods for breaking data into packets and reassembling the packets.

Xilinx: Virtex-5 LXT/SXT/FXT ML50x Evaluation Platform
• FPGA: Virtex-5 LXT/SXT/FXT (LX50T/SX50T/FX70T-1FFG1136)
• Price: $1,195
• Interfaces: x1 PCI Express; SFP, SMA, and SATA connectors
• Memory: DDR2 SODIMM (256 MB), 1 MB SRAM, 32 MB linear Flash
• Ethernet: one tri-mode Ethernet port
• Configuration: through the on-board System ACE controller, PROM, linear Flash, or SPI Flash memory; can also be downloaded via JTAG through a Xilinx download cable
• Applications: high-speed design, DSP, embedded design, image processing, etc.
• URL: http://www.xilinx.com/products/devkits/HW-V5-ML505-UNI-G.htm

Xilinx: Virtex-5 FXT ML510 Embedded Development Platform
• FPGA: Virtex-5 FXT (XC5VFX130T-2FFG1738)
• Price: $3,100
• Interfaces: two PCIe downstream connectors; four 32-bit, 33 MHz PCI connectors; two SATA connectors
• Memory (512 MB): 512 MB CompactFlash card, two 72-bit DDR2 DIMMs
• Ethernet: two tri-mode Ethernet ports
• Configuration: through the on-board System ACE controller, with the configuration files stored on the CF card
• Applications: embedded design, high-speed design, digital video, telecom/datacom, etc.
• URL: http://www.xilinx.com/products/devkits/HW-V5-ML510-G.htm

DINI Group: DN9000K10 'Bride of Monster'
• FPGA: Virtex-5 LX330 (2 to 16 FPGAs per board)
• Price: $125,000 (with 16 LX330s)
• Interface: available MEG cards provide a PCI Express interface
• Memory: 6 DDR2 SODIMM sockets (up to 4 GB each)
• Ethernet: none
• Configuration: via CompactFlash, controlled by an on-board Cypress microprocessor, or via USB
• Applications: ASIC prototyping of logic and memory designs for a fraction of the cost of existing solutions
• URL: http://www.dinigroup.com/DN9000k10.php

FPGA Boards: Conclusions
• Boards with PCI Express are of much interest to the design community, because their high speeds make it possible to prototype high-speed serial systems.
• PCI as a communication interface is likely to become outdated within a few years, as demand grows for ever higher communication speeds and bandwidth.
• Boards with a PCI Express interface are relatively costly compared to those without it.
• High-performance Virtex-family FPGA boards range in price from $799 to $125,000; boards with PCI, PCI-X, or PCI Express interfaces start at $1,195.
• Low-cost Spartan-3-family FPGA boards range in price from $59 to $2,100.

Behavioral Synthesis

Behavioral Synthesis (figure)
Algorithm, I/O behavior, and target library → behavioral synthesis → RTL design → logic synthesis → gate-level netlist. The RTL design → logic synthesis → gate-level netlist portion is the classic RTL design flow.

Need for High-Level Design
• Higher level of abstraction
• Modeling of complex designs
• Reduced design effort
• Fast turnaround time
• Technology independence
• Ease of HW/SW partitioning

Advantages of Behavioral Synthesis
• Easy to model higher levels of complexity
• Smaller source code than the equivalent RTL code
• Generates RTL much faster than the manual method
• Multi-cycle functionality
• Loops
• Memory access

Different Levels of C/C++ Synthesis Abstraction (figure; abstraction decreases and implementation specificity increases from top to bottom)
• Untimed C domain (non-implementation-specific): pure C/C++, augmented C/C++ such as SystemC
• Timed C domain: augmented C/C++ such as SystemC
• RTL domain (implementation-specific): Verilog and VHDL
Source: The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows, ISBN 0750676043; copyright © 2004 Mentor Graphics Corp.
(www.mentor.com)

Pure Untimed C/C++ Design Flow (figure)
• Pure C/C++ source: non-implementation-specific, easy to create, fast to simulate, easy to modify.
• Pure C/C++ synthesis, with user interaction and guidance, auto-generates implementation-specific Verilog/VHDL RTL.
• ASIC target: RTL synthesis produces a gate-level netlist.
• FPGA target: RTL synthesis produces a LUT/CLB-level netlist.
Source: The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows, ISBN 0750676043; copyright © 2004 Mentor Graphics Corp. (www.mentor.com)

Mentor Graphics: Catapult C (figure)

Mentor Graphics: Catapult C
• Catapult C automatically converts untimed C/C++ descriptions into synthesizable RTL.

Hardware-Oriented High-Level Languages
• C-based system-level languages
  • Commercial
    • SystemC: The Open SystemC Initiative
    • Handel-C: Celoxica Ltd.
    • Impulse C: Impulse Accelerated Technologies
    • Carte C: SRC Computers
  • Research
    • Streams-C: Los Alamos National Laboratory
    • SA-C: Colorado State University; University of California, Riverside; Khoral Research, Inc.
    • SpecC: University of California, Irvine, and the SpecC Technology Open Consortium

Other High-Level Design Flows
• MATLAB-based
  • AccelChip DSP Synthesis: AccelChip
  • System Generator for DSP: Xilinx
• GUI data-flow-based
  • CoreFire: Annapolis Microsystems
• Java-based
  • Commercial
    • Forge: Xilinx
  • Research
    • JHDL: Brigham Young University

SystemC-Based Design-Flow Alternatives (figure)
SystemC can be synthesized directly (SystemC synthesis) into a gate-level netlist, or translated automatically (auto-RTL translation) into Verilog/VHDL RTL, which RTL synthesis then turns into a gate-level netlist. The RTL is implementation-specific, relatively slow to simulate, and relatively difficult to modify.

SystemC Evolution (figure)
Abstraction levels, from untimed to timed: system, algorithmic, behavioral/transaction-level, RTL. SystemC 1.0 covers the timed, RTL end; SystemC 2.0 extends coverage to the untimed, system-level end.
Source: The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows, ISBN 0750676043; copyright © 2004 Mentor Graphics Corp. (www.mentor.com)

Handel-C Overview
• High-level language based on ISO/ANSI C for the implementation of algorithms in hardware
• Allows software engineers to design hardware without retraining
• Clean extensions for hardware design, including flexible data widths, parallelism, and communications
• Well-defined timing model: each assignment (and delay) statement takes a single clock cycle
• Includes extended operators for bit manipulation, and high-level mathematical macros (including floating point)

Handel-C/ANSI-C Comparisons (figure: two overlapping sets)
• ANSI-C only: ANSI-C standard library, recursion, floating point
• Both ANSI-C and Handel-C: preprocessors (i.e., #define), pointers, structures, arrays, ANSI-C constructs (for, while, if, switch), bitwise logical operators, logical operators, arithmetic operators, functions
• Handel-C only: Handel-C standard library, parallelism, arbitrary-width variables, enhanced bit manipulation, signals, RAM/ROM, interfaces

Handel-C Design Flow (figure)
Executable specification → Handel-C → either VHDL → synthesis → EDIF, or EDIF directly → place & route.

Type Summary (Handel-C)
• char, unsigned char: 8 bits
• short, unsigned short: 16 bits
• long, unsigned long: 32 bits
• int, unsigned int: width inferred by the compiler
• int n, unsigned int n, unsigned n: n bits

Arrays
• Declared the same way as in ANSI-C:
    int 6 x[7];                  7 registers, each 6 bits wide
    unsigned int 6 x[4][5][6];   120 (4 × 5 × 6) registers, each 6 bits wide
• The index must be a compile-time constant. If random access is required, consider using RAM or ROM.

Internal RAMs and ROMs
• Declared with the ram and rom keywords:
    ram int 6 a[43];    a RAM with 43 entries, each 6 bits wide
    rom int 16 b[4];    a ROM with 4 entries, each 16 bits wide
• RAMs and ROMs are accessed the same way as arrays in ANSI-C.
• The index need not be a compile-time constant.

Restrictions on RAMs and ROMs
• RAMs and ROMs are restricted to performing operations sequentially.
  Only one element may be addressed in any given clock cycle, so both statements below are illegal:
    ram unsigned int 8 x[4];
    x[1] = x[3] + 1;          /* illegal: reads x[3] and writes x[1] in the same cycle */
    if (x[0] == 0) x[1] = 1;  /* illegal: reads x[0] and writes x[1] in the same cycle */

Multi-port RAMs
    static mpram Fred
    {
        ram <unsigned 8> ReadWrite[256];  /* read/write port */
        rom <unsigned 8> Read[256];       /* read-only port */
    }
  With two ports, one element can be read and another written in the same clock cycle.

Handel-C Language
• Each assignment and delay statement takes one clock cycle.
• The state machine is generated automatically from an algorithmic description of the circuit in terms of parallel and sequential blocks.
• Parallel and sequential blocks are scheduled automatically: the code following a group is scheduled only after that whole group has completed.

Handel-C vs. C: Functions
• Functions may not be called recursively, since all logic must be expanded at compile time to generate hardware.
• Functions can only be called in expression statements, and those statements must not contain any other calls or assignments.
• Variable-length parameter lists are not supported.
• Old-style (pre-ANSI, K&R) C function declarations, in which the parameter types are not specified, are not supported.
• main() functions take no arguments and return no values.
• Each main() function is associated with a clock. If there is more than one main() function in the same source file, they must all use the same clock.
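The scheduling rules on the Handel-C Language slide imply a simple cycle-count model: a sequential block costs the sum of its statements, and a par block costs as much as its slowest branch. The C sketch below is a hypothetical host-side model of those two rules only; the names Stmt, StmtKind, and cycles are illustrative and not part of any Handel-C tool. It deliberately ignores loops, conditions, and channel communication, which can add wait cycles in real Handel-C code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side model of the Handel-C timing rules (not a Handel-C
   tool API). Leaves are assignment or delay statements; inner nodes are
   sequential or par blocks. */
typedef enum { STMT_ASSIGN, STMT_SEQ, STMT_PAR } StmtKind;

typedef struct Stmt {
    StmtKind kind;                       /* STMT_ASSIGN also stands for delay */
    const struct Stmt *const *children;  /* sub-statements of SEQ/PAR blocks */
    size_t count;                        /* number of children */
} Stmt;

/* Clock cycles needed to run s under the simplified rules:
   - an assignment or delay statement takes exactly 1 cycle;
   - a sequential block takes the sum of its statements;
   - a par block takes as long as its slowest branch, because the code
     after the block starts only when every branch has completed.
   Loops, conditions and channel waits are not modeled.
   Recursion is fine here: this model runs on the host, not in Handel-C. */
static unsigned cycles(const Stmt *s)
{
    assert(s != NULL);
    if (s->kind == STMT_ASSIGN)
        return 1;

    unsigned total = 0;
    for (size_t i = 0; i < s->count; i++) {
        unsigned c = cycles(s->children[i]);
        if (s->kind == STMT_SEQ)
            total += c;      /* sequential: costs add up */
        else if (c > total)
            total = c;       /* par: keep the longest branch */
    }
    return total;
}
```

For example, a par block whose branches are a three-assignment sequence and a single assignment takes max(3, 1) = 3 cycles, and one more assignment after it brings the total to 4.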
Celoxica Handel-C
+ very easy to learn and use
+ based on ANSI C, with hardware extensions
+ hides implementation details
+ very flexible: no limitation on parallelism or data types; extended operators for bit manipulation
+ well-defined timing model
+ portable to a wide range of FPGA devices
- legacy C code requires rewriting
- each assignment takes one clock cycle to execute

Handel-C Example: Polyphase Filter (excerpt)
(figure: polyphase filter bank; the input x[n] feeds a chain of z^-1 delays, and each branch contains a block labeled 32 followed by a subfilter G0(z), G1(z), ..., G31(z))
    void polyphase()
    {
        ram int IN_WIDTH pin0_0[2], pin0_1[2], pin0_2[2], pin0_3[2];
        ram int IN_WIDTH pin1_0[2], pin1_1[2], pin1_2[2], pin1_3[2];
        ram int IN_WIDTH pin2_0[2], pin2_1[2], pin2_2[2], pin2_3[2];
        ...
        while (1)
        {
            par
            {
                padd0_0[half] = (pmult0_0[half][15] @ (pmult0_0[half] \\ 7))
                              + (pmult0_1[half][15] @ (pmult0_1[half] \\ 7));
                padd0_1[half] = (pmult0_2[half][15] @ (pmult0_2[half] \\ 7))
                              + (pmult0_3[half][15] @ (pmult0_3[half] \\ 7));
                pmult0_0[half] = 0;
                pmult0_1[half] = -7 * (pin0_1[half][7] @ pin0_1[half][7] @ pin0_1[half][7] @ pin0_1[half][7]
                                     @ pin0_1[half][7] @ pin0_1[half][7] @ pin0_1[half][7] @ pin0_1[half][7]
                                     @ pin0_1[half]);
                pmult0_2[half] = 109 * (pin0_2[half][7] @ pin0_2[half][7] @ pin0_2[half][7] @ pin0_2[half][7] @ ...
                ...
                if (half)
                {
                    par
                    {
                        output[0] ! (((padd0_0[1][9] @ padd0_0[1]) + (padd0_1[1][9] @ padd0_1[1])) \\ 3);
                        ...
In Handel-C, @ concatenates bit vectors, v[k] selects bit k, \\ drops the given number of least significant bits, and ! writes a value to a channel; concatenating copies of bit 7 in front of an operand sign-extends it.

Reconfigurable Supercomputers

What is a Reconfigurable Computer? (figure)
A microprocessor system (several μPs with memory and I/O) connected through an interface to a reconfigurable system (several FPGAs with memory and I/O).

Most Advanced Reconfigurable Computing Machines Currently on the Market
• SRC 6 from SRC Computers: released 2002
• Cray XD1 from Cray: released 2005
• SGI Altix from SGI: released 2005
• SRC 7 from SRC Computers, Inc.: released 2006

Pros and Cons of Reconfigurable Computers
+ can be programmed by mathematicians and scientists themselves, using high-level programming languages such as C
+ facilitates hardware/software co-design
+ shortens development time; encourages experimentation and complex optimizations
+ allows sharing costs among users of various applications
- high entry cost (~$100,000)
- hardware-aware programming
- limited portability
- limited availability of libraries
- limited maturity of tools

Two Major High-Level Language (HLL) Programming Models
• SRC MAP C programming model: SRC 6 and SRC 7 from SRC Computers
• Mitrion-C programming model: Cray XD1 from Cray, SGI Altix from SGI

SRC Programming Model (figure)
• main.c, written in ANSI C, runs on the microprocessor and calls function_1() and function_2(), which run on FPGAs.
• Each FPGA function is written in MAP C (a subset of ANSI C) and invokes macros from libraries of macros (macro_1, macro_2, macro_3, macro_4, ...); the macros themselves are written in VHDL.
• Example: function_1 calls macro_1(a, b, c), macro_2(b, d), and macro_2(c, e), so input a flows through Macro_1 to b and c, which feed two Macro_2 instances producing d and e; function_2 calls macro_3(s, t), macro_1(n, b), and macro_4(t, k). Data enters and leaves each FPGA through I/O.

SRC Compilation Process (figure)
• Application sources (.c or .f files) are compiled by the μP compiler into object files (.o files).
• MAP sources (.mc or .mf files) are compiled by the MAP compiler into object files (.o files) and HDL sources (.v files).
• The HDL sources, together with macro sources (.vhd or .v files), go through logic synthesis into netlists (.ngo files), and then through place & route into configuration bitstreams (.bin files).
• The linker combines the object files and configuration bitstreams into the application executable.

Library Development: SRC (figure)
• μP system: library developers use a low-level language (LLL, assembly) or an HLL (C, Fortran); application programmers use an HLL (C, Fortran).
• FPGA system: library developers use an HDL (VHDL, Verilog) or an HLL (C, Fortran); application programmers use an HLL (C, Fortran).

SRC Programming Environment
+ very easy to learn and use
+ standard ANSI C
+ hides implementation details
+ very well integrated environment
+ mature: in production use for over 4 years, with constant improvements
- subset of C
- legacy C code requires rewriting
- limitations of C in describing hardware (parallelism, data types)
- closed environment; limited portability of code to hardware platforms other than SRC

Application Development for Reconfigurable Computers

Application Development for Reconfigurable Computers (figure)
Program entry → platform mapping → compilation → execution, with debugging & verification throughout.

Program Entry (figure): the program

Platform Mapping: SW/HW Partitioning (figure)
The program is partitioned into software (executed in the microprocessor system) and hardware (executed in the reconfigurable processor system).

SW/HW Partitioning & Coding: Traditional Approach (figure)
Specification → SW/HW partitioning → SW coding and HW coding → SW compilation and HW compilation → SW profiling and HW profiling.

SW/HW Partitioning & Coding: New Approach (figure)
Specification → SW/HW coding → SW/HW partitioning → SW compilation and HW compilation → SW profiling and HW profiling.

Platform Mapping: FPGA Mapping (figure)
The hardware part of the program is mapped onto FPGA 1 to FPGA 4; the software part runs on the microprocessor.

Platform Mapping: FPGA-FPGA Data Transfer & Synchronization (figure)
The same partition, now with data transfers and synchronization among FPGA 1 to FPGA 4.

Platform Mapping: Use of Internal and External Memories (figure)
The hardware on FPGA 1 to FPGA 4 uses on-chip memory (OCM), local memory (LM), and shared memory (SM).

Platform Mapping: I/O (figure)
As above, with I/O added; example platforms: SRC, StarBridge.

Ideal Program Entry (figure): function → program entry

Actual Program Entry (figure)
Besides the function itself, program entry must specify: SW/HW partitioning, the SW/HW interface, FPGA mapping, data transfers & synchronization, use of internal and external memories, the sequence of run-time reconfigurations, use of FPGA resources (multipliers, μP cores), and preferred architectures.

Evolution and the Current Status of Tools (figure)
Each task is classified as not supported, manual entry, or compiler-automated. Tasks: μP-FPGA partitioning, FPGA-FPGA partitioning, μP-FPGA data transfer, FPGA-FPGA data transfer, computation-data transfer overlapping, choosing component version, ...

Summary
• Mapping algorithms onto reconfigurable computing systems is a parallel-processing problem.
• Languages for reconfigurable computers range from high-level C/Java, to schematics, to hardware description languages.
• Compilers face a daunting task: extracting instruction-level parallelism (ILP), pipelining loops, unrolling loops, and trading off area against speed.
• Current tool chains have many components unfamiliar to software developers.
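The Summary lists loop unrolling among the transformations a compiler for reconfigurable computers must perform. As a rough illustration only, written in plain ANSI C rather than MAP C, Handel-C, or any other HLL, and with hypothetical function names (dot_rolled, dot_unrolled4), here is a dot product before and after unrolling by a factor of 4. In hardware, the four independent multiply-accumulates in the unrolled body could map to four multipliers working in parallel, needing roughly a quarter of the iterations at the cost of about four times the multiplier area: an instance of the area/speed trade-off.

```c
#include <assert.h>
#include <stddef.h>

/* Rolled loop: one multiply-accumulate per iteration. */
static long dot_rolled(const int *a, const int *b, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (long)a[i] * b[i];
    return sum;
}

/* Same computation unrolled by 4. The four partial sums do not depend on
   each other, which exposes the instruction-level parallelism a compiler
   for reconfigurable hardware tries to extract. A remainder loop handles
   lengths that are not a multiple of 4. */
static long dot_unrolled4(const int *a, const int *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += (long)a[i]     * b[i];
        s1 += (long)a[i + 1] * b[i + 1];
        s2 += (long)a[i + 2] * b[i + 2];
        s3 += (long)a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)           /* remainder iterations */
        s0 += (long)a[i] * b[i];

    return (s0 + s1) + (s2 + s3); /* combine the partial sums */
}
```

Because integer addition is associative, both versions return the same result (for example, 84 for the vectors 1, 2, ..., 7 and 7, 6, ..., 1). With floating-point data, reordering the additions can change rounding, which is one of the constraints such a compiler has to respect.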