Ph.D. Research Plan Presentation

Anup Gangwar
Embedded Systems Group (http://www.cse.iitd.ac.in/esproject)
Department of Computer Science & Engineering
Indian Institute of Technology Delhi
June 11, 2002

Presentation Outline
• Introduction and motivation
• Specialization opportunities in VLIW processors
• Methodology
• Validation framework (supporting tools required)
• Work plan
• Status of work
• References

Research Plan Presentation, June 11, 2002 http://www.cse.iitd.ac.in/esproject Slide 2

Introduction
• Why customize architectures?
  • General purpose computing domain vs. embedded
  • Customization leads to cheaper design solutions
• Architectural choices for exploiting ILP
  • Superscalar processors
    • Try to extract ILP at run time, so complex hardware
    • Limited clock speeds and high power dissipation
    • Not suited for embedded type of applications
  • VLIW processors
    • Compiler has a lot of knowledge about the hardware
    • Compiler extracts ILP statically, so simplified hardware
    • Possible to attain higher clock speeds

Introduction - Problems with VLIW Processors
• Complex compiler required for extracting ILP
• Adequate hardware support needed for compiler controlled execution
• Code size expansion due to explicit NOPs if
  • The application does not contain enough parallelism
  • The compiler is not able to extract parallelism from the application
• Need for good instruction encoding and NOP compression schemes

Specialization Opportunities -> FUs
Functional Unit Types
• MISO or Multiple Input Single Output
• MIMO or Multiple Input Multiple Output
• MIMO with LD/ST or MIMOs with memory interaction
• Rigid or flexible I/O timeshapes

NAME             Inputs and Sources          Outputs and Dests.          I/O Policy
MISO             Multiple (Regfile)          Single (Regfile)            Flexible or Rigid
MIMO             Multiple (Regfile)          Multiple (Regfile)          Flexible or Rigid
MIMO with LD/ST  Multiple (Regfile or Mem.)  Multiple (Regfile or Mem.)  Flexible or Rigid for Reg. and block LD/ST for mem.

Specialization Opportunities -> Reg. File
• Single register file organization doesn’t scale well
  • Area grows as N^3
  • Delay grows as N^(3/2)
  • Power grows as N^3
  where N is the no. of Functional Units connected to the register file
• Clustered VLIW architectures are the solution
  • Each FU can read from/write to only a subset of registers
  • Data copying may increase execution latency
• Powerful application analysis required to overcome above mentioned problems
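
The scaling figures above can be turned into a rough back-of-the-envelope comparison. The sketch below is only an illustration under strong assumptions: it takes the N^3, N^(3/2) and N^3 growth rates from the slide as exact, uses unit constants, splits the FUs evenly over the clusters, and ignores the cost of the inter-cluster interconnect and of the extra copy operations. The function names `monolithic_cost` and `clustered_cost` are hypothetical and do not come from the presentation.

```python
def monolithic_cost(n_fus):
    """Relative cost of one register file shared by n_fus functional units.

    Assumes the growth rates quoted on the slide with unit constants:
    area ~ N^3, delay ~ N^(3/2), power ~ N^3.
    """
    n = float(n_fus)
    return {"area": n ** 3, "delay": n ** 1.5, "power": n ** 3}


def clustered_cost(n_fus, n_clusters):
    """Relative cost when the FUs are split evenly over n_clusters register files.

    Assumption: interconnect and inter-cluster copy overheads are ignored,
    so this is an optimistic bound rather than a real estimate.
    """
    if n_clusters <= 0 or n_fus % n_clusters != 0:
        raise ValueError("n_fus must divide evenly into n_clusters")
    per_cluster = monolithic_cost(n_fus // n_clusters)
    return {
        "area": n_clusters * per_cluster["area"],    # all cluster files together
        "delay": per_cluster["delay"],               # access time of one small file
        "power": n_clusters * per_cluster["power"],
    }


if __name__ == "__main__":
    single = monolithic_cost(8)
    split = clustered_cost(8, 4)
    for key in ("area", "delay", "power"):
        print(key, single[key], split[key])
```

Under these assumptions, eight FUs on one register file are compared against four clusters of two FUs each; the gap shrinks once copy operations and interconnect are accounted for, which is why the slide asks for application analysis.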
[Figure: A Clustered VLIW Architecture]

Specialization Opportunities -> Interconnect
• Clustering FUs together requires deciding ICN
  • between different clusters
  • between clusters and memory
• Analysis of data access patterns required for evaluating cost-performance tradeoffs
• Current ASIP vendors do not offer customizable interconnects

Specialization Opportunities -> Encoding
• Instruction encoding/decoding scheme affects
  • Code size
  • Object code compatibility
  • Branch misprediction penalty
  • Hardware cost
• Address specification in code size
• Each UniOp is equivalent to a RISC/CISC instruction
[Figure: a MultiOp made up of four UniOps]
[Figure: NOPs in a MultiOp; slot labels IALU.0, IALU.1, FALU.0, BU.0; operations ADD, NOP, FMUL, NOP]
[Figure: VLIW Processor Pipeline with Instruction Decompressor]

Specialization Opportunities -> Summary

Existing Methodologies -> Simulation Driven

VLIW ASIP Synthesis Methodology
[Figure: flow with blocks Task Set and Constraints; Architecture Description; Application Parameter Extraction; Architecture Design Space Exploration; Retargetable Compiler; Instruction Encoding Specialization; Validation (Simulation with encoded instructions); Architecture Description (Output to synthesizer)]
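
The instruction encoding specialization step in the flow above targets the explicit NOPs discussed earlier. As a minimal sketch of what NOP compression can look like, the code below uses a slot-presence bitmask: one bit per issue slot says whether a real operation follows, and NOPs are not stored at all. This is one common textbook-style scheme chosen for illustration, not the scheme proposed in the presentation. The slot names are taken from the MultiOp figure; the pairing of operations to slots in the usage example, and the names `compress` and `decompress`, are assumptions.

```python
# Issue slots of a hypothetical 4-wide VLIW, named after the labels in the figure.
SLOTS = ("IALU.0", "IALU.1", "FALU.0", "BU.0")


def compress(multiop):
    """Drop NOPs from a MultiOp.

    multiop maps slot name -> operation name ("NOP" or missing means empty).
    Returns (mask, ops): bit i of mask is set when SLOTS[i] holds a real
    operation, and ops lists those operations in slot order.
    """
    mask = 0
    ops = []
    for i, slot in enumerate(SLOTS):
        op = multiop.get(slot, "NOP")
        if op != "NOP":
            mask |= 1 << i
            ops.append(op)
    return mask, ops


def decompress(mask, ops):
    """Rebuild the full MultiOp, re-inserting NOPs where the mask bit is clear."""
    remaining = iter(ops)
    multiop = {}
    for i, slot in enumerate(SLOTS):
        multiop[slot] = next(remaining) if mask & (1 << i) else "NOP"
    return multiop


if __name__ == "__main__":
    word = {"IALU.0": "ADD", "IALU.1": "NOP", "FALU.0": "FMUL", "BU.0": "NOP"}
    mask, ops = compress(word)
    print(bin(mask), ops)
    print(decompress(mask, ops) == word)
```

In hardware, the `decompress` step corresponds to the instruction decompressor stage in the pipeline figure: the stored code shrinks, at the price of extra decode logic and a possibly longer branch misprediction penalty.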
Validation Framework -> Trimaran
[Figure: Trimaran flow; labels C Program, Bridge Code, ELCOR IR, HMDES Machine Description]
• IMPACT
  • ANSI C Parsing
  • Code profiling
  • Classical machine independent optimizations
  • Block formation
• ELCOR
  • Machine dependent code optimizations
  • Code scheduling
  • Register allocation
• SIMULATOR Generator
  • ELCOR IR to low level C files
  • HPL-PD virtual machine
  • Cache simulation
  • Performance statistics
• Generated Simulator (Statistics)
  • Compute and stall cycles
  • Cache stats
  • Spill code info
[Figure: simulator generation flow with blocks REBEL; Code Processor; HMDES; Low level C files; C libraries; Emulation Library; Native Compiler; Executable for the host platform]

Validation Framework -> Retargetable Assembler
[Figure: flow with blocks Instruction Encoding Description; Toolkit Generator; Generated Assembler; Assembly Instructions; Object Code; To Simulator (for simulation with encoded instructions)]

Work Plan -> Interconnect/RF/FU Specialization
• Initially model the interconnect problem as an ILP (integer linear program) and later on move to other solutions
• Code selection problem in compilers is similar to identifying compute intensive parts for AFUs
• No. and type of FUs has not been properly explored
• RF clustering problem has not been dealt with elsewhere
• Jacome et al.
  • Deal with Interconnect/RF/FU specialization simultaneously
  • Operation chaining is not considered

Work Plan -> Encoding/Decoding Specialization
• Goal is to be able to generate encoding schemes automatically
• Work of Shail Aditya et al.
  • Basically a parameterized encoding scheme
  • Techniques especially for HPL-PD architecture
  • Do not talk of dynamic code size minimization
  • Encoding template is fixed; exploration is limited to within the template design space
• Various encoding templates need to be explored; the template itself may also be derived from the application
• Dynamic code size minimization needs to be considered

Work Status -> Specialized FUs in Trimaran
• Modeling MISOs
  • Model as external function calls
  • Locate the calls in Trimaran bridge code and replace them with the AFU op
  • Model new AFU in MDES with the required ops
  • Introduce the semantics in simulator op definitions file
• Modeling MIMOs
  • Model as external function calls returning void
  • Locate the calls in Trimaran bridge code and replace them with the AFU op
  • Explicitly reserve registers in C code for returning values
  • Introduce operation semantics in simulator op definition file
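
The call-replacement step for MISOs and MIMOs can be pictured with a toy rewrite pass. The sketch below is purely illustrative: the `Op` record, the `AFU_TABLE` contents, the opcode names and the register names are all hypothetical, and none of this is Trimaran's actual bridge code format or API. It only shows the idea that a call to a known external function becomes a single AFU operation, and that a MIMO (modeled as a void call) delivers its results in explicitly reserved registers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Toy stand-in for one operation in an intermediate representation."""
    opcode: str
    dests: tuple
    srcs: tuple
    callee: str = ""   # only meaningful when opcode == "CALL"


# Hypothetical table: external function name -> (AFU opcode, reserved output registers).
# An empty reserved tuple means a MISO whose single result uses the call's own destination.
AFU_TABLE = {
    "mac3": ("AFU_MAC3", ()),
    "butterfly": ("AFU_BFLY", ("r60", "r61")),
}


def replace_calls_with_afu_ops(ops):
    """Replace calls to functions listed in AFU_TABLE with the matching AFU op."""
    rewritten = []
    for op in ops:
        entry = AFU_TABLE.get(op.callee) if op.opcode == "CALL" else None
        if entry is None:
            rewritten.append(op)          # ordinary operation, keep as is
            continue
        afu_opcode, reserved = entry
        dests = reserved if reserved else op.dests
        rewritten.append(Op(afu_opcode, tuple(dests), op.srcs))
    return rewritten


if __name__ == "__main__":
    program = [
        Op("CALL", ("r1",), ("r2", "r3", "r4"), "mac3"),   # MISO
        Op("CALL", (), ("r5", "r6"), "butterfly"),         # MIMO, void call
        Op("ADD", ("r7",), ("r1", "r60")),
    ]
    for op in replace_calls_with_afu_ops(program):
        print(op)
```

The remaining steps listed above, describing the AFU in MDES and adding its semantics to the simulator op definitions, have no counterpart in this sketch.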
• Modeling MIMOs with LD/ST
  • Model as regular MIMOs
  • Memory interaction with block LD/ST at beginning and end of execute cycles
• Additionally
  • Possible to impose register file constraints
  • Various I/O timeshapes, rigid or flexible
  • Possible to introduce pipelined functional units

Work Status -> Instruction Enc. in Trimaran
• New Jersey Machine Code Toolkit (NJMC)
  • Deals with bits at symbolic level
  • Can be used to write assemblers/disassemblers
  • Specification in SLED (Specification Language for Encoding/Decoding)
• Model instruction decompressor in HMDES
• Instrument ELCOR to generate assembly code
• Encoding is done using procedures generated by NJMC
• Problems with NJMC
  • VLIW instructions need to be broken up into 32-bit tokens
  • Encoded instructions must end on an 8-bit boundary

Work Status -> Code Gen. for Clustered ASIPs
• ELCOR
  • Disadvantages
    • ELCOR is heavily oriented towards HPL-PD architecture
    • Does not support clustered VLIW architecture
  • Advantages
    • Strong optimizing compiler
    • Rich library to deal with the IR
• IMPACT compiler system offers another choice for building a backend
• Feasibility study being carried out to fix a particular direction of work

References
• Bhuvan Middha, Varun Raj, Anup Gangwar, M. Balakrishnan, Anshul Kumar and Paolo Ienne, “A Trimaran based framework for exploring design space of VLIW ASIPs with coarse grain FUs”, ISSS-2002.
• Anup Gangwar, M. Balakrishnan and Anshul Kumar, “A framework for studying the effect of VLIW processor instruction encoding and decoding schemes”, Mini Project, Dept. of CSE.
• M. Jacome and G. de Veciana, “Design challenges for new application specific processors”, IEEE Design and Test of Computers-2000.
• B. Ramakrishna Rau and Michael S. Schlansker, “Embedded computer architecture and automation”, IEEE Computer-2001.
• Michael S. Schlansker and B. Ramakrishna Rau, “EPIC: An architecture for instruction-level parallel processors”, HPCA-2000.
• N. G. Busa, A. van der Werf and M. Bekooij, “Scheduling coarse grain operations for VLIW processors”, ASPDAC-1998.
• Shail Aditya, Scott A. Mahlke and B. Ramakrishna Rau, “Code size minimization and retargetable assembly for custom EPIC and VLIW processors”, ISSS-1999.