How to Reduce Product Launch Delays Through Systematic Thermal Design and Validation

DegreeC White Paper: How to Reduce Product Launch Delays Through Systematic Thermal Design and Validation

Introduction

The need for higher product reliability has driven thermal design to the forefront in the development of electronic components and systems. It is well known that the reliable operation of an electronic device depends largely on its operating temperature: the lower the operating temperature, the better the long-term reliability of the component. Numerous tests and empirical measurements have confirmed that component failure rates are exponentially related to temperature because of a number of thermally dependent failure mechanisms. Moreover, any temperature excursion beyond the design limit would result in immediate failure of the component and hence of the rest of the system. Thermal design has therefore steadily gained importance as one of the two primary considerations in the design and packaging of electronic components and systems, the other being the actual product functionality.

Thermal design is an all-encompassing discipline aimed at managing the heat generated by electronic circuitry. It therefore spans the packaging of electronics at the chip or device level, the board level, the chassis or enclosure level, and the room level. At every level, the rapid and continuing growth in packaging density has posed enormous challenges to electronics cooling, and thermal design increasingly needs a holistic approach that seeks efficiency gains at every packaging level individually and across all levels collectively.

Three trends make thermal reliability a primary consideration in system-level packaging today:

1. Increasing clock rates and functional integration at the chip level, following Moore’s law, leading to higher power dissipation per component.
Figure 1 shows the recent trend in high-performance chip power dissipation and chip heat flux.

2. Increasing packaging density at the system level, driven by the demand for higher performance, higher bandwidth, faster communication, and a wider range of services offered per box. This in turn has resulted in ever-increasing heat densities across all product platforms. Figure 2 shows the historical and expected heat density trends of various types of datacomm equipment.

3. Maximum chip temperature requirements have not changed much, even as power dissipation has been increasing rapidly. The vast majority of commercial-off-the-shelf (COTS) components still have long-term operating temperatures in the range of 85-105°C, and maximum operating temperatures not to exceed 125°C. Most central processing units (CPUs), which are at the core of the majority of systems, cannot exceed a 72°C case temperature without throttling. There has been much research on high-temperature electronics for harsh environments, such as under-the-hood automotive applications where ambient temperatures can reach 125°C, but by and large the vast majority of the components used in electronics packaging are COTS components.

Figure 1: High Performance Chip Heat Flux Trends (source: NEMI Technology Roadmaps). [Chart: chip heat flux (W/cm²) vs. year, 2001 to 2012; Max and Min projections from the 2000 and 2002 roadmaps.]

Figure 2: Power Density Trends at Facility Level (source: ASHRAE TC9)

The Basic Electronics Thermal Management Task

The basic heat transport phenomena involved in cooling an electronic device are shown schematically in Figure 3.
The electronic device could be a packaged silicon chip that processes information, a power supply that transforms and conditions the power input to other devices on a board or chassis, an optical transceiver, or a light-emitting diode (LED). As a by-product of the device performing its function, the input power supplied to it is eventually transformed into waste heat, which raises the device temperature. To prevent the device temperature from exceeding the known limit at which it will fail, the generated heat must be transported away from the device and eventually into the environment, the ultimate heatsink. Between the device and the environment there can be any amount of intervening hardware and media intended to cool the device and bring its temperature as close as possible to the environment temperature, and in some cases even below it.

Figure 3: Heat generation and dissipation in an electronic device mounted on a board. [Diagram: device at junction temperature Tj, with Tmax ≥ Tj > T∞, mounted on a board at temperature Tb; heat paths qc, qr, and qk from the package, qkb into the board, and qcb and qrb from the board to the ambient at T∞.]

As shown in Fig. 3, heat (Q̇) is generated in a relatively small portion of the device called the junction. Part of the heat is conducted from the device package into the board (qkb), from which it is ultimately lost to the surrounding media by convection (qcb) and radiation (qrb). The remaining portion of the heat is lost through the surface of the package by any combination of convection (qc), radiation (qr), and conduction (qk), depending on the prevailing local environment around the device and the cooling hardware. In steady state these paths together carry away all of the generated heat, so Q̇ = qkb + qc + qr + qk. The device is generally at an elevated temperature above the ambient temperature (T∞). The device temperature is highest at the junction (Tj) and gradually falls across the spreading resistance of the package.
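To make the junction-to-ambient temperature drop concrete, a common first-order approximation (not taken from this paper) lumps the package and cooling hardware into a series chain of thermal resistances, so that Tj = T∞ + Q̇ (Rjc + Rcs + Rsa), where T∞ is taken as the local air temperature around the device. The Python sketch below assumes that model; the component names, resistance values, and limits are hypothetical. Because it ignores the parallel board path (qkb) in Fig. 3, it tends to overestimate Tj, and spreading effects are only captured if they are folded into the resistance values.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical component with a single series heat path (resistances in K/W)."""
    name: str
    power_w: float    # heat generated at the junction, Q-dot [W]
    r_jc: float       # junction-to-case resistance [K/W]
    r_cs: float       # case-to-sink interface resistance [K/W]
    r_sa: float       # sink-to-local-ambient resistance [K/W]
    tj_max_c: float   # maximum allowed junction temperature [degC]


def junction_temp(c: Component, t_amb_c: float) -> float:
    """Tj = T_amb + Q * (R_jc + R_cs + R_sa); the parallel board path is ignored."""
    return t_amb_c + c.power_w * (c.r_jc + c.r_cs + c.r_sa)


def max_sink_resistance(c: Component, t_amb_c: float) -> float:
    """Largest sink-to-ambient resistance that still keeps Tj <= Tj_max."""
    return (c.tj_max_c - t_amb_c) / c.power_w - c.r_jc - c.r_cs


def margins(components: list[Component], t_amb_c: float) -> dict[str, float]:
    """Temperature margin Tj_max - Tj per component; a negative value is a violation."""
    return {c.name: c.tj_max_c - junction_temp(c, t_amb_c) for c in components}


if __name__ == "__main__":
    # Hypothetical parts and local air temperature, for illustration only
    parts = [
        Component("cpu", power_w=25.0, r_jc=0.3, r_cs=0.2, r_sa=1.5, tj_max_c=100.0),
        Component("asic", power_w=10.0, r_jc=1.0, r_cs=0.5, r_sa=5.0, tj_max_c=105.0),
    ]
    local_ambient_c = 45.0
    for name, margin in margins(parts, local_ambient_c).items():
        status = "OK" if margin >= 0 else "OVER LIMIT"
        print(f"{name}: margin {margin:+.1f} K ({status})")
    print(f"cpu needs R_sa <= {max_sink_resistance(parts[0], local_ambient_c):.2f} K/W")
```

Running it prints a +5.0 K margin for the hypothetical CPU, a -5.0 K margin (over limit) for the hypothetical ASIC, and a sink-to-ambient budget of 1.70 K/W for the CPU. Rearranging the same chain into a resistance budget gives a first target for heatsink selection, which the detailed system model must still confirm in situ.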
The basic thermal management task is to maintain the junction temperature (Tj) below a known limit, commonly referred to as the maximum operating temperature. With this as the primary objective, the core scope of electronics thermal management comprises the design of the package to withstand the elevated temperature and distribute the generated heat; the characterization and metrics of the generated heat; the selection and design of the intervening media and hardware that transport the heat; and the impact of the environment temperature and heat transport on the short- and long-term reliability of the device.

The Need for a Systematic Approach to Thermal Management

In an electronic system that contains hundreds or thousands of electronic devices, each device has its own operating temperature limit. Keeping every device below its own limit can become quite daunting and requires a great deal of sophistication and experience. Fortunately, rapid advances have been made in the tools and methods of thermal management. Sophisticated computational tools are now readily available to model the complex heat transport phenomena in electronic systems for the purpose of realizing an optimal design, and advances in test hardware and methodologies, together with the definition of standards, have made it possible to troubleshoot and verify the field performance of systems before they go out the door.

Beyond product reliability, other equally important factors are primary considerations in the packaging of electronics. Aggressive competition has led to an equally aggressive cost squeeze across all product segments and services. Thus, even though the cooling hardware is traditionally a small percentage of the cost of a product, the thermal designer is under the same price pressure as everyone else to contribute to the bottom line.
This often limits the number of viable choices for the cooling hardware and demands a careful, systematic approach to designing and optimizing cost-effective solutions.

An equally important consideration in thermal design is the short design cycle. Shrinking product life cycles (especially in the commercial sector) make time-to-market one of the primary factors in datacomm equipment design. Datacomm equipment refreshes occur roughly every 1.5 years, so engineers in the datacomm arena do not have the luxury of the long design cycles traditionally found in other market segments. The time available from concept to market introduction is very short, which requires that the thermal design be done right the first time: there is virtually no room for redesign to make corrections. Because of the short design cycle, thermal design must be carried out concurrently with the rest of the product development effort, as shown conceptually in Fig. 4.

Figure 4: Concurrent product development of datacomm equipment. [Diagram: concurrent engineering linking functional design (power, electrical, layout, software), physical design, thermal design, and procurement.]

Because of reliability concerns, thermal design normally leads the physical design activity once the initial product idea has been developed into a workable concept. This initial prototyping thermal design is required to define or validate the overall physical architecture of the product.
As the product design develops, more in-depth thermal analyses are carried out at various points in the design cycle to size or select cooling hardware, determine the exact physical architecture of the board and the component placement, probe and qualify the thermal health of the equipment under postulated operating scenarios, and so on, all to ensure that the product will function reliably when deployed in the field. The stages of thermal design employed through the design cycle are illustrated in Fig. 5.

Figure 5: Various stages of thermal design employed for product development. [Flowchart: Functional Design (power, hardware and software) and Mechanical Design exchange information with four thermal stages: Prototyping Thermal Analysis at concept (overall cooling requirements, initial cooling system specification); Detailed Thermal Design (component temperatures, detailed cooling profiles, heatsink design/selection, probing the design envelope: failure scenarios, elevated temperatures, elevation); Initial DVT (build and test mock-up, validate thermal model); Final Thermal DVT (temperature and flow measurements, survivability under failure scenarios, thermal controller speed tuning, burn-in, agency pre-qual tests).]

Thermal Design Stage 1: Initial Calculations for Sizing and Physical Layout

An initial thermal design is used right at the concept stage to qualify the initial thinking about the overall architecture of the product. The platform, cooling method (conduction, radiation, natural or forced air convection, liquid or two-phase cooling), and service conditions are normally determined by the target market and application.
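For forced-air-cooled equipment, the feasibility check in this stage often reduces to a bulk energy balance on the cooling air, Q̇ = ρ·cp·V̇·ΔT, with the total dissipation smeared uniformly over the boards. The Python sketch below illustrates that balance under stated assumptions rather than reproducing the paper's method: air density comes from the ideal-gas law at the inlet temperature, pressure from the ISA standard-atmosphere formula (which also shows why elevation reduces cooling capacity), and cp = 1007 J/(kg·K) is an assumed constant. The total power and allowed rise in the usage lines are hypothetical, and the result is a mixed-mean rise; local air near hot components will be warmer.

```python
R_AIR = 287.05             # specific gas constant of dry air [J/(kg*K)]
CP_AIR = 1007.0            # assumed constant specific heat of air [J/(kg*K)]
M3S_PER_CFM = 4.71947e-4   # 1 ft^3/min expressed in m^3/s


def air_density(t_inlet_c: float, altitude_m: float = 0.0) -> float:
    """Ideal-gas air density at the inlet temperature and ISA standard pressure for the altitude."""
    pressure_pa = 101325.0 * (1.0 - 2.25577e-5 * altitude_m) ** 5.25588
    return pressure_pa / (R_AIR * (t_inlet_c + 273.15))


def required_flow_cfm(q_w: float, dt_c: float, t_inlet_c: float = 25.0,
                      altitude_m: float = 0.0) -> float:
    """Volumetric flow so that the bulk air rise equals dt_c: V = Q / (rho * cp * dT)."""
    rho = air_density(t_inlet_c, altitude_m)
    return q_w / (rho * CP_AIR * dt_c) / M3S_PER_CFM


def air_temp_rise_c(q_w: float, flow_cfm: float, t_inlet_c: float = 25.0,
                    altitude_m: float = 0.0) -> float:
    """Bulk (mixed-mean) air temperature rise for a total dissipation and a given flow."""
    rho = air_density(t_inlet_c, altitude_m)
    return q_w / (rho * CP_AIR * flow_cfm * M3S_PER_CFM)


if __name__ == "__main__":
    total_w = 1500.0   # hypothetical chassis dissipation, smeared over all boards
    max_rise_c = 15.0  # hypothetical allowed bulk air temperature rise
    for alt in (0.0, 1800.0):
        flow = required_flow_cfm(total_w, max_rise_c, altitude_m=alt)
        print(f"altitude {alt:.0f} m: need about {flow:.0f} cfm")
```

With these hypothetical inputs, `required_flow_cfm(1500.0, 15.0)` comes out at roughly 178 cfm at sea level, close to the often-quoted rule of thumb of about 1.76 × W / ΔT(°C) cfm, and the requirement grows by roughly 24% at 1800 m because the thinner air carries less heat per unit volume. The flow the fans actually deliver still depends on the chassis and vent flow resistance, which is why fan selection and vent sizing appear alongside flow rate in this stage.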
The initial thermal analysis is used to check the feasibility of cooling the expected total heat dissipation within a given chassis size and with the available cooling method. The nature and sophistication of this analysis depend on the type of equipment, the available information, the experience of the thermal engineer, and the time and tools available to perform it. It could range from a simple back-of-the-envelope calculation (e.g., sizing a natural-convection-cooled box) to a more elaborate analysis, such as designing a fan tray to cool a multi-blade server or router. In all cases, the total heat expected to be dissipated in the equipment is smeared uniformly across the heat-dissipating surfaces (normally the boards). The main purpose of this exercise is to arrive at a workable maximum air temperature rise in the chassis. Normally the following items are determined from the initial analysis:

• Flow configuration: push, pull, or push-pull
• Fan selection and placement, mindful of fan life and acoustic requirements
• Required chassis size
• Vent sizes, locations, and overall flow resistance
• Expected bulk flow rates
• Expected air temperature rise

[Chart: Total Flow vs. Height for the Six-Slot Chassis. X-axis: plenum height above fan tray (in); left axis: flow (cfm); right axis: total chassis height (in). Series: total flow without restrictions in front and back vents; total flow with bottom plenum increased 1"; total chassis height; cases without and with louvers.]

Figure 6: A prototyping thermal analysis such as the one shown here employs uniform heat dissipation smeared over all the boards. This kind of analysis is used to select the appropriate size, quantity, and placement of the fans required to cool the system.
In this particular case, the analysis was also used to determine the required height of the chassis as well as the relative benefit of using louvers at the inlet.

Thermal Design Stage 2: Detailed Thermal Modeling to Optimize and Verify Performance

At the completion of the prototyping thermal analysis, hardware architects know how much cooling is available for the components they select for the functional design, and mechanical engineers have an idea of the overall size of the chassis and the spatial requirements for its physical layout. The CAD file from the mechanical design and the board layout file from the board design, along with physical and thermal information on the major electronic components, are then used to carry out the next stage of thermal analysis. This thermal simulation yields detailed local cooling rates for all important components in the system. If necessary, iterative simulations, which could involve changes in component selection and layout as well as changes in the mechanical design of the chassis, are made to ensure that all components are satisfactorily cooled under all postulated operating scenarios.

Given the short design cycle, it is of paramount importance that the thermal simulation platform be compatible with both the mechanical CAD platform used for mechanical design and the ECAD platform used for board layout. Both the mechanical CAD and ECAD files are imported into the thermal design platform to carry out the detailed analysis.

Figure 7: Example of a detailed thermal simulation. In this picture the mechanical CAD geometry shown on the left is used directly to generate the thermal model shown on the right, because the mechanical design and thermal simulation platforms can share geometry.
As a result, any changes made in the thermal model can quickly be migrated into the mechanical design, and vice versa.

Figure 8: A detailed thermal simulation of the equipment is used to obtain a detailed picture of how each component is cooled. This kind of analysis can immediately pinpoint potential causes for alarm and ways to mitigate them.

Generally, detailed thermal analysis is used for the following purposes:

• To refine the component layout so as to optimize the cooling of all components in the system
• To refine the physical layout and mechanical components of the chassis so as to maximize conduction and surface heat loss in natural-convection-cooled systems
• To refine the physical layout so as to maximize the airflow rate through forced-convection-cooled systems
• To design or select localized thermal management solutions such as heatsinks, heat pipes, and thermoelectric coolers. High-power components normally require heatsinks, and it is particularly important to verify the performance of each heatsink in situ in the system model, regardless of its stated catalog performance. Off-the-shelf heatsinks can be selected and verified in the detailed thermal model; if no off-the-shelf solution can be found, custom heatsinks are designed and verified using the same model. Some level of grid embedding capability in the thermal analysis software is particularly useful when designing heatsinks.
• To probe the operating envelope in order to ensure that the equipment will perform satisfactorily under various environmental and service conditions, such as:
o Elevated ambient temperatures
o Redundancy, such as a single fan failure
o Elevation
o Solar radiation
o Filter blockage
The particular set of conditions to satisfy is normally defined by the governing standard(s) for the application and location where the equipment will be deployed. Acoustic and EMI emissions are other criteria defined in standards that the equipment must satisfy. Acoustic noise is normally mitigated by controlling the fan speeds as a function of the ambient temperature. At the chassis level, EMI emission is normally mitigated by limiting the open-area percentage of the vents in order to contain the radiated emission within the chassis. In either case, a detailed thermal model is crucial to ensure that the equipment is still adequately cooled with the reduced airflow.
• To generate the control strategy for thermally controlled systems. Managing acoustic noise normally requires the fan speeds to be controlled in forced-air-cooled systems, and a temperature controller may be necessary for systems whose temperatures must be held constant during operation. In all cases the detailed thermal model is useful for generating the control curve that will be implemented in the controller.

Figure 8: The thermal model shown on the left was used to generate the airflow rates shown in the chart on the right. The speed settings shown in the chart were implemented in the thermal controller and verified through thermal DVT to be adequate for cooling the system.

Figure 9: A detailed thermal model may contain only the important components on one or more detailed boards, as shown by the example on the left, while the other slots are represented by an appropriate flow impedance.
Normally the detailed board is inserted into the worst-case slot, selected according to the airflow rate. Depending on the capability of the modeling software, it may be possible to detail every slot in the chassis, as shown by the example on the right.

Thermal Design Stage 3: Initial DVT to Validate the Thermal Model

It is beneficial to build and test a physical mockup of the equipment whenever time and cost allow. Despite the sophistication and capability of modeling tools, gaffes can still occur in thermal modeling, especially when an inexperienced user is involved. Even with experienced thermal practitioners, some level of initial DVT can prove invaluable for highly critical systems or for systems that incorporate new and unproven cooling technologies. In such cases a quick physical mockup and test may be used to verify or discard assumptions made in the modeling.

In essence, the initial DVT is used to calibrate and validate the thermal model. Once the model is validated for a given set of design parameters and operating conditions, it can be used for all the other purposes outlined in the detailed modeling stage. This is normally the best way to go: it is far easier and more cost-effective to make design changes and test various operating conditions in software than to prototype each change. But testing in software is only useful if the thermal model is correct, and the initial DVT helps to ensure that it is.

Depending on time constraints and cost, a physical mockup may involve any number of the following:

• A cardboard or sheet-metal mockup of the chassis. Even though this is only a mockup, it is very important that the physical layout, venting pattern, and air filter (as applicable) be as close as possible to what is envisioned for the final design.
• Load boards consisting of heater blocks and other flow obstructions glued or taped onto bare PCBs, cardboard, or aluminum sheets to represent the boards. In many cases the new equipment is a derivative of something that already exists, and it is then better and quicker to use the old boards.
• A mockup of the fan tray in forced-convection-cooled systems. The actual fans specified for the new system should be used.
• Thermocouples and airflow sensors installed at important locations in the mockup.

The physical mockup is normally used to test for the following, depending on the equipment:

1. Overall volumetric airflow through the equipment. This is normally done in a wind tunnel.
2. Flow velocities at critical locations throughout the equipment.
3. Where heaters are used, air temperatures at critical locations throughout the equipment.

These measurements are used to validate the thermal model. If the model and test results are not in satisfactory agreement, the thermal model (or, for that matter, the test rig) needs to be revisited to bring them into closer agreement. A validated thermal model is the outcome of this exercise. The model may then be used in subsequent numerical testing to refine the design and test for postulated service conditions.

Thermal Design Stage 4: Final Thermal DVT

The final thermal DVT is necessary not only to verify the thermal design but also as a pre-screening or pre-qualification test for the more comprehensive agency testing down the road. Agency tests are very expensive, so it is imperative that the thermal engineer ensure there will be no failures and repeat tests. It is therefore beneficial for the final thermal DVT to be comprehensive and to cover as many of the design parameters as possible. Specifically, the final DVT should address the following (depending on the equipment):
1. Overall volumetric airflow for forced-convection systems.
2. Flow distribution and flow velocities at critical locations in the equipment. It is often important to measure the air velocities at heatsink locations, either for diagnostics if a heatsink is not performing as expected or for a cost-saving heatsink redesign if the heatsink is over-specified.
3. Air temperatures at critical locations. Every electronic device has specifications for the maximum ambient temperature and/or the maximum junction or case temperature. When the latter cannot be measured or is not available, the local ambient temperature is the sole criterion for judging the design, so it is very important to measure this value.
4. Case temperatures of critical components. For components with heatsinks, it may be necessary to drill a tiny hole through the heatsink for thermocouple insertion.
5. Elevated ambient temperature. A thermal chamber is used for this purpose. Air temperatures and component temperatures are measured at various ambient (chamber) temperature settings. If a fan speed controller is used in the equipment, this test must also measure the fan speeds, which are normally controlled according to the ambient temperature. These tests may be used to refine the setpoints of the thermal controller.
6. Ambient temperature variations. This is normally accomplished by temperature cycling in a thermal chamber. Standards normally define the number of cycles and the ambient ramp rates, as well as the dwell levels and soak periods. The equipment functionality is normally tested during the dwell periods.
7. Acoustic noise. This is normally measured in an anechoic chamber. When possible, it is better to combine this test with the elevated ambient tests, which may be accomplished in a walk-in thermal chamber with microphones installed as defined in the prevailing standards (e.g., NEBS).

Figure 10: Final thermal DVT setup in a wind tunnel (left).
Whenever possible, it is useful to obtain an overall thermal picture of the board and components with a thermal camera, as shown by the picture on the right. This picture gives a bird’s-eye view of the hot spots on the board and shows where to attach thermocouples.

Thermal Design Stage 5: Deployment Level Simulation to Establish Service Conditions

Sometimes it is necessary to simulate the expected installation or deployment configurations of the equipment to ensure that it will be cooled as designed throughout its expected life cycle. Deployment-type thermal simulation or testing is normally used to establish alarm annunciation for the system controller and service intervals for the operator, such as when to change the filter, or how much time is available to swap out a fan tray or perform a graceful shutdown in the event of loss of cooling. Also, datacomm equipment is normally designed so that two or more of the same chassis can be installed in the same equipment rack, and in most cases the rack structure or nearby equipment is expected to partially block some of the vents of the subject equipment. In all such situations, deployment-type thermal simulation can highlight the dangers to watch out for and to include in the operator’s manual.

The example below is one such simulation, for equipment intended to be installed in groups of three units inside an ETSI rack. In this example, the middle unit is used to investigate the combined effects of pre-heating by the unit below and exhaust blockage by the unit on top.

Figure 11: Deployment-level simulation is used to ensure that nothing about the installation and use will appreciably degrade the cooling design.
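One of the service questions raised in this stage, how much time is available after a loss of cooling, can be bounded early with a single lumped thermal node before any transient simulation is run. The Python sketch below is that crude estimate, not the paper's method: it assumes the heated mass behaves as one node with heat capacity C, optionally losing heat through a residual path Rresidual (for example natural convection through the vents), so that C·dT/dt = Q̇ - (T - T∞)/Rresidual. All numbers are hypothetical. Small components with little thermal mass can reach their limits much sooner than the bulk, so deployment-level simulation or test should still set the actual alarm thresholds and service times.

```python
import math
from typing import Optional


def time_to_limit_s(q_w: float, heat_capacity_j_per_k: float, t_start_c: float,
                    t_limit_c: float, t_amb_c: Optional[float] = None,
                    r_residual_k_per_w: Optional[float] = None) -> float:
    """Seconds for a single lumped node to heat from t_start_c to t_limit_c after cooling is lost.

    Model: C * dT/dt = Q - (T - T_amb) / R_residual.
    If r_residual_k_per_w is None the heat-up is adiabatic: t = C * (T_limit - T_start) / Q.
    Returns math.inf when the residual path alone holds the node below the limit.
    """
    if t_limit_c <= t_start_c:
        return 0.0
    if r_residual_k_per_w is None:
        return heat_capacity_j_per_k * (t_limit_c - t_start_c) / q_w
    if t_amb_c is None:
        raise ValueError("t_amb_c is required when a residual cooling path is given")
    t_final = t_amb_c + q_w * r_residual_k_per_w  # steady state with residual cooling only
    if t_final <= t_limit_c:
        return math.inf
    tau_s = r_residual_k_per_w * heat_capacity_j_per_k  # thermal time constant R*C
    # Exponential approach T(t) = T_final + (T_start - T_final) * exp(-t / tau), solved for T = T_limit
    return -tau_s * math.log((t_limit_c - t_final) / (t_start_c - t_final))


if __name__ == "__main__":
    # Hypothetical values: 400 W, 18 kJ/K lumped mass, 50 degC start, 70 degC alarm limit
    adiabatic = time_to_limit_s(400.0, 18_000.0, 50.0, 70.0)
    residual = time_to_limit_s(400.0, 18_000.0, 50.0, 70.0, t_amb_c=40.0, r_residual_k_per_w=0.1)
    print(f"adiabatic: {adiabatic / 60:.1f} min, with residual path: {residual / 60:.1f} min")
```

With these hypothetical inputs it reports 15.0 minutes for the adiabatic bound and about 33 minutes when a 0.1 K/W residual path to a 40°C ambient is included. The adiabatic figure is the conservative one to start from, since it assumes no heat escapes at all once the fans stop.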
© Copyright 2024