How to Reduce Product Launch Delays Through
Systematic Thermal Design and Validation
Izuh Obinelo PhD
Director
Center for Airflow and Thermal Technologies
Degree Controls, Inc.
Introduction
The need for higher product reliability has driven thermal design to the forefront in the development of
electronic components and systems. It is well known that the reliable operation of an electronic device
depends largely on its operating temperature – the lower the operating temperature, the better the long-term reliability of the component. Numerous tests and empirical measurements have confirmed that
component failure rates are exponentially related to temperature due to a number of thermally-dependent
failure mechanisms. Moreover, any excess in temperature beyond the design limit would result in an
instant failure of the component and hence the rest of the system. Thus thermal design has steadily gained
importance as one of the two primary considerations in the design and packaging of electronic components
and systems – the other being the actual product functionality.
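To make the exponential dependence concrete, the sketch below uses the Arrhenius relation, one commonly used model for thermally activated failure mechanisms. The article does not name a specific model, so the choice of Arrhenius, the function name, and the 0.7 eV activation energy are all illustrative assumptions rather than values taken from the text.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_low_c, t_high_c, activation_energy_ev=0.7):
    """Ratio of the failure rate at t_high_c to the rate at t_low_c.

    Assumes a single Arrhenius-type failure mechanism. The default
    activation energy is an illustrative assumption; real values depend
    on the failure mechanism and must come from reliability data.
    """
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_low_k - 1.0 / t_high_k
    )
    return math.exp(exponent)


if __name__ == "__main__":
    # Compare several operating temperatures against an 85 °C baseline.
    for temp_c in (85, 95, 105, 125):
        factor = arrhenius_acceleration(85, temp_c)
        print(f"{temp_c} °C vs 85 °C: failure rate x{factor:.2f}")
```

Under these assumptions, a rise of a few tens of degrees multiplies the predicted failure rate several times over, which is why small reductions in operating temperature can be worth considerable design effort.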
Thermal design as applied to electronics packaging is an all-encompassing discipline targeted to the
management of heat generated by electronic circuitry. Thus it is involved in the packaging of electronics at
the chip or device level, at the board level, at the chassis or enclosure level, and at the room level. At every
level, the rapid and continuing growth in packaging density has posed enormous challenges to electronics
cooling, and increasingly there is a need for thermal design to take a synergistic or holistic approach that
attempts to squeeze increasing efficiencies at all packaging levels collectively and at every level
individually.
There are three paradigms at work that make thermal reliability the primary consideration in system-level
packaging today:
1. Increasing clock rates and functional integration at the chip level following Moore’s law, leading
to higher power dissipations per component. Figure 1 shows the recent trend in high performance
chip power dissipation and chip heat flux respectively.
2. Increasing packaging density at the system level, driven by the demand for higher performance,
higher bandwidths, faster communication, and wider range of services offered per box. This in
turn has resulted in ever increasing heat densities across all product platforms. Figure 2 shows the
historical and expected heat density trends of various types of datacomm equipment.
3. Maximum chip temperature requirements have not changed much even as the power dissipation
has been increasing rapidly. The vast majority of commercial-off-the-shelf (COTS) components
still have long-term operating temperatures in the range of 85-105°C, and maximum operating
temperatures not to exceed 125°C. Most Central Processing Units (CPUs) cannot exceed a 72°C
case temperature without throttling. There has been a lot of research on high temperature
electronics for harsh environments such as under-the-hood automotive applications where ambient
temperatures can reach as high as 125°C, but by and large the vast majority of the components
used in electronics packaging are COTS components.
[Plot: chip heat flux (watts/cm²), 0 to 200, versus year, 2001 to 2012, with series Max - 2002, Max - 2000, Min - 2002, and Min - 2000.]
Figure 1: High Performance Chip Heat Flux Trends (source: NEMI Technology Roadmaps)
Figure 2: Power Density Trends at Facility Level (source: ASHRAE TC9)
The Basic Electronics Thermal Management Task
The basic heat transport phenomena related to cooling an electronic device are shown schematically in
Figure 3. The electronic device could be a packaged silicon chip that processes information, a power supply
that transforms and conditions power input to other devices on a board or chassis, an optical transceiver, or
a light emitting diode (LED). As a by-product of performing its function, the power supplied to the device
is eventually transformed into waste heat that serves to raise the temperature of the device. In order to
prevent the device temperature from exceeding a known limit which will cause it to fail, the heat must be
transported away from the device, and to the surrounding environment as the ultimate heatsink. Between
the device and the environment, there can be any number of intervening hardware and media intended to
cool the device and bring the device temperature as close as possible to the environment temperature, and
even sometimes necessarily below the environment temperature.
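[Schematic: a device with junction temperature Tj, where Tmax ≥ Tj > T∞, mounted on a board at temperature Tb in surroundings at T∞. The heat Q̇ generated in the device leaves the package surface as qc, qr, and qk, and enters the board as qkb; the board in turn loses heat as qcb and qrb.]
Figure 3: Heat generation and dissipation in an electronic device mounted on a board.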
As shown in Fig. 3, heat (Q̇) is generated at a relatively small portion of the device called the junction.
Part of the heat is conducted from the device package into the board (qkb) which is ultimately lost into the
surrounding media by convection (qcb) and radiation (qrb). The remaining portion of the heat is lost through
the surface of the package by any combination of convection (qc), radiation (qr), and conduction (qk)
depending on the prevailing local environment around the device, and the cooling hardware employed. The
device would generally be at an elevated temperature above the ambient temperature (T∞). The device
temperature is highest at the junction (Tj) and gradually falls across the spreading resistance of the package.
The basic thermal management task is to maintain the junction temperature (Tj) below a known limit
commonly referred to as the maximum operating temperature. With this as the primary objective, the
design of the package to withstand the elevated temperature and distribute the generated heat, the metrics
and characterization of the generated heat, the selection and design of the intervening media and hardware
to transport the heat, and the impact of the environment temperature and heat transport on the short- and
long-term reliability of the device, constitute the core scope of electronics thermal management.
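The basic task can be illustrated with the standard steady-state thermal resistance model, in which the junction temperature is the ambient temperature plus the dissipated power times the sum of the series resistances between junction and ambient. This is a simplified sketch, not a method described in the article; the function names and the numeric values in the usage example are hypothetical.

```python
def junction_temperature(ambient_c, power_w, resistances_c_per_w):
    """Estimate junction temperature for a single series heat path.

    resistances_c_per_w lists the thermal resistances (°C/W) between the
    junction and the ambient, for example junction-to-case, interface
    material, and heatsink-to-air. Parallel paths such as conduction
    into the board are ignored in this simplified model.
    """
    return ambient_c + power_w * sum(resistances_c_per_w)


def max_allowed_resistance(ambient_c, power_w, t_max_c):
    """Largest total junction-to-ambient resistance that keeps Tj <= t_max_c."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return (t_max_c - ambient_c) / power_w


if __name__ == "__main__":
    # Hypothetical 10 W device in 50 °C air with three series resistances.
    tj = junction_temperature(50.0, 10.0, [0.5, 0.2, 2.0])
    budget = max_allowed_resistance(50.0, 10.0, 125.0)
    print(f"Estimated Tj: {tj:.1f} °C, resistance budget: {budget:.2f} °C/W")
```

Comparing the estimated total resistance against the budget gives a quick first indication of whether a heatsink or a higher airflow is needed before any detailed modeling is done.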
The Need for a Systematic Approach to Thermal Management
In an electronics system that contains hundreds or thousands of electronic devices, each device would have its own
operating temperature limit. In such situations, the task of maintaining every device below its own
temperature limit could become quite daunting, and requires a great deal of sophistication and experience
in finding a solution. Fortunately, rapid advances have been made in the tools and methods available for
thermal management. Sophisticated computational tools are now readily available to simulate and model
the enormous complexity of heat transport phenomena in electronics systems for the purpose of realizing an
optimal design. Advances in testing hardware and methodologies, as well as definition of standards, have
made it possible to troubleshoot and verify the field performance of systems before they leave the door.
Beyond product reliability concerns, there are other equally
important factors that are primary considerations in the
packaging of electronics. Aggressive competition has led to
an equally aggressive cost squeeze across all product segments and
services. Thus even though the cooling hardware is traditionally
a small percentage of the cost of a product, the thermal designer
is nevertheless under the same cost pressure. This often limits
the number of viable choices for the cooling hardware and
demands that a careful and systematic approach be taken to
design and optimize cost-effective solutions.
Costs, reliability, and time-to-market are the three main factors influencing the thermal design of a
product. Due to the complexity of most products, a careful and systematic approach must be used
to address these concerns.
An equally important consideration in thermal design is the short design cycle. Shrinking product lifecycles (especially in the commercial sector) make time-to-market one of the primary factors in product
development. For example, datacomm equipment refreshes occur roughly every 1.5 years. Thus engineers
in the datacomm arena do not have the luxury of long design cycles traditionally obtained in other market
segments. In the datacomm arena the time available from concept to market introduction is very short,
requiring that the thermal design be done right the first time – there is virtually no room for re-design to
make corrections. Due to the short design cycle, thermal design must be carried out concurrently with the
rest of the product development effort, as shown conceptually in Fig. 4.
The various stages of thermal design employed through the design cycle are illustrated in Fig. 5. Due to
reliability concerns, thermal design is normally the lead-in to the physical design activity once the initial
product idea has been actualized into a workable concept. This initial prototyping thermal design is
required to define or validate the overall physical architecture of the product. As the product design
develops, more in-depth thermal analyses are carried out at various points in the design cycle to size or
select cooling hardware, determine the exact physical architecture of the board and components placement,
probe and qualify the thermal health of the equipment under postulated operating scenarios, etc, all to
ensure that the product will function reliably when deployed in the field.
[Diagram: Concurrent Engineering linking Power, Physical Design, Thermal Design, Functional Design (Electrical, Layout, Software), and Procurement.]
Figure 4: Concurrent product development of a datacomm equipment.
[Flow diagram: the thermal design stages run from Design Start (Product Concept) through Stage 5, exchanging information with Electrical and Software Design on one side and Physical/Mechanical Design on the other.]
Stage 1: Prototyping Thermal Analysis
• Overall cooling requirements
• Initial cooling system specification
Stage 2: Detailed Thermal Design
• Device temperatures
• Detailed cooling profiles
• Heatsink design & selection
• Probe design envelope: failure scenarios, elevated temperatures, elevation
Stage 3: Initial DVT
• Build & test physical mockup
• Validate thermal model
Stage 4: Final Thermal DVT
• Temperature & flow measurements
• Survivability under failure scenarios
• Thermal controller speed tuning
• Burn-in
• Agency pre-qual tests
Stage 5: Deployment Level Analysis
• Temperature & flow measurements
• Proximity checks
• Vent blockage checks
• Multiple configurations
Information exchanged with Electrical and Software Design:
• Expected airflow rates and temperature rise
• Component details and board layout
• Modified board layout
• Validate final component placements. Finalize heatsinks and TIMs
• Validate component temps at: startup, fan failure, high temp, filter blockage
• Fine tune thermal control algorithm. Finalize set points
• Configuration instructions. Performance info
Information exchanged with Physical/Mechanical Design:
• Overall chassis size, plenums, vents
• CAD files
• Vent % open areas, filter selection, EMI mitigation
• Finalize vents and grilles, filter, fan placements, plenum heights, surface finishes
• Finalize hardware design: chassis, fan tray, controller
• Validate acoustics and operation at ambient temp, humidity, altitude
• Deployment instructions, usage instructions, service instructions
Figure 5: Various stages of thermal design employed for product development
Thermal Design Stage 1: Initial Calculations for Sizing and
Physical Layout
An initial thermal design is used right at the concept stage to qualify the initial thinking about the overall
architecture of the product. The target market and application normally dictate the type of platform and
cooling method to be employed (conduction, radiation, natural or forced convection air cooling, liquid or
two-phase cooling), as well as the service conditions. The initial thermal analysis is used to check for the
feasibility of cooling the expected total heat dissipation within a given chassis size and available cooling
method. The nature and sophistication of this initial thermal analysis depends on the type of equipment,
available information, experience of the thermal engineer, and availability of time and tools to perform the
analysis. The initial thermal analysis could range from a simple back-of-the-envelope calculation (e.g. the
size of a natural convection cooled box), to something more elaborate such as designing a fan tray to cool a
multi-blade server or router. In all cases the total heat expected to be dissipated in the equipment is
uniformly smeared across the heat dissipating surfaces (normally the board). The main purpose of this
exercise is to arrive at a workable maximum air temperature rise in the chassis. Normally the following
items would be determined from the initial analysis:
• Flow configuration: Push, Pull, and Push-Pull
• Fan selection and placement, mindful of fan life and acoustic requirements
• Required chassis size
• Vent sizes, locations, and overall flow resistance
• Expected bulk flow rates
• Expected air temperature rise
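The last two items in the list above are linked by a simple energy balance on the air stream: the bulk temperature rise equals the heat load divided by the product of mass flow rate and specific heat. The sketch below is a back-of-the-envelope illustration of that balance, not the article's own tool. The function names are hypothetical, and the default air properties (density 1.2 kg/m³, specific heat 1005 J/kg·K, roughly sea-level air near room temperature) are assumptions that should be adjusted for elevation and inlet temperature.

```python
CFM_TO_M3_PER_S = 4.719474e-4  # 1 cubic foot per minute in m^3/s


def air_temperature_rise(heat_w, flow_cfm, density=1.2, cp=1005.0):
    """Bulk air temperature rise (°C) through a chassis.

    Assumes all of heat_w is carried away by the air stream and that the
    air properties are constant. density is in kg/m^3 and cp in J/(kg*K).
    """
    mass_flow_kg_per_s = density * flow_cfm * CFM_TO_M3_PER_S
    return heat_w / (mass_flow_kg_per_s * cp)


def required_flow_cfm(heat_w, max_rise_c, density=1.2, cp=1005.0):
    """Volumetric flow (cfm) needed to hold the bulk rise to max_rise_c."""
    flow_m3_per_s = heat_w / (density * cp * max_rise_c)
    return flow_m3_per_s / CFM_TO_M3_PER_S


if __name__ == "__main__":
    # Hypothetical 1 kW chassis cooled by 200 cfm of air.
    print(f"Rise at 200 cfm: {air_temperature_rise(1000.0, 200.0):.1f} °C")
    print(f"Flow for a 10 °C rise: {required_flow_cfm(1000.0, 10.0):.0f} cfm")
```

Because air density falls with altitude, passing a lower density value shows directly how the same fan flow produces a larger temperature rise at elevation.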
[Plot: Total Flow vs. Height for the Six-Slot Chassis. Horizontal axis: plenum height above fan tray [in], 1 to 5. Vertical axes: flow [cfm] and total chassis height [in]. Series: total flow without restrictions in front and back vents; total flow with bottom plenum increased 1"; total chassis height. Curves are shown with and without louvers.]
Figure 6: A prototyping thermal analysis such as shown here employs uniform heat dissipation smeared on all the boards.
This kind of analysis is used to select the appropriate size, quantity, and placement of fans required to cool the system. In
this particular case the analysis was also used to determine the required height of the chassis as well as the relative benefit
of using louvers at the intake.
Thermal Design Stage 2: Detailed Thermal Modeling to
Optimize and Verify Performance
At the completion of the prototyping phase of thermal analysis, hardware architects would have
information on the cooling available for components they select for the functional design. Mechanical
engineers would have an idea of the overall size of the chassis and spatial requirements for the physical layout
of the chassis. The CAD file from the mechanical design and the board layout file from the board design, along
with physical and thermal information on the major electronic components, are subsequently used to carry
out the next stage of thermal analysis. This thermal simulation is used to obtain detailed local cooling rates
for all important components in the system. If necessary, iterative simulations, which could involve
changes in component selection and layout, and changes in the mechanical design of the chassis, are made
to ensure that all components are satisfactorily cooled under all postulated operating scenarios. It is
therefore of paramount importance - given the short design cycle - that the thermal simulation platform be
compatible with both the mechanical CAD platform used for mechanical design, and the ECAD platform
used for board layout. Both the mechanical CAD and ECAD files are imported into the thermal design
platform to carry out the detailed analysis.
Figure 7: Example of a detailed thermal simulation. In this picture the mechanical CAD geometry is
used directly to generate the thermal model shown on the right, because both the mechanical design and
thermal simulation platforms can share geometry. As a result any changes made in the thermal model
can quickly be migrated into the mechanical design, and vice versa.
Figure 8: A detailed thermal simulation of the equipment is used to obtain a detailed picture of how
each component is cooled. This kind of analysis can immediately pin-point potential causes for alarm
and ways to mitigate them.
Generally, detailed thermal analysis is used for the following purposes:
• Refine the component layout so as to optimize cooling of all components in the system
• Refine the physical layout and mechanical components of the chassis so as to maximize
conduction and surface heat loss from natural convection cooled systems
• Refine the physical layout so as to maximize the air flow rate through forced convection cooled
systems
• To design or select localized thermal management solutions such as heatsinks, heat pipes, thermoelectric coolers, etc. High power components would normally require heatsinks. It is particularly
important to verify the performance of each heatsink in situ in the system model, regardless of the
stated catalog performance. Off-the-shelf heatsinks can be selected and verified in the detailed
thermal model. If an off-the-shelf solution cannot be found, custom heatsinks are also designed
and verified using the detailed thermal model. Some level of grid embedding capability in the
thermal analysis software is particularly useful when designing heatsinks.
• To probe the operating envelope in order to ensure that the equipment will perform satisfactorily
under various environmental and service conditions, such as:
o Elevated ambient temperatures
o Redundancy, such as a single fan failure
o Elevation
o Solar radiation
o Filter blockage
The particular set of conditions to satisfy is normally defined by governing standard(s) for the
application and location where the equipment will be deployed.
• Acoustic and EMI emissions are other criteria defined in standards which the equipment must
satisfy. The usual mitigation for acoustic noise is to control the fan speeds as a function of the
ambient temperature. The usual mitigation for EMI emission at the chassis level is to control the
opening percentage of the vents in order to contain the radiated emission within the chassis. In
either case, a detailed thermal model is crucial to ensure that the equipment is cooled adequately at
reduced fan output.
• To generate control strategy for thermally controlled systems. Managing acoustic noise normally
requires fan speeds to be controlled in forced air cooled systems. A temperature controller may be
necessary for systems whose temperatures need to be maintained constant during operation. In all
cases the detailed thermal model is useful to generate the control curve that will be implemented in
the controller.
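As an illustration of the control curve mentioned in the last item, the sketch below maps ambient temperature to fan duty by linear interpolation between setpoints. The piecewise-linear form, the function name, and the setpoint values are hypothetical assumptions made for this example; in practice the setpoints would come from the detailed thermal model and be refined in DVT.

```python
from bisect import bisect_right

# Hypothetical setpoints: (ambient temperature in °C, fan duty in percent).
EXAMPLE_CURVE = [(25.0, 40.0), (35.0, 60.0), (45.0, 100.0)]


def fan_duty_percent(ambient_c, curve):
    """Interpolate fan duty from a curve sorted by ascending temperature.

    Below the first setpoint the duty is held at its minimum, and above
    the last setpoint it is held at its maximum.
    """
    temps = [temp for temp, _ in curve]
    if ambient_c <= temps[0]:
        return curve[0][1]
    if ambient_c >= temps[-1]:
        return curve[-1][1]
    upper = bisect_right(temps, ambient_c)
    (t0, d0), (t1, d1) = curve[upper - 1], curve[upper]
    return d0 + (d1 - d0) * (ambient_c - t0) / (t1 - t0)


if __name__ == "__main__":
    for ambient in (20.0, 30.0, 40.0, 50.0):
        duty = fan_duty_percent(ambient, EXAMPLE_CURVE)
        print(f"{ambient:.0f} °C -> {duty:.0f}% duty")
```

Holding the duty at a reduced minimum in cool ambients is what delivers the acoustic benefit, which is why the thermal model must confirm adequate cooling at that reduced output.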
Figure 9: A detailed thermal model may contain
only important components on one or more detailed
boards while the other slots are represented with
appropriate flow impedance, as shown by the
example on the left. Normally the detailed board is
inserted into the worst case slot selected according
to the airflow rate. Depending on the capability of
the modeling software it may be possible to detail
every slot in the chassis, as shown by the example
on the right.
Thermal Design Stage 3: Initial DVT to Validate the Thermal
Model
It is beneficial to build and test a physical mockup of the equipment whenever time and cost allow. Despite
the sophistication and capability of modeling tools, some gaffes can still occur in the thermal modeling,
especially when an inexperienced user is involved. Even with experienced thermal practitioners, some level
of initial DVT can still prove invaluable for highly critical systems, or systems that might incorporate
new and unproven cooling technologies. In this case a quick physical mock-up and testing may be used to
verify or discard assumptions made in the modeling phase.
The initial DVT in essence is used to calibrate and validate the thermal model. Once the model is validated
for a given set of design parameters and operating condition, then it can be used for all the other purposes
outlined in the previous section. This is normally the best way to go. Depending on time constraints and
cost, a physical mockup may involve any number of the following.
• A cardboard or sheet metal mockup of the chassis. Even though this is only a mockup, it is very
important to ensure that the physical layout, venting pattern, and air filter (as applicable) be as close as
possible to what is envisioned for the final design.
• Load boards consisting of heater blocks and other flow obstructions glued or taped onto bare
PCB, cardboard, or aluminum sheets to represent the boards. In many cases the new equipment
may be a derivative of something that already exists. It is better and quicker to use the old boards
in such cases.
• A mock up of the fan tray in forced convection cooled systems. The actual fans specified for the
new systems should be used.
• Thermocouples and airflow sensors installed at important locations in the mockup
It is far easier and more cost effective to make design changes and test for various operating conditions in
software rather than prototyping each change. But testing in software is only useful if the thermal
model is correct. The initial DVT helps to ensure that the thermal model is correct.
The physical mockup is normally used to test for the following depending on the equipment:
1. Overall volumetric airflow through the equipment. This is normally done in a wind tunnel.
2. Flow velocities at critical locations throughout the equipment
3. Where heaters are used, air temperatures at critical locations throughout the equipment
These measurements are used to validate the thermal model. If the model and test results are not in
satisfactory agreement, then the thermal model (or the test setup for that matter) needs to be re-visited to
bring them in closer agreement. A validated thermal model is the conclusion of this exercise. The model
may then be used in subsequent numerical testing to refine the design and test for postulated service
conditions.
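The comparison between model and measurement can be organized as a simple per-quantity error check, as in the sketch below. The article does not specify what counts as satisfactory agreement, so the 10% relative tolerance, the function name, and the quantity names in the usage example are assumptions chosen purely for illustration.

```python
def compare_model_to_test(predicted, measured, rel_tol=0.10):
    """Compare model predictions with mockup measurements.

    predicted and measured are dicts keyed by quantity name. Returns a
    dict mapping each measured quantity to (relative_error, within_tol).
    Assumes every measured value is nonzero; rel_tol is an assumed
    acceptance threshold, not a value from any standard.
    """
    report = {}
    for name, measured_value in measured.items():
        predicted_value = predicted[name]
        rel_error = abs(predicted_value - measured_value) / abs(measured_value)
        report[name] = (rel_error, rel_error <= rel_tol)
    return report


if __name__ == "__main__":
    # Hypothetical model outputs and mockup measurements.
    model = {"inlet_velocity_m_s": 2.1, "slot3_air_rise_c": 12.0}
    test = {"inlet_velocity_m_s": 2.0, "slot3_air_rise_c": 15.0}
    for name, (error, ok) in compare_model_to_test(model, test).items():
        status = "ok" if ok else "revisit model or test setup"
        print(f"{name}: {error:.0%} error, {status}")
```

Quantities flagged by such a check point to where the model assumptions, or the test setup itself, should be re-examined first.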
Figure 8: The thermal model shown here on the left was used to generate the airflow rates shown by
the chart on the right. The speed settings shown on the chart were implemented in the thermal
controller and verified through thermal DVT to be adequate for cooling the system
Thermal Design Stage 4: Final Thermal DVT
The final thermal DVT is necessary not only to verify the thermal design but also as a pre-screening or pre-qualification test for the more comprehensive agency testing down the road. Agency tests are very
expensive, so it is incumbent on the thermal engineer to ensure that there will be no failure and repeat
agency tests. Most of the thermally-related agency requirements can be covered in the thermal DVT. As
such it is beneficial that the final thermal DVT be comprehensive and cover as many of the design
parameters as possible. Specifically, the final DVT should address the following (depending on the
equipment):
1. Overall volumetric airflow for forced convection systems
2. Flow distribution and flow velocities at critical locations in the equipment. In this regard it is often
important to measure the air velocities at heatsink locations, for diagnostics in case the heatsink is
not performing as expected, or for cost-saving heatsink re-design if it turns out the heatsink is
over-specified.
3. Air temperatures at critical locations. Every electronic device has specifications for the
maximum ambient temperature and/or the maximum junction or case temperature. When the latter
cannot be measured or is not available, the local ambient temperature is the sole criterion for judging
the design and as such it is very important to measure this quantity.
4. Case temperatures of critical components. For components with heatsinks it may be necessary to
drill a tiny hole through the heatsink for thermocouple insertion.
5. Elevated ambient temperature. A thermal chamber is used for this purpose. Air temperatures and
component temperatures are measured at various settings of ambient (chamber) temperature. If a
fan speed controller is used in the equipment, then this test must also measure the fan speeds,
which are normally controlled to the ambient temperature. These tests may be used to refine the
setpoints of the thermal controller.
6. Ambient temperature variations. This is normally accomplished by temperature cycling in a
thermal chamber. Test standards would normally define the number of cycles, the ambient ramp rates,
as well as the dwell levels and soak periods. The equipment functionality is normally tested during
the dwell periods.
7. Acoustic noise. This is done in an anechoic chamber. When possible it is better to combine this
test with elevated ambient tests, which may be accomplished in a walk-in thermal chamber with
microphones installed as defined in the prevailing standards (e.g. NEBS).
Figure 10: Final thermal DVT setup in a wind tunnel (Left). Whenever possible it is useful to obtain the
overall thermal picture of the board and components with a thermal camera, as shown by the picture on the
right. This picture gives a bird’s-eye view of the hot spots on the board and where to attach thermocouples.
Thermal Design Stage 5: Deployment Level Simulation to
Establish Service Conditions
Sometimes it is necessary to simulate the expected installation or deployment configurations for a piece of
equipment to positively ensure that it will cool as designed throughout its expected life-cycle. Deployment-type thermal simulation or testing is normally used to establish alarm annunciation for the system controller
and to determine service intervals for the operator: when to change the filter, for example, or how much
time is available to swap out a fan tray or perform a graceful shutdown in the event of loss of cooling. For
datacomm equipment it is often the case that two or more of the same chassis are intended to be installed
on the same equipment rack, and in most cases the rack structure or nearby equipment is expected to
partially block some of the vents in the subject equipment. In a multi-blade server arrangement, sometimes
not all the slots are occupied. In such situations the flow by-pass of the occupied slots due to the low
impedance of nearby empty slots may be of concern. Deployment type simulation of possible
configurations (or just the worst case configuration) may be used to quantify how much of a concern the
bypass issue is, to determine if slot blockers are needed for the empty slots, and if so the design of an
appropriate slot blocker.
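The bypass concern can be estimated to first order by treating the slots as parallel flow paths that share the same pressure drop. The sketch below assumes each slot follows a quadratic loss law, pressure drop = k × flow², so that each slot's share of the flow is proportional to 1/√k. This is a simplified hand-calculation model, not the simulation approach described in the article, and the function name and loss coefficients are hypothetical.

```python
import math


def split_flow(total_flow, loss_coefficients):
    """Distribute total_flow among parallel slots with equal pressure drop.

    Each slot is assumed to obey dp = k * q**2, so its flow share is
    proportional to 1 / sqrt(k). Coefficients must be positive and are
    relative values, so any consistent unit system works.
    """
    weights = [1.0 / math.sqrt(k) for k in loss_coefficients]
    total_weight = sum(weights)
    return [total_flow * weight / total_weight for weight in weights]


if __name__ == "__main__":
    # Hypothetical case: two populated slots (k = 4) and one empty slot (k = 1).
    flows = split_flow(100.0, [4.0, 4.0, 1.0])
    for slot, flow in enumerate(flows, start=1):
        print(f"Slot {slot}: {flow:.0f} cfm")
```

In this hypothetical case the empty slot takes half of the total flow, which illustrates why a slot blocker with an impedance close to that of a populated board may be needed.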
Deployment-type thermal simulation can highlight potential dangers and serve as a basis for instructions to be
included in the operator’s manual. The example below is one such simulation for a piece of equipment
intended to be installed in groups of three units inside an ETSI rack. In this example the middle unit is used
to simulate and investigate the combined effects of pre-heating by the unit below and exhaust blockage by
the unit on top.
Figure 11: Deployment level simulation is used to ensure that nothing about the
installation and use will deteriorate the cooling design appreciably.
Conclusions
Thermal design is a critical part of any electronic product development effort. Specialized tools and
methods are now available and, in the hands of a sophisticated thermal practitioner, can improve time-to-market, reduce production and maintenance costs, and improve product reliability. DegreeC offers the full
range of expertise, tools, methods, and validation capabilities to provide a turn-key thermal control solution
for your electronic product development effort.
About the Author
Dr. Izuh Obinelo is currently the director of the Center for Airflow and Thermal Technologies (CATT) at Degree Controls
Inc. He has over 20 years of cumulative experience in fluid dynamics and heating and cooling phenomena in
several application areas. Dr. Obinelo played a leading role in the research and development of Electronics
System Cooling (ESC), a premier commercial software for thermal design in electronics packaging. He has
authored or co-authored several technical articles on electronics cooling.