How to Certify Your Cluster

TECHNICAL BRIEF
Intel® Cluster Ready
The goal of Intel® Cluster Ready certification is to verify that your cluster solutions comply with the Intel Cluster Ready
Specification, making them interoperable with compliant applications. This document describes the mechanics, or concrete
steps, that are required to certify a cluster. Additional information may be found in the Intel Cluster Ready Specification
and the Intel Cluster Checker User’s Guide. If you are building a cluster using a reference design that has already been
certified, you may not need to recertify it—see the program document, Certification Quick-Start Guide, for more details.
By joining the Intel Cluster Ready
Program, your company already has
obtained the Intel® Cluster Checker tool
and a valid license. Be sure you have the
current version—you must have Intel
Cluster Checker version 1.8 or later to use
the procedures described here. Also, this
document assumes you have developed
a new cluster reference design and built
a cluster based on that design. You are
now ready to perform the required Intel
Cluster Checker certification run on that
cluster to verify that the reference design
is Intel Cluster Ready–compliant.
The run will produce two output files—
a text file and an XML file. Both files and
a bill of materials (BOM) for your cluster
must be included in the certification
submission. Microsoft Word, Microsoft Excel,*
or plain ASCII text are the preferred
formats for BOMs.
Here are the basic steps to certify
your new cluster reference design:
1. Review the Cluster Bill of Materials
Each Intel Cluster Ready reference design
has a specific hardware and software
bill of materials that is defined by your
design engineering team. Table 1 shows
an example of the hardware components
of a BOM, and Table 2 shows the
software components.
Before you begin your certification run,
obtain the BOM for the particular cluster
you will be testing. The test cluster
should conform to the BOM, and the BOM
will determine some of the Intel Cluster
Checker settings for the certification run.
When your company sells future clusters
based on this reference design, some
variances from the original BOM hardware
will be allowed. For example, the customer
may order more memory, faster hard disk
drives, or a different number of nodes
(see the program document, Mass-Producing Your Certified Cluster Solutions,
for details). However, the BOM software
will need to match each new cluster and
will be verified by Intel Cluster Checker.
Table 1. BOM example, hardware components
Quantity | Item | Manufacturer | Model
32 | Intel® Server Board | Intel | S5520UR
32 | Intel® Server Chassis | Intel | SR1600URSASBPP
32 | Intel® HDD Backplane | Intel | ASR1600PASBP
32 | 2 Intel® Xeon® Processors | Intel | Intel Xeon Processor X5660
32 | 6x 2GB DDR3 PC3-10600 | Micron | MT18JSF25672PDZ-1G4D1
32 | 500 GB SATA Hard Disk Drive, 3 Gb/s | Seagate* | Barracuda* ST3500320NS
32 | ConnectX IB Dual-Port InfiniBand Adapter Card, QDR | Mellanox* | MHQH29-XTC (Hw Revision: a0, Fw Version: 2.7.000)
32 | DVD/CDRW, Slimline SR1550/SR1560 | Intel | AXXDVDCDR
1 | A low-latency Gigabit Ethernet switch | Hewlett-Packard* | ProCurve* J4904A
1 | MTS3600-BNC 36-port 20 and 40Gb/s InfiniBand Switch System | Mellanox* | MTS3600Q-1BNC
32 | InfiniBand cabling / QSFP connector | Mellanox | MCC4Q30C-002 Copper Cable 4XQSFP 30AWG 2m
1 | KVM over IP Solution | Avocent | DSR8035
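Because plain ASCII text is one of the preferred BOM formats, a hardware BOM like Table 1 can be kept as structured rows and rendered to text for the submission. The following is a minimal sketch only: the `BomRow` type and the `render_bom` helper are hypothetical names introduced for illustration, and just three rows of Table 1 are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BomRow:
    """One hardware line of a bill of materials (hypothetical helper type)."""
    quantity: int
    item: str
    manufacturer: str
    model: str


def render_bom(rows):
    """Render BOM rows as an aligned, plain ASCII text table."""
    header = ("Quantity", "Item", "Manufacturer", "Model")
    table = [header] + [
        (str(r.quantity), r.item, r.manufacturer, r.model) for r in rows
    ]
    # Each column is as wide as its longest cell.
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in table
    ]
    return "\n".join(lines) + "\n"


# A few rows taken from Table 1 (trademark symbols dropped to stay ASCII).
EXAMPLE_ROWS = [
    BomRow(32, "Intel Server Board", "Intel", "S5520UR"),
    BomRow(32, "Intel Server Chassis", "Intel", "SR1600URSASBPP"),
    BomRow(1, "KVM over IP Solution", "Avocent", "DSR8035"),
]

if __name__ == "__main__":
    print(render_bom(EXAMPLE_ROWS), end="")
```

Keeping the BOM as data makes it easier to compare the test cluster against the BOM before the certification run and to regenerate the text file when the design changes.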
Table 2. BOM example, software components

Distributed By: Intel® Corporation
Description: Reference Design Package. The following files are included:
• The Intel Cluster Checker files (config, head and node list)
• The Reference Design Scripts
• The Intel Cluster Ready Certificate
• The Intel Cluster Ready Reference Implementation Release Notes
Contact Information: http://www.intel.com/go/cluster
Program registration is needed.
Intel_Cluster_Ready_Reference_DesignS5520UR-ICR1.1-HPC2.1-RH5.5-C1-v1.0.zip

Distributed By: Intel® Corporation
Description: Intel® Network Driver. The following LAN drivers are included:
• e1000 version 8.0.25
• e1000e version 1.2.20
• igb version 2.4.11
• ixgbe version 3.2.9
Contact Information: http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=18239&lang=eng
intel-lan_linux_v16.0.zip
md5: 36a21f3230fe3d60ceff340e137d49e2

Distributed By: Platform Computing*
Description: Intel® Cluster Checker 1.8
Contact Information: http://my.platform.com
kit-intel-cluster-checker-1.8-1.x86_64.iso
md5: 7784cd007bf5fc7be59df0f76dd48338

Distributed By: Platform Computing
Description: Platform HPC Enterprise Edition 2.1 RHEL. Included kit list in the ISO image:
• Base Kit
• Platform LSF Kit
• Nagios Kit
• Platform OFED Kit
• GUI Console Kit
• Platform HPC Kit
• Platform ISFAC Kit
• Platform MPI Kit
• Platform RTM Kit
Contact Information: http://my.platform.com
Site registration is needed.
hpc21-4613.rhel.iso
md5: 4e0bc888373af697d88ffbc471b9af7f

Distributed By: Red Hat
Description: Red Hat* Enterprise Linux 5.5
Contact Information: https://www.redhat.com/apps/download/
RHEL5.5-Server-20100322.0-x86_64-DVD.iso
md5: f3119f883257ef9041234feda2f1cad0

Distributed By: Platform Computing
Description: Intel® Cluster Runtimes 3.0
Contact Information: http://my.platform.com
Program registration is needed.
kit-intel-runtime-3.0-1.x86_64.iso
md5: 0f8520960feceb43166351c2f289b8da
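Since the software BOM lists an MD5 checksum for each download, it can be worth confirming that the images on hand match the BOM before installing them. The sketch below shows one way to do that with Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_bom` are hypothetical, and the pairing of file names to checksums follows the reading of Table 2 given here.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_bom(path, expected_md5):
    """Compare a file's MD5 with the BOM value, ignoring case and whitespace."""
    return md5_of_file(path) == expected_md5.strip().lower()


# MD5 values copied from Table 2; where the files live locally is up to you.
BOM_MD5 = {
    "intel-lan_linux_v16.0.zip": "36a21f3230fe3d60ceff340e137d49e2",
    "kit-intel-cluster-checker-1.8-1.x86_64.iso": "7784cd007bf5fc7be59df0f76dd48338",
    "hpc21-4613.rhel.iso": "4e0bc888373af697d88ffbc471b9af7f",
    "RHEL5.5-Server-20100322.0-x86_64-DVD.iso": "f3119f883257ef9041234feda2f1cad0",
    "kit-intel-runtime-3.0-1.x86_64.iso": "0f8520960feceb43166351c2f289b8da",
}
```

For example, `matches_bom("hpc21-4613.rhel.iso", BOM_MD5["hpc21-4613.rhel.iso"])` returns `True` only when the local image is byte-for-byte the one named in the BOM.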
2. Configure Specific Tests for Your Cluster
Next, generate the Intel® Cluster Checker XML configuration file to define your cluster configuration (as detailed in the Intel Cluster Checker User’s Guide). Also, it is highly recommended that you save the Cluster Checker XML configuration at /etc/intel/clck/config.xml. An important part of this process is defining appropriate performance thresholds for the cluster you are testing, based on the BOM. Although this task is optional, Intel highly encourages you to set thresholds for all performance-related test modules. Key modules that provide an indication of cluster performance are shown in Table 3.

Table 3. Intel® Cluster Checker performance-related modules
Module Name | Value
disk_bandwidth | Measures storage sub-system performance
hpcc | System-wide performance benchmark
imb_pingpong_intel_mpi | Network bandwidth and latency indicators
memory_bandwidth_stream | Measures bandwidth to memory
mflops_intel_mkl | Indicator of computing throughput

It is also important to configure Intel Cluster Checker to verify all of the message fabrics in your cluster. Both the DAPL and TCP/IP interfaces of all the message fabrics must be included. For example, if you are certifying a cluster with both Ethernet and InfiniBand* fabrics, you need to configure Intel Cluster Checker to test both. Several different test modules may be affected depending on the fabrics involved (Figure 1).

Figure 1. Intel® Cluster Checker fabric-related modules
Columns: Test Module, Include if Needed, Configuration Required
Test modules listed:
• dat_conf
• hpcc
• imb_collective_intel_mpi
• imb_message_integrity_intel_mpi
• imb_pingpong_intel_mpi
• intel_ethernet_driver
• intel_mpi_rt
• intel_mpi_rt_internode
• intel_mpi_testsuite
• ipoib
• openib
• subnet_manager

Add any other checks appropriate for your specific cluster. You can use the <include_module> tag in the configuration file to include optional Intel Cluster Checker test modules, for example, to include the openib module for clusters with InfiniBand, or the e1000 check for cluster nodes that use baseboards with an Intel® PRO/1000 network adapter. Additional checks can also be added using the <add_dependency> tag, which in many cases is the better option. Intel recommends that you use the XML schema provided with Intel Cluster Checker to validate your configuration file with third-party tools such as xmllint. Beginning with version 1.5, Intel Cluster Checker provides capabilities to automatically detect some system parameters and create a configuration input file. See the Intel Cluster Checker User’s Guide for more information.

3. Generate Fingerprint Files
At this point, you will need to generate a set of reference fingerprint files that will be used in the future to test clusters built from this reference design. To verify that the fingerprint is valid, you must make the newly generated fingerprint part of your Intel Cluster Checker runs (see footnote 1).

Generate fingerprints of your head and compute nodes with the following command:
cluster-check <xmlfile> --packages

Then, specify the resulting fingerprint files for this cluster in the configuration of the Intel Cluster Checker packages module before running Intel Cluster Checker in the next step. See the Intel Cluster Checker User’s Guide and Module Reference Guide for more information.

You must save the fingerprint files. Keep them with this reference design so that your company can use them each time you build a new cluster from the design (as detailed in Mass-Producing Your Certified Cluster Solutions). The files will be used to verify each new cluster, ensuring that it matches the certified reference design software stack. You should distribute the fingerprint files to your end customers along with each new cluster you build from this reference design.
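Before moving on to the certification run, the schema validation that Intel recommends for the configuration file can be scripted. The sketch below wraps `xmllint` using its `--noout` and `--schema` options. It assumes the schema shipped with Intel Cluster Checker is a W3C XML Schema (XSD) file; the schema's location is not given in this brief, so `schema_path` is a value you must supply. The helper names are hypothetical.

```python
import subprocess

# Location recommended in this brief for the configuration file.
DEFAULT_CONFIG = "/etc/intel/clck/config.xml"


def xmllint_command(schema_path, config_path=DEFAULT_CONFIG):
    """Build an xmllint invocation that validates config_path against an XSD."""
    # --noout suppresses echoing the parsed document; --schema selects the XSD.
    return ["xmllint", "--noout", "--schema", schema_path, config_path]


def config_is_valid(schema_path, config_path=DEFAULT_CONFIG):
    """Run xmllint and report whether validation succeeded (exit status 0)."""
    result = subprocess.run(
        xmllint_command(schema_path, config_path),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # xmllint reports validation errors on stderr.
        print(result.stderr.strip())
    return result.returncode == 0
```

Catching a malformed configuration this way is cheaper than discovering it partway through a full Intel Cluster Checker run.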
4. Run Intel Cluster Checker as a
Regular User
Once you have generated the Intel Cluster
Checker XML configuration file, you are
ready to run the tool. Use the Intel Cluster
Checker command-line interface and start
this run with the following command:
cluster-check <xmlfile> --certification 1.1
If the cluster has TCP/IP over Ethernet
only, the command line option --exclude
intel_mpi_testsuite may be added to the
above command. As part of this run, Intel
Cluster Checker will verify the capabilities
of your message fabric or fabrics. Be sure
you have configured Intel Cluster Checker
to verify all of the message fabrics in your
cluster as described in Step 2.
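The two command lines used in this brief (the fingerprint run and the certification run) can be assembled programmatically so that the optional exclusion is applied consistently. This is a minimal sketch: the function names are hypothetical, the flags are copied verbatim from the commands shown in this document, and the `tool` parameter exists because the tool must be invoked by its full path unless its environment setup script has been sourced.

```python
import shlex


def fingerprint_command(xmlfile, tool="cluster-check"):
    """Command line for generating package fingerprints."""
    return [tool, xmlfile, "--packages"]


def certification_command(xmlfile, ethernet_only=False, tool="cluster-check"):
    """Command line for the certification run."""
    command = [tool, xmlfile, "--certification", "1.1"]
    if ethernet_only:
        # Only for clusters whose sole fabric is TCP/IP over Ethernet.
        command += ["--exclude", "intel_mpi_testsuite"]
    return command


if __name__ == "__main__":
    config = "/etc/intel/clck/config.xml"
    print(shlex.join(fingerprint_command(config)))
    print(shlex.join(certification_command(config, ethernet_only=True)))
```

The lists returned here can be passed directly to `subprocess.run` under a regular user account, which avoids shell quoting issues in the configuration path.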
After the run, save the output so that
you can send it to Intel. The run should
be completed successfully to obtain
certification, but specific exceptions are
permitted: if either the file_tree test or
lib32_counterpart_lib64 test fails for
one of the following reasons, then your
compliance run may still be accepted.
However, Intel strongly encourages you to
use the exclusion options for these tests
to allow these tests to pass. Refer to the
Intel Cluster Checker Module Reference
Guide for more information on adding
exclusions for these modules.
file_tree test
A node is allowed to differ for the following reasons:
• The file differs due to the use of prelink (or similar utility) by the Linux* distribution. The checksum of the original, unmodified file must be identical on all nodes.
• The file contains inline version control system information. The file must be identical on all nodes other than the inline version information.
• The file contains node-specific identification or configuration data. The file must be identical on all nodes other than the node-specific data.

lib32_counterpart_lib64 test
A 32-bit library is allowed to be present without a 64-bit counterpart if:
• The 32-bit and 64-bit libraries are both present but have different names, for example, libA.so and libA-x86_64.so.
• The 32-bit library has a corresponding 64-bit library but does not correspond to the same version. The 64-bit version must be more recent, for example, libB.so.1 (32-bit) and libB.so.2 (64-bit).

If the tests fail for reasons other than these exceptions, or other tests in this run fail, you must resolve the reported issues or your run cannot be accepted. Sometimes, you may have to perform multiple iterations of debugging and retesting, making changes to the cluster itself or to the file describing the cluster configuration, to resolve all issues. After completing the run successfully, save the output so you can send it to Intel.

5. Submit Results for Certification
You should now have a text output file and an XML output file for the Intel Cluster Checker runs, plus the bill of materials (in Microsoft Word, Excel, or ASCII text format) for your cluster, ready to submit for certification. To make the submission, complete the form below and use the “Submit by E-mail” button to open an e-mail to cluster@intel.com. Before sending, attach your two Intel Cluster Checker output files and your cluster bill of materials to the e-mail.

Once you receive certification for this reference design, you can use the design to mass-produce certified clusters for sale to your customers. In fact, you can sell several different types of clusters from this reference design by varying the hardware while maintaining the same software stack. Learn how to leverage your reference design engineering investment: see the program document, Mass-Producing Your Certified Cluster Solutions, for details.
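As a cross-check on the submission package described in this step, the three required attachments can be bundled into a draft message with Python's standard `email` package. This sketch only builds the message and does not send it, and it does not replace the submission form. The function name, subject line, and body text are assumptions made for illustration; only the recipient address and the list of attachments come from this brief.

```python
from email.message import EmailMessage
from pathlib import Path


def build_submission(sender, company, text_output, xml_output, bom):
    """Build (but do not send) a draft e-mail carrying the three required files."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "cluster@intel.com"
    # Subject and body wording are illustrative assumptions.
    msg["Subject"] = f"Intel Cluster Ready certification submission: {company}"
    msg.set_content(
        "Attached: Intel Cluster Checker text output, XML output, "
        "and the cluster bill of materials."
    )
    for path in (text_output, xml_output, bom):
        path = Path(path)
        # Attach each file as generic binary data under its own file name.
        msg.add_attachment(
            path.read_bytes(),
            maintype="application",
            subtype="octet-stream",
            filename=path.name,
        )
    return msg
```

Reading the files up front has the side effect of failing early with `FileNotFoundError` if one of the three required files is missing.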
Interactive Submission Form
Please check the following:
I have read and understood the Intel® Cluster Ready specification. By checking this box, I certify that my cluster
recipe meets all of the requirements contained in the specification, including the following requirements:
•Fully automated node provisioning, including adding and removing nodes
•All non-Ethernet network fabrics configured to enable both TCP/IP and DAPL interfaces
•Remote console capabilities
•Adherence to all primary, referenced standards (for example, POSIX)
Company name: ____________________
Cluster system product name: ____________________
The identifier should be the one used by your customers. It should not be an internal codename unless no other identifier exists.
Number of nodes certified: ____________________
If necessary, please distinguish between types of nodes (for example, compute, service, and so on).
Contact information:
Name: ____________________ Phone: ____________________
Title: ____________________ Fax: ____________________
Address: ____________________ E-mail: ____________________
This individual will receive the Intel® Cluster Ready compliance certificate by e-mail. This individual will also be contacted if there are any questions regarding the submission.
Submit by E-mail
1. You must either supply the full path when invoking Intel Cluster Checker or use the environment setup script included with the tool.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY
INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL
ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS
INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT
OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR
ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions
marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to
them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current
characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies
of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel’s Web site at
www.intel.com.
Copyright © 2011 Intel Corporation. All rights reserved. Intel, the Intel logo, and Xeon are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
Printed in USA
1011/KE/HEM/XX/PDF
Please Recycle
326233-001US