TECHNICAL BRIEF
Intel® Cluster Ready

How to Certify Your Cluster

The goal of Intel® Cluster Ready certification is to verify that your cluster solutions comply with the Intel Cluster Ready Specification, making them interoperable with compliant applications. This document describes the mechanics, or concrete steps, that are required to certify a cluster. Additional information may be found in the Intel Cluster Ready Specification and the Intel Cluster Checker User’s Guide.

If you are building a cluster using a reference design that has already been certified, you may not need to recertify it; see the program document, Certification Quick-Start Guide, for more details.

By joining the Intel Cluster Ready Program, your company has already obtained the Intel® Cluster Checker tool and a valid license. Be sure you have the current version: you must have Intel Cluster Checker version 1.8 or later to use the procedures described here.

This document also assumes you have developed a new cluster reference design and built a cluster based on that design. You are now ready to perform the required Intel Cluster Checker certification run on that cluster to verify that the reference design is Intel Cluster Ready–compliant. The run will produce two output files: a text file and an XML file. Both files and a bill of materials (BOM) for your cluster must be included in the certification submission. Microsoft Word,* Microsoft Excel,* or plain ASCII text are the preferred formats for BOMs.

Here are the basic steps to certify your new cluster reference design:

1. Review the Cluster Bill of Materials

Each Intel Cluster Ready reference design has a specific hardware and software bill of materials that is defined by your design engineering team. Table 1 shows an example of the hardware components of a BOM, and Table 2 shows the software components. Before you begin your certification run, obtain the BOM for the particular cluster you will be testing.
The test cluster should conform to the BOM, and the BOM will determine some of the Intel Cluster Checker settings for the certification run.

When your company sells future clusters based on this reference design, some variances from the original BOM hardware will be allowed. For example, the customer may order more memory, faster hard disk drives, or a different number of nodes (see the program document, Mass-Producing Your Certified Cluster Solutions, for details). However, the BOM software will need to match on each new cluster and will be verified by Intel Cluster Checker.

Table 1. BOM example, hardware components

Quantity | Item | Manufacturer | Model
32 | Intel® Server Board | Intel | S5520UR
32 | Intel® Server Chassis | Intel | SR1600URSASBPP
32 | Intel® HDD Backplane | Intel | ASR1600PASBP
32 | 2 Intel® Xeon® Processors | Intel | Intel Xeon Processor X5660
32 | 6x 2 GB DDR3 PC3-10600 | Micron | MT18JSF25672PDZ-1G4D1
32 | 500 GB SATA Hard Disk Drive, 3 Gb/s | Seagate* | Barracuda* ST3500320NS
32 | ConnectX IB Dual-Port InfiniBand Adapter Card, QDR | Mellanox* | MHQH29-XTC (Hw Revision: a0, Fw Version: 2.7.000)
32 | DVD/CDRW, Slimline SR1550/SR1560 | Intel | AXXDVDCDR
1 | A low-latency Gigabit Ethernet switch | Hewlett-Packard* | ProCurve* J4904A
1 | MTS3600-BNC 36-port 20 and 40 Gb/s InfiniBand Switch System | Mellanox* | MTS3600Q-1BNC
32 | InfiniBand cabling / QSFP connector | Mellanox | MCC4Q30C-002 Copper Cable 4XQSFP 30AWG 2m
1 | KVM over IP Solution | Avocent | DSR8035

Table 2. BOM example, software components

Reference Design Package
  Distributed by: Intel® Corporation
  Description: The following files are included:
    • The Intel Cluster Checker files (config, head and node list)
    • The Reference Design Scripts
    • The Intel Cluster Ready Certificate
    • The Intel Cluster Ready Reference Implementation Release Notes
  Contact information: http://www.intel.com/go/cluster
    Program registration is needed.
    Intel_Cluster_Ready_Reference_DesignS5520UR-ICR1.1-HPC2.1-RH5.5-C1-v1.0.zip

Intel® Network Driver
  Distributed by: Intel® Corporation
  Description: The following LAN drivers are included:
    • e1000 version 8.0.25
    • e1000e version 1.2.20
    • igb version 2.4.11
    • ixgbe version 3.2.9
  Contact information: http://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=18239&lang=eng
    intel-lan_linux_v16.0.zip
    md5: 36a21f3230fe3d60ceff340e137d49e2

Intel® Cluster Checker 1.8
  Distributed by: Platform Computing*
  Contact information: http://my.platform.com
    kit-intel-cluster-checker-1.8-1.x86_64.iso
    md5: 7784cd007bf5fc7be59df0f76dd48338

Platform HPC Enterprise Edition 2.1 RHEL
  Distributed by: Platform Computing
  Description: Included kit list in the ISO image:
    • Base Kit
    • Platform LSF Kit
    • Nagios Kit
    • Platform OFED Kit
    • GUI Console Kit
    • Platform HPC Kit
    • Platform ISFAC Kit
    • Platform MPI Kit
    • Platform RTM Kit
  Contact information: http://my.platform.com
    hpc21-4613.rhel.iso
    md5: 4e0bc888373af697d88ffbc471b9af7f

Red Hat* Enterprise Linux 5.5
  Distributed by: Red Hat
  Contact information: https://www.redhat.com/apps/download/
    Site registration is needed.
    RHEL5.5-Server-20100322.0-x86_64-DVD.iso
    md5: f3119f883257ef9041234feda2f1cad0

Intel® Cluster Runtimes 3.0
  Distributed by: Platform Computing
  Contact information: http://my.platform.com
    Program registration is needed.
    kit-intel-runtime-3.0-1.x86_64.iso
    md5: 0f8520960feceb43166351c2f289b8da

2. Configure Specific Tests for Your Cluster

Next, generate the Intel® Cluster Checker XML configuration file to define your cluster configuration (as detailed in the Intel Cluster Checker User’s Guide). Intel also highly recommends that you save the Cluster Checker XML configuration at /etc/intel/clck/config.xml.

An important part of this process is defining appropriate performance thresholds for the cluster you are testing, based on the BOM. Although this task is optional, Intel highly encourages you to set thresholds for all performance-related test modules. Key modules that provide an indication of cluster performance are shown in Table 3.

Table 3. Intel® Cluster Checker performance-related modules

Module Name | Value
disk_bandwidth | Measures storage sub-system performance
hpcc | System-wide performance benchmark
imb_pingpong_intel_mpi | Network bandwidth and latency indicators
memory_bandwidth_stream | Measures bandwidth to memory
mflops_intel_mkl | Indicator of computing throughput

It is also important to configure Intel Cluster Checker to verify all of the message fabrics in your cluster. Both the DAPL and TCP/IP interfaces of all the message fabrics must be included. For example, if you are certifying a cluster with both Ethernet and InfiniBand* fabrics, you need to configure Intel Cluster Checker to test both. Several different test modules may be affected depending on the fabrics involved (Figure 1).

Figure 1. Intel® Cluster Checker fabric-related modules (columns: Test Module, Include if Needed, Configuration Required)

• dat_conf
• hpcc
• imb_collective_intel_mpi
• imb_message_integrity_intel_mpi
• imb_pingpong_intel_mpi
• intel_ethernet_driver
• intel_mpi_rt
• intel_mpi_rt_internode
• intel_mpi_testsuite
• ipoib
• openib
• subnet_manager

Add any other checks appropriate for your specific cluster. You can use the <include_module> tag in the configuration file to include optional Intel Cluster Checker test modules: for example, the openib module for clusters with InfiniBand, or the e1000 check for cluster nodes that use baseboards with an Intel® PRO/1000 network adapter. Additional checks can also be added using the <add_dependency> tag, which in many cases is the better option.

Intel recommends that you use the XML schema provided with Intel Cluster Checker to validate your configuration file with third-party tools such as xmllint. Beginning with version 1.5, Intel Cluster Checker provides capabilities to automatically detect some system parameters and create a configuration input file. See the Intel Cluster Checker User’s Guide for more information.

3. Generate Fingerprint Files

At this point, you will need to generate a set of reference fingerprint files that will be used in the future to test clusters built from this reference design. To verify that the fingerprint is valid, you must make the newly generated fingerprint part of your Intel Cluster Checker runs.¹

Generate fingerprints of your head and compute nodes with the following command:

    cluster-check <xmlfile> --packages

Then, specify the resulting fingerprint files for this cluster in the configuration of the Intel Cluster Checker packages module before running Intel Cluster Checker in the next step. See the Intel Cluster Checker User’s Guide and Module Reference Guide for more information.

You must save the fingerprint files. Keep them with this reference design so that your company can use them each time you build a new cluster from the design (as detailed in Mass-Producing Your Certified Cluster Solutions). The files will be used to verify each new cluster, ensuring that it matches the certified reference design software stack. You should distribute the fingerprint files to your end customers along with each new cluster you build from this reference design.

4. Run Intel Cluster Checker as a Regular User

Once you have generated the Intel Cluster Checker XML configuration file, you are ready to run the tool. Use the Intel Cluster Checker command-line interface and start this run with the following command:

    cluster-check <xmlfile> --certification 1.1

If the cluster has TCP/IP over Ethernet only, the command-line option --exclude intel_mpi_testsuite may be added to the above command.

As part of this run, Intel Cluster Checker will verify the capabilities of your message fabric or fabrics. Be sure you have configured Intel Cluster Checker to verify all of the message fabrics in your cluster as described in Step 2. After the run, save the output so that you can send it to Intel.
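The two command lines used in Steps 3 and 4 can be assembled by a small wrapper script. The following is a minimal sketch in Python, not part of Intel Cluster Checker: the cluster-check name and the --packages, --certification 1.1, and --exclude intel_mpi_testsuite options come from this brief, while the function names and the ethernet_only parameter are hypothetical and introduced only for illustration. The sketch builds and prints the commands; it does not run the tool.

```python
import shlex

# Tool name as written in this brief; supply the full path or source the
# environment setup script if the tool is not on your PATH.
TOOL = "cluster-check"


def fingerprint_command(xmlfile):
    """Argument list for the fingerprint run in Step 3 (hypothetical helper)."""
    return [TOOL, xmlfile, "--packages"]


def certification_command(xmlfile, ethernet_only=False):
    """Argument list for the certification run in Step 4 (hypothetical helper).

    ethernet_only is an illustrative flag: when the cluster has TCP/IP over
    Ethernet only, the brief allows excluding intel_mpi_testsuite.
    """
    args = [TOOL, xmlfile, "--certification", "1.1"]
    if ethernet_only:
        args += ["--exclude", "intel_mpi_testsuite"]
    return args


if __name__ == "__main__":
    config = "/etc/intel/clck/config.xml"  # location recommended in Step 2
    print(shlex.join(fingerprint_command(config)))
    print(shlex.join(certification_command(config, ethernet_only=True)))
```

Building the command as a list rather than a single string keeps each argument separate, which avoids quoting problems if the configuration path contains spaces.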
The run should be completed successfully to obtain certification, but specific exceptions are permitted: if either the file_tree test or the lib32_counterpart_lib64 test fails for one of the following reasons, then your compliance run may still be accepted. However, Intel strongly encourages you to use the exclusion options for these tests to allow these tests to pass. Refer to the Intel Cluster Checker Module Reference Guide for more information on adding exclusions for these modules.

file_tree test

A node is allowed to differ for the following reasons:

• The file differs due to the use of prelink (or similar utility) by the Linux* distribution. The checksum of the original, unmodified file must be identical on all nodes.
• The file contains inline version control system information. The file must be identical on all nodes other than the inline version information.
• The file contains node-specific identification or configuration data. The file must be identical on all nodes other than the node-specific data.

lib32_counterpart_lib64 test

A 32-bit library is allowed to be present without a 64-bit counterpart if:

• The 32-bit and 64-bit libraries are both present but have different names, for example, libA.so and libA-x86_64.so.
• The 32-bit library has a corresponding 64-bit library but does not correspond to the same version. The 64-bit version must be more recent, for example, libB.so.1 (32-bit) and libB.so.2 (64-bit).

If the tests fail for reasons other than these exceptions, or other tests in this run fail, you must resolve the reported issues or your run cannot be accepted. Sometimes, you may have to perform multiple iterations of debugging and retesting, making changes to the cluster itself or to the file describing the cluster configuration, to resolve all issues. After completing the run successfully, save the output so you can send it to Intel.

5. Submit Results for Certification

You should now have a text output file and an XML output file for the Intel Cluster Checker runs, plus the bill of materials (in Microsoft Word, Excel, or ASCII text format) for your cluster, ready to submit for certification. To make the submission, complete the form below and use the “Submit by E-mail” button to open an e-mail to cluster@intel.com. Before sending, attach your two Intel Cluster Checker output files and your cluster bill of materials to the e-mail.

Once you receive certification for this reference design, you can use the design to mass-produce certified clusters for sale to your customers. In fact, you can sell several different types of clusters from this reference design by varying the hardware while maintaining the same software stack. To learn how to leverage your reference design engineering investment, see the program document, Mass-Producing Your Certified Cluster Solutions, for details.

Interactive Submission Form

Please check the following:

[ ] I have read and understood the Intel® Cluster Ready specification. By checking this box, I certify that my cluster recipe meets all of the requirements contained in the specification, including the following requirements:

• Fully automated node provisioning, including adding and removing nodes
• All non-Ethernet network fabrics configured to enable both TCP/IP and DAPL interfaces
• Remote console capabilities
• Adherence to all primary, referenced standards (for example, POSIX)

Company name: ____________________

Cluster system product name: ____________________
The identifier should be the one used by your customers. It should not be an internal codename unless no other identifier exists.

Number of nodes certified: ____________________
If necessary, please distinguish between types of nodes (for example, compute, service, and so on).
Contact information:

Name: ____________________
Title: ____________________
Address: ____________________
Phone: ____________________
Fax: ____________________
E-mail: ____________________

This individual will receive the Intel® Cluster Ready compliance certificate by e-mail. This individual will also be contacted if there are any questions regarding the submission.

[Submit by E-mail]

¹ You must either supply the full path when invoking Intel Cluster Checker or use the environment setup script included with the tool.

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice.
Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel’s Web site at www.intel.com. Copyright © 2011 Intel Corporation. All rights reserved. Intel, the Intel logo, and Xeon are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Printed in USA 1011/KE/HEM/XX/PDF Please Recycle 326233-001US