PureData System for Analytics (PDA)
System Health Check
© 2014 IBM Corporation
But first . . .
Some general “How am I doing?” utilities
What versions are running
– NPS – nzrev
– HPF – cat /nzlocal/scripts/version.txt
– FDT – cat /opt/nz/fdt/version.txt
Are you collecting query history?
– Did you know that you can configure ODBC drivers to submit user data?
Are you using IBM Netezza Performance Portal?
– With the exception of configuring the Event Manager, it does as much as nzAdmin and more
Review Event Manager configuration with every major release change
nzhostbackup, nzraidcheck, nz_check_disk_scan_speeds
nz_health, nzhealthcheck, nzhw show -issues
nz_best_practices, nz_sysutil_history
Call Home
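The three version checks listed at the top of this slide can be gathered in one place. The following is a minimal sketch, assuming a bash shell on the host; the function name show_versions and the fallback messages are hypothetical, while nzrev and the two version.txt paths are the ones given above.

```shell
#!/bin/bash
# Sketch only: collects the NPS, HPF and FDT versions using the commands and
# paths from the slide. The function name and messages are illustrative.
show_versions() {
    local entry label file

    # NPS version, if nzrev is available on this host.
    if command -v nzrev >/dev/null 2>&1; then
        echo "NPS: $(nzrev)"
    else
        echo "NPS: nzrev not found"
    fi

    # HPF and FDT keep their versions in plain text files.
    for entry in "HPF:/nzlocal/scripts/version.txt" "FDT:/opt/nz/fdt/version.txt"; do
        label=${entry%%:*}   # text before the first colon
        file=${entry#*:}     # text after the first colon
        if [ -r "$file" ]; then
            echo "$label: $(cat "$file")"
        else
            echo "$label: $file not readable"
        fi
    done
}
```

Calling show_versions prints one entry per component and reports a missing command or file instead of failing.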
Agenda topics
✔ NPS and System Health Check versions
✔ Rules
✔ Running System Health Check
NPS and System Health Check versions
It is possible to run a later version of nzhealthcheck than what was installed by the NPS version upgrade
nzhealthcheck and its rules are contained in the S/W Support Tools package
See the following Technote for information on S/W Support Tools and versions of NPS supported
https://www-304.ibm.com/support/entdocview.wss?uid=swg21668047
See the IBM Netezza System Administrator’s Guide, version 7.1 and later
http://www-01.ibm.com/support/knowledgecenter/SSULQD_7.1.0/com.ibm.nz.adm.doc/c_shc_overview.html
http://www-01.ibm.com/support/knowledgecenter/SSULQD_7.2.0/com.ibm.nz.adm.doc/c_shc_overview.html
Visit Fix Central for PDF versions of the manual referenced above.
Level 1 rules
Basic checks, usually with platform-independent resolution
Checking reported component states (e.g. an AMM goes offline, a fan turns off, a DIMM reports an error):
amm, pwr, chassis fan, blower, switch fan, host cpu, disk, dimm, host fan, sas controller state, host power supply, spu, hba ...
Checking basic counts and presence:
SPU and CPU DIMMs
Number of active FPGAs
Check host error counters for disks, SAS controller
Check cluster and DRBD state
Level 2 rules
Complex rules that specify advice and a problem description
Complex logic, such as identifying the correct disk/SPU/enclosure balance
Specific per-model logic
Rule dependencies, e.g. don't report enclosure PHYs as turned off when the blade at the other side of the SAS connection is known to be down
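The rule-dependency idea can be illustrated with a small shell function. This is only a sketch of the logic: the function name, its arguments, and the state strings are hypothetical, and it does not reflect how nzhealthcheck rules are actually written.

```shell
#!/bin/bash
# Hypothetical sketch of a rule dependency; not nzhealthcheck's real rule syntax.
# Usage: should_report_phy_off <phy_state> <peer_blade_state>
# Returns 0 (report the PHY) only when the PHY is off AND the blade on the
# other side of the SAS connection is not already known to be down.
should_report_phy_off() {
    local phy_state=$1
    local peer_blade_state=$2

    # Nothing to report if the PHY is not off.
    if [ "$phy_state" != "off" ]; then
        return 1
    fi

    # A down blade already explains the PHY being off, so suppress the finding.
    if [ "$peer_blade_state" = "down" ]; then
        return 1
    fi

    return 0
}
```

With this shape, a down blade produces one finding (the blade) rather than a second, derived finding for each PHY that faces it.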
Running System Health Check
The nzhealthcheck daemon is started automatically by nzinit
Check the daemon's status, or start or stop it, with the Linux service command
service nzhealthcheck {start|stop|status|restart}
A change to the admin password requires a restart
If started manually, it prompts for the root password on initial execution.
It runs without the root password, but some checks are not performed.
The root password is a requirement for some of the host device managers.
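The service interface above lends itself to a simple check-and-start helper. The following is a minimal sketch, assuming that `service nzhealthcheck status` exits non-zero when the daemon is not running, as init scripts conventionally do; the source does not state this, and the function name is hypothetical.

```shell
#!/bin/bash
# Sketch only: uses the `service nzhealthcheck {start|stop|status|restart}`
# interface shown above. Assumption: `status` returns a non-zero exit code
# when the daemon is not running.
ensure_nzhealthcheck_running() {
    if service nzhealthcheck status >/dev/null 2>&1; then
        echo "nzhealthcheck daemon is running"
    else
        echo "nzhealthcheck daemon is not running; starting it"
        # A manual start may prompt for the root password.
        service nzhealthcheck start
    fi
}
```

Because a manual start can prompt for the root password, a helper like this is better suited to interactive use than to unattended scripts.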
Two Modes of Operation
Diagnostic (default mode)
– No data gathering
– No event generation
– No nzCallHome events
Monitoring
– Automatic data collection
– Automatic rules evaluation
– Event generation
– nzCallHome events
Call Home
nzhealthcheck replaces the disk_monitor script
nzOpenPMR can be activated for some events related to disk issues
Predictive disk failure notification based on LogPage 0x15 reporting
Predictive disk failure notification based on grown defects
For 7.1, it is initiated by adding a line to crontab:
/nz/kit/bin/adm/nzhealthcheck -p
Note the -p option
Post 7.2, configuration is automatic
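A complete crontab entry also needs a schedule, which the slide does not give. The sketch below prints a candidate line; the command path and the -p option come from the slide, while the hourly schedule and the function name are assumptions for illustration only.

```shell
#!/bin/bash
# Sketch: print a crontab entry that runs nzhealthcheck with -p.
# The schedule is an assumption (top of every hour); pass a different
# five-field schedule as the first argument to override it.
nzhealthcheck_cron_line() {
    local schedule=${1:-"0 * * * *"}
    printf '%s %s\n' "$schedule" "/nz/kit/bin/adm/nzhealthcheck -p"
}

nzhealthcheck_cron_line
```

The printed line can be reviewed and then added with crontab -e. Which user's crontab it belongs in is not stated in the source.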
Additional information gathered by nzhealthcheck
If using the sysinfo option, additional data is gathered; recommended for inclusion in PMRs
Disk non-media error rate check
SCSI error count check
Transfer & error correction statistics (sysinfo report only)
Firmware & hardware revisions (from sys_rev_check)
Various host checks (from pts-check)
Network connections (from concheck)
Cluster check (from hpf_health)
Regen status (from nzds)
Verify availability of transaction ids in DB
et cetera
nzhealthcheck sysinfo
Frontend Hosts Utilization and Statistics
************************************************************************
Host
vendor        : IBM
model         : -[7947AC1]-
uuid          : 2EC092FB-9E52-3B99-86E3-F2578EB58039
serial_number : KQWWGHM
hw_version    : 00
role          : primary
rack_id       : 1
slot          : 2
Host Fans
count  dev_status  name   rpm   slot  status
--------------------------------------------
2      Present     Fan 1  2234  1     ok
2      Present     Fan 2  1823  2     ok
2      Present     Fan 3  2652  3     ok
Host Power
avg  name        p12    p3    p5    pBAT  unit
----------------------------------------------
26   Power Unit  12.15  3.34  5.04  3.09  ok
Host SAS Controllers
...
File locations
nzhealthcheck is part of the Software Support Tools
Installed in /nz/kit/bin/adm pre NPS 7.2
The reports generated are in /nz/kit/log/nzhealthcheck
Rules documented in /nz/kit/share/nzhealthcheck/rules-doc.pdf
EventManager can be configured by editing the configuration file at /nz/kit/share/nzhealthcheck/nzhealthcheck.cfg
Device manager configuration at /nz/kit/share/nzhealthcheck/devmgrs/devmgrs.ini
Example of a small segment of the configuration:
[nz@nz80409-h2 devmgrs]$ grep -A8 scsi_ls devmgr.ini
[scsi_ls]
triggeredOn = bom
path = devmgrs/bin/adm/nz_query_logsense
...
npsStates = online,paused
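grep -A8 prints a fixed eight lines after the match, so it can cut a longer section short or run into the next one. A minimal sketch of an alternative that prints exactly one section, up to the next section header, is shown below; the function name ini_section is hypothetical and the file is passed as an argument.

```shell
#!/bin/bash
# Sketch: print one [section] of an INI-style file, from its header up to
# (but not including) the next section header.
# Usage: ini_section <section-name> <file>
ini_section() {
    local section=$1
    local file=$2
    awk -v want="[$section]" '
        /^\[/ { printing = ($0 == want) }   # each header turns output on or off
        printing                            # print lines while inside the section
    ' "$file"
}
```

For example, ini_section scsi_ls followed by the path to the device manager configuration file would print the [scsi_ls] header and every key under it, however many lines that is.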