Exachk is a great tool. It summarizes the health of your Exadata system and gives you a broad view of it. I used it on an X3-8 system. The latest version can be downloaded from the My Oracle Support note “Oracle Exadata Database Machine exachk or HealthCheck” [ID 1070954.1].
Why and when do we need it?
Exachk should be run as part of periodic maintenance on the Exadata. It is strongly recommended to run it before or after any upgrade, configuration change, or other change to the software or hardware.
How does it work?
Download the latest version from My Oracle Support (ID 1070954.1).
Download the “exachk 2.2.2 production (Oracle hardware)” executable bundle (check whether a newer version is available).
Unzip it in a directory owned by the oracle user. The unzipped files must also be owned by the oracle user.
A version also ships with the machine in /opt/oracle.SupportTools/exachk. It is most probably out of date, so download a newer one.
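Since the unzipped files have to belong to the oracle user, it can be worth checking ownership before the first run. The following is only a minimal sketch: `check_owner` is a hypothetical helper name, not part of exachk, and the directory you pass is wherever you unpacked the bundle.

```shell
# List anything under a directory that is NOT owned by the given user.
# Prints nothing and returns 0 when ownership is clean, 1 when offenders
# are found, 2 when the directory does not exist.
check_owner() {
    dir=$1
    owner=$2
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 2
    fi
    offenders=$(find "$dir" ! -user "$owner" -print)
    if [ -n "$offenders" ]; then
        printf '%s\n' "$offenders"
        return 1
    fi
    return 0
}

# Usage (the path is an example, use your own staging directory):
#   check_owner /opt/oracle.SupportTools/exacheck20130616 oracle
```

If the function prints any paths, fix their owner before running exachk.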
Query the version like this:
[oracle@]$ /opt/oracle.SupportTools/exachk/exachk -v
EXACHK VERSION: 2.1.5_20120524
[oracle@]$ /opt/oracle.SupportTools/exacheck20130616/exachk -v
EXACHK VERSION: 2.2.1_20130506
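To decide whether the bundled copy is older than the one you downloaded, the two version strings can be compared in a script. This is a rough sketch that assumes the `-v` output always has the form shown above (`EXACHK VERSION: <dotted version>_<stamp>`). `exachk_version` and `version_lt` are hypothetical helper names.

```shell
# Extract "2.2.1" from a line such as "EXACHK VERSION: 2.2.1_20130506".
exachk_version() {
    sed -n 's/^EXACHK VERSION: *\([0-9.]*\)_.*$/\1/p'
}

# Return 0 if dotted version $1 is strictly older than $2.
# Compares up to three numeric fields (major.minor.patch).
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" |
         sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)" = "$1" ]
}

# Usage:
#   bundled=$(/opt/oracle.SupportTools/exachk/exachk -v | exachk_version)
#   version_lt "$bundled" 2.2.1 && echo "bundled exachk is older"
```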
For a full health check, run the command with the “-a” switch as the oracle user.
It runs commands with root, oracle, and nm2user privileges on all the DB and cell nodes.
You have to provide the passwords or set up SSH equivalence. If you don’t have access to some components, you also have the option to skip the checks on them.
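Before starting a full run, it can help to know which nodes already accept a key-based login, so you know where exachk will ask for a password. The sketch below is an assumption about how you might check that with plain ssh. `check_ssh_equiv` and the host names in the usage line are hypothetical, and the `SSH` variable exists only so the command can be swapped out.

```shell
# Report which hosts accept a passwordless (key-based) login for a user.
# BatchMode=yes makes ssh fail instead of prompting for a password.
SSH=${SSH:-ssh}

check_ssh_equiv() {
    user=$1
    shift
    failed=0
    for host in "$@"; do
        if $SSH -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true 2>/dev/null; then
            echo "OK      $user@$host"
        else
            echo "FAILED  $user@$host"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (host names are placeholders):
#   check_ssh_equiv oracle dbnode01 dbnode02
#   check_ssh_equiv root cel01 cel02 cel03
```

A FAILED line can mean a missing key as well as an unreachable host, so treat it as a hint about where to look, not a diagnosis.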
What does it provide?
It generates detailed HTML reports that include the versions of the components. The section on findings that need attention (FAIL, WARNING, INFO) is the most useful. An unplugged cable or a configuration issue will show up there. If the database or host configuration does not follow the best practice recommendations, the report tells you what to change. One of the best parts is that most of the time you can find the command that produced a finding, so later on you can run that command again and check whether the issue is fixed.
It is a great tool, and even if you don’t plan a major change to the system, it can be run monthly as routine maintenance.
Sometimes the report contains a warning like the one below. It means that some checks could not run and were skipped. In my experience, most of the time the cause is a connection timeout or a password problem on a remote server. Also check the ILOM of the server.
WARNING! The data collection activity appears to be incomplete for this exachk run. Please review the “Killed Processes” and / or “Skipped Checks” section and refer to “Appendix A – Troubleshooting Scenarios” of the “Exachk User Guide” for corrective actions.
Timed out while running collections on host
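If you keep reports or captured output from several runs, a small check can flag the incomplete ones. This is a sketch only: it assumes the messages quoted above appear as unbroken text in the file, which may not hold if the HTML report splits them across tags. `run_incomplete` is a hypothetical helper name.

```shell
# Return 0 when an exachk report or captured output shows an incomplete run.
# Both patterns are fragments of the messages quoted above.
run_incomplete() {
    grep -q -e 'data collection activity appears to be incomplete' \
            -e 'Timed out while running collections' "$1"
}

# Usage (the file name is a placeholder):
#   run_incomplete exachk_report.html && echo "some checks were skipped"
```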
You can set the timeout limits before running exachk.
* RAT_TIMEOUT (default 90 seconds, non-root individual commands)
* RAT_ROOT_TIMEOUT (default 300 seconds, root userid command sets)
* RAT_PASSWORDCHECK_TIMEOUT (default 1 second, ssh login DNS handshake)
[oracle@host]$ export RAT_TIMEOUT=200
[oracle@host]$ export RAT_ROOT_TIMEOUT=500
[oracle@host]$ export RAT_PASSWORDCHECK_TIMEOUT=10
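If you would rather not leave the larger limits exported in your login shell, the variables can be scoped to a single run. A minimal sketch, assuming the two command timeouts are the ones you want to raise. `with_timeouts` is a hypothetical wrapper, and 200 and 500 are just the example values used above.

```shell
# Run one command with longer exachk timeouts without changing the
# calling shell. Values already exported by the caller are kept.
with_timeouts() {
    env RAT_TIMEOUT="${RAT_TIMEOUT:-200}" \
        RAT_ROOT_TIMEOUT="${RAT_ROOT_TIMEOUT:-500}" \
        "$@"
}

# Usage (run from the directory that holds the exachk executable):
#   with_timeouts ./exachk -a
```

Because the values are passed through `env`, they apply only to that one exachk process and disappear when it exits.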