Web UIs

Namenode Web UI:

Default port: 50070 (50470 for https)
The port can be configured with the dfs.namenode.http-address property of the hdfs-site.xml configuration file.
The web interface provides information on the Namenode and on HDFS as a whole. It also provides some utilities such as file and log browsing.

[Screenshot: Namenode web UI]
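
For example, the configured address can be checked with hdfs getconf, and the web endpoints can be queried directly (the hostname below is a placeholder):

$ hdfs getconf -confKey dfs.namenode.http-address    # effective Namenode web UI address
$ curl 'http://namenode.example.com:50070/conf'      # dump the live daemon configuration
$ curl 'http://namenode.example.com:50070/logs/'     # browse the Namenode log directory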

Datanodes Web UI:

Default port: 50075 (50475 for https)
The port can be configured with the dfs.datanode.http.address property in the hdfs-site.xml configuration file.

Some useful entrypoints:
/blockScannerReport : datanode block checks report
/logLevel : form to get and set log4j logging levels, similar to the hadoop daemonlog utility
/stacks : Stack traces for all threads
/metrics?format=json : Metrics for the server
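
For example (the DataNode hostname below is a placeholder):

$ curl 'http://datanode.example.com:50075/blockScannerReport'
$ curl 'http://datanode.example.com:50075/stacks'
$ curl 'http://datanode.example.com:50075/metrics?format=json'
$ hadoop daemonlog -getlevel datanode.example.com:50075 org.apache.hadoop.hdfs.server.datanode.DataNode   # same information as /logLevel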

YARN Web UI

Default Port: 8088
The port can be configured with the yarn.resourcemanager.webapp.address property of the yarn-site.xml configuration file.
The Resource Manager web interface provides cluster information on applications and on the scheduler.

[Screenshot: ResourceManager web UI]
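
The same information is also exposed by the Resource Manager REST API, for example (the hostname below is a placeholder, assuming the default 8088 port):

$ curl 'http://resourcemanager.example.com:8088/ws/v1/cluster/info'        # cluster and Resource Manager state
$ curl 'http://resourcemanager.example.com:8088/ws/v1/cluster/apps'        # applications known to the RM
$ curl 'http://resourcemanager.example.com:8088/ws/v1/cluster/scheduler'   # scheduler queues and capacities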

Daemon Logs

Each daemon writes its own logs. The log location can be configured in the hadoop-env.sh, yarn-env.sh, and mapred-env.sh files. These logs contain useful information on possible slowness, connectivity issues, platform bugs, etc.
The log location is reported in the console when a daemon is launched (just replace .out with .log).

$ hadoop-daemon.sh start namenode
starting namenode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-namenode-hadoop.out
$ hadoop-daemon.sh start datanode
starting datanode, logging to /var/log/hadoop/hdfs/hadoop-hdfs-datanode-hadoop.out
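
To follow a daemon's activity or look for errors, tail or grep the corresponding .log file (paths taken from the output above):

$ tail -f /var/log/hadoop/hdfs/hadoop-hdfs-namenode-hadoop.log
$ grep -iE 'error|exception' /var/log/hadoop/hdfs/hadoop-hdfs-datanode-hadoop.log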

The job client

A MapReduce job client sets up an application context (local files, jar, command, etc.) and prepares the very first container, which runs the Application Master service in charge of negotiating resources from the Resource Manager and working with the NodeManagers to execute and monitor containers.

A MapReduce job application context is described on HDFS in the /user/<user_name>/.staging directory. The job client creates job.xml and job.<timestamp>.conf.xml, which contain all the Hadoop parameters used to execute the job, as well as the jar files passed to the hadoop jar command; these files are then used by all the containers to execute the job.

The following command runs the sleep example with one mapper and one reducer:

$ yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.jar sleep -m 1 -r 1 -mt 6000000

The information collected by the job client from the Application Master (through the Resource Manager) will be reported back to the console.
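
The application can also be followed from the YARN side with the yarn CLI (the application id below is the one matching the example job):

$ yarn application -list -appStates RUNNING                  # list running applications
$ yarn application -status application_1445722923719_0001    # detailed status of one application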

$ hdfs dfs -ls /user/client/.staging/job_1445722923719_0001
Found 6 items
-rw-r--r--  10 client supergroup    1533962 2015-10-24 23:42 /user/client/.staging/job_1445722923719_0001/job.jar
-rw-r--r--  10 client supergroup        151 2015-10-24 23:42 /user/client/.staging/job_1445722923719_0001/job.split
-rw-r--r--   2 client supergroup         19 2015-10-24 23:42 /user/client/.staging/job_1445722923719_0001/job.splitmetainfo
-rw-r--r--   2 client supergroup     100169 2015-10-24 23:42 /user/client/.staging/job_1445722923719_0001/job.xml
-rw-r--r--   2 client supergroup      12665 2015-10-24 23:43 /user/client/.staging/job_1445722923719_0001/job_1445722923719_0001_1.jhist
-rw-r--r--   2 client supergroup     119192 2015-10-24 23:43 /user/client/.staging/job_1445722923719_0001/job_1445722923719_0001_1_conf.xml

To kill the job, find the running job id with the mapred job -list command, then use mapred job -kill job_1445722923719_0001
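
For example:

$ mapred job -list                         # find the running job id
$ mapred job -kill job_1445722923719_0001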

The Application Master

How it works

The Application Master is the actual owner of the job.
The Application Master service is in charge of negotiating resources from the Resource Manager and works with the NodeManagers to execute and monitor job containers.

The client submits the application and the Application Master specifications to the RM, which negotiates the NodeManager on which the Application Master will be launched.

The NodeManager generates shell scripts in the local application cache, which are used to execute the Application Master container.
The Application Master container directory is located under the yarn.nodemanager.local-dirs directories defined in yarn-site.xml.

$ ls -al usercache/client/appcache/application_1445722923719_0001/container_1445722923719_0001_01_000001
total 56
drwx--x--- 4 yarn yarn 4096 Oct 25 00:47 .
drwx--x--- 6 yarn yarn 4096 Oct 25 00:47 ..
-rw-r--r-- 1 yarn yarn   12 Oct 25 00:47 .container_tokens.crc
-rw-r--r-- 1 yarn yarn   16 Oct 25 00:47 .default_container_executor.sh.crc
-rw-r--r-- 1 yarn yarn   16 Oct 25 00:47 .default_container_executor_session.sh.crc
-rw-r--r-- 1 yarn yarn   36 Oct 25 00:47 .launch_container.sh.crc
-rw-r--r-- 1 yarn yarn  105 Oct 25 00:47 container_tokens
-rwx------ 1 yarn yarn  717 Oct 25 00:47 default_container_executor.sh
-rwx------ 1 yarn yarn  663 Oct 25 00:47 default_container_executor_session.sh
lrwxrwxrwx 1 yarn yarn  105 Oct 25 00:47 job.jar -> /srv/hadoop_data/yarn/local/usercache/client/appcache/application_1445722923719_0001/filecache/10/job.jar
lrwxrwxrwx 1 yarn yarn  105 Oct 25 00:47 job.xml -> /srv/hadoop_data/yarn/local/usercache/client/appcache/application_1445722923719_0001/filecache/13/job.xml
drwxr-xr-x 2 yarn yarn 4096 Oct 25 00:47 jobSubmitDir
-rwx------ 1 yarn yarn 3490 Oct 25 00:47 launch_container.sh
drwx--x--- 2 yarn yarn 4096 Oct 25 00:47 tmp

NOTE: job.jar is only a soft link that points to the actual job.jar in the application's filecache directory. This is how YARN handles the distributed cache for containers.
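
From inside the container directory shown above, the launch scripts and symlinks can be inspected directly, for example:

$ readlink job.jar            # resolves to the localized copy in the filecache
$ cat launch_container.sh     # exports the container environment, creates the symlinks and launches the Application Master command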

Once launched, the AM communicates with the RM to negotiate appropriate resource containers according to its needs. The Scheduler (in the Resource Manager) responds to this request by granting a container. After allocation, the AM communicates with the NodeManagers to launch the application containers on them.
A container is the resource allocation that grants an application the right to use a specific amount of resources (memory, CPU, etc.) on a specific host.
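
As an illustration, the container sizes requested for the map and reduce tasks can be set per job with the standard MapReduce properties (the values below are arbitrary examples):

$ yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.jar sleep \
    -Dmapreduce.map.memory.mb=1024 -Dmapreduce.map.cpu.vcores=1 \
    -Dmapreduce.reduce.memory.mb=2048 \
    -m 1 -r 1 -mt 60000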

The client can communicate with its own AM (through the RM) to get status, progress updates, etc.
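
For example, the status reported by the AM for the job above can be queried with:

$ mapred job -status job_1445722923719_0001   # progress, counters and tracking URL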

Once the application is complete, the Application Master deregisters with the Resource Manager and shuts down, allowing its own container to be repurposed.

MapReduce Containers

During application code execution within a container, the resulting logs (syslog, stderr, stdout) are located under the yarn.nodemanager.log-dirs directory.

If a map or reduce task gets stuck, the syslog and stderr files are a good place to check.
Once the job has completed, the NodeManager keeps the logs of each container for yarn.nodemanager.log.retain-seconds.

$ find  application_1445722923719_0002/ -print
application_1445722923719_0002/
application_1445722923719_0002/container_1445722923719_0002_01_000001
application_1445722923719_0002/container_1445722923719_0002_01_000001/stdout
application_1445722923719_0002/container_1445722923719_0002_01_000001/stderr
application_1445722923719_0002/container_1445722923719_0002_01_000001/syslog
application_1445722923719_0002/container_1445722923719_0002_01_000002
application_1445722923719_0002/container_1445722923719_0002_01_000002/stdout
application_1445722923719_0002/container_1445722923719_0002_01_000002/stderr
application_1445722923719_0002/container_1445722923719_0002_01_000002/syslog
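
For example, to check the first container of this application for errors:

$ tail -n 50 application_1445722923719_0002/container_1445722923719_0002_01_000001/syslog
$ grep -iE 'error|exception' application_1445722923719_0002/container_1445722923719_0002_01_000001/stderr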

To debug failed tasks, set the mapreduce.task.files.preserve.failedtasks property to true in the mapred-site.xml file, since it preserves the intermediate files of failed ("dead") tasks.
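
The property can also be set for a single job on the command line, for example with the sleep job used earlier:

$ yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient.jar sleep \
    -Dmapreduce.task.files.preserve.failedtasks=true -m 1 -r 1 -mt 60000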

For more information on debugging, see How to Debug Map/Reduce Programs.

Note:
The log aggregation feature allows Hadoop to store the application logs in HDFS.
This feature can be activated with the yarn.log-aggregation-enable property set to true in the yarn-site.xml file.
The logs will be located at hdfs:///logs/userlogs/<user_name>/<app_id>. To access them use the following command line:

$ yarn logs -applicationId <applicationId> # Read Application logs
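
Assuming log aggregation is enabled, the aggregated logs can also be listed directly on HDFS (path, user and application id follow the examples above):

$ hdfs dfs -ls /logs/userlogs/client/application_1445722923719_0001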

Note:
HDFS default configuration
MapReduce default configuration
YARN default configuration