Tuesday, 8 July 2014

Ganglia Monitoring Tool 1


What is Ganglia:

  • It is a highly scalable monitoring system for high-performance computing.
  • It can monitor a single system, a cluster of systems, or a grid of clusters.
  • It uses XML for data representation.
  • It uses RRDtool for data storage and visualization.
  • The implementation of Ganglia is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world.
  • It has been used to link clusters across university campuses and around the world, and it can scale to handle clusters with 2000 nodes.
Put simply, “Ganglia is a real-time cluster monitoring tool that collects information from each computer in the cluster and provides an interactive way to view the performance of the individual computers and of the cluster as a whole.”

Like other monitoring tools, Ganglia only provides a way to view the performance of the cluster, not to control it.
Architecture of Ganglia:
      The Ganglia system consists of two daemons, gmond and gmetad, a PHP-based web frontend, and two other utilities, gmetric and gstat.
What is Gmond:
      Gmond runs on every node of the cluster and gathers metrics such as CPU, memory, network, disk and swap usage.
What is Gmetad:
      Gmetad runs on the head node. It gathers data from all other nodes and stores it in a round-robin database (RRD). It can poll multiple clusters and aggregate their metrics. The web frontend also uses it to generate the UI.
What is PHP Web Frontend:
     The Ganglia web front-end provides a view of the gathered information via real-time dynamic web pages. Most importantly, it displays Ganglia data in a meaningful way for system administrators and computer users. It should be installed on the same machine where gmetad is installed.
Ganglia Installation:
Installation of ganglia on master node:
 sudo apt-get install ganglia-monitor rrdtool gmetad ganglia-webfrontend

The above command installs gmond, gmetad and the Ganglia web UI on the node. The ganglia-webfrontend package also installs the required Apache server and PHP modules. To serve Ganglia from Apache, copy /etc/ganglia-webfrontend/apache.conf into /etc/apache2/sites-enabled/:
sudo cp /etc/ganglia-webfrontend/apache.conf /etc/apache2/sites-enabled/ganglia.conf

The /etc/ganglia-webfrontend/apache.conf file defines a simple alias that maps /ganglia to /usr/share/ganglia-webfrontend.
Installation of ganglia on other nodes:
  sudo apt-get install ganglia-monitor

The above command installs the Ganglia monitor (gmond).
Gmond configuration on master node:
      Ganglia supports two types of configuration: multicast and unicast. Here I take an example cluster and configure Ganglia in unicast mode. The cluster is named “Test”, with 192.168.1.1 as the master node and 192.168.1.2 and 192.168.1.3 as slave nodes.
globals {
  daemonize = yes
  setuid = yes
  user = ganglia
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 0 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 30 /*secs; must be non-zero in unicast mode */
}
cluster {
  name = "Test"
  owner = "clusterOwner"
  latlong = "unspecified"
  url = "unspecified"
}
/* The master also reports its own metrics to itself */
udp_send_channel {
  host = 192.168.1.1
  port = 8649
  ttl = 1
}
/* Receive the metrics sent by all nodes */
udp_recv_channel {
  port = 8649
}
/* gmetad polls the cluster state (XML) on this port */
tcp_accept_channel {
  port = 8649
}
Gmond configuration on other nodes:
globals {
  daemonize = yes
  setuid = yes
  user = ganglia
  debug_level = 0
  max_udp_msg_len = 1472
  mute = no
  deaf = no
  allow_extra_data = yes
  host_dmax = 0 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no
  send_metadata_interval = 30 /*secs; must be non-zero in unicast mode */
}
cluster {
  name = "Test"
  owner = "clusterOwner"
  latlong = "unspecified"
  url = "unspecified"
}
/* Send metrics to the master node over unicast */
udp_send_channel {
  # mcast_join = 239.2.11.71
  host = 192.168.1.1
  port = 8649
  ttl = 1
}
tcp_accept_channel {
  port = 8649
}
The gmond configuration defines the following properties.
Globals section :
  • daemonize : It is a Boolean attribute. When true, gmond will daemonize. When false, gmond will run in the foreground.
  • setuid : The setuid attribute is a boolean. When true, gmond will set its effective UID to the uid of the user specified by the user attribute. When false, gmond will not change its effective user. 
  • debug_level : The debug_level is an integer value. When set to zero (0), gmond will run normally. A debug_level greater than zero will result in gmond running in the foreground and outputting debugging information. The higher the debug_level the more verbose the output.
  • mute : The mute attribute is a boolean. When true, gmond will not send data regardless of any other configuration directives.
  • deaf : The deaf attribute is a boolean. When true, gmond will not receive data regardless of any other configuration directives. 
  • allow_extra_data  : The allow_extra_data attribute is a boolean. When false, gmond will not send out the EXTRA_ELEMENT and EXTRA_DATA parts of the XML. This might be useful if you are using your own frontend to the metric data and would like to save some bandwidth.
  • host_dmax: The host_dmax value is an integer with units in seconds. When set to zero (0), gmond will never delete a host from its list, even when a remote host has stopped reporting. If host_dmax is set to a positive number, gmond will flush a host after it has not heard from it for host_dmax seconds.
  • cleanup_threshold : The cleanup_threshold is the minimum amount of time before gmond will clean up any hosts or metrics where tn > dmax, that is, expired data.
  • gexec : The gexec boolean allows you to specify whether gmond will announce the host's availability to run gexec jobs. Note: this requires that gexecd is running on the host and that the proper keys have been installed.
  • send_metadata_interval  : The send_metadata_interval establishes an interval in which gmond will send or resend the metadata packets that describe each enabled metric. This directive by default is set to 0 which means that gmond will only send the metadata packets at startup and upon request from other gmond nodes running remotely. If a new machine running gmond is added to a cluster, it needs to announce itself and inform all other nodes of the metrics that it currently supports. In multicast mode, this isn't a problem because any node can request the metadata of all other nodes in the cluster. However in unicast mode, a resend interval must be established. The interval value is the minimum number of seconds between resends.
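
The expiry rule behind host_dmax and cleanup_threshold (data is expired when tn > dmax) can be sketched in a few lines of Python. This is a simplified model under my own assumptions, not gmond's actual code: the function name is_expired is hypothetical, and tn simply stands for the number of seconds since the host or metric was last heard from.

```python
def is_expired(tn: int, dmax: int) -> bool:
    """Return True when data counts as expired under the tn > dmax rule.

    tn   -- seconds since the host (or metric) last reported
    dmax -- maximum lifetime in seconds; 0 means "never delete"
    """
    if dmax == 0:
        # host_dmax = 0: the host is kept even if it has stopped reporting
        return False
    return tn > dmax


print(is_expired(tn=900, dmax=0))    # host_dmax = 0, so the host is kept
print(is_expired(tn=900, dmax=600))  # silent for longer than dmax
print(is_expired(tn=120, dmax=600))  # reported recently
```

With host_dmax = 0, as in the configuration above, a node that goes down stays in the list and is shown as down instead of disappearing.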

Cluster section : 
  • name : The name attribute specifies the name of the cluster of machines.
  • owner : The owner attribute specifies the administrators of the cluster. The pair name/owner should be unique across all clusters in the world.
  • latlong : The latlong attribute is the latitude and longitude (GPS coordinates) of this cluster on Earth, given in decimal degrees with two decimal places per axis, which is accurate to about 1 mile.
  • url : The url for more information on the cluster. Intended to give purpose, owner, administration, and account details for this cluster.
Udp_send_channel :
      You can define as many udp_send_channel sections as you like within the limitations of memory and file descriptors. If gmond is configured as mute this section will be ignored.
The udp_send_channel has a total of five attributes: mcast_join, mcast_if, host, port and ttl.
  • mcast_join and mcast_if : The mcast_join and mcast_if attributes are optional. When specified, gmond will create the UDP socket, join the mcast_join multicast group, and send data out of the interface specified by mcast_if.
  • ttl : The ttl is the time-to-live (maximum hop count) of the packets that are sent.
  • host and port : If only a host and port are specified then gmond will send unicast UDP messages to the hosts specified. You could specify multiple unicast hosts for redundancy as gmond will send UDP messages to all UDP channels.
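
To illustrate the redundancy point, here is a small Python sketch that generates one unicast udp_send_channel section per collector host. The helper name render_udp_send_channels is hypothetical, and using 192.168.1.2 as a second collector is only an assumption for the example; in the cluster described here, 192.168.1.1 is the only collector.

```python
def render_udp_send_channels(hosts, port=8649, ttl=1):
    """Build gmond.conf text with one unicast udp_send_channel per host."""
    sections = []
    for host in hosts:
        sections.append(
            "udp_send_channel {\n"
            f"  host = {host}\n"
            f"  port = {port}\n"
            f"  ttl = {ttl}\n"
            "}"
        )
    return "\n".join(sections)


# Two collectors: gmond sends every metric to both of them.
print(render_udp_send_channels(["192.168.1.1", "192.168.1.2"]))
```

Each receiving host needs a matching udp_recv_channel on the same port, otherwise the packets are simply dropped.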
Udp_recv_channel :
      You can specify as many udp_recv_channel sections as you like within the limits of memory and file descriptors. If gmond is configured deaf, these sections will be ignored.
The udp_recv_channel section has the following attributes: mcast_join, bind, port, mcast_if, family.
  • mcast_join and mcast_if : The mcast_join and mcast_if attributes should only be used if you want this UDP channel to receive multicast packets from the multicast group mcast_join on interface mcast_if. If you do not specify multicast attributes, gmond will simply create a UDP server on the specified port.
  • port : The port on which gmond creates the UDP server.
  • bind : You can use the bind attribute to bind to a particular local address.
Tcp_accept_channel :
      You can specify as many tcp_accept_channel sections as you like within the limitations of memory and file descriptors. If gmond is configured to be mute, then these sections are ignored.
  • bind : The bind address is optional and allows you to specify which local address gmond will bind to for this channel. 
  • port : The port is an integer that specifies which port to answer requests for data.
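
A quick way to see what a tcp_accept_channel serves is to connect to it and read until gmond closes the connection. The following Python sketch does just that; the function name fetch_gmond_xml is my own, and it returns raw bytes so that the XML declaration can state its own encoding.

```python
import socket


def fetch_gmond_xml(host: str, port: int = 8649, timeout: float = 5.0) -> bytes:
    """Connect to a tcp_accept_channel and read the XML dump until EOF."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(4096)
            if not data:  # the peer closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


# Example (needs a reachable gmond, so it is left commented out):
# xml_bytes = fetch_gmond_xml("192.168.1.1")
# print(xml_bytes[:200])
```

No request has to be sent: the client only connects and reads, which is why a plain telnet to port 8649 shows the same output.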

Gmetad Configuration:
 data_source "Test" 15 192.168.1.1:8649
The gmetad configuration defines the data source with the cluster name, the polling interval, and the IP address and port on which gmond is running. In this data source configuration, “Test” is the cluster name, 15 is the gmetad polling interval for metrics in seconds, and “192.168.1.1:8649” is the gmond IP address and port of the head node.
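
To make the structure of a data_source line concrete, here is a minimal Python sketch that splits one into its parts. The function name parse_data_source is hypothetical, and this is not how gmetad itself parses the file. It assumes the usual gmetad defaults (15 seconds when the interval is omitted, port 8649 when the port is omitted) and does not handle IPv6 addresses.

```python
import shlex


def parse_data_source(line: str):
    """Split a gmetad data_source line into (name, interval, [(host, port)])."""
    tokens = shlex.split(line)  # shlex keeps the quoted cluster name together
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError(f"not a data_source line: {line!r}")
    name, rest = tokens[1], tokens[2:]
    interval = 15                 # assumed default polling interval (seconds)
    if rest[0].isdigit():         # the optional interval precedes the hosts
        interval = int(rest[0])
        rest = rest[1:]
    hosts = []
    for item in rest:
        host, _, port = item.partition(":")
        hosts.append((host, int(port) if port else 8649))
    if not hosts:
        raise ValueError("data_source needs at least one host")
    return name, interval, hosts


print(parse_data_source('data_source "Test" 15 192.168.1.1:8649'))
```

Several host:port entries may follow the interval; gmetad treats them as alternative sources for the same cluster, which is useful when more than one node collects the cluster state.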
Starting Ganglia :
Make sure that no old gmetad or gmond processes are running on the machines.
Starting gmetad : Run the command below on the head node of the cluster.
 sudo service gmetad start
Starting gmond : Run the command below on all nodes of the cluster.
 sudo service ganglia-monitor start
Starting Apache Server :
Stop any old running instance of the apache2 server, then start the Apache server again.

Ganglia Monitoring tool


Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization.
It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world. It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes.
Be mindful that Ganglia only helps you view the performance of your servers; it doesn’t tweak or improve that performance. In this tutorial, we are going to set up the Ganglia monitoring tool on an Ubuntu 13.10 server and use an Ubuntu 13.04 machine as the monitoring target. Though it was tested on Ubuntu 13.10, the same method should work on Debian 7 and other Ubuntu versions as well.
Install Ganglia On Ubuntu 13.10
Before proceeding to install Ganglia, make sure your server (Ubuntu or Debian) has a properly installed and configured LAMP stack.
Ganglia consists of two main daemons called gmond (Ganglia Monitoring Daemon) and gmetad (Ganglia Meta Daemon), a PHP-based web front-end and a few other small utilities.
Ganglia Monitoring Daemon (gmond):
Gmond runs on each node you want to monitor. It monitors changes in the host state, announces relevant changes, listens to the state of all other Ganglia nodes via a unicast or multicast channel, and answers requests for an XML description of the cluster state.
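
To give an idea of what that XML description looks like and how it can be consumed, here is a small Python sketch. The sample document is hand-written and heavily abbreviated (the host name node1, the version number and the metric values are made up); only the element and attribute names follow what gmond emits. The function name metrics_by_host is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated sample of a gmond XML dump (values are made up).
SAMPLE_XML = """<GANGLIA_XML VERSION="3.1.7" SOURCE="gmond">
<CLUSTER NAME="my cluster" OWNER="unspecified">
<HOST NAME="node1" IP="192.168.1.100" TN="5" TMAX="20" DMAX="0">
<METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=" "/>
<METRIC NAME="mem_free" VAL="812345" TYPE="float" UNITS="KB"/>
</HOST>
</CLUSTER>
</GANGLIA_XML>"""


def metrics_by_host(xml_text: str) -> dict:
    """Map each HOST name to a {metric name: raw value string} dict."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL")
            for metric in host.iter("METRIC")
        }
    return result


print(metrics_by_host(SAMPLE_XML))
```

The values are kept as strings on purpose, because the TYPE attribute of each metric says how they should be interpreted.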
Ganglia Meta Daemon (gmetad):
Gmetad runs on the master node and gathers all information from the client nodes.
Ganglia PHP Web Front-end:
It displays all the information gathered from the clients in a meaningful way, such as graphs, via web pages.
Ganglia Installation On Master node
Install Ganglia using command:
$ sudo apt-get install ganglia-monitor rrdtool gmetad ganglia-webfrontend
During installation, you’ll be asked whether to restart the Apache service to activate the new configuration. Select Yes to continue.
Configure Master node
Now copy the Ganglia configuration file /etc/ganglia-webfrontend/apache.conf to the /etc/apache2/sites-enabled/ directory as shown below.
$ sudo cp /etc/ganglia-webfrontend/apache.conf /etc/apache2/sites-enabled/ganglia.conf
Then edit the file /etc/ganglia/gmetad.conf:
$ sudo nano /etc/ganglia/gmetad.conf
Find the following line and modify it as shown below.
data_source "my cluster" 50 192.168.1.101:8649
As per the above line, gmetad polls the gmond at 192.168.1.101:8649 for metrics every 50 seconds. You can also assign a name to your client group; in my case, I use the default group name “my cluster”. Here 192.168.1.101 is the IP address of my master node.
Save and close the file.
Edit the file /etc/ganglia/gmond.conf:
$ sudo nano /etc/ganglia/gmond.conf
Find the following sections and modify them with your values.
[...]
cluster {
  name = "my cluster"  ## Name assigned to the client groups
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

[...]

udp_send_channel   {
#mcast_join = 239.2.11.71 ## Comment
  host = 192.168.1.101   ## Master node IP address
  port = 8649
  ttl = 1
}

[...]

udp_recv_channel {
  port = 8649
}

/* You can specify as many tcp_accept_channels as you like to share
   an xml description of the state of the cluster */
tcp_accept_channel {
  port = 8649
}

[...]
With these changes, the master node, which has the IP address 192.168.1.101, receives metric data from all nodes on UDP port 8649 and serves the cluster state as XML on TCP port 8649.
Save and close the file. Then start the ganglia-monitor, gmetad and apache services.
$ sudo /etc/init.d/ganglia-monitor start
$ sudo /etc/init.d/gmetad start
$ sudo /etc/init.d/apache2 restart
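Once the services are started, it is handy to check that the TCP ports actually accept connections. Here is a minimal Python sketch for that; the function name port_is_open is my own. It only covers TCP (gmond's tcp_accept_channel on 8649 and Apache on port 80, assuming the default HTTP port); the UDP receive channel cannot be checked this way.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


# Example for the master node used in this tutorial (commented out because
# it needs the real machine):
# print(port_is_open("192.168.1.101", 8649))  # gmond
# print(port_is_open("192.168.1.101", 80))    # Apache
```

If the gmond port is closed, check that ganglia-monitor is running and that the tcp_accept_channel section is present in /etc/ganglia/gmond.conf.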
Ganglia Installation On Clients
Install the following package for each client you want to monitor.
On Debian / Ubuntu clients:
$ sudo apt-get install ganglia-monitor
On RHEL based clients:
# yum install ganglia-gmond
Configure Clients
Edit the file /etc/ganglia/gmond.conf:
$ sudo nano /etc/ganglia/gmond.conf
Make the changes as shown below.
[...]

cluster {
  name = "my cluster"     ## Cluster name
  owner = "unspecified"
  latlong = "unspecified"
  url = "unspecified"
}

[...]

udp_send_channel {
  #mcast_join = 239.2.11.71   ## Comment
  host = 192.168.1.101   ## IP address of master node
  port = 8649
  ttl = 1
}
## Comment the whole section
/* You can specify as many udp_recv_channels as you like as well.
udp_recv_channel {
  mcast_join = 239.2.11.71
  port = 8649
  bind = 239.2.11.71
}
*/

tcp_accept_channel {
  port = 8649
}

[...]
Save and close the file. Next, restart ganglia-monitor service.
On Debian based systems:
$ sudo /etc/init.d/ganglia-monitor restart
On RHEL based systems:
# service gmond restart
Access Ganglia web frontend
Now point your web browser to http://ip-address/ganglia, where ip-address is the address of the master node. You should see the client node graphs.
To view the graphs of a particular node, select it from the Choose Node drop-down box of the grid.
For example, I want to see the graphs of the Ubuntu client that has the IP address 192.168.1.100.
Graphs of my Ubuntu client (192.168.1.100): [screenshot: 192.168.1.100 Host Report]
Client node view: [screenshot: 192.168.1.100 Node View]
Server node view: [screenshot: 192.168.1.101 Node View]
As you can see in the above outputs, my client node (192.168.1.100) is down and my server node (192.168.1.101) is up.