Tuesday, 8 July 2014

Ganglia Monitoring Tool 1


What is Ganglia:

  • It is a highly scalable monitoring system for high performance computing.
  • It can monitor a system or clusters of systems or grid of clusters.
  • It uses XML for data representation.
  • It uses RRDtool for data storage and visualization.
  • The implementation of ganglia is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on thousands of clusters around the world.
  • It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes.
Put simply, “Ganglia is a real-time cluster monitoring tool that collects information from each computer in the cluster and provides an interactive way to view the performance of the computers and of the cluster as a whole.”

Like other monitoring tools, Ganglia only provides a way to view, not control, the performance of the cluster.
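
To make the XML data representation mentioned above more concrete, here is a minimal Python sketch that reads a small Ganglia-style XML document and groups metric values by host. The sample document is hand-written and heavily trimmed for illustration (real gmond output carries many more attributes and metrics), and the host name node1 and the function name metrics_by_host are assumptions of this example.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed sample in the shape of a gmond XML dump (illustrative only).
SAMPLE_XML = """<GANGLIA_XML VERSION="3.6.0" SOURCE="gmond">
  <CLUSTER NAME="Test" OWNER="clusterOwner" LATLONG="unspecified" URL="unspecified">
    <HOST NAME="node1" IP="192.168.1.2" TN="5" TMAX="20" DMAX="0">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=" " TN="5" TMAX="70" DMAX="0"/>
      <METRIC NAME="mem_free" VAL="1024000" TYPE="float" UNITS="KB" TN="12" TMAX="180" DMAX="0"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def metrics_by_host(xml_text):
    """Return {host name: {metric name: value string}} for every HOST element."""
    root = ET.fromstring(xml_text)
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL") for metric in host.iter("METRIC")
        }
    return result


print(metrics_by_host(SAMPLE_XML))
# {'node1': {'load_one': '0.42', 'mem_free': '1024000'}}
```

Values are kept as strings here because each metric declares its own TYPE; a real consumer would convert them based on that attribute.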
Architecture of Ganglia:
      The Ganglia system consists of two daemons (gmond and gmetad), a PHP-based web frontend, and two other utilities (gmetric and gstat).
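
As a rough illustration of where gmetric fits in, it is the command-line utility for publishing a custom metric into the cluster. The sketch below only builds the command line in Python so it can be inspected before running it; the metric name queue_depth, its value and the helper name build_gmetric_command are made up for this example, and it assumes the --name, --value, --type and --units options of the gmetric installed on your system.

```python
import shlex


def build_gmetric_command(name, value, metric_type, units=""):
    """Return the argument list for a gmetric call that publishes one custom metric."""
    cmd = ["gmetric", "--name", name, "--value", str(value), "--type", metric_type]
    if units:
        cmd += ["--units", units]
    return cmd


# Hypothetical metric: the number of jobs waiting in some queue.
cmd = build_gmetric_command("queue_depth", 42, "uint32", "jobs")
print(shlex.join(cmd))
# gmetric --name queue_depth --value 42 --type uint32 --units jobs
```

On a node where gmond is configured, the list could then be passed to subprocess.run(cmd, check=True) to actually send the metric.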
What is Gmond:
      Gmond runs on every node of the cluster and gathers information such as CPU, memory, network, disk and swap usage.
What is Gmetad:
      Gmetad runs on the head node. It gathers data from all other nodes and stores it in a round-robin database (RRD). It can poll multiple clusters and aggregate their metrics. It is also used by the web frontend to generate the UI.
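
The idea behind a round-robin database can be sketched in a few lines of Python: storage has a fixed number of slots, and once it is full each new sample overwrites the oldest one. This is only a conceptual illustration, not how RRDtool is implemented, and the class name RoundRobinArchive is invented for the example.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size store: once full, every new sample pushes out the oldest one."""

    def __init__(self, slots):
        self.samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        self.samples.append((timestamp, value))

    def average(self):
        if not self.samples:
            return None
        return sum(value for _, value in self.samples) / len(self.samples)


archive = RoundRobinArchive(slots=4)
for step, value in enumerate([10, 20, 30, 40, 50, 60]):
    archive.update(step * 15, value)  # one sample every 15 seconds

print(list(archive.samples))  # [(30, 30), (45, 40), (60, 50), (75, 60)]
print(archive.average())      # 45.0
```

Because the size never grows, disk usage stays constant no matter how long the cluster is monitored.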
What is PHP Web Frontend:
     The Ganglia web front-end provides a view of the gathered information via real-time dynamic web pages. Most importantly, it displays Ganglia data in a meaningful way for system administrators and computer users. It should be installed on the same machine where gmetad is installed.
Ganglia Installation:
Installation of ganglia on master node:
 sudo apt-get install ganglia-monitor rrdtool gmetad ganglia-webfrontend

The above command installs gmond, gmetad and the Ganglia web UI on the node. The ganglia-webfrontend package also installs the required Apache server and PHP modules. To serve Ganglia from Apache, copy the apache.conf file from /etc/ganglia-webfrontend/apache.conf to /etc/apache2/sites-enabled/:
sudo cp /etc/ganglia-webfrontend/apache.conf /etc/apache2/sites-enabled/ganglia.conf

The /etc/ganglia-webfrontend/apache.conf file contains a simple alias that maps /ganglia to /usr/share/ganglia-webfrontend.
Installation of ganglia on other nodes:
  sudo apt-get install ganglia-monitor

The above command installs the Ganglia monitor daemon (gmond).
Gmond configuration on master node:
      Ganglia supports two types of configuration: multicast and unicast. Here I take an example cluster and configure Ganglia in unicast mode. I have a cluster named “Test” with 192.168.1.1 as the master node and 192.168.1.2 and 192.168.1.3 as slave nodes.
 globals {                   
  daemonize = yes             
  setuid = yes            
  user = ganglia             
  debug_level = 0              
  max_udp_msg_len = 1472       
  mute = no             
  deaf = no         
  allow_extra_data = yes  
  host_dmax = 0 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no            
  send_metadata_interval = 30                                       
}
cluster {
  name = "Test"
  owner = "clusterOwner"
  latlong = "unspecified"
  url = "unspecified"
}
udp_send_channel {
  host = 192.168.1.1
  port = 8649
  ttl = 1
} 
udp_recv_channel {
  port = 8649
}
tcp_accept_channel {
  port = 8649
}
Gmond configuration on other nodes:
globals {                   
  daemonize = yes             
  setuid = yes            
  user = ganglia             
  debug_level = 0              
  max_udp_msg_len = 1472       
  mute = no            
  deaf = no         
  allow_extra_data = yes  
  host_dmax = 0 /*secs */
  cleanup_threshold = 300 /*secs */
  gexec = no            
  send_metadata_interval = 30
}
cluster {
  name = "Test"
  owner = "clusterOwner"
  latlong = "unspecified"
  url = "unspecified"
}
udp_send_channel {
   # mcast_join = 239.2.11.71
  host = 192.168.1.1
  port = 8649
  ttl = 1
} 
tcp_accept_channel {
  port = 8649
}
The gmond configuration defines the following properties.
Globals section :
  • daemonize : The daemonize attribute is a boolean. When true, gmond will daemonize. When false, gmond will run in the foreground.
  • setuid : The setuid attribute is a boolean. When true, gmond will set its effective UID to the UID of the user specified by the user attribute. When false, gmond will not change its effective user.
  • debug_level : The debug_level is an integer value. When set to zero (0), gmond will run normally. A debug_level greater than zero will result in gmond running in the foreground and outputting debugging information. The higher the debug_level, the more verbose the output.
  • mute : The mute attribute is a boolean. When true, gmond will not send data regardless of any other configuration directives.
  • deaf : The deaf attribute is a boolean. When true, gmond will not receive data regardless of any other configuration directives.
  • allow_extra_data : The allow_extra_data attribute is a boolean. When false, gmond will not send out the EXTRA_ELEMENT and EXTRA_DATA parts of the XML. This might be useful if you are using your own frontend to the metric data and would like to save some bandwidth.
  • host_dmax : The host_dmax value is an integer with units in seconds. When set to zero (0), gmond will never delete a host from its list, even when a remote host has stopped reporting. If host_dmax is set to a positive number, gmond will flush a host after it has not heard from it for host_dmax seconds.
  • cleanup_threshold : The cleanup_threshold is the minimum amount of time, in seconds, before gmond will clean up any hosts or metrics whose data has expired, that is, where tn > dmax (tn being the time in seconds since the last update).
  • gexec : The gexec boolean allows you to specify whether gmond will announce the host's availability to run gexec jobs. Note: this requires that gexecd is running on the host and the proper keys have been installed.
  • send_metadata_interval : The send_metadata_interval establishes an interval at which gmond will send or resend the metadata packets that describe each enabled metric. This directive is set to 0 by default, which means that gmond will only send the metadata packets at startup and upon request from other gmond nodes running remotely. If a new machine running gmond is added to a cluster, it needs to announce itself and inform all other nodes of the metrics that it currently supports. In multicast mode this isn't a problem, because any node can request the metadata of all other nodes in the cluster. In unicast mode, however, a resend interval must be established. The interval value is the minimum number of seconds between resends.
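
The expiry rule behind host_dmax (a host is flushed once tn > dmax, and a dmax of 0 means never) can be written down as a small Python sketch. This is a simplified model of the rule described above, not gmond's actual code: it ignores cleanup_threshold, and the function names and the tn values are invented for the example.

```python
def is_expired(tn, dmax):
    """tn: seconds since the host was last heard from; dmax: 0 means never expire."""
    return dmax > 0 and tn > dmax


def hosts_to_flush(last_heard, host_dmax):
    """Return the hosts whose tn exceeds host_dmax."""
    return [host for host, tn in last_heard.items() if is_expired(tn, host_dmax)]


# Hypothetical state: one node reported 12 s ago, the other has been silent for 4000 s.
last_heard = {"192.168.1.2": 12, "192.168.1.3": 4000}

print(hosts_to_flush(last_heard, host_dmax=0))     # []
print(hosts_to_flush(last_heard, host_dmax=3600))  # ['192.168.1.3']
```

With host_dmax = 0, as in the configuration above, the silent node stays in the list and keeps showing up in the web UI as down.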

Cluster section : 
  • name : The name attribute specifies the name of the cluster of machines.
  • owner : The owner tag specifies the administrators of the cluster. The pair name/owner should be unique to all clusters in the world.
  • latlong : The latlong attribute is the latitude and longitude (GPS coordinates) of this cluster on earth, specified to 1 mile accuracy with two decimal places per axis in decimal.
  • url : The url for more information on the cluster. Intended to give purpose, owner, administration, and account details for this cluster.
Udp_send_channel :
      You can define as many udp_send_channel sections as you like within the limitations of memory and file descriptors. If gmond is configured as mute this section will be ignored.
The udp_send_channel has a total of five attributes: mcast_join, mcast_if, host, port and ttl.
  • mcast_join and mcast_if : The mcast_join and mcast_if attributes are optional. When specified, gmond will create the UDP socket, join the mcast_join multicast group and send data out of the interface specified by mcast_if.
  • ttl : The ttl is the time-to-live field for sent data.
  • host and port : If only a host and port are specified, gmond will send unicast UDP messages to the host specified. You could specify multiple unicast hosts for redundancy, as gmond will send UDP messages to all UDP channels.
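
To illustrate the redundancy case, the following Python sketch renders one udp_send_channel section per unicast host, using the same three attributes as the configuration above. The second collector address 192.168.1.4 and the function name render_udp_send_channels are hypothetical; the output is meant to be pasted into gmond.conf by hand.

```python
def render_udp_send_channels(hosts, port=8649, ttl=1):
    """Return gmond.conf text with one unicast udp_send_channel section per host."""
    sections = []
    for host in hosts:
        sections.append(
            "udp_send_channel {\n"
            f"  host = {host}\n"
            f"  port = {port}\n"
            f"  ttl = {ttl}\n"
            "}"
        )
    return "\n".join(sections)


# 192.168.1.4 is a made-up second collector, used only for this example.
print(render_udp_send_channels(["192.168.1.1", "192.168.1.4"]))
```

Each receiving host would also need a matching udp_recv_channel on the same port.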
Udp_recv_channel :
      You can specify as many udp_recv_channel sections as you like within the limits of memory and file descriptors. If gmond is configured deaf, these sections will be ignored.
The udp_recv_channel section has the following attributes: mcast_join, bind, port, mcast_if, family.
  • mcast_join and mcast_if : The mcast_join and mcast_if attributes should only be used if you want this UDP channel to receive multicast packets from the multicast group mcast_join on interface mcast_if. If you do not specify multicast attributes, gmond will simply create a UDP server on the specified port.
  • port : The port on which gmond creates the UDP server.
  • bind : You can use the bind attribute to bind to a particular local address.
Tcp_accept_channel :
      You can specify as many tcp_accept_channel sections as you like within the limitations of memory and file descriptors. If gmond is configured to be mute, then these sections are ignored.
  • bind : The bind address is optional and allows you to specify which local address gmond will bind to for this channel. 
  • port : The port is an integer that specifies which port to answer requests for data.
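
A quick way to see what a tcp_accept_channel serves is to connect to it and read until gmond closes the connection; the reply is the cluster state as XML. Below is a minimal Python sketch under that assumption. The function name fetch_gmond_xml is invented here, and the bytes are returned undecoded so that an XML parser can honour the encoding declared in the document itself.

```python
import socket


def fetch_gmond_xml(host, port=8649, timeout=5.0):
    """Connect to a gmond tcp_accept_channel and return everything it sends, as bytes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:  # the peer closed the connection: the dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

For the example cluster, fetch_gmond_xml("192.168.1.1") run from a machine that can reach the master node should return the XML for cluster “Test”.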

Gmetad Configuration:
 data_source "Test" 15 192.168.1.1:8649
The gmetad configuration defines the data source with the cluster name, the polling interval, and the IP and port on which gmond is running. In this data source, “Test” is the cluster name, 15 is the gmetad polling interval for metrics in seconds, and “192.168.1.1:8649” is the gmond IP and port of the head node.
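
The structure of a data_source line can be made explicit with a small Python parser. This is a sketch for illustration rather than gmetad's own parsing logic; it assumes that a missing polling interval defaults to 15 seconds and a missing port to 8649, and the function name parse_data_source is invented.

```python
import shlex

DEFAULT_INTERVAL = 15  # seconds; assumed default when no interval is given
DEFAULT_PORT = 8649    # assumed default gmond port


def parse_data_source(line):
    """Split a gmetad data_source line into (cluster name, interval, [(host, port), ...])."""
    tokens = shlex.split(line)  # shlex handles the quoted cluster name
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError(f"not a data_source line: {line!r}")
    name, rest = tokens[1], tokens[2:]
    interval = DEFAULT_INTERVAL
    if rest[0].isdigit():  # the optional polling interval comes right after the name
        interval, rest = int(rest[0]), rest[1:]
    if not rest:
        raise ValueError("data_source needs at least one address")
    sources = []
    for address in rest:
        host, sep, port = address.partition(":")
        sources.append((host, int(port) if sep else DEFAULT_PORT))
    return name, interval, sources


print(parse_data_source('data_source "Test" 15 192.168.1.1:8649'))
# ('Test', 15, [('192.168.1.1', 8649)])
```

Listing more than one address on the same line gives gmetad alternative nodes to poll for the same cluster.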
Starting Ganglia :
Make sure no old gmetad or gmond processes are still running on the machines.
Starting gmetad : Run the command below on the head node of the cluster.
 sudo service gmetad start
Starting gmond : Run the command below on all the nodes of the cluster.
 sudo service ganglia-monitor start
Starting Apache Server :
Stop any old running instance of the apache2 server, then start it again, for example:
 sudo service apache2 start
