Network Monitoring Tools

This section describes some tools whose main task is to monitor the network. Two types of tools are described:

Tools that are purely directed at network equipment, using SNMP (Simple Network Management Protocol).
Host-oriented tools. In contrast to the SNMP-directed tools, these tools are better able to show how an end user perceives the network. In general they are more intrusive, because host-generated traffic is involved, although generating as much traffic as possible is not the intention here.

In general both monitor types are required to give a complete overview of the network performance.

Cricket

Description

Cricket is a high-performance, flexible system for monitoring trends in time-series data. Cricket has two components, a collector and a grapher. The collector runs from cron every 5 minutes (or at a different rate, when desired), and stores the data in a data structure managed by RRDtool. A web-based (CGI-bin) interface can be used to view graphs of the data. Cricket reads a set of configuration files called a configuration tree. This setup makes it possible to manage a large number of devices in a scalable way. On the other hand, it is our experience that this also makes the debugging of configuration errors more difficult. For instance, the order in which the files in a configuration directory of the tree are read might be important. When the configuration tree has been changed it has to be compiled, but the compiler only reports syntax errors and not structural weaknesses, in contrast to the compilers of many programming languages.

It is also a disadvantage of Cricket that general SNMP counter MIB files are not supported. OIDs have to be specified explicitly, although templates are available at the top of the configuration tree for standard counters (i.e. 32 and 64 bit Octet counters) and in its subdirectories for some standard Cisco equipment. It would be much more convenient if arbitrary MIB files could also be loaded.

In general one may state that the configuration structure of Cricket is a bit rigid. For instance, data types and their graphs are tightly coupled, while it would be more convenient to also have the possibility to connect a distinct graph type to the data from a particular interface. This would make it possible, for instance, to display the Octet counters from a 10 Gigabit/s interface with a larger resolution than the counters from a 1 Gigabit/s interface.

An overview of the available results is presented in the form of HTML table listings of the devices and their interfaces. That is fine when a large number of devices has to be managed with no immediate correlation between the interfaces. However, in network tests that involve a limited number of hosts and interfaces, it is often also convenient to obtain an overview of the activity at all, or at some selected, interfaces of a device, among other things for debugging purposes. Fortunately this feature can be configured. See the "Useful, non-default configurations" subsection below for a description. We are also not very fond of the layout of this index table. It would be a nice feature to add some statistics of the listed interfaces to it; in the current layout there is sufficient space available for that. Unfortunately, the layout of this index page does not seem to be configurable.

A very positive point of Cricket, however, is that it is not required to specify SNMP interface IDs in the configuration tree. Instead the interface names, as defined in the SNMP interface tree, can be used. Cricket itself looks up, with an appropriate SNMP query, the ID that belongs to a particular interface name; the documentation calls this instance mapping. This is especially useful because the assignment of interface IDs will in general change when the interface topology of a network device is changed. The interface names, on the other hand, are in general directly related to the slot to which the interface is connected. Therefore these names are in general invariant under hardware reconfigurations.
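The idea behind instance mapping can be illustrated with a minimal sketch. It is not Cricket's implementation: the table contents and the interface names are hypothetical, and a plain dictionary stands in for the result of the SNMP query on the interface table.

```python
def map_instance(if_descr, name):
    """Return the interface ID whose name equals `name`.

    `if_descr` stands in for the result of walking the SNMP interface
    table: a dict of interface ID -> interface name (assumed layout).
    """
    matches = [if_id for if_id, descr in if_descr.items() if descr == name]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one interface named {name!r}, found {len(matches)}"
        )
    return matches[0]


# Hypothetical tables before and after a line card was added to the device:
# the interface IDs shift, the slot-based names do not.
before = {1: "GigabitEthernet 3/10", 2: "GigabitEthernet 3/11"}
after = {1: "GigabitEthernet 2/1", 2: "GigabitEthernet 3/10", 3: "GigabitEthernet 3/11"}

id_before = map_instance(before, "GigabitEthernet 3/10")
id_after = map_instance(after, "GigabitEthernet 3/10")
print(id_before, id_after)
```

A configuration that refers to the name keeps working after the reconfiguration, while a configuration that had hard-coded the interface ID would silently start monitoring another interface.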

Another nice feature of Cricket is that by default the maximum counter values are also stored and displayed. This is especially useful in a test environment, which typically shows a strongly varying traffic load. Otherwise the long-term graphs, with their long averaging intervals, would underestimate the traffic that had been running.
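Why the stored maxima matter can be seen from a small numerical sketch. The sample values are made up for illustration, and the function only mimics the effect of averaging over a longer interval; it is not how RRDtool is implemented.

```python
def consolidate(samples, bucket):
    """Return one (average, maximum) pair per `bucket` consecutive samples."""
    result = []
    for start in range(0, len(samples), bucket):
        chunk = samples[start:start + bucket]
        result.append((sum(chunk) / len(chunk), max(chunk)))
    return result


# One hour of hypothetical 5-minute rates in Mbit/s: a single short test burst.
rates = [0, 0, 0, 0, 0, 900, 0, 0, 0, 0, 0, 0]

# Consolidate the twelve 5-minute samples into one hourly value.
hourly = consolidate(rates, 12)
print(hourly)
```

The hourly average is only 75 Mbit/s, whereas the hourly maximum still shows the 900 Mbit/s burst that a test actually produced.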

Useful, non-default configurations

Below, some configurations are described that are useful from our monitoring perspective. Unfortunately these configurations are somewhat hidden in the configuration reference. Therefore they are explicitly mentioned here.

Multiple interface graphs

To obtain an overview of the total activity in a test setup, the targets variable, which resides under the Target configuration section, can be used to generate an index view in which the results of multiple interfaces are displayed in graphical form. From this page there are hyperlinks to the corresponding interfaces that show their information in greater detail, including current and average counter values and the available short-term and long-term graphs, which are displayed on demand.

This multiple graph index page offers exactly the same functionality that is the default in the MRTG tool discussed below. The reason that it is not the default in Cricket is that each graph image is generated on demand by a separate CGI call. When a device has many interfaces, this would lead to a serious load on the Web server host. However, see the possible solution described in the "Modifications" section below.

Counter arithmetic

When, for instance, the cumulative traffic generated by a set of hosts connected to a network device has to be analysed, it might be useful to display the counter sum of all interfaces that connect the hosts. In Cricket this is possible with a combination of the mtargets and mtargets-ops variables, which reside under the Target configuration section.

To make this feature a bit clearer: in the following configuration fragment the counter sum of the interfaces named gigeth_3_10, ..., gigeth_3_12 is calculated:

Target  Sum_GigEth
    short-desc   = "Sum Gigabit Ethernet Interfaces"
    mtargets     = "gigeth_3_10;
                    gigeth_3_11;
                    gigeth_3_12"
    mtargets-ops = sum()

This feature is also available in MRTG, but there it results in new queries for the counter values, while Cricket reuses the existing counter values of the corresponding interfaces, which is in our opinion the preferable approach.
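The difference between the two approaches can be sketched as follows. This is an illustration of summing values that have already been collected, not Cricket code; the data layout (a dictionary of timestamp to value per target) and the numbers are assumptions.

```python
def sum_targets(stored):
    """Sum the stored samples of several targets, timestamp by timestamp.

    `stored` maps a target name to {timestamp: value}. Only timestamps
    that are present for every target are summed, so a missing sample
    never produces a misleadingly low total.
    """
    series = list(stored.values())
    if not series:
        return {}
    common = set(series[0])
    for samples in series[1:]:
        common &= set(samples)
    return {t: sum(samples[t] for samples in series) for t in sorted(common)}


# Hypothetical stored values for the three interfaces of the fragment above.
stored = {
    "gigeth_3_10": {0: 100, 300: 120, 600: 90},
    "gigeth_3_11": {0: 200, 300: 180, 600: 210},
    "gigeth_3_12": {0: 50, 300: 60},  # the sample at t=600 is missing
}

total = sum_targets(stored)
print(total)
```

Because the sum is built from values that were sampled at the same moments as the individual graphs, the sum graph is always consistent with them and the device is not queried a second time.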

Modifications

As mentioned before, the multiple graph index page described above has one important drawback: each single graph is created by a separate CGI call of the mini-graph.cgi Perl script, which is in turn called by grapher.cgi, the grapher script that produces all Cricket CGI-bin output. When many graphs are displayed, the multiple calls of the mini-graph.cgi script might introduce a considerable load on the Web server host, because for each graph a new process is spawned.

Therefore, just as with the SmokePing tool described below, the SpeedyCGI tool has been used. This tool speeds up CGI-bin scripts written in Perl by making them memory resident and by passing new requests to the script that is already running. The tool is transparent in the sense that no reconfiguration of the Web server is involved. To use it, it is only required to change the top line of the Perl CGI-bin scripts from "#! PerlPath" into "#! SpeedyPath"¹, where SpeedyPath is in general /usr/local/bin/speedy.
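The change of the top line is small enough to do by hand, but it can also be sketched as a helper. The function below is hypothetical and not part of Cricket or SpeedyCGI. It assumes that the interpreter path ends in "perl" and that any options following the path can be carried over unchanged, which should be checked against the SpeedyCGI documentation.

```python
def use_speedy(script_text, speedy_path="/usr/local/bin/speedy"):
    """Return `script_text` with its '#!' line pointing to speedy."""
    first, newline, rest = script_text.partition("\n")
    if not first.startswith("#!"):
        raise ValueError("script has no '#!' line")
    words = first[2:].split()
    if not words or not words[0].endswith("perl"):
        raise ValueError("script is not run by perl")
    # Replace only the interpreter path; keep whatever followed it.
    new_first = " ".join(["#!" + speedy_path] + words[1:])
    return new_first + newline + rest


original = "#! /usr/bin/perl -w\nprint \"hello\\n\";\n"
converted = use_speedy(original)
print(converted.splitlines()[0])
```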

The only drawback of SpeedyCGI is that global Perl variables are reused between requests, so problems might occur when these variables are initialised sloppily. However, with SpeedyCGI used in both the grapher.cgi and mini-graph.cgi grapher scripts we have not observed unexpected behaviour. It has also been reported on the Internet that SpeedyCGI works fine with these grapher scripts. As the collector scripts are unaffected by SpeedyCGI, the stored data can never become corrupted by this tool.

Runtime Example

As a current runtime example, consider the Cricket monitor of the Force 10 switch located at NIKHEF, to which the DAS-2 nodes are connected. The multi-graphs page, explained above, can be viewed using this link.

MRTG

Description

The Multi Router Traffic Grapher (MRTG) is a tool to monitor the traffic load on network links. MRTG generates HTML pages containing PNG images which provide a visual representation of this traffic. MRTG is based on the Perl scripting language, and on the C programming language for the sampling of the results. It works under Unix and Windows NT.

MRTG was developed before RRDtool and therefore does not use that tool. However, both tools were written by Tobias Oetiker.

In fact MRTG by default uses a flat file format to store the time-related database. The HTML files and the PNG images are always generated after new data have been added to the file; they are not generated on demand. This makes MRTG less scalable than Cricket. However, MRTG can also be configured to be used with RRDtool. In that case the HTML and graphics are created by CGI scripts from external providers. See also the MRTG-RRD documentation.

In our opinion the use of external providers for a part of the functionality that is not clearly separable might lead to maintenance problems. For instance, the external CGI scripts also have to understand the MRTG configuration files. When scalability is not an issue, we believe it is preferable to use the "classical" version of MRTG that uses flat database files. On the other hand Cricket, described above, with the discussed non-default configurations and modifications added to the default setup, is also a good alternative.

For its configuration MRTG uses a single configuration file, which indeed does not scale very well but is reasonably convenient and flexible to use, although the syntax for specifying the SNMP counters to monitor is not ideal. As in Cricket, it is possible to apply arbitrary arithmetical operations to the SNMP counters to monitor. We used that to monitor the total of all incoming and outgoing traffic of the test hosts. A clear disadvantage, however, is that the instance mapping feature of Cricket is missing. See the documentation for more information about the configuration settings.

MRTG contains a program called indexmaker that can generate configurable index views of the monitor results from the configuration file. In general these index files are useful for obtaining a better overview of the monitor results.

Another disadvantage of MRTG is that each graph can show only two counter types, in general the incoming and outgoing counters. The graphical functionality of RRDtool is more extensive in this respect.

Modifications

The following modifications have been made to the MRTG version that uses flat database files:

To make MRTG work in a Gigabit environment, 64 bit counters should be used. MRTG can be configured to work with MIB-2, which contains 64 bit counters, so that is no problem. However, the rateup program, written in the C programming language, works with 32 bit unsigned long integer variables. For MRTG version 2.9.17 this program has been modified to work with 64 bit unsigned long long counters. This version, containing the modified rateup source rateup.c, is available for downloading from this site. Installation is identical to that of the regular MRTG version.
There is a bug in the implementation of MIB-2 for the Cisco 15454, which implies that with multiple-GET SNMP calls all counters queried at the same interface are zero, with the exception of the first counter. Cisco promised to supply a bug fix in version 3.4.0. On November 5, 2002 this version was installed and the bug indeed appeared to be fixed. As long as that fix was not yet available we used MRTG with the mrtg_if_filter Perl script, which queries the router with single GET calls. It can be downloaded from this site. A configuration file example is also included.
In the configuration files, footer and header HTML code can be added. That can be used to add hyperlinks to the pages, which makes the set of MRTG HTML pages easier to browse and more connected. Unfortunately this feature is missing in the indexmaker script. We solved this by adding the desired links with dedicated scripts that first run indexmaker and then add the appropriate links.
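The need for 64 bit counters in the first modification above follows from simple arithmetic, sketched here. The five-minute polling interval used for the comparison is an assumption, and the rate function is only an illustration of wrap handling, not the rateup code.

```python
def wrap_time_seconds(counter_bits, link_bits_per_second):
    """Time for an octet (byte) counter to wrap on a fully loaded link."""
    return (2 ** counter_bits) * 8 / link_bits_per_second


def octet_rate(previous, current, interval_seconds, counter_bits):
    """Octets per second between two readings, allowing a single wrap."""
    delta = (current - previous) % (2 ** counter_bits)
    return delta / interval_seconds


GIGABIT = 1e9          # link speed in bit/s
POLL_INTERVAL = 300    # assumed polling interval in seconds

wrap_32 = wrap_time_seconds(32, GIGABIT)
wrap_64 = wrap_time_seconds(64, GIGABIT)
print(f"32 bit counter wraps after {wrap_32:.1f} s")
print(f"64 bit counter wraps after {wrap_64 / (3600 * 24 * 365):.0f} years")

# A single wrap between two readings can still be corrected ...
print(octet_rate(2 ** 32 - 100, 50, 10, 32))
# ... but several wraps within one polling interval go unnoticed.
print(wrap_32 < POLL_INTERVAL)
```

At full Gigabit load a 32 bit octet counter wraps roughly every 34 seconds, so with a polling interval of minutes the number of wraps between two readings is unknown and the computed rate is meaningless.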

SmokePing

Description

SmokePing is a latency logging and graphing system, also written by Tobias Oetiker. It consists of a daemon process that collects and organises the latency data and a CGI-bin script that presents the graphs. The fping command is used to do the actual pinging. Compared with MRTG the Web configuration possibilities have been improved, because besides a configuration file there is also a Web template. Again RRDtool is used to store the data and to draw the graphs. SmokePing is also written in Perl and should therefore port to any Unix flavour. Because plug-in modules can be used, the tool can also be extended to monitor other data types besides Ping data.

Using CGI-bin for Web presentations is always relatively slow and resource intensive, because for each request a new process has to be spawned. This is especially true when (Perl) scripts are used instead of compiled executables, because for each request the interpreter has to be started and the script, together with the library modules it uses, has to be loaded and compiled again. The large difference with, for instance, Java applets is that there the resources are taken from the client host and not from the Web server.

To deal with these CGI-bin related problems, SmokePing uses SpeedyCGI, which speeds up CGI-bin scripts written in Perl by making them memory resident and by passing new requests to the script that is already running. SmokePing has been optimised for use with SpeedyCGI. The tool is transparent in the sense that no reconfiguration of the Web server is involved. The CGI-bin Perl scripts simply have to be run by the speedy executable in place of the usual perl command¹.

Previously, Ping data were also monitored with MRTG, using its configuration option for external filter programs, for instance the MRTG-Ping-Probe tool. The great advantage of SmokePing is that all aspects that can be derived from a set of ping packets sent are expressed in a single graph, as a function of the time at which the packets were sent:

The median value is expressed with a horizontal, coloured line segment at the time the packets were sent.
The number of packets lost is expressed with the colour of the line segment.
The distribution of the Round-Trip-Times in the set of packets sent is expressed with a vertical set of gray blocks, where the darkness of a block indicates the number of packets in the sample bin that is represented by the vertical extent of the block. The gray blocks are of course only drawn when the distribution of Round-Trip-Times is observable within the resolution of the ping command and / or the vertical RTT scale.
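The three quantities listed above can be derived from one set of ping results as in the following sketch. It is not SmokePing code: the Round-Trip-Times, the bin width and the function name are chosen for illustration only.

```python
from collections import Counter
from statistics import median


def summarise(rtts_ms, bin_width_ms):
    """Summarise one set of pings.

    `rtts_ms` holds one entry per packet sent: the Round-Trip-Time in
    milliseconds, or None when the packet was lost.
    Returns (median, packets lost, {bin index: packets in that bin}).
    """
    received = [rtt for rtt in rtts_ms if rtt is not None]
    lost = len(rtts_ms) - len(received)
    med = median(received) if received else None
    bins = Counter(int(rtt // bin_width_ms) for rtt in received)
    return med, lost, dict(bins)


# Seven hypothetical packets, two of them lost and one delayed.
sample = [10.0, None, 12.0, 11.0, None, 30.0, 11.5]
med, lost, bins = summarise(sample, bin_width_ms=5.0)
print(med, lost, bins)
```

In the graph the median gives the position of the line segment, the number of lost packets its colour, and the bin counts the darkness of the gray blocks.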

¹ In Unix, the executable Exec that should be used to interpret an executable script can be indicated by placing the following line at the top of the script: "#! ExecPath".

SshFPing Probe

Our hosts that participate in the Netherlight Lambda project from SURFnet are in general connected with their Gigabit interfaces to separate Netherlight VLANs. Internally they are all connected with Fast Ethernet. It is the intention to monitor all Netherlight VLANs from the Web server host that is also running SmokePing. The solution is that the Web server host uses a secure shell to start the fping command on the corresponding Netherlight test hosts. To be able to do this the SshFPing probe has been written, which is derived from the FPing probe written by Tobias Oetiker. Unfortunately the documentation of the probes API is rather poor, so the example probes need a thorough study before you can write your own probes.

The SshFPing.pm probe source file can be downloaded from this site. It should be placed in the lib/probes subdirectory of the SmokePing installation.
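The principle of the probe, running fping on a remote test host through a secure shell and interpreting its output, can be sketched as follows. This is not the SshFPing.pm source, which is written in Perl. The host names are hypothetical, and the sketch assumes the per-target output format that fping produces with its -C option ("name : rtt rtt - rtt", with "-" for a lost packet).

```python
def build_command(test_host, targets, pings):
    """Argument list that starts fping on `test_host` via ssh."""
    return ["ssh", test_host, "fping", "-C", str(pings), "-q"] + list(targets)


def parse_fping_line(line):
    """Split one 'name : rtt rtt - rtt' line into (name, list of RTTs).

    A lost packet, reported as '-', becomes None.
    """
    name, _, values = line.rpartition(":")
    rtts = [None if value == "-" else float(value) for value in values.split()]
    return name.strip(), rtts


# Hypothetical host names; nothing is executed here.
command = build_command("testhost1", ["target1"], pings=5)
print(" ".join(command))

name, rtts = parse_fping_line("target1 : 91.7 37.0 29.2 - 36.8")
print(name, rtts)
```

In a real probe the command would be executed, for instance with subprocess.run, and the result lines read from the standard error stream, to which fping writes this report.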

Runtime Example

As an example of this way of presenting latency, our SmokePing monitor at the ASP is presented, in which Netherlight Lambda connections are compared with regular Internet connections. Because the Gigabit interfaces of the test hosts are situated in different VLANs, the SshFPing probe is used at the Web server host to start the fping command over a Secure Shell connection via the Fast Ethernet interfaces.

rTPL

Description

The rTPL (remote Throughput Ping Load) package can be used to execute periodic network performance measurement tests between a set of workstations. The performance measurements consist of round-trip and throughput measurements between a number of hosts. Optionally UDP tests can also be executed, using the UDPmon toolkit (for Pentium processors only). By default all pairs in the host set are used, but it is also possible to select the host pairs for the tests. In addition, the machine load of the participating workstations is measured. In this way performance loss can be related to heavy machine load. The tests are performed by Perl scripts, which also parse, sample and / or organise the results. The performance measurements themselves are executed by the tools listed below.

Netperf is used to do the throughput tests.
The round-trip latency tests are executed by the system ping command.
The optional UDP tests are executed by the udp_bw_resp and udp_bw_mon commands from the UDPmon toolkit.
The host load values are obtained with the system uptime command.
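As an illustration of the parsing step for the last tool in the list, the sketch below extracts the load averages from one line of uptime output. The rTPL scripts themselves are written in Perl; this function and the sample line are only an assumption about what such parsing involves.

```python
import re

# Matches both "load average: 0.15, 0.10, 0.05" and
# "load averages: 0.15 0.10 0.05", the two common uptime layouts.
LOAD_PATTERN = re.compile(
    r"load averages?:\s*([0-9.]+),?\s+([0-9.]+),?\s+([0-9.]+)"
)


def parse_load(uptime_line):
    """Return the 1, 5 and 15 minute load averages as floats."""
    match = LOAD_PATTERN.search(uptime_line)
    if match is None:
        raise ValueError("no load averages found in uptime output")
    return tuple(float(value) for value in match.groups())


sample_line = " 14:02:11 up 10 days,  3:04,  2 users,  load average: 0.15, 0.10, 0.05"
loads = parse_load(sample_line)
print(loads)
```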

The presentation of the results is Web based and dynamic: the network performance data are stored in ZIP-compressed plain text files, which are accessible from a Web server. There are various files, so that a user can be offered several views of the data, including several time-based averages. The file data are read into the Web browser by a Java applet. The JavaScript scripting language is used to display the data in various tables. The applet can also be used to present the data in plot form.

See the rTPL overview page for more information.

Download

The current distribution and installation instructions can be found on the rTPL download page.

Runtime Example

Examples of running (and stopped) monitors can be found on the rTPL overview page.

Overview Current Monitors

Up-to-date hyperlinks to the monitors we currently use at the WTCW, Watergraafsmeer, Amsterdam, combined with links to related monitors, can be found on this overview page.
