Architecture Overview

The proposed architecture is composed of probes, the monitoring engine, the
presentation component, and the DB storage component.  Each component's
responsibility is as follows:

Probes

The probe is a software component that is able to notify the monitoring
engine that something has happened.  The probe communicates with the
monitoring engine through standard IP sockets, so that the probe can be
placed on any system on the network.  Also, there can be many probes
communicating with the monitoring engine.

A probe can also be a passive type of system.  For example, it may simply
accept incoming data (such as SNMP traps) and translate it into a format
understood by the monitoring engine.

Types of probes envisioned from the outset are:
- SNMP probe: This probe will issue SNMP requests and check the response.
  It will also accept traps and issue notifications to the monitoring engine.
- ICMP probe: This probe sends an ICMP ping and checks the response.  It can
  issue a notification if the response is too slow, the host is not
  reachable, or there is too much packet loss.
- Service probe: This probe checks for standard Internet services such as
  SMTP, FTP, HTTP, DNS, etc and ensures that they are available.  The probe
  can also issue a notification if the response is too slow.
- Syslog probe: This probe listens to syslog and issues notifications if a
  certain message is seen.  It can also apply higher-level rules such as
  "if this message is seen X times within Y hours, send a notification".

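The "X times within Y hours" rule for the syslog probe could be sketched as
a sliding time window.  This is only an illustration: the class name
ThresholdRule, the substring match, and the choice to reset the counter after
a notification are all assumptions, not part of the design above.

```python
from collections import deque


class ThresholdRule:
    """Fire when a pattern is seen `count` times within `window_seconds`."""

    def __init__(self, pattern, count, window_seconds):
        self.pattern = pattern              # substring to look for (assumed matching rule)
        self.count = count                  # the "X" in "X times within Y hours"
        self.window_seconds = window_seconds  # the "Y hours", expressed in seconds
        self._hits = deque()                # timestamps of recent matching lines

    def observe(self, line, now):
        """Record one syslog line seen at time `now`; return True if a notification is due."""
        if self.pattern not in line:
            return False
        self._hits.append(now)
        # Drop matches that have fallen out of the time window.
        while self._hits and now - self._hits[0] > self.window_seconds:
            self._hits.popleft()
        if len(self._hits) >= self.count:
            self._hits.clear()  # assumed: start counting afresh after notifying
            return True
        return False
```

With ThresholdRule("disk full", 3, 2 * 3600), the third matching line inside
any two-hour span triggers a notification, while unrelated lines are ignored.
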
Additional probes envisioned include:

- Script probe: A probe that executes a script on the local system and
  returns a pass/fail condition.  This will allow very quick integration of
  all kinds of monitoring areas.
- NT eventlog probe: A probe that integrates NT event log error messages in
  the same way as Unix syslog messages.
- IPX probe: A probe that checks the Novell IPX protocol.
- Tcpdump probe: A probe that looks for specific data packets, protocols,
  and similar conditions, and issues notifications based on them.
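
As an illustration of the script probe, the pass/fail check might look like
the sketch below.  It assumes that exit status 0 means "pass" and anything
else means "fail"; the function name run_script_check, the timeout, and the
shape of the returned dictionary are hypothetical.

```python
import subprocess


def run_script_check(argv, timeout_seconds=30):
    """Run a local check script and map its outcome to pass/fail.

    Assumed convention: exit status 0 is a pass, any other status is a fail.
    A script that cannot be started or that times out also counts as a fail.
    """
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout_seconds
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return {"passed": False, "detail": str(exc)}
    # Prefer the script's standard output as the detail, else its error output.
    detail = result.stdout.strip() or result.stderr.strip()
    return {"passed": result.returncode == 0, "detail": detail}
```

Any existing shell or batch script that follows the exit-status convention
could then be plugged in without changes to the probe itself.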

All probes accept command requests from a monitoring engine.  These requests
allow the monitoring engine to see the current status of the probe.  The
commands understood by a probe are:
- Systems being monitored: Show which systems the probe is monitoring, along
  with the parameters that determine when to issue notifications.
- Current status of one system.
- Perform a check now.  This allows the condition of a system/service to be
  verified at a given point in time.
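
The three commands above could be handled by a small dispatcher along these
lines.  The command words LIST, STATUS and CHECK, the reply format, and the
class name ProbeCommands are hypothetical; the design does not define the
actual wire syntax.

```python
class ProbeCommands:
    """Answers the three command requests with one-line text replies."""

    def __init__(self, targets, check_fn):
        self.targets = targets      # system name -> notification parameters
        self.check_fn = check_fn    # callable(system) -> status string, e.g. "UP"
        self.last_status = {}       # system name -> result of the last check

    def handle(self, line):
        verb, _, arg = line.strip().partition(" ")
        verb = verb.upper()
        arg = arg.strip()
        if verb == "LIST":
            # Systems being monitored, along with their parameters.
            return "; ".join(
                f"{name} {params}" for name, params in sorted(self.targets.items())
            )
        if verb in ("STATUS", "CHECK"):
            if arg not in self.targets:
                return f"ERR unknown system {arg}"
            if verb == "CHECK":
                # Perform a check now and remember the result.
                self.last_status[arg] = self.check_fn(arg)
            return f"{arg} {self.last_status.get(arg, 'UNKNOWN')}"
        return f"ERR unknown command {verb}"
```

For example, a probe built with ProbeCommands({"mail1": "interval=120"},
check_fn) would answer "STATUS mail1" with "mail1 UNKNOWN" until a check has
been performed.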

For probes that actively check the status of systems, polling configurations
are standard.  Some points about polling:
- A default polling entry is defined; it applies to any system/service that
  has no polling entry of its own.
- Polling can be time dependent.  For example, we should be able to say
  something like:
    Between 5:00 AM and 7:00 PM week-days,
      check status every 2 minutes
      notify immediately.
    Between 7:01 PM and 9:59 PM week-days,
      check status every 10 minutes
      notify after system down for 30 minutes.
    Between 10:00 PM and 4:59 AM week-days,
      No checks.
    On week-ends,
      No checks.

- When a notification occurs, it reports the time that the problem was first
  noted, not the time that the notification is being issued.
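
The example schedule and the "first noted" rule could be expressed roughly as
follows.  This is a sketch under assumptions: the rule table layout, the
function names, minute-level precision, and the treatment of a window whose
start is later than its end as wrapping past midnight are all illustrative
choices.

```python
from datetime import datetime, time

# (start, end, check interval in minutes, minutes down before notifying)
# An interval of None means "no checks".
WEEKDAY_RULES = [
    (time(5, 0), time(19, 0), 2, 0),
    (time(19, 1), time(21, 59), 10, 30),
    (time(22, 0), time(4, 59), None, None),
]


def in_window(t, start, end):
    if start <= end:
        return start <= t <= end
    return t >= start or t <= end  # window wraps past midnight


def polling_rule(now):
    """Return (interval_minutes, notify_after_minutes), or (None, None) for no checks."""
    if now.weekday() >= 5:  # Saturday or Sunday
        return (None, None)
    t = now.time().replace(second=0, microsecond=0)  # compare at minute precision
    for start, end, interval, notify_after in WEEKDAY_RULES:
        if in_window(t, start, end):
            return (interval, notify_after)
    return (None, None)


def due_notification(first_noted, now):
    """Return the time to report once the outage has lasted long enough, else None."""
    interval, notify_after = polling_rule(now)
    if interval is None:
        return None
    if (now - first_noted).total_seconds() >= notify_after * 60:
        return first_noted  # report when the problem was first noted, not `now`
    return None
```

A system first noted down at 7:30 PM on a weekday would therefore produce a
notification at 8:00 PM, and that notification would carry the 7:30 PM time.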

Additional notes about probes:

- A probe can be connected to more than one monitoring engine.
- A probe should save all notifications for the past X days.  Monitoring
  engines can request a probe to resend all notifications since a specific
  point in time.
- A probe can be configured either locally or through socket commands.
- Any change to the configuration is logged through a notification to the
  monitoring engine.
- Probes require a user ID and password combination before accepting a
  connection.
- The protocol used between probe and monitoring engine is modeled along the
  lines of the SMTP or FTP protocols, allowing ease of troubleshooting.
- Probes report problems based on their local time-zone information.  A
  monitoring engine can query the probe’s time-zone settings to learn which
  time zone is in use.
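
The retention and resend behaviour noted above might be sketched like this.
The class name NotificationLog, the tuple layout, and the decision to treat
"since" as inclusive and to measure retention from the issue time are
assumptions made for the example.

```python
from datetime import timedelta


class NotificationLog:
    """Keeps issued notifications for a fixed number of days for later resends."""

    def __init__(self, retention_days):
        self.retention = timedelta(days=retention_days)  # the "X days" above
        self._entries = []  # (issued, first_noted, message), oldest first

    def record(self, issued, first_noted, message):
        self._entries.append((issued, first_noted, message))
        # Forget anything issued before the retention cutoff.
        cutoff = issued - self.retention
        self._entries = [e for e in self._entries if e[0] >= cutoff]

    def since(self, point_in_time):
        """Return every retained notification issued at or after point_in_time."""
        return [e for e in self._entries if e[0] >= point_in_time]
```

A monitoring engine that reconnects after an outage would ask for
since(t), where t is the last time it heard from the probe.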

Monitoring Engine

The monitoring engine is the heart of the architecture.  It is responsible
for:
- Connecting to a list of probes to start receiving notifications.  Note that
  upon connecting, the monitoring engine would most likely ask the probe to
  resend all notifications issued while the monitoring engine was off the air.
- Accepting connections from the presentation layer, to communicate with the
  users.
- Processing and logging notifications.
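
On the engine side, the catch-up after a reconnect could be tracked as in
the sketch below.  The class name EngineState and its methods are
hypothetical.  It assumes the engine remembers the issue time of the newest
notification received from each probe and uses that as the point from which
to request a resend, dropping any duplicates the resend delivers.

```python
class EngineState:
    """Logs notifications and remembers where to resume for each probe."""

    def __init__(self):
        self.log = []      # processed notifications, in arrival order
        self.seen = set()  # used to drop duplicates delivered by a resend
        self.newest = {}   # probe name -> newest issue time processed

    def process(self, probe, issued, first_noted, message):
        """Log one notification; return False if it was already processed."""
        key = (probe, issued, first_noted, message)
        if key in self.seen:
            return False
        self.seen.add(key)
        self.log.append(key)
        if probe not in self.newest or issued > self.newest[probe]:
            self.newest[probe] = issued
        return True

    def resend_point(self, probe):
        """Time from which to request a resend, or None if nothing was received yet."""
        return self.newest.get(probe)
```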

Terminology

- Notification: An unrequested message sent from a probe to a monitoring
  engine.