Architecture Overview

The proposed architecture is composed of probes, the monitoring engine, the presentation component, and the DB storage component. Each component's responsibility is as follows:

Probes

A probe is a software component that notifies the monitoring engine that something has happened. The probe communicates with the monitoring engine through standard IP sockets, so a probe can be placed on any system on the network. Many probes can communicate with the same monitoring engine.

A probe can also be passive: for example, it simply accepts incoming data and translates it into a format understood by the monitoring engine.

The types of probes envisioned from the start are:

- SNMP probe: Issues SNMP requests and checks the responses. It also accepts traps and issues notifications to the monitoring engine.
- ICMP probe: Performs an ICMP ping and checks the response. It can issue a notification if the response is too slow, the host is not reachable, or there is too much packet loss.
- Service probe: Checks standard Internet services such as SMTP, FTP, HTTP, DNS, etc. and ensures that they are available. It can also issue a notification if the response is too slow.
- Syslog probe: Listens to syslog and issues a notification if a certain message is seen. It can also apply higher-level rules such as "if this message is seen X times within Y hours, issue a notification".

Additional probes envisioned include:

- Script probe: Executes a script on the local system and returns a pass/fail condition. This allows very quick integration of all kinds of monitoring areas.
- NT event log probe: Integrates NT event log error messages in the same way as Unix syslog messages.
- IPX probe: Checks the Novell protocol.
- Tcpdump probe: Looks for specific data packets, protocols, and similar situations, and issues notifications based on them.
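The Syslog probe's "X times within Y hours" rule could be implemented with a sliding time window. The following is only a minimal sketch under stated assumptions: the class name `MessageThreshold`, its parameters, the substring match, and the choice to reset the counter after firing are all hypothetical and not part of the proposal.

```python
from collections import deque


class MessageThreshold:
    """Hypothetical rule: fire when a pattern is seen `count` times
    within `window_seconds` (illustrative names, not from the proposal)."""

    def __init__(self, pattern, count, window_seconds):
        self.pattern = pattern
        self.count = count
        self.window_seconds = window_seconds
        self._hits = deque()  # timestamps of matching messages, oldest first

    def observe(self, message, timestamp):
        """Return True if this message brings the match count to the threshold."""
        if self.pattern not in message:
            return False
        self._hits.append(timestamp)
        # Drop matches that have fallen out of the time window.
        cutoff = timestamp - self.window_seconds
        while self._hits and self._hits[0] <= cutoff:
            self._hits.popleft()
        if len(self._hits) >= self.count:
            # Assumption: start counting afresh after a notification fires.
            self._hits.clear()
            return True
        return False
```

A rule such as `MessageThreshold("disk full", 3, 3600)` would return True from `observe` on the third matching message seen within one hour, at which point the probe would issue its notification.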
All probes accept command requests from a monitoring engine. These requests allow the monitoring engine to see the current status of the probe. The commands understood by a probe are:

- Systems being monitored: Show which systems the probe is monitoring, along with the parameters that determine when to issue notifications.
- Current status of one system.
- Perform a check now: This allows the condition of a system or service to be verified at a given point in time.

For probes that actively check the status of systems, polling configurations are standard. Some points about polling:

- A default polling entry is defined; it applies to systems/services that have no entry of their own.
- Polling can be time dependent. For example, we should be able to say something like:
    Between 5:00 AM and 7:00 PM on weekdays, check status every 2 minutes and notify immediately.
    Between 7:01 PM and 9:59 PM on weekdays, check status every 10 minutes and notify after the system has been down for 30 minutes.
    Between 10:00 PM and 4:59 AM on weekdays, no checks.
    On weekends, no checks.
- When a notification occurs, it reports the time that the condition was first noted, not the time that the notification is being issued.

Additional notes about probes:

- A probe can be connected to more than one monitoring engine.
- A probe should save all notifications for the past X days. Monitoring engines can request that a probe resend all notifications since a specific point in time.
- A probe can be configured either locally or through socket commands.
- Any change to the configuration is logged through a notification to the monitoring engine.
- Probes require a user ID and password combination before accepting a connection.
- The protocol used between probe and monitoring engine is modeled along the lines of SMTP or FTP, allowing ease of troubleshooting.
- Probes report problems based on their local time-zone information. A monitoring engine can query a probe's time-zone settings to learn what they are.

Monitoring Engine

The monitoring engine is the heart of the architecture.
It is responsible for:

- Connecting to a list of probes to start receiving notifications. Upon connecting, the monitoring engine would most likely ask each probe to send all notifications issued while the monitoring engine was off the air.
- Accepting connections from the presentation layer, to communicate with the users.
- Processing and logging notifications.

Terminology

- Notification: An unsolicited message sent from a probe to a monitoring engine.
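The notification behavior described in this section (reporting the time a condition was first noted, keeping the past X days of notifications, and resending everything since a point in time) might look roughly like the sketch below. All names here (`Notification`, `NotificationLog`, `record`, `since`) are hypothetical, and the sketch assumes that retention and resend are keyed on the time a notification was issued; the proposal does not specify either.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Notification:
    """Hypothetical notification record (field names are assumptions)."""
    system: str
    message: str
    first_noted: datetime  # when the condition was first seen
    issued_at: datetime    # when the probe actually sent the notification


class NotificationLog:
    """Probe-side history that keeps the past `retention_days` of notifications."""

    def __init__(self, retention_days):
        self.retention = timedelta(days=retention_days)
        self._entries = []

    def record(self, notification):
        """Store a notification and discard entries older than the retention period."""
        self._entries.append(notification)
        cutoff = notification.issued_at - self.retention
        self._entries = [n for n in self._entries if n.issued_at >= cutoff]

    def since(self, point_in_time):
        """Return notifications issued at or after `point_in_time`, oldest first."""
        return [n for n in self._entries if n.issued_at >= point_in_time]
```

On reconnecting, a monitoring engine could pass the time of its last received notification to `since` to recover whatever it missed while it was off the air.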