Monitoring Distributed Systems (OrWell)

(1996-1998)

Improvements in computer systems and progress in network technologies have led to increased decentralization of services and data, shared utilization of common resources (e.g., the network), and parallelization of processing steps in application systems.

Partitioning an application system into a distributed application system allows it to be adapted well to physical, organizational and software-engineering requirements. A likely consequence of this development is an increasing demand for distributed application systems. When developing a distributed application system, developers face problems that do not exist in the development of centralized systems, or exist only in a restricted form. Examples include the distribution of resources and/or processes across multiple computers, the parallelization of processing steps, the detection and handling of failures, the control of the processing of asynchronous events, and the integration of communication services. "Monitoring" and dynamic program analysis are intended to support the development of distributed application systems. The main concerns are the fulfilment of system requirements, coverage of the problems mentioned above, and the observation of running systems.

This project covers the development of an online monitoring environment for distributed object-oriented applications. The monitoring environment is based on a distributed object-oriented software infrastructure (ObjectWire). We have developed a flexible monitoring architecture for different event classes (predefined and user-defined events) that allows sensors to be plugged in at runtime. It supports dynamic program analysis in different stages of the software development cycle, such as prototyping the architecture or detecting faults while the distributed application system is running. The back-end of this environment supports different visualizations, ranging from simple space-time diagrams of events, through transmission amounts and frequencies, to the effectiveness of error management. Such tool support is needed because, given the huge state space of a distributed system and the large number of observed events, a user would otherwise hardly be able to detect a problem, let alone identify its cause.
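The interfaces of ObjectWire and of the OrWell monitoring environment are not described here, so the following is only a minimal sketch, assuming a simple in-process event bus, of what "event classes with sensors that can be plugged in at runtime" and a back-end computing transmission amounts and frequencies might look like. All names (Event, MonitorBus, plug_in, unplug, define_event, TransmissionStats, and the predefined event classes) are hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """A single observed event (hypothetical structure)."""
    kind: str          # event class, predefined or user-defined
    source: str        # emitting object or node
    target: str = ""   # receiving object or node, if any
    size: int = 0      # payload size in bytes, if applicable
    time: float = 0.0  # local timestamp of the observation


# A sensor is any callable that consumes events of one event class.
Sensor = Callable[[Event], None]
Channel = Tuple[str, str]


class MonitorBus:
    """Hypothetical dispatcher: sensors can be attached and detached while the system runs."""

    # Assumed set of predefined event classes; users may add their own.
    PREDEFINED = {"message_sent", "message_received", "error"}

    def __init__(self) -> None:
        self._kinds = set(self.PREDEFINED)
        self._sensors: Dict[str, List[Sensor]] = defaultdict(list)

    def define_event(self, kind: str) -> None:
        """Register a user-defined event class."""
        self._kinds.add(kind)

    def plug_in(self, kind: str, sensor: Sensor) -> None:
        """Attach a sensor at runtime; only known event classes are accepted."""
        if kind not in self._kinds:
            raise KeyError(f"unknown event class: {kind}")
        self._sensors[kind].append(sensor)

    def unplug(self, kind: str, sensor: Sensor) -> None:
        """Detach a previously attached sensor (raises ValueError if absent)."""
        self._sensors[kind].remove(sensor)

    def emit(self, event: Event) -> None:
        """Deliver an event to every sensor currently attached to its class.

        Events of a class with no attached sensor are simply dropped.
        """
        for sensor in list(self._sensors.get(event.kind, [])):
            sensor(event)


class TransmissionStats:
    """Back-end sensor: message count and transmitted bytes per (source, target) channel."""

    def __init__(self) -> None:
        self.count: Dict[Channel, int] = defaultdict(int)
        self.bytes: Dict[Channel, int] = defaultdict(int)

    def __call__(self, event: Event) -> None:
        channel = (event.source, event.target)
        self.count[channel] += 1
        self.bytes[channel] += event.size

    def frequency(self, channel: Channel, window_s: float) -> float:
        """Messages per second on a channel over an observation window of window_s seconds."""
        return self.count.get(channel, 0) / window_s


# Usage: the statistics sensor only sees events emitted while it is plugged in.
bus = MonitorBus()
stats = TransmissionStats()
bus.emit(Event("message_sent", "A", "B", size=100))  # no sensor attached yet
bus.plug_in("message_sent", stats)
bus.emit(Event("message_sent", "A", "B", size=100))
bus.emit(Event("message_sent", "A", "B", size=50))
bus.unplug("message_sent", stats)
bus.emit(Event("message_sent", "A", "B", size=999))  # sensor detached again
print(stats.count[("A", "B")], stats.bytes[("A", "B")])  # prints "2 150"
```

Keeping sensors as plain callables means that predefined back-end consumers (such as the statistics object) and ad-hoc user-defined sensors can be attached through the same interface; a real distributed implementation would additionally have to transport events between nodes and order them, which this sketch does not attempt.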

Keywords: Monitoring Infrastructure, Monitoring Tools