In network management, fault management is the set of functions that detect, isolate, and correct malfunctions in a telecommunications network and compensate for environmental changes. These functions include maintaining and examining error logs, accepting and acting on error detection notifications, tracing and identifying faults, carrying out sequences of diagnostic tests, correcting faults, reporting error conditions, and localizing and tracing faults by examining and manipulating database information.
When a fault or event occurs, a network component will often send a notification to the network operator using a protocol such as SNMP. An alarm is a persistent indication of a fault that clears only when the triggering condition has been resolved. A current list of problems occurring on the network component is often kept in the form of an active alarm list.
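The alarm semantics described above (an alarm persists until its triggering condition is resolved, and the open alarms make up an active alarm list) can be sketched in Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: notifications arrive as plain method calls rather than parsed SNMP traps, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Alarm:
    source: str      # network component that reported the fault
    condition: str   # e.g. "linkDown" (hypothetical condition name)
    severity: str    # e.g. "major"
    raised_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ActiveAlarmList:
    """Alarms stay in the list until their triggering condition is cleared."""

    def __init__(self) -> None:
        self._alarms: dict[tuple[str, str], Alarm] = {}

    def raise_alarm(self, source: str, condition: str, severity: str) -> None:
        key = (source, condition)
        # A repeated notification for an already active condition does not add a second alarm.
        if key not in self._alarms:
            self._alarms[key] = Alarm(source, condition, severity)

    def clear(self, source: str, condition: str) -> bool:
        # Only a "resolved" notification for the same condition removes the alarm.
        return self._alarms.pop((source, condition), None) is not None

    def active(self) -> list[Alarm]:
        """Current problems, oldest first."""
        return sorted(self._alarms.values(), key=lambda a: a.raised_at)


alarms = ActiveAlarmList()
alarms.raise_alarm("router-1", "linkDown", "major")
alarms.raise_alarm("router-1", "linkDown", "major")  # duplicate notification
alarms.raise_alarm("switch-7", "fanFailure", "minor")
alarms.clear("router-1", "linkDown")
print([a.condition for a in alarms.active()])  # ['fanFailure']
```

Keying alarms on the source and condition is what lets a later "cleared" notification find and remove exactly the alarm it resolves.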
A fault management console allows a network administrator or system operator to monitor events from multiple systems and perform actions based on this information. Ideally, a fault management system should be able to correctly identify events and take action automatically, either by launching a program or script that corrects the problem, or by activating notification software that allows a human to intervene (e.g. by sending an e-mail or an SMS text message to a mobile phone).
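A rule-based dispatcher is one simple way to picture automatic action on events: each rule pairs a match condition with either a corrective script or a human notification. The sketch below assumes a hypothetical event shape, severity scale and script path; the notification action only builds a message rather than sending real e-mail or SMS.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    node: str
    summary: str
    severity: int  # hypothetical scale: 0 (clear) to 5 (critical)


Action = Callable[[Event], str]


def run_script(command: list[str]) -> Action:
    """Action that launches a corrective program, passing the node name as the last argument."""
    def action(event: Event) -> str:
        try:
            result = subprocess.run([*command, event.node], capture_output=True, text=True)
        except OSError as exc:  # e.g. the script is missing or not executable
            return f"could not run {command[0]}: {exc}"
        return f"ran {command[0]} for {event.node}: exit code {result.returncode}"
    return action


def notify(recipient: str) -> Action:
    """Action that would alert a human; actual e-mail or SMS delivery is left out."""
    def action(event: Event) -> str:
        return f"notify {recipient}: {event.node} {event.summary} (severity {event.severity})"
    return action


@dataclass
class Rule:
    matches: Callable[[Event], bool]
    action: Action


def dispatch(event: Event, rules: list[Rule]) -> list[str]:
    """Run every action whose rule matches the event and return what was done."""
    return [rule.action(event) for rule in rules if rule.matches(event)]


rules = [
    # Hypothetical script path; replace with a real remediation tool.
    Rule(lambda e: e.summary == "interface down", run_script(["/opt/noc/restart_interface.sh"])),
    Rule(lambda e: e.severity >= 4, notify("noc-oncall@example.com")),
]
```

With these rules, `dispatch(Event("router-1", "high CPU", 4), rules)` triggers only the notification rule, while an "interface down" event would also try to launch the script. Keeping actions as plain callables means new kinds of response can be added without changing the dispatcher.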
There are two primary ways to perform fault management: active and passive. Passive fault management collects alarms from devices (normally via SNMP) when something happens on the devices. Active fault management monitors devices with tools such as ping to determine whether each device is active and responding. If a device stops responding, active monitoring raises an alarm showing the device as unavailable, which allows the problem to be corrected proactively.
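The active approach can be sketched as a poller that probes each device and raises an "unavailable" alarm after several consecutive failures, clearing it once the device answers again. One substitution is made here: the `tcp_probe` helper checks reachability by opening a TCP connection (port 22 by default, an assumption) instead of sending an ICMP ping, because ICMP echo requests from Python normally need raw-socket privileges or an external `ping` command. The failure threshold is likewise an illustrative choice that avoids alarming on a single lost probe.

```python
import socket
from typing import Callable, Optional


def tcp_probe(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Reachability check by opening a TCP connection (stand-in for ICMP ping)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name not resolved
        return False


class ActiveMonitor:
    """Polls devices and raises an 'unavailable' alarm after repeated failed probes."""

    def __init__(self, probe: Callable[[str], bool], failures_before_alarm: int = 3) -> None:
        self.probe = probe
        self.failures_before_alarm = failures_before_alarm
        self.failures: dict[str, int] = {}
        self.unavailable: set[str] = set()

    def poll(self, device: str) -> Optional[str]:
        """Probe one device; return "raise", "clear", or None if nothing changed."""
        if self.probe(device):
            self.failures[device] = 0
            if device in self.unavailable:
                self.unavailable.remove(device)
                return "clear"   # device answers again: clear the alarm
            return None
        self.failures[device] = self.failures.get(device, 0) + 1
        if self.failures[device] >= self.failures_before_alarm and device not in self.unavailable:
            self.unavailable.add(device)
            return "raise"       # alarm: device shown as unavailable
        return None


# Demonstration with a fake probe; in practice pass tcp_probe or a real ping wrapper.
responses = {"router-1": True, "router-2": False}
monitor = ActiveMonitor(lambda device: responses[device], failures_before_alarm=2)
for _ in range(3):                  # three polling rounds
    for device in responses:
        change = monitor.poll(device)
        if change:
            print(device, change)   # prints "router-2 raise" once, in round two
```

Passing the probe in as a function keeps the alarm logic testable and lets the same monitor use a ping wrapper, a TCP check, or an application-level health check.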
The rapid development of telecommunications and IT technology has led to a rapid increase in the number of services offered to users. As a further result of this development, services today are no longer tied to a single technology or domain; instead, they depend in complicated ways on different technologies and on equipment from different manufacturers. In this environment, service maintenance has become much more complicated than before. In addition, the telecom services market is very competitive, and stable service is a "must have" in today's business environment. A good fault management system is therefore imperative for every service provider.
In the fault management area, Ibis Instruments offers IBM's Tivoli Netcool product portfolio. Netcool was originally a Micromuse product; since IBM acquired Micromuse, it has been part of IBM's Tivoli group. Netcool has remained a well-known brand in the market, especially among telecom operators and service providers, in both the wireline and wireless sectors.
The basic idea that guided the developers was to consolidate monitoring on one central platform, from which problems in the services can still be analyzed and correlated. Consolidation is the key: today, when a service depends on various technologies, domains and manufacturers in a very complex way, successful network monitoring requires that problems be consolidated on one central platform. Centralized monitoring is also of great operational importance, since consolidating operations and introducing strict operating procedures and processes greatly improve management efficiency and reduce maintenance costs.
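Why consolidation helps correlation can be shown with a toy service model: once faults from every domain land in one place, each service can be checked against the components it depends on, and a single component blamed by several services stands out as a likely common cause. The service names, component names and dependency map below are hypothetical, and this simple dependency lookup is only one of many correlation techniques.

```python
from collections import Counter, defaultdict

# Hypothetical service model: each service depends on components from
# different technology domains (and, in practice, different vendors).
SERVICE_DEPENDENCIES: dict[str, list[str]] = {
    "mobile-data": ["ran:bts-17", "transport:mpls-pe-2", "core:ggsn-1"],
    "iptv": ["access:olt-4", "transport:mpls-pe-2", "headend:encoder-9"],
    "voip": ["access:olt-4", "core:sbc-1"],
}


def impacted_services(active_faults: set[str],
                      deps: dict[str, list[str]] = SERVICE_DEPENDENCIES) -> dict[str, list[str]]:
    """Map each affected service to the faulty components it depends on."""
    impact = defaultdict(list)
    for service, components in deps.items():
        for component in components:
            if component in active_faults:
                impact[service].append(component)
    return dict(impact)


def common_causes(impact: dict[str, list[str]]) -> list[str]:
    """Faulty components shared by more than one impacted service."""
    counts = Counter(c for components in impact.values() for c in components)
    return sorted(c for c, n in counts.items() if n > 1)


faults = {"transport:mpls-pe-2", "core:sbc-1"}   # consolidated from two domains
impact = impacted_services(faults)
print(impact)
print(common_causes(impact))  # ['transport:mpls-pe-2']
```

Neither the transport team nor the core team alone would see that one transport fault affects two services; the answer only becomes visible once both fault lists are held centrally.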

Netcool concept
The Netcool concept (picture above) illustrates the Netcool philosophy of dividing fault management into four layers: the collection layer, the consolidation layer, the analysis and automation layer, and the presentation (information) layer.
Each layer is implemented by different Netcool software:
- Netcool probes and monitors: software agents that collect alarms from the network (collection layer),
- Netcool/OMNIbus: a powerful consolidation server and the heart of the system (consolidation layer),
- Tivoli Business Service Manager (presentation layer).
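The layered split can be pictured as a chain of four functions, one per layer. This is a conceptual sketch only, not Netcool probe rules or ObjectServer SQL: the `node|severity|message` input format, the 0 to 5 severity scale, and the rule that escalates a fault reported three or more times are all assumptions made for illustration.

```python
import re

# Collection layer: a probe normalizes raw device output into events.
# The "node|severity|message" format is hypothetical.
LINE = re.compile(r"^(?P<node>[\w.-]+)\|(?P<severity>[0-5])\|(?P<message>.+)$")


def collect(raw_lines: list[str]) -> list[dict]:
    events = []
    for line in raw_lines:
        match = LINE.match(line.strip())
        if match:  # lines that do not parse are discarded as noise
            events.append({"node": match["node"],
                           "severity": int(match["severity"]),
                           "message": match["message"]})
    return events


# Consolidation layer: duplicates collapse into one row with a count.
def consolidate(events: list[dict]) -> dict[tuple[str, str], dict]:
    table: dict[tuple[str, str], dict] = {}
    for event in events:
        row = table.setdefault((event["node"], event["message"]), {**event, "count": 0})
        row["count"] += 1
        row["severity"] = max(row["severity"], event["severity"])
    return table


# Analysis and automation layer: escalate faults that keep repeating.
def analyze(table: dict[tuple[str, str], dict], escalate_after: int = 3) -> dict[tuple[str, str], dict]:
    for row in table.values():
        if row["count"] >= escalate_after:
            row["severity"] = 5
    return table


# Presentation layer: an operator view, most severe first.
def present(table: dict[tuple[str, str], dict]) -> list[str]:
    rows = sorted(table.values(), key=lambda r: (-r["severity"], r["node"]))
    return [f'{r["severity"]} {r["node"]:<10} {r["message"]} (x{r["count"]})' for r in rows]


raw = [
    "core-1|3|Link down on ge-0/0/1",
    "core-1|3|Link down on ge-0/0/1",
    "core-1|3|Link down on ge-0/0/1",
    "edge-2|4|High CPU",
    "not a valid line",
]
for line in present(analyze(consolidate(collect(raw)))):
    print(line)
```

Keeping each layer a separate function mirrors the idea behind the split: a probe for a new equipment type can be added to the collection layer without touching consolidation, analysis or presentation.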
The Netcool suite from IBM Tivoli provides industry-leading fault management, event monitoring and event consolidation for some of the world's largest and most complex networks and IT infrastructures. The many products within the suite work together to help improve business efficiency for telecommunications carriers, mobile service providers, Internet service providers, broadband service providers and corporate enterprises.
A key success factor is that the system is based on a strong, scalable architecture that allows the platform to be expanded quickly and efficiently in 24x7 NOC environments, together with support for almost all types of equipment conventionally used by telecom operators today. In addition, Netcool is recognized as a powerful platform, and some of the largest networks in the world entrust their fault management to Netcool products (all 20 of the world's 20 largest carriers use Netcool).