
Additional information:

 

IBM Tivoli Business Service Manager

Implementation of IBM Tivoli Business Service Manager (TBSM) will provide the following:

  • Service modeling - a graphical presentation of how an alarm is reflected in the state of a given service
  • Root Cause Analysis visualization - the TBSM GUI presents the network so that the user can see how a service depends on every device in the environment, and can see directly where the problem that manifested as service unavailability really is
  • Online basic SLA review - several options are available for setting SLA thresholds and for previewing the SLA status of a service

01_no3.jpg

TBSM GUI


The basic requirement for IBM Tivoli Business Service Manager is the ability to create arbitrary service models, visualize them graphically, and monitor services online through these models. Besides the alarm table view (functionality already present in the system once fault management is implemented), TBSM gives insight into services the way they actually depend on the equipment. It also shows how a service instance affects the service as a whole, so the main cause of a problem can be determined quickly and easily.

The software is scalable and adapts to changing environments and to telco service providers. New services can easily be added and existing ones changed. IBM Tivoli Business Service Manager (TBSM) service modeling shows a service exactly the way it depends on the equipment and systems, to the extent that those dependencies can be determined.

IBM Tivoli Business Service Manager is compatible with IBM Tivoli Netcool, the solution for monitoring IP-based services. Once a service model is created, alarms are taken from the IBM Tivoli Netcool ObjectServer, the central place where all alarms from the network are stored. TBSM can also retrieve data from external data sources, as described later in this document. The integration with the Netcool fault management solution works in the other direction as well: TBSM can send alarms about disrupted services and SLA violations to the central IBM Tivoli Netcool ObjectServer, where they appear in the same table as alarms from the rest of the system.

IBM Tivoli Business Service Manager (TBSM) has a graphical user interface (GUI) for creating and reviewing service models. A service model gives the operator insight into real-time changes of the service state. The GUI is functional, transparent, intuitive and easy to handle.

An administrator can make all settings quickly and easily in the GUI; settings can also be made through the console.

 

SLA in TBSM

A Service Level Agreement (SLA) within TBSM is primarily a means of prevention: warning alarms are received when certain thresholds are exceeded, before an SLA violation (a predefined critical value) happens. A violation would lead to penalties payable to the dissatisfied customer.

02_no1.jpg

TBSM SLA setting


TBSM provides 3 methods for calculating SLAs:

  • Duration-based SLA - tracks the time from the beginning to the end of an incident
  • Cumulative duration SLA - monitors the total cumulative time a service is unavailable within a specified time period (month, day, hour, minute)
  • Number of violations in a given time period - counts incident occurrences during a defined time interval, regardless of how long each incident lasted

 

Each SLA method is explained below:

  • Duration-based violations  
    Duration-based violations let you define the amount of time a service can be unavailable before TBSM sends a violation or warning event. For example, if the service is unavailable for more than 2 minutes, a warning event (marginal status) is generated, and if it is unavailable for more than 5 minutes, a violation (critical status) is reported.
     
  • Cumulative Violation calculation
    Cumulative-time violations let you set the amount of time a service can be unavailable within a given time period before TBSM sends a violation or warning event to the ObjectServer. The default settings in the Standard SLA report a warning after 15 minutes of downtime and a violation after 45 minutes of downtime within the selected time period (month, day, hour, or minute). You can track SLA violations over all four time periods simultaneously.
     
  • Violation incident-count calculations
    The "calculate number of duration violations in given time period" section lets you set how many violations can occur in a given time period before TBSM sends a violation or warning event. This setting helps you evaluate whether a service is having too many short-duration outages in a given time period. The cumulative time of these outages may be small, but their frequency can still disrupt the service. For example, at most 2 violations may occur within half an hour before a warning (marginal) alarm is raised, and after 5 incidents a red alarm is raised.
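The three calculation methods above can be compared in a minimal Python sketch. This is not TBSM code or its API: the `Outage` class and the function names are hypothetical, the thresholds are the example values from the text, and a threshold is assumed to trigger only when it is strictly exceeded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outage:
    """One incident; times are seconds from the start of the tracked period."""
    start: float
    end: float

    @property
    def duration(self) -> float:
        return self.end - self.start


def classify(value: float, warning: float, violation: float) -> str:
    """Map a measured value to good / marginal / bad (assumption: strict '>')."""
    if value > violation:
        return "bad"
    if value > warning:
        return "marginal"
    return "good"


def duration_based(outage: Outage, warning_s=2 * 60, violation_s=5 * 60) -> str:
    # Looks at one incident from its beginning to its end.
    return classify(outage.duration, warning_s, violation_s)


def cumulative(outages, warning_s=15 * 60, violation_s=45 * 60) -> str:
    # Sums all downtime inside the tracked period.
    return classify(sum(o.duration for o in outages), warning_s, violation_s)


def incident_count(outages, warning_n=2, violation_n=5) -> str:
    # Counts incidents in the tracked period, ignoring how long each lasted.
    return classify(len(outages), warning_n, violation_n)
```

With six 20-second outages in one period, the first two functions both return "good" (each outage is short and the total is only 2 minutes), while `incident_count` returns "bad". This is the case where frequent short outages disrupt a service without adding up to much downtime.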


The appropriate way to track an SLA depends on the nature of the service and its potential problems. If a customer signs an SLA contract with an ISP that allows, for example, at most 5 minutes without Internet access per day, that contract can be a poor guarantee of service quality: if the service has a flapping problem, the user will often lose the connection even though the total downtime stays small. In this case, a contract based on incident-count SLA violations would be a much better one for the user to sign.

It is possible to define a number of different SLAs and assign them to particular services. For example, an SLA named "standard" could have the following thresholds: within half an hour, the 2nd violation leads to marginal status and the 3rd violation produces bad status. An SLA named "Gold" with more stringent criteria can also be created: within half an hour, 1 incident leads to marginal status and 2 incidents lead to bad status.
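The "standard" and "Gold" examples can be written down as two incident-count profiles. The sketch below is illustrative only: `IncidentCountSla` and its fields are hypothetical names, not TBSM configuration, and the half-hour period is modelled as a sliding window, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidentCountSla:
    """Hypothetical incident-count SLA profile."""
    name: str
    marginal_at: int          # the Nth incident in the window gives marginal
    bad_at: int               # the Nth incident in the window gives bad
    window_s: int = 30 * 60   # half an hour, as in the example

    def status(self, incident_times, now: float) -> str:
        # Count only incidents that fall inside the window ending at `now`.
        recent = [t for t in incident_times if now - self.window_s < t <= now]
        if len(recent) >= self.bad_at:
            return "bad"
        if len(recent) >= self.marginal_at:
            return "marginal"
        return "good"


STANDARD = IncidentCountSla("standard", marginal_at=2, bad_at=3)
GOLD = IncidentCountSla("Gold", marginal_at=1, bad_at=2)
```

For the same two incidents inside one half hour, `STANDARD.status(...)` gives "marginal" while `GOLD.status(...)` already gives "bad".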

All alarms generated by TBSM appear together with all other alarms in the central Netcool ObjectServer. Alarms from TBSM report the status of the service and how long that status has been bad (violation) or marginal (warning).

TBSM makes it possible to give greater importance to some SLA warnings, so that multiple critical alarms can be prioritized. By setting a "weight coefficient" on the SLA, you will know which of several received critical alarms requires attention first.
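As a rough illustration of how a weight coefficient orders several critical alarms, the sketch below sorts them by weight. The `SlaAlarm` fields and the rule "higher weight first" are assumptions made for the example, not TBSM's documented behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaAlarm:
    """Hypothetical alarm record carrying the weight of its SLA."""
    service: str
    severity: str       # "critical", "warning", ...
    sla_weight: float   # assumed: a larger weight means more important


def triage_order(alarms):
    """Return the critical alarms, most heavily weighted SLA first."""
    critical = [a for a in alarms if a.severity == "critical"]
    return sorted(critical, key=lambda a: a.sla_weight, reverse=True)
```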

It is also important to mention that all settings in TBSM are made through the web GUI, which is very user-friendly and intuitive.

 

Penalties (penalty due to SLA violation)

SLA penalty settings allow cost estimates for each hour in which a service of interest is unavailable. The calculation is based on a unit price per hour, counted from the moment the SLA violation happened for that service instance. Therefore, penalties are charged only when a cumulative duration SLA is set.
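The penalty arithmetic can be sketched as follows. The function name and parameters are hypothetical, and charging fractions of an hour proportionally is an assumption; the text only states that the price is per hour and counted from the moment of violation.

```python
def sla_penalty(cumulative_downtime_s: float,
                violation_threshold_s: float,
                price_per_hour: float) -> float:
    """Estimate the penalty for downtime accumulated past the violation threshold."""
    # Nothing is charged until the cumulative downtime crosses the threshold.
    over_s = max(0.0, cumulative_downtime_s - violation_threshold_s)
    # Assumption: partial hours are billed proportionally, not rounded up.
    return over_s / 3600 * price_per_hour
```

For example, with a 45-minute violation threshold and a unit price of 100 per hour, a service that has been down for 2 hours 45 minutes in total yields a penalty of 200.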

The first picture in this document shows the open SLA tab in the lower right corner. The color (yellow or red) immediately tells the user how seriously the SLA is violated. The tab also shows how long the incident has lasted, how much time is left before an SLA violation happens (a time with a "-" sign tells how long the SLA has already been violated), the date, and a Penalty column. (The alarms in this table also appear among the "regular" alarms in Netcool Webtop.)

 

SLA Charts

Right-clicking the desired service instance and selecting the appropriate options displays a graphical representation of its status. The SLA chart can be used for quick insight into the service state. Note that a cumulative duration SLA needs to be set for this kind of review of the SLA service status.

The chart is displayed in either 2D or 3D. It is accompanied by a few tables: one with the amount of penalties for SLA violations (in dollars), and another whose columns state the critical element that caused the SLA violation (i.e. the root cause of the status violation), the current state of the critical element (e.g. "still down"), the time when the situation was resolved (or "not resolved"), and the total duration of the incidents.

The chart covers a time interval defined by setting start and end dates of interest, for a daily or hourly review. There are also one-click options, Hourly, Daily and Monthly, for the time range shown in the picture.

For example, you can choose a day view broken down hour by hour, or a month view broken down day by day.

You can also switch from the daily to the monthly view at any time with one click.

 

Maintenance Window

There are times when a company performs maintenance on its equipment, and this time should not enter into the calculation of SLA violations. TBSM can set a time period during which reported incidents do not affect the SLA. For example, the SLA can be excluded from calculation every Monday from 2 to 4 am, when certain maintenance tasks are performed on equipment, or the SLA can be set to apply only from Monday to Friday.

While a service instance is under maintenance, its status on the GUI display is shown in blue (as opposed to the usual green, yellow (marginal) or red (bad)), with an icon indicating that the service instance is under maintenance.

As long as a service instance is blue (under maintenance), incoming events have no impact on its status.
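The effect of a maintenance window on SLA accounting can be sketched in a few lines. The helper names are hypothetical and the one-minute resolution is a simplification; the weekly window (Monday, 2 to 4 am) is the example from the text.

```python
from datetime import datetime, time, timedelta


def in_maintenance(ts: datetime,
                   weekday: int = 0,             # 0 = Monday
                   start: time = time(2, 0),
                   end: time = time(4, 0)) -> bool:
    """True if `ts` falls inside the weekly maintenance window."""
    return ts.weekday() == weekday and start <= ts.time() < end


def counted_downtime(outage_start: datetime,
                     outage_end: datetime,
                     step: timedelta = timedelta(minutes=1)) -> timedelta:
    """Downtime that counts toward the SLA, i.e. outside the maintenance window."""
    total = timedelta()
    t = outage_start
    while t < outage_end:
        if not in_maintenance(t):
            # The last slice may be shorter than one full step.
            total += min(step, outage_end - t)
        t += step
    return total
```

An outage from 01:30 to 04:30 on a Monday lasts three hours, but only the 30 minutes before and the 30 minutes after the window count toward the SLA.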

 

Service Heartbeat

Besides knowing whether a service instance is in a marginal or bad state, there is also the question of what happens if a service goes down without being able to send any alarm.

The absence of a red alarm is therefore not an absolutely sure sign that the service is really OK. Setting a service instance heartbeat requires the service to report its status regularly (e.g. every 2 minutes), even if the status is good. If no information arrives from the service, its status is displayed as Unknown, which should also alert the staff to check that service instance.
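The heartbeat rule amounts to a timeout check. A minimal sketch, assuming a hypothetical `heartbeat_status` helper and timestamps in seconds; the 2-minute interval is the example value from the text.

```python
from typing import Optional


def heartbeat_status(last_status: str,
                     last_report_ts: Optional[float],
                     now: float,
                     interval_s: float = 120.0) -> str:
    """Return the last reported status, or "unknown" if the heartbeat is overdue."""
    # No report at all, or the last report is older than one interval:
    # silence must not be mistaken for a healthy service.
    if last_report_ts is None or now - last_report_ts > interval_s:
        return "unknown"
    return last_status
```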

 

Custom Views

Each segment of the service viewer can be customized to display information in a way appropriate to a particular service instance. Many different views can be defined and made available with a mouse click in the drop-down list. The icon that represents a service instance can be fully customized to display chosen information related to the service. For some service instances the status (color) alone may be sufficient, while others require more data, including data related to the SLA of the service.

It is also possible to define filters so that, when a threshold is exceeded, a flashing visual effect appears to attract more attention.

It is also possible to define a condition under which a violated threshold causes certain service instance icons to appear on the GUI screen. If the status of a service instance is good and constant insight into it is not needed, it is convenient to keep that service instance off the GUI screen (very useful for large models with many service instances).

03_no2.jpg

Examples of service instance icons displaying different amounts of data

 

User Management

It is necessary to define access permissions for certain people or groups of people. Only certain employees should have full control over all aspects of TBSM, while others may only be allowed to view the customized displayed data. Access rights are controlled by creating groups with special privileges and then assigning TBSM users to those groups. Assigning roles makes it possible to control almost every case: for example, to allow a user to edit a template but not to create one, or only to see the service instances without being able to change them.

Regardless of previously set access rights, access rights can also be set directly on a particular template or service instance.

 

External data sources

IBM Tivoli Business Service Manager supports remote data sources. Data from external sources is typically used to set thresholds for the bad, marginal and good statuses, but it can also be used to fill numerical attributes from which the service status is determined (e.g. response time, number of open tickets, etc.). This data can be placed in a table in the TBSM GUI, on equal footing with regular events.

Data from an external source is fetched either once a day at a given time or at a polling frequency that needs to be set.
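Deriving a status from a numerical attribute in an external source can be sketched as below. Everything here is an assumption made for illustration: the `service_metrics` table, its columns and the thresholds are invented, and an in-memory SQLite database stands in for a real external database so that the example runs on its own.

```python
import sqlite3


def status_from_value(value: float, marginal_above: float, bad_above: float) -> str:
    """Map a numeric attribute (here: response time in ms) to a service status."""
    if value > bad_above:
        return "bad"
    if value > marginal_above:
        return "marginal"
    return "good"


def poll_statuses(conn: sqlite3.Connection,
                  marginal_ms: float = 500.0,
                  bad_ms: float = 2000.0) -> dict:
    """One polling cycle: read every metric row and derive a status from it."""
    rows = conn.execute("SELECT service, response_ms FROM service_metrics")
    return {service: status_from_value(ms, marginal_ms, bad_ms)
            for service, ms in rows}


# Stand-in data source with three hypothetical services.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE service_metrics (service TEXT PRIMARY KEY, response_ms REAL)")
conn.executemany("INSERT INTO service_metrics VALUES (?, ?)",
                 [("web", 120.0), ("mail", 750.0), ("voip", 3100.0)])
statuses = poll_statuses(conn)
```

In a real deployment `poll_statuses` would run either once a day or at the configured polling frequency.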

TBSM supports integration with the following external data sources:

  • DB2
  • Informix
  • MS_SQL
  • MySQL
  • Oracle
  • PostgreSQL
  • Sybase
© 2006. IBIS instruments. All rights reserved