Andy Reitz (blog)

RUG: SLM317: Adding Value with Automated Trouble Ticketing

The focus of this talk is on improving incident management, more for resource failures than end-user requests, and specifically on SIM. In the past, auto-generated tickets weren't correlated, and users submitted duplicate tickets for the same problem. Event Manager and Help Desk technology has advanced enough that it is worth another shot. Help Desks have more automation capabilities, and Event Managers are more dynamic.

The central issue is that alerts say what physical resource is broken, not what service is affected. As a result, it's not possible to automatically notify the affected users.

Solution: In the CS tradition, insert another layer between EM and Help Desk. This is SIM (Service Impact Manager). Event Management can reduce event flow (filtering, duplicate detection, enrichment, etc.). Correlation is not required by the SIM model. It takes work to define the service model: discovery can determine the infrastructure and some config/topology, but the actual user-perceived services have to be defined by hand. Master/child tickets can be created automatically. The list of affected services in a ticket can be dynamic (updated as additional services go down or get fixed).
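To make the service model idea concrete for myself, here is a minimal Python sketch of how a SIM layer might map failed resources to affected services and keep a master ticket's service list dynamic. This is not BMC's implementation; the class names (`ServiceModel`, `MasterTicket`), the resource names, and the rule that any failed dependency impacts everything above it (no redundancy handling) are all assumptions for illustration.

```python
from collections import defaultdict, deque


class ServiceModel:
    """Hypothetical service model: a dependency graph plus hand-defined services."""

    def __init__(self):
        # resource -> set of components that depend on it
        self._dependents = defaultdict(set)
        # names that users actually perceive as services (defined by hand)
        self._services = set()

    def add_dependency(self, component, depends_on):
        self._dependents[depends_on].add(component)

    def mark_service(self, name):
        self._services.add(name)

    def impacted_services(self, failed_resources):
        """Walk upward from the failed resources and collect every service reached.

        Assumption: any failed dependency impacts its dependents (no redundancy).
        """
        seen = set(failed_resources)
        queue = deque(seen)
        while queue:
            node = queue.popleft()
            for dependent in self._dependents.get(node, ()):
                if dependent not in seen:
                    seen.add(dependent)
                    queue.append(dependent)
        return sorted(seen & self._services)


class MasterTicket:
    """Hypothetical master ticket whose affected-service list is recomputed on demand."""

    def __init__(self, model):
        self.model = model
        self.failed = set()

    def resource_down(self, resource):
        self.failed.add(resource)

    def resource_up(self, resource):
        self.failed.discard(resource)

    @property
    def affected_services(self):
        return self.model.impacted_services(self.failed)


# Made-up topology: two services sharing one switch.
model = ServiceModel()
model.mark_service("email")
model.mark_service("payroll")
model.add_dependency("email", "mailserver01")
model.add_dependency("payroll", "db01")
model.add_dependency("mailserver01", "switch-a")
model.add_dependency("db01", "switch-a")

ticket = MasterTicket(model)
ticket.resource_down("db01")
print(ticket.affected_services)   # only payroll is affected
ticket.resource_down("switch-a")
print(ticket.affected_services)   # now email is affected too
```

The point of the sketch is that the ticket never stores the service list itself. It only tracks which resources are down, so the list of affected services shrinks or grows as resources recover or fail.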

IDEA: event suppression? Change tickets that you cut in HD could have CI (configuration item) information in them, and that could then flow into EM to automatically suppress alerts for those CIs during the change.
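A rough Python sketch of how that suppression check could look, assuming the Help Desk can export each change ticket as a CI name plus a start and end time. The `ChangeWindow` type, the `is_suppressed` function, and the sample ticket are all hypothetical, not part of any real EM or HD product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChangeWindow:
    """Hypothetical record exported from a Help Desk change ticket."""
    ticket_id: str
    ci: str            # configuration item the change applies to
    start: datetime
    end: datetime


def is_suppressed(alert_ci, alert_time, windows):
    """Return True if the alert's CI is under an active change window."""
    return any(
        w.ci == alert_ci and w.start <= alert_time <= w.end
        for w in windows
    )


# Made-up change ticket: two-hour maintenance on db01.
windows = [
    ChangeWindow("CHG-0001", "db01",
                 datetime(2024, 1, 10, 22, 0),
                 datetime(2024, 1, 11, 0, 0)),
]

# Alert on db01 during the change: suppress it.
print(is_suppressed("db01", datetime(2024, 1, 10, 23, 0), windows))
# Alert on db01 after the change has ended: let it through.
print(is_suppressed("db01", datetime(2024, 1, 11, 1, 0), windows))
```

A real version would probably also need to suppress alerts for things that depend on the CI under change, which is where the service model would come back in.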

My summary: The idea of a SIM seems like a reasonable one. I didn't get a lot of details about BMC's product, so I can't say if that is something that I would want to see in our environment or not. But I think that there is a lot of potential in the EM/SIM/HD combo for doing automation (which is my bread and butter at EDS).