Managed Availability

With the advent of Exchange 2013, we see a move away from the traditional monitoring techniques that were used with previous versions.  This change in emphasis has been driven by Microsoft’s experiences with Office 365, where user experience is far more important than individual performance metrics.

Managed Availability runs on all Exchange 2013 servers, where you see it represented as the Health Manager Service process (MSExchangeHMHost.exe) and the Health Manager Worker process (MSExchangeHMWorker.exe).

In order to improve the user experience, Managed Availability uses three components which work together:  Probes, Monitors and Responders.

Probes

Probes are probably the simplest component within Managed Availability.  They take measurements at regular intervals.  To measure database health, for example, they use health mailboxes held in the databases.  Probes:

  • Can generate artificial transactions
  • Can depend on Performance Monitor Counters
  • Can measure the health of a protocol (e.g. OWA or ActiveSync)
  • Can measure end-to-end user experience
  • Run at varying intervals (20 seconds to 20 minutes)

Monitors

Monitors assess the data from the probes and compare it against normal levels of activity.  If a difference is detected, the monitor determines whether it warrants action.  If it does, the monitor invokes a responder.  Monitors:

  • Recognise patterns and problems
  • Invoke a responder if a problem is detected

Responders

Responders are equipped with a range of actions and escalation paths that can be taken to resolve problems.  Actions include:

  • Restarting the underlying service
  • Recycling an application pool within IIS

If the initial action is unsuccessful, the monitor escalates its action.  The ultimate escalation is to a human being through a SCOM alert.  If you don’t use SCOM, you may need to build your own monitoring tool using the Managed Availability cmdlets.

Managed Availability does not attempt to determine the root cause of a failure.  It only detects problems and takes action to restore the affected component to a working state.

Although it is an automated process and does not usually require configuration, you can examine Managed Availability by running Exchange cmdlets and by looking at the crimson channels in the Applications and Services Logs area of Event Viewer.

To examine recent Managed Availability recovery actions, you can run the following command in PowerShell:

Get-WinEvent -LogName Microsoft-Exchange-ManagedAvailability/RecoveryActionResults | Format-List

The following IDs are used in this log:

  • ID 500 – Recovery action started
  • ID 501 – Recovery action completed successfully
  • ID 502 – Recovery action failed
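
These IDs lend themselves to a quick summary of how recovery actions are going.  The following is a minimal sketch and not part of Exchange: Get-RecoveryActionSummary and $RecoveryOutcome are hypothetical names introduced here for illustration, and the helper assumes only that each input object has an Id property, as the events returned by Get-WinEvent do.

```powershell
# Map the event IDs used in the RecoveryActionResults log to a readable outcome.
$RecoveryOutcome = @{ 500 = 'Started'; 501 = 'Succeeded'; 502 = 'Failed' }

# Hypothetical helper (not an Exchange cmdlet): counts piped events per outcome.
function Get-RecoveryActionSummary {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $InputObject
    )
    begin {
        $counts = @{}
    }
    process {
        # Any ID other than 500, 501 or 502 is counted as 'Other'.
        $outcome = $RecoveryOutcome[[int]$InputObject.Id]
        if (-not $outcome) { $outcome = 'Other' }
        $counts[$outcome] = 1 + [int]$counts[$outcome]
    }
    end {
        # Emit a single hashtable: outcome name -> number of events.
        $counts
    }
}
```

On an Exchange 2013 server, the log could be piped straight into the helper:

Get-WinEvent -LogName Microsoft-Exchange-ManagedAvailability/RecoveryActionResults | Get-RecoveryActionSummary

To list only the failures, the standard -FilterHashtable parameter of Get-WinEvent can select ID 502 directly:

Get-WinEvent -FilterHashtable @{LogName='Microsoft-Exchange-ManagedAvailability/RecoveryActionResults'; Id=502} | Format-List TimeCreated, Message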

To make things more manageable, Exchange uses Monitoring Identities, or health sets, to group the individual probes, monitors and responders together.  The health sets can be listed using the following command:

(Get-WinEvent -LogName Microsoft-Exchange-ActiveMonitoring/MonitorDefinition |
    ForEach-Object { [xml]$_.ToXml() }).event.userdata.eventxml |
    Select-Object -ExpandProperty ServiceName -Unique

This will give values such as:

  • ActiveSync
  • OWA
  • OWA.Protocol
  • OWA.Proxy
  • MailboxTransport

An overview of the current server health can be seen by running the following, where <ServerName> is the Exchange server to query:

Get-ServerHealth -Identity <ServerName>

This can be filtered to show relevant items only with:

Get-ServerHealth -Identity <ServerName> | where {$_.CurrentHealthSetState -ne 'NotApplicable'}

Or to see the information for a particular component:

Get-ServerHealth -Identity <ServerName> | where {$_.HealthSetName -eq '<componentID>'}

To see the probes, monitors and responders for a health set on a given server, you can run:

Get-MonitoringItemIdentity -Identity <componentID> -Server <ServerName>
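
If you don’t use SCOM, these cmdlets can be the starting point for a home-grown check.  The sketch below is one possible shape for it, under stated assumptions: Select-UnhealthyEntry is a hypothetical helper name, and it assumes that each health entry has an AlertValue property whose value is 'Healthy' when nothing is wrong.  What you do with the remaining entries (email, ticket, log file) is left open.

```powershell
# Hypothetical helper (not an Exchange cmdlet): passes through only those
# health entries whose AlertValue is something other than 'Healthy'.
function Select-UnhealthyEntry {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $InputObject
    )
    process {
        # Compare as a string so that both live and deserialized objects work.
        if ("$($InputObject.AlertValue)" -ne 'Healthy') {
            $InputObject
        }
    }
}
```

A scheduled task on the server could then run something like:

Get-ServerHealth -Identity <ServerName> | Select-UnhealthyEntry | Format-Table Name, HealthSetName, AlertValue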