Most professional software teams use monitoring and alerting to keep their servers and applications healthy. At Geronimo we are no exception. We aggregate events and data from all of our data centres into dashboards that show server health at a glance. We also use these metrics to generate alerts as soon as something unusual happens, so we can resolve anomalies quickly, before they affect our services.
To guarantee delivery of vital events, such as lead forms, we queue the messages so that they can be retried. If a message cannot be delivered, it remains in the queue and is re-sent, which prevents customer requests from being “lost in the system”. These queues are part of our monitoring strategy: both queue size and queue age are used to raise alarms if messages get delayed.
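The retry-and-monitor pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual implementation: the names `RetryQueue`, `QueuedMessage`, `process` and `alarms`, the in-memory storage, and the threshold values are all hypothetical, and "queue age" is assumed here to mean the age of the oldest waiting message.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class QueuedMessage:
    payload: str
    enqueued_at: float  # seconds since the epoch
    attempts: int = 0


class RetryQueue:
    """In-memory stand-in for a durable message queue (hypothetical)."""

    def __init__(self, max_size=100, max_age_seconds=300.0):
        self.messages = deque()
        self.max_size = max_size                # assumed alarm threshold
        self.max_age_seconds = max_age_seconds  # assumed alarm threshold

    def enqueue(self, payload, now=None):
        now = time.time() if now is None else now
        self.messages.append(QueuedMessage(payload, now))

    def process(self, send):
        """Try every queued message once; failed ones stay for the next run."""
        remaining = deque()
        for message in self.messages:
            message.attempts += 1
            try:
                delivered = send(message.payload)
            except Exception:
                delivered = False
            if not delivered:
                remaining.append(message)  # order is kept, oldest first
        self.messages = remaining

    def alarms(self, now=None):
        """Return the names of the alarms that should currently fire."""
        now = time.time() if now is None else now
        raised = []
        if len(self.messages) > self.max_size:
            raised.append("queue_size")
        if self.messages and now - self.messages[0].enqueued_at > self.max_age_seconds:
            raised.append("queue_age")
        return raised
```

A scheduler would call `process` repeatedly with a real delivery function and pass the result of `alarms` to whatever alerting system is in use. A message that keeps failing is never dropped; it simply grows older until the age alarm draws attention to it.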
But monitoring is not limited to purely technical numbers. We also use custom metrics to collect important business statistics, which let us track the impact of releases, new features, and content adjustments.
For example, if contact form submissions drop after a release, a change in that release may be harming this key website feature. The feature will then need to be revisited, or even returned to its prior state, while the drop is analysed. Dramatic changes can also raise alarms so that they are not missed.
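One simple way to express such a check is to compare average submissions before and after a release. The Python sketch below is illustrative only: the function names, the use of daily counts, and the 20% and 50% thresholds are assumptions made for the example, not values taken from a real system.

```python
from statistics import mean


def submission_drop(before, after):
    """Fractional drop in average submissions (0.25 means 25% lower).

    Both arguments are non-empty lists of counts, for example
    submissions per day before and after a release.
    """
    baseline = mean(before)
    if baseline == 0:
        return 0.0  # nothing to drop from
    # Increases are reported as 0.0, since only drops matter here.
    return max(0.0, (baseline - mean(after)) / baseline)


def classify_change(before, after, review_threshold=0.2, alarm_threshold=0.5):
    """Map the drop to an action; both thresholds are assumed values."""
    drop = submission_drop(before, after)
    if drop >= alarm_threshold:
        return "alarm"   # dramatic change: notify someone immediately
    if drop >= review_threshold:
        return "review"  # noticeable drop: revisit the release
    return "ok"
```

With a baseline of about 100 submissions per day, a fall to about 70 would be classified as "review" and a fall to about 40 as "alarm". In practice the thresholds would need tuning against normal day-to-day variation to avoid false alarms.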
Business metrics can appear on the same dashboards as technical data and are just as important when it comes to keeping the business healthy.