Uptime is extremely important to every website. Tracking Uptime data gives you more awareness of, and control over, your site. No company is immune to downtime, however large it is. Even now we see major organisations such as Amazon, Twitch and GOV.UK hit with significant outages.
Before going into the data gathered by the monitors, we will first look at monitor set-up and our recommendations: what to track and how frequently we suggest the monitors should run.
What should I be tracking using Uptime monitors?
In your monitoring suite, you can never have too much data or too many pages to track. Essentially if a page is important to you, monitor it.
As a minimum, we recommend monitoring your homepage as well as service-impacting pages or key products.
For example, if you are a mid-level e-commerce platform selling coats, here are the pages you might initially monitor:
- Category Pages
- Most Popular Coat
- Second Most Popular Coat
- Delivery Page
- Payment Page
These pages are common to almost any e-commerce business and would impact revenue if they were to fail. Credits are allocated so you can always add or remove monitors if you change your mind in the future.
How frequently should I be tracking Uptime?
In your RapidSpike account, you can track Uptime at frequencies from once a minute to once an hour. The more frequent the checks, the faster you will be notified of a failure. We suggest setting the frequency to 1 minute for more accurate data and faster alerting. If you do experience downtime, it is a race against time to get ahead of the situation. We have seen a few #InternetShutdown tags trending recently, and it is clear which companies care about their monitoring.
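To put rough numbers on the effect of check frequency, here is a minimal sketch. It assumes an outage is equally likely to start at any point between two checks and ignores the time taken to run the check and deliver the alert. The function name is illustrative and not part of any RapidSpike API.

```python
def detection_delay_seconds(interval_minutes):
    """Return (average, worst_case) seconds before an outage is first observed.

    Assumes the outage can begin at any moment between two checks, so on
    average half an interval passes before the next check sees it.
    """
    interval_seconds = interval_minutes * 60
    return interval_seconds / 2, interval_seconds


for minutes in (1, 5, 60):
    average, worst = detection_delay_seconds(minutes)
    print(f"{minutes:>2} min checks: average {average:.0f}s, worst case {worst:.0f}s")
```

Under these assumptions, an hourly check can leave an outage unnoticed for up to an hour, while a one-minute check caps that at a minute.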
Finally, before we look at the data that your monitors are collecting, it’s worth noting that you can also create custom Uptime monitors. Custom Uptime monitors give you the option to check for status codes other than 200. You can also alter the Method and add a Check Content, which looks for a specific string, such as a word or phrase, within the response.
For example, you could check for a redirect, or you could even be alerted when a 404 is resolved, at which point the monitor can be deleted.
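The pass/fail logic of a custom check can be sketched as follows. This is an illustration of the idea only: the class and field names are hypothetical and do not reflect how RapidSpike stores or evaluates monitors, and the sketch does not send any HTTP request itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CustomCheck:
    # Hypothetical field names, chosen to mirror the options described above.
    method: str = "GET"                   # recorded for whoever sends the request
    expected_status: int = 200
    check_content: Optional[str] = None   # string that must appear in the response


def evaluate(check, status, body):
    """Pass only if the status matches and the content string, if set, is present."""
    if status != check.expected_status:
        return False
    if check.check_content is not None and check.check_content not in body:
        return False
    return True


# A monitor that expects a redirect rather than a 200.
redirect_check = CustomCheck(method="HEAD", expected_status=301)
print(evaluate(redirect_check, 301, ""))   # True
print(evaluate(redirect_check, 200, ""))   # False

# A monitor that expects a 200 plus a phrase in the response body.
content_check = CustomCheck(check_content="Add to basket")
print(evaluate(content_check, 200, "<button>Add to basket</button>"))  # True
```

A check that expects a 404 works the same way: it passes while the page is still missing and starts failing once the page is restored, which is the signal that the monitor can be retired.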
Should you need any assistance in creating an Uptime monitor, check out our knowledge base content on ‘Uptime’.
How can I get the most out of my Uptime data?
Now that you have all of your Uptime monitoring set up, you will start to gather data and statistics about your reliability. The longer a monitor runs, the more useful its data becomes, fuelling your reporting and your database of statistics.
Below is our Uptime dashboard for a site, looking at a 30-day time period.
To explain each section of this dashboard we will start at the top and work downwards.
High Level Stats
At the top, your high-level stats give you a snapshot of whether your Uptime is currently Passing or Failing. The percentage reflects your Uptime over your selected date range. If the status is Failing, it means that we are currently flagging your site as down or as having issues. Common reasons for this are:
- Your site is down
- Your site is returning a different status code than expected
- You are blocking our monitor
You can gather more information about the latest test further down in the latest results.
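As a rough worked example of how an Uptime percentage relates to downtime, the sketch below assumes the percentage is simply the share of the selected period during which the site was up. The dashboard’s exact calculation is not described in this article, so the function names and formula here are illustrative.

```python
def uptime_percentage(period_minutes, downtime_minutes):
    """Share of the period the site was up, as a percentage."""
    return 100 * (period_minutes - downtime_minutes) / period_minutes


def allowed_downtime_minutes(period_minutes, target_percentage):
    """Downtime budget that still meets a target Uptime percentage."""
    return period_minutes * (1 - target_percentage / 100)


period = 30 * 24 * 60  # a 30-day period is 43,200 minutes
print(f"{uptime_percentage(period, 60):.2f}% uptime with one hour of downtime")
print(f"{allowed_downtime_minutes(period, 99.9):.1f} minutes allowed at 99.9%")
```

Seen this way, a figure that looks high can still hide a meaningful outage: over 30 days, 99.9% leaves room for roughly 43 minutes of downtime.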
Next are statistics for average and latest response times, as well as any status changes flagged.
You will see in the screenshot below that the response time is showing as 180.8 ms. Google recommends aiming for a response time under 200 ms; 100 ms is optimal and anything over 500 ms is a potential issue.
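The thresholds above can be expressed as a small helper. The cut-offs come from the guidance in this article; the band labels and the function name are our own illustrative choices, as is the decision to treat 200 to 500 ms as "above target".

```python
def classify_response_time(ms):
    """Bucket a response time in milliseconds using the thresholds above."""
    if ms <= 100:
        return "optimal"
    if ms < 200:
        return "within target"
    if ms <= 500:
        return "above target"
    return "potential issue"


for sample in (85, 180.8, 350, 720):
    print(f"{sample} ms: {classify_response_time(sample)}")
```

With these bands, the 180.8 ms reading from the example falls within the recommended target.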
The response graph provides a visual representation of your Uptime over the time period you are currently checking. If you want to investigate further you can hover over the dots at different intervals to see the response time. This is great for quickly viewing spikes or intermittent issues rather than going through each individual test.
The latest results section will automatically show you the last few test results. You can increase the date range to go further back in time and compare more data. Unlike the other sections, this panel provides you with the ‘Test Location’. If there is an issue with one particular location, this is where you would see it.
Timeline / Status Changes
The Timeline and Status Changes sections are specifically for tracking when downtime occurred. For each event, we also track how long it persisted and when the website became accessible again.
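To illustrate how status changes and downtime durations can be derived from a series of check results, here is a minimal sketch. The data format, a list of (timestamp, passed) pairs, and the sample values are assumptions made for the example and are not RapidSpike’s export format.

```python
from datetime import datetime, timedelta


def downtime_events(results):
    """Turn ordered (timestamp, passed) check results into downtime events.

    Returns a list of (went_down_at, recovered_at) pairs. recovered_at is
    None when the most recent result is still failing.
    """
    events = []
    down_since = None
    for timestamp, passed in results:
        if not passed and down_since is None:
            down_since = timestamp                   # status change: up to down
        elif passed and down_since is not None:
            events.append((down_since, timestamp))   # status change: down to up
            down_since = None
    if down_since is not None:
        events.append((down_since, None))            # outage still in progress
    return events


# Made-up sample data: one check per minute, three consecutive failures.
start = datetime(2024, 1, 1, 12, 0)
outcomes = [True, False, False, False, True, True]
results = [(start + timedelta(minutes=i), ok) for i, ok in enumerate(outcomes)]

for went_down, recovered in downtime_events(results):
    if recovered is None:
        print(f"Down since {went_down:%H:%M}, not yet recovered")
    else:
        print(f"Down {went_down:%H:%M} to {recovered:%H:%M}, lasted {recovered - went_down}")
```

Because the start and end of each event are only known to the nearest check, the measured duration is accurate to within one check interval, which is another reason to prefer frequent checks.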
Now that you are familiar with the Uptime dashboard and understand the data presented, it’s time to look at your alerting. This section covers only Uptime alerting; if you are looking for a more comprehensive guide to using our alerting suite, check out these articles.
For the purposes of this article, we will assume the web team has three separate alert delivery groups: an immediate response team to investigate an event as soon as it happens, a secondary response team if the issue persists, and a critical alert group that alerts the wider team if the issue has exceeded a reasonable time.
Uptime is slightly different from other alerts, as the frequency is applied in the ‘Alert Delivery Group’ panel. You can change this while setting up or editing the group. Alerts are highly personal and depend on the structure of your team; however, as standard we recommend setting up three alerts along these guidelines:
- 1 min alert – Immediate Response Team
- 5 min alert – Secondary Response Team
- 10 min alert – Critical Alert
This example structure means that a team is alerted as soon as downtime occurs. The alert is then escalated if the issue isn’t resolved or if the first team is unavailable. Uptime will be important to multiple teams so it is worth collaborating with colleagues to ensure everything is set up as effectively as possible.
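The escalation structure above can be modelled in a few lines. This is only a sketch of the logic: the thresholds and group names come from the example list, while the data structure and function name are hypothetical and unrelated to how alert delivery groups are configured in the product.

```python
# (minutes of continuous downtime, group to notify), in escalation order.
ESCALATION = [
    (1, "Immediate Response Team"),
    (5, "Secondary Response Team"),
    (10, "Critical Alert"),
]


def groups_to_alert(minutes_down):
    """Return every group whose threshold has been reached so far."""
    return [group for threshold, group in ESCALATION if minutes_down >= threshold]


for minutes in (0.5, 1, 6, 12):
    print(f"{minutes} min down: {groups_to_alert(minutes) or 'no alert yet'}")
```

Keeping the thresholds in one ordered list makes it easy to review the escalation path with colleagues and adjust the timings to suit your team.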
At this stage you will be monitoring all of your key pages and have alerting in place to notify the correct teams. The final thing you need to set up is scheduled reporting. This will condense the data into specific date ranges and make communicating the data easier.
You can keep track of how your Uptime is performing, as all of our reports can be set up as scheduled reports (Weekly, Monthly or Quarterly). For Uptime monitoring we have four types of reports available.
Platform League Table – This report allows you to select a server/device and compare the average response times for a given period of time.
Uptime Overview – This report provides information on how a website/server has performed in terms of Average Response, Uptime, Status Changes/Events and Downtime. You are also able to compare two websites/servers in the same report for a given time period.
24h Uptime Data – This report provides you with the previous 24 hours of Uptime data for one of your servers. You can report this as Results or Events and even change the monitor type to Network, Services or Applications.
Uptime Events – This report outlines any events that occurred during a given time period (Uptime/Downtime).