We have successfully completed our scheduled maintenance. We allocated two hours for it, but the migration took just over three. During the update we had the following successes:

  1. Upgraded our Postgres database by four major versions
  2. Added a set of other security improvements
  3. Achieved zero downtime on sites

How we communicated the downtime

To communicate in our app, we use Intercom, which lets us send a message to all active users. Logged-in users were sent a message about the maintenance, and then another once it was completed.

On our marketing sites we added a banner that linked to our blog. To do this we updated a data file in our Jekyll theme called _data/banner_notification.yml. The file looks like this:

enabled: true
text: Scheduled Maintenance today at 5:00pm NZDT
url: https://cloudcannon.com/operations/2018/11/13/scheduled-maintenance/

This file tells our default layout to add a clickable banner on the top of all pages:

{% if site.data.banner_notification.enabled %}
  <div class="banner-notification">
    <p>
      <a href="{{ site.data.banner_notification.url }}">
        {{ site.data.banner_notification.text }}
      </a>
    </p>
  </div>
{% endif %}

Our sites are built using the CloudCannon suite. To build our sites locally we run gulp dev. Using iTerm2, we build all of our sites simultaneously, and the suite even watches the local Jekyll theme repository.

Once built, we can see that the banner is live on all sites.

To get this live, we push the updated Gemfile.lock to master and publish to production via CloudCannon.

Improvements for next time

Here are the updates we are going to make the next time we schedule maintenance:

  • Inform earlier
  • Update our status page and integrate this with our app
  • Integrate our change log in the app

Why three hours not two?

We completed the database upgrades with 15 minutes to spare. However, when we turned the app back on and pointed it at the new database, the performance was abysmal. We needed to run the SQL command VACUUM ANALYZE on our database. This solved all of our issues but took some time. It was a nasty surprise at the end of an otherwise fairly seamless migration.
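For readers planning a similar upgrade, here is a minimal sketch of that step. It assumes a psql session connected to the upgraded database. The VERBOSE option, the standalone ANALYZE, and the statistics query are illustrative additions, not necessarily the exact commands run during this maintenance. A likely reason the step is needed (an assumption, not something measured here) is that the query planner's statistics are often not carried over by a major-version upgrade, so the planner chooses poor query plans until they are rebuilt.

```sql
-- Reclaim dead rows and rebuild planner statistics for every table
-- in the current database. VERBOSE reports progress per table.
VACUUM (VERBOSE, ANALYZE);

-- If only fresh planner statistics are needed, ANALYZE on its own
-- does less work and usually finishes sooner.
ANALYZE VERBOSE;

-- Check when each table's statistics were last refreshed.
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY relname;
```

Running this against a copy of production data before the maintenance window gives a rough estimate of how long it takes, so the time can be included in the schedule.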