Deployment & Monitoring

The Final Post in our 4-Part Series on our Managed Services Platform Upgrades

Previously, I covered two parts of our new platform – provisioning and configuration management – and in this post I’ll cover deployment and monitoring.

Deployment is an interesting topic because, like the vi/emacs debate for technical people, the DH/no-DH debate for baseball people, or anything else that sparks passion, the choice of deployment tools can lead to some very intense conversations. For us, that means supporting two different deployment pipelines and, sometimes, two different tools.

Why two tools, though? Simple: different functions. The tool we use internally (and guide our clients towards if they have no current deployment pipeline) is Jenkins. With six years of independent history behind it, and twelve years in total if you count its time as Hudson before the 2011 fork, it is a cornerstone of continuous integration in the open source world and is broadly used in all forms of software development.

This choice was simple: we wanted a tool that was well understood and well supported. Ease of use was on our list, but it was not the primary concern. The second tool, however, is the current industry powerhouse: Atlassian’s Bamboo. We support it solely because many people are already invested in the Atlassian suite of products, and there was enough customer demand to use it for code deployment.

Between these two products, we are able to manage the core of our deployment processes and integrate fully with the rest of our platform, including our monitoring suite.

The new monitoring suite is far more interesting than the actual deployment pipelines. It uses Sensu for the monitoring itself and reaches out to OpsGenie for phone/text/push alerting to our on-call staff, to our Slack instance for chat-ops notifications, and to our service desk tool, FreshService, for incident management. OpsGenie will also soon allow us to manage incoming client notifications and connect clients directly to our on-call engineers for high priority incidents.

Finally, we now have the ability to monitor differential metrics. Differentials are important because they allow us to alert on trends. Prior to our new monitoring system, we were stuck with static metrics. Those are better than nothing, but not nearly as useful as being able to notice that your memory usage has crept up 30% over the past 10 minutes, or that your system load is increasing but hasn’t reached a critical threshold yet.
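
To make the idea of a differential check concrete, here is a minimal Python sketch. It is not how Sensu or our platform implements this; the class name, parameters, and defaults are all hypothetical and chosen only to mirror the "30% over 10 minutes" example. It compares each new sample against the oldest sample still inside the window and reports the rise when it crosses the limit.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class DifferentialCheck:
    """Illustrative only: flag a metric that rises too fast within a time window.

    `window_s` and `max_rise_pct` are hypothetical parameters, not Sensu settings.
    """

    def __init__(self, window_s: float = 600.0, max_rise_pct: float = 30.0) -> None:
        self.window_s = window_s
        self.max_rise_pct = max_rise_pct
        self.samples: Deque[Tuple[float, float]] = deque()  # (timestamp, value)

    def observe(self, timestamp: float, value: float) -> Optional[float]:
        """Record a sample; return the percent rise if it breaches the limit, else None."""
        self.samples.append((timestamp, value))
        # Drop samples that have aged out of the window.
        while timestamp - self.samples[0][0] > self.window_s:
            self.samples.popleft()
        baseline = self.samples[0][1]  # oldest value still inside the window
        if baseline <= 0:
            return None  # percent change is undefined without a positive baseline
        rise_pct = (value - baseline) * 100.0 / baseline
        return rise_pct if rise_pct >= self.max_rise_pct else None


if __name__ == "__main__":
    check = DifferentialCheck()
    # Memory usage (percent) sampled every five minutes.
    for ts, mem in [(0, 50.0), (300, 58.0), (600, 66.0)]:
        breach = check.observe(ts, mem)
        if breach is not None:
            print(f"t={ts}s: memory up {breach:.0f}% within the window")
```

In this example memory usage never goes above 66%, so a static threshold set at, say, 90% would stay silent, while the differential check fires on the third sample because usage has climbed 32% in ten minutes.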

So how does all of this work together? When Jenkins deploys our infrastructure using Terraform, the new infrastructure then bootstraps into Chef, which completes the configuration process. At the same time, the appropriate checks are created and the instances are registered with Sensu, allowing us to begin monitoring from the moment an environment is initialized, with absolutely zero human touch required.

Fewer human touches mean fewer places where things can get missed or misconfigured, and fewer misses and misconfigurations are better for everyone.

All of this wouldn’t work very well without some technical glue that we’ve developed in house. Platforms are tricky things, and the way the pieces interconnect is as important as the pieces themselves, but that glue is the important part I can’t tell you about!

That concludes our new platform overview. I hope to find time over the next few months to do technical deep dives into various aspects of the platform, along with some of our senior engineers and architects from both our managed services and core engineering teams. We’re interested to hear what you want to learn more about. Send us a question and we’ll try to address it in our next posts.