Continuous Delivery: From Bot to Drone (part 1)

It’s always nice to have someone around to do things for you. But when it comes to repetitive and delicate tasks, humans are error prone. At Stylight, we handle these kinds of tasks in an automated, repeatable way.

And that’s how our Slack Bot was born. Originally forked from SlackHQ’s Rtmbot, it’s a simple automation tool that performs delicate, repetitive tasks precisely. It gives us confidence that things will go wrong less often (bots can be as crazy as humans, after all).

We’ve hosted this bot on a simple EC2 instance. Over time its successes have made it more sophisticated, complex and business critical. What happens if the instance it’s running on disappears overnight?

The classic chicken and egg problem: automation code which needs to be automated! Ultimately, we’d like to maintain and extend the codebase in a more automated way. Tests become more crucial to ensure we don’t introduce regressions which break “must have” functionality. In short, we need to make the bot self-aware and failure resistant.

One of my first tasks after joining Stylight was to write some new tasks for the bot. But I saw an important opportunity to refactor the code, create regression tests and automate its deployment. Welcome to the world of continuous delivery!

Drones are free to move and are not attached to a specific body. After these modifications our bot can fly freely across our EC2 fleet and is not attached to any specific EC2 instance.

Diagram

Action plan

We need an action plan to tackle this. To make things more structured and stable, I split the process into five steps and kept them as isolated as possible.

1. Regression tests (unit tests)
2. Building manageable artifacts (Docker + versioning)
3. Life cycle management (bot downtime sucks)
4. Configuration management (keep sensitive AWS keys out of GitHub!)
5. Deployment (Continuous Delivery)

I’m using some very cool technologies here so watch out 😀

Here I’ll cover the first two steps; the rest will be covered in the second part.

1. Regression tests:

Good test coverage is the foundation of continuous delivery because we need assurance that our code actually works before deploying it live.

Testing against 3rd party web services is always a pain. It’s slow, error prone and complicated. As our Slack bot mostly interacts with AWS, we needed a proper way to mock out AWS services.

There is an interesting library called Moto which provides mock calls to AWS. Leveraging Moto allowed us to easily test our code without actually hitting AWS services.
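To make the idea concrete, here is a minimal sketch of a test that never touches AWS. Moto does the faking for you by intercepting boto3 calls; to keep this example free of third-party packages it swaps in a hand-rolled stub from Python's standard unittest.mock instead, so treat it as an illustration of the principle rather than of Moto's API. The function running_instance_ids, the helper make_fake_ec2 and the instance IDs are hypothetical and not taken from our bot.

```python
from unittest.mock import Mock


def running_instance_ids(ec2_client):
    """Return the IDs of running instances as reported by ec2_client.

    ec2_client is expected to behave like a boto3 EC2 client. Pagination is
    ignored here to keep the sketch short.
    """
    response = ec2_client.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    return [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]


def make_fake_ec2(instance_ids):
    """Hypothetical helper: a stub that answers describe_instances offline."""
    fake = Mock()
    fake.describe_instances.return_value = {
        "Reservations": [
            {"Instances": [{"InstanceId": i} for i in instance_ids]}
        ]
    }
    return fake


def test_running_instance_ids():
    fake = make_fake_ec2(["i-0abc", "i-0def"])
    assert running_instance_ids(fake) == ["i-0abc", "i-0def"]
    # The code under test should ask AWS exactly once.
    fake.describe_instances.assert_called_once()
```

Passing the client in as an argument is the design choice that makes this cheap: production code hands over a real boto3 client, while tests hand over a fake.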

I chose one of our core modules to write unit tests against the most critical paths. Refactoring was much easier since I could run tests to ensure everything still worked as expected. If you’re refactoring unfamiliar code, I strongly recommend TDD – your colleagues will thank you for not breaking their code!

2. Building manageable artifacts

Code usually isn’t very helpful until you build it. Some languages require both compile time and dependency management but with interpreted languages we only focus on dependency management. No matter what the build process requires, the resulting artifact should be safely stored and correctly versioned:

Use the exact same artifact that was built and successfully tested in other environments. Don’t rebuild and push an unknown artifact to production!
Versioning artifacts enables rolling back the deployment if something goes wrong. Don’t rebuild a previous version of the source code for the rollback (see the previous point).
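The two rules above can be sketched in a few lines. This is only an illustration under assumptions: the tag format (version, build number, short commit hash) and the names image_tag and ReleaseHistory are made up for this example and are not the scheme our bot uses. The point is that a rollback selects a tag that was already built and tested; it never triggers a new build.

```python
def image_tag(version, build_number, git_sha):
    """Build a unique, traceable tag such as '1.4.0-b57-9fceb02' (assumed format)."""
    return f"{version}-b{build_number}-{git_sha[:7]}"


class ReleaseHistory:
    """Hypothetical record of the tags deployed so far, oldest first."""

    def __init__(self):
        self._deployed = []

    def record(self, tag):
        """Remember that this already-built tag went live."""
        self._deployed.append(tag)

    @property
    def current(self):
        return self._deployed[-1] if self._deployed else None

    def rollback(self):
        """Return the previously deployed tag instead of rebuilding anything."""
        if len(self._deployed) < 2:
            raise LookupError("no earlier release to roll back to")
        self._deployed.pop()
        return self._deployed[-1]
```

For example, after recording two releases, rollback() hands back the first tag, which can be pulled from the registry exactly as it was tested.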

Docker to survive

If you use languages like Go or Java, the binary build result might be an acceptable artifact. But interpreted languages like Python or Ruby need all of their package dependencies bundled together with the code. In days of yore, we wrapped them up in TAR files or DEB packages.

Today, Docker containers are a very attractive alternative and, if done properly, they provide a nice way to lock down our deployment environment.

Our bot needs some credentials to manage AWS resources and Slack integration. I used environment variables for some of these, but I’ll explain more under Life cycle management in the second part.
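As a rough sketch of the environment variable approach: the code reads everything it needs at startup and fails fast if something is missing, so a misconfigured container dies immediately instead of halfway through a task. AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the standard names the AWS SDKs look for; SLACK_TOKEN and the load_settings function are assumptions made for this example.

```python
import os

# SLACK_TOKEN is a hypothetical name; the two AWS names are the SDK defaults.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "SLACK_TOKEN")


def load_settings(environ=None):
    """Collect required settings from the environment, or raise if any are missing."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}
```

Because nothing secret lives in the image or the repository, the same image can run anywhere and simply picks up whatever credentials its environment provides.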

Our build runs the unit tests, then creates a tagged image from the working environment and pushes it to the AWS EC2 Container Registry.
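The order of those steps matters: nothing gets built or pushed unless the tests pass. Here is a minimal sketch of that gate, assuming the tests run via Python's built-in unittest discovery and that the Docker client is already logged in to the registry (both are assumptions, as are the names build_steps and run_build). In our setup the CI service drives these steps; this script only illustrates the ordering.

```python
import subprocess


def build_steps(image_ref, test_command=("python", "-m", "unittest", "discover")):
    """Return the commands for one build, in the order they must run."""
    return [
        list(test_command),                         # 1. tests gate everything else
        ["docker", "build", "-t", image_ref, "."],  # 2. tagged image from the tested tree
        ["docker", "push", image_ref],              # 3. publish to the registry
    ]


def run_build(image_ref, runner=subprocess.run):
    """Run each step in order; check=True aborts the build at the first failure."""
    for command in build_steps(image_ref):
        runner(command, check=True)
```

The runner parameter exists only so the sequence can be exercised in a test without Docker installed; in real use the default subprocess.run executes the commands.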

This process is fully automated using the CircleCI Docker integration. Every check-in to the master branch kicks off the build steps mentioned above.

Automated Deployment