Two lessons the Infrastructure team stole from Ops, Dev and other departments at STYLIGHT

Posts by SysAdmins are usually rants. Or patronize you about someone’s philosophy. This blogpost won’t be either. It definitely could be both, but I will instead admit to shamelessly stealing from my colleagues. In the end, we use some pretty nice technologies and I’d like to give you a glimpse of what we’re doing at STYLIGHT Infra. These are two of the lessons I learned (stole and dubbed to be my own) from my peers.

Lesson 1: Configuration management is not exclusively for servers.

In Ops, our colleagues used Puppet for quite a while. They still use it, like it and will promote it if asked. However, as powerful as Puppet is, it has an equally steep learning curve and took some time until support for Windows arrived. Taking a closer look at the competition, Chef might have set the bar for integrating Windows by collaborating with Microsoft on their Desired State Configuration System for Powershell, but it is still to be considered a static Configuration Management System.

Being more recent players in the game, Ansible and Salt created some buzz as they took what was good from Puppet/Chef and extended that for a Remote Execution Engine. In a scenario that requires you not only to maintain reliable configurations but also ad hoc troubleshooting and fast access to remote machines, this comes in more than handy and seemed right away perfectly suited for our environment. For reasons I can’t fully remember anymore I ignored Ansible at the time entirely, spun up a Salt-Master and pushed the first Salt-Minions to my testing OU (GPO + script = wohoo). After the promising testing phase, I pushed it to all Clients in the Active Directory– even our (ever increasing) fleet of Macs are now Minions.

Right now Salt serves us in two ways:

Leveraging Salt States, we push custom applications to people and apply base sets of software to certain departments on login. We also set up a couple neat helpers such as an ELK stack or prototypes of potential tools.
The real juice, however, lies in the Remote Execution Engine. Troubleshooting in a Windows world usually involves the GUI, and while it is nice to stay in touch with your colleagues by simply swinging by their desks, sometimes you just don’t want to/can’t afford the time. Powershell is neat for automating repetitive troubleshooting, and the Salt-Master serves as the central storage for the scripts. When necessary, we simply use Salt to run scripts on client machines to fix problems of all sorts. Installing fonts (and every Windows admin can tell you about the pain of doing so remotely!), installing/uninstalling Office or other applications. Even fine grained configurational changes can be either achieved out of the box using one of the (great) modules of Salt or by writing a script in Powershell.

The usage of Salt is not yet as extensive as it could be– that has to be admitted. The possibilities are there, and we look to migrate step-by-step from GPOs to Salt states where possible.

Lesson 2: Agile, SCRUM and Kanban preserve your sanity.

Agile principles and SCRUM are by their origins nothing SysAdmins, especially working on help desk or infrastructure tasks, would naturally see as something tailored for their daily work. You would see the developers at that fancy startup across the street sticking Post-it after Post-it to their walls and windows and keep wondering what the hell they are doing.

Well, not for us. The nature of the job dictates the distinction between projects and help desk. This is fairly easy– project related work screams to be tracked and managed using SCRUM (our awesome analog corner board pictured below) whereas a help desk is begged to be handled in a (digital) Kanban board.

For both we use JIRA. Emails from our colleagues requesting help will be converted automatically to tickets in our help desk Kanban and all projects are handled using some parts of the SCRUM tool palette. To be honest with you, our Agile coach told me multiple times already that we created some mutated and only remotely related offspring of SCRUM, but for us it still works and has had a couple of positive effects for us:

The split solves the problem of justifying your attention. Should you handle help desk tickets first? Or is project work more important? Help desk tickets can be now tracked using a SLA and your project work is prioritized so you should always be in the clear which fire to extinguish first.
Transparency and awe for users. Walk-ins are often baffled by the projects currently displayed on our physical board and show appreciation.
Staying in sync with the business. A sprint planning every two weeks helps the product owner and other stakeholders to get their projects reprioritized when needed and still allow the bigger picture to be kept in mind.
Getting rid of hazardous processes. Even though the help desk runs as Kanban, the periodic retrospectives are used to reflect on the last two weeks.

As a final note I want to put a strong emphasis on something often simply forgotten by a lot of people working in the industry. As a SysAdmin your colleagues are a constant source of work. They might not understand networks (at all), probably bitch about the WiFi and crash your file server. However, don’t doubt their competence in their field of expertise – learn best practices and processes from your surrounding departments and evaluate them. Chances are the silver bullet is still for you to forge, but they might give you all the resources you will need.

Lesson 1: Configuration management is not exclusively for servers.

Lesson 2: Agile, SCRUM and Kanban preserve your sanity.

Leave A Reply Cancel Reply