Kops: Reliable and Highly Available Kubernetes clusters for fast-paced environments
By Diego Nogues
If you work with Kubernetes, or intend to, you have likely heard about GKE, EKS, AKS, DOKS, LKE, and so on. You may also have heard about kubeadm, kubespray, Rancher, kops, and a ton of other options.
What do all these things have in common? They are all tools (or managed services, in the case of the first group) for creating and maintaining a production-ready Kubernetes cluster.
This article is going to focus on one of them, kops, and on why it was our choice more than three years ago at Pipefy.
What is kops?
As stated in their docs:
We like to think of it as `kubectl` for clusters.
‘kops’ helps you create, destroy, upgrade and maintain production-grade, highly available, Kubernetes clusters from the command line. AWS (Amazon Web Services) is currently officially supported, with GCE and OpenStack in beta support, VMware vSphere in alpha, and other platforms planned.
NOTE: There is also alpha support for Digital Ocean.
In other words, kops is a command-line tool that helps you create and maintain production-grade Kubernetes clusters. It’s responsible for the whole lifecycle of your cluster, and by the whole lifecycle I mean everything from provisioning the required infrastructure components on the supported cloud of your choice to the rolling-update process required to complete a Kubernetes version upgrade. You can check other nice features here.
Yes, it’s a Swiss-Army knife for managing Kubernetes clusters.
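To make the lifecycle a bit more concrete, here is a minimal Python sketch that only builds (and prints) the kops command lines for a create, upgrade, and delete cycle. The cluster name, the S3 state store, and the zones are hypothetical values chosen for illustration, and the flag names are the ones I remember from the kops releases of that time (newer releases may rename some of them), so check `kops --help` before running anything.

```python
from __future__ import annotations

import shlex

# Hypothetical values, used only for illustration.
CLUSTER_NAME = "k8s.example.com"
STATE_STORE = "s3://example-kops-state"
ZONES = "us-east-1a,us-east-1b,us-east-1c"


def kops(*args: str) -> list[str]:
    """Build a kops command line that targets the example cluster and state store."""
    return ["kops", *args, f"--name={CLUSTER_NAME}", f"--state={STATE_STORE}"]


def lifecycle_commands() -> list[list[str]]:
    """Return the commands for a typical create -> upgrade -> delete lifecycle."""
    return [
        # Provision: define the cluster and create the cloud resources.
        kops("create", "cluster", f"--zones={ZONES}", f"--master-zones={ZONES}", "--yes"),
        # Upgrade: bump the Kubernetes version in the cluster spec...
        kops("upgrade", "cluster", "--yes"),
        # ...apply the updated configuration to the cloud resources...
        kops("update", "cluster", "--yes"),
        # ...and gradually replace the running instances.
        kops("rolling-update", "cluster", "--yes"),
        # Tear down: remove the cluster and its cloud resources.
        kops("delete", "cluster", "--yes"),
    ]


if __name__ == "__main__":
    # Nothing is executed here; the commands are only printed for review.
    for command in lifecycle_commands():
        print(shlex.join(command))
```

The sketch deliberately prints the commands instead of executing them, since each one with `--yes` changes real infrastructure.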
Why did we choose kops at Pipefy?
At the beginning of 2018, my colleague and I, from the brand-new SRE team, received a mission: migrate the whole infrastructure of Pipefy from a well-known PaaS to AWS + Kubernetes.
At that time, the only mature managed Kubernetes service on the market (as far as I can remember) was GKE, Google Kubernetes Engine. All the other someletterKsomeletter abbreviations in the first sentence of the article were either just dreams and plans of the other cloud providers, or in the early stages of development and not yet Generally Available.
In fact, AWS EKS reached General Availability in the middle of 2018. I can remember the service announcement at the AWS Summit in São Paulo, Brazil: the subject was so “obscure” that even the AWS Architects had more questions than answers.
With our provider unable to offer us a managed service, we had to choose one of the remaining tools and do it ourselves. Which one?
Some questions and requirements came up:
Does the tool support HA Master setup?
A good practice for production clusters is to run at least three masters, ideally placed in different AZs.
Surprisingly, at that time, many tools supported only single-master setups.
Is it a community-driven project? If so, does it show signs that it will last for a long time?
Sometimes we make bad choices, and projects simply get discontinued even when we are keen to support them.
Does it manage the whole lifecycle of a Kubernetes cluster? Provisioning, resizing, upgrading, etc.
Kubernetes has many moving parts. Any kind of automation would be really appreciated.
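The "at least three masters" rule from the first question comes down to quorum arithmetic. The sketch below assumes each master runs one etcd member (an assumption about the setup, though a common one) and that etcd needs a majority of members to keep accepting writes. The function names are mine, for illustration only.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for members in (1, 2, 3, 5):
        print(
            f"{members} master(s): quorum={quorum(members)}, "
            f"tolerated failures={tolerated_failures(members)}"
        )
```

With one or two masters no failure can be tolerated, while three masters survive the loss of one. Spreading those three across different AZs means that losing a single AZ takes out at most one of them.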
Well, after doing our research and a few proofs-of-concept with the few tools that satisfied these and other questions, we ended up with a winner: kops.
That does not mean the other tools were not good; we just saw a good fit between kops and our needs, and I have to admit that three years later kops has proven its robustness.
Pros and Cons after three years running Kops in Production
As stated previously, kops has proven its robustness. Here are some pros and cons about it.
Pros:
– Pretty easy to create a Kubernetes cluster using kops.
– The project is very well documented.
– You have full control over the Kubernetes settings.
– Supports many CNI plugins (nine at the time of writing).
– A good release cadence that closely follows new Kubernetes releases.
– The process of updating and upgrading both kops and Kubernetes versions simply works. It’s very reliable, even in high-traffic production environments.
– Very good Terraform support.
Cons:
– It lacks real (stable) multi-cloud support.
– Even with some templating options, it’s tricky to embed kops within a CI/CD pipeline to automate everything.
Yes, many more pros than cons after a few years of using it on a daily basis. Actually, to be fair, we once had a problem that could have caused downtime during a Kubernetes upgrade, but that problem was not caused by kops itself.
It was related to the etcd2 to etcd3 upgrade. As this upgrade was disruptive to the masters, the recommended approach was to perform the upgrade steps and roll out the changes as fast as we could. In other words: apply the changes and restart everything.
If you are looking for a very mature tool to manage a production-grade Kubernetes cluster on AWS, one that will give you full control over it and keep you confident even on complex tasks, give kops a chance.
Featured image source: Aalianshaz