Kubernetes as a Platform

Tuesday 6 June 2023, 22:30



Since its inception, Kubernetes has been known and thought of as being a "container orchestration platform". Whilst this is still undoubtedly true, it's also not the full picture anymore. In fact, the direction that Kubernetes is heading towards can now be thought of as a full-fat, highly configurable infrastructure platform.

I previously wrote about my experiences diving into Kubernetes for the first time, and the various pros and cons that I encountered. Every opinion I had there still holds true; however, after some further experimentation and tinkering, I'm even more inclined to go all in on Kube now - not just as a platform for running applications, but for running everything one would need in a cloud environment.

Managed Kubernetes Services that give you everything

In the early days of Kubernetes, you had to manage everything yourself: install kubeadm on some bare-metal machines or EC2 instances, join worker nodes, upgrade control plane nodes, all whilst trying to minimize downtime and interruptions - you get the idea. You needed not just a good understanding of how to use Kubernetes, but also of how it worked under the hood, in order to administer and troubleshoot a cluster.

These days, the story is very different. Every major cloud provider now has a "managed" Kubernetes service that handles almost everything for you - from control plane management and worker node provisioning to version upgrades.

If you don't want to go with a managed cloud solution, various alternative "distributions" to kubeadm have also cropped up, e.g. MicroK8s, k3s, and Rancher. These distros have massively simplified the installation and management process for bare-metal or otherwise self-managed clusters, often allowing you to get a fresh node up and running within a few minutes.

Sure, there are still some complicated bits, but it's nothing like it used to be - and over time, these complexities are only going to become easier and more abstracted away from end users. I'll be talking mostly in terms of running inside a managed cloud environment for the rest of this article, but what I say should also apply to on-prem installations, with a few extra steps.

The beginning

First, you need to run a stateless application - say, a super basic "Hello World" HTTP server. You set up a Deployment and a Service, then expose your Service via an Ingress. Your Ingress of course needs an ingress controller (e.g. NGINX or Traefik) exposed through a LoadBalancer Service to make it publicly accessible, so you provision your favorite one as well. Bam - your cloud provider detects this and automatically creates a cloud load balancer to front your Kubernetes LoadBalancer Service.
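
To make this concrete, here's a minimal sketch of that setup. The names, image and hostname are illustrative placeholders, and the Ingress assumes an NGINX ingress controller is already installed in the cluster.

```yaml
# Minimal "Hello World" stack: Deployment + Service + Ingress.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-world
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello-world
  template:
    metadata:
      labels:
        app: hello-world
    spec:
      containers:
        - name: server
          image: hashicorp/http-echo      # tiny HTTP server used as a placeholder
          args: ["-text=Hello World"]
          ports:
            - containerPort: 5678
---
apiVersion: v1
kind: Service
metadata:
  name: hello-world
spec:
  selector:
    app: hello-world
  ports:
    - port: 80
      targetPort: 5678
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-world
spec:
  ingressClassName: nginx                 # assumes the NGINX ingress controller
  rules:
    - host: hello.example.com             # placeholder domain
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hello-world
                port:
                  number: 80
```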

Then, you realize you need to attach a custom domain to this endpoint. Traditionally, you’d go into the DNS provider of your choice, and manually attach an A or CNAME record to map the custom domain. But wait! If you have multiple environments/domains, or like to spin new environments up and down on demand, doing this manually isn’t a very scalable method.

This is where our first operator, ExternalDNS, comes in. ExternalDNS does the following:

  • It can interface with and control external DNS providers (e.g. Route 53, GoDaddy, etc.).
  • It can watch your internal Kube resources, such as Ingresses, to decide which domains should map to which load balancers, and automatically manage the DNS records for you (see the sketch after this list).
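
As a rough illustration (the domain here is a placeholder), ExternalDNS will happily pick up the hostname from the Ingress above, or from an annotation on a LoadBalancer Service like this one, and create the corresponding record in your DNS provider:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: hello-world-lb
  annotations:
    # ExternalDNS watches for this annotation (and for Ingress hosts) and
    # creates/updates the matching DNS record automatically.
    external-dns.alpha.kubernetes.io/hostname: hello.example.com
spec:
  type: LoadBalancer
  selector:
    app: hello-world
  ports:
    - port: 80
      targetPort: 5678
```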

Databases and Storage

Having some stateless containerized applications and automatic DNS management is great, but that’s not going to be enough for your next billion-dollar startup. You need persistent storage, such as databases or object storage.

In the early days, the consensus was that you shouldn't put anything stateful inside Kube at all, because Kubernetes itself doesn't have any built-in mechanism for managing persistent storage or backups. For databases, you'd be advised to spin up a cloud-managed database like RDS or Cloud SQL, whilst if you wanted an object store you'd reach for S3 or an equivalent.

Whilst Kubernetes itself hasn't necessarily improved its built-in storage management capabilities (well, it doesn't really have any), the general knowledge and understanding around handling storage via Kubernetes has improved drastically, and so have the tools and plugins in the ecosystem. Managed Kube services have also improved in terms of automatically provisioning storage volumes (e.g. EBS/EFS on AWS). This video summarizes pretty well what's changed in this space over the past few years.

Take PostgreSQL as an example - we can use the CloudNativePG operator to provision a high-availability, multi-replica PostgreSQL installation within a cluster. All that needs to be done is to configure which PersistentVolumeClaims/StorageClasses to use, and where to back up data. I won't go into too much specific detail about different operators, but you can also do things with CloudNativePG that may not be possible out of the box with cloud-managed databases like RDS - for example, setting up an entire replica cluster of any other arbitrarily-hosted Postgres database.
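
As a rough sketch of what that looks like (bucket, credential and StorageClass names are placeholders - check the CloudNativePG docs for the exact fields), a three-instance HA cluster with S3 backups can be declared in a single resource:

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-postgres
spec:
  instances: 3                            # one primary + two replicas
  storage:
    size: 20Gi
    storageClass: gp3                     # e.g. an EBS-backed StorageClass on AWS
  backup:
    barmanObjectStore:
      destinationPath: s3://my-backup-bucket/postgres   # placeholder bucket
      s3Credentials:
        accessKeyId:
          name: backup-creds              # placeholder Secret name
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: backup-creds
          key: SECRET_ACCESS_KEY
```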

Of course, Postgres isn't the only option here - whatever your database needs, there's probably a production-ready operator or Helm chart for it. To name a few examples:

  • MySQL has an official operator from Oracle
  • MongoDB has an official Community Operator
  • Redis, MariaDB and most other popular databases have well-maintained operators or Helm charts (e.g. from Bitnami)

Well, what about object storage? Turns out there’s also MinIO - an open source, S3-compatible object storage system that - you guessed it - can be self-hosted inside Kube.

Message Queues and Streams

You can also rejoice with your message queue and pub/sub needs, as the following (among others) are all available in-cluster:

  • RabbitMQ, via its official Cluster Operator
  • Apache Kafka, via the Strimzi operator
  • NATS, via its official Helm chart

Observability and Logging

For observability, few people have not heard of the holy Prometheus and Grafana combo. Whilst that stack is sometimes known for being difficult to configure and set up, it's actually surprisingly easy to do so inside Kubernetes using the kube-prometheus-stack Helm chart, which bootstraps a significant amount of the work for you. For logging, Loki integrates well with that stack and (obviously) also has a Helm chart.
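
As a hedged sketch, a values.yaml for the kube-prometheus-stack chart can stay surprisingly small - the defaults already wire up Prometheus, Grafana, exporters and a set of dashboards, and you mostly just tweak a few knobs (check the chart's own values file for the exact keys):

```yaml
grafana:
  adminPassword: change-me                # placeholder - use a Secret in practice
prometheus:
  prometheusSpec:
    retention: 15d                        # how long to keep metrics
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 50Gi               # persistent storage for the TSDB
```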

On the other hand, Elasticsearch and Kibana are also available together via Elastic's official Helm charts (or the ECK operator).

If you prefer to use an external managed service, Dynatrace and DataDog both have operators installable via Helm as well.

Honorable mention - data pipeline tools

I'm not a data engineer, so my knowledge of data ETL/pipeline tooling is pretty sparse. That said, a quick Google search yields that tools like the following can also be self-hosted in Kubernetes:

  • Apache Airflow, via its official Helm chart
  • Apache Spark, via the Spark Operator
  • Argo Workflows, which is Kubernetes-native to begin with

There's plenty more, but I won't dive too deep as I'm not too familiar with these.

Networking and Service Meshes

In large productionised clusters, you may want to use a service mesh to facilitate and monitor service-to-service communication. Fortunately, these tools have been around for years in the Kubernetes world - Istio, Linkerd and Consul being the best-known examples.

If you want to know more about service meshes, the Istio documentation has a pretty good explanation.

Infrastructure Access

So, now you've got a number of clusters set up. How are your engineers going to access these clusters, and how are you going to manage their permissions? The Kubernetes-native way is to set up signed certificates and RBAC roles per user, per cluster, or you can use your cloud platform to generate access kubeconfigs. Neither of these is particularly automatic or ergonomic, and that's where "Infrastructure Access Platforms" (IAPs) come in. IAPs basically aim to simplify access control over a number of different pieces of infrastructure (e.g. Kube clusters or SSH servers), usually via accessible methods like SSO. There are a number of IAPs out there - Teleport, HashiCorp Boundary and Pomerium, for example - some of which you can self-host inside your own cluster.

I use Teleport: it's open source yet has really good support for self-hosting, it can also manage non-Kubernetes infrastructure like Postgres databases, and it comes with a bunch of other features such as audit logging, a nice management UI, and a Kubernetes operator.

CI/CD

By this point, you've already got at least one cluster running, with quite a few resources. Do you know what would save money? Hosting your CI/CD pipelines within one of your clusters, on hardware that you're probably already paying for! Once again, there are plenty of options:

  • GitHub Actions is free for self-hosted runners, which you can run in-cluster with the Actions Runner Controller chart (see the sketch after this list).
  • GitLab is entirely self-hostable, and that of course comes with GitLab CI/CD
  • Unsurprisingly, Jenkins can run in a self-hosted cluster too
  • CircleCI runners can be self-hosted
  • Tekton and Concourse are other good examples of up-and-coming Kubernetes-native CI solutions
  • A quick Google search yields a large number of other options you may want to explore.
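
For example, with the (legacy, summerwind-based) Actions Runner Controller, a pool of self-hosted GitHub Actions runners is just another custom resource - note that the newer GitHub-maintained runner scale sets use different resources, and the repository name below is a placeholder:

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: my-repo-runners
spec:
  replicas: 2                                 # number of runner pods to keep around
  template:
    spec:
      repository: example-org/example-repo    # placeholder repo the runners attach to
```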

In addition to "traditional" CI/CD platforms, it's also worth mentioning the Kubernetes-native GitOps CD tools Argo CD and Flux (which are, of course, self-hostable).
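
A minimal Argo CD Application, for instance, just points at a Git repository and lets the controller keep the cluster in sync with it (the repo URL, path and namespaces below are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: hello-world
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deployments.git   # placeholder repo
    targetRevision: main
    path: apps/hello-world
  destination:
    server: https://kubernetes.default.svc                # deploy into the same cluster
    namespace: hello-world
  syncPolicy:
    automated:
      prune: true                         # delete resources removed from Git
      selfHeal: true                      # revert manual drift
```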

If you use Terraform, you might also want to look at Atlantis.

Version Control

By this point, since you’re self-hosting so much already, why not throw in Git platforms?

  • GitLab, as previously mentioned
  • Gitea
  • GitBucket doesn't seem to have a Helm chart, but it does have a Docker image that you can probably create a Deployment out of.

Operators and CRDs

I’ve used the word “operator” quite a bit throughout this article. If you already know what a Kubernetes operator is, then you can skip this bit. Otherwise, this might be the most important takeaway from this article for you. The best definition of Kubernetes operators I've seen is from Red Hat:

A Kubernetes operator is an application-specific controller that extends the functionality of the Kubernetes API to create, configure, and manage instances of complex applications on behalf of a Kubernetes user. It builds upon the basic Kubernetes resource and controller concepts, but includes domain or application-specific knowledge to automate the entire life cycle of the software it manages.

In other words, operators allow you to define the application or resource that you want, instead of the usual lower-level building blocks (e.g. Deployments and Services). The operator then works behind the scenes to make sure that the specified application or resource exists. The key bit: operators can operate on existing Kube primitives, but are not limited to them - meaning they can also manipulate resources outside of Kubernetes! That's exactly how ExternalDNS works, and it is incredibly powerful - as we'll see in the next section.
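
To make the idea concrete, here's a purely hypothetical custom resource - this CRD doesn't exist, it just illustrates the shape of the pattern: you declare the thing you want, and an operator reconciles the Deployments, Services, PVCs (and possibly external resources) needed to make it real:

```yaml
# Hypothetical example - not a real API group or kind.
apiVersion: example.acme.io/v1
kind: MessageQueue
metadata:
  name: orders-queue
spec:
  replicas: 3                 # the operator fans this out into StatefulSets/Services
  persistence:
    size: 10Gi                # ...and PersistentVolumeClaims
  externalDnsName: queue.internal.example.com   # ...and maybe even DNS records
```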

Crossplane

Say you don't want to put everything inside a cluster, or you have vendor-specific services like SQS that you can't afford to migrate away from. By this point, almost all of your infrastructure is in Kubernetes already, so having some configuration living outside of it is a little annoying. Worry not - because with the power of Kubernetes operators, we can manage resources external to Kubernetes from inside Kubernetes!

Crossplane is a tool that provides CRDs that allow you to manage vendor-specific cloud services. More importantly, it also allows you to create an abstraction layer for those who need specific, pre-defined setups inside your company or team. For example, if you want to allow your software engineers to provision new stacks that consist of several vendor-specific services interacting in a certain way, you can define CRDs that act as templates for this, so that they never have to touch the AWS or GCP console, or even write any vendor-specific configs.
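
As a rough sketch, a Crossplane-managed S3 bucket looks like any other Kubernetes resource - the exact API group/version and fields depend on which AWS provider package you install, so treat this as illustrative rather than copy-pasteable:

```yaml
apiVersion: s3.aws.upbound.io/v1beta1     # provider-specific - check your provider's docs
kind: Bucket
metadata:
  name: my-app-assets                     # placeholder bucket name
spec:
  forProvider:
    region: eu-west-1
  providerConfigRef:
    name: default                         # references your AWS ProviderConfig/credentials
```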

What's holding it back?

Good, reusable and sane IaC

This isn't a Kube-specific problem, but in my opinion, infrastructure-as-code tooling could be a lot better than it currently is. YAML-based manifests are great for versatility and quick testing, but having no type-checking or validity guarantees at static time in 2023 feels a little outdated. Raw YAML also offers no way to share or re-use constructs. Terraform HCL offers an improved experience in some ways, but 1) it can still feel clunky in other ways, and 2) the Kube community seems to have a conceptual dislike of using Terraform for Kube.

Helm and Kustomize do help on the reusability front, but they do not solve the type-checking/static-time validity issues. Furthermore, Helm charts are awkward as hell to write, with weird {{ .Values.something.someOtherThing }} syntax everywhere in the YAML. There's a great discussion here where someone points out that Helm charts were designed this way because of the pre-existing templating tooling in the Go standard library, rather than it being a user-focused design choice.
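
As an illustrative (and deliberately trimmed) fragment of what a typical chart template looks like - mychart.fullname is the kind of helper that helm create generates for you - the Go templating ends up woven through the YAML:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "mychart.fullname" . }}
  labels:
    app.kubernetes.io/name: {{ .Chart.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          {{- if .Values.resources }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          {{- end }}
```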

Cloud Development Kits (CDKs), like AWS CDK, Terraform CDK and CDK8s, have grown rapidly in popularity over the last few years and enable infrastructure to be described in a programming language of your choice, e.g. TypeScript or Go. This provides numerous advantages, such as:

  • Static type checking on infrastructure constructs
  • Easier code reuse and templating options.

However, CDKs are still very young and suffer from early-stage issues:

  • Each only has one target, i.e. Terraform, AWS, or raw Kubernetes manifests, and there’s currently no good way to use them together without having to write multiple scripts.
  • A plethora of bugs and unclear documentation
  • No good way to pull in CRDs as custom constructs. (CDK8s is an exception to this, but it is still very awkward and requires manually specifying CRD templates, which is a huge pain if an operator has declared multiple different CRDs in different files.)

All-in-one CI/CD platform

As previously mentioned, many CI/CD tools can work with and be deployed onto Kube. However, none of them can handle everything that is needed in a real environment. Common setups usually involve using one platform for building and testing applications (CI, e.g. Jenkins) and another for handling deployments or infrastructure changes (CD, e.g. ArgoCD, Flux or Terraform Cloud). Whilst I don't disagree with the rise of GitOps processes that prioritize reproducibility and snapshotting/rollbacks by separating CI from CD, I do think it is a shame that no single tool can handle both together in a GitOps manner (imagine if you could run GitHub Actions in Argo!). GitOps itself isn't a particularly complicated idea, but looking at this video trying to explain the tools and processes required, it certainly does feel like a heavier commitment than it should be.

This is definitely neither a show stopper nor a big deal in the grand scheme of things, and many of you reading this have already successfully overcome these obstacles before. I just think it would be a nice bonus that makes it easier for people to adopt GitOps and Kube together as a platform.

Secrets Management

One thing Kube isn't great at yet is secrets management. To be clear, Kube does offer Secrets in its native API. However, Secrets aren't encrypted at rest by default - they're merely base64-encoded. You can set up encryption at rest in Kube, but it's not as straightforward or managed a solution as HashiCorp Vault or AWS Secrets Manager. Secure secrets management is, generally speaking, a pretty difficult beast.

That said, the ecosystem does have a number of great tools around secrets:

  • Random secret generators allow you to generate random secrets in-cluster
  • Sealed Secrets, which encrypt secrets so they can be safely committed to Git - in fact so secure that after creation, not even cluster admins can recover the original values from the SealedSecret itself. This is great for security, but not always practical. (See the sketch after this list.)
  • Reloader, for refreshing your pods whenever secrets are updated
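
For reference, this is roughly what a SealedSecret looks like once kubeseal has encrypted a regular Secret - the ciphertext below is a placeholder, and only the controller running in your cluster holds the key to decrypt it, which is what makes it safe to commit to Git:

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-credentials
  namespace: my-app
spec:
  encryptedData:
    password: AgBy3i4OJSWK...             # placeholder ciphertext produced by kubeseal
  template:
    metadata:
      name: db-credentials
      namespace: my-app
```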

TL;DR

Just self-host every damn thing you need inside Kubernetes - it's pretty easy, and it'll also help you avoid vendor lock-in.

Enjoyed this read? Comment and support me ❤️

If you enjoy the above article, please do leave a comment! It lets me know that people out there appreciate my content, and inspires me to write more. Of course, if you really, really enjoy it and want to go the extra mile to support me, then consider sponsoring me on GitHub or buying me a coffee!
