It was 01:30 and I was about to go to bed. The next day I was flying with my daughter for the holidays.
I checked the apps of a bunch of clients. None of them loaded. I was like what...
I checked the server. Everything down.
I'd been running Kubernetes on DigitalOcean, and overnight DigitalOcean forced a Kubernetes update that was incompatible with mine.
Took me 8 hours to fix. No sleep. I ended up moving everything back to a good old VPS and throwing K8s away.
Now, to be fair, I had been getting warnings with deployments. But I was used to that; Kubernetes has ten updates per week. I don't have time to update K8s or my Helm files every week.
So yeah, it was my fault, but I was used to good old VPS hosting. There's an old PHP application I built five years ago with Laravel that never needs anything. I've done some updates and patches, but it always just works.
I'm used to running Node behind Apache or nginx; even if it's a bit less stable, it still almost never crashes.
With Kubernetes there is always something. I guess there are reasons to choose it, but stability isn't one of them.
I ended up making the plane, and my daughter was super kind and patient. But no more Kubernetes for me.
My experience so far is that k8s itself is relatively stable. It's when you start using vendor-specific addons/plugins to actually do stuff outside of k8s, like provisioning PVs or modifying Citrix LB settings, that everything quickly becomes a burning tire fire. :-/
Something is always the problem. With a VPS, you don't usually have to worry about DO being a problem: you pay your bills, your server stays up. Maybe if you don't update you might get hacked, but "killed by a provider update" is not a VPS failure mode. I use k3s on bare metal. I've tried to run k8s so many times before, and every attempt failed at some point in ways that didn't inspire confidence. Mostly etcd became a mess, so this time I'm running k3s with a Postgres datastore that I can actually look at, reason about, back up, and independently make HA, with the bonus of not having to run arcane commands to manage the cluster that runs my cluster.
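For reference, pointing k3s at an external Postgres datastore is a one-flag affair. A sketch of the bootstrap (the connection string and hostname below are placeholders):

```shell
# Start a k3s server backed by an external Postgres instance instead of
# etcd/sqlite. Adjust user, password, host, and database name to taste.
curl -sfL https://get.k3s.io | sh -s - server \
  --datastore-endpoint "postgres://k3s:secret@db.internal:5432/k3s"
```

Every additional server node gets the same `--datastore-endpoint`, so the control plane is HA while the datastore is backed up and replicated on Postgres's own terms.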
I once tried nomad because it claimed to be simpler. I stopped when I realized I’d also have to use multiple clusters to run a cluster (nomad+consul).
I use k3s because I build apps for multiple clients and I like the flexibility that it gives me in running servers like cattle.
My point is that these orchestrators need to start decreasing the enormous amount of risk they add to any project.
There isn't a single managed vendor of Kubernetes that won't force an upgrade (or shut down your entire cluster if you don't upgrade). It is too costly for these companies to maintain so many old versions of Kubernetes that are not supported.
If you want a Kubernetes cluster that never gets forcibly updated or shut down, you should not be using a managed service, period.
I may have not expressed it well but what I wanted to say was that GKE communicates very clearly when you will be hit with a forced non-patch upgrade, not that upgrades are undesirable. Running a cluster without updates is a deathwish.
I hope you're right; I haven't had time to look at the same warning for 6 months now. There was a hard deadline on it in February, and it came and went. The backwards-incompatible Kubernetes API updates are a PITA. I know they said "beta", but if you literally have a version field in your manifest format, why ever deprecate a way of configuring something? They could just have operators translate the configs automatically on the fly.
>They could just have operators translate the configs automatically on the fly
How do they test that across the infinite number of permutations of configurations and deployments of K8S in the field though? It'll work for some people on the happy path, but it's really hard to maintain over time. Worse, it'll break randomly at some point in the future that is hard to predict, instead of at some publicly announced point in time where the breaking change is deployed (how it happened this time).
I don't know any software library that introduces breaking changes as often as Kubernetes does. DevOps used to lean more toward stability than dev does; it's weird. Kubernetes DevOps engineers are very expensive, so I guess they love being able to spend more time on their tool.
In my experience, Helm is the worst offender. Somehow every second chart update has a breaking change preventing the upgrade, defaults that don't mind wiping the persistent volume, and the discontinuation of the somewhat consistent 'stable' central repo makes me seriously regret using Helm charts for anything that is not ephemeral.
Yep. My problem with a managed cloud is that Kubernetes drivers are vendor-specific. I hit this landmine too when the ingress I was using got a breaking update (as far as I could tell).
Still worked on my test cluster. It was a throw away site, so I didn’t even bother fixing it. And, I honestly can’t tell you what went wrong. Just deploying docker compose, often with ansible, is the most reliable for me at most scales.
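A sketch of that baseline, for comparison (image name and ports are placeholders):

```yaml
# docker-compose.yml -- a minimal single-service deployment.
services:
  app:
    image: registry.example.com/myapp:1.4.2   # placeholder image
    restart: unless-stopped                    # restart across crashes and reboots
    ports:
      - "80:8080"                              # host:container
```

Brought up with `docker compose up -d`; an Ansible play that copies this file and runs that command is essentially the whole pipeline.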
Yeah, that's the annoying part. You get around 80% of parts that just work anywhere, and the rest is entirely dependent on what's underneath k8s or which cloud it's running on.
Going in, I assumed there was some kind of standard way to implement ingresses/load balancers, but no, it's just different plugins, each with different syntax and features.
I feel your pain, and I would be lying if I said a k8s update never caught me off guard, but there are still policies you can set for the auto-update of k3s, or you can trigger updates manually at will.
Have a look at k3s and maybe you will like Kubernetes more again. There is no magic to it. Have a faulty node? Spin up a new one.
And if your hoster is kind enough, there will even be APIs with cloud-init for you to do that.
I'm not trying to imply that you need managed k3s with this post, but rather trying to show how easy Kubernetes can be if you leave the big clouds and try not to overcomplicate things.
In late 2021 we had EKS nodes shit the bed for no apparent reason: they'd max out CPU and we couldn't SSH in. They'd get recreated by EC2 eventually (multiple hours later), despite being tainted in k8s pretty quickly. We ran with N+2 nodes to deal with it.
After dutifully updating EKS releases and AMIs it seems to have fixed itself last year.
EKS seems to be fine generally, but it would be nice if they didn't release new AMIs with missing commits?? [1] and especially [2]
Well, I'm the only one with access to the cluster, and I didn't do it. With the Lens IDE I barely check the Nodes tab; I only checked it because I noticed no app pods were being scheduled on that node, and I found it cordoned.
There is a good talk from Kelsey Hightower on when it makes sense to use k8s, and the number was something like 20 servers. With what you are describing, I would not even think about using k8s.
I tried k3s several times in the past few years but I still can't use it in my homelab:
1. The project claims to be production-ready and to support an HA control-plane setup, but there's no out-of-the-box solution for API load balancing. How do you bring up a new node (either control plane or worker)? You write down the join token produced by the first control-plane node and hardcode the token and the existing control plane's IP in the new node's systemd unit file. Btw, if you use the official installation script, that file is going to have permission 755, and everyone on the server can just read that token.
2. And how do you bring up the first control plane anyway? The official instruction is to `curl` a bash script and pipe it into a shell. You can probably translate that script into an Ansible playbook, but the whole running-a-bootstrap-script-and-passing-along-secrets approach makes the process difficult to convert into something that's supposed to be idempotent.
All of these problems can be worked around; in fact, I was halfway there. But then I suddenly started thinking: "didn't I choose k3s because I thought it was easy?"
I've gone through this a few times recently and have it in my homelab and at the office. What works well for me is kube-vip providing a VIP on the control plane, and then MetalLB dishing out private addresses in the respective networks, or even statically assigned addresses. I have been turning them all up with k3sup, which works like a charm.
Turn up the first node, install kube-vip, switch the config to point to the VIP, turn up all my other master nodes, then turn up my workers, install MetalLB, set up my subnet, install Rancher, expose it with a LB, install Longhorn, then start deploying things. Here is an example of what I use to turn up the first one with k3sup. All of the servers are turned up and configured with Ansible doing minimal updates, users, sudo access, etc.
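The commenter's own k3sup snippet didn't survive the thread, but a first-node turn-up along those lines might look roughly like this (IPs, user, and the VIP are placeholders, not the original values):

```shell
# Turn up the first k3s server over SSH with k3sup.
# --cluster starts embedded etcd so more servers can join later;
# --tls-san adds the kube-vip VIP to the API server certificate.
k3sup install \
  --ip 10.0.0.11 \
  --user ubuntu \
  --cluster \
  --tls-san 10.0.0.10 \
  --k3s-extra-args '--disable servicelb'   # leave LB duties to MetalLB

# Additional masters then join against the first node:
k3sup join --ip 10.0.0.12 --server-ip 10.0.0.11 --user ubuntu --server
```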
Thank you for sharing this and saving me time exploring k3s. It's shocking how common it is that an evangelized tool is impractical to set up and use in even a simple homelab configuration.
Knowing what NOT to investigate because it isn't "ready" can be one of the biggest time sucks.
Truth be told, yes, that's tricky, but it can be managed easily with Ansible, for instance.
1) Your main problem, though, would probably be the need for haproxy or BGP to do the load balancing for you. There are other solutions like kube-vip, but they are more of a "failover" solution than HA. That would be fine for a homelab and is, for instance, how Rancher Harvester (Kubernetes for virtual machines) does it.
2) You have to pass a parameter called --cluster-init for the first node and then join the other nodes. Once the cluster is running, you don't need any node with that parameter any more, and it's common practice to create the first node with --cluster-init, join three other ones, and then take down the first node.
And on a personal note, you sound like you would be happy with Rancher Harvester. Check it out; it's basically turnkey.
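That bootstrap flow can be sketched with the official install script (IPs are placeholders, and the token handling here is the simplest possible variant):

```shell
# Node 1: initialize a new embedded-etcd cluster.
curl -sfL https://get.k3s.io | sh -s - server --cluster-init

# The join token lands on node 1 at the default location:
#   /var/lib/rancher/k3s/server/node-token

# Nodes 2..4: join as additional servers against node 1.
curl -sfL https://get.k3s.io | K3S_TOKEN=<token-from-node-1> \
  sh -s - server --server https://10.0.0.11:6443

# Once quorum is established, node 1 can be drained and removed;
# no running node needs --cluster-init any more.
```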
Yup, when I said I was halfway there, I was trying to use Ansible to deploy an nginx systemd unit on every node to load-balance requests to the API servers. Many other distributions can do this with a static pod managed by the local kubelet, but that's not possible in k3s.
I know there are solutions to my problem, but I can either implement them myself, or, as you mentioned, I can just use a different distribution and not have these problems.
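For reference, the nginx side of that is a small TCP (stream) proxy block; the addresses below are placeholders:

```nginx
# /etc/nginx/nginx.conf fragment: TCP load balancing for the k8s API servers.
stream {
    upstream kube_apiserver {
        server 10.0.0.11:6443;   # control-plane node 1 (placeholder)
        server 10.0.0.12:6443;   # control-plane node 2
        server 10.0.0.13:6443;   # control-plane node 3
    }
    server {
        listen 6443;
        proxy_pass kube_apiserver;
    }
}
```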
1. My homelab has a lot of crappy hardware with zero enterprise support. And when a server fails, I need to be able to easily swap it out. And I don't want to make it a full time job for myself to maintain it.
2. I did knock some control plane nodes down a few times. It was rare, but still not fun.
3. Also, because why not? HA control plane is not particularly challenging these days. I know exactly how to do it. I just don't have the bandwidth to learn how to do it in k3s, especially when it comes for free in some other distributions.
Been using k3s in production for nearly four years now and would recommend it to anyone. Super lightweight and easy to deploy. Opinionated about stuff I don't care about, while allowing customization of the network stack, backing database, and ingress controller if you want to do it yourself. Their embedded etcd is way, way easier to set up than a standalone etcd deployment.
I have built clusters for clients with k3s too and it has always been a charm.
More recently I was able to bring up a dual-stack cluster for a VoIP company that can now basically scale endlessly, with GitOps on top.
The fact that you can integrate existing sysadmin teams, because they will understand that a binary plus a config file running as a service is all it takes, is worth its weight in gold.
They know their load balancers and haproxies, as well as how to provision true hardware RAID (not software-based), which almost makes disk failure go away and makes maintenance genuinely schedulable.
In addition to K3s, I've used managed K8s, custom-rolled K8s, as well as various other K8s distributions. K3s has - by far - provided the least friction for most of my use cases, and is what I incorporate in any initial cloud design.
Of course, other stakeholders and constraints may eventually mean that we adopt something else before it gets implemented, but K3s is what I start with for many of the same reasons outlined in this article.
I'm a big fan of K3s, however managed Kubernetes from a large cloud vendor with a track record has a lot to offer when it comes to reducing management and the need for an SRE for K8s itself.
Folks might also be interested in two free resources:
1 - K3sup https://github.com/alexellis/k3sup - the author mentions HA K3s - K3sup is an easy way to get that using SSH. It's also a good pairing for K3s with Raspberry Pi
2 - Kubernetes at the Edge with K3s (CNCF / LF course) - I was commissioned to write this and I talk a lot about the differences and also the origin story of K3s and what Darren was aiming for.
Have fun with Kubernetes - whichever flavour you go for.
IMO K3s (and distros like it) is the future of self-managed Kubernetes. The same way Linux distributions brought simplification and sane, opinionated defaults to Linux in an era when compiling your own kernel and throwing a user space together was the norm, K3s does the same for vanilla Kubernetes. It's a joy to deploy and manage compared to vanilla.
Title is misleading. k3s is a deployment stack/distribution that builds off various Kubernetes modules. It must pass a certain test suite to conform to Kubernetes standards.
What you might be trying to compare is kubeadm which is the official deployment stack provided by Kubernetes.
I'm not trying to compare it with kubeadm (which is more of a setup script: https://kubernetes.io/docs/reference/setup-tools/kubeadm/ ), but with the fact that vanilla Kubernetes comes with moving parts that have to be configured, maintained, and updated separately.
You can actually set up "Kubernetes", which is often referred to as vanilla Kubernetes, without it too. See "Kubernetes the Hard Way" by Kelsey Hightower.
Is there any advantage of running k3s if you want to keep etcd? I understand that most k3s performance gains come from etcd being replaced by sqlite but if you still want a HA control plane, sqlite won't cut it.
We've been using k3s' embedded etcd for as long as its existed, and it's great. Setting up the etcd cluster is dramatically simplified; let the first node generate a token and feed it to all the other nodes. Tons of other advantages to k3s; the single-binary deploy process, the built-in networking stack (which you can secure with Wireguard out-of-the-box), built-in ingress controller if you want one.
You can easily still use external etcd if you really need to.
But in general, k3s can be HA without issue and scales just as well as vanilla k8s.
The main advantage is that everything comes neatly packed into a single binary, whereas the alternative would mean running a multitude of services for cluster provisioning.
Kubernetes in the end is basically an API server with multiple components, and k3s puts a nice bow around all of them.
I always chuckle to myself when things like k3s, microk8s and so on claim to be "lightweight" Kubernetes. Lightweight compared to what, exactly? Pure, upstream, vanilla Kubernetes (kubeadm) is the lightest possible: it doesn't come with a CNI, an ingress, or any of the additional stuff these distros do. Besides, why make your life harder by adding an additional layer on top of Kubernetes? When troubleshooting, you then get to track down whether a bug is in your distro or actually in Kubernetes itself. Just run the real thing.
When Kubernetes Lens came out, it was the greatest software for seeing what is going on in a cluster.
I suggested it to basically everyone working with k8s, calling it a must-have tool, and got many teams to install it.
It was a great time for half a year and a lot of the community contributed to it, only to have them turn around and force accounts on all users: https://github.com/lensapp/lens/issues/5444
This turned it into an application that now ships a signed, hidden binary, can no longer be built from the open-source repository, and much else along those lines.
All in all, Mirantis is not to be trusted, not to mention that the tool wants to share your kubeconfig among your team, which is a clear antipattern and a high security risk.
Been using k3s (kube-vip + MetalLB) and Longhorn for both my personal projects and work projects for a while without any issues. It's a pleasure to use compared to the alternatives.
You might be fooled by the meaning of the word "distribution" in the Kubernetes vs. Linux context.
In the Linux context, a distribution is an opinionated build of user-land tools around the Linux kernel, which might also include the distributor's patches to the kernel.
In the (cloud) Kubernetes context, a distribution is an opinionated (cloud) implementation of the core Kubernetes binaries (EKS/AKS/etc.) on top of a Linux distribution of your liking.
To my humble understanding, k3s is a stripped-down and optimized (IMHO opinionated) build of the Kubernetes binaries, derived from the official source.
kubeadm is the official project's (opinionated?) idea of arriving at a vanilla cluster, leaving you the freedom to make choices for many of the needed components.
In the Linux context, "Linux From Scratch" would be the analogy, I suppose.
More knowledgeable people should please correct me if I'm wrong in my understanding here.
Well, for me the whole replace-letters-with-numbers thing is idiotic and requires inside knowledge to understand which letters are replaced by the numbers in words like a11y, i18n, k8s, etc.
On the bright side you can put whatever you want there. Personally I read k3s as "kerts", because why not. Similar to "thirds" but with the "t" from kubernetes.
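For what it's worth, the usual scheme is mechanical: keep the first and last letters and replace everything in between with its letter count (k3s is the deliberate exception, since no underlying word exists). A quick sketch in Python, with a function name of my own invention:

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of middle letters + last letter."""
    if len(word) < 4:
        return word  # too short to usefully abbreviate
    return f"{word[0]}{len(word) - 2}{word[-1]}"

print(numeronym("kubernetes"))            # k8s
print(numeronym("internationalization"))  # i18n
print(numeronym("accessibility"))         # a11y
```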
"We wanted an installation of Kubernetes that was half the size in terms of memory footprint. Kubernetes is a 10-letter word stylized as K8s. So something half as big as Kubernetes would be a 5-letter word stylized as K3s. There is no long form of K3s and no official pronunciation."
It is a pretty decent name since you can easily search for information about it, unlike say “kind” (which I typically use for development) which is absolutely un-googleable
Kinda off topic, but what's the actual word for k3s? We have Kubernetes k8s, Andreessen Horowitz a16z, internationalisation i18n, Founders f6s. What is k3s?
We wanted an installation of Kubernetes that was half the size in terms of memory footprint. Kubernetes is a 10-letter word stylized as K8s. So something half as big as Kubernetes would be a 5-letter word stylized as K3s. There is no long form of K3s and no official pronunciation.