Managing and Troubleshooting AWS EKS Access
Let me tell you a story before we get to the boring technical stuff.
A couple of months ago I started working on a project with a lot of weird integrations and little to no automation or governance around infrastructure management. The team and I inherited all of this and, unfortunately, never had the pleasure of meeting the former platform team to ask some questions about this (awesome-crazy-weird) Kubernetes implementation.
Having said that, there I was, staring at my monitor and thinking about how to automate some tasks with GitHub Actions, when I ended up making some manual modifications. A couple of minutes later we started to receive a lot of notifications in our Slack channels and the alarms went off. In that moment I realized that I had broken access to the whole system for everyone, including me with my administrator access. And that is how I left roughly one hundred angry devs blocked with their deployments on the dev environment.
IAM user and role access to your cluster
Now let’s talk about how AWS handles access to your cluster using AWS IAM entities. EKS uses the AWS IAM Authenticator for Kubernetes, which reads its configuration from a ConfigMap named aws-auth in the kube-system namespace. On a fresh cluster this ConfigMap typically holds only the mapping for the worker node role.
You can map additional users or roles to the system:masters group, or to any other group in the cluster. All you need to do is follow the step-by-step AWS documentation.
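For illustration, here is a minimal Python sketch that renders an aws-auth manifest from a list of role mappings. The first mapping is the worker node role that a fresh cluster usually contains, and the second maps an extra role to system:masters. The account ID 111122223333 and both role names are placeholders, and the helper render_aws_auth is hypothetical, not part of any AWS tooling.

```python
def render_aws_auth(role_mappings):
    """Render an aws-auth ConfigMap manifest (as YAML text) from role mappings."""
    lines = [
        "apiVersion: v1",
        "kind: ConfigMap",
        "metadata:",
        "  name: aws-auth",
        "  namespace: kube-system",
        "data:",
        "  mapRoles: |",  # mapRoles is a YAML document stored as a block string
    ]
    for mapping in role_mappings:
        lines.append(f"    - rolearn: {mapping['rolearn']}")
        lines.append(f"      username: {mapping['username']}")
        lines.append("      groups:")
        for group in mapping["groups"]:
            lines.append(f"        - {group}")
    return "\n".join(lines) + "\n"


# Placeholder values: replace the account ID and role names with your own.
mappings = [
    {
        # Worker node role, usually the only entry on a fresh cluster.
        "rolearn": "arn:aws:iam::111122223333:role/example-node-role",
        "username": "system:node:{{EC2PrivateDNSName}}",
        "groups": ["system:bootstrappers", "system:nodes"],
    },
    {
        # Additional role mapped to cluster administrators.
        "rolearn": "arn:aws:iam::111122223333:role/example-admin-role",
        "username": "example-admin",
        "groups": ["system:masters"],
    },
]

manifest = render_aws_auth(mappings)
print(manifest)
```

Generating the manifest from data instead of editing it by hand keeps the indentation consistent, which is exactly the kind of detail that is easy to get wrong in a manual edit.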
I applied a malformed configmap
If you apply a malformed ConfigMap, all the roles and users defined in it will immediately lose access to the cluster, and you will get the following error when trying to reach the K8s API with kubectl or any other client.
Unauthorized or access denied
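One way to lower the odds of this happening is to check the mappings before applying them. The sketch below assumes the mapRoles content has already been parsed into Python lists and dicts (for example with a YAML library). The function name find_mapping_problems and the rules it enforces are my own conservative assumptions, not an official schema.

```python
import re

# Loose pattern for an IAM role ARN; also accepts other partitions such as aws-cn.
ROLE_ARN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/.+$")


def find_mapping_problems(entries):
    """Return a list of human-readable problems; an empty list means no issue found."""
    if not isinstance(entries, list):
        return ["mapRoles must be a list of mappings"]
    problems = []
    for index, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {index}: not a mapping")
            continue
        rolearn = entry.get("rolearn")
        if not isinstance(rolearn, str) or not ROLE_ARN.match(rolearn):
            problems.append(f"entry {index}: rolearn is missing or not an IAM role ARN")
        username = entry.get("username")
        if not isinstance(username, str) or not username:
            problems.append(f"entry {index}: username is missing or empty")
        groups = entry.get("groups")
        if not isinstance(groups, list) or not all(isinstance(g, str) for g in groups):
            problems.append(f"entry {index}: groups must be a list of strings")
    return problems


good = [{
    "rolearn": "arn:aws:iam::111122223333:role/example-admin-role",  # placeholder
    "username": "example-admin",
    "groups": ["system:masters"],
}]
bad = [{"rolearm": "oops", "username": "", "groups": "system:masters"}]  # typo on purpose

print(find_mapping_problems(good))
print(find_mapping_problems(bad))
```

A check like this could run in a pipeline step before kubectl apply, so a typo is caught before it reaches the cluster.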
What can I do?
OK, do not panic. By default, the IAM user or role that you used to create the cluster is automatically granted system:masters permissions in the cluster’s role-based access control (RBAC) configuration in the Amazon EKS control plane.
So what you can do is perform an sts:AssumeRole with that role or, if it was an IAM user, create programmatic access (an access key) for it, then log in to the cluster and fix the ConfigMap.
Sounds easy, right? Up to this point, yes. It should not be that messy, since you can restore access pretty fast.
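As a rough sketch of that recovery path, the snippet below only builds the commands; it does not run them. It assumes the creator was an IAM role that your current credentials are allowed to assume. The cluster name, region and role ARN are placeholders, and the helper recovery_commands is hypothetical.

```python
import shlex


def recovery_commands(cluster_name, region, creator_role_arn):
    """Build the CLI commands to reach the cluster as the creator role."""
    return [
        # Write a kubeconfig entry whose tokens are issued for the creator role.
        ["aws", "eks", "update-kubeconfig",
         "--name", cluster_name,
         "--region", region,
         "--role-arn", creator_role_arn],
        # Inspect the broken ConfigMap first.
        ["kubectl", "get", "configmap", "aws-auth",
         "--namespace", "kube-system", "--output", "yaml"],
        # Then fix it in place.
        ["kubectl", "edit", "configmap", "aws-auth",
         "--namespace", "kube-system"],
    ]


# Placeholder values: replace with your own cluster, region and role.
commands = recovery_commands(
    "example-cluster",
    "us-east-1",
    "arn:aws:iam::111122223333:role/example-creator-role",
)
for command in commands:
    print(shlex.join(command))
```

Printing the commands instead of executing them lets you review each step before touching a cluster that is already in a bad state.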
I don’t know which IAM entity was used to create the cluster
OK, this is exactly what happened to me, and it was scary.
Let’s read this paragraph from the official AWS documentation.
This IAM entity doesn’t appear in any visible configuration, so make sure to keep track of which IAM entity originally created the cluster.
OMG, this is not good. You just closed the door and left the keys inside.
What can I do?
OK, cross your fingers. I hope you have enabled logging on your control plane (the authenticator log type in particular), because Amazon CloudWatch is going to save your soul.
You can run the following query in CloudWatch Logs Insights to search the logs for the kubernetes-admin user, and you will be able to see the creator in the query results.
fields @logStream, @timestamp, @message | sort @timestamp desc | filter @logStream like /authenticator/ | filter @message like "username=kubernetes-admin" | limit 50
Now that you know which role was used, you can just assume that role and fix the mess.
Here you can find a lot of useful queries to run on your control plane logs.
https://aws.amazon.com/premiumsupport/knowledge-center/eks-get-control-plane-logs/
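If you would rather run the kubernetes-admin query from a script than from the console, here is a minimal sketch. It assumes a boto3 CloudWatch Logs client (created with boto3.client("logs")) is passed in, and that the control plane logs live in the usual /aws/eks/<cluster-name>/cluster log group. The function name run_creator_query and the seven-day lookback default are my own choices.

```python
import time

# Same query as above, split across lines for readability.
QUERY = (
    "fields @logStream, @timestamp, @message"
    " | sort @timestamp desc"
    " | filter @logStream like /authenticator/"
    ' | filter @message like "username=kubernetes-admin"'
    " | limit 50"
)


def run_creator_query(logs_client, cluster_name,
                      lookback_seconds=7 * 24 * 3600, poll_seconds=1.0):
    """Run the query against the cluster's log group and return rows as dicts."""
    end = int(time.time())
    started = logs_client.start_query(
        logGroupName=f"/aws/eks/{cluster_name}/cluster",
        startTime=end - lookback_seconds,
        endTime=end,
        queryString=QUERY,
    )
    # Logs Insights queries are asynchronous, so poll until the query finishes.
    while True:
        response = logs_client.get_query_results(queryId=started["queryId"])
        if response["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(poll_seconds)
    if response["status"] != "Complete":
        raise RuntimeError(f"query ended with status {response['status']}")
    # Each row arrives as a list of {"field": ..., "value": ...} cells.
    return [{cell["field"]: cell["value"] for cell in row}
            for row in response["results"]]


# Usage (needs AWS credentials and boto3 installed):
#   import boto3
#   rows = run_creator_query(boto3.client("logs"), "example-cluster")
#   for row in rows:
#       print(row["@message"])
```

The creator should show up in the @message field of the returned rows. If nothing comes back, try a longer lookback window before giving up.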
I don’t have logging enabled, what can I do?
I hope you have a support plan with AWS. Create a ticket with them and wait for a solution.
Originally published at https://www.linkedin.com.