One of the great things about Kubernetes is that it abstracts away the underlying compute so that we only have to worry about our application workload running on the cluster. But there might be a chance that you need to connect directly to the underlying nodes in your Kubernetes cluster. If you manage your own cluster, that’s most likely as easy as just SSH’ing into your nodes. But when working with a managed cluster, such as Azure Kubernetes Service (AKS), it could be more complicated than that.
The Microsoft docs illustrate how to SSH into AKS nodes. One glance at that thorough documentation and you’ll realize that this is not a trivial operation. In fact, it’s quite long and tedious. Imagine having to do that more than once…
So instead of doing this manually, I decided to write a script to automate this process: az-aks-ssh (GitHub).
Note: this script is current in alpha and should not be run in a production environment.
Note 2: this currently only supports virtual machine scale set agent node pools.
The basic design of this process (whether manual or automated with the script) is that you create a pod in the AKS cluster and then kubectl exec
into the pod, and then SSH into the desired agent node. I added a few “features” to this script. One of those is unique SSH key generation and usage by node. It’s not a good idea to reuse SSH keys for multiple hosts/purposes, so this script takes care of that.
Here is a visual illustration on how this process works:
The usage of the script can be displayed by running it with no parameters:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
$ ./az-aks-ssh.sh
Usage:
SSH into an AKS agent node (pass in -c to run a single command
or omit for an interactive session):
./az-aks-ssh.sh \
-g|--resource-group <resource_group> \
-n|--cluster-name <cluster> \
-d|--node-name <node_name|any> \
[-c|--command <command>] \
[-o|--output-file <file>]
Delete all locally generated SSH keys (~/.ssh/az_aks_*):
./az-aks-ssh.sh --clear-local-ssh-keys
Delete the SSH proxy pod:
./az-aks-ssh.sh --delete-ssh-pod
Cleanup SSH (delete SSH proxy pod and remove all keys):
./az-aks-ssh.sh --cleanup
You have the ability to run a non-interactive command into the AKS cluster node:
1
2
3
4
5
$ ./az-aks-ssh.sh \
--resource-group thstringaks1 \
--cluster-name thstringaks1 \
--node-name any \
--command "hostname"
You’ll see similar output (as you can see below, this script is verbose by design so the user can see the process as it happens):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
Selected 'any' node name, getting the first node
Using node: aks-nodepool1-36584864-vmss000000
Found VMSS(es):
aks-nodepool1-36584864-vmss
aks-nodepool2-36584864-vmss
Found aks-nodepool1-36584864-vmss000000 in aks-nodepool1-36584864-vmss
Key doesn't exist. Creating new key: /home/trstringer/.ssh/aks_ssh_aks-nodepool1-36584864-vmss000000
Instance ID is 0
Access extension does not exist or new key generated, adding to VM
Instance IP is 10.240.0.4
Error from server (NotFound): pods "aks-ssh-session" not found
Proxy pod doesn't exist, setting it up
pod/aks-ssh-session created
Waiting for proxy pod to be in a Running state
Waiting for proxy pod to be in a Running state
... apt output removed for brevity ...
Running command non-interactively
Warning: Permanently added '10.240.0.4' (ECDSA) to the list of known hosts.
Authorized uses only. All activity may be monitored and reported.
aks-nodepool1-36584864-vmss000000
You can see that the last line of the output is the command result (hostname
in this case).
Likewise, you can run an interactive session by omitting the --command
parameter:
1
2
3
4
$ ./az-aks-ssh.sh \
--resource-group thstringaks1 \
--cluster-name thstringaks1 \
--node-name any
You will then be in an interactive SSH session with the AKS node:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
Selected 'any' node name, getting the first node
Using node: aks-nodepool1-36584864-vmss000000
Found VMSS(es):
aks-nodepool1-36584864-vmss
aks-nodepool2-36584864-vmss
Found aks-nodepool1-36584864-vmss000000 in aks-nodepool1-36584864-vmss
Instance ID is 0
Access extension already exists
Instance IP is 10.240.0.4
NAME READY STATUS RESTARTS AGE
aks-ssh-session 1/1 Running 0 2m58s
... message of the day output removed for brevity ...
No command passed, running in interactive mode
Last login: Sun Apr 18 18:48:00 2021 from 10.240.0.7
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
azureuser@aks-nodepool1-36584864-vmss000000:~$
Now you can see that I am in an SSH session with this agent node. Like always, run exit
to leave the session.
One final note is that there are elements put in place to allow this process. Namely these are the generated SSH keys for connection to the nodes, as well as the aks-ssh-session
proxy pod that is used. You can remove these items by passing --cleanup
to the script:
1
2
3
4
$ ./az-aks-ssh --cleanup
Clearing local keys
Deleting SSH pod aks-ssh-session
pod "aks-ssh-session" deleted
Hopefully this script can make the task of SSH’ing into AKS nodes much simpler and automated for you as well!