Securing Multi-Cluster ArgoCD
I'm on the final chapters of Kubernetes: An Enterprise Guide, 3rd Edition. Just as in previous editions, we're finishing the book by building out a platform based on what we learned throughout the book. In the first two editions, we built a GitOps platform using ArgoCD in a single cluster. In the third edition, we're using three separate clusters: control plane, development, and production. Each tenant in our platform will get a namespace in both development and production, and each namespace will run a virtual cluster. In previous attempts at a similar architecture, I deployed an ArgoCD instance with each vCluster. While this gave tenants quite a bit of control, it consumes a lot of extra resources and is one more system that needs to be maintained for each tenant. Since we were already deploying a centralized Vault and OpenUnison, it would be better to deploy a single, centralized ArgoCD that could manage each tenant's vClusters remotely.

High level platform architecture from Kube: Enterprise Guide 3rd Ed
ArgoCD already has the ability to add a remote cluster for management, but there's a critical issue with how it does this: it uses a ServiceAccount and an associated token to authenticate to the remote cluster. We've blogged before about why this is an anti-pattern. ServiceAccount tokens were never designed to be used from outside the cluster. Since Kubernetes 1.24, the standard has been to generate a token with an expiration, but that means you need a rotation strategy, and if you were to lose that token, there's no way to invalidate it. Ideally, we want to generate an identity for our remote cluster based on an existing identity in the control plane cluster.
I wanted my control plane ArgoCD to communicate with clusters via a very short lived token. The token should be tightly scoped, and the remote cluster should be able to accept the token without a pre-shared secret. I'd also like to be able to rotate the key used to sign the token on a regular basis without having to update the downstream cluster. After digging into how ArgoCD works, it became apparent that I had all the pieces I needed; I just had to make them work together.
Part I - Cluster Management
The first component I needed was a secure way to generate a token for my remote cluster. I had already deployed OpenUnison's Namespace as a Service to the control plane cluster and integrated my development cluster. OpenUnison needs to be able to call the remote API in the same way ArgoCD does, and we accomplish this by deploying a kube-oidc-proxy on the managed cluster that trusts the control plane OpenUnison. This way, OpenUnison can generate a token that can manage the remote cluster. The remote cluster trusts the control plane OpenUnison the same way an on-premises cluster trusts a remote identity provider: by ingesting an OIDC discovery document that includes the public keys needed to validate the tokens OpenUnison generates.

When OpenUnison needs to call the remote cluster's API, it generates a token with a one-minute lifetime that's scoped to the kube-oidc-proxy. The proxy is only able to impersonate a specific identity that has cluster-admin access. The proxy authenticates the request using the short lived token before injecting the impersonation headers and forwarding it to the API server. This lets us securely manage remote clusters without needing a long lived token.
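To make "one minute lived" concrete, here's a small sketch (not OpenUnison's actual code; the issuer URL is illustrative) of what the token's claims look like: the exp claim sits sixty seconds after iat. You can decode a real token's middle segment the same way to verify its lifetime.

```shell
# Build a sample JWT payload the way you'd inspect a real one from
# OpenUnison: the exp claim is 60 seconds after iat.
NOW=$(date +%s)
PAYLOAD=$(printf '{"iss":"https://k8sou.idp-cp.tremolo.dev/auth/idp","iat":%s,"exp":%s}' "$NOW" "$((NOW + 60))")

# JWT segments are base64-encoded; encode, then decode to inspect the claims
ENCODED=$(printf '%s' "$PAYLOAD" | base64 -w0)
printf '%s' "$ENCODED" | base64 -d
echo
```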
Now that we have a way to generate tokens that match our criteria, we need ArgoCD to know how to use them.
Part II - ArgoCD and Remote Cluster Authentication
After digging through ArgoCD's documentation, I found that the ApplicationSet operator can create a remote cluster in ArgoCD based on a Secret. ArgoCD uses the client-go SDK for Kubernetes, which supports configuring credential plugins. This is how you might use kubectl with cloud hosted clusters if you're using their native IAM integration, and the ArgoCD docs provide examples for how to do this with the major clouds. There are some drawbacks to relying on a cloud's IAM for Kubernetes:
If your clusters aren't all on the same cloud, you won't get very far with this approach
Cloud IAM permissions don't always line up well with Kubernetes RBAC
Cloud IAM won't work for on-premises clusters
You, as the Kubernetes team, may not have the ability to manage cloud IAM roles on your own
Since ArgoCD already has a mechanism to use custom credentials and an identity provided by its cluster, I next needed to figure out how to get the identity and tell ArgoCD to use it. Luckily, OpenUnison has me covered!
Part III - Getting a Token
OpenUnison makes it really easy to create an API that I can use to generate a token. Almost any OpenUnison component can be customized via JavaScript. In this case, we're going to define an Application object (in OpenUnison, not ArgoCD) that takes the name of a registered remote cluster, generates a token, and returns it as the response to the call. If you're thinking "wow, that could really be abused", you're right! In order to make sure that only ArgoCD can call our service, we'll want to validate ArgoCD's token to make sure it's bound to a running Pod using a TokenReview request. Thankfully, OpenUnison does this right out of the box; it's how OpenUnison validates Prometheus when it calls the OpenUnison metrics endpoint. We're now going to create a pretty simple API:
---
apiVersion: openunison.tremolo.io/v1
kind: Application
metadata:
  name: get-target-token
  namespace: openunison
spec:
  azTimeoutMillis: 3000
  isApp: true
  urls:
  - hosts:
    - "#[OU_HOST]"
    filterChain:
    - className: com.tremolosecurity.proxy.filters.JavaScriptFilter
      params:
        javaScript: |-
          GlobalEntries = Java.type("com.tremolosecurity.server.GlobalEntries");
          HashMap = Java.type("java.util.HashMap");

          function initFilter(config) {
          }

          function doFilter(request,response,chain) {
            var targetName = request.getParameter("targetName").getValues().get(0);
            var k8s = GlobalEntries.getGlobalEntries().getConfigManager().getProvisioningEngine().getTarget(targetName).getProvider();
            response.getWriter().print(k8s.getAuthToken());
          }
    uri: /api/get-target-token
    azRules:
    - scope: filter
      constraint: (sub=system:serviceaccount:argocd:argocd-application-controller)
    authChain: oauth2k8s
    results: {}
  cookieConfig:
    sessionCookieName: tremolosession
    domain: "#[OU_HOST]"
    secure: true
    httpOnly: true
    logoutURI: "/logout"
    keyAlias: session-unison
    timeout: 1
    scope: -1
    cookiesEnabled: false
This Application has a single endpoint that uses JavaScript to look up the target (cluster) and generate a token. The authChain makes sure the API is authenticated with a valid ServiceAccount token from a running Pod, and the authorization rule makes sure that only the ArgoCD application controller can call this endpoint. For instance, if someone were to gain control of the ArgoCD UI, which has its own identity, they couldn't call this service to get tokens.
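To illustrate what that authorization rule buys us, here's a toy model of the subject comparison it performs (this is not OpenUnison's implementation, and the UI's ServiceAccount name argocd-server is an assumption): the application controller's identity passes, any other subject is denied.

```shell
# Toy model of the azRules constraint: authorize only when the validated
# token's subject matches the application controller's ServiceAccount.
check_sub() {
  if [ "$1" = "system:serviceaccount:argocd:argocd-application-controller" ]; then
    echo "authorized"
  else
    echo "denied"
  fi
}

check_sub "system:serviceaccount:argocd:argocd-application-controller"  # prints "authorized"
check_sub "system:serviceaccount:argocd:argocd-server"                  # prints "denied"
```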
Once I have an endpoint, I need a way for kubectl to call it. It turns out you can write a credential provider in anything, even bash! So I wrote a really simple provider that uses curl to call our endpoint with the right data for our remote cluster, using ArgoCD's identity:
#!/bin/bash

# $1 - OpenUnison host, $2 - target cluster name, $3 - path to the caller's
# projected ServiceAccount token. Emits an ExecCredential for client-go.
REMOTE_TOKEN=$(curl -H "Authorization: Bearer $(<"$3")" "https://$1/api/get-target-token?targetName=$2" 2>/dev/null)
echo -n "{\"apiVersion\": \"client.authentication.k8s.io/v1\",\"kind\": \"ExecCredential\",\"status\": {\"token\": \"$REMOTE_TOKEN\"}}"
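Before wiring this into a kubeconfig, one way to sanity-check the plugin's output offline is to stub curl with a shell function (the token, hostname, and target name below are all made up):

```shell
# Stub curl so the plugin logic can run without a live OpenUnison; a real
# call would return a short-lived JWT.
curl() { printf 'fake.jwt.token'; }

printf 'sa-token' > /tmp/sa-token   # stand-in for the projected SA token

# Same logic as remote-token.sh, using the stubbed curl
REMOTE_TOKEN=$(curl -H "Authorization: Bearer $(</tmp/sa-token)" "https://openunison.example.com/api/get-target-token?targetName=my-cluster" 2>/dev/null)
CRED="{\"apiVersion\": \"client.authentication.k8s.io/v1\",\"kind\": \"ExecCredential\",\"status\": {\"token\": \"$REMOTE_TOKEN\"}}"

# Verify it's valid JSON in the ExecCredential shape client-go expects
printf '%s' "$CRED" | python3 -m json.tool
```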
The credential plugin just receives arguments like any other command line tool. I built a simple kubectl configuration file to test it, and it worked great!
apiVersion: v1
kind: Config
users:
- name: openunison-control-plane
  user:
    exec:
      command: /path/to/remote-token.sh
      apiVersion: "client.authentication.k8s.io/v1"
      env: []
      args:
      - k8sou.idp-cp.tremolo.dev
      - k8s-kubernetes-satelite
      - /tmp/token
      installHint: |
        copy shell file
      provideClusterInfo: false
      interactiveMode: Never
clusters:
- name: kubernetes-satelite
  cluster:
    server: https://oumgmt-proxy.idp-dev.tremolo.dev
    extensions:
    - name: client.authentication.k8s.io/exec
contexts:
- name: openunison-control-plane@kubernetes-satelite
  context:
    cluster: kubernetes-satelite
    user: openunison-control-plane
current-context: openunison-control-plane@kubernetes-satelite
Now that I have my credential plugin, I need to get it into ArgoCD.
Part IV - Deployment
The great thing about containers is, well, they're self contained. That's also a problem: in addition to the bash script, I needed to get curl into the controller's container. Thankfully, ArgoCD's Helm chart makes it easy to add volumes and init containers, so I updated my values.yaml so that the controller downloads a static curl build and copies my script into the appropriate place in the container:
controller:
  volumes:
  - name: custom-tools
    emptyDir: {}
  - name: remote-tokens
    configMap:
      name: argocd-remote-tokens
  volumeMounts:
  - mountPath: /custom-tools
    name: custom-tools
  initContainers:
  - name: downloadtools
    image: alpine
    command: [sh, -c]
    args:
    - wget -O /custom-tools/curl https://github.com/moparisthebest/static-curl/releases/download/v8.7.1/curl-amd64 && chmod +x /custom-tools/curl && cp /remote-tokens/remote-token.sh /custom-tools && chmod +x /custom-tools/remote-token.sh
    volumeMounts:
    - mountPath: /custom-tools
      name: custom-tools
    - mountPath: /remote-tokens
      name: remote-tokens
I can run my script manually and get a token from inside the ArgoCD controller Pod! Finally, it's time to tell ArgoCD to sync some YAML!
Part V - Configuration
The first step to integrating with our cluster is to generate a Secret that stores our cluster connection information:
---
apiVersion: v1
kind: Secret
metadata:
  name: k8s-kubernetes-satelite
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
    tremolo.io/clustername: k8s-kubernetes-satelite
type: Opaque
stringData:
  name: k8s-kubernetes-satelite
  server: https://oumgmt-proxy.idp-dev.tremolo.dev
  config: |
    {
      "execProviderConfig": {
        "command": "/custom-tools/remote-token.sh",
        "args": ["k8sou.idp-cp.tremolo.dev","k8s-kubernetes-satelite","/var/run/secrets/kubernetes.io/serviceaccount/token"],
        "apiVersion": "client.authentication.k8s.io/v1"
      },
      "tlsClientConfig": {
        "insecure": false,
        "caData": "LS0tLS1C..."
      }
    }
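One note on the caData field: it's nothing more than the remote endpoint's CA certificate PEM, base64-encoded onto a single line. Assuming you've saved that CA to ca.crt, producing the value looks like this (the certificate content below is a placeholder):

```shell
# Placeholder standing in for the real kube-oidc-proxy CA certificate
printf -- '-----BEGIN CERTIFICATE-----\nMIIB...placeholder...\n-----END CERTIFICATE-----\n' > ca.crt

# base64-encode without line wrapping; paste the result into caData
CA_DATA=$(base64 -w0 < ca.crt)
echo "$CA_DATA"
```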
We're telling ArgoCD to use our script and giving it the information it needs to generate a token. The great thing is that while this is stored in a Secret, there's nothing really secret here! No credentials, keys, etc. If you just create this Secret though, you won't find our new cluster in the ArgoCD interface. You still need to deploy an ApplicationSet for the operator to pick it up. Here's my ApplicationSet:
---
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: test-remote-cluster
  namespace: argocd
spec:
  goTemplate: true
  goTemplateOptions: ["missingkey=error"]
  generators:
  - clusters:
      selector:
        matchLabels:
          tremolo.io/clustername: k8s-kubernetes-satelite
  template:
    metadata:
      name: '{{.name}}-guestbook' # 'name' field of the Secret
    spec:
      project: "default"
      source:
        repoURL: https://github.com/mlbiam/test-argocd-repo.git
        targetRevision: HEAD
        path: yaml
        directory:
          recurse: true
      destination:
        server: '{{.server}}' # 'server' field of the Secret
        namespace: myns
The magic happens because we specify a cluster via label matching. The labels in the ApplicationSet's cluster generator line up with the labels on our cluster Secret. Now that all of our objects are in place, we can test to see if this process works.
Part VI - Synchronization
I waited a minute to let everything catch up (eventual consistency is a lie!). I logged into ArgoCD and BAM! I now have an Application object, a cluster, and a synchronized repository!

ArgoCD synchronizing from a git repo to a remote repository using no static keys or credentials

ArgoCD registered a remote cluster without static credentials
So what's the full process? Take a look at the diagram below:

ArgoCD syncing to a remote cluster with a short lived token.
1. ArgoCD runs a sync process and needs to interact with the remote cluster. Since the cluster is configured with a client-go credential plugin, the application controller calls our API with the Pod's projected ServiceAccount token.
2. OpenUnison submits a TokenReview to validate the token.
3. The API server responds. If the token has expired, or is bound to a Pod that no longer exists, this step fails.
4. OpenUnison generates a token, signed by its private key, that will be trusted by our remote cluster's kube-oidc-proxy.
5. The client-go SDK uses the token returned by the credential plugin to synchronize our git repo into the remote cluster.
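The TokenReview step above is an ordinary Kubernetes API call. Here's a sketch of the object OpenUnison effectively submits to the control plane API server (the token value is a placeholder):

```yaml
apiVersion: authentication.k8s.io/v1
kind: TokenReview
spec:
  # the projected ServiceAccount token presented by the ArgoCD
  # application controller
  token: eyJhbGciOi...
```

The API server's response sets status.authenticated, and because projected tokens are bound to their Pod, validation fails once that Pod is gone.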