Asimov's mission is to program living cells to create previously impossible biotechnologies. We have created a cell engineering platform combining synthetic biology, artificial intelligence and design automation.
As a new startup, one of our first challenges was setting up a software infrastructure where building, launching and scaling services is a simple process for developers. To this end, we chose to launch a managed Kubernetes cluster atop Google's Kubernetes Engine (GKE).
Kubernetes makes it quite easy to deploy Docker containers, expose them as services to the internet at large using a concept called an "Ingress", and scale those services over time. However, in many situations we want to launch services with limited access, such as internal tools or dashboards that only Asimov employees should reach.
Traditionally, securing an application like this is achieved through some combination of the following technologies:
Virtual Private Network (VPN): Users connect to the cluster, provide some credentials and are then able to access internal tools.
Single Sign-On: A tool like Kerberos allows you to use the same account across various components.
Home-grown user accounts: You implement an authentication system and users have a separate username/password for your computing infrastructure.
However, following the BeyondCorp security model, we’d like to support access to our infrastructure from any machine, without installing VPN clients, Kerberos clients, etc. For this, we can use Google Identity-Aware Proxy (IAP).
IAP is a beta service which requires requests to your service to authenticate with a Google Account. Once a user has authenticated and their account has been checked against an authorized list, the request is forwarded to your service.
Finally, we want this process to scale as we create additional publicly available web applications. When it comes to URLs, we'll want to be able to host an arbitrary number of applications at *.asimov.io domains. We also want to make sure that we can reuse this authentication and certificate infrastructure for each new endpoint. For this, we'll use Ambassador, a Kubernetes API gateway by Datawire.
Publicly facing Kubernetes applications are typically exposed through a resource called an Ingress. An Ingress is an abstraction for a gateway pointed at some Service in your cluster. Cloud providers of managed Kubernetes typically implement a controller using primitives available on their platform, such as Load Balancers. An example of a typical Ingress declaration might look like this:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: internal-ingress
spec:
  backend:
    serviceName: my-service
    servicePort: 80
This is a very basic Ingress, backed by an HTTP(S) load balancer.
The load balancer will receive a dynamic IP address; you can visit your service by entering that address into your browser. The IP address can be found using kubectl get ingress internal-ingress or by locating the load balancer in your cloud console.
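Assuming the Ingress from the example above, the assigned address can also be pulled out directly with a jsonpath query (a sketch against that example's resource name):

```shell
# Print the external IP the load balancer assigned to the Ingress
kubectl get ingress internal-ingress \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
```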
The next step is to create and import certificates. Certificates will help us ensure that:
A user’s browser session is secure.
Our load balancer becomes eligible for IAP, which requires an HTTPS load balancer.
There are a variety of Certificate Authorities who can provide these certificates. We'll use Let's Encrypt, a free certificate provider. It's simple to use, but there are some prerequisite tasks to complete before we can request our certificates.
We mentioned earlier that we want to support a potentially very large number of applications hosted at *.asimov.io domains. Rather than create a certificate for each subdomain, we can create a “wildcard certificate”, which will be valid for all subdomains of asimov.io. Let’s Encrypt offers such certificates, on the condition that you can prove you own the parent domain.
Additionally, we plan to route traffic for each subdomain through a single Ingress, so we need to make sure that requests to <subdomain>.asimov.io end up hitting the same Ingress. As a result, we’ll have to do some configuration to stabilize our IP Addresses.
Static IP Address
First, we need to give our Ingress a reserved IP Address (for more on reserved addresses, see this document). By default, an Ingress is assigned an ephemeral address, which may change as provisioning events occur, such as recreating your Ingress. A static address is a reserved address which your Ingress can always have assigned to it.
This is cloud-provider specific, but on Google Cloud, you can go to "External IP Addresses", find the one with your Ingress name, and change the type to "Static". You can then give the IP address a name, and change your Ingress to get that address. Kubernetes Engine has an annotation which allows you to get the address by name rather than specifying the IP address directly. For example, check out the annotation used below:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: internal-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: "internal-asimov"
spec:
  backend:
    serviceName: my-service
    servicePort: 80
At this point, we have an address which won't change. We also need to update our DNS to route to that static address. We created an A record which resolves the DNS name *.asimov.io to the static IP address we reserved above. This way, any address ending in .asimov.io will be routed to our Ingress.
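The same setup can be done from the command line. This is a hedged sketch: it assumes the "internal-asimov" name from the example above, a Cloud DNS managed zone named asimov-io, and a placeholder documentation IP (203.0.113.10) that you would replace with your load balancer's actual address.

```shell
# Promote the Ingress's ephemeral address to a reserved (static) one
gcloud compute addresses create internal-asimov \
  --global --addresses 203.0.113.10

# Point *.asimov.io at the reserved address via a wildcard A record
gcloud dns record-sets transaction start --zone asimov-io
gcloud dns record-sets transaction add "203.0.113.10" \
  --zone asimov-io --name "*.asimov.io." --type A --ttl 300
gcloud dns record-sets transaction execute --zone asimov-io
```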
Relevant documentation: http://docs.cert-manager.io/en/latest/tutorials/acme/dns-validation.html
Ok, at this point we have a domain that, when we visit it, points to our internal cluster. We're getting pretty close!
Next, we need to actually get the certificates. It's possible to upload and bind certificates manually. However, there is a tool called cert-manager which allows us to declare our certificate information and have it automatically renewed and imported as a secret. This is convenient, as an Ingress can terminate HTTPS connections by declaring the secret in the Ingress config.
So how do we get set up to create certificates from Let’s Encrypt?
cert-manager accomplishes this by creating resources on Kubernetes. The resources of interest are issuers and certificates.
Here's our issuer, which is responsible for creating/renewing certificates and updating the related secrets:
apiVersion: certmanager.k8s.io/v1alpha1
kind: Issuer
metadata:
  name: letsencrypt
  namespace: default
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: email@example.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt
    # ACME DNS-01 provider configurations
    dns01:
      # Here we define a list of DNS-01 providers that can solve DNS challenges
      providers:
        - name: prod-dns
          clouddns:
            # A secretKeyRef to a google cloud json service account
            serviceAccountSecretRef:
              name: cert-dns-service-account
              key: cert-manager-key.json
            # The project in which to update the DNS zone
            project: my-project
The entry above declares a provider named "prod-dns" and tells the issuer where to find the secret that allows that provider to manipulate DNS records. This service account must be able to modify the Cloud DNS records to prove to Let's Encrypt that we own the domain.
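Creating that service account and wiring it into the cluster might look like the following. This is a sketch: the account name is made up, the secret name and key filename match the Issuer above, and "my-project" is a placeholder project ID.

```shell
# Create a service account for cert-manager's DNS-01 challenges
gcloud iam service-accounts create cert-manager-dns \
  --display-name "cert-manager DNS-01 solver"

# Allow it to edit Cloud DNS records in the project
gcloud projects add-iam-policy-binding my-project \
  --member serviceAccount:cert-manager-dns@my-project.iam.gserviceaccount.com \
  --role roles/dns.admin

# Download a key and store it as the secret the Issuer references
gcloud iam service-accounts keys create cert-manager-key.json \
  --iam-account cert-manager-dns@my-project.iam.gserviceaccount.com
kubectl create secret generic cert-dns-service-account \
  --from-file cert-manager-key.json
```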
A note on providers: Let's Encrypt requires DNS-01 challenges for wildcard certificates, which limits the DNS providers one can use; other certificates can use a much broader range of challenge types. See this doc for a listing of supported providers.
Next, let's look at a certificate:
apiVersion: certmanager.k8s.io/v1alpha1
kind: Certificate
metadata:
  name: wildcard-asimov-io-tls
spec:
  secretName: wildcard-asimov-io-tls
  issuerRef:
    name: letsencrypt
  dnsNames:
    - '*.asimov.io'
  acme:
    config:
      - dns01:
          provider: prod-dns
        domains:
          - '*.asimov.io'
This resource declares a certificate. We tell it to store certificates in a secret named “wildcard-asimov-io-tls”, that the issuer responsible for this certificate is named “letsencrypt”, to get us a wildcard cert for the “*.asimov.io” subdomain, and to use the provider named “prod-dns” (see above) for that domain.
Once these resources are configured and the certificate has been imported as a secret, we can update our Ingress to terminate HTTPS connections:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: internal-ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: "internal-asimov"
spec:
  tls:
    - secretName: wildcard-asimov-io-tls
  backend:
    serviceName: my-service
    servicePort: 80
By adding this tls section, our Ingress will now support HTTPS connections.
Relevant documentation: https://cloud.google.com/iap/docs/enabling-kubernetes-howto
Now that we have HTTPS and a domain, our Ingress is eligible for IAP!
Go to "Identity-Aware Proxy" in your cloud console. The left panel should list all "eligible" endpoints, essentially HTTPS load balancers. On the right panel, you can control who has access to resources protected by IAP. You can add individual users, or an entire organization. We've added "asimov.io" so that anybody with an account in our organization can access these systems.
You can toggle the IAP button to "on" for your ingress (ours is labeled "my-service", which you can see as the configured backend in our Ingress).
Routing and Verifying Requests
So we're done, right? Not quite. We need to verify that requests are coming from IAP.
The linked document has some examples of how to do this. However, this sets us up for a process that doesn't scale: for each exposed service, we have to create an Ingress (which takes time), turn on IAP, and implement request verification? This isn't great.
If you're anything like me, your yak shaving instincts have already pushed you down the path of designing and implementing a proxy server. You'll have your Ingress point to that service, that service will verify the request, and then route the request to the system behind it. But then you start thinking about how maybe it's not the best idea to reinvent the wheel, how you're going to fan requests out to the appropriate backend services, and how somebody else must have had this problem...
Thankfully, somebody has. Datawire has an excellent tool called Ambassador which does all of these things and more.
I consider Ambassador the key piece which makes IAP a scalable authentication system. Here's how we use it at Asimov:
Instead of putting each backend service behind its own Ingress, we put a single Ambassador service behind the Ingress. Then, each new service we spin up that needs to be exposed on the public internet requires only two steps:
Add an Ambassador annotation to the service definition.
Add the new subdomain to the IAP console (this is under "Edit OAuth Client" on the IAP page).
Here's an example of a service with an ambassador annotation:
apiVersion: v1
kind: Service
metadata:
  name: landing-page
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v0
      kind: Mapping
      name: landing-page-mapping
      prefix: /
      service: landing-page.default.svc.cluster.local.:4567
      host: internal.asimov.io
spec:
  ports:
    - protocol: TCP
      port: 4567
  selector:
    app: landing-page
Here, we're creating a service that is backed by pods with the label app: landing-page. The ambassador config says "any request that is trying to access a route on the domain 'internal.asimov.io' should be routed to the service 'landing-page.'" There are many flexible routing options, but our use case is typically to use host routing.
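To illustrate how host-based routing scales, here is what a hypothetical second service might look like; the "dashboard" name, subdomain, and port are made up for this example:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: dashboard
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v0
      kind: Mapping
      name: dashboard-mapping
      prefix: /
      service: dashboard.default.svc.cluster.local.:3000
      host: dashboard.asimov.io
spec:
  ports:
    - protocol: TCP
      port: 3000
  selector:
    app: dashboard
```

No new Ingress, certificate, or load balancer is required; the existing wildcard certificate already covers the new subdomain.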
Heimdall (Authentication Server)
Now, I mentioned that Ambassador supports an authentication hook. This doesn't mean that Ambassador performs request verification for us: the details of request authentication vary by implementation, so we need to write some custom logic for our configuration. Ambassador allows us to supply a service that receives the headers for each request and tells Ambassador whether that request is authenticated.
We created a "reusable" service for this called Heimdall. Heimdall is a configurable Java service which listens for authentication requests and verifies the signed JSON Web Token (JWT) that IAP attaches to each request, which contains authentication information that must be parsed and validated.
We have a repository containing an implementation of Heimdall here.
While most of the implementation is straightforward (in fact, the authentication code itself is a slightly modified version of the excellent example here), I'll walk through some considerations for running it.
The application requires an environment variable called "com.google.iap.audience", in the form /projects/PROJECT_NUMBER/global/backendServices/SERVICE_ID. You can get this audience from the IAP page: next to the resource you've secured, click the dropdown and then "Signed Header JWT Audience". You can set this environment variable in your deployment spec.
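A deployment spec carrying that variable might look like the following sketch; the image name is a placeholder, and the audience value keeps the placeholder form described above:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: heimdall
spec:
  replicas: 1
  selector:
    matchLabels:
      app: heimdall
  template:
    metadata:
      labels:
        app: heimdall
    spec:
      containers:
        - name: heimdall
          image: my-registry/heimdall:latest  # placeholder image
          ports:
            - containerPort: 8080
          env:
            # The JWT audience Heimdall should accept
            - name: com.google.iap.audience
              value: /projects/PROJECT_NUMBER/global/backendServices/SERVICE_ID
```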
Here is the service definition for Heimdall:
apiVersion: v1
kind: Service
metadata:
  name: heimdall
  annotations:
    getambassador.io/config: |
      ---
      apiVersion: ambassador/v0
      kind: AuthService
      name: authentication
      auth_service: "heimdall.default.svc.cluster.local.:8080"
      allowed_headers:
        - "x-iap-user-email"
spec:
  ports:
    - protocol: TCP
      port: 8080
  selector:
    app: heimdall
This looks similar to most Ambassador configs, although it is not a route definition but a declaration that the "heimdall" service backs authentication. Depending on your namespacing requirements, you may want to adjust the "auth_service" route. We also add an "allowed_headers" section, because Heimdall sets the "x-iap-user-email" header to the authenticated user's email for convenience.
This is mentioned in the README, but the health check situation bears explanation. Heimdall has three routes:
The first route supports the health checks Google Cloud makes against its load balancers: Heimdall lets these requests through to a service listening on this route, which returns 200 OK to the health check.
By default, Google Cloud sends their healthcheck to ‘/’ on the backing service. This isn’t compatible with routing services behind a proxy server which performs authentication, because:
Most systems will want authentication on their ‘/’ route.
The health check does not include authentication.
This is why Heimdall supports a separate "passthrough" route which skips authentication. As a result, it is necessary to point the load balancer health checks at this route; by default they hit '/', which would mean you couldn't turn on authentication for requests to that same URL, which probably isn't desirable.
It is also necessary to have some system backing this route; if you don't, the load balancer will stop serving traffic, even if your other services are doing fine. The easiest path I've found to making this change is:
Go to "Load Balancing".
Click the name of the balancer containing your ingress name.
Under “backend services”, click the “health check” link to the right.
Edit the path to be “/load-balancer-health” and save.
The second route is a simple health check for Heimdall itself to prove it is up.
By default, every other route is treated as a request for authentication. Ambassador forwards the headers of requests for your proxied resources to this route, which unpacks and verifies the JWT.
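The checks that verification must perform can be sketched in a few lines. This is a simplified illustration, not Heimdall's actual code: it uses HS256 with a shared secret so the sketch is self-contained, whereas real IAP tokens arrive in the x-goog-iap-jwt-assertion header, are signed with ES256, and must be verified against Google's published public keys.

```python
import base64
import hashlib
import hmac
import json
import time

def b64url_decode(data: str) -> bytes:
    # JWT segments are unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))

def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def make_token(claims: dict, key: bytes) -> str:
    # Mint a signed JWT (HS256, purely so this sketch can test itself).
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(key, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(sig)}"

def verify_iap_jwt(token: str, key: bytes, expected_audience: str) -> dict:
    """Perform the checks an auth service must make on each request:
    validate signature, issuer, audience, and expiry, then return claims."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected_sig = hmac.new(key, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected_sig, b64url_decode(sig_b64)):
        raise ValueError("invalid signature")
    claims = json.loads(b64url_decode(payload_b64))
    if claims.get("iss") != "https://cloud.google.com/iap":
        raise ValueError("unexpected issuer")
    if claims.get("aud") != expected_audience:
        raise ValueError("audience mismatch")
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims

# Usage: mint a token shaped like IAP's, then verify it.
key = b"shared-secret-for-sketch-only"
audience = "/projects/PROJECT_NUMBER/global/backendServices/SERVICE_ID"
token = make_token(
    {
        "iss": "https://cloud.google.com/iap",
        "aud": audience,
        "exp": int(time.time()) + 600,
        "email": "user@asimov.io",
    },
    key,
)
claims = verify_iap_jwt(token, key, audience)
print(claims["email"])  # the value forwarded onward as x-iap-user-email
```

On success, the email claim is what gets surfaced to backends via the "x-iap-user-email" header; on any failure, the service rejects the request and Ambassador never forwards it.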
This infrastructure has made exposing various web interfaces a breeze, and the components involved have been impressively robust. We ended up with something like this:
There are still some ways this setup could improve:
The IAP access list is a great start for controlling access to resources, but if you need more granular control on a per route basis, you'll need to invest in a more flexible solution.
It would be really nice if IAP could be modified programmatically. As of this writing, there's not a great way to set up IAP via the API.
It can be turned on, but you have to generate the OAuth credentials yourself.
Additionally, you can't modify the "supported routes" to add a new route to the backend via the API.
https://cloud.google.com/iap/docs/sessions-howto is a great resource for managing the authentication session in more dynamic setups, such as applications making use of AJAX; we see occasional hiccups with third-party software that has dynamic interfaces.
Ambassador is excellent and improving every day, but we do wish we could turn the access logs off or down; Google Cloud health checks generate very frequent traffic, so these logs are fairly verbose.
We also found that the statsd container that ships with Ambassador causes issues for our DNS if we don't have a stats sink deployed. They recommend removing the container if you aren't collecting stats, and I'd reinforce that point.
Oathkeeper is an interesting self-hosted alternative to IAP for more granular control with less vendor lock-in.
Communications between services within the cluster should also be secured.
Finally, if you'd like to discuss infrastructure or are interested in an engineering position at Asimov, feel free to reach out to me at "brian at asimov dot io".