Token invalidation in distributed systems

Building fast systems can be tricky, and distributed computing is often the approach taken to improve performance. This introduces unique security challenges - specifically around Authentication & Authorization. JSON Web Tokens, commonly used as OAuth 2 access tokens, help here because each service can validate them locally, but they make revocation harder.

Distributed Authentication at SEEK

SEEK is no stranger to the distributed computing model. Almost everything you access is being served up by dozens of micro-services written in different languages and maintained by different teams.

Sidecar: a cohesive task attached to the primary application - typically placed in its own process or container - providing a homogeneous interface for services irrespective of language.

At SEEK we handle authentication across these services by introducing a sidecar alongside our applications. The sidecar validates tokens, accepts or rejects inbound requests, and passes the user's context on to the application.

To serve the many languages that exist across applications at SEEK, the homogeneous interface of choice is HTTP. It has an implementation in nearly all the languages that we use, and headers provide a good way of annotating requests with additional information.
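As a rough illustration of passing user context through headers, here is a minimal sketch of how a sidecar might build the headers for the proxied request. The header names (X-User-Id, X-User-Scope) and the function name are assumptions made for this example, not the headers SEEK's sidecar actually uses.

```python
# Hypothetical mapping from JWT claims to request headers (assumed names).
CLAIM_HEADERS = {
    "sub": "X-User-Id",
    "scope": "X-User-Scope",
}


def forward_headers(inbound_headers: dict, claims: dict) -> dict:
    """Build headers for the proxied request from already-validated claims."""
    reserved = {name.lower() for name in CLAIM_HEADERS.values()}
    # Drop client-supplied copies of the reserved headers so that a caller
    # cannot spoof user context by sending them directly.
    outbound = {k: v for k, v in inbound_headers.items() if k.lower() not in reserved}
    for claim, header in CLAIM_HEADERS.items():
        if claim in claims:
            outbound[header] = str(claims[claim])
    return outbound
```

The application behind the sidecar then only needs to read plain HTTP headers, which is possible in any language.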

Another way to look at the Authentication Sidecar is as a proxy.

An example of the sidecar design pattern with two task-sets operating in an auto-scaling cloud environment. In this scenario, you can see a successful request being passed through the sidecar to the Application task & an unauthorised request being terminated at the sidecar.

In this scenario the Authentication Sidecar uses a cached JSON Web Key Set (JWKS) to validate the signature of the tokens. This enables distributed token validation without needing to call the Identity Provider on every request.
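The caching idea can be sketched as a small key cache that looks signing keys up by their "kid" and only calls the Identity Provider when the cache is empty, stale, or missing a key. This is an assumption-laden sketch: the class name, the injected fetch_jwks callable and the one-hour TTL are all hypothetical, and the signature check itself is left to a JWT library.

```python
import time
from typing import Callable, Optional


class JwksCache:
    """Caches a JSON Web Key Set and looks keys up by their `kid`."""

    def __init__(self, fetch_jwks: Callable[[], dict], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch_jwks = fetch_jwks  # hypothetical: fetches the JWKS document
        self._ttl = ttl_seconds        # assumed refresh interval
        self._clock = clock
        self._keys: dict = {}
        self._fetched_at: Optional[float] = None

    def _refresh(self) -> None:
        jwks = self._fetch_jwks()
        self._keys = {key["kid"]: key for key in jwks.get("keys", []) if "kid" in key}
        self._fetched_at = self._clock()

    def get_key(self, kid: str) -> Optional[dict]:
        stale = self._fetched_at is None or self._clock() - self._fetched_at > self._ttl
        if stale or kid not in self._keys:
            # An unknown `kid` may mean the provider rotated its keys, so refetch.
            self._refresh()
        return self._keys.get(kid)
```

A production version would also want to limit how often an unknown "kid" can trigger a refetch, otherwise requests carrying bogus key IDs could be used to hammer the Identity Provider.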

This model is fast and scalable but unfortunately introduces a risk: a blocked user can still perform requests up until the expiry of their JSON Web Token (JWT). Although the specification doesn't recommend an expiration time, it's common practice for tokens to be short-lived - around 15 minutes.

Token Revocation

Part of the OAuth 2 family of specifications is RFC 7009 - the Token Revocation specification. It describes a fairly basic process for revoking tokens.

The specification details an endpoint that a client can call and submit the token it wishes to revoke. Subsequent token validation requests will then be rejected as the token is now considered invalid.
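For reference, a revocation call under RFC 7009 is a form-encoded POST carrying a "token" parameter and an optional "token_type_hint". The sketch below only builds the request rather than sending it. The endpoint URL and client credentials are placeholders, and HTTP Basic is just one of the client authentication methods a server may accept.

```python
import base64
import urllib.parse
import urllib.request


def build_revocation_request(endpoint: str, token: str, client_id: str,
                             client_secret: str,
                             token_type_hint: str = "access_token") -> urllib.request.Request:
    """Build (but do not send) an RFC 7009 token revocation request."""
    body = urllib.parse.urlencode(
        {"token": token, "token_type_hint": token_type_hint}
    ).encode("ascii")
    # Client authentication via HTTP Basic; assumed here for simplicity.
    credentials = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Authorization": f"Basic {credentials}",
        },
    )


# Placeholder endpoint and credentials, for illustration only.
request = build_revocation_request(
    "https://idp.example.com/oauth2/revoke", "abc123", "client", "secret"
)
```

Passing the request to urllib.request.urlopen would send it. The server answers with HTTP 200 whether the token was revoked or was already invalid.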

This approach works well with opaque tokens, which are checked against the authorisation server when they are used, but not with JSON Web Tokens, which each service validates locally without contacting the server. This challenge exists at lots of companies - including SEEK. Our distributed system architecture makes it extremely difficult to efficiently verify tokens against a central source. This is why we choose to sign tokens with a short expiry and accept the risk that a blocked user can still make requests until that token expires.

Another challenge is providing a mechanism to match the token. JSON Web Tokens contain all the information needed to identify the user, but nothing that identifies the token itself. They're packed, signed & sent - not stored. This means that an additional claim must be added to identify the token - for this we have the JWT ID (the "jti" claim).
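To make the JWT ID concrete: it travels in the token's payload as the "jti" claim, so a validator can read it once the signature has been checked. A minimal sketch, assuming a compact three-segment token and a hypothetical helper name:

```python
import base64
import json
from typing import Optional


def read_jti(token: str) -> Optional[str]:
    """Return the `jti` claim of a compact JWT, or None if it is absent.

    This only decodes the payload. It does NOT verify the signature, so it
    must only be used on tokens that have already been validated.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a compact token with three segments")
    payload_b64 = parts[1]
    # base64url segments in a JWT have their padding stripped; restore it.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims.get("jti")
```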

Distributed Token Invalidation

Invalidating tokens in a distributed system can be tricky, but we can use a bloom filter to significantly reduce the overhead of checking invalidated JWT IDs against a central source.

Tokens are typically short-lived, so we would never expect to have to block anywhere near 100,000,000 tokens at once. Even that many JWT IDs fit into roughly 500MB of memory in the form of a bloom filter with a 1 in 999,925,224 chance of a false positive.
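The memory figure can be sanity-checked with the standard bloom filter sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. The function name below is made up for the example.

```python
import math


def bloom_parameters(expected_items: int, false_positive_rate: float):
    """Return (bits, hash_count) for an optimally sized bloom filter."""
    bits = math.ceil(
        -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)
    )
    hash_count = round(bits / expected_items * math.log(2))
    return bits, hash_count


bits, hash_count = bloom_parameters(100_000_000, 1 / 999_925_224)
megabytes = bits / 8 / 1_000_000
print(f"{megabytes:.0f} MB using {hash_count} hash functions")
```

This comes out at roughly 540 MB with 30 hash functions, in line with the ~500MB estimate. In practice the number of revoked, unexpired tokens should be far smaller, and the filter shrinks in proportion.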

To solve for this scenario, I have been working on oauth-revokerd. The purpose of this service is to maintain a list of revoked JWT IDs and distribute a bloom filter for this list accordingly.

The service provides an in-memory database for short-term storage of revoked tokens - each entry is only kept until the token itself expires. The general idea is that the service can be stood up quickly with minimal configuration, and that it provides a management API to revoke tokens and a query API to verify revocations in the case of false positives.
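The "keep it only until the token expires" behaviour can be sketched as a small in-memory store keyed by JWT ID. This is an illustration of the idea rather than oauth-revokerd's actual implementation, and the class and method names are assumptions.

```python
import time
from typing import Callable, Dict


class RevocationStore:
    """In-memory set of revoked JWT IDs that forgets each entry at token expiry."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self._expiry_by_jti: Dict[str, float] = {}

    def revoke(self, jti: str, expires_at: float) -> None:
        # `expires_at` is the token's `exp` claim (seconds since the epoch).
        # A token that has already expired needs no entry at all.
        if expires_at > self._clock():
            self._expiry_by_jti[jti] = expires_at

    def is_revoked(self, jti: str) -> bool:
        expires_at = self._expiry_by_jti.get(jti)
        return expires_at is not None and expires_at > self._clock()

    def purge_expired(self) -> None:
        # Run periodically so the store never outgrows the set of live tokens.
        now = self._clock()
        self._expiry_by_jti = {
            jti: exp for jti, exp in self._expiry_by_jti.items() if exp > now
        }
```

Because entries disappear once the token would have been rejected for expiry anyway, the store (and any bloom filter built from it) stays bounded by the number of revocations within one token lifetime.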

In this concept, the sidecar is extended to consume the bloom filter.
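A minimal sketch of that check is below. A bloom filter never gives false negatives, so a miss means the token is definitely not revoked and no network call is needed. Only a hit has to be confirmed against the central service. The filter layout, the SHA-256 double hashing and the confirm_revoked callable (standing in for a call to the query API) are all assumptions for the example, not the format oauth-revokerd distributes.

```python
import hashlib
from typing import Callable, List


class BloomFilter:
    def __init__(self, size_bits: int, hash_count: int):
        self.size_bits = size_bits
        self.hash_count = hash_count
        self._bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        # Derive k bit positions from two 64-bit halves of one SHA-256 digest
        # (double hashing), instead of computing k independent hashes.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.size_bits for i in range(self.hash_count)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(
            self._bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def is_token_revoked(jti: str, revoked_filter: BloomFilter,
                     confirm_revoked: Callable[[str], bool]) -> bool:
    if not revoked_filter.might_contain(jti):
        return False  # definitely not revoked: no call to the central service
    # Possibly revoked, possibly a false positive: ask the central service.
    return confirm_revoked(jti)
```

With a well-sized filter almost every request takes the first branch, so the central service only sees traffic for tokens that really are revoked plus the rare false positive.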

Concluding

That's pretty much all on this for now. I'm hoping to do more with this project in the future and will post updates on progress on my website.