You would expect Kubernetes to have reached the top rung of the Cloud Native Computing Foundation (CNCF) maturity progression. You can understand why Prometheus has too, given the popularity of collecting application telemetry via the ELK stack. Teams are even using Prometheus and Grafana to monitor a containerized ELK.
But surprisingly, the only other graduated project is Envoy, a service proxy that started at Lyft. The cloud-native community has embraced service mesh remarkably quickly. Both Envoy and its orchestrator companion, Istio, had their first commit just over 2 years ago. For Prometheus it was over 6 years ago, and for Kubernetes nearly 5.
Service mesh adoption is not just about networking. Service discovery, load balancing, and identity management are application services. Service mesh also provides networking services such as translating identity into IP addresses, encryption, and connection management.
Istio and Envoy now lead a field with some other really strong entrants, such as HashiCorp, NGINX, and Linkerd, another CNCF project that is older than Envoy and popularized the term service mesh.
In my last post, I introduced the first 3 leaps forward that service mesh brings to networking.
#1 Software-only overlay
#2 Every application gets its own network
#3 Identity based networking
Let’s look at the other 3 leaps, which you might say are only leaps in the context of application-centric networking.
#4 A distributed proxy alongside every workload – Let’s unpack that one. Proxy servers started as a way for web clients to contact and retrieve resources from many servers. They later became critical for sophisticated application load balancing.
In a service mesh, every workload has a companion, a sidecar proxy. The application workload becomes its client and the other proxies become the many resources it makes requests of. This proxy goes further than before, moving control of many key networking services into the application domain: IP address management, firewall, routing, and telemetry.
So what’s the great leap forward? Service proxies responsible for networking are in the application domain. It’s a leap because it enables DevOps to do networking in the application domain to eliminate development and debugging dependencies that slow them down. It’s great because the networking services and policy can be specific to that workload wherever it appears.
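To make the sidecar idea concrete, here is a minimal sketch in Python. It is not how Envoy or any real proxy is implemented; the `SidecarProxy` class, its fields, and the addresses are all hypothetical, chosen only to show a per-workload proxy that checks an identity-based policy, resolves a service identity to an address, and load balances across endpoints.

```python
from dataclasses import dataclass, field


@dataclass
class SidecarProxy:
    """Hypothetical sidecar: one instance sits alongside one workload."""

    identity: str    # identity of the workload this proxy serves
    registry: dict   # service identity -> list of endpoint addresses
    allowed: set     # (source identity, destination identity) pairs
    _cursors: dict = field(default_factory=dict)  # round-robin position per service

    def route(self, destination: str) -> str:
        """Pick an endpoint for `destination`, enforcing policy first."""
        # Firewall role: policy is expressed in identities, not IP addresses.
        if (self.identity, destination) not in self.allowed:
            raise PermissionError(f"{self.identity} may not call {destination}")

        # Service discovery role: translate identity into addresses.
        endpoints = self.registry.get(destination)
        if not endpoints:
            raise LookupError(f"no endpoints registered for {destination}")

        # Load balancing role: simple round robin across the endpoints.
        index = self._cursors.get(destination, 0) % len(endpoints)
        self._cursors[destination] = index + 1
        return endpoints[index]


proxy = SidecarProxy(
    identity="web",
    registry={"orders": ["10.0.0.5:8080", "10.0.0.6:8080"]},
    allowed={("web", "orders")},
)
print(proxy.route("orders"))
print(proxy.route("orders"))
```

The point of the sketch is that the policy and the registry travel with the workload. Wherever "web" is scheduled, its proxy applies the same rules, and nothing in the application code refers to an IP address.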
#5 Orchestrated – The internet was designed so that the defense agency network could withstand a nuclear strike that took out major links. There is no central control; behavior is dictated by standard protocols, and the units of management are end-to-end (TCP/IP) connections and the health of each device. This remains a good idea for the underlying packet-moving networks.
But modern application infrastructure has moved to an orchestrated model, with Kubernetes being the most relevant here. An orchestrator has telemetry on the global state of its member elements as well as on the lifecycle of all the services running across those elements. It measures and ensures system health, including restarting and scaling out as necessary. But an orchestrator is not controlling every action; healthy elements execute as programmed.
So what’s the great leap forward? An orchestrated network gives a single, coherent view of the state of the network upon which the orchestrator can act to ensure services and policy. It’s a leap because attributes #1-5 make a closed loop system feasible to implement and comprehensible to operate. It’s great because the team designing and building the application based on intensive customer feedback can also design and operate the security and networking policies to best meet customer and business needs.
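The closed loop described here can be sketched as a reconciliation step: compare the desired state with the observed state and emit only the corrections. This is an illustrative simplification, not Kubernetes or Istio code; the `reconcile` function, the action tuples, and the service names are assumptions made for the example.

```python
def reconcile(desired: dict, observed: dict) -> list:
    """Return the actions that move observed state toward desired state.

    Both arguments map a service name to a count of running instances.
    Healthy services produce no action: the orchestrator only intervenes
    where the two views disagree.
    """
    actions = []

    # Services we want: start or stop instances to match the desired count.
    for service, want in sorted(desired.items()):
        have = observed.get(service, 0)
        if have < want:
            actions.append(("start", service, want - have))
        elif have > want:
            actions.append(("stop", service, have - want))

    # Services still running that are no longer desired at all.
    for service in sorted(set(observed) - set(desired)):
        if observed[service] > 0:
            actions.append(("stop", service, observed[service]))

    return actions


desired = {"web": 3, "orders": 2}
observed = {"web": 1, "orders": 2, "legacy": 1}
for action in reconcile(desired, observed):
    print(action)
```

A real orchestrator would run a step like this continuously against live telemetry. The single coherent view of state is what makes such a comparison possible in the first place.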
#6 Observable from application view – Once you decide your key organizing unit is the application, you want all the telemetry you receive organized this way. Recall the high value of New Relic and AppDynamics.
But until now, the networking telemetry you got covered the portions of the network your application packets traveled over, mixed in with lots of other traffic.
So what’s the great leap forward? Attributes 1-5 make it possible to give a view of the network that is application specific. It’s a leap because DevOps no longer have to tease telemetry out of networking elements that vendors try to differentiate based on their components’ data exhaust; it is just another set of data from your application. And it’s great because it enables the DevOps team to set and debug the performance of the entire system, and the performance view of the network lines up perfectly with a performance view of the application services.
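As a rough illustration of what "application specific" telemetry means, assume each sidecar emits one record per request, tagged with the application it belongs to. The record fields (`app`, `status`, `latency_ms`) and the `summarize_by_application` function are hypothetical, not the schema of any particular mesh.

```python
from collections import defaultdict


def summarize_by_application(records: list) -> dict:
    """Aggregate per-request proxy records into one summary per application."""
    totals = defaultdict(lambda: {"requests": 0, "errors": 0, "latency_ms": 0.0})

    for record in records:
        entry = totals[record["app"]]
        entry["requests"] += 1
        if record["status"] >= 500:  # count server-side failures as errors
            entry["errors"] += 1
        entry["latency_ms"] += record["latency_ms"]

    return {
        app: {
            "requests": t["requests"],
            "error_rate": t["errors"] / t["requests"],
            "mean_latency_ms": t["latency_ms"] / t["requests"],
        }
        for app, t in totals.items()
    }


records = [
    {"app": "store", "status": 200, "latency_ms": 10.0},
    {"app": "store", "status": 503, "latency_ms": 30.0},
    {"app": "blog", "status": 200, "latency_ms": 5.0},
]
for app, summary in summarize_by_application(records).items():
    print(app, summary)
```

Because the records are keyed by application from the start, there is no other traffic to filter out, and the network numbers sit naturally next to the application's own metrics.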
Service mesh adoption is set to continue, and another question is now coming to the front. One underlying assumption of a service mesh is that it is essentially a flat network in a single VPC, cluster, trust domain, or administrative domain. How do we get DevOps for networking if a given application crosses these domains? My colleagues at Bayware and I will explore this in future posts.