Those who don’t work in distributed computing or the microservices world may never have heard the term service mesh. So let me start with a definition, to give you some background on what a service mesh is, why it matters, and how it addresses traceability, logging, and observability challenges in the distributed microservices world.
What is a Service Mesh and why do we need it?
“A service mesh is an infrastructure layer that offloads operational requirements from your application.” The term service mesh is used to describe the network of microservices that make up such applications and the interactions between them. As a service mesh grows in size and complexity, it can become harder to understand and manage. Its requirements can include discovery, load balancing, failure recovery, metrics, and monitoring. A service mesh also often has more complex operational requirements, like A/B testing, canary releases, rate limiting, access control, and end-to-end authentication.
“If you don’t have a service mesh (Istio/Linkerd) implementation on your microservice infrastructure, you will go in completely blind during timeout and performance troubleshooting, and you will likely end up with inconclusive RCAs.”
An Istio service mesh is logically split into a data plane and a control plane.
- The data plane is composed of a set of intelligent proxies (Envoy) deployed as sidecars. These proxies mediate and control all network communication between microservices, working together with Mixer, a general-purpose policy and telemetry hub.
- The control plane manages and configures the proxies to route traffic. Additionally, the control plane configures Mixers to enforce policies and collect telemetry.
There are two major open-source players on the control plane side of service mesh implementations:
- Istio (written in Go)
- Linkerd (control plane written in Go; its lightweight data plane proxy is written in Rust)
There are two major open-source players on the data plane side of service mesh implementations:
- Envoy (mostly used as a sidecar proxy for service-to-service communication)
- Nginx (mostly used as a front proxy for external-to-service communication)
NOTE: The control plane runs in its own K8s/OpenShift namespace (istio-system), whereas the data plane sidecar proxy (Envoy) runs in each namespace where you implement it; all policy/rule/traffic configuration is controlled by the control plane containers running in the istio-system namespace.
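As a minimal sketch of how this per-namespace opt-in usually looks on Kubernetes with Istio: labeling an application namespace tells Istio’s admission webhook to inject the Envoy sidecar into every pod created there (the namespace name `my-app` is a hypothetical example):

```yaml
# Hypothetical application namespace. The "istio-injection: enabled"
# label instructs Istio's sidecar injector webhook to add the Envoy
# proxy container to every pod scheduled in this namespace.
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  labels:
    istio-injection: enabled
```

The control plane itself stays in istio-system; only the injected sidecars live alongside your workloads.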
The following image shows the control plane’s supporting Docker containers running inside the istio-system namespace on OpenShift; you will get similar output on K8s by running the kubectl get pods command.
Istio and Linkerd have very similar architectures and compete in the market on features and maturity, but their overall goal and architecture are the same, so we will use Istio to describe the service mesh here and go deeper into comparison and implementation in another series of blogs.
The following diagram shows the different components that make up each plane (Ref: Istio):
A Linkerd service mesh is also logically split into a data plane and a control plane, but it has its own thin-footprint proxy component to handle the data plane:
The following diagram shows how four microservices communicate with each other using the sidecar proxy data plane:
What is the current state of Istio and Linkerd, and which service mesh should I use?
Istio is more mature in terms of features, whereas Linkerd is positioned as the more performance-oriented option because it ships its own lightweight data plane proxy. Istio is the current winner, but Linkerd may be the better option in the long run, as its thin-footprint data plane can reduce sidecar proxy cost. The service mesh market is still evolving, so your team can pick either one to start with; as of now, Istio is recommended for production environments, and you can start with Linkerd in your dev environment.
A data plane sidecar proxy gets attached to each microservice, so if there are thousands of services you have to run thousands of data plane proxies, which increases cost. Be very mindful about where you implement it in your service mesh architecture.
Istio Control Plane components:
Mixer: Mixer is a platform-independent component. Mixer enforces access control and usage policies across the service mesh, and collects telemetry data from the Envoy proxy and other services. The proxy extracts request level attributes, and sends them to Mixer for evaluation.
Mixer includes a flexible plugin model. This model enables Istio to interface with a variety of host environments and infrastructure backends. Thus, Istio abstracts the Envoy proxy and Istio-managed services from these details.
Pilot: Pilot provides service discovery for the Envoy sidecars, traffic management capabilities for intelligent routing (e.g., A/B tests, canary deployments, etc.), and resiliency (timeouts, retries, circuit breakers, etc.).
Pilot converts high level routing rules that control traffic behavior into Envoy-specific configurations, and propagates them to the sidecars at runtime. Pilot abstracts platform-specific service discovery mechanisms and synthesizes them into a standard format that any sidecar conforming with the Envoy data plane APIs can consume. This loose coupling allows Istio to run on multiple environments such as Kubernetes, Consul, or Nomad, while maintaining the same operator interface for traffic management.
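To make this concrete, the high-level routing rules that Pilot distributes are written as Istio configuration objects. A minimal sketch of a canary-style weighted split (the service name `reviews` and subsets `v1`/`v2` are hypothetical examples, in the style of Istio’s sample applications):

```yaml
# Hypothetical VirtualService: Pilot converts this high-level rule into
# Envoy-specific configuration and pushes it to the sidecars, which then
# send 90% of traffic to subset v1 and 10% to the v2 canary.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10
```

The subsets themselves (v1, v2) would be defined in a matching DestinationRule; the application code never changes, only this mesh configuration.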
Citadel: Citadel provides strong service-to-service and end-user authentication with built-in identity and credential management. You can use Citadel to upgrade unencrypted traffic in the service mesh. Using Citadel, operators can enforce policies based on service identity rather than on network controls.
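As an illustration of upgrading unencrypted traffic, on Istio versions of this era (the Mixer/Citadel architecture) mesh-wide mutual TLS could be enabled with an authentication policy; a minimal sketch:

```yaml
# Sketch of a mesh-wide authentication policy (Istio 1.0-1.4 era API):
# requires mutual TLS for all service-to-service traffic, with Citadel
# issuing and rotating the workload certificates behind the scenes.
apiVersion: authentication.istio.io/v1alpha1
kind: MeshPolicy
metadata:
  name: default
spec:
  peers:
  - mtls: {}
```

With this in place, operators can write authorization policies against the service identities in those certificates rather than against IP-based network controls.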
Istio provides behavioral insights and operational control over the service mesh as a whole, offering a complete solution to satisfy the diverse requirements of microservice applications. It provides a number of key capabilities uniformly across a network of services:
- Traffic Management. Control the flow of traffic and API calls between services, make calls more reliable, and make the network more robust in the face of adverse conditions.
- Observability. Gain understanding of the dependencies between services and the nature and flow of traffic between them, providing the ability to quickly identify issues.
- Policy Enforcement. Apply organizational policy to the interaction between services, ensure access policies are enforced and resources are fairly distributed among consumers. Policy changes are made by configuring the mesh, not by changing application code.
- Service Identity and Security. Provide services in the mesh with a verifiable identity and provide the ability to protect service traffic as it flows over networks of varying degrees of trustability.
In addition to these behaviors, Istio is designed for extensibility to meet diverse deployment needs:
- Platform Support. Istio is designed to run in a variety of environments, including Cloud, on-premises, Kubernetes, Mesos, etc. We’re initially focused on Kubernetes but are working to support other environments soon.
- Integration and Customization. The policy enforcement component can be extended and customized to integrate with existing solutions for ACLs, logging, monitoring, quotas, auditing and more.
Envoy as the data plane proxy for Istio:
Envoy is a high-performance proxy developed in C++ to mediate all inbound and outbound traffic for all services in the service mesh. Istio leverages Envoy’s many built-in features, for example:
- Dynamic service discovery
- Load balancing
- TLS termination
- HTTP/2 and gRPC proxying
- Circuit breakers
- Health checks
- Staged rollouts with %-based traffic split
- Fault injection
- Rich metrics
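Several of these Envoy features are driven from Istio configuration. For example, circuit breaking and outlier-based health checking can be sketched with a DestinationRule (the service name `reviews` and the thresholds here are hypothetical illustrations, not recommended production values):

```yaml
# Hypothetical DestinationRule enabling Envoy's circuit breaker:
# cap the connection pool, and eject a backend from load balancing
# for 60s after 5 consecutive errors within a 30s window.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews-circuit-breaker
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
    outlierDetection:
      consecutiveErrors: 5
      interval: 30s
      baseEjectionTime: 60s
```

The sidecar enforces these limits locally, so a misbehaving instance is ejected without any change to application code.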
Envoy is deployed as a sidecar to the relevant service in the same Kubernetes pod. This deployment allows Istio to extract a wealth of signals about traffic behavior as attributes. Istio can, in turn, use these attributes in Mixer to enforce policy decisions, and send them to monitoring systems to provide information about the behavior of the entire mesh.
Once you implement a service mesh, the following are the kinds of outputs you can see for observability metrics, traceability, and the service communication graph:
The following Grafana graph shows metrics for inter-service communication, in addition to the global success rate on KPIs.
Fig 1: Observability metrics example
In the tracing domain there are two key players, Zipkin and Jaeger. The following Jaeger graph shows traceability across various microservices and the time taken in each span, so you can work on communication and performance related problems in the microservices world.
Fig 2: Traceability example with Jaeger
The following service graph easily helps you draw your service interaction/dependency graph, so you can review how your microservices communicate with each other and how much latency they have on each path.
Fig 3: Service communication graph
Service mesh technologies provide the option to use pluggable tools in their architecture, so you can pick a toolset that fits your setup. I will be writing further blogs on service mesh covering how it can be implemented on OpenShift or K8s and how its features can be used for traffic routing, canary deployments, etc.
There is much information available on the internet about service mesh, Istio, Linkerd, K8s, OpenShift, and Envoy; some sources, like thenewstack.io, are particularly useful.
We also covered this important topic in our Practical Site Reliability Engineering book, as it is a very important and integral part of the microservices world; those who are interested can have a look at its content and comment to get more clarity on this topic.
Part 2 is available at: https://lnkd.in/dsgSh5U