Kubernetes service mesh
Do we need a service mesh?
A few years ago I started to evaluate how this technology could fit into an existing infrastructure.
There are many concepts to consider and many misconceptions people usually carry around.
Better to start with what is NOT a job for a service mesh:
- it is not an API gateway (even if they may share some components)
- it is not the place to put firewall rules
- it is not something magic that boosts the applications
- it is something that, if used without a clear scope, can generate a mess
So what is it?
Well, the short answer:
- it is the missing link in infrastructure observability
- it is a way to handle application routing in a structured way
- it is an internal rate limit / anti-DDoS / infrastructure layer (be careful)
- it could be a clever way to work around some application limits (expanded below)
Anyway, is this something we can add to our infrastructure?
There is no straight YES/NO; however, we can evaluate the company and the maturity of its microservices.
An internal rate limit is usually a feature that can save the infrastructure during snowball effects.
However, if the infrastructure is synchronous (no decoupling), the rate limit can simply stop the application from serving requests, and this will propagate to the services below it.
Result: no answer, but at least the threads are safe.
Sometimes it is better to protect the strategic business logic with a circuit breaker, even if it brings considerable configuration complexity.
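For reference, in a service mesh a circuit breaker is usually pure configuration rather than application code. A minimal sketch with Istio (my choice here; other meshes offer similar knobs) for a hypothetical service named orders: outlier detection ejects misbehaving pods instead of letting the snowball grow.

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: orders-circuit-breaker
spec:
  host: orders                      # hypothetical service, for illustration only
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100         # cap concurrent TCP connections
      http:
        http1MaxPendingRequests: 10 # queue at most 10 pending requests
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutive5xxErrors: 5       # eject a pod after 5 consecutive 5xx
      interval: 30s
      baseEjectionTime: 30s
      maxEjectionPercent: 100

The values are arbitrary; the point is that the breaker lives in the mesh, not in the business code.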
The other point related to rate limiting is: who will maintain those values?
They should be part of the deployment pipeline and directly correlated with the application scope.
In an infrastructure with 200+ microservices this could be a huge problem:
- projects that have lost ownership
- projects that are not well maintained
- new projects that are already legacy
So my idea about rate limiting is to use it on specific "strategic" applications; it should not be added indiscriminately to the whole infrastructure.
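To give an idea of what this looks like in practice, here is a sketch of a local (per-pod) rate limit attached to a single strategic workload. This assumes a recent Istio/Envoy that ships the local_ratelimit filter; the app: pythonapp label matches the demo application used later in this page, and the 100 requests per minute bucket is an arbitrary value:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: pythonapp-local-ratelimit
  namespace: c-app-count
spec:
  workloadSelector:
    labels:
      app: pythonapp
  configPatches:
  - applyTo: HTTP_FILTER
    match:
      context: SIDECAR_INBOUND
      listener:
        filterChain:
          filter:
            name: "envoy.filters.network.http_connection_manager"
    patch:
      operation: INSERT_BEFORE
      value:
        name: envoy.filters.http.local_ratelimit
        typed_config:
          "@type": type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
          value:
            stat_prefix: http_local_rate_limiter
            token_bucket:
              max_tokens: 100        # burst size
              tokens_per_fill: 100   # refill rate
              fill_interval: 60s     # 100 requests per minute per pod
            filter_enabled:
              runtime_key: local_rate_limit_enabled
              default_value:
                numerator: 100
                denominator: HUNDRED
            filter_enforced:
              runtime_key: local_rate_limit_enforced
              default_value:
                numerator: 100
                denominator: HUNDRED

Because the filter is scoped by workloadSelector, the limit stays confined to that one strategic application.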
Regarding the routing feature, we can consider it a more detailed and customizable blue/green deployment. This is really useful when we have to ship new features to production and a plain canary deployment is not enough to cover the business measurements we need.
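A sketch of this kind of routing with Istio, reusing the names of the lab application defined later in this page (the pythonapp-canary track is hypothetical; only the stable track exists in the lab):

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: pythonapp
  namespace: c-app-count
spec:
  host: app-count-svc
  subsets:
  - name: stable
    labels:
      track: pythonapp-stable
  - name: canary
    labels:
      track: pythonapp-canary     # hypothetical canary track
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: pythonapp
  namespace: c-app-count
spec:
  hosts:
  - app-count-svc
  http:
  - route:
    - destination:
        host: app-count-svc
        subset: stable
      weight: 90                  # 90% of the traffic stays on stable
    - destination:
        host: app-count-svc
        subset: canary
      weight: 10                  # 10% goes to the canary

Shifting the weights gradually moves the traffic while both tracks stay visible in the mesh telemetry.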
This feature can also be used to keep a specific affinity between microservices, and this is the capability some of you may really want to consider. Imagine a strict dependency between an application and a cache (as usual):
So the application Pippo is using the cache Paperino.
Pippo is a namespace composed of 10 pods.
Paperino is a cache composed of 6 pods.
Imagine that the cache is replicated/sharded and that we have 2 availability zones.
With a service mesh we can use labels to tell Pippo to use the cache Paperino only in the availability zone the call starts from; this dramatically reduces the round trip and the response time.
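A sketch of how this could look with Istio's locality load balancing, using the fictional names above (note that Istio only activates locality awareness when outlier detection is configured, and the localityLbSetting field on a DestinationRule assumes a recent Istio):

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: paperino-locality
  namespace: paperino             # hypothetical namespace of the cache
spec:
  host: paperino.paperino.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true             # prefer endpoints in the caller's own zone
    outlierDetection:             # required for locality-aware balancing
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s

With this in place, Pippo's sidecars keep the calls inside their own availability zone and only fail over to the other zone when the local Paperino pods become unhealthy.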
I played a bit with service meshes in order to answer some questions; however, when it comes to firewall rules the right answer is Cilium :-)
With this, I'd like to say that a service mesh gives you great value only
if your infrastructure is able to embrace it and only if you know what you are doing with it, namely for:
- observability
- routing purposes (this is strictly related to the microservice architecture)
- rate limiting
Service Mesh sample lab
This lab is provided to discover and test the functionality we'd like to implement in our environment.
Basic setup
- minikube v1.6.2
- Kubernetes v1.17.0 on Docker '19.03.5'
curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
chmod +x minikube
sudo install minikube /usr/local/bin/
minikube start --memory=3000 --cpus=3
- network: since in production we are using flannel, which is not able to manage network policies, we start with no CNI to keep the environment as close as possible to the lmn environment
Namespaces
a → traefik (ingress)
b → apache
c → application
d → redis
b is a namespace that manages an Apache instance used for some rewrite rules
c is a Python application connected to a Redis database; it provides functions to set a time and get a time through the /set and /get contexts
d is the redis database
c code (app.py)
"""
Example app that integrates with redis and save/get timing
"""
from os import environ
from datetime import datetime
import json
import redis
from flask import Flask, redirect
VERSION = "1.1.1"
REDIS_ENDPOINT = environ.get("REDIS_ENDPOINT", "redis-svc.d-redis.svc.cluster.local")
REDIS_PORT = int(environ.get("REDIS_PORT", "6379"))
APP = Flask(__name__)
@APP.route("/")
def redisapp():
"""Main redirect"""
return redirect("/get", code=302)
@APP.route("/set")
def set_var():
"""Set the time"""
red = redis.StrictRedis(host=REDIS_ENDPOINT, port=REDIS_PORT, db=0)
red.set("time", str(datetime.now()))
return json.dumps({"time": str(red.get("time"))})
@APP.route("/get")
def get_var():
"""Get the time"""
red = redis.StrictRedis(host=REDIS_ENDPOINT, port=REDIS_PORT, db=0)
return json.dumps({"time": str(red.get("time"))})
@APP.route("/reset")
def reset():
"""Reset the time"""
red = redis.StrictRedis(host=REDIS_ENDPOINT, port=REDIS_PORT, db=0)
red.delete("time")
return json.dumps({"time": str(red.get("time"))})
@APP.route("/version")
def version():
"""Get the app version"""
return json.dumps({"version": VERSION})
@APP.route("/healthz")
def health():
"""Check the app health"""
try:
red = redis.StrictRedis(host=REDIS_ENDPOINT, port=REDIS_PORT, db=0)
red.ping()
except redis.exceptions.ConnectionError:
return json.dumps({"ping": "FAIL"})
return json.dumps({"ping": red.ping()})
@APP.route("/readyz")
def ready():
"""Check the app readiness"""
return health()
if __name__ == "__main__":
APP.run(debug=True, host="0.0.0.0")
Dockerfile
FROM python:3.6-alpine
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
ENTRYPOINT ["python"]
CMD ["app.py"]
requirements.txt
Flask
redis
pytest
pytest-flask
Folders structure
kubernetes/
├── 00-traefik
│ ├── A-00-traefik-ns.yaml
│ ├── A-01-traefik-rbac.yaml
│ └── A-02-traefik-ds.yaml
├── 01-apache
│ ├── B-00-k8s-apacherr-ns.yaml
│ ├── B-01-k8s-apacherr-svc.yaml
│ ├── B-02-k8s-apacherr-ing.yaml
│ ├── B-03-k8s-apacherr-dpl.yaml
│ └── B-04-k8s-apacherr-cfm.yaml
├── 02-redis
│ ├── D-00-lab-redis-ns.yaml
│ ├── D-01-lab-redis-svc.yaml
│ └── D-02-lab-redis-dpl.yaml
└── 03-app
├── C-00-app-ns.yaml
├── C-01-app-svc.yaml
└── C-02-app-dpl.yaml
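The manifests themselves are not reproduced here; as a reference, the ones in 03-app could look roughly like this (a sketch reconstructed from the names and labels used in this lab, not the exact files):

apiVersion: v1
kind: Service
metadata:
  name: app-count-svc
  namespace: c-app-count
spec:
  selector:
    app: pythonapp
  ports:
  - port: 80
    targetPort: 5000            # Flask default port
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pythonapp
  namespace: c-app-count
spec:
  replicas: 1
  selector:
    matchLabels:
      app: pythonapp
  template:
    metadata:
      labels:
        app: pythonapp
        track: pythonapp-stable # matches the labels seen in the Cilium output below
    spec:
      containers:
      - name: pythonapp
        image: pythonapp:1.1.1  # hypothetical image tag
        ports:
        - containerPort: 5000
        env:
        - name: REDIS_ENDPOINT
          value: redis-svc.d-redis.svc.cluster.local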
Startup
kubernetes$ kubectl apply -f 00-traefik/
namespace/a-ingress-traefik created
clusterrole.rbac.authorization.k8s.io/traefik-ingress-controller created
serviceaccount/traefik-ingress-controller created
clusterrolebinding.rbac.authorization.k8s.io/traefik-ingress-controller created
daemonset.apps/traefik-ingress-controller created
service/traefik-ingress-service created
kubernetes$ kubectl apply -f 01-apache/
namespace/b-apacherr created
service/apacherr-svc created
ingress.extensions/apacherr-ingress created
deployment.apps/apacherr created
configmap/apacherr-80-config created
kubernetes$ kubectl apply -f 02-redis/
namespace/d-redis created
service/redis-svc created
deployment.apps/redis created
kubernetes$ kubectl apply -f 03-app/
namespace/c-app-count created
service/app-count-svc created
deployment.apps/pythonapp created
kubectl get po --all-namespaces
NAMESPACE           NAME                               READY   STATUS    RESTARTS   AGE
a-ingress-traefik   traefik-ingress-controller-jkppg   1/1     Running   0          5m29s
b-apacherr          apacherr-8b786b45d-g9vcl           1/1     Running   0          5m19s
c-app-count         pythonapp-555d6d88cd-slhfb         1/1     Running   0          4m55s
d-redis             redis-b869b89d-pf6ms               1/1     Running   0          5m12s
kube-system         coredns-6955765f44-6nrdr           1/1     Running   1          74m
kube-system         coredns-6955765f44-9fbgt           1/1     Running   1          74m
kube-system         etcd-minikube                      1/1     Running   1          74m
kube-system         kube-addon-manager-minikube        1/1     Running   1          74m
kube-system         kube-apiserver-minikube            1/1     Running   1          74m
kube-system         kube-controller-manager-minikube   1/1     Running   1          74m
kube-system         kube-proxy-cchls                   1/1     Running   1          74m
kube-system         kube-scheduler-minikube            1/1     Running   1          74m
kube-system         storage-provisioner                1/1     Running   2          74m
Make sure the VirtualBox port 8081 is available.
Flow
Initialize the redis database:
$ curl http://pippo.lan/count/set
{"time": "b'2019-12-28 20:06:33.919059'"}
test from apache to application (case 1)
$ curl http://pippo.lan/count/get
{"time": "b'2019-12-28 20:06:33.919059'"}
test from apache to redis (case 2)
$ curl http://pippo.lan/redis/GET/time
{"GET":"2019-12-28 20:06:33.919059"}
Network rule example
Cilium labels
ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                           IPv6   IPv4            STATUS
           ENFORCEMENT        ENFORCEMENT
201        Disabled           Disabled          32580      k8s:app=redis                                                10.15.182.193   ready
                                                           k8s:io.cilium.k8s.namespace.labels.name=d-redis
                                                           k8s:io.cilium.k8s.policy.cluster=default
                                                           k8s:io.cilium.k8s.policy.serviceaccount=default
                                                           k8s:io.kubernetes.pod.namespace=d-redis
                                                           k8s:track=redis
1257       Disabled           Disabled          4          reserved:health                                              10.15.197.106   ready
1663       Disabled           Disabled          54130      k8s:app=apacherr                                             10.15.192.41    ready
                                                           k8s:io.cilium.k8s.namespace.labels.name=b-apacherr
                                                           k8s:io.cilium.k8s.policy.cluster=default
                                                           k8s:io.cilium.k8s.policy.serviceaccount=default
                                                           k8s:io.kubernetes.pod.namespace=b-apacherr
3167       Disabled           Disabled          33702      k8s:app=pythonapp                                            10.15.247.186   ready
                                                           k8s:io.cilium.k8s.namespace.labels.name=c-app-count
                                                           k8s:io.cilium.k8s.namespace.labels.purpose=app
                                                           k8s:io.cilium.k8s.policy.cluster=default
                                                           k8s:io.cilium.k8s.policy.serviceaccount=default
                                                           k8s:io.kubernetes.pod.namespace=c-app-count
                                                           k8s:track=pythonapp-stable
Network rule

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: isolate-namespace
  namespace: d-redis
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: c-app-count
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          name: c-app-count
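Since I said that firewall rules are a job for Cilium, the same isolation can also be expressed as a CiliumNetworkPolicy, which additionally lets us pin the allowed port (a sketch based on the labels listed above):

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: isolate-redis
  namespace: d-redis
spec:
  endpointSelector:
    matchLabels:
      app: redis
  ingress:
  - fromEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: c-app-count
    toPorts:
    - ports:
      - port: "6379"            # only the redis port is reachable
        protocol: TCP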
cilium/hubble https://github.com/cilium/hubble
video Cilium example --> img/cilium.mkv
video Istio + Cilium example --> img/istio.mkv
Requirements
- use the service mesh to segregate the redis namespace "d" so that it accepts connections only from the application namespace "c"
Expected result: "case 1" keeps working, while "case 2" stops working and returns an error.