kubernetes

Posted by neverset on September 5, 2020

kubernetes is a container orchestration platform that automates the deployment, scheduling and scaling of containerized applications across a cluster of nodes

# to deploy pods on kubernetes using kind: Deployment
kubectl create -f deployment.yaml
# to deploy or update pod on kubernetes
kubectl apply -f https://k8s.io/examples/pods/two-container-pod.yaml
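
the deployment.yaml referenced above is not shown in this post; a minimal sketch of what such a manifest could look like (name, image and replica count are placeholders):

# deployment.yaml (hypothetical example)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2                  # number of pod copies to keep running
  selector:
    matchLabels:
      app: nginx
  template:                    # pod template used for every replica
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.19
        ports:
        - containerPort: 80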

components

job

Unlike Deployments and Services in Kubernetes, you cannot simply change a Job configuration file and reapply it. When you change the Job configuration file, you must delete the previous Job from the cluster before you apply the new one. Jobs are used to create transient pods that perform the specific tasks they are assigned. CronJobs do the same thing, but they run the tasks on a defined schedule

Jobs

create Jobs with a defined yaml file, for example the sketch below
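
a minimal sketch of a Job manifest (name and command are placeholders), together with the delete-then-apply workflow mentioned above:

apiVersion: batch/v1
kind: Job
metadata:
  name: job-test
spec:
  template:
    spec:
      containers:
      - name: job-test
        image: busybox
        command: ["/bin/sh", "-c", "echo Hello from the Job; sleep 5"]
      restartPolicy: Never
  backoffLimit: 4              # retries before the Job is marked failed

# a changed Job spec cannot simply be re-applied:
# kubectl delete job job-test
# kubectl apply -f job.yaml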

CronJobs

executes a Job on a predefined schedule

apiVersion: batch/v1beta1            ## The version of the Kubernetes API
kind: CronJob                        ## The type of object for Cron jobs
metadata:
  name: cron-test
spec:
  schedule: "*/1 * * * *"            ## Defined schedule using the *nix style cron syntax
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: cron-test
            image: busybox           ## Image used
            args:
            - /bin/sh
            - -c
            - date; echo Hello this is Cron test
          restartPolicy: OnFailure   ## Restart policy in case the container fails

Work queues

pod

a pod consists of a group of containers and volumes. containers in the same pod share one network namespace, so communication over localhost is possible. a pod is temporary and stateless rather than durable: if an exception happens, kubernetes will create a new pod to replace the old one
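
a small sketch (image choices are arbitrary) of two containers in one pod talking over localhost:

apiVersion: v1
kind: Pod
metadata:
  name: two-container-demo
spec:
  containers:
  - name: web
    image: nginx               # serves on port 80 inside the pod
  - name: sidecar
    image: busybox
    # the sidecar reaches nginx via localhost because both
    # containers share the pod's network namespace
    command: ["/bin/sh", "-c", "sleep 5; wget -qO- http://localhost:80; sleep 3600"]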

label

a label is a key/value pair that carries user-defined attributes. labels enable pod selection with selectors, so that a service or replication controller can be applied to the selected pods (see the sketch below)
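
a short sketch (names are placeholders) of labels on a pod and a label-selector query:

apiVersion: v1
kind: Pod
metadata:
  name: label-demo
  labels:                      # user-defined key/value pairs
    app: myapp
    tier: frontend
spec:
  containers:
  - name: nginx
    image: nginx

# select pods by label on the command line
# kubectl get pods -l app=myapp,tier=frontend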

replication controller

a replication controller creates multiple copies (replicas) of a pod on the nodes and constantly watches them, replacing pods that disappear
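
a minimal ReplicationController sketch (names are placeholders; ReplicaSets/Deployments are the newer equivalent):

apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx-rc
spec:
  replicas: 3                  # keep three copies of the pod running
  selector:
    app: nginx                 # pods managed by this controller
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80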

service

a service defines a group of pods and an abstraction layer (virtual layer) for accessing them. the service finds its group of pods using labels, and access to the pods is load balanced

node

a node is a kubernetes worker. its key components are the kubelet (the node agent that talks to the master), kube-proxy (which lets services reach the pods) and a container runtime such as docker
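
a few commands (node name is a placeholder) that can be used to inspect nodes and their components:

# list worker nodes and their status
kubectl get nodes -o wide
# show a node's capacity, conditions and the pods running on it
kubectl describe node <node-name>
# on the node itself, the kubelet usually runs as a systemd service
systemctl status kubelet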

master node

there are three core components on the master node: kube-scheduler, etcd and controller-manager

  • kube-scheduler: schedules resource objects onto nodes
  • etcd: responsible for persisting the resource objects of the cluster
  • controller-manager: watches the cluster state and decides which operations to perform

volume storage

  • emptyDir: it is created when the pod is assigned to a node; all containers in the pod have read/write access to this volume. the volume is removed when the pod is deleted from the node (non-persistent)

    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pd
    spec:
      containers:
      - image: k8s.gcr.io/test-webserver
        name: test-container
        volumeMounts:
        - mountPath: /cache
          name: cache-volume
      volumes:
      - name: cache-volume
        emptyDir: {}

  • hostPath: it mounts a file or directory from the node's filesystem into the pod. hostPath should be used when a pod needs to use files on the node. typical use cases are saving logs onto the host or accessing docker's data on the host

    apiVersion: v1
    kind: Pod
    metadata:
      name: test-pd
    spec:
      containers:
      - image: k8s.gcr.io/test-webserver
        name: test-container
        volumeMounts:
        - mountPath: /test-pd
          name: test-volume
      volumes:
      - name: test-volume
        hostPath:
          # directory location on host
          path: /data
          # this field is optional
          type: Directory

  • gcePersistentDisk: it mounts a GCE (Google Compute Engine) persistent disk into the volume (requires the node to be a GCE VM)

    volumes:
    - name: test-volume
      # This GCE PD must already exist.
      gcePersistentDisk:
        pdName: my-data-disk
        fsType: ext4

  • nfs

    volumes:
    - name: nfs
      nfs:
        # FIXME: use the right hostname
        server: 10.254.234.223
        path: "/"

  • gitRepo

    volumes:
    - name: git-volume
      gitRepo:
        repository: "git@somewhere:me/my-git-repository.git"
        revision: "22f1d8406d464b0c0874075539c1f2e96c253775"

  • subPath: mounts different sub-paths of one shared volume into multiple containers

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-lamp-site
    spec:
      containers:
      - name: mysql
        image: mysql
        volumeMounts:
        - mountPath: /var/lib/mysql
          name: site-data
          subPath: mysql
      - name: php
        image: php
        volumeMounts:
        - mountPath: /var/www/html
          name: site-data
          subPath: html
      volumes:
      - name: site-data
        persistentVolumeClaim:
          claimName: my-lamp-site-data

    In-tree Volume Plugin

    Out-of-tree Provisioner

    CSI (Container Storage Interface)

    csi is an abstract storage interface defined with the protobuf protocol (gRPC)

    RPC (remote procedure call)
  • call ID mapping
    every function has its own ID that is unique across all processes. During a remote call the client sends this ID to the server, so that the server can invoke the corresponding function
  • serialization and deserialization
    parameters are converted to a byte stream (serialization), sent to the server, and converted back into a readable format on the server side (deserialization)
  • network transmission
    TCP, UDP or HTTP/2 is used for the network transmission
    PV and PVC

    PV & PVC make it possible to create volumes independently of pods; their lifecycle is also independent of the pod lifecycle

  • provisioning
    • static: the VolumeManager creates PVs in advance
    • dynamic: the VolumeManager defines a StorageClass, and the system creates a matching PV for a PVC automatically
  • binding
    the control plane binds a PVC to a suitable PV
  • using
    the pod mounts the PVC through a volume on the container. If the volume type is persistentVolumeClaim, the bound PV resource is used exclusively by that claim
  • release
    after deletion of the PVC, the PV is released. After the data on the PV has been removed or deleted (reclaiming process), another binding is possible
  • reclaiming
    the reclaim policy decides how to deal with the data remaining on the PV after release (see the kubectl sketch after this list)
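
    a few commands (the PV name is a placeholder) to observe this lifecycle and adjust the reclaim policy:

    # watch PV/PVC status move through Available -> Bound -> Released
    kubectl get pv
    kubectl get pvc
    # change the reclaim policy of an existing PV
    kubectl patch pv pv0001 -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'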
    PV (PersistentVolume)

    is an abstraction of network storage. It is created and configured by the VolumeManager and connects to shared storage through a plugin mechanism.

  • it is network storage that does not belong to any node, but can be accessed by every node
  • it is defined outside the Pod
  • it can define storage capacity, access modes, storage class, reclaim policy and storage type

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv0001
    spec:
      capacity:
        storage: 5Gi
      accessModes:
      - ReadWriteMany
      persistentVolumeReclaimPolicy: Recycle
      storageClassName: slow
      nfs:
        path: "/data/disk1"
        server: 192.168.69.69
        readOnly: false

    PVC (PersistentVolumeClaim)

    is a claim of resource consumption on a PV. The claim is used by referring to it under volumes in the pod spec

  • a PV is a cluster-wide resource, while a PVC belongs to a namespace; a PVC can only be mounted by Pods in the same namespace
  • the system matches a PV to a PVC that fulfills both the storage class and the selector
  • if no matching PV has been defined by the VolumeManager, a ReadWriteOnce PV will be created dynamically by the system; matching a PV via selector is then no longer possible

    # defining pvc statically
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: myclaim
    spec:
      accessModes:
      - ReadWriteOnce
      # description of claimed resource
      resources:
        requests:
          storage: 8Gi
      # PVC storage class requirement, only a matching PV will be selected
      storageClassName: slow
      # setting selection labels to filter out PVs
      selector:
        matchLabels:
          release: "stable"
        matchExpressions:
        - {key: environment, operator: In, values: [dev]}

    # defining pvc dynamically
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: slow
    provisioner: kubernetes.io/aws-ebs
    parameters:
      type: io1
      zones: us-east-1d, us-east-1c
      iopsPerGB: "10"

    # referring to the pvc
    kind: Pod
    apiVersion: v1
    metadata:
      name: mypod
    spec:
      containers:
      - name: myfrontend
        image: dockerfile/nginx
        volumeMounts:
        # volume config
        - mountPath: "/var/www/html"
          name: mypd
      volumes:
      - name: mypd
        persistentVolumeClaim:
          claimName: myclaim

Security Context

a security context defines the privilege and access control settings of a Pod or Container

Discretionary Access Control

define access rights according to UID (user ID) and GID (group ID)

#security context for pod
apiVersion: v1
kind: Pod
metadata:
    name: security-context-demo
spec:
    securityContext:
        runAsUser: 1000
        fsGroup: 2000
    volumes:
    - name: sec-ctx-vol
      emptyDir: {}
    containers:
    - name: sec-ctx-demo
      image: gcr.io/google-samples/node-hello:1.0
      volumeMounts:
      - name: sec-ctx-vol
        mountPath: /data/demo
      securityContext:
          allowPrivilegeEscalation: false

#security context for container
apiVersion: v1
kind: Pod
metadata:
    name: security-context-demo-2
spec:
    securityContext:
        runAsUser: 1000
    containers:
    - name: sec-ctx-demo-2
      image: gcr.io/google-samples/node-hello:1.0
      securityContext:
          runAsUser: 2000
          allowPrivilegeEscalation: false

Security Enhanced Linux (SELinux)

define SELinux labels for a specific container

securityContext:
    seLinuxOptions:
        level: "s0:c123,c456"

Running as privileged or unprivileged

defines whether the container runs as privileged or unprivileged

apiVersion: v1
kind: Pod
metadata:
    name: security-context-demo-4
spec:
    containers:
    - name: sec-ctx-4
      image: gcr.io/google-samples/node-hello:1.0
      securityContext:
          privileged: true

Linux Capabilities

assign specific privileged rights to certain processes rather than running them as the root user

apiVersion: v1
kind: Pod
metadata:
    name: security-context-demo-4
spec:
    containers:
    - name: sec-ctx-4
      image: gcr.io/google-samples/node-hello:1.0
      securityContext:
          capabilities:
              add: ["NET_ADMIN", "SYS_TIME"]

using sysctl

safe sysctls can be used directly in a pod, but to use unsafe sysctls, experimental-allowed-unsafe-sysctls needs to be activated on the kubelet

# sysctls currently considered safe:
kernel.shm_rmid_forced
net.ipv4.ip_local_port_range
net.ipv4.tcp_syncookies

#example of safe sysctl
apiVersion: v1
kind: Pod
metadata:
    name: sysctl-example
    annotations:
        security.alpha.kubernetes.io/sysctls: kernel.shm_rmid_forced=1
spec: ...

#example of unsafe sysctl
apiVersion: v1
kind: Pod
metadata:
    name: sysctl-example
    annotations:
        security.alpha.kubernetes.io/unsafe-sysctls: net.core.somaxconn=65535 
spec:
    securityContext:
        privileged: true
    ...

secret

  • Opaque: uses base64 to store secrets, which can be decoded with base64 --decode; its protection is therefore relatively weak

    # for pod that has access to secret data through volume
    kubectl create secret generic test-secret --from-literal='username=my-app' --from-literal='password=39528$vdg7Jb'
    # Run this in the shell inside the container
    echo "$( cat /etc/secret-volume/username )"
    echo "$( cat /etc/secret-volume/password )"

    # Define container environment variables using Secret data
    kubectl create secret generic test-secret --from-literal=username='my-app' --from-literal=password='39528$vdg7Jb'
    # in pod config yaml
    env:
    - name: SECRET_USERNAME
      valueFrom:
        secretKeyRef:
          name: test-secret
          key: username
    # or in pod config
    envFrom:
    - secretRef:
        name: test-secret
    # run this in shell inside the container
    kubectl exec -i -t envfrom-secret -- /bin/sh -c 'echo "username: $username\npassword: $password\n"'
  • kubernetes.io/dockerconfigjson: used to save docker registry authorization information (see the pod sketch after this list for how a pod references it)

      kubectl create secret docker-registry atpdocker-credentials --docker-server=serverName --docker-username=User --docker-password=PW --docker-email=Email
    
  • kubernetes.io/service-account-token: it is used by a serviceaccount. this token is created automatically by kubernetes when the serviceaccount is created. The token is also mounted to /run/secrets/kubernetes.io/serviceaccount when a Pod uses this serviceaccount.
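
    the pod sketch referenced above for kubernetes.io/dockerconfigjson (a hedged example; the image name is hypothetical, the secret name reuses the kubectl command above):

    apiVersion: v1
    kind: Pod
    metadata:
      name: private-image-pod
    spec:
      containers:
      - name: app
        image: serverName/myorg/private-image:latest   # hypothetical private image
      imagePullSecrets:
      - name: atpdocker-credentials                    # registry secret created above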

    service

    defines an access API for users, so that the backend containers are isolated from the user

    apiVersion: v1
    kind: Service
    metadata:
      name: myservice
    spec:
      selector:
        app: myapp
      ports:
      - protocol: TCP
        port: 80
        targetPort: 8080
        name: myapp-http

service account

a service account is used by processes in a pod to call the kubernetes api and other external services. a service account is only valid in its own namespace. every namespace automatically gets a default service account, and the token controller creates a secret for each service account
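
a short sketch (names are placeholders) of creating a service account and letting a pod use it:

# create a service account in the current namespace
kubectl create serviceaccount build-robot
# the token controller creates a secret for it automatically
kubectl get serviceaccount build-robot -o yaml

# pod spec that runs under this service account
apiVersion: v1
kind: Pod
metadata:
  name: sa-demo
spec:
  serviceAccountName: build-robot   # token gets mounted under /run/secrets/kubernetes.io/serviceaccount
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]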

Site Reliability Layers

Monitoring Layer / Metrics Server

monitoring the running state of the k8s cluster
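
once a metrics server is deployed in the cluster, resource usage can be checked with kubectl top (a sketch; assumes metrics-server is installed):

# cluster-wide node resource usage
kubectl top nodes
# per-pod CPU/memory usage in a namespace
kubectl top pods -n default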

Scaling Layer / HPA

scaling the instances of your application up or down
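
a minimal sketch of a HorizontalPodAutoscaler (deployment name and limits are placeholders); the same can be done imperatively with kubectl autoscale:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80

# imperative equivalent
# kubectl autoscale deployment myapp --min=2 --max=10 --cpu-percent=80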

Service Rules Layer

automate when / how your application should restart
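
restart behaviour is commonly expressed with a restartPolicy plus probes; a hedged sketch of a liveness probe that restarts the container when its health endpoint stops responding (path and timings are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: liveness-demo
spec:
  restartPolicy: Always          # restart the container whenever it exits
  containers:
  - name: app
    image: nginx
    livenessProbe:               # kubelet restarts the container if this probe fails
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 10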

other kubernetes distributions

Rancher Labs (k3s)

k3s from rancher labs is a highly optimized miniature version of Kubernetes for the edge. it doesn't compromise API conformance or functionality.
it is a self-sufficient, encapsulated binary that runs almost all the components of a Kubernetes cluster, including the API server, scheduler, and controller
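
the usual quick-start (the get.k3s.io install script is the documented entry point; exact flags may differ per version):

# install and start a single-node k3s server
curl -sfL https://get.k3s.io | sh -
# kubectl is bundled with the k3s binary
sudo k3s kubectl get nodes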

k3d

a program that runs k3s in docker. there is also a vs code plugin available: https://github.com/inercia/vscode-k3d

Installation

https://k3d.io/#installation

usage
#create a cluster
k3d cluster create  
k3d cluster create mycluster --api-port 127.0.0.1:6445 --servers 3 --agents 2 --volume '/home/me/mycode:/code@agent[*]' --port '8080:80@loadbalancer'
k3d cluster create --config /home/me/myk3dcluster.yaml
#check nodes
kubectl get nodes
#check creations
k3d cluster|node|registry list

k0s

K0s packages everything into a single binary for both amd64 and arm64 architectures. It does not require any host OS dependencies besides the kernel. the "zero" in k0s stands for the company's aspiration to provide a Kubernetes distribution with zero friction, zero dependencies, zero overhead, zero cost, and zero downtime

$ # Download, install, and start a k0s server
$ curl -sSfL k0s.sh | sh
$ k0s server
$ # Create and add a worker node
$ k0s token create --role=worker
$ k0s worker <TOKEN>
$ # Or quickly try it out in a Docker container anywhere
$ docker run -d --hostname controller --privileged -v /var/lib/k0s -p 6443:6443 k0sproject/k0s

Engine Yard Kontainers and Hitachi Kubernetes Service (HKS)

tools for kubernetes

Pinniped

cluster identity plugin for kubernetes

Carvel

a set of tools for app deployment on a cluster

  • ytt: templates YAML using comments (annotations) against the YAML structures; customize imperatively using conditionals and loops in a python-like language called Starlark; the overlay feature allows the copy of a third-party config file to remain pristine and unmodified (see the sketch after this list)
  • kbld: looks for images within your config file, builds the images via Docker and pushes them to the registry of your choice
  • kapp: a CLI tool that calculates changes between your configuration and the live cluster state, and only applies the changes you approve
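
a small ytt sketch (file names and values are made up) showing the comment-based annotations:

# values.yml
#@data/values
---
app_name: myapp
replicas: 3

# config.yml
#@ load("@ytt:data", "data")
apiVersion: apps/v1
kind: Deployment
metadata:
  name: #@ data.values.app_name
spec:
  replicas: #@ data.values.replicas

# render the final YAML
# ytt -f config.yml -f values.yml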

    kontena lens

    smart dashboard for kubernetes

    monitoring

    kube-prometheus (https://github.com/prometheus-operator/kube-prometheus) provides Grafana dashboards to monitor cluster health, e.g. the Kubernetes API servers and etcd. we can use Prometheus to collect time-series metrics and Grafana for graphs, dashboards, and alerts.
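
    a hedged sketch of installing a Prometheus/Grafana stack via Helm (chart and release names are the community defaults, adjust as needed):

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
    # installs Prometheus, Alertmanager, Grafana and default dashboards
    helm install monitoring prometheus-community/kube-prometheus-stack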

logging tools

scalable tools that can collect data from all the services and provide the engineers with a unified view of performance, errors, logs, and availability of components.

EFK Stack

#installation
helm install efk-stack stable/elastic-stack --set logstash.enabled=false --set fluentd.enabled=true --set fluentd-elasticsearch.enabled=true

PLG Stack (Promtail, Loki and Grafana)

Loki is designed so that it can run either as a single monolith or as a set of microservices.

  • Elasticsearch uses Query DSL and the Lucene query language, which provide full-text search capability. Loki uses LogQL, which is inspired by PromQL (the Prometheus query language); it uses log labels for filtering and selecting the log data
  • Both ELK and PLG are horizontally scalable but Loki has more advantages because of its decoupled read and write path and use of microservices-based architecture
  • compared with ELK, Loki is an extremely cost-effective solution because of the design decision to avoid indexing the actual log data. Only metadata is indexed, which saves on storage and memory (cache).

#installation
$ helm repo add loki https://grafana.github.io/loki/charts
$ helm repo update
$ helm upgrade --install loki loki/loki-stack --set grafana.enabled=true,prometheus.enabled=true,prometheus.alertmanager.persistentVolume.enabled=false,prometheus.server.persistentVolume.enabled=false

Stackdriver