Kubernetes cluster for manufacturing engineering: tale of an epic commissioning
- Introduction
A fairly large company that produces parts for some of the most important
players in the automotive industry was interested in a cloud-based system to
monitor the overall efficiency of its production machines, analyze some key
parameters and optimize the scheduling of production activities.
The goal of the project was to connect eleven industrial manufacturing machines
to the cloud, extract specific machine data and develop a web application for
the visualization and management of such data, while guaranteeing information
confidentiality and security.
Furthermore, one fundamental requirement of the system was bidirectional
integration with the customer’s ERP system, in order to synchronize the
production JOBs and manage their execution on the corresponding machines.
To fulfill these requirements, the project team engineered and implemented
a hybrid edge-cloud solution in which the software is packaged into various
containers that are orchestrated and managed, at the edge level, by a Kubernetes
cluster.
This technology ensures optimal load balancing across the available resources
as well as high availability in case of hardware or software failures.
While IT enterprises no longer question the value of containerized applications,
the use of such technologies within a manufacturing environment has not been
fully explored yet.
In the following paragraphs we will go into the details of how we engineered
and built the system and the difficulties we had to overcome.
- K8S & containers
The current trend in application development revolves around the concept of
microservices: the idea is to split what would otherwise be a monolithic blob
of software into smaller independent services that communicate with each other
using standard protocols like MQTT or gRPC.
The next logical step was the need to decouple the software from the hardware
and the operating system; the solution to this need is called containers.
Docker is the containerization platform that managed (also thanks to the
endorsement of companies like Red Hat, Amazon and IBM) to rise above the crowd
and become what can now be considered the de facto standard in this field.
It makes it easy to pack custom software into a “container image” which also
contains all the libraries and runtimes needed to execute the program.
Docker implements the so-called OS-level virtualization paradigm: this enables
containers to run on any operating system that supports Docker and spares the
system administrator from installing and managing all the libraries and
dependencies required to run different software on the same “physical” machine.
This makes deployment and maintenance tasks easier and less prone to errors.
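As a quick illustration of this decoupling (a generic example, not specific to this project), the following command runs a full Debian 9 userland on any host with Docker installed, regardless of the host distribution:
$ # Pull the debian:9 image from DockerHub and print the OS identification of the container
$ docker run --rm -it debian:9 cat /etc/os-release
The host does not need a single Debian library or package: everything the process requires is shipped inside the image.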
Various tools exist to manage multiple Docker containers, the most famous of
which is Docker Compose.
When the number of containers grows and the need to use multiple physical
servers arises, Docker Compose does not cut it anymore: this is when
orchestration and Kubernetes (a.k.a. K8S) come into play.
Kubernetes is an open source platform for automating the deployment, scaling
and management of containerized applications; the project was started by Google
and builds on the experience acquired by running literally millions of
containers for more than a decade.
Kubernetes takes the concept of containers to a completely new level by
bringing the features of a typical cluster to the container world: high
availability, load balancing, automatic scaling and centralized logging.
- HARDWARE ARCHITECTURE
The following diagram describes how the various components of the system are logically linked and connected with each other; for the sake of simplicity, a few minor details have been omitted.
The cluster has to be physically close to the automation hardware (namely the
PLCs) because they interact directly with each other using the Siemens S7
protocol; this forced us to pick rugged components capable of running reliably
in very harsh environmental conditions (~40 °C during the summer in a dusty
environment).
The hardware is installed in two different racks; every computer and PLC sits
on the same network and everything is connected by a few industrial network
switches.
The only computer not part of the cluster is attached to a screen and is used
as an Andon board; it also runs our teleservice software, which comes in very
handy in case we need to perform maintenance tasks remotely on the cluster.
The hardware platform we have chosen is the Siemens SIMATIC IPC127E, a very
compact yet powerful industrial computer that can be used as an Internet of
Things gateway.
The SIMATIC IPC127E provides all the necessary features, making it the perfect
platform on which to build innovative solutions to be installed in new or
pre-existing production plants:
- Ultra compact form factor with a volume of around 0.3 liters and integrated industrial power supply.
- Built to run 24/7 at ambient temperatures of up to 50 °C.
- Quad-core Intel Atom CPU providing high performance together with very low power consumption; other features include up to 4 GB of DDR4 system memory, 3x Gigabit Ethernet ports, 4x USB 3.0 and a DisplayPort.
- 11 years of hardware support.
Other than the hardware itself, another crucial component is the operating
system.
Linux kernel based operating systems have been a valid choice for quite some
time; the only downside is that, as with most open source software, there
isn’t a company providing post-sale support.
SIMATIC Industrial OS solves this issue: it is sold and supported directly
by Siemens and is built around the requirement of being a lightweight yet
reliable foundation on which critical applications and software can be deployed
efficiently and reliably.
It is based on Debian 9 and can be easily installed from a USB drive, which
also includes an installation wizard and a system backup/restore tool.
During the wizard it is possible to install various software packages such as
GNOME and other lightweight desktop environments, Docker, and other tools that
make the integration with SIMATIC products easier.
The operating system itself is covered by the same post-sale policies and
warranty as every other Siemens product.
- SOFTWARE ARCHITECTURE
The software comprises four macro-components. The first one is a cloud web
application used by the plant personnel to monitor and interact with the
machines.
In this particular case the web app is a NodeJS application built with the
Angular framework, while the backend is written in Python.
The other three software macro-components, which are executed at the edge, are:
- Machine POD (one per physical machine)
- Job Manager POD
- Data Publisher POD
There is a Machine POD for every physical machine; each of them is specifically
tailored to interact with a particular machine using a certain communication
protocol: some of the machines use Siemens S7, others expose an edge SQL
database that can be used to send commands and query for the outputs, and
others use files to communicate with the external world.
In this regard, the microservices based architecture really shines because it
makes it really easy to adapt the software to every possible scenario.
Even better, to connect a new machine the only operation required is to spin up
another POD and, in the worst case scenario, write a custom PLC DATA GATEWAY
component.
Everything else is just plug and play.
The Job Manager POD is the most critical part of the system because it is the
component that manages the execution of production JOBs; it uses a SQL database
to store the list of production tasks, while the synchronization between this
database and the ERP system is delegated to a third party software.
All the production jobs are loaded into the system and can be executed on
a specific machine from the cloud web application; the communication between
the JOB Orchestrator and the JOB Agent of the specific machine is performed by
calling a REST API, as sketched below.
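As a purely illustrative sketch (the host name, endpoint and payload below are hypothetical, not the project’s actual API), starting a JOB on a machine agent over REST could look like this:
$ # Ask the JOB Agent of machine 3 to start executing JOB 42 (hypothetical endpoint)
$ curl -X POST http://machine-pod-3:8080/api/v1/jobs/42/start \
      -H "Content-Type: application/json" \
      -d '{"requestedBy": "job-orchestrator"}'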
The Data Publisher POD is the component that glues everything together: it
collects data from all the Machine PODs and pushes it to the cloud so that
it can be picked up and displayed by the cloud web application.
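To give an idea of what such a push might look like (the broker host, topic and payload below are hypothetical), a single measurement published over MQTT could be as simple as:
$ # Publish one efficiency sample for machine 3 to a cloud MQTT broker (hypothetical names)
$ mosquitto_pub -h broker.example.com -p 8883 --cafile ca.crt \
      -t "plant/machine-3/efficiency" \
      -m '{"timestamp": "2020-01-15T10:23:00Z", "oee": 0.87}'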
- SETTING UP KUBERNETES
Skipping the boring installation procedure of the OS, let’s move directly to
the interesting part: setting up Kubernetes.
The first step is removing Docker and anything related to it, if previously
installed, and then reinstalling it from the official repository; generally
speaking, we want to run the same version of the software on every node of
the cluster.
$ sudo apt-get remove docker docker-engine docker.io containerd runc
$ sudo apt-get install \
      apt-transport-https \
      ca-certificates \
      curl \
      gnupg2 \
      software-properties-common
$ curl -fsSL https://download.docker.com/linux/debian/gpg | sudo apt-key add -
$ sudo add-apt-repository \
      "deb [arch=amd64] https://download.docker.com/linux/debian \
      $(lsb_release -cs) \
      stable"
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
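Before moving on, it is worth a quick sanity check that Docker is working (a generic verification, not specific to this setup):
$ # Print the installed version and run the official test image
$ docker --version
$ sudo docker run --rm hello-world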
Now it is time to install Kubernetes; this can also be done easily by adding the official repository to our system.
$ sudo apt-get install -y apt-transport-https curl
$ curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
$ cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
$ sudo apt-get update
$ sudo apt-get install -y kubelet kubeadm kubectl
$ sudo apt-mark hold kubelet kubeadm kubectl
Next, instruct the kernel to send packets traversing the network bridge to
iptables; this is a standard prerequisite for kubeadm-based setups, and the
sysctl settings below take care of it.
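A minimal way to persist this setting (the file name k8s.conf is just a convention):
$ # Make bridged traffic visible to iptables, as required by most CNI plugins
$ cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
EOF
$ sudo sysctl --system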
The above procedure has to be performed on every node of the cluster.
It might be worth looking into automated provisioning software to perform
these tedious steps on multiple nodes; I used a mix of bash scripts and
Ansible Playbooks.
Setting up the Master
To set up the master, run the following commands as root:
$ kubeadm init
$ systemctl enable --now kubelet
Save the output of the kubeadm init command: it contains the command to run
on the slaves to have them join the cluster (kubeadm join ...).
Since Kubernetes should be controlled by a non-root user (which is named edge
in this case), set up your environment accordingly:
$ su edge
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
Last but not least, install a network plugin for Kubernetes; Flannel is the most battle-tested one:
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Join the cluster
On each slave node run the kubeadm join ... command you saved before; keep
in mind that the bootstrap token generated by the master is only valid for a
limited time (24 hours by default), but it can be regenerated at any time, so
it will always be possible to join more slaves.
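If the token has expired, a new join command can be generated on the master (a standard kubeadm facility); once the slaves have joined, the cluster state can be checked from the master as well:
$ # Print a fresh join command with a newly generated token
$ kubeadm token create --print-join-command
$ # Verify that all nodes have joined and are Ready
$ kubectl get nodes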
Docker registry credentials
Kubernetes works by pulling images from a container registry; to access a private one it is necessary to create a secret of type docker-registry which will contain the login credentials.
$ kubectl create secret docker-registry <secret_name> --docker-server=<your_registry_server> --docker-username=<your_name> --docker-password=<your_password> --docker-email=<your_email>
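For the secret to actually be used, every POD pulling from the private registry must reference it in its specification; a minimal sketch (the image name below is hypothetical):
spec:
  containers:
    - name: machine-pod                               # Hypothetical container
      image: registry.example.com/machine-pod:latest  # Image hosted on the private registry
  imagePullSecrets:
    - name: <secret_name>                             # The secret created above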
Now that the cluster is running, let’s write two simple deployment manifests
to set up two PODs.
First, set up a MySQL Server deployment:
mysql.yml
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  type: NodePort
  ports:
    # MYSQL
    - port: 3306        # Port on which the service is exposed inside the cluster, can be used by other Deployments' PODs to access the database server
      name: mysql       # Name of this particular rule
      protocol: TCP
      targetPort: 3306  # Internal port on which the MYSQL POD itself is listening
      nodePort: 30006   # Port mapped outside the cluster, the service can be accessed on this port from any IP of every machine of the cluster
  selector:
    app: mysql
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      # MYSQL - uncomment to save persistent data (DB, logs) on disk
      # volumes:
      #   - name: mysql-data
      #     hostPath:
      #       path: /opt/persistent-data/mysql
      containers:
        # MYSQL
        - name: mysql   # Name of the container
          image: mysql  # No repo specified, image pulled from DockerHub
          # volumeMounts:
          #   - name: mysql-data
          #     mountPath: /var/lib/mysql
          ports:
            - containerPort: 3306
              hostPort: 3306  # Port mapped on the physical machine on which the container is running
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: passwordpasswordpassword
The second deployment contains a shell with the tool nmap, a popular port
scanner and security assessment command line utility.
This can be used to better understand how networking works inside the cluster.
nmap.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nmap
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nmap
  template:
    metadata:
      labels:
        app: nmap
    spec:
      containers:
        # NMAP
        - name: nmap                 # Name of the container
          image: securecodebox/nmap  # No repo specified, image pulled from DockerHub
To run the deployments, copy the two files to the master node of the cluster
(or run them from a workstation) and start them using kubectl:
$ kubectl apply -f nmap.yml
$ kubectl apply -f mysql.yml
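Thanks to the NodePort rule in the service definition, the database is now also reachable from outside the cluster on port 30006 of every node; a quick check, assuming the mysql client is installed on the workstation:
$ # Connect to the MySQL server through the NodePort of any cluster node
$ mysql -h <any_node_ip> -P 30006 -u root -p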
To get a shell into the nmap container, execute:
# Get the name of the PODs
$ kubectl get pods
# Get a shell into the specific container
$ kubectl exec -it <nmap_pod_name> --container nmap -- /bin/sh
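From that shell it is possible to explore how networking works inside the cluster; for example, the mysql service can be scanned through its internal DNS name (standard Kubernetes service discovery, assuming the default namespace):
# Scan the MySQL service port from inside the cluster
$ nmap -p 3306 mysql.default.svc.cluster.local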
I leave it to the reader as homework to figure out how to save containers’
data persistently; this can be accomplished in various ways which depend on
the specific platform where the Kubernetes cluster is physically deployed
(GKE, EKS, on-premises, etc.).
There is no golden rule when it comes to data persistence, though an NFS volume
hosted on a NAS might be a good solution for on-premises K8S clusters.
Another possibility is Persistent Volume Claims (PVC); both options are worth
exploring, and a sketch of the NFS approach follows.
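As a starting point for that homework, here is a minimal sketch of the NFS approach (the NAS address and exported path below are hypothetical); the claim can then be referenced from the volumes section of a deployment:
nfs-pv.yml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mysql-nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: ""      # Static provisioning, no storage class
  nfs:
    server: 192.168.1.100   # Hypothetical NAS address
    path: /exports/mysql    # Hypothetical exported directory
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-nfs-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""      # Bind only to statically provisioned PVs
  resources:
    requests:
      storage: 10Gi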
- WHY K8S?
In our case Kubernetes brings a few huge advantages over running services on many different computers that are not part of a cluster:
- High availability: imagine that a slave node stops working because of a hardware failure (e.g. a broken hard disk drive) or a software failure (e.g. a kernel panic). Kubernetes detects that the node is down and reschedules all the deployments that were running on it to be executed on the other nodes.
- Load balancing: resource usage is kept under control at all times; this means that if one node is overworked the cluster moves some of its PODs to the other nodes. The system always tries to keep the nodes’ load under control and to have the services spread across as many nodes as possible. This keeps the average load balanced and roughly at the same level on all nodes, ensuring that the services always run in the best possible conditions.
- Ease of management: it is possible to manage the whole cluster directly from a remote workstation; all that is required is to have kubectl installed and to import a few configuration files. The easier a system is to manage, the fewer issues arise while administering it; having a system that generally just works means less downtime and increased added value for the end user/client.