<p><em>Adventures of a Doge</em>: Managed Kubernetes on a Hobbyist Budget. C Shi, 2021-09-18.</p>
<p>For more than two years, I operated a <a href="https://blog.chongya.ng/running-a-personal-kubernetes-cluster-with-calico-connected-services-on-bare-metal.html">Kubernetes cluster for my personal workloads</a>, which was self-managed across the entire stack: bare-metal dedicated servers, KVM hypervisor, <code>kubeadm</code>-bootstrapped master and worker nodes from my own <code>cloudinit</code> images, <a href="https://metallb.universe.tf/">MetalLB</a> ingress, and <a href="https://www.gluster.org/">GlusterFS</a> distributed storage; eventually with the cluster <a href="https://blog.chongya.ng/running-a-low-cost-distributed-kubernetes-cluster-on-bare-metal-with-wireguard.html">spanning across data centres of multiple hosting providers over WireGuard</a>.</p>
<p>This setup had worked very well for me throughout its lifetime, providing me with a unified interface for running personal workloads on the internet. However, keeping the cluster running was also no small feat: I have had to manually manage every level of the stack all the way down to physical servers, and if anything went down below the Kubernetes layer, recovery was also very manual.</p>
<p>While I am very comfortable maintaining and securing each layer of the stack by hand due to <a href="https://chongya.ng/#employments">what I do as a day job</a>, over time this exercise has become more and more wearing. Therefore I shifted my attention towards sourcing a managed Kubernetes service offering, one which would continue to give me the flexibility of running all types of workloads as containers, but vastly reduce the time cost of maintaining it, without significantly raising the monetary cost.</p>
<p><img alt="Finished production metrics overview" src="https://i.doge.at/uploads/big/809d38f60edd94e1be1260a993d9bb12.png"></p>
<p>In this article, I will go through in detail the research process to find the best managed Kubernetes offering for my requirements, the designs which shave as much off the infrastructure bill as possible, and the final product as a <a href="https://github.com/chongyangshi/budget-k8s">terraformed infrastructure on GCP</a>, which is open-source.</p>
<p><em>This blog post contains some opinions on various popular hosting and network service providers. As for all other posts on my blog, opinions expressed are exclusively my own.</em></p>
<h1>Selection principles</h1>
<p>For a managed Kubernetes service offering to be suitable for moving my personal infrastructure over, the Platform-as-a-Service (PaaS) provider would need to meet the following requirements:</p>
<ul>
<li><strong>Cost</strong>: Total running cost of the infrastructure must be reasonable for a personal project on a hobbyist monthly budget.</li>
<li><strong>Security</strong>: The cluster should work over private networking, and network access to the cluster control plane and worker nodes must not be open by default. This is irrespective of any application-level access controls.</li>
<li><strong>Reliability</strong>: Very occasional periods of unavailability can be tolerated if this significantly reduces regular running cost.</li>
<li><strong>Reproducibility</strong>: It should be possible to <a href="https://www.terraform.io/">terraform</a> the full infrastructure managed by the PaaS provider, making it much easier to recreate the infrastructure in the event of an accident or provider breakdown.</li>
</ul>
<p>I will now discuss each factor in more detail below:</p>
<h3>Cost</h3>
<p>Commercial PaaS providers such as Amazon <a href="https://aws.amazon.com/">AWS</a> and Google <a href="https://cloud.google.com/">GCP</a> sell to organisations with a commercial cash flow, and their pricing practices reflect this: anyone working in platform engineering will have had the experience of casually spinning up VM instances costing more than their monthly salary, since these costs ultimately facilitate commercial revenue for the organisation.</p>
<p>However, this model translates very poorly to infrastructure for personal projects, even when these personal projects only need a tiny fraction of the resources a typical commercial PaaS infrastructure requires to run. As an example, the AWS Elastic Kubernetes Service (EKS) <a href="https://aws.amazon.com/eks/pricing/">charges $0.10 an hour for the managed Kubernetes control plane</a> before any worker nodes are added. This translates to $72 a month before taxes, which would be a tiny fraction of a typical company's PaaS infrastructure bill, but is completely unreasonable for a personal budget financed out of our own pockets, before any workloads running on it are even considered.</p>
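<p>The conversion from an hourly rate to a monthly standing charge can be sketched as below. This is a minimal illustration, assuming a 30-day billing month; the function name is made up for this example, and the rate is stored in cents to keep the arithmetic exact.</p>

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a 30-day billing month


def monthly_standing_charge(hourly_rate_cents: int) -> float:
    """Return the monthly cost in dollars of a resource billed per hour."""
    return hourly_rate_cents * HOURS_PER_DAY * DAYS_PER_MONTH / 100


# EKS control plane at $0.10 per hour, as quoted above
eks_control_plane = monthly_standing_charge(10)
print(f"EKS control plane: ${eks_control_plane:.2f} per month")
```

<p>The same function applies to any other resource that is billed by the hour regardless of usage.</p>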
<p>In general, any managed services with high standing charges (costs incurred before any actual workload usage) will require a workaround or an alternative solution from the same provider. Once potential providers with high standing charges that are unavoidable have been discounted, we will still need to contend with high usage costs:</p>
<ul>
<li>CPU resources can be fairly expensive, and we need to explore any excess capacity discount options ("preemptible" or "spot" instances) available, measuredly trading off reliability for cost reductions. Some providers also offer shared-CPU options, but since Kubernetes will treat all logical CPU resources as allocatable, running the cluster on shared cores often leads to aggressive throttling or heavy CPU steals from the hypervisor. Therefore using shared cores while offering a substantial discount could have a profound negative impact on reliability.</li>
<li>Egress traffic can be very expensive, since it is often a significant source of revenue for major PaaS providers; for a personal project, billing alerts should be set up to detect a runaway billing scenario before it becomes disastrously expensive.</li>
<li>Data transfer cost is often charged for traffic between internal network resources, if they are located in different availability zones, or between different managed services. We need to avoid incurring these in our infrastructure design as much as possible.</li>
</ul>
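<p>The idea behind a billing alert for runaway costs can be sketched as a simple linear projection of month-to-date spend. Real alerts would be configured through the provider's own billing tooling; the function names, the budget, and the spend figures here are all hypothetical.</p>

```python
def projected_month_end_spend(spend_to_date: float, day_of_month: int,
                              days_in_month: int = 30) -> float:
    """Linearly extrapolate month-to-date spend to the end of the month."""
    if day_of_month not in range(1, days_in_month + 1):
        raise ValueError("day_of_month must fall within the month")
    return spend_to_date / day_of_month * days_in_month


def should_alert(spend_to_date: float, day_of_month: int,
                 monthly_budget: float, days_in_month: int = 30) -> bool:
    """Return True when the projected month-end spend exceeds the budget."""
    projection = projected_month_end_spend(spend_to_date, day_of_month, days_in_month)
    return projection > monthly_budget


# Hypothetical figures: $45 spent by day 10 against a $100 monthly budget
print(should_alert(45.0, 10, 100.0))
# Hypothetical figures: $20 spent by day 10 against the same budget
print(should_alert(20.0, 10, 100.0))
```

<p>A projection like this reacts to a sudden egress spike within a day or so, rather than at the end of the billing period.</p>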
<p>It also goes without saying that Kubernetes is almost never the most cost-effective option for running personal workloads at a small scale, or even for workloads that are already containerised (think AWS Fargate). Personally I will always need a live Kubernetes cluster to test some Kubernetes-related personal projects on, and thus a reasonably-priced cluster can always fit in my hobby budget. To simply run containerised or non-containerised workloads, there are far cheaper options on the internet.</p>
<p>For the remainder of this article, I will base cost calculations around PaaS resources actually consumed by the workloads in my previous self-managed cluster setup, which is just under <strong>8 vCPUs (hardware threads) and 16GB of RAM</strong>.</p>
<h3>Security</h3>
<p>There are two classes of components in a managed Kubernetes cluster:</p>
<ul>
<li>The cluster <strong>control plane</strong>, sometimes called the master nodes, which generally runs in a virtual network that is fully managed, and not under our direct control. The Kubernetes API endpoint of the control plane however has to be exposed to the user in some way to allow the cluster to be managed, and how this is implemented by different providers has significant security implications.</li>
<li>The <strong>worker nodes</strong>, which are virtual machines that are generally under the user's direct control, albeit normally assisted by the provider's automatic provisioning and scaling features. They run workloads according to instructions from the cluster control plane's API endpoint.</li>
</ul>
<p>Most commercial users find exposing the Kubernetes control plane on the internet unacceptable for production use, for both practical security and compliance reasons. There has been <a href="https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=kubernetes">a constant stream of vulnerabilities</a> affecting master nodes and the control plane endpoint, and it is not wise to expect the managed Kubernetes provider to be able to patch the control plane before your cluster is impacted by a critical zero-day vulnerability. After all, with botnet-controlled scanners keeping a tight watch on publicly-accessible Kubernetes control planes exposed on TCP 443 all over the internet, attackers can always exploit a critical vulnerability faster than you can patch it.</p>
<p>Depending on the provider, the worker nodes either talk to a private cluster control plane endpoint using their private IPs within a "Virtual Private Cloud" (VPC) network; or to a publicly-accessible control plane endpoint, after travelling a short distance over the internet using public IP addresses assigned to each worker node. Some providers implement both options and the choice belongs to us, but the default is often the less-secured public network option.</p>
<p>Within the design of Kubernetes, it is completely unnecessary for the control plane and the worker nodes in a Kubernetes cluster to communicate over the internet, and running Kubernetes worker nodes with any public IP address assigned at all remains a poor security design even with PKI-based authentication and encryption: in addition to workloads on all worker nodes generating egress traffic from arbitrary IP addresses, provisioning worker nodes with a public IP often causes them to become a hard dependency for the control plane endpoint to remain publicly-accessible.</p>
<p>Furthermore, with worker nodes having public IPs, any <code>NodePort</code> or <code>LoadBalancer</code> Service definitions will automatically expose a backend application on the internet, unless prevented by a firewall rule, which is often not enabled by default. Even where it is enabled by default, usability designs often trump security concerns. For example, in DigitalOcean's offering, the instance firewall will <a href="https://docs.digitalocean.com/products/kubernetes/resources/managed/#worker-node-firewalls">automatically open any port</a> that is allocated to a <code>NodePort</code> Service, unless explicitly opted out by the user using an annotation on the Service. It is not a great argument that when a user creates a <code>NodePort</code> Service, they intend for the Service to be publicly accessible from anyone on the internet. The opposite often happens unintentionally, such as when applying Helm charts with poor defaults, and can lead to the user accidentally exposing unsecured workloads to the internet which were intended to be internal-facing.</p>
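<p>One way to catch this kind of accidental exposure before applying manifests is to scan them for Service types that open ports on the nodes. The sketch below works on manifests already parsed into plain dictionaries; the Service names are hypothetical, and this is an illustration rather than a replacement for a proper firewall rule.</p>

```python
EXPOSING_TYPES = {"NodePort", "LoadBalancer"}


def exposed_services(manifests):
    """Return names of Services whose type would open a port on the nodes."""
    flagged = []
    for manifest in manifests:
        if manifest.get("kind") != "Service":
            continue
        # Kubernetes treats a Service without spec.type as ClusterIP
        service_type = manifest.get("spec", {}).get("type", "ClusterIP")
        if service_type in EXPOSING_TYPES:
            flagged.append(manifest["metadata"]["name"])
    return flagged


# Hypothetical manifests, as they might look after parsing rendered YAML
manifests = [
    {"kind": "Service", "metadata": {"name": "grafana"}, "spec": {"type": "NodePort"}},
    {"kind": "Service", "metadata": {"name": "api"}, "spec": {}},
    {"kind": "Service", "metadata": {"name": "mail"}, "spec": {"type": "LoadBalancer"}},
    {"kind": "Deployment", "metadata": {"name": "api"}, "spec": {}},
]
print(exposed_services(manifests))
```

<p>Running a check like this over the rendered output of a Helm chart would surface poor defaults before they reach the cluster.</p>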
<p>All things considered, I would only choose managed Kubernetes offerings where there is an option for the control plane to be accessible only over the private VPC network and specifically authorized public IP ranges (such as personal VPN ranges or a home IP). Additionally, the Container Network Interface (CNI) needs to support enforcing <a href="https://kubernetes.io/docs/concepts/services-networking/network-policies/">Network Policies</a> or a CNI-specific equivalent, in order to provide additional isolation for traffic within the cluster network.</p>
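<p>The semantics of such a source IP restriction can be illustrated with Python's standard <code>ipaddress</code> module. The provider enforces the real allowlist on its side; the ranges below are hypothetical (a private VPC range and a single address from a documentation-only range standing in for a home IP).</p>

```python
import ipaddress

# Hypothetical allowlist: a private VPC range plus one "home" address
AUTHORISED_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("203.0.113.7/32"),
]


def is_authorised(source_ip: str) -> bool:
    """Return True if source_ip falls inside any authorised range."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in AUTHORISED_RANGES)


print(is_authorised("10.1.2.3"))       # inside the VPC range
print(is_authorised("203.0.113.7"))    # the single authorised home address
print(is_authorised("198.51.100.20"))  # anywhere else on the internet
```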
<p>Beyond network access controls as the primary concern, many providers also offer other security features, such as managed encryption for <a href="https://kubernetes.io/docs/concepts/configuration/secret/">Kubernetes Secrets</a> stored at rest and virtual machine disks (both minor concerns given the hoops through which an attacker needs to jump to access the raw data); traffic logging and access auditing; RBAC access integration; and system integrity protection. They are not essential features for personal workloads, and the decision to enable each of them comes down to its cost versus its benefit. For example, flow logs and audit logs are generally very pricey; and if my personal cluster without other people's data is hacked, being able to know who did it and what was taken would likely not be worth the storage and processing fees for maintaining such logs.</p>
<h3>Reliability</h3>
<p>For personal workloads which have no strict uptime or reliability requirements, a managed Kubernetes cluster hosting them requires fewer guarantees than provided by the high-availability options of many PaaS providers.</p>
<p>Some providers divide regions into multiple availability zones (AZs) powered by separate physical data centres, which is an essential feature for businesses requiring uptime guarantees during rare disaster scenarios which can affect entire data centres -- even a major hosting provider with significant resources can have a <a href="https://www.reuters.com/article/us-france-ovh-fire-idUSKBN2B20NU">data centre catch fire once in a while</a>. </p>
<p>For personal workloads however, I would rather host all resources in a single AZ and accept the risk: dozens of gigabytes of traffic are generated each month <a href="https://kubernetes.io/docs/concepts/architecture/control-plane-node-communication/">simply by the Kubernetes control plane talking with its worker nodes</a>, and most providers whose managed Kubernetes offering can run over multiple AZs also charge for every gigabyte of traffic sent <em>between</em> these AZs. Additionally, in GCP's case, Google Kubernetes Engine only <a href="https://cloud.google.com/free/docs/gcp-free-tier/#kubernetes-engine">waives the cluster control plane fee</a> when the managed cluster control plane runs over a single AZ ("Zonal"). Thankfully, the blast radius when distributing workloads and data in a single AZ is already significantly better than hosting all applications and data on a single self-managed server.</p>
<p>Another trade-off between reliability and cost is related to the managed load balancers available from each PaaS provider, which are often directly integrated with a custom ingress controller in the managed Kubernetes control plane. These integrations automatically create load balancers based on <code>Service</code> or <code>Ingress</code> specifications configured by us in Kubernetes. The resulting managed load balancers are generally designed to scale automatically to process and forward hundreds or thousands of requests per second, which is overkill for personal projects.</p>
<p>Each managed load balancer tends to cost tens of dollars a month just on the standing charges, which becomes a significant cost barrier for personal workloads, for which different projects sharing the same cluster tend to require separate internet-facing endpoints. In the old integration model, each endpoint would require a separate load balancer, but for HTTP/HTTPS ingress, many providers are now offering custom controllers which can route ingress traffic for different backend services over the same Layer 7 load balancer, for example the <a href="https://aws.amazon.com/about-aws/whats-new/2020/10/introducing-aws-load-balancer-controller/">AWS Load Balancer Controller</a> for their Elastic Kubernetes Service (EKS). However, even if all our ingress workloads are HTTP/HTTPS-based and can therefore share a single Layer 7 load balancer, it will still cost at least <a href="https://aws.amazon.com/elasticloadbalancing/pricing/">$25 a month on AWS</a> and <a href="https://cloud.google.com/vpc/network-pricing#lb">at least $20 on GCP</a>; not to mention data processing fees charged per gigabyte, which turn free ingress traffic into billable usage.</p>
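<p>To put rough numbers on the two integration models, the sketch below compares one load balancer per endpoint against a single shared Layer 7 load balancer. It uses the $25 AWS lower bound quoted above and a hypothetical count of four endpoints, and it ignores per-gigabyte data processing fees.</p>

```python
LB_STANDING_CHARGE = 25  # dollars per month, the AWS lower bound quoted above
ENDPOINTS = 4            # hypothetical number of internet-facing projects

# Old integration model: every endpoint gets its own load balancer
per_endpoint_model = ENDPOINTS * LB_STANDING_CHARGE

# Shared model: one Layer 7 load balancer routes for all endpoints
shared_model = LB_STANDING_CHARGE

print(f"One load balancer per endpoint: ${per_endpoint_model} per month")
print(f"One shared Layer 7 load balancer: ${shared_model} per month")
```

<p>Even the shared figure is a sizeable share of a hobbyist budget for the little traffic involved.</p>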
<p>Instead, if the cost of running managed load balancers provided by the PaaS provider will become a significant part of our monthly bill, we will have to run our own ingress instance using a low-cost virtual machine with a static public IP attached. This instance will then be responsible for routing all traffic to applications intended to be exposed to the internet, via the internal network through a <code>NodePort</code>, or for some providers via Pod IPs with VPC-native networking.</p>
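<p>The routing decision made by such an ingress instance can be sketched as below. In practice a reverse proxy would do this work; the hostnames, node IPs, and ports are hypothetical, and the round-robin choice of node is just one simple option.</p>

```python
# Hypothetical mapping from public hostname to the NodePort of its Service.
# NodePort Services use ports from 30000-32767 by default.
ROUTES = {
    "blog.example.com": 30080,
    "metrics.example.com": 30090,
}

# Hypothetical private IPs of the worker nodes inside the VPC
NODE_IPS = ["10.0.0.11", "10.0.0.12"]


def pick_backend(host_header: str, request_counter: int):
    """Return (node_ip, node_port) for a request, or None for unknown hosts."""
    hostname = host_header.split(":")[0].lower()  # drop any port suffix
    node_port = ROUTES.get(hostname)
    if node_port is None:
        return None
    # Spread requests across nodes in round-robin fashion
    node_ip = NODE_IPS[request_counter % len(NODE_IPS)]
    return node_ip, node_port


print(pick_backend("Blog.example.com:443", 0))
print(pick_backend("metrics.example.com", 3))
print(pick_backend("unknown.example.com", 1))
```

<p>Because every node listens on the same <code>NodePort</code>, any node can receive the request and Kubernetes forwards it to a Pod backing the Service.</p>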
<h3>Reproducibility</h3>
<p>As discussed in the previous section, we will trade off some high-availability features in our design to reduce its running cost. If a disaster does happen, either due to circumstances beyond our control (such as fire or flood) or due to our accidental mishap, we don't want to have to spend a huge amount of time re-building the infrastructure.</p>
<p>Thankfully, most PaaS providers with managed Kubernetes offerings also have a <a href="https://www.terraform.io/docs/language/providers/index.html">Terraform Provider</a> available, maintained either by themselves or by HashiCorp. This allows us to define the infrastructure for personal projects as code, which makes its configurations more reusable and allows us to quickly spin all the PaaS components back up if we ever lose them.</p>
<h1>Choosing a platform</h1>
<p>With the above principles in mind, I set out to study the pricing model and available features for each of the popular PaaS providers with a managed Kubernetes offering:</p>
<ul>
<li>AWS <a href="https://aws.amazon.com/eks">Elastic Kubernetes Service</a> (EKS)</li>
<li>GCP <a href="https://cloud.google.com/kubernetes-engine">Google Kubernetes Engine</a> (GKE)</li>
<li>Microsoft <a href="https://azure.microsoft.com/en-gb/services/kubernetes-service/">Azure Kubernetes Service</a> (AKS)</li>
<li>Linode <a href="https://www.linode.com/products/kubernetes/">Kubernetes Engine</a> (LKE)</li>
<li>DigitalOcean <a href="https://www.digitalocean.com/products/kubernetes/">Managed Kubernetes</a></li>
<li>OVHcloud <a href="https://www.ovhcloud.com/en-gb/public-cloud/kubernetes/">Managed Kubernetes Service</a> (OVH)</li>
</ul>
<h3>AWS EKS</h3>
<p>The managed Kubernetes offering from AWS can be populated with <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances.html">EC2 Spot Instances</a> to bring dedicated CPU cores down to an affordable price for a small cluster on a personal budget, with preemptions relatively rare; but this is pretty much the only thing associated with AWS EKS that can be managed on a personal budget.</p>
<p>While traffic within the same availability zone is free, it is mandatory to run EKS over two availability zones, thus a decent amount of cross-AZ traffic cost is generated just from the cluster's internal background traffic. In terms of standing charges, the cluster management fee is not waivable at $72 a month, in addition to the standing cost of a NAT Gateway instance at $36 per availability zone if we want to avoid public networking for worker nodes.</p>
<p>Due to the high standing charges which are unique among our options, <strong>it is infeasible to run an AWS EKS cluster on a personal budget.</strong></p>
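<p>Adding up the figures in this paragraph gives the minimum standing charge of a private-networked EKS cluster before any worker node is launched. This is a back-of-the-envelope sketch, assuming one NAT Gateway in each of the two mandatory availability zones and excluding taxes and traffic.</p>

```python
CONTROL_PLANE_FEE = 72     # dollars per month, not waivable
NAT_GATEWAY_PER_AZ = 36    # dollars per month for each availability zone
AVAILABILITY_ZONES = 2     # EKS must run over two availability zones

# Assumption: one NAT Gateway per availability zone
nat_gateways = NAT_GATEWAY_PER_AZ * AVAILABILITY_ZONES
eks_standing_charges = CONTROL_PLANE_FEE + nat_gateways

print(f"NAT Gateways: ${nat_gateways} per month")
print(f"Total before any worker nodes: ${eks_standing_charges} per month")
```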
<p>This is quite a shame, as AWS has implemented cluster security fairly robustly: the cluster control plane can be configured to only use private network to talk to worker nodes, and the control plane supports IP restrictions when made accessible from the internet. Managed encryption via AWS KMS is supported for Kubernetes Secrets and disks at a relatively low cost; and RBAC integration with IAM is built-in. Logging is optional at additional cost via AWS CloudWatch Logs.</p>
<h3>GCP GKE</h3>
<p>As the original author of Kubernetes, Google have put a fair amount of effort into building a managed Kubernetes product that is mature in its core features. The pricing model of GCP is also more accessible to a personal budget than that of AWS:</p>
<ul>
<li><a href="https://cloud.google.com/compute/docs/instances/preemptible">Preemptible VM Instances</a> make dedicated CPU resources relatively affordable with rare preemptions.</li>
<li>Cluster management fee is <a href="https://cloud.google.com/free/docs/gcp-free-tier/#kubernetes-engine">waived on one single-AZ cluster per account</a>; before mid-2020 this was free for all cluster types. In any case we want to avoid any cross-AZ traffic cost, so we only want to use a single-AZ cluster anyway.</li>
<li>Low standing charge for NAT Gateways, at the cost of only a few dollars a month for a gateway handling little traffic.</li>
<li>There does not seem to be a noticeable cost in the cluster's integrated logging with Stackdriver.</li>
<li>GCP participates in Cloudflare's <a href="https://www.cloudflare.com/bandwidth-alliance/">"Bandwidth Alliance"</a>, and offers a <a href="https://cloud.google.com/network-connectivity/docs/cdn-interconnect">discount on egress traffic fronted by Cloudflare</a> and some other CDNs. For traffic exiting EU regions this is down from anywhere between $0.085 and $0.12 per GB to $0.05 per GB. This is still very pricy, but for personal budgets, any reduction in egress pricing is helpful.</li>
<li>Managed encryption via GCP KMS is available for Kubernetes Secrets and disks, at the cost of a handful of dollars a month in a small cluster. </li>
</ul>
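<p>The effect of the CDN egress discount listed above can be estimated as below. The per-gigabyte rates are the ones quoted for EU regions; the monthly traffic volume is a hypothetical figure chosen for illustration, and <code>Decimal</code> is used to keep the currency arithmetic exact.</p>

```python
from decimal import Decimal

STANDARD_RATE_LOW = Decimal("0.085")   # dollars per GB, lower bound quoted above
STANDARD_RATE_HIGH = Decimal("0.12")   # dollars per GB, upper bound quoted above
DISCOUNTED_RATE = Decimal("0.05")      # dollars per GB when fronted by Cloudflare

monthly_egress_gb = 100  # hypothetical monthly egress volume

standard_low = STANDARD_RATE_LOW * monthly_egress_gb
standard_high = STANDARD_RATE_HIGH * monthly_egress_gb
discounted = DISCOUNTED_RATE * monthly_egress_gb

print(f"Standard egress: ${standard_low:.2f} to ${standard_high:.2f} per month")
print(f"Discounted egress: ${discounted:.2f} per month")
```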
<p>One area where GCP's standing cost is higher than desired for a personal budget is the managed load balancers. The standing charge for up to five endpoint hostnames sharing a Layer 7 load balancer is around $20 a month, which is very expensive for the little traffic it will process. And additional costs are payable if you need separate Layer 4 load balancers or need to terminate more than five endpoint hostnames for your personal projects. To make it cheaper at the cost of reliability, we will need to configure and run a self-managed ingress load-balancing instance to forward traffic to the cluster.</p>
<p>GCP implements robust cluster security features: the control plane and the worker nodes can talk over the private network; and while control plane access is not yet integrated with their Identity-Aware Proxy, source IP restrictions can be applied to accessing the control plane from the internet. Worker node system integrity protection and secure boot are available via <a href="https://cloud.google.com/kubernetes-engine/docs/how-to/shielded-gke-nodes">Shielded Nodes</a> for free. <a href="https://github.com/google/gvisor">Container sandboxing via gVisor</a> is also available for free, but it will disable hardware hyper-threading to mitigate related hardware vulnerabilities, hence reducing allocatable computing resources by half. Other advanced security features at additional costs include binary authorisation and memory encryption ("Confidential Workers").</p>
<h3>Azure AKS</h3>
<p>Microsoft's managed Kubernetes offering supports node pools with <a href="https://docs.microsoft.com/en-us/azure/aks/spot-node-pool">spot instances</a>, which brings the price of preemptible dedicated CPU instances to a comparable level with AWS and GCP. However, in Azure AKS the spot instance pool cannot serve as the default instance pool for the cluster, despite the fact that whether workloads can actually be scheduled on available worker nodes has no bearing on the control plane's health.</p>
<p>Therefore at least one permanent instance must be scheduled if using AKS, and we have the option of either running an expensive persistent worker node instance with dedicated CPU resources, or using a cheaper, smaller worker node shape which basically cannot run any workloads.</p>
<p>In terms of other standing charges, five endpoint hostnames sharing a Layer 7 load balancer is around $20 a month just like GCP. But Azure also has high standing charges for NAT Gateways: starting at $32 a month. </p>
<p>While AKS does offer decent security options such as <a href="https://docs.microsoft.com/en-us/azure/aks/private-clusters">private network clusters</a> and <a href="https://docs.microsoft.com/en-us/security/benchmark/azure/baselines/aks-security-baseline?context=/azure/aks/context/aks-context">control plane network access restrictions</a>, high standing charges from the default worker node pool and the NAT Gateway mean <strong>it is infeasible to run an Azure AKS cluster on a personal budget.</strong></p>
<h3>Linode LKE</h3>
<p>Linode is one of the oldest providers of virtual private servers (VPS's), pre-dating the PaaS market; and their all-inclusive pricing model has been well-liked by personal users and small business customers. While Linode have since stepped into a more PaaS-style product strategy to compete with more recent entrants into the market, their managed Kubernetes offering continues with their pricing-focused selling strategy by including <a href="https://www.linode.com/pricing">a very generous egress traffic allowance</a> for each worker node instance launched in the cluster. They also charge no cluster management fee.</p>
<p>Linode charges $20 a month for each instance of 2 vCPUs and 4GB of RAM if using shared CPU cores, or $30 if using dedicated CPU cores. Unlike the aforementioned AWS, GCP, and Azure options, these prices are for persistent instances, which removes the potential downtimes we could suffer occasionally if using preemptible VMs from one of the major PaaS providers. Additionally, reasonable sizes of instance boot disks are included in the price.</p>
<p>Their load balancers (called "NodeBalancers" with Kubernetes controller integration) each cost a fixed monthly price of $10, but <a href="https://www.linode.com/docs/guides/getting-started-with-load-balancing-on-a-lke-cluster/">do not support Layer 7 connection sharing</a>. It is however possible to run one load balancer in Layer 4 mode fronting a Layer 7 reverse proxy like <a href="https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/">Nginx</a> or <a href="https://traefik.io/traefik/">Traefik Proxy</a>, which will also terminate TLS.</p>
<p>The main drawback of using Linode's managed Kubernetes offering is network security: as far as I can tell there is currently no way to apply a source IP restriction on the control plane endpoint exposed on the internet. On the worker node side, keeping inter-node private IP communications fully private also requires <a href="https://www.linode.com/community/questions/11484/top-tip-the-linode-private-networkip-is-not-private-at-all">configuring a firewall feature that is not enabled by default</a> (albeit fairly easy to configure and turn on). On balance of these factors, I'm not happy with the security model of Linode LKE.</p>
<h3>DigitalOcean Managed Kubernetes</h3>
<p>Having scaled up dramatically over the past few years with lots of venture capital funding, DigitalOcean is now the main competitor to Linode in the personal and small business VPS market. Like Linode, they too offer a managed Kubernetes service with a similar pricing model:</p>
<ul>
<li>Free cluster management.</li>
<li>2 vCPUs and 4 GB of RAM cost $20-$24 on shared cores, or $40 on dedicated CPU cores. Boot disk storage is included.</li>
<li>No preemptible or "spot" instances available.</li>
<li>A generous free egress allowance per worker node instance that is similar to Linode.</li>
<li>A "small" load balancer costs around $10 a month, which can front a Layer 7 reverse proxy like Nginx or Traefik Proxy running in the cluster.</li>
</ul>
<p>On pure pricing terms, DigitalOcean is slightly more expensive than Linode for both shared and dedicated CPU options, but their pricing models are otherwise highly comparable, which is to be expected given their state of competition. </p>
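<p>The comparison can be made concrete by working out how many 2 vCPU / 4 GB instances are needed for the 8 vCPU / 16 GB target, using the per-instance prices quoted for each provider. This sketch covers worker node costs only and ignores the capacity Kubernetes reserves on each node, so real totals will be somewhat higher; the function names are made up for this example.</p>

```python
import math

TARGET_VCPUS = 8
TARGET_RAM_GB = 16


def instances_needed(vcpus_each: int, ram_gb_each: int) -> int:
    """Return how many instances cover both the vCPU and the RAM target."""
    by_cpu = math.ceil(TARGET_VCPUS / vcpus_each)
    by_ram = math.ceil(TARGET_RAM_GB / ram_gb_each)
    return max(by_cpu, by_ram)


def monthly_node_cost(price_each: int, vcpus_each: int = 2, ram_gb_each: int = 4) -> int:
    """Return the monthly worker node cost in dollars for the target size."""
    return instances_needed(vcpus_each, ram_gb_each) * price_each


linode_shared = monthly_node_cost(20)
linode_dedicated = monthly_node_cost(30)
digitalocean_shared_low = monthly_node_cost(20)
digitalocean_shared_high = monthly_node_cost(24)
digitalocean_dedicated = monthly_node_cost(40)

print(f"Linode: ${linode_shared} shared, ${linode_dedicated} dedicated")
print(f"DigitalOcean: ${digitalocean_shared_low}-${digitalocean_shared_high} shared, "
      f"${digitalocean_dedicated} dedicated")
```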
<p>DigitalOcean unfortunately seems to carry the same security design as Linode for the control plane endpoint exposed on the internet: there is no way to apply a source IP restriction for the public endpoint. Their worker node firewall model is better automated than Linode's, but as mentioned earlier, it would be better if it did not <a href="https://docs.digitalocean.com/products/kubernetes/resources/managed/#worker-node-firewalls">automatically open any port</a> of a <code>NodePort</code> or <code>LoadBalancer</code> Service by default. On balance of all these factors, I'm also not happy with DigitalOcean's security model.</p>
<h3>OVHcloud Managed Kubernetes</h3>
<p>OVH is among a number of French and German providers traditionally providing low-cost virtual private servers and dedicated servers. Many of these providers have pivoted into PaaS-style offerings, with OVH branding theirs as "OVHcloud", and they have also been quick to build a managed Kubernetes offering. Their pricing model is somewhat similar to Linode and DigitalOcean:</p>
<ul>
<li>Free cluster management.</li>
<li>2 vCPUs and 7 GB of RAM (minimum) costs around $29.2 on dedicated cores, or 2 vCPUs and 4GB of RAM of their "Discovery" instances with shared cores for around $12.5. Boot disks are included in the price.</li>
<li>No preemptible or "spot" instances available.</li>
<li>Egress is completely free in most regions at a bandwidth sufficient for hosting personal projects.</li>
<li>A load balancer costs around $16 a month, which can front a Layer 7 reverse proxy like Nginx or Traefik Proxy running in the cluster.</li>
</ul>
<p>Prices for dedicated cores on OVH (even the non-computing-optimised ones) are slightly lower than Linode's and somewhat lower than DigitalOcean's, but still broadly similar. On the security front, OVH supports private IPs for nodes, but according to their control panel, even with private IPs enabled "the public IPs of these nodes will be used exclusively for administration/linking to the Kubernetes control plane", and Pod networking appears to use the deprecated <a href="https://github.com/gravitational/wormhole">Gravitational wormhole</a>. Source IP restriction is however available on the internet-facing control plane endpoint. This is the important security feature which Linode and DigitalOcean have not implemented.</p>
<h3>The choice</h3>
<p>After turning over the pricing and security models of six providers with managed Kubernetes offerings, two viable candidates have emerged: GCP GKE and OVHcloud Managed Kubernetes. </p>
<p>To achieve the level of computing resources required (8 vCPUs and 16GB of RAM) in London (or as close to London as possible), using preemptible instances on GCP works out to be a little more expensive than using persistent instances with shared-CPU resources on OVH, depending on the user's sales tax status. The price difference is primarily accounted for by storage and egress traffic costs, both of which are free on OVH and are a few dollars extra on GCP. Because OVH instances are persistent, they theoretically offer better reliability guarantees than preemptible instances on GCP, but the OVH option will also involve shared-CPU instances with variable performance.</p>
<p>Their managed load balancers are similarly priced, and the option to use a small, self-managed persistent instance as ingress proxy works out to be cheaper than a managed load balancer on both platforms. For a personal project, both platforms check the same security boxes I need: private network clusters and control plane network access restrictions. GCP offers better integrated managed encryption and logging solutions, but these are of little consequence in this use-case.</p>
<table>
<thead>
<tr>
<th>Managed Kubernetes Provider</th>
<th>GCP GKE (London)</th>
<th>OVH</th>
<th>Linode LKE</th>
<th>DigitalOcean</th>
</tr>
</thead>
<tbody>
<tr>
<td>Estimated total monthly cost* with persistent^ VMs</td>
<td>$266</td>
<td>$130</td>
<td>$137</td>
<td>$180</td>
</tr>
<tr>
<td>Estimated total monthly cost* with preemptible^ VMs</td>
<td>$83`</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>Estimated total monthly cost* with shared-CPU VMs</td>
<td>N/A+</td>
<td>$62</td>
<td>$97</td>
<td>$100</td>
</tr>
<tr>
<td>Meets my security requirements</td>
<td>Yes</td>
<td>Partially</td>
<td>No</td>
<td>No</td>
</tr>
<tr>
<td>Cluster over multiple AZs in the same region</td>
<td>Supported but not used</td>
<td>Likely single-DC</td>
<td>Likely single-DC</td>
<td>Likely single-DC</td>
</tr>
<tr>
<td>Reproducible infrastructure with Terraform</td>
<td>Yes (by HashiCorp)</td>
<td>Yes (provider-maintained)</td>
<td>Yes (provider-maintained)</td>
<td>Yes (provider-maintained)</td>
</tr>
</tbody>
</table>
<p><em>Footnotes</em>:</p>
<ul>
<li>*: <em>including any separately-billed costs for storage and realistic egress usage</em></li>
<li>^: <em>only virtual machines which have access to dedicated CPU resources when running; shared-CPU options are listed separately</em></li>
<li>`: <em>using a custom shape of 2/4 vCPUs + 4/8 GB of RAM on N2D instance type</em></li>
<li>+: <em>GCP offers shared-core VMs, but they reduce the amount of allocatable CPUs by half and are therefore impractical to use</em></li>
</ul>
<p>After considering both options, <strong>I decided to go with GCP GKE</strong> despite the potential reliability concerns in using preemptible instances. This is because the minimal reliability requirements of personal workloads allow me to take advantage of some cost savings, even with realistic storage and egress traffic consumption considered. OVH provides better value if using the shared-CPU option, but their control plane features look somewhat less mature, and private networking is not fully supported within the cluster, which uses the deprecated <a href="https://github.com/gravitational/wormhole">wormhole</a> as the overlay network in a <a href="https://docs.projectcalico.org/getting-started/kubernetes/flannel/flannel">"canal"</a> (Calico &amp; Flannel) setup.</p>
<p>If Google decides to change their pricing model in the future and no longer waives cluster management fees on a single-AZ cluster, or starts to actually preempt my instances more often, there is always the option to move to OVH (or Linode / DigitalOcean if they opt to implement better network security).</p>
<h1>Infrastructure Design</h1>
<p>With cost reduction, security, and reliability prioritised accordingly, I arrived at a design as shown in the diagram below:</p>
<p><img alt="final GCP infrastructure design" src="https://i.doge.at/uploads/big/ba2364096616174794f0e1be4d1b9e18.png"></p>
<h3>Network and cluster layouts</h3>
<p>The GKE Kubernetes control plane runs in a GCP-managed VPC within a single AZ, which is connected to the primary cluster subnet with <a href="https://github.com/chongyangshi/budget-k8s/tree/main/terraform/base">all of its worker nodes</a> in the same AZ via an automatic peering connection. They communicate through private VPC networking. This setup both waives the cluster management fee on GCP and ensures that we pay no cross-AZ data cost, as the control plane and worker nodes are in the same AZ at all times. In the rare event an entire AZ does go down, everything will be unavailable temporarily. This mode of failure is similar to managed Kubernetes providers whose PaaS platforms run in single-data-centre regions.</p>
<p>Worker nodes are distributed between two worker pools: one runs two <code>n2d-custom-2-4096</code> preemptible instances (2 vCPUs and 4GB RAM each), and the other runs one <code>n2d-custom-4-8192</code> preemptible instance (4 vCPUs and 8GB RAM), which together meet my resource consumption and shape requirements. By distributing preemptible instances across two node pools, we attempt to somewhat reduce the likelihood of simultaneous preemptive terminations.</p>
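<p>As a quick sanity check on the pool layout, the sketch below sums the resources of both pools. The machine types and node counts come from the paragraph above; the pool names are placeholders invented for illustration.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePool:
    """One GKE node pool. The pool names used below are placeholders."""
    name: str
    machine_type: str
    vcpus_per_node: int
    ram_gb_per_node: int
    node_count: int


def cluster_totals(pools):
    """Sum vCPUs and RAM (in GB) over every node in every pool."""
    vcpus = sum(p.vcpus_per_node * p.node_count for p in pools)
    ram_gb = sum(p.ram_gb_per_node * p.node_count for p in pools)
    return vcpus, ram_gb


POOLS = [
    NodePool("pool-small", "n2d-custom-2-4096", 2, 4, 2),
    NodePool("pool-large", "n2d-custom-4-8192", 4, 8, 1),
]

if __name__ == "__main__":
    print(cluster_totals(POOLS))  # (8, 16)
```

<p>The total of 8 vCPUs and 16GB of RAM matches the resource target used in the provider comparison.</p>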
<p><a href="https://kubernetes.io/docs/concepts/workloads/pods/">Pods</a> form the lowest-level network primitive within Kubernetes, and they are allocated <a href="https://cloud.google.com/kubernetes-engine/docs/concepts/alias-ips">VPC-native</a> IP addresses from a secondary IP range of the cluster subnet by the Container Network Interface (CNI) integrated in GKE. <code>ClusterIP</code> <a href="https://kubernetes.io/docs/concepts/services-networking/service/">Services</a> routed by <code>kube-proxy</code> are allocated native IP addresses in another secondary IP range of the same subnet. This setup means that Pod IPs selected by Services in the cluster can be reached from anywhere in the VPC directly -- even from outside the worker nodes, as long as the VPC firewall rules and Kubernetes <a href="https://kubernetes.io/docs/concepts/services-networking/network-policies/">Network Policies</a> allow such traffic.</p>
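<p>To make the address layout more concrete, here is a minimal sketch of one subnet with a primary range for nodes and two secondary ranges for Pods and Services. All three CIDRs are assumptions chosen for illustration, not the values used in my cluster.</p>

```python
import ipaddress

# All three ranges are illustrative assumptions.
NODE_RANGE = ipaddress.ip_network("10.0.0.0/24")     # primary range: worker nodes
POD_RANGE = ipaddress.ip_network("10.1.0.0/16")      # secondary range: Pod IPs
SERVICE_RANGE = ipaddress.ip_network("10.2.0.0/20")  # secondary range: ClusterIPs


def ranges_are_disjoint(*networks):
    """Return True when no two of the given networks overlap."""
    nets = list(networks)
    for i, first in enumerate(nets):
        for second in nets[i + 1:]:
            if first.overlaps(second):
                return False
    return True


def classify(address):
    """Say which range an IP address falls into, or 'outside' for none."""
    ip = ipaddress.ip_address(address)
    labelled = (("node", NODE_RANGE), ("pod", POD_RANGE), ("service", SERVICE_RANGE))
    for label, net in labelled:
        if ip in net:
            return label
    return "outside"


if __name__ == "__main__":
    print(ranges_are_disjoint(NODE_RANGE, POD_RANGE, SERVICE_RANGE))
    print(classify("10.1.4.7"))
```

<p>Because Pod IPs are ordinary VPC addresses in this model, anything in the VPC that the firewall permits can route to an address classified as <code>pod</code> without any overlay.</p>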
<p>As no VM instance or Kubernetes Pod in the cluster network has a public IP for reasons of good security practice, they cannot originate traffic to the internet -- such as to pull Docker images from a public registry. Instead, we need a <a href="https://cloud.google.com/nat/docs/overview">managed NAT Gateway</a> for the VPC, which conveniently only costs a few dollars on GCP including the standing charge and the anticipated low volume of egress traffic. It appears that the static IP assigned to the NAT gateway does not incur any costs, and it makes the egress IP from the cluster predictable for authorising access elsewhere.</p>
<h3>HTTP/HTTPS ingress</h3>
<p>Due to the high standing charge of a managed load balancer on GCP, my design makes no use of the native load balancing integration in GKE. Instead, we place a <a href="https://doc.traefik.io/traefik/">Traefik Proxy</a> ingress controller <em>outside</em> the cluster and in a self-managed, persistent Google Compute Engine instance, using an economical shape of <code>e2-micro</code>. This takes advantage of the following facts:</p>
<ul>
<li><a href="https://cloud.google.com/kubernetes-engine/docs/concepts/alias-ips">VPC-native</a> cluster networking means that a non-Kubernetes VM with both a public IP reachable from the internet and a private IP can forward traffic to Pod IPs of front-end services running in the cluster (without going through a NodePort), as long as such traffic is allowed under VPC firewall rules.</li>
<li>The GKE control plane exposes a private network interface in the designated master node range (by default <code>172.16.0.2</code>), and this is also reachable from anywhere in the VPC as long as <a href="https://cloud.google.com/kubernetes-engine/docs/concepts/private-cluster-concept#overview">the control plane's authorized networks list includes the source private IP range</a>. This allows us to run ingress <a href="https://kubernetes.io/docs/concepts/architecture/controller/">controllers</a> outside the cluster, as long as they hold <a href="https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/"><code>ServiceAccount</code></a> credentials issued from within the cluster with appropriate permissions.</li>
<li>We can reserve a <a href="https://cloud.google.com/compute/docs/ip-addresses/reserve-static-external-ip-address">static public IP address</a> from GCP for a few dollars a month, and reassign it to the new instance after we destroy an old one. The ingress instance can hence keep the same IP even if we need to recreate it from time to time, removing any need for a dynamic DNS service.</li>
<li>We can produce a (mostly) reproducible operating system setup for the ingress instance by specifying a fixed image "generation" and a <a href="https://cloud.google.com/compute/docs/instances/startup-scripts/linux">start-up script</a>.</li>
</ul>
<p>Based on some existing work by others (linked in code), I created a <a href="https://github.com/chongyangshi/budget-k8s/tree/main/terraform/ingress/instance_resources">systemd setup</a> for running the Traefik Proxy binary in the persistent ingress instance outside the GKE cluster. The setup process is <a href="https://github.com/chongyangshi/budget-k8s/blob/main/terraform/ingress/instance_resources/bootstrap.sh">scripted</a> to make it as reproducible as possible via Terraform, and replacing the instance is as simple as re-applying Terraform. An <a href="https://cloud.google.com/compute/docs/instance-groups">instance group</a> with a launch configuration is not used, as there is no easy way to assign a single static IP to instances managed by an instance group (we are supposed to use a managed load balancer to achieve that, which defeats the purpose of saving cost).</p>
<p>Configurations through the <a href="https://registry.terraform.io/providers/hashicorp/kubernetes/latest/docs">Terraform Kubernetes Provider</a> will automatically load minimally-privileged credentials from the managed cluster into the ingress instance for use by the Traefik Proxy runtime. Traefik Proxy will then continuously watch the cluster control plane via the private VPC network for <code>Ingress</code> objects created in its designated <code>IngressClass</code>, and set up hostname-based traffic forwarding paths accordingly for all front-end applications intended to be accessible from the internet.</p>
<p>For each front-end application, its intended external-facing hostnames are registered with Traefik using the <code>host</code>s specified in their <code>Ingress</code> rules. Using the standard router configuration, Traefik will then terminate and forward TLS traffic intended for hostnames registered to each <code>Ingress</code>, by automatically issuing <a href="https://letsencrypt.org/how-it-works/">Let's Encrypt ACME</a> certificates. These certificates are trusted by most clients (even though it is better to also put a CDN with TLS support in front of the endpoints for cost and security reasons), and their issuances are validated by <a href="https://doc.traefik.io/traefik/https/acme/0">Traefik automatically redirecting HTTP challenge requests to an endpoint it manages internally</a>.</p>
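<p>The hostname-based forwarding described above can be pictured with a small sketch. This is not Traefik's implementation; it only shows how a controller might reduce <code>Ingress</code> objects of one class to a host-to-backend table. The field names follow the <code>networking.k8s.io/v1</code> schema, while the class name, hostnames and services are made up.</p>

```python
def routes_for_class(ingresses, ingress_class):
    """Map each hostname to (service name, port) for Ingresses of one class.

    Simplification: a host with several paths keeps only its last backend.
    """
    routes = {}
    for ingress in ingresses:
        spec = ingress.get("spec", {})
        if spec.get("ingressClassName") != ingress_class:
            continue  # ignore objects meant for other controllers
        for rule in spec.get("rules", []):
            for path in rule.get("http", {}).get("paths", []):
                service = path["backend"]["service"]
                routes[rule["host"]] = (service["name"], service["port"]["number"])
    return routes


# Hypothetical objects, shaped like what the API server would return.
SAMPLE_INGRESSES = [
    {
        "metadata": {"name": "blog", "namespace": "ingress"},
        "spec": {
            "ingressClassName": "traefik-external",
            "rules": [
                {
                    "host": "blog.example.com",
                    "http": {
                        "paths": [
                            {
                                "path": "/",
                                "backend": {
                                    "service": {"name": "blog", "port": {"number": 8080}}
                                },
                            }
                        ]
                    },
                }
            ],
        },
    },
    {
        "metadata": {"name": "internal-tool", "namespace": "ingress"},
        "spec": {
            "ingressClassName": "some-other-class",
            "rules": [{"host": "tool.example.com", "http": {"paths": []}}],
        },
    },
]

if __name__ == "__main__":
    print(routes_for_class(SAMPLE_INGRESSES, "traefik-external"))
```

<p>Only the first object is picked up, since the second belongs to a different <code>IngressClass</code>.</p>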
<p><img alt="Using a generated service proxy to avoid exposing sensitive backend Kubernetes Secrets to Traefik due to controller namespace access requirements" src="https://i.doge.at/uploads/big/386c9e5d976d9aad92b2524320d2f725.png"></p>
<p>Due to some <a href="https://github.com/traefik/traefik/issues/7097">stubborn design limitations of Traefik Proxy's controller</a> with regard to storing and accessing existing TLS secrets backed by Kubernetes secrets, all namespaces where <code>Ingress</code> objects are watched by Traefik must allow Traefik's <code>ServiceAccount</code> to read all Kubernetes Secrets within them, even if Traefik has no business in doing so (for example, when it already provisions TLS certificates using ACME internally). Since Kubernetes Pods can only mount Secrets within their own namespace, and in our setup Traefik runs in an internet-facing instance outside the cluster, this constraint significantly weakens the security of any Kubernetes Secrets intended for use by backend services if they run in the same <code>ingress</code> namespace.</p>
<p>It is possible to target a Kubernetes <code>Service</code> at an arbitrary hostname using the service type <a href="https://kubernetes.io/docs/concepts/services-networking/service/#externalname"><code>ExternalName</code></a>. One might attempt to target an internal <code>Service</code> hostname such as <code>secret-api.another-namespace.svc.cluster.local</code> using an <code>ExternalName</code> service in the <code>ingress</code> namespace, and expose that to Traefik. However, due to how Kubernetes networking works, Services cannot be reached from outside the cluster through VPC networking -- they are, after all, just <code>kube-proxy</code> iptables forwarding rules on Kubernetes nodes.</p>
<p>To solve this, I implemented a light-weight service proxy for forwarding traffic from Traefik to sensitive backend services inside other namespaces in the cluster, by <a href="https://github.com/chongyangshi/budget-k8s/tree/main/terraform/ingress/service_proxy">generating NGINX deployments as a Layer 4 proxy</a> each targeting a specific backend service. This service proxy is abstracted through an <a href="https://github.com/chongyangshi/budget-k8s/blob/main/terraform/ingress/gke_ingresses.tf.example">easy-to-use module</a>, which only provisions a service proxy (instead of having Traefik target front-end service Pods directly) if the target namespace differs from the front-end <code>ingress</code> namespace. </p>
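<p>The module's decision rule can be summarised in a few lines. The sketch below is an illustration in Python rather than the Terraform in the linked repository, and the proxy naming scheme is an assumption made up for this example.</p>

```python
INGRESS_NAMESPACE = "ingress"  # the namespace Traefik is allowed to watch


def plan_backend(service_name, namespace, port):
    """Decide what Traefik should target for one backend Service."""
    if namespace == INGRESS_NAMESPACE:
        # Same namespace: Traefik can target the front-end Pods directly.
        return {"proxy": False, "target": f"{service_name}:{port}"}
    # Different namespace: place a generated Layer 4 proxy in "ingress",
    # so Traefik never needs Secret access in the backend's namespace.
    proxy_name = f"proxy-{namespace}-{service_name}"  # hypothetical naming scheme
    upstream = f"{service_name}.{namespace}.svc.cluster.local:{port}"
    return {"proxy": True, "target": f"{proxy_name}:{port}", "upstream": upstream}


if __name__ == "__main__":
    print(plan_backend("blog", "ingress", 8080))
    print(plan_backend("secret-api", "another-namespace", 443))
```

<p>The proxy Pods run inside the cluster, so they can resolve and reach the <code>svc.cluster.local</code> name that Traefik itself cannot.</p>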
<h3>Managing the cluster</h3>
<p>As configured, the cluster on GCP has both a public and a private endpoint for the control plane. <a href="https://cloud.google.com/kubernetes-engine/docs/concepts/private-cluster-concept#overview">Access to both endpoints</a> is controlled by the "authorised networks" list, which firstly allows private network connections from the ingress subnet so that Traefik can reach the private endpoint to load information about <code>Service</code>s receiving ingress traffic; and secondly allows public network connections from the user's source IPs for <a href="https://kubernetes.io/docs/reference/kubectl/overview/"><code>kubectl</code></a> access.</p>
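<p>Conceptually, the authorised networks list is a CIDR allow-list applied to the source address of each connection. A minimal sketch of that check follows; both CIDRs are assumptions (the second is taken from a documentation-only address range), and GKE's actual enforcement is of course not implemented this way on the client side.</p>

```python
import ipaddress

AUTHORISED_NETWORKS = [
    ipaddress.ip_network("10.10.0.0/28"),     # assumed ingress subnet (private)
    ipaddress.ip_network("203.0.113.17/32"),  # assumed static home IP (public)
]


def is_authorised(source_ip, networks=AUTHORISED_NETWORKS):
    """Return True if the source address falls inside any allowed CIDR."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in net for net in networks)


if __name__ == "__main__":
    for candidate in ("10.10.0.5", "203.0.113.17", "203.0.113.18"):
        print(candidate, is_authorised(candidate))
```

<p>The first two addresses are accepted and the third is rejected, which is why a changing home IP makes this model awkward to live with.</p>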
<p>Local <code>kubectl</code> credentials are <a href="https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl#viewing_kubeconfig">configured</a> using GCP's command line client. There is <a href="https://cloud.google.com/kubernetes-engine/docs/how-to/iam">a native integration</a> between the GCP IAM and the cluster RBAC, and the user's Google identity is by default already bound to the cluster admin role. Given this is a single-user cluster, further configurations of access will not be very meaningful to security.</p>
<p>Something I'm not particularly happy about in this access solution -- in common with other managed Kubernetes offerings which provide source IP access restrictions -- is that while the source IP of a user is predictable if they have a static home IP address or a VPN server, this security model is much harder to use for those relying on dynamically-allocated or NAT'd home IPs. The GKE control plane can also only be accessed via IPv4, so it is out of reach for those who only have static IPv6 addresses.</p>
<p>A possible solution which I've experimented with is to set up a <a href="https://doc.traefik.io/traefik/routing/routers/#configuring-tcp-routers">TCP Route</a> in Traefik running on the ingress load-balancing instance, whose ingress port is only exposed to the GCP Identity-Aware Proxy, and whose backend ("service" in Traefik concepts) is configured as the GKE control plane's private IP. Through the IAP, the user can then set up <a href="https://cloud.google.com/iap/docs/using-tcp-forwarding">TCP-forwarding</a> from their local command line environment in one terminal, and connect to the GKE control plane in another terminal using <code>kubectl</code> with a slightly-modified <code>kubeconfig</code> file, which has the public IP of the control plane endpoint replaced with the private one. </p>
<p>However, while the connectivity is completely achievable this way, <code>kubectl</code> does need to validate the IP Subject Alternative Names (SANs) on the certificate presented by the control plane endpoint. That certificate is managed by GKE and does not include <code>127.0.0.1</code> as a permitted IP SAN. It is certainly possible to run <code>kubectl</code> with <code>--insecure-skip-tls-verify=true</code>, but I felt that at this point the degraded security practice becomes worse than just putting the control plane on the internet. Traefik can alternatively obtain a valid TLS certificate for an SNI hostname on the TCP route (which has to be DNS-validated rather than HTTP-validated given it is behind IAP), but it will then require a local DNS override to use <code>kubectl</code>, which is also very awkward to use.</p>
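<p>The SAN problem can be shown with a toy check. The SAN list below is an assumption standing in for whatever GKE actually puts on the certificate; the only fact taken from above is that <code>127.0.0.1</code> is not on it.</p>

```python
import ipaddress


def ip_san_matches(target, ip_sans):
    """Return True if the address the client dialled is listed as an IP SAN."""
    wanted = ipaddress.ip_address(target)
    return any(ipaddress.ip_address(san) == wanted for san in ip_sans)


# Assumed SANs: a documentation-range public endpoint and the private endpoint.
CERT_IP_SANS = ["198.51.100.20", "172.16.0.2"]

if __name__ == "__main__":
    print(ip_san_matches("172.16.0.2", CERT_IP_SANS))  # dialled directly
    print(ip_san_matches("127.0.0.1", CERT_IP_SANS))   # dialled via local tunnel
```

<p>When <code>kubectl</code> connects through a local forwarding port, the address it dialled is the loopback one, so verification fails even though the bytes reach the right server.</p>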
<p>Therefore, the best option going forward is to wait for GCP to implement native IAP support for private GKE control planes. In the meantime, a possible workaround is to add an <a href="https://blog.chongya.ng/mixed-ikev2-ikev1-cisco-ipsec-vpn-server-with-no-user-certificates.html">IKEv1 IPSec VPN server</a> to the ingress instance, and open the required ports in the VPC firewall. This will allow the private control plane endpoint of the cluster to be reached via the VPN. </p>
<h3>Managed services supporting the cluster</h3>
<p>Being on GCP means we have access to a range of relatively mature managed services, which remove a lot of management overhead for resources outside the cluster at a relatively low cost. These include:</p>
<ul>
<li>The <a href="https://cloud.google.com/iap/docs/using-tcp-forwarding">Identity-Aware Proxy</a> for SSH access and TCP forwarding to the ingress instance and worker nodes, when required. This service is currently free.</li>
<li>The <a href="https://cloud.google.com/nat/docs/overview">NAT Gateway</a> for the VPC enabling private cluster networking, which costs a handful of dollars a month at our scale.</li>
<li>The <a href="https://cloud.google.com/security-key-management">Key Management Service</a> for encrypting cluster <code>etcd</code> database and instance disks, which is a handful of dollars a month even though not strictly necessary.</li>
<li>The <a href="https://cloud.google.com/container-registry">Container Registry</a> for hosting private Docker images used in the cluster, backed by Google Cloud Storage and at our level of usage the monthly cost is in pennies.</li>
<li><a href="https://kubernetes.io/docs/concepts/storage/persistent-volumes/">Persistent volumes</a> attached to stateful workloads such as Prometheus are backed by <a href="https://cloud.google.com/compute/disks-image-pricing">GCP Persistent Disks</a>, as is the case for boot disks of VM instances. Altogether my provisioned volumes cost around $10 a month using the cheapest tier, which is sufficiently performant for my requirements.</li>
</ul>
<h1>Comparison</h1>
<p>In the final part of this article, I will look at whether the resulting solution has met the various goals set out earlier, which are summarised in the table below:</p>
<table>
<thead>
<tr>
<th></th>
<th>GCP GKE (new)</th>
<th>Self-managed (old)</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Hardware</strong></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Computing Resources</td>
<td>8 vCPUs and 16GB RAM</td>
<td>12 vCPUs and 22GB RAM</td>
</tr>
<tr>
<td>Utilisation</td>
<td>~84%</td>
<td>~56%</td>
</tr>
<tr>
<td>CPU Models</td>
<td>AMD EPYC 7742</td>
<td>Intel i7-4770 (4c8t) &amp; E3-1220v5 (4c4t)</td>
</tr>
<tr>
<td>Single-Core Passmark</td>
<td>2174</td>
<td>2175 &amp; 2006</td>
</tr>
<tr>
<td><strong>Costs</strong></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Monthly Egress Usage</td>
<td>10GB</td>
<td>50GB (Control Plane and Prometheus communicate via WireGuard over public egress)</td>
</tr>
<tr>
<td>Monthly Egress Cost</td>
<td>$0.05/GB</td>
<td>>15TB flat allowance included</td>
</tr>
<tr>
<td>Persistent Storage</td>
<td>Managed PD on HDD</td>
<td>GlusterFS cluster on HDD (self-managed)</td>
</tr>
<tr>
<td>Storage Cost</td>
<td>~$11 ($0.048/GB/mo)</td>
<td>Included in server cost</td>
</tr>
<tr>
<td>Total Monthly Cost</td>
<td><strong>~$83</strong></td>
<td><strong>~$57</strong></td>
</tr>
<tr>
<td><strong>Security</strong></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Private Cluster Networking</td>
<td>Yes</td>
<td>Yes (local and WireGuard)</td>
</tr>
<tr>
<td>Network Policies</td>
<td>Yes (managed CNI)</td>
<td>Yes (Calico)</td>
</tr>
<tr>
<td>Encrypted etcd</td>
<td>Yes (managed)</td>
<td>No (not particularly important)</td>
</tr>
<tr>
<td>Encrypted disks</td>
<td>Yes (managed)</td>
<td>No (not particularly important)</td>
</tr>
<tr>
<td>OS Integrity Protection</td>
<td>Yes (managed)</td>
<td>No (not particularly important)</td>
</tr>
<tr>
<td><strong>Reliability</strong></td>
<td></td>
<td></td>
</tr>
<tr>
<td>When control plane fails</td>
<td>If the AZ goes down</td>
<td>If hypervisor or server hardware goes down</td>
</tr>
<tr>
<td>When worker nodes fail</td>
<td>If the AZ goes down</td>
<td>If hypervisor or server hardware goes down, or if WireGuard disconnects</td>
</tr>
<tr>
<td>Worker node creation</td>
<td>Automatic</td>
<td>Manual</td>
</tr>
<tr>
<td>Worker node replacements</td>
<td>Automatic</td>
<td>Manual</td>
</tr>
<tr>
<td>Ingress from internet</td>
<td>Self-managed GCE instance</td>
<td>Self-managed MetalLB via a single server</td>
</tr>
<tr>
<td>Ingress failure recovery</td>
<td><code>terraform apply</code></td>
<td>Manual repair and recovery at hardware server level</td>
</tr>
<tr>
<td><strong>Reproducibility</strong></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Infrastructure as Code</td>
<td>Terraform for all parts</td>
<td>Not implemented, would have required Puppet or HashiCorp Packer</td>
</tr>
</tbody>
</table>
<p>At the end of this project, I have removed most of the manual work involved in maintaining the infrastructure for my personal projects, by replacing self-managed components with managed PaaS resources of comparable performance. This resulted in an approximately 45% increase in the monthly cost. The new infrastructure is easier to scale up in the event of unanticipated demand, and a lot easier to recreate in a disaster scenario.</p>
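<p>For reference, the arithmetic behind the cost figures is reproduced below using the numbers from the comparison table. The per-vCPU figure is an extra derived value rather than something I measured separately.</p>

```python
OLD_MONTHLY_USD = 57       # self-managed cluster, 12 vCPUs
NEW_MONTHLY_USD = 83       # GKE cluster, 8 vCPUs
EGRESS_GB = 10             # monthly egress on GKE
EGRESS_USD_PER_GB = 0.05   # egress rate from the table


def percent_increase(old, new):
    """Relative increase from old to new, as a percentage."""
    return (new - old) / old * 100.0


def cost_per_vcpu(monthly_usd, vcpus):
    """Monthly cost divided evenly over the provisioned vCPUs."""
    return monthly_usd / vcpus


if __name__ == "__main__":
    print(round(percent_increase(OLD_MONTHLY_USD, NEW_MONTHLY_USD), 1))  # 45.6
    print(EGRESS_GB * EGRESS_USD_PER_GB)                                # 0.5
    print(cost_per_vcpu(NEW_MONTHLY_USD, 8))                            # 10.375
    print(cost_per_vcpu(OLD_MONTHLY_USD, 12))                           # 4.75
```

<p>Egress is a rounding error at this volume; the increase comes almost entirely from compute and storage.</p>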
<p>Since the monthly cost remains affordable on my hobby budget while delivering significant time savings, it is working well for me so far. However, this infrastructure will not be scalable for hosting high-bandwidth services (such as self-hosted streaming), and should this be needed in the future, an alternative solution will be required.</p>
<p>Running a Low-Cost, Distributed Kubernetes Cluster on Bare-Metal with WireGuard (C Shi, 2019-12-26T20:25:00+00:00)</p>
<h3>Background</h3>
<p><a href="http://wireguard.com/">WireGuard</a> is a very well-abstracted and performant way of establishing site-to-site VPNs across multiple private networks over the internet. It exposes itself as a virtualised network interface on the local system, and does the vast majority of networking work within kernel space. Therefore the user can leverage system-managed IP routing to easily direct traffic down the WireGuard tunnel into private networks running on the other side of the internet.</p>
<p>These characteristics make WireGuard ideal for tunnelling Calico traffic across the internet, between Kubernetes nodes within the same cluster but running at different sites, each with their own private network. In my own use case, I run a <a href="https://blog.chongya.ng/running-a-personal-kubernetes-cluster-with-calico-connected-services-on-bare-metal.html">private Kubernetes cluster</a> hosting the majority of my personal projects as microservice workloads. These workloads run as Kubernetes pods within one Calico CIDR, no matter which node or location they run on.</p>
<p>Under this low-budget use-case, in addition to operating a really cheap dedicated server running a self-managed KVM hypervisor (where most of my Kubernetes cluster lives), I'm also always on the lookout for high-resource, low-cost virtual private server (VPS) providers which are reputable. After renting a VPS from one of these providers, it can be used as a satellite server running a Kubernetes node on its own, connected to the rest of the cluster through WireGuard.</p>
<p>To put this strategy into perspective, while a <code>c4.large</code> AWS EC2 instance (a minimum spec for Kubernetes nodes) with <a href="https://aws.amazon.com/savingsplans/pricing/">committed savings plan pricing</a> costs more than $1000 a year (before EBS and egress traffic costs are even factored in), my satellite servers with the same resource specification cost me between $40 and $80 a year each, with generous amounts of local storage and egress traffic included.</p>
<p>By distributing workloads between different providers, without losing the benefits of running all workloads within one logical cluster, I can effectively implement the concept of "availability zones", equivalent to those offered by <a href="https://en.wikipedia.org/wiki/Infrastructure_as_a_service">IaaS</a> providers like AWS. Hosting workloads across multiple availability zones provides redundancy between physical sites, while still allowing private network traffic to flow in between.</p>
<p>Of course, the hypervisor hardware running my budget VMs will not be as reliable as that of AWS EC2 nodes, and these budget providers, despite reasonable reputations for longevity, are still more likely to suddenly go bankrupt than AWS. The purpose of my exercise is to operate a bare-metal cluster as cheaply as possible.</p>
<h3>Architecture</h3>
<p><img alt="Multi-Site Cluster Network" src="https://i.doge.at/uploads/big/df70988d0dedf2f7130702a04783a4db.png"></p>
<p>This setup represents a mix of one local private network containing several nodes, connected to satellite nodes hosted elsewhere by WireGuard. WireGuard runs as a separate VM instance (<code>10.100.0.88</code> with DNAT ingress for the WireGuard port on the hypervisor host) responsible for NAT'ing packets traversing to and from satellite servers. </p>
<p>On satellite servers running Kubernetes nodes, it is necessary for each of them to run WireGuard locally, as we don't want any Kubernetes control plane (cluster management) traffic to go over the internet without additional protection. </p>
<p>Normally, connecting several networks together requires one or more bridge servers running NAT. However, NAT and local BGP setups (as used by Calico on TCP 179) together generally cause weird behaviours. I discovered that BGP messages <em>traversing</em> networks work fine under local NAT rules of any WireGuard terminal instances they pass through, so long as the terminal instance(s) belong to either the source or the destination network. However, if the terminal instance is used as a "bridge" peer for two other WireGuard peers not connected via their own endpoints, Calico (or rather, bird) will complain that the BGP message came from the incorrectly masqueraded bridge peer, due to NAT on the bridge peer, and refuse to update local routes correctly.</p>
<p>Therefore, it is still necessary for WireGuard interfaces on all networks and discrete satellite servers to have direct paths to each other for peering to work, which requires each of these interfaces to have all other interfaces configured as direct WireGuard peers. This introduces a tedious amount of reconfiguration of all WireGuard interfaces each time a new network (or discrete satellite server) is added into the cluster.</p>
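<p>To give a feel for how the full-mesh requirement grows, here is a small sketch that enumerates the tunnels and per-site peer lists. The site names are placeholders; it does not generate real WireGuard configuration.</p>

```python
from itertools import combinations


def mesh_tunnels(sites):
    """Every unordered pair of sites needs one direct WireGuard tunnel."""
    return list(combinations(sorted(sites), 2))


def peers_for(site, sites):
    """The peers that must appear in one site's WireGuard configuration."""
    return sorted(s for s in sites if s != site)


def configs_to_touch(existing_sites):
    """Adding a site edits every existing config and creates one new one."""
    return len(existing_sites) + 1


SITES = ["home-terminal", "satellite-a", "satellite-b"]  # placeholder names

if __name__ == "__main__":
    print(len(mesh_tunnels(SITES)))                  # 3
    print(peers_for("satellite-a", SITES))
    print(len(mesh_tunnels(SITES + ["satellite-c"])))  # 6
    print(configs_to_touch(SITES))                   # 4
```

<p>With n sites there are n(n-1)/2 tunnels, so the manual effort is tolerable for a handful of sites but grows quickly beyond that.</p>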
<p>This setup does however bring benefits, as NAT will not be required in either direction on any bridge peers. Through observation, both control plane and container network traffic work fine on just static routes to all other internal subnets. Manually configuring static routes on all nodes is still a pain, but not using NAT between internal subnets helps avoid many other problems.</p>
<p>While it is theoretically possible to not use multiple internal subnets at all, and instead run WireGuard on all nodes in <code>10.100.0.0/25</code>, for the Kubernetes control plane to work, this alternative will require all nodes in this subnet to hold a full view of all WireGuard installations within the cluster (including each installation's public internet endpoint and internal IP); additionally, any WireGuard instances running within local networks will need to have a DNAT port on the hypervisor's ingress interface. This quickly becomes unmanageable, not to mention the increased points of failure.</p>
<p>In the setup adopted, to connect any additional local network (say, <code>10.101.0.0/25</code>) into the cluster, it is necessary to create a new WireGuard Terminal instance in the new network, and update the WireGuard configurations of terminal instances of existing networks, as well as those of satellite servers running discrete Kubernetes nodes as shown.</p>
<h3>What worked and what didn't</h3>
<p>Kubernetes cluster management traffic (running on the node network, which is a combination of <code>10.100.0.0/25</code> and <code>172.16.16.0/24</code>) consistently worked as expected. Satellite nodes in <code>172.16.16.0/24</code> behave as if they were part of the cluster, and talk to the <code>apiserver</code> on the master node in <code>10.100.0.0/25</code> correctly. This did require a minor tweak of <code>kubeadm</code>'s configuration after the initial <code>kubeadm join</code> (below).</p>
<div class="highlight"><pre><span></span><code><span class="err">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="o">/</span><span class="n">etc</span><span class="o">/</span><span class="n">systemd</span><span class="o">/</span><span class="k">system</span><span class="o">/</span><span class="n">kubelet</span><span class="p">.</span><span class="n">service</span><span class="p">.</span><span class="n">d</span><span class="o">/</span><span class="mi">10</span><span class="o">-</span><span class="n">kubeadm</span><span class="p">.</span><span class="n">conf</span><span class="w"></span>
<span class="o">[</span><span class="n">Service</span><span class="o">]</span><span class="w"></span>
<span class="p">(...)</span><span class="w"></span>
<span class="n">Environment</span><span class="o">=</span><span class="ss">"KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml --node-ip 172.16.16.3"</span><span class="w"></span>
<span class="p">(...)</span><span class="w"></span>
</code></pre></div>
<p>This process also involves overcoming a common MTU issue when tunnelling encapsulated packets, which would have caused packets larger than MTU to be dropped incorrectly. To resolve this, I recommend:</p>
<ul>
<li>Set MTU = 1500 on all physical ethernet interfaces, and on the virtualised ethernet interfaces of satellite workers.</li>
<li>Set <code>MTU = 1360</code> in the <code>[Interface]</code> section of all WireGuard configurations.</li>
<li>Configure Calico to use MTU = 1300:<div class="highlight"><pre><span></span><code>$ kubectl get cm -n kube-system calico-config -o yaml
apiVersion: v1
data:
  calico_backend: bird
  cni_network_config: <span class="p">|</span>-
    <span class="o">{</span>
      <span class="s2">"name"</span>: <span class="s2">"k8s-pod-network"</span>,
      <span class="s2">"cniVersion"</span>: <span class="s2">"0.3.1"</span>,
      <span class="s2">"plugins"</span>: <span class="o">[</span>
        <span class="o">{</span>
          <span class="s2">"type"</span>: <span class="s2">"calico"</span>,
          <span class="o">(</span>...<span class="o">)</span>
          <span class="s2">"mtu"</span>: <span class="m">1300</span>,
          <span class="o">(</span>...<span class="o">)</span>
  typha_service_name: none
  veth_mtu: <span class="s2">"1300"</span>
kind: ConfigMap
<span class="o">(</span>...<span class="o">)</span>
</code></pre></div>
</li>
</ul>
<p>Once the MTU is reconfigured, all <code>calico-node</code> Pods must be restarted to pick up the new value.</p>
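<p>The three MTU values recommended above can be checked against the encapsulation overheads involved. The sketch below assumes WireGuard runs over IPv4 (60 bytes of overhead per packet) and that Calico uses IP-in-IP encapsulation (one extra 20-byte IPv4 header); with an IPv6 transport the WireGuard overhead would be 80 bytes instead.</p>

```python
IPV4_HEADER = 20
IPV6_HEADER = 40
UDP_HEADER = 8
WIREGUARD_HEADER = 32  # type 4 + receiver index 4 + counter 8 + auth tag 16
IPIP_HEADER = 20       # outer IPv4 header added by Calico IP-in-IP


def wireguard_overhead(ipv6_transport=False):
    """Bytes WireGuard adds around each tunnelled packet."""
    outer_ip = IPV6_HEADER if ipv6_transport else IPV4_HEADER
    return outer_ip + UDP_HEADER + WIREGUARD_HEADER


def mtu_headroom(physical, wireguard, calico):
    """Spare bytes at each layer; a negative value means packets will not fit."""
    wg_headroom = physical - wireguard_overhead() - wireguard
    calico_headroom = wireguard - IPIP_HEADER - calico
    return wg_headroom, calico_headroom


if __name__ == "__main__":
    print(wireguard_overhead())            # 60
    print(mtu_headroom(1500, 1360, 1300))  # (80, 40)
```

<p>Both layers keep some spare room with these values, which leaves a margin for paths whose real MTU is slightly below 1500.</p>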
<p>However, <strong>a problem appears</strong> once workloads have been deployed onto the satellite servers: RPC packets to and from these workloads do not always reach their destinations, as shown in the <code>tcpdump</code> output below:</p>
<div class="highlight"><pre><span></span><code><span class="err">$</span><span class="w"> </span><span class="n">sudo</span><span class="w"> </span><span class="n">tcpdump</span><span class="w"> </span><span class="o">-</span><span class="n">i</span><span class="w"> </span><span class="n">wg0</span><span class="w"> </span><span class="n">port</span><span class="w"> </span><span class="ow">not</span><span class="w"> </span><span class="mi">6443</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">port</span><span class="w"> </span><span class="ow">not</span><span class="w"> </span><span class="mi">179</span><span class="w"> </span><span class="ow">and</span><span class="w"> </span><span class="n">port</span><span class="w"> </span><span class="ow">not</span><span class="w"> </span><span class="mi">10250</span><span class="w"> </span><span class="o">-</span><span class="n">vv</span><span class="w"> </span><span class="o">-</span><span class="n">n</span><span class="w"></span>
<span class="nl">tcpdump</span><span class="p">:</span><span class="w"> </span><span class="n">listening</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">wg0</span><span class="p">,</span><span class="w"> </span><span class="n">link</span><span class="o">-</span><span class="n">type</span><span class="w"> </span><span class="n">RAW</span><span class="w"> </span><span class="p">(</span><span class="n">Raw</span><span class="w"> </span><span class="n">IP</span><span class="p">),</span><span class="w"> </span><span class="n">capture</span><span class="w"> </span><span class="k">size</span><span class="w"> </span><span class="mi">262144</span><span class="w"> </span><span class="n">bytes</span><span class="w"></span>
<span class="mi">18</span><span class="err">:</span><span class="mi">52</span><span class="err">:</span><span class="mf">54.523437</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">44293</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">IPIP</span><span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">80</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">172.16.16.3</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">10.100.0.4</span><span class="err">:</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">65437</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">TCP</span><span class="w"> </span><span class="p">(</span><span class="mi">6</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">60</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">192.168.29.129.38978</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">192.168.1.7.80</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">S</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">cksum</span><span class="w"> </span><span class="mh">0x78fb</span><span class="w"> </span><span class="p">(</span><span class="n">correct</span><span class="p">),</span><span class="w"> </span><span class="n">seq</span><span class="w"> </span><span class="mi">1021627257</span><span class="p">,</span><span class="w"> </span><span class="n">win</span><span class="w"> </span><span class="mi">25200</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="w"> </span><span class="o">[</span><span class="n">mss 1260,sackOK,TS val 818412491 ecr 0,nop,wscale 7</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="mi">18</span><span class="err">:</span><span class="mi">52</span><span class="err">:</span><span class="mf">54.523878</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">62</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">23546</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">IPIP</span><span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">80</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">10.100.0.4</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">172.16.16.3</span><span class="err">:</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">TCP</span><span class="w"> </span><span class="p">(</span><span class="mi">6</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">60</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">192.168.1.7.80</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">192.168.29.129.38978</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">S.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">cksum</span><span class="w"> </span><span class="mh">0x4116</span><span class="w"> </span><span class="p">(</span><span class="n">correct</span><span class="p">),</span><span class="w"> </span><span class="n">seq</span><span class="w"> </span><span class="mi">991018110</span><span class="p">,</span><span class="w"> </span><span class="n">ack</span><span class="w"> </span><span class="mi">1021627258</span><span class="p">,</span><span class="w"> </span><span class="n">win</span><span class="w"> </span><span class="mi">24960</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="w"> </span><span class="o">[</span><span class="n">mss 1260,sackOK,TS val 4138093197 ecr 818412491,nop,wscale 7</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="mi">18</span><span class="err">:</span><span class="mi">52</span><span class="err">:</span><span class="mf">54.527993</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">44295</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">IPIP</span><span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">72</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">172.16.16.3</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">10.100.0.4</span><span class="err">:</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">65438</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">TCP</span><span class="w"> </span><span class="p">(</span><span class="mi">6</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">52</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">192.168.29.129.38978</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">192.168.1.7.80</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">cksum</span><span class="w"> </span><span class="mh">0xcfd0</span><span class="w"> </span><span class="p">(</span><span class="n">correct</span><span class="p">),</span><span class="w"> </span><span class="n">seq</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">ack</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">win</span><span class="w"> </span><span class="mi">197</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="w"> </span><span class="o">[</span><span class="n">nop,nop,TS val 818412496 ecr 4138093197</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="mi">18</span><span class="err">:</span><span class="mi">52</span><span class="err">:</span><span class="mf">54.528043</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">44296</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">IPIP</span><span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">146</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">172.16.16.3</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">10.100.0.4</span><span class="err">:</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">65439</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">TCP</span><span class="w"> </span><span class="p">(</span><span class="mi">6</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">126</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">192.168.29.129.38978</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">192.168.1.7.80</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">P.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">cksum</span><span class="w"> </span><span class="mh">0x8e99</span><span class="w"> </span><span class="p">(</span><span class="n">correct</span><span class="p">),</span><span class="w"> </span><span class="n">seq</span><span class="w"> </span><span class="mi">1</span><span class="err">:</span><span class="mi">75</span><span class="p">,</span><span class="w"> </span><span class="n">ack</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">win</span><span class="w"> </span><span class="mi">197</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="w"> </span><span class="o">[</span><span class="n">nop,nop,TS val 818412496 ecr 4138093197</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">74</span><span class="err">:</span><span class="w"> </span><span class="n">HTTP</span><span class="p">,</span><span class="w"> </span><span class="nl">length</span><span class="p">:</span><span class="w"> </span><span class="mi">74</span><span class="w"></span>
<span class="w"> </span><span class="k">GET</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">HTTP</span><span class="o">/</span><span class="mf">1.1</span><span class="w"></span>
<span class="w"> </span><span class="k">Host</span><span class="err">:</span><span class="w"> </span><span class="mf">192.168.1.7</span><span class="w"></span>
<span class="w"> </span><span class="k">User</span><span class="o">-</span><span class="nl">Agent</span><span class="p">:</span><span class="w"> </span><span class="n">Wget</span><span class="w"></span>
<span class="w"> </span><span class="k">Connection</span><span class="err">:</span><span class="w"> </span><span class="k">close</span><span class="w"></span>
<span class="mi">18</span><span class="err">:</span><span class="mi">52</span><span class="err">:</span><span class="mf">54.528482</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">62</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">23548</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">IPIP</span><span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">72</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">10.100.0.4</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">172.16.16.3</span><span class="err">:</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">27417</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">TCP</span><span class="w"> </span><span class="p">(</span><span class="mi">6</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">52</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">192.168.1.7.80</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">192.168.29.129.38978</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">cksum</span><span class="w"> </span><span class="mh">0xcf83</span><span class="w"> </span><span class="p">(</span><span class="n">correct</span><span class="p">),</span><span class="w"> </span><span class="n">seq</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">ack</span><span class="w"> </span><span class="mi">75</span><span class="p">,</span><span class="w"> </span><span class="n">win</span><span class="w"> </span><span class="mi">195</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="w"> </span><span class="o">[</span><span class="n">nop,nop,TS val 4138093202 ecr 818412496</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="mi">18</span><span class="err">:</span><span class="mi">52</span><span class="err">:</span><span class="mf">54.528787</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">62</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">23549</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">IPIP</span><span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">388</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">10.100.0.4</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">172.16.16.3</span><span class="err">:</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">27418</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">TCP</span><span class="w"> </span><span class="p">(</span><span class="mi">6</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">368</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">192.168.1.7.80</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">192.168.29.129.38978</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">P.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">cksum</span><span class="w"> </span><span class="mh">0xcd1e</span><span class="w"> </span><span class="p">(</span><span class="n">correct</span><span class="p">),</span><span class="w"> </span><span class="n">seq</span><span class="w"> </span><span class="mi">1</span><span class="err">:</span><span class="mi">317</span><span class="p">,</span><span class="w"> </span><span class="n">ack</span><span class="w"> </span><span class="mi">75</span><span class="p">,</span><span class="w"> </span><span class="n">win</span><span class="w"> </span><span class="mi">195</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="w"> </span><span class="o">[</span><span class="n">nop,nop,TS val 4138093202 ecr 818412496</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">316</span><span class="err">:</span><span class="w"> </span><span class="n">HTTP</span><span class="p">,</span><span class="w"> </span><span class="nl">length</span><span class="p">:</span><span class="w"> </span><span class="mi">316</span><span class="w"></span>
<span class="w"> </span><span class="n">HTTP</span><span class="o">/</span><span class="mf">1.1</span><span class="w"> </span><span class="mi">200</span><span class="w"> </span><span class="n">OK</span><span class="w"></span>
<span class="w"> </span><span class="nl">Server</span><span class="p">:</span><span class="w"> </span><span class="n">nginx</span><span class="o">/</span><span class="mf">1.15.12</span><span class="w"></span>
<span class="w"> </span><span class="nc">Date</span><span class="err">:</span><span class="w"> </span><span class="n">Thu</span><span class="p">,</span><span class="w"> </span><span class="mi">19</span><span class="w"> </span><span class="k">Dec</span><span class="w"> </span><span class="mi">2019</span><span class="w"> </span><span class="mi">18</span><span class="err">:</span><span class="mi">52</span><span class="err">:</span><span class="mi">58</span><span class="w"> </span><span class="n">GMT</span><span class="w"></span>
<span class="w"> </span><span class="n">Content</span><span class="o">-</span><span class="nl">Type</span><span class="p">:</span><span class="w"> </span><span class="nc">text</span><span class="o">/</span><span class="n">plain</span><span class="w"></span>
<span class="w"> </span><span class="n">Content</span><span class="o">-</span><span class="nl">Length</span><span class="p">:</span><span class="w"> </span><span class="mi">145</span><span class="w"></span>
<span class="w"> </span><span class="k">Connection</span><span class="err">:</span><span class="w"> </span><span class="k">close</span><span class="w"></span>
<span class="w"> </span><span class="n">Content</span><span class="o">-</span><span class="nl">Type</span><span class="p">:</span><span class="w"> </span><span class="nc">text</span><span class="o">/</span><span class="n">plain</span><span class="w"></span>
<span class="w"> </span><span class="p">(</span><span class="n">body</span><span class="w"> </span><span class="n">removed</span><span class="p">)</span><span class="w"></span>
<span class="mi">18</span><span class="err">:</span><span class="mi">52</span><span class="err">:</span><span class="mf">54.528861</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">62</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">23550</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">IPIP</span><span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">72</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">10.100.0.4</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">172.16.16.3</span><span class="err">:</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">27419</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">TCP</span><span class="w"> </span><span class="p">(</span><span class="mi">6</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">52</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">192.168.1.7.80</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">192.168.29.129.38978</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">F.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">cksum</span><span class="w"> </span><span class="mh">0xce46</span><span class="w"> </span><span class="p">(</span><span class="n">correct</span><span class="p">),</span><span class="w"> </span><span class="n">seq</span><span class="w"> </span><span class="mi">317</span><span class="p">,</span><span class="w"> </span><span class="n">ack</span><span class="w"> </span><span class="mi">75</span><span class="p">,</span><span class="w"> </span><span class="n">win</span><span class="w"> </span><span class="mi">195</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="w"> </span><span class="o">[</span><span class="n">nop,nop,TS val 4138093202 ecr 818412496</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="mi">18</span><span class="err">:</span><span class="mi">52</span><span class="err">:</span><span class="mf">54.532556</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">44297</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">IPIP</span><span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">72</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">172.16.16.3</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">10.100.0.4</span><span class="err">:</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">65440</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">TCP</span><span class="w"> </span><span class="p">(</span><span class="mi">6</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">52</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">192.168.29.129.38978</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">192.168.1.7.80</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">cksum</span><span class="w"> </span><span class="mh">0xce37</span><span class="w"> </span><span class="p">(</span><span class="n">correct</span><span class="p">),</span><span class="w"> </span><span class="n">seq</span><span class="w"> </span><span class="mi">75</span><span class="p">,</span><span class="w"> </span><span class="n">ack</span><span class="w"> </span><span class="mi">317</span><span class="p">,</span><span class="w"> </span><span class="n">win</span><span class="w"> </span><span class="mi">206</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="w"> </span><span class="o">[</span><span class="n">nop,nop,TS val 818412501 ecr 4138093202</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="mi">18</span><span class="err">:</span><span class="mi">52</span><span class="err">:</span><span class="mf">54.534390</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">44298</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">IPIP</span><span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">72</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">172.16.16.3</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">10.100.0.4</span><span class="err">:</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">65441</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">TCP</span><span class="w"> </span><span class="p">(</span><span class="mi">6</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">52</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">192.168.29.129.38978</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">192.168.1.7.80</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">F.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">cksum</span><span class="w"> </span><span class="mh">0xce33</span><span class="w"> </span><span class="p">(</span><span class="n">correct</span><span class="p">),</span><span class="w"> </span><span class="n">seq</span><span class="w"> </span><span class="mi">75</span><span class="p">,</span><span class="w"> </span><span class="n">ack</span><span class="w"> </span><span class="mi">318</span><span class="p">,</span><span class="w"> </span><span class="n">win</span><span class="w"> </span><span class="mi">206</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="w"> </span><span class="o">[</span><span class="n">nop,nop,TS val 818412503 ecr 4138093202</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="mi">18</span><span class="err">:</span><span class="mi">52</span><span class="err">:</span><span class="mf">54.534595</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">62</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">23551</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">IPIP</span><span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">72</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">10.100.0.4</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">172.16.16.3</span><span class="err">:</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">27420</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">TCP</span><span class="w"> </span><span class="p">(</span><span class="mi">6</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">52</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">192.168.1.7.80</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">192.168.29.129.38978</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">.</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">cksum</span><span class="w"> </span><span class="mh">0xce38</span><span class="w"> </span><span class="p">(</span><span class="n">correct</span><span class="p">),</span><span class="w"> </span><span class="n">seq</span><span class="w"> </span><span class="mi">318</span><span class="p">,</span><span class="w"> </span><span class="n">ack</span><span class="w"> </span><span class="mi">76</span><span class="p">,</span><span class="w"> </span><span class="n">win</span><span class="w"> </span><span class="mi">195</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="w"> </span><span class="o">[</span><span class="n">nop,nop,TS val 4138093208 ecr 818412503</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="mi">18</span><span class="err">:</span><span class="mi">52</span><span class="err">:</span><span class="mf">58.072372</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">15157</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">IPIP</span><span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">80</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">172.16.16.3</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">10.100.0.40</span><span class="err">:</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">4012</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">TCP</span><span class="w"> </span><span class="p">(</span><span class="mi">6</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">60</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">192.168.29.129.51108</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">192.168.2.18.80</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">S</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">cksum</span><span class="w"> </span><span class="mh">0x04d7</span><span class="w"> </span><span class="p">(</span><span class="n">correct</span><span class="p">),</span><span class="w"> </span><span class="n">seq</span><span class="w"> </span><span class="mi">708851696</span><span class="p">,</span><span class="w"> </span><span class="n">win</span><span class="w"> </span><span class="mi">25200</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="w"> </span><span class="o">[</span><span class="n">mss 1260,sackOK,TS val 2101778737 ecr 0,nop,wscale 7</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="mi">18</span><span class="err">:</span><span class="mi">52</span><span class="err">:</span><span class="mf">59.079740</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">15222</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">IPIP</span><span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">80</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">172.16.16.3</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">10.100.0.40</span><span class="err">:</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">4013</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">TCP</span><span class="w"> </span><span class="p">(</span><span class="mi">6</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">60</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">192.168.29.129.51108</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">192.168.2.18.80</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">S</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">cksum</span><span class="w"> </span><span class="mh">0x00e7</span><span class="w"> </span><span class="p">(</span><span class="n">correct</span><span class="p">),</span><span class="w"> </span><span class="n">seq</span><span class="w"> </span><span class="mi">708851696</span><span class="p">,</span><span class="w"> </span><span class="n">win</span><span class="w"> </span><span class="mi">25200</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="w"> </span><span class="o">[</span><span class="n">mss 1260,sackOK,TS val 2101779745 ecr 0,nop,wscale 7</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="mi">18</span><span class="err">:</span><span class="mi">53</span><span class="err">:</span><span class="mf">01.095792</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">15313</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">IPIP</span><span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">80</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">172.16.16.3</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">10.100.0.40</span><span class="err">:</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">4014</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">TCP</span><span class="w"> </span><span class="p">(</span><span class="mi">6</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">60</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">192.168.29.129.51108</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">192.168.2.18.80</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">S</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">cksum</span><span class="w"> </span><span class="mh">0xf906</span><span class="w"> </span><span class="p">(</span><span class="n">correct</span><span class="p">),</span><span class="w"> </span><span class="n">seq</span><span class="w"> </span><span class="mi">708851696</span><span class="p">,</span><span class="w"> </span><span class="n">win</span><span class="w"> </span><span class="mi">25200</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="w"> </span><span class="o">[</span><span class="n">mss 1260,sackOK,TS val 2101781761 ecr 0,nop,wscale 7</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="mi">18</span><span class="err">:</span><span class="mi">53</span><span class="err">:</span><span class="mf">05.127663</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">15616</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">IPIP</span><span class="w"> </span><span class="p">(</span><span class="mi">4</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">80</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">172.16.16.3</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">10.100.0.40</span><span class="err">:</span><span class="w"> </span><span class="n">IP</span><span class="w"> </span><span class="p">(</span><span class="n">tos</span><span class="w"> </span><span class="mh">0x0</span><span class="p">,</span><span class="w"> </span><span class="n">ttl</span><span class="w"> </span><span class="mi">63</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="mi">4015</span><span class="p">,</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">flags</span><span class="w"> </span><span class="o">[</span><span class="n">DF</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">proto</span><span class="w"> </span><span class="n">TCP</span><span class="w"> </span><span class="p">(</span><span class="mi">6</span><span class="p">),</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">60</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="mf">192.168.29.129.51108</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="mf">192.168.2.18.80</span><span class="err">:</span><span class="w"> </span><span class="n">Flags</span><span class="w"> </span><span class="o">[</span><span class="n">S</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">cksum</span><span class="w"> </span><span class="mh">0xe947</span><span class="w"> </span><span class="p">(</span><span class="n">correct</span><span class="p">),</span><span class="w"> </span><span class="n">seq</span><span class="w"> </span><span class="mi">708851696</span><span class="p">,</span><span class="w"> </span><span class="n">win</span><span class="w"> </span><span class="mi">25200</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="w"> </span><span class="o">[</span><span class="n">mss 1260,sackOK,TS val 2101785792 ecr 0,nop,wscale 7</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="o">^</span><span class="n">C</span><span class="w"></span>
<span class="mi">14</span><span class="w"> </span><span class="n">packets</span><span class="w"> </span><span class="n">captured</span><span class="w"></span>
<span class="mi">14</span><span class="w"> </span><span class="n">packets</span><span class="w"> </span><span class="n">received</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="k">filter</span><span class="w"></span>
<span class="mi">0</span><span class="w"> </span><span class="n">packets</span><span class="w"> </span><span class="n">dropped</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="n">kernel</span><span class="w"></span>
<span class="mi">4</span><span class="w"> </span><span class="n">packets</span><span class="w"> </span><span class="n">dropped</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="n">interface</span><span class="w"></span>
</code></pre></div>
<p>What's happening in the above is as follows:</p>
<ul>
<li>A pod running on Satellite Server 2 (<code>172.16.16.3</code>) tries to make RPC calls to two pods with identical workloads (<code>192.168.1.7</code>, <code>192.168.2.18</code>) running on Worker 1 (<code>10.100.0.4</code>) and Worker 2 (<code>10.100.0.40</code>) respectively.</li>
<li>Both RPC calls were made with encapsulated IP-in-IP packets.</li>
<li>Encapsulation means that the outer packet header carries the IPs of the source node (<code>172.16.16.3</code>) and the destination node (<code>10.100.0.4</code> or <code>10.100.0.40</code>) as its source and destination addresses.</li>
<li>And the inner headers contain virtualized Calico pod IP addresses, with which pods are identified in the service mesh.</li>
<li>The Calico pod on each node knows which pod IP is allocated to which local pod, while the pod IP of the target workload is obtained by querying or passing through a service proxy, such as <a href="https://www.envoyproxy.io/">Envoy</a> or simply the integrated <code>kube-proxy</code>.</li>
<li>Calico on the source node has encapsulated the packet with its source and destination nodes.</li>
<li>Calico on the destination node is meant to decapsulate the packet, identify the target pod IP, and route traffic towards the local virtual Calico interface of that pod.</li>
<li>However, only the packets to the <code>192.168.1.7</code> pod were delivered to the target node's Calico (<code>10.100.0.4</code>) at all, <strong>while packets to <code>192.168.2.18</code> (last four entries in the dump) on node <code>10.100.0.40</code> were "dropped by interface" and never routed to the destination</strong>.</li>
</ul>
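<p>To make the outer and inner headers described above more concrete, here is a minimal Go sketch that builds a synthetic IP-in-IP packet using the addresses from the dump and then peels it apart. It is only an illustration and not how Calico or the kernel implements decapsulation: the names <code>parseIPv4</code>, <code>decapsulate</code> and <code>buildHeader</code> are hypothetical, header checksums are left at zero, and no TCP payload is attached.</p>

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net"
)

// ipv4Header holds the few IPv4 header fields relevant to this illustration.
type ipv4Header struct {
	Src, Dst  net.IP
	Protocol  uint8 // 4 = IP-in-IP, 6 = TCP
	HeaderLen int   // header length in bytes
	TotalLen  int   // total length field as written in the header
}

// parseIPv4 reads the fixed part of an IPv4 header from b.
func parseIPv4(b []byte) (ipv4Header, error) {
	if len(b) < 20 || b[0]>>4 != 4 {
		return ipv4Header{}, errors.New("not an IPv4 header")
	}
	ihl := int(b[0]&0x0f) * 4
	if ihl < 20 || len(b) < ihl {
		return ipv4Header{}, errors.New("truncated IPv4 header")
	}
	return ipv4Header{
		Src:       net.IP(b[12:16]),
		Dst:       net.IP(b[16:20]),
		Protocol:  b[9],
		HeaderLen: ihl,
		TotalLen:  int(binary.BigEndian.Uint16(b[2:4])),
	}, nil
}

// decapsulate returns the outer (node-to-node) and inner (pod-to-pod)
// headers of an IP-in-IP packet.
func decapsulate(pkt []byte) (outer, inner ipv4Header, err error) {
	outer, err = parseIPv4(pkt)
	if err != nil {
		return outer, inner, err
	}
	if outer.Protocol != 4 {
		return outer, inner, errors.New("outer packet is not IP-in-IP")
	}
	inner, err = parseIPv4(pkt[outer.HeaderLen:])
	return outer, inner, err
}

// buildHeader creates a 20-byte IPv4 header for the example.
// Assumption: the checksum is left at zero, which a real stack would reject.
func buildHeader(src, dst string, proto uint8, totalLen uint16) []byte {
	h := make([]byte, 20)
	h[0] = 0x45 // version 4, header length 5 words
	binary.BigEndian.PutUint16(h[2:4], totalLen)
	h[8] = 63 // TTL, as seen in the dump
	h[9] = proto
	copy(h[12:16], net.ParseIP(src).To4())
	copy(h[16:20], net.ParseIP(dst).To4())
	return h
}

func main() {
	// Outer header: node to node. Inner header: pod to pod.
	pkt := append(
		buildHeader("172.16.16.3", "10.100.0.40", 4, 80),
		buildHeader("192.168.29.129", "192.168.2.18", 6, 60)...,
	)
	outer, inner, err := decapsulate(pkt)
	if err != nil {
		panic(err)
	}
	fmt.Printf("outer %s > %s (proto %d), inner %s > %s (proto %d)\n",
		outer.Src, outer.Dst, outer.Protocol, inner.Src, inner.Dst, inner.Protocol)
}
```

<p>The outer length of 80 bytes and inner length of 60 bytes mirror the dropped SYN packets in the dump: the encapsulation adds exactly one 20-byte IPv4 header on top of the original packet.</p>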
<p>Because encapsulated packets to both <code>10.100.0.4</code> and <code>10.100.0.40</code> went through NAT performed by the WireGuard Terminal instance, and both requests happened nearly simultaneously, this does not look like any typical NAT-related fault I can recognise. It is more likely due to some strange interaction between the inner workings of WireGuard and the encapsulated packets passing through. Note that the dropped packets reported on the interface represent dropped responses to the four retries of the failed request in the tcpdump:</p>
<div class="highlight"><pre><span></span><code>$ ifconfig wg0
wg0: <span class="nv">flags</span><span class="o">=</span><span class="m">209</span>&lt;UP,POINTOPOINT,RUNNING,NOARP&gt; mtu <span class="m">1360</span>
inet <span class="m">172</span>.16.16.1 netmask <span class="m">255</span>.255.255.255 destination <span class="m">172</span>.16.16.1
unspec <span class="m">00</span>-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen <span class="m">1000</span> <span class="o">(</span>UNSPEC<span class="o">)</span>
RX packets <span class="m">1088693</span> bytes <span class="m">370424252</span> <span class="o">(</span><span class="m">370</span>.4 MB<span class="o">)</span>
RX errors <span class="m">8</span> dropped <span class="m">34</span> overruns <span class="m">0</span> frame <span class="m">8</span>
TX packets <span class="m">1237790</span> bytes <span class="m">842322648</span> <span class="o">(</span><span class="m">842</span>.3 MB<span class="o">)</span>
TX errors <span class="m">0</span> dropped <span class="m">0</span> overruns <span class="m">0</span> carrier <span class="m">0</span> collisions <span class="m">0</span>
</code></pre></div>
<p>And when switching WireGuard into live debug mode via <code>echo "module wireguard -p" >/sys/kernel/debug/dynamic_debug/control</code>, dmesg will show something like:</p>
<div class="highlight"><pre><span></span><code>$ dmesg <span class="p">|</span> grep wireguard <span class="p">|</span> grep userspace
<span class="o">[</span> <span class="m">507</span>.175613<span class="o">]</span> wireguard: wg0: Failed to give packet to userspace from peer <span class="m">2</span> <span class="o">(</span>xx.xx.xx.xx:7000<span class="o">)</span>
<span class="o">[</span> <span class="m">508</span>.180598<span class="o">]</span> wireguard: wg0: Failed to give packet to userspace from peer <span class="m">2</span> <span class="o">(</span>xx.xx.xx.xx:7000<span class="o">)</span>
</code></pre></div>
<p>Where <code>xx.xx.xx.xx</code> is the public IP serving as the endpoint of the satellite server.</p>
<p>I have observed this kind of packet drop happening <em>sporadically</em>, in <em>either</em> direction of encapsulated packet flow, on <em>all</em> internet-facing WireGuard interfaces within the cluster. Encapsulated packets can be dropped while standard TCP packets pass through the same link in the same direction just fine.</p>
<p>Because my cluster has a very low volume of inter-service RPC traffic, I noticed that this issue tends to surface after a period during which no encapsulated packets passed through (Calico's keep-alive BGP packets are not encapsulated themselves, and therefore no encapsulated packets are sent in the background). I am not very familiar with the inner workings of the Linux kernel and encapsulated traffic, so if you have any idea what might be happening here, please let me know.</p>
<h3>Solution: Hold the door</h3>
<p>Because built-in Kubernetes health checks are always uni-directional, they are not sufficient to keep the paths open for this particular WireGuard problem, where encapsulated packets can be randomly blocked in either direction.</p>
<p>As a result, I wrote and deployed a Go microservice <a href="https://github.com/chongyangshi/wylis">Wylis</a> as a <del>hack</del> workaround. Wylis runs as a Kubernetes DaemonSet, which means that it runs as a pod on every node in the cluster. It does the following things:</p>
<ul>
<li>Periodically polls Wylis pods on all other nodes with fresh Calico TCP connections to keep the paths open</li>
<li>Emits <a href="https://prometheus.io">Prometheus</a> metrics to help measure fine-grained request success and latency across the cluster</li>
<li>Periodically updates its knowledge of other Wylis pods through the in-cluster Kubernetes API</li>
</ul>
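<p>The polling idea in the first bullet can be sketched in a few lines of shell. This is not how Wylis itself is implemented (Wylis is written in Go); it is a minimal illustration that assumes each peer pod answers plain HTTP on some port, that the peer list is passed in by hand rather than discovered through the Kubernetes API, and that a hypothetical <code>PROBE</code> variable holds the command used to open a fresh connection.</p>

```shell
# PROBE is a hypothetical knob: the command used to open one fresh connection
# to a peer. It defaults to curl with a short timeout.
PROBE="${PROBE:-curl --silent --max-time 2 --output /dev/null}"

# poll_once PEER...: probes every peer once over a new TCP connection and
# prints "ok PEER" or "fail PEER" for each of them.
poll_once() {
    for peer in "$@"; do
        # $PROBE is intentionally unquoted so its flags are split into words.
        if $PROBE "http://${peer}/"; then
            echo "ok ${peer}"
        else
            echo "fail ${peer}"
        fi
    done
}
```

<p>Wrapped in a loop such as <code>while true; do poll_once 10.200.1.5:8080 10.200.2.7:8080; sleep 10; done</code> (the addresses are placeholders), every node keeps sending encapsulated packets to every other node at a fixed interval.</p>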
<p>After an initial run with a polling period of 10 seconds, the emitted metrics suggest that Wylis is working very well in keeping RPC traffic reachable across tunnelled networks:</p>
<p><img alt="Request successes and failures" src="https://i.doge.at/uploads/big/5bce5df617258d7db54c04f9a2927e6b.png"></p>
<p><em>No requests with encapsulated packets timed out when travelling through WireGuard under periodic polling.</em></p>
<p><img alt="Request timings" src="https://i.doge.at/uploads/big/951795a8bf78b9e3b42228952a6450eb.png"></p>
<p><em>Polling requests provide useful timing data on inter-node RPCs.</em></p>
<h3>Security Warning</h3>
<p>By default <a href="https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/">kubelet server</a> and Kubernetes <a href="https://kubernetes.io/docs/concepts/services-networking/service/#nodeport">NodePorts</a> will listen on <em>all</em> interfaces on the node, including external-facing interfaces. Normally, this is okay, as Kubernetes nodes are not expected to run with public IPs. But this is not true for our satellite servers, which are VPS servers with public IPs connected into the cluster via WireGuard. Therefore, these ports will be directly exposed on the public-facing network interface (eth0 or similar) unless iptables rules on these interfaces are set to deny ingress by default.</p>
<p>While kubelet server has authentication, your NodePorts may not. So it is extra important that you only allow system services running on the host server to be accessed via the public internet, but not any NodePorts or kubelet ports. The corresponding rules look something like the following:</p>
<div class="highlight"><pre><span></span><code>iptables -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A INPUT -i eth0 -p tcp -m multiport --dports 22,9301 -j ACCEPT # SSH and some other system service
iptables -A INPUT -i eth0 -p udp -m multiport --dports 7000 -j ACCEPT # WireGuard ingress port
iptables -A INPUT -i eth0 -p udp -j DROP
iptables -A INPUT -i eth0 -p tcp -j DROP
</code></pre></div>Running a personal Kubernetes cluster with Calico-connected services on bare-metal2019-06-09T18:00:00+01:002019-06-09T18:00:00+01:00C Shitag:blog.chongya.ng,2019-06-09:/running-a-personal-kubernetes-cluster-with-calico-connected-services-on-bare-metal.html<p>Out of boredom, I decided to undertake an infrastructural experiment of setting up a personal <a href="https://kubernetes.io">Kubernetes</a> cluster, and moving as much of my personal project workloads into the cluster as possible. The purpose of this exercise was <em>not</em> to improve the resiliency of these fairly inconsequential workloads, but rather to see how far I could go in stretching this setup to fit low-cost servers I acquired from some infamous European providers. </p>
<p><img alt="Architecture of personal Kubernetes cluster" src="https://i.doge.at/uploads/big/61a580ae16a9c3463ad3066b95d31d9e.png"></p>
<p>It took some trial and error over a couple of weeks, but eventually I was able to achieve a setup that is functional and reasonably painless to maintain.</p>
<p>Two servers are used in the setup:</p>
<ul>
<li>
<p>A bare-metal Debian <strong>host server</strong> running QEMU-KVM (<code>libvirt</code>), which in turn runs a number of Ubuntu guest VMs, each running a Kubernetes master or worker node, or a <a href="https://www.gluster.org/">GlusterFS</a> replicated storage node. </p>
<ul>
<li>The host server runs former VPS host-grade hardware, and was therefore fairly inexpensive to lease from the right provider, yet still powerful enough to run my cluster.</li>
<li>The Kubernetes node network (<code>10.100.0.0/25</code>) is segregated from the public internet.</li>
<li>Two IP addresses are used, one for the exclusive use of ingress to web services running in Kubernetes (<code>10.100.0.128/25</code>), and another for host maintenance and protected <code>kubectl</code> access.</li>
<li>Ubuntu guest images were built with <a href="https://cloudinit.readthedocs.io/en/latest/">Cloud-Init</a> and run in DHCP mode.</li>
</ul>
</li>
<li>
<p>An <strong>auxiliary server</strong>, a low-cost yet fairly powerful virtual machine hosted with a different provider.</p>
<ul>
<li>It was originally intended to be set up as an off-site Kubernetes worker node connected into the main cluster via WireGuard. While I managed to get kubelet joining the master node successfully and its <a href="https://www.projectcalico.org/">Calico</a> node reaching the main cluster network, I ran into some weird issues with <a href="https://en.wikipedia.org/wiki/Large_send_offload">send/receive offloading</a> causing longer-than-MTU pod traffic packets to be dropped on Calico over WireGuard, and <del>had to abandon this idea.</del> <strong>Update: I have spent more time looking into this and implemented a partial workaround, see the <a href="https://blog.chongya.ng/running-a-low-cost-distributed-kubernetes-cluster-on-bare-metal-with-wireguard.html">updated post</a>.</strong></li>
<li>The auxiliary server runs workloads which are tricky to containerise, including my private Docker build environment and container repository (major <code>iptables</code> screw-up) and MySQL for backing some legacy projects. </li>
</ul>
</li>
<li>
<p><a href="https://www.wireguard.com/">WireGuard</a> runs as a virtualised bridge between the host server and the auxiliary server hosted elsewhere.</p>
</li>
</ul>
<p>Running Kubernetes on self-managed virtualisation -- and in turn on bare-metal -- is fairly unorthodox these days -- not to mention the likes of managed Kubernetes setups such as those from <a href="https://cloud.google.com/kubernetes-engine/">Google</a> and <a href="https://www.digitalocean.com/">DigitalOcean</a>. A wise production setup would at least not involve maintaining one's own hypervisor -- which I did in this setup. This setup is therefore by no means commercially sensible for most use-cases, but rather a personal hobby. </p>
<p>Some infrastructural notes for the setup:</p>
<ul>
<li>
<p>The process of setting up <code>libvirt</code> to run KVM in a segregated private subnet and managing it with <code>virsh</code> was fairly <a href="https://help.ubuntu.com/community/KVM/Installation">well</a>-<a href="https://www.cyberciti.biz/faq/installing-kvm-on-ubuntu-16-04-lts-server/">documented</a>.</p>
</li>
<li>
<p>Setting up Kubernetes <a href="https://kubernetes.io/docs/setup/independent/install-kubeadm/">master and worker nodes</a> with Docker as <a href="https://kubernetes.io/docs/setup/cri/">container runtime</a>, <a href="https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#pod-network">joining them together</a>, and <a href="https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/#pod-network">wiring their pods together</a> with Calico was surprisingly easy, as the bare-metal setup process is very mature.</p>
</li>
<li>
<p>The biggest pain point of running a bare-metal setup of Kubernetes is the lack of a ready-made load-balancer and ingress solution, such as <a href="https://aws.amazon.com/elasticloadbalancing/">ELB/NLB</a> available when your cluster runs on AWS EC2. </p>
<ul>
<li>Instead, I used <a href="https://metallb.universe.tf/tutorial/layer2/">MetalLB</a> in Layer 2 mode to front the Cluster IP of an NGINX internal ingress service, with MetalLB's own ingress subnet forwarded via NAT to an external ingress IP. </li>
<li>The BGP mode of MetalLB would be really nice to have, but it is unfortunately not compatible with Calico's BGP setup. </li>
</ul>
</li>
<li>
<p>I use <a href="https://www.gluster.org/">GlusterFS</a> as a replicated storage backend, which in this setup is not really redundant since all of the storage nodes run on the same physical hard drive of the host server. But in a more budget-accommodating setup this can be easily distributed. GlusterFS is wired into Kubernetes as an endpoint for persistent volumes. </p>
</li>
</ul>
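<p>A minimal sketch of what forwarding an external ingress IP to a MetalLB-assigned address via NAT could look like on the host, assuming plain <code>iptables</code> DNAT rules in the <code>PREROUTING</code> chain. The function name and both addresses used in the usage line are placeholders for illustration, not the values of the real setup, and the function only prints the rules rather than applying them.</p>

```shell
# print_dnat_rules EXT_IP INGRESS_IP: hypothetical helper that prints (but does
# not apply) DNAT rules forwarding HTTP and HTTPS from the external ingress IP
# to the address MetalLB assigned to the ingress service.
print_dnat_rules() {
    ext_ip="$1"
    ingress_ip="$2"
    for port in 80 443; do
        echo "iptables -t nat -A PREROUTING -d ${ext_ip} -p tcp --dport ${port} -j DNAT --to-destination ${ingress_ip}:${port}"
    done
}
```

<p>For example, <code>print_dnat_rules 203.0.113.10 10.100.0.129</code> prints one rule per port, which can be reviewed before being run as root.</p>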
<p>For each of my existing personal projects, I wrote <code>Dockerfile</code>s and supporting <code>Makefile</code>s to enable them to be containerised. These mostly run with three replicas for load-balancing. These projects include:</p>
<ul>
<li>
<p><a href="https://chongya.ng">My personal page</a> which is now served by an <a href="https://github.com/chongyangshi/chongya.ng">NGINX/Alpine container</a> deployment.</p>
</li>
<li>
<p><a href="https://blog.chongya.ng">My personal blog</a> (the site you are reading now) which is now served by another <a href="https://github.com/chongyangshi/blog.chongya.ng">NGINX/Alpine container</a> deployment.</p>
</li>
<li>
<p>The <a href="https://covertmark.com">documentation site</a> for my Master's dissertation project CovertMark, which is now served by yet another <a href="https://github.com/chongyangshi/CovertMark/tree/master/doc">NGINX/Alpine container</a> deployment.</p>
</li>
<li>
<p>The front-end and RESTful API backend of my personal image sharing service <a href="https://github.com/chongyangshi/yronwood/">Yronwood</a>, which I wrote from scratch in Golang to replace the old PHP/MySQL-based <a href="https://github.com/LycheeOrg/Lychee/">Lychee</a> that was way too painful to containerise.</p>
</li>
<li>
<p>Some static and PHP sites I host for family and friends on a <em>pro bono</em> basis.</p>
</li>
</ul>
<p>Due to the non-distributed nature of most of these setups, these projects don't really benefit from additional redundancy and resiliency which Kubernetes is supposed to provide, but this hopefully serves as a good technical demonstrator for the feasibility of managing small scale projects in Kubernetes.</p>Epilogue II: Cambridge Boogaloo2018-07-10T15:00:00+01:002018-07-10T15:00:00+01:00C Shitag:blog.chongya.ng,2018-07-10:/epilogue-ii-cambridge-boogaloo.html<p>My <a href="https://blog.chongya.ng/epilogue.html">previous epilogue</a> was written almost a year ago, when I was finishing up my three years of undergraduate study at York. Since then -- thanks to how degrees work in the British system -- I have already gone and done another degree (MPhil) at Cambridge. This second epilogue is however written approximately four years earlier than I originally expected, due to some changes in circumstances which I will soon be able to tell more widely. But in short: I am not doing a PhD, at least for now.</p>
<p>Somewhat notoriously known as a competitive person, I had regretted for several years that I failed to gain entry into Oxford as an undergraduate. When this story was mentioned in passing to many of my friends in Cambridge, particularly among those who previously read the Computer Science Tripos, a common joke was that I simply applied to the wrong university for undergraduate study. There might be some truth to it, but I was already very glad that my undergraduate work at York was recognised by those from the Computer Laboratory, who gave me the opportunity to spend a very interesting year here. </p>
<p>Being on the graduate side of Cambridge means that many of the ancient or arcane traditions and rules do not apply to you. Some would argue that this makes the Cambridge experience less authentic, but when one really just wants to do decent research and talk to very academically-capable people, that side of Cambridge is mostly there for vanity. In fact, from my limited interactions with undergraduates, the typical public perception of snobbish, privately-educated young adults was an exception here rather than the norm. With three short eight-week terms, timetabled teaching sessions are tight and examinations are harsh. The undergraduates are incredibly hard-working, and it is not uncommon to see self-study schedules tacked to their library stalls running from 8 am to 10 pm every day. </p>
<p><img alt="Undergraduate students were the ones hit by the strikes the worst in Cambridge." src="https://i.doge.at/uploads/big/b269375419f965671e45a791315b840b.jpg"></p>
<p>On a related side note, many Cambridge students are very good at self-parodying. One of the best I overheard in college was a student signing documents to cash some savings bonds, while chatting with his friend about how "that was one of the most Cambridge things to do". At the end of the day, when people don't take themselves <em>that</em> seriously, this academic environment is just like any other in the country.</p>
<p><img alt="Stephen Hawking's funeral procession from a distance." src="https://i.doge.at/uploads/big/8af349c355cfe6a92939f94d3ba920b0.jpg"></p>
<p>So what would be my own conclusion for my brief time here? It is incredibly difficult to find an environment that values hard work and tries its best to reward you accordingly. Cambridge is not yet there, as far as I am concerned, but it is doing reasonably well to get there, and far ahead of many other UK universities.</p>
<p><img alt="These glasses are very expensive." src="https://i.doge.at/uploads/big/9e51ae13105b43c591183061a7e3a7c8.jpg"></p>Epilogue2017-07-12T19:30:00+01:002017-07-12T19:30:00+01:00C Shitag:blog.chongya.ng,2017-07-12:/epilogue.html<p>So it has been rather quick: I have concluded my business with the University of York for now. It is nearly the end of road for Eboracum. The end result was more scared than hurt: I managed to scrape together the requirements for the best possible line they could award to me on the piece of paper, crossing it by nothing but a small amount behind the decimal point. However, I managed it, so all is well.</p>
<p>One of the few things coming out of this tiny achievement is that I can be slightly critical of my now <em>alma mater</em> without sounding too much like harangue from someone who did not work up to the expectations of their degree, in a hopefully constructive way. All of these are of course over my limited understanding of the institution as a student -- your experience may vary.</p>
<p>Before any of these, however, I should point out that the vast majority of my experience doing this degree has been generally positive, which is way beyond the average I should be expecting living out the next decades of my life. The "Russell Group" name badge does seem to imply that mildly infuriating matters are mostly out of your way as a student. There were a lot of great teaching staff dwarfing the lesser; the IT Services was great (high availability, performance, and quality service); a real effort has been made by the Careers Service; my department did care about student experience and had done what they could against resource constraints and bureaucracy to improve the experience; and so on.</p>
<p>However, there were less impressive things -- more familiar to me when they are associated with my department -- that I never mentioned to visiting applicants (I was paid to not talk about them, a great arrangement). </p>
<p>A small disappointment is that my (most recent) college ended up on my degree certificate, in a large font immediately beneath my name. I find it pretentious to make colleges appear more important than they really were. I changed college half way through my degree due to moving into a different college's campus accommodation. Apart from some fun activities, both colleges contributed next to nothing to my academic experience, and the requirement of having college affiliation switched when one moves into a different college's accommodation makes the identity of the college even less significant. I do not feel that the college affiliation was worthy enough to appear on a York degree certificate.</p>
<p>A greater criticism is of the university's constant hunger for expansion and the means through which they sate the hunger. Construction has been happening non-stop on both campuses, most of it serving an increased student number for more money, causing endless nuisance and disruptions to students who have already paid their tuition (and for some, accommodation) fees, and making the campuses look less impressive to those who need to make their mind up. In an extreme case, one side of an entire accommodation building was nose-to-nose with a building site metres away, with concrete pouring sessions running throughout term time. With the increased domestic tuition fee, Brexit, and a shrinking UK economy, it seems way too optimistic to expect ever-increasing numbers of UK, EU, and overseas students, as the situation is not looking great for any one of the categories. A long term plan to recruit more students causes falling admissions standards (personally I have talked to many taught Masters students whose English language skills were way beneath admissible standards of a "Russell Group" university); nuisance from construction of new facilities to house more students; as well as a further devaluation of the worth of a York degree. More students certainly bring in additional short-term revenue; but I do not think that the senior leadership of the university consider the long-term impact seriously enough.</p>
<p>Another criticism I would raise -- this time about my department in particular -- is the dominance of research groups over the experience of <em>taught</em> students. Each of the research groups generally hosts one or more personal chairs, as well as a slew of other academic and research staff, research and project students. While the primary duties of research groups are for the greater good of academic research, it is very disappointing that student complaints related to the experience of later years tend to originate from the rigidness of research groups. There are two main ways research groups affect taught students: module options and projects. In the first case, each research group requires teaching hours to fulfil duties required of their academic staff. While most if not all of them are great researchers, a small but significant number of them were less impressive when trying to deliver specialised knowledge to upper year undergraduates. This has been a source of recurring student complaints which the department acknowledges but cannot do much about. </p>
<p>The problem with projects is a reflection of something more troubling to the department: short-sighted resource allocation. For many years, the Department of Computer Science have largely overlooked the field of cryptography and security, to the point of losing their <a href="https://www.sheffield.ac.uk/dcs/people/academic/jclark">best staff in the field</a>, who was half a year into his appointment as Head of Department, to Sheffield. With the impending departure of another academic staff, they were entirely devoid of staff in this field, all while the accreditation bodies were demanding more input from this field into the degree programs. In an unprecedented fashion, the department had to hire <a href="https://www.cs.york.ac.uk/people/?search=MArs&username=angus">three</a> <a href="https://www.cs.york.ac.uk/people/?group=Academic%20and%20Teaching%20Staff&username=vv">academic</a> <a href="https://www.cs.york.ac.uk/people/?group=Academic%20and%20Teaching%20Staff&username=siamak">staff</a> from this same field to make up the loss. For me, this was late enough to leave me with the permanent disappointment of not having been able to do a crypto project in my first degree, but it might not be too late for others. Plus, it always leaves open the possibility of coming back to do a PhD now that they do have staff, so I applaud finally seeing the improvements.</p>
<p>I think it is reasonable to argue that large institutions are inherently ill, and I would not expect any better in my next destination. However, some of them do try to take the right medicine for their symptoms. I would say that York has the right medicine, they just need to take it on time.</p>Mixed IKEv2 / IKEv1 Cisco IPSec VPN Server with No User Certificates2016-11-20T00:26:00+00:002016-11-20T00:26:00+00:00C Shitag:blog.chongya.ng,2016-11-20:/mixed-ikev2-ikev1-cisco-ipsec-vpn-server-with-no-user-certificates.html<p>Also known as: <strong>Moving on from racoon to strongSwan, with back compatibility</strong>.</p>
<p>After an afternoon (well, mostly evening since I woke up at 3 pm) of troubleshooting, I figured out why iOS 9+ and OS X 10.11+ are having slow connection issues with <a href="http://ipsec-tools.sourceforge.net/">racoon</a>-powered Cisco IPSec IKEv1 VPNs, and why it is really the time to move on to strongSwan and IKEv2. And I will also provide a solution to deploy a strongSwan mixed IKEv2+IKEv1 server that would work for almost all clients.</p>
<h4>Trouble with racoon</h4>
<p>After getting an iOS 9 and an iOS 10 device, I noticed a considerable slowdown in their "Cisco IPSec" (IKEv1) VPN connections to my servers. After some IRC discussion today, I decided to take a look, and found the culprit.</p>
<p>Apparently, Apple is deprecating the widely used (but also old) AES128/DES, HMAC_SHA1 and DH Group 2/modp1024 configuration set for IKEv1. These algorithms are old, and many parts of the configuration set are on the brink of insecurity. However, this is still the highest supported configuration set for vpnc (the default IKEv1 client with Network Manager support on Ubuntu, released in 2008!), iOS 8 and earlier, Mac OS X 10.10 and earlier, and many others. This does not mean that this configuration set will not work on iOS 9+ and OS X 10.11+, but they will try their preferred configurations (AES256, SHA512 and modp2048) first, then many others in between, and only finally our old configurations. This made the handshake take 10-20 seconds in my case, which is utterly bad. </p>
<p>racoon / ipsec-tools is also very old, and while Japan's Network Information Centre (JPNIC) has a <a href="http://www.racoon2.wide.ad.jp/w/?TheRacoon2Project">fork</a> of racoon that supports IKEv2, I think <a href="https://www.strongswan.org/">strongSwan</a> is a far better supported, tested and safer option to move on to. However, a lot of aforementioned legacy devices also do not support IKEv2, so we need to deploy a mixed IKEv2+IKEv1 server to put everything into one server.</p>
<h4>The Painful strongSwan</h4>
<p>The reason that I have avoided strongSwan for so long is that the recommended client certificate authentication is difficult to deploy and get right for a private server without access to a trusted client certificate issuing facility. An entire self-signed trust environment would mean that a self-signed CA would have to be installed on your client devices (bad) and it is difficult to protect your CA private key from compromise (bad). If your CA's private key somehow becomes compromised, then an attacker can easily issue certificates for websites that your clients would trust while visiting. </p>
<p>Now, we don't have to have client certificates for EAP key exchange/authentication (which is what I will use), but instead usernames and passwords. However, you will still need a server certificate that can either be self-signed or purchased. </p>
<p>If you choose to create a CA and self-sign your server certificate, you still have the problem mentioned above. Therefore, I have chosen to use a purchased and trusted certificate. My GoGetSSL reseller account allows me to pay $11.25 for a three year PositiveSSL certificate, but you can also get one for free from <a href="https://letsencrypt.org/">Let's Encrypt</a>, just follow their <code>certbot-auto</code> instructions and make symbolic links from Let's Encrypt directories to the relevant directories under <code>/etc/ipsec.d/</code>. Make sure that auto-renewal works for Let's Encrypt, otherwise your server may suddenly stop working! </p>
<h4>Deploying strongSwan</h4>
<p>If you have skipped the large blocks of text above, go and read the paragraph just before this section heading, otherwise you will be confused about the certificates used ("hey where are the instructions about making certificates?").</p>
<p><strong>Sorting out certificates</strong></p>
<p>After obtaining a set of trusted certificates in whatever way, you should have the following certificate and key files (names are examples but don't matter):</p>
<ul>
<li>CA Certificate (provided by your certificate issuer): <code>ca.crt</code></li>
<li>CA Intermediate Certificate(s) (there is usually one, since CAs mostly issue certificates through their intermediates only, and maybe two or more): <code>intermediate1.crt</code>, <code>intermediate2.crt</code> (if exists), ...</li>
<li>Server Certificate (usually provided by your certificate issuer): <code>server.crt</code></li>
<li>Server Certificate Key (should not have left your server at this point!): <code>server_key.pem</code></li>
</ul>
<p>Place (or, in the case of Let's Encrypt certificates, symbolically link) them as follows:</p>
<ul>
<li><code>ca.crt</code>, <code>intermediate1.crt</code>, <code>intermediate2.crt</code> (if exists, and so on if more) to <code>/etc/ipsec.d/cacerts/</code></li>
<li><code>server.crt</code> to <code>/etc/ipsec.d/certs/</code></li>
<li><code>server_key.pem</code> to <code>/etc/ipsec.d/private/</code></li>
</ul>
<p>Make sure that <code>ca.crt</code> along with all the intermediate certificates (<code>intermediate1.crt</code> etc) would form the complete certificate chain for your server certificate, otherwise your client may not trust it if it cannot follow the broken chain!</p>
<p>Also, make sure that all these certificates are read-only when stored.</p>
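<p>The placement above can be sketched as a small shell function. This is a minimal sketch that assumes the example file names from the list and copies the files rather than linking them; the function name <code>install_certs</code> and the choice of permission bits (<code>444</code> for certificates, <code>400</code> for the key) are my assumptions for illustration, not something strongSwan mandates.</p>

```shell
# install_certs SRC [DEST]: hypothetical helper that copies the example files
# from SRC into the strongSwan directory layout under DEST (default
# /etc/ipsec.d) and makes them read-only.
install_certs() {
    src="$1"
    dest="${2:-/etc/ipsec.d}"
    mkdir -p "$dest/cacerts" "$dest/certs" "$dest/private"
    # Root CA plus however many intermediates the issuer provided.
    cp "$src/ca.crt" "$dest/cacerts/"
    for f in "$src"/intermediate*.crt; do
        if [ -e "$f" ]; then
            cp "$f" "$dest/cacerts/"
        fi
    done
    cp "$src/server.crt" "$dest/certs/"
    cp "$src/server_key.pem" "$dest/private/"
    # Certificates are world-readable but not writable; the key is owner-only.
    chmod 444 "$dest"/cacerts/*.crt "$dest/certs/server.crt"
    chmod 400 "$dest/private/server_key.pem"
}
```

<p>On a real server this would be run as root, for example <code>install_certs /root/certs</code>. Before doing so, the chain can be checked with something like <code>openssl verify -CAfile ca.crt -untrusted intermediate1.crt server.crt</code>.</p>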
<p><strong>Installing strongSwan</strong></p>
<p>Now, I am an Ubuntu user and enjoy access to a wide range of pre-built packages, therefore I am just going to use the package manager to install strongSwan, since the default build works for my purposes. If you use another distribution, or have special requirements with the installation, then you will need to <a href="https://download.strongswan.org/">download the source</a> and build it yourself.</p>
<div class="highlight"><pre><span></span><code>apt-get install strongswan libstrongswan libstrongswan-standard-plugins libcharon-extra-plugins
</code></pre></div>
<p>And it's done, no fuss.</p>
<p><strong>Configuring strongSwan</strong></p>
<p>First move the IPSec configurations to their backups, since we are starting anew:</p>
<div class="highlight"><pre><span></span><code>mv /etc/ipsec.secrets /etc/ipsec.secrets.old
mv /etc/ipsec.conf /etc/ipsec.conf.old
</code></pre></div>
<p>Figure out your server's public IP now (e.g. find the right one in <code>ifconfig</code>), since we will need it in a moment. For compatibility reasons, the domain hostname your certificate was issued for should resolve to your server (as observed from the outside, so no CDN in the middle). We will use this hostname instead of the IP address to configure strongSwan. In this post, the example will be <code>vpn.ebornet.com</code>; change this to your server's hostname when editing the configuration.</p>
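<p>A quick way to confirm that the hostname really resolves to the server's public IP is sketched below. It assumes <code>dig</code> is installed (on Ubuntu it comes with the <code>dnsutils</code> package); the function name and the <code>RESOLVE</code> variable are hypothetical and exist only so that the resolver command can be swapped out.</p>

```shell
# RESOLVE is a hypothetical knob: the command that prints one resolved
# address per line for a given hostname.
RESOLVE="${RESOLVE:-dig +short}"

# hostname_matches_ip HOST IP: succeeds if one of the addresses HOST resolves
# to is exactly IP.
hostname_matches_ip() {
    host="$1"
    expected="$2"
    # -F fixed string, -x whole-line match, -q quiet (exit status only).
    $RESOLVE "$host" | grep -Fqx "$expected"
}
```

<p>For example, <code>hostname_matches_ip vpn.ebornet.com 203.0.113.7 &amp;&amp; echo match</code>, with the placeholder address replaced by your server's public IP.</p>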
<p>Edit <code>/etc/ipsec.conf</code>:</p>
<div class="highlight"><pre><span></span><code><span class="nv">config</span> <span class="nv">setup</span>
# <span class="nv">This</span> <span class="nv">permits</span> <span class="nv">multiple</span> <span class="nv">logins</span> <span class="nv">with</span> <span class="nv">the</span> <span class="nv">same</span> <span class="nv">username</span><span class="o">/</span><span class="nv">password</span>, <span class="nv">set</span> <span class="nv">this</span> <span class="nv">to</span> <span class="nv">yes</span> <span class="k">if</span> <span class="nv">you</span> <span class="nv">don</span><span class="s1">'</span><span class="s">t like this.</span>
<span class="nv">uniqueids</span><span class="o">=</span><span class="nv">no</span>
<span class="nv">conn</span> <span class="o">%</span><span class="nv">default</span>
# <span class="nv">Using</span> <span class="nv">advanced</span> <span class="nv">ciphers</span>.
<span class="nv">ike</span><span class="o">=</span><span class="nv">aes256gcm16</span><span class="o">-</span><span class="nv">aes256gcm12</span><span class="o">-</span><span class="nv">aes128gcm16</span><span class="o">-</span><span class="nv">aes128gcm12</span><span class="o">-</span><span class="nv">sha256</span><span class="o">-</span><span class="nv">sha1</span><span class="o">-</span><span class="nv">modp2048</span><span class="o">-</span><span class="nv">modp4096</span><span class="o">-</span><span class="nv">modp1024</span>,<span class="nv">aes256</span><span class="o">-</span><span class="nv">aes128</span><span class="o">-</span><span class="nv">sha256</span><span class="o">-</span><span class="nv">sha1</span><span class="o">-</span><span class="nv">modp2048</span><span class="o">-</span><span class="nv">modp4096</span><span class="o">-</span><span class="nv">modp1024</span>,<span class="mi">3</span><span class="nv">des</span><span class="o">-</span><span class="nv">sha1</span><span class="o">-</span><span class="nv">modp1024</span><span class="o">!</span>
<span class="nv">esp</span><span class="o">=</span><span class="nv">aes128gcm12</span><span class="o">-</span><span class="nv">aes128gcm16</span><span class="o">-</span><span class="nv">aes256gcm12</span><span class="o">-</span><span class="nv">aes256gcm16</span><span class="o">-</span><span class="nv">modp2048</span><span class="o">-</span><span class="nv">modp4096</span><span class="o">-</span><span class="nv">modp1024</span>,<span class="nv">aes128</span><span class="o">-</span><span class="nv">aes256</span><span class="o">-</span><span class="nv">sha1</span><span class="o">-</span><span class="nv">sha256</span><span class="o">-</span><span class="nv">modp2048</span><span class="o">-</span><span class="nv">modp4096</span><span class="o">-</span><span class="nv">modp1024</span>,<span class="nv">aes128</span><span class="o">-</span><span class="nv">sha1</span><span class="o">-</span><span class="nv">modp2048</span>,<span class="nv">aes128</span><span class="o">-</span><span class="nv">sha1</span><span class="o">-</span><span class="nv">modp1024</span>,<span class="mi">3</span><span class="nv">des</span><span class="o">-</span><span class="nv">sha1</span><span class="o">-</span><span class="nv">modp1024</span>,<span class="nv">aes128</span><span class="o">-</span><span class="nv">aes256</span><span class="o">-</span><span class="nv">sha1</span><span class="o">-</span><span class="nv">sha256</span>,<span class="nv">aes128</span><span class="o">-</span><span class="nv">sha1</span>,<span class="mi">3</span><span class="nv">des</span><span class="o">-</span><span class="nv">sha1</span><span class="o">!</span>
<span class="nv">dpdaction</span><span class="o">=</span><span class="nv">clear</span>
<span class="nv">dpddelay</span><span class="o">=</span><span class="mi">35</span><span class="nv">s</span>
<span class="nv">dpdtimeout</span><span class="o">=</span><span class="mi">2000</span><span class="nv">s</span>
<span class="nv">keyexchange</span><span class="o">=</span><span class="nv">ikev2</span>
<span class="nv">auto</span><span class="o">=</span><span class="nv">add</span>
<span class="nv">rekey</span><span class="o">=</span><span class="nv">no</span>
<span class="nv">reauth</span><span class="o">=</span><span class="nv">no</span>
<span class="nv">fragmentation</span><span class="o">=</span><span class="nv">yes</span>
#<span class="nv">compress</span><span class="o">=</span><span class="nv">yes</span>
# <span class="nv">left</span> <span class="o">-</span> <span class="nv">local</span> <span class="ss">(</span><span class="nv">server</span><span class="ss">)</span> <span class="nv">side</span>
<span class="nv">leftcert</span><span class="o">=</span><span class="nv">server</span>.<span class="nv">crt</span> # <span class="nv">Filename</span> <span class="nv">of</span> <span class="nv">certificate</span> <span class="nv">located</span> <span class="nv">at</span> <span class="o">/</span><span class="nv">etc</span><span class="o">/</span><span class="nv">ipsec</span>.<span class="nv">d</span><span class="o">/</span><span class="nv">certs</span><span class="o">/</span>
<span class="nv">leftsendcert</span><span class="o">=</span><span class="nv">always</span>
<span class="nv">leftsubnet</span><span class="o">=</span><span class="mi">0</span>.<span class="mi">0</span>.<span class="mi">0</span>.<span class="mi">0</span><span class="o">/</span><span class="mi">0</span>,
# <span class="nv">right</span> <span class="o">-</span> <span class="nv">remote</span> <span class="ss">(</span><span class="nv">client</span><span class="ss">)</span> <span class="nv">side</span>
<span class="nv">eap_identity</span><span class="o">=%</span><span class="nv">identity</span>
<span class="nv">rightsourceip</span><span class="o">=</span><span class="mi">10</span>.<span class="mi">1</span>.<span class="mi">1</span>.<span class="mi">0</span><span class="o">/</span><span class="mi">24</span>
<span class="nv">rightdns</span><span class="o">=</span><span class="mi">8</span>.<span class="mi">8</span>.<span class="mi">8</span>.<span class="mi">8</span> #<span class="nv">Change</span> <span class="nv">it</span> <span class="nv">to</span> <span class="nv">another</span> <span class="nv">public</span> <span class="nv">DNS</span> <span class="k">if</span> <span class="nv">required</span>.
# <span class="nv">Windows</span> <span class="nv">and</span> <span class="nv">BlackBerry</span> <span class="nv">clients</span>
<span class="nv">conn</span> <span class="nv">ikev2</span><span class="o">-</span><span class="nv">mschapv2</span>
<span class="nv">rightauth</span><span class="o">=</span><span class="nv">eap</span><span class="o">-</span><span class="nv">mschapv2</span>
# <span class="nv">Apple</span> <span class="nv">clients</span>
<span class="nv">conn</span> <span class="nv">ikev2</span><span class="o">-</span><span class="nv">mschapv2</span><span class="o">-</span><span class="nv">apple</span>
<span class="nv">rightauth</span><span class="o">=</span><span class="nv">eap</span><span class="o">-</span><span class="nv">mschapv2</span>
<span class="nv">leftid</span><span class="o">=</span><span class="nv">vpn</span>.<span class="nv">ebornet</span>.<span class="nv">com</span> #<span class="nv">Change</span> <span class="nv">this</span> <span class="nv">to</span> <span class="nv">your</span> <span class="nv">certificate</span> <span class="nv">hostname</span>.
<span class="nv">conn</span> <span class="nv">ikev1group</span>
<span class="nv">aggressive</span> <span class="o">=</span> <span class="nv">yes</span> # <span class="nv">Not</span> <span class="nv">good</span>, <span class="nv">but</span> <span class="nv">standard</span> <span class="nv">practise</span> <span class="nv">and</span> <span class="nv">required</span> <span class="nv">to</span> <span class="nv">make</span> <span class="nv">IKEv1</span> <span class="nv">work</span> <span class="nv">on</span> <span class="nv">most</span> <span class="nv">consumer</span> <span class="nv">clients</span> <span class="nv">such</span> <span class="nv">as</span> <span class="nv">iOS</span>.
<span class="nv">keyexchange</span><span class="o">=</span><span class="nv">ikev1</span>
<span class="nv">authby</span><span class="o">=</span><span class="nv">xauthpsk</span>
<span class="nv">xauth</span><span class="o">=</span><span class="nv">server</span>
<span class="nv">left</span><span class="o">=%</span><span class="nv">defaultroute</span>
<span class="nv">leftsubnet</span><span class="o">=</span><span class="mi">0</span>.<span class="mi">0</span>.<span class="mi">0</span>.<span class="mi">0</span><span class="o">/</span><span class="mi">0</span>
<span class="nv">leftfirewall</span><span class="o">=</span><span class="nv">yes</span>
<span class="nv">right</span><span class="o">=%</span><span class="nv">any</span>
<span class="nv">rightsubnet</span><span class="o">=</span><span class="mi">10</span>.<span class="mi">1</span>.<span class="mi">2</span>.<span class="mi">0</span><span class="o">/</span><span class="mi">24</span>
<span class="nv">rightsourceip</span><span class="o">=</span><span class="mi">10</span>.<span class="mi">1</span>.<span class="mi">2</span>.<span class="mi">0</span><span class="o">/</span><span class="mi">24</span>
<span class="nv">rightdns</span><span class="o">=</span><span class="mi">8</span>.<span class="mi">8</span>.<span class="mi">8</span>.<span class="mi">8</span> #<span class="nv">Change</span> <span class="nv">it</span> <span class="nv">to</span> <span class="nv">another</span> <span class="nv">public</span> <span class="nv">DNS</span> <span class="k">if</span> <span class="nv">required</span>.
<span class="nv">auto</span><span class="o">=</span><span class="nv">add</span>
</code></pre></div>
<p>A couple of things to note when copy-pasting:</p>
<ul>
<li>We are using the local ranges <code>10.1.1.0/24</code> for IKEv2 and <code>10.1.2.0/24</code> for IKEv1. If you want to change these, change every occurrence in the above file, as well as in the iptables rules that follow.</li>
<li>I push Google's Public DNS (<code>8.8.8.8</code>) to clients. If you would rather not use it, change both occurrences to a resolver you prefer.</li>
<li>Change the domain in <code>leftid=vpn.ebornet.com</code> to the hostname on your server's certificate, which must resolve to your server; otherwise iOS and OS X clients won't work.</li>
</ul>
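<p>If you do change the ranges, a quick sanity check before editing the config and the iptables rules may save some debugging. The following is only a sketch in Python and is not part of the setup itself; the helper name <code>pools_are_disjoint</code> is made up for illustration, and the two pools are the ones used above:</p>
<div class="highlight"><pre><span></span><code>import ipaddress

def pools_are_disjoint(cidrs):
    """Return True if no two of the given CIDR ranges overlap."""
    networks = [ipaddress.ip_network(c) for c in cidrs]
    for i, first in enumerate(networks):
        for second in networks[i + 1:]:
            if first.overlaps(second):
                return False
    return True

# The IKEv2 and IKEv1 pools from the configuration above.
print(pools_are_disjoint(["10.1.1.0/24", "10.1.2.0/24"]))
</code></pre></div>
<p>You would also want to check your chosen pools against whatever private network the server itself sits on, which this sketch does not know about.</p>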
<p>Edit <code>/etc/strongswan.conf</code> by adding a line as shown:</p>
<div class="highlight"><pre><span></span><code><span class="n">charon</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">load_modular</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">yes</span><span class="w"></span>
<span class="w"> </span><span class="n">i_dont_care_about_security_and_use_aggressive_mode_psk</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">yes</span><span class="w"></span>
<span class="w"> </span><span class="c1">#Add the above, again not good, but required for most IKEv1 clients to function.</span><span class="w"></span>
<span class="w"> </span><span class="n">plugins</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">include</span><span class="w"> </span><span class="n">strongswan</span><span class="o">.</span><span class="n">d</span><span class="o">/</span><span class="n">charon</span><span class="o">/*.</span><span class="n">conf</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Edit <code>/etc/ipsec.secrets</code>:</p>
<div class="highlight"><pre><span></span><code>#<span class="k">For</span> <span class="nv">IKEv2</span>:
: <span class="nv">RSA</span> <span class="nv">server_key</span>.<span class="nv">pem</span>
<span class="nv">v2user</span> : <span class="nv">EAP</span> <span class="s2">"</span><span class="s">SomeComplicatedPassword</span><span class="s2">"</span>
# <span class="nv">Add</span> <span class="nv">more</span> <span class="nv">username</span> <span class="ss">(</span><span class="nv">e</span>.<span class="nv">g</span>. <span class="nv">v2user</span><span class="ss">)</span> <span class="nv">and</span> <span class="nv">password</span> <span class="ss">(</span><span class="nv">e</span>.<span class="nv">g</span>. <span class="nv">SomeComplicatedPassword</span><span class="ss">)</span> <span class="nv">pairs</span> <span class="nv">in</span> <span class="nv">this</span> <span class="nv">format</span>.
#<span class="k">For</span> <span class="nv">IKEv1</span>:
<span class="mi">99</span>.<span class="mi">99</span>.<span class="mi">99</span>.<span class="mi">99</span> <span class="nv">v1group</span> : <span class="nv">PSK</span> <span class="s2">"</span><span class="s">SomeStrangeSharedKey!</span><span class="s2">"</span>
<span class="sc">#99</span>.<span class="mi">99</span>.<span class="mi">99</span>.<span class="mi">99</span> <span class="nv">is</span> <span class="nv">your</span> <span class="nv">server</span> <span class="nv">IP</span>, <span class="nv">v1group</span> <span class="nv">is</span> <span class="nv">the</span> <span class="nv">IKEv1</span> <span class="nv">group</span> <span class="nv">name</span>, <span class="nv">SomeStrangeSharedKey</span><span class="o">!</span> <span class="nv">is</span> <span class="nv">the</span> <span class="nv">pre</span><span class="o">-</span><span class="nv">shared</span> <span class="nv">key</span> <span class="k">for</span> <span class="nv">the</span> <span class="nv">group</span>.
<span class="nv">v1user</span> : <span class="nv">XAUTH</span> <span class="s2">"</span><span class="s">AnotherComplicatedPassword</span><span class="s2">"</span>
# <span class="nv">Add</span> <span class="nv">more</span> <span class="nv">username</span> <span class="ss">(</span><span class="nv">e</span>.<span class="nv">g</span>. <span class="nv">v1user</span><span class="ss">)</span> <span class="nv">and</span> <span class="nv">password</span> <span class="ss">(</span><span class="nv">e</span>.<span class="nv">g</span>. <span class="nv">AnotherComplicatedPassword</span><span class="ss">)</span> <span class="nv">pairs</span> <span class="nv">in</span> <span class="nv">this</span> <span class="nv">format</span>.
</code></pre></div>
<p>Things to note when copy-pasting:</p>
<ul>
<li>For IKEv2, if your server certificate key is not <code>/etc/ipsec.d/private/server_key.pem</code>, change it to the right file name.</li>
<li>Change <code>99.99.99.99</code> to the IP of your server.</li>
<li><code>v1group</code> is the IKEv1 group name; change it if you like.</li>
<li>Customise usernames and passwords in the above formats.</li>
</ul>
<p>Yes, plain-text usernames and passwords are not great, but short of deploying a RADIUS server, this saves time on a private server. Compromise of one user's password does not compromise the EAP key exchange for other users.</p>
<p>Save everything, and we are all done.</p>
<p><strong>Configure iptables and traffic forwarding</strong></p>
<p>Whether you prefer something like <code>iptables-persistent</code> or plugging everything into <code>/etc/rc.local</code>, I would apply the following iptables rules:</p>
<div class="highlight"><pre><span></span><code>iptables --table nat --append POSTROUTING --jump MASQUERADE
iptables -t nat -A POSTROUTING -s 10.1.1.0/24 -j SNAT --to-source 99.99.99.99 #Change to your server IP.
iptables -t nat -A POSTROUTING -s 10.1.2.0/24 -j SNAT --to-source 99.99.99.99 #Change to your server IP.
iptables -I FORWARD -m policy --dir in --pol ipsec --proto esp -j ACCEPT
iptables -I FORWARD -m policy --dir out --pol ipsec --proto esp -j ACCEPT
</code></pre></div>
<p>Remember to change <code>99.99.99.99</code> to your server's IP.</p>
<p>I have plugged them into <code>/etc/rc.local</code> before the <code>exit</code> line so that they are automatically reapplied on reboot. If you would like to do the same, now is the time.</p>
<p>Edit <code>/etc/sysctl.conf</code>:</p>
<p>Uncomment the line for enabling IPv4 forwarding, so that it would look like this:</p>
<div class="highlight"><pre><span></span><code>net.ipv4.ip_forward=1
</code></pre></div>
<p>And apply the change:</p>
<div class="highlight"><pre><span></span><code>sysctl -p
</code></pre></div>
<p>If you are not using a Debian/Ubuntu distribution, the correct thing to do here may vary.</p>
<p>Now we just need to restart strongSwan, depending on what manages your services:</p>
<div class="highlight"><pre><span></span><code>#<span class="k">For</span> <span class="nv">systemd</span>:
<span class="nv">systemctl</span> <span class="nv">restart</span> <span class="nv">strongswan</span>
#<span class="k">For</span> <span class="nv">Ubuntu</span> <span class="nv">upstart</span>:
<span class="nv">service</span> <span class="nv">strongswan</span> <span class="nv">restart</span>
</code></pre></div>
<p>You may wish to check the logs to make sure that everything works. If it does, then great, we are done.</p>
<p><strong>A note on EAP</strong></p>
<p>Yes, EAP key exchange is arguably not as secure as certificate authentication, but it saves a lot of hassle from things randomly not working because of improper client profile installation (potentially dangerous) or the inability to issue trusted client certificates. I consider this a trade-off.</p>
<p><strong>A special note on wildcard certificates</strong></p>
<p>Despite strongSwan developers being <a href="https://wiki.strongswan.org/issues/794#note-3">adamant</a> about not supporting wildcard certificates (such as <code>CN=*.ebornet.com</code>), there is a way to get it to work.</p>
<p>If you use a wildcard certificate, in <code>/etc/ipsec.conf</code>, set <code>leftid</code> under <code>ikev2-mschapv2-apple</code> as the wildcard form of your domain, such as <code>*.example.com</code>, and when connecting from your client (more below), set server name as usual to be the resolved name of your server (such as <code>vpn.example.com</code>), but put <code>*.example.com</code> in as the remote ID. This is tested to work on iOS at the very least.</p>
<h4>Now Configure the Clients</h4>
<p>These guides are rough; please follow the instructions for your client system.</p>
<p><strong>Windows 7/8/10</strong></p>
<p>Go to your Control Panel (the full one) -> Network and Sharing Centre -> Create a new connection or network -> Set up a VPN (wording varies).</p>
<p>Set the server address to the certificate hostname that resolves to your server, and add a description of your choice. Continue until the wizard finishes; we still need to change a few adaptor settings.</p>
<p>Now go to <code>Change Adaptor Settings</code>, right click on your newly-created VPN connection, choose <code>Properties</code>, and go through the tabs. Make sure that the VPN type is set to IKEv2 and that authentication uses EAP with MS-CHAPv2. Let it save your credentials, but do not use your system login credentials. Choose to require encryption, then click <code>OK</code>.</p>
<p>Now double click on your VPN connection, and you will be prompted for the IKEv2 username and password that you set earlier. Enter them and you should be connected.</p>
<p><strong>iOS 9+ and OS X 10.11+</strong></p>
<p>Go to create a new VPN configuration (location varies), and set a description of your choice, <code>Server</code> as the certificate hostname resolved to your server (and <code>Remote ID</code> the same); <code>Local ID</code> does not matter in this case (I think), but I have set it to my IKEv2 username. </p>
<p>For Authentication, use <code>Username</code> for <code>User Authentication</code>, and enter your IKEv2 username and password set earlier. Click <code>Done</code> and it should be ready to connect.</p>
<p><strong>iOS 8</strong></p>
<p>iOS 8 supports IKEv2, but does not have a GUI for it yet. If you are still using iOS 8, you need to configure it with a configuration profile, see <a href="https://wiki.strongswan.org/projects/strongswan/wiki/AppleIKEv2Profile">the documentation</a> for more details.</p>
<p>You can, of course, also choose to use "Cisco IPSec" (IKEv1), using the server hostname or IP, your IKEv1 username and password, and the group name and shared secret set earlier.</p>
<p><strong>iOS 7 or earlier and OS X 10.10 or earlier</strong></p>
<p>Out of luck: they have no native support for IKEv2.</p>
<p>However, you can use "Cisco IPSec" (IKEv1), using the server hostname or IP, IKEv1 username and its password, group name (e.g. <code>v1group</code>) and its shared secret as set earlier.</p>
<p><strong>Android (tested on 5.1+)</strong></p>
<p>strongSwan has an official VPN application for Android, which you can download for free from the Play Store <a href="https://play.google.com/store/apps/details?id=org.strongswan.android">here</a>.</p>
<p>Configuration is straightforward: use EAP mode with your server certificate hostname and your IKEv2 username and password.</p>
<p><strong>Linux Desktop (in this case, Ubuntu 16.04)</strong></p>
<p>On my Ubuntu 16.04 desktop, the default binary packages of <code>strongswan-nm</code> will <strong>not</strong> work, as they were not built correctly. To use IKEv2 on Ubuntu 16.04 desktop, manual builds of strongSwan and NetworkManager-strongswan are required. The following is what I did that finally worked; if it does not work for you (there is a good chance it won't), please skip this part and try IKEv1 instead.</p>
<div class="highlight"><pre><span></span><code><span class="c1"># Build and install strongSwan</span><span class="w"></span>
<span class="n">cd</span><span class="w"> </span><span class="o">~/</span><span class="n">Downloads</span><span class="w"></span>
<span class="n">wget</span><span class="w"> </span><span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">download</span><span class="o">.</span><span class="n">strongswan</span><span class="o">.</span><span class="n">org</span><span class="o">/</span><span class="n">strongswan</span><span class="o">-</span><span class="mf">5.5</span><span class="o">.</span><span class="mf">1.</span><span class="n">tar</span><span class="o">.</span><span class="n">gz</span><span class="w"></span>
<span class="n">tar</span><span class="w"> </span><span class="n">zxvf</span><span class="w"> </span><span class="n">strongswan</span><span class="o">-</span><span class="mf">5.5</span><span class="o">.</span><span class="mf">1.</span><span class="n">tar</span><span class="o">.</span><span class="n">gz</span><span class="w"></span>
<span class="n">cd</span><span class="w"> </span><span class="n">strongswan</span><span class="o">-</span><span class="mf">5.5</span><span class="o">.</span><span class="mi">1</span><span class="w"></span>
<span class="o">./</span><span class="n">configure</span><span class="w"> </span><span class="o">--</span><span class="n">sysconfdir</span><span class="o">=/</span><span class="n">etc</span><span class="w"> </span><span class="o">--</span><span class="n">prefix</span><span class="o">=/</span><span class="n">usr</span><span class="w"> </span><span class="o">--</span><span class="n">libexecdir</span><span class="o">=/</span><span class="n">usr</span><span class="o">/</span><span class="n">lib</span><span class="w"> </span><span class="o">--</span><span class="n">disable</span><span class="o">-</span><span class="n">aes</span><span class="w"> </span><span class="o">--</span><span class="n">disable</span><span class="o">-</span><span class="n">des</span><span class="w"> </span><span class="o">--</span><span class="n">disable</span><span class="o">-</span><span class="n">md5</span><span class="w"> </span><span class="o">--</span><span class="n">disable</span><span class="o">-</span><span class="n">sha1</span><span class="w"> </span><span class="o">--</span><span class="n">disable</span><span class="o">-</span><span class="n">sha2</span><span class="w"> </span><span class="o">--</span><span class="n">disable</span><span class="o">-</span><span class="n">fips</span><span class="o">-</span><span class="n">prf</span><span class="w"> </span><span class="o">--</span><span class="n">disable</span><span class="o">-</span><span class="n">gmp</span><span class="w"> </span><span class="o">--</span><span class="n">enable</span><span class="o">-</span><span class="n">openssl</span><span class="w"> </span><span class="o">--</span><span class="n">enable</span><span class="o">-</span><span class="n">nm</span><span class="w"> </span><span class="o">--</span><span class="n">enable</span><span class="o">-</span><span class="n">agent</span><span class="w"> </span><span class="o">--</span><span class="n">enable</span><span class="o">-</span><span class="n">eap</span><span 
class="o">-</span><span class="n">gtc</span><span class="w"> </span><span class="o">--</span><span class="n">enable</span><span class="o">-</span><span class="n">eap</span><span class="o">-</span><span class="n">md5</span><span class="w"> </span><span class="o">--</span><span class="n">enable</span><span class="o">-</span><span class="n">eap</span><span class="o">-</span><span class="n">mschapv2</span><span class="w"> </span><span class="o">--</span><span class="n">enable</span><span class="o">-</span><span class="n">eap</span><span class="o">-</span><span class="n">identity</span><span class="w"></span>
<span class="n">make</span><span class="w"></span>
<span class="n">sudo</span><span class="w"> </span><span class="n">make</span><span class="w"> </span><span class="n">install</span><span class="w"></span>
<span class="c1">#Test charon-nm, make sure that it does not output any errors! (no output is fine):</span><span class="w"></span>
<span class="o">/</span><span class="n">usr</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">ipsec</span><span class="o">/</span><span class="n">charon</span><span class="o">-</span><span class="n">nm</span><span class="w"></span>
<span class="c1"># Build and install NetworkManager-strongswan:</span><span class="w"></span>
<span class="n">cd</span><span class="w"> </span><span class="o">~/</span><span class="n">Downloads</span><span class="w"></span>
<span class="n">wget</span><span class="w"> </span><span class="n">https</span><span class="p">:</span><span class="o">//</span><span class="n">download</span><span class="o">.</span><span class="n">strongswan</span><span class="o">.</span><span class="n">org</span><span class="o">/</span><span class="n">NetworkManager</span><span class="o">/</span><span class="n">NetworkManager</span><span class="o">-</span><span class="n">strongswan</span><span class="o">-</span><span class="mf">1.4</span><span class="o">.</span><span class="mf">1.</span><span class="n">tar</span><span class="o">.</span><span class="n">bz2</span><span class="w"></span>
<span class="n">tar</span><span class="w"> </span><span class="n">xjvf</span><span class="w"> </span><span class="n">NetworkManager</span><span class="o">-</span><span class="n">strongswan</span><span class="o">-</span><span class="mf">1.4</span><span class="o">.</span><span class="mf">1.</span><span class="n">tar</span><span class="o">.</span><span class="n">bz2</span><span class="w"></span>
<span class="n">cd</span><span class="w"> </span><span class="n">NetworkManager</span><span class="o">-</span><span class="n">strongswan</span><span class="o">-</span><span class="mf">1.4</span><span class="o">.</span><span class="mi">1</span><span class="w"></span>
<span class="o">./</span><span class="n">configure</span><span class="w"> </span><span class="o">--</span><span class="n">sysconfdir</span><span class="o">=/</span><span class="n">etc</span><span class="w"> </span><span class="o">--</span><span class="n">prefix</span><span class="o">=/</span><span class="n">usr</span><span class="w"> </span><span class="o">--</span><span class="n">with</span><span class="o">-</span><span class="n">charon</span><span class="o">=/</span><span class="n">usr</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">ipsec</span><span class="o">/</span><span class="n">charon</span><span class="o">-</span><span class="n">nm</span><span class="w"> </span><span class="c1">#Specifying charon-nm location seems to make it work?</span><span class="w"></span>
<span class="n">make</span><span class="w"></span>
<span class="n">sudo</span><span class="w"> </span><span class="n">make</span><span class="w"> </span><span class="n">install</span><span class="w"></span>
</code></pre></div>
<p>After restarting the computer, use Network Manager to configure a "strongswan" VPN connection with your server certificate, EAP mode, and your IKEv2 username and password.</p>
<p>I have no idea about other distributions; they may have pre-built packages that work out of the box. Or maybe you don't need Network Manager like I do, in which case there is no need to get strongSwan working with it.</p>
<p>To use the supported IKEv1 client on Ubuntu, install <code>network-manager-vpnc-gnome</code>, and set up a connection in Network Manager using the server hostname or IP, your IKEv1 username and password, and the group name (e.g. <code>v1group</code>) and shared secret set earlier.</p>
<h1>A Spoilt Ballot, and maybe a Security Audit</h1>
<p>C Shi, 2016-06-10</p>
<p>In case you are not aware, the <a href="https://yusu.org">university's student union YUSU</a> has recently conducted a <a href="https://www.yusu.org/blog/view/1572">referendum</a> on the continuation of its membership with the <a href="http://www.nus.org.uk/">National Union of Students</a>. While I could not care less about the bits and bobs of why it is taking place, with the results announced, there is something interesting on the (electronic) ballot.</p>
<p>Among the votes cast, 1461 voted in favour of staying in, while 1233 voted to leave, with 46 abstentions. However, there is also one single spoilt ballot (which seems bizarre for electronic voting), for which YUSU has provided a seemingly ambiguous explanation:</p>
<blockquote>
<p>A spoilt ballot happens when a person casts a vote but not for one of the candidates listed. The electronic ballot is more technical but it is still possible to write a different candidate name (...) the vote will still be recorded, but as it's not a valid candidate the ballot gets marked as spoilt (...)</p>
</blockquote>
<p>Weird, isn't it? When the web-based voting system provides you with three options (Remain, Leave, Abstention), what else could you choose?</p>
<p>As it turns out, there is a way, which really should not be possible with a competent web developer. In the <a href="http://www.yusu.org/docs/yusu-nus-referendum-report-2016.pdf">Report from the Deputy Returning Officer</a>, detailed statistics are provided by attributes such as gender and year of study. Because only one person has spoilt the ballot, it is possible to determine that the person is a third-year Computer Science undergraduate student, who is male and registered to Langwith College (all from public statistics). In fact, a few of us know who he is, but it is not appropriate to name the hero here.</p>
<p>However, my speculation about how this was done has been partly confirmed. The vote page is basically an HTML form with an input field containing the chosen option, whose value depends on the user's selection among the provided options. It is very likely that YUSU's system does not actually check whether the value submitted by the HTML form is among the available options. Therefore, it is possible for the user to modify the form and send any value to the system. I am fairly certain that this is not an intentional design, as otherwise there would be an input field "None of the above:" provided along with the given options.</p>
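<p>For illustration, the missing server-side check could be as small as the following. This is a sketch in Python under my own assumptions: I do not know what YUSU's system is written in, and the option names and the function <code>classify_ballot</code> are hypothetical.</p>
<div class="highlight"><pre><span></span><code># Hypothetical option names; the real form values are not known.
ALLOWED_CHOICES = {"remain", "leave", "abstain"}

def classify_ballot(submitted_value):
    """Accept a submitted form value only if it is one of the listed options."""
    if submitted_value in ALLOWED_CHOICES:
        return submitted_value
    # Anything else was never offered by the form, so the ballot is spoilt.
    return "spoilt"

print(classify_ballot("leave"))
print(classify_ballot("none of the above"))
</code></pre></div>
<p>The point is that the list of valid options lives on the server, so whatever the browser sends is compared against it rather than trusted.</p>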
<p>While I agree that treating a "modified" ballot as spoilt is the only appropriate way, I do really hope that the user input is either <a href="https://en.wikipedia.org/wiki/Secure_input_and_output_handling">escaped</a> or bound to parameters before being accepted into the database; otherwise any person could manipulate the database in whatever way they want, and gain access to sensitive student data available to YUSU (and I also hope that YUSU is not using the likes of <code>mysql_real_escape_string</code>).</p>
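<p>To show what "bound to parameters" means in practice, here is a minimal sketch using Python's built-in <code>sqlite3</code> module with an in-memory database. The table layout and the function <code>record_vote</code> are invented for the example and say nothing about how YUSU actually stores votes.</p>
<div class="highlight"><pre><span></span><code>import sqlite3

def record_vote(conn, voter_id, choice):
    """Store a vote using bound parameters rather than string formatting."""
    # The ? placeholders make the driver treat both values purely as data.
    conn.execute(
        "INSERT INTO votes (voter_id, choice) VALUES (?, ?)",
        (voter_id, choice),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE votes (voter_id TEXT, choice TEXT)")
# A hostile value is stored verbatim instead of being run as SQL.
record_vote(conn, "abc123", "x'); DROP TABLE votes; --")
print(conn.execute("SELECT choice FROM votes").fetchall())
</code></pre></div>
<p>Because the SQL text never contains the user's input, there is nothing to escape in the first place, which is why this approach is generally preferred over escaping functions.</p>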
<p>I still wonder if there are other parts of the YUSU website and systems that have lurking flaws just like this that may be unearthed one day. Suggestion? Get a security audit.</p>
<h1>Can the government simply ban encryption?</h1>
<p>C Shi, 2015-01-22</p>
<p>Against the backdrop of nations calling for strict legislation on data retention and surveillance, I attempt to explain, as an amateur in cryptography, how this may or may not work, and what you can do to keep your messages safe and secure.</p>
<h2>What happened?</h2>
<p>The <a href="https://en.wikipedia.org/wiki/War_on_Terror">Global War on Terror</a> has never stopped since 9/11, and intelligence agencies' access to private communications has always been controversial. <a href="https://en.wikipedia.org/wiki/Global_surveillance_disclosures_(2013%E2%80%93present)">Edward Snowden's whistleblowing</a> revealed to the world how the <a href="https://en.wikipedia.org/wiki/Five_Eyes">"Five Eyes"</a> and other nations' intelligence agencies have implemented surveillance measures, many of them unwarranted, on civilian communications. The recent <a href="https://en.wikipedia.org/wiki/Charlie_Hebdo_shooting">terrorist attack on Charlie Hebdo magazine</a> has, once again, given governments a chance to call for more legislated powers of intelligence surveillance.</p>
<p>Prime Minister Mr David Cameron has <a href="http://www.telegraph.co.uk/technology/internet-security/11340621/Spies-should-be-able-to-monitor-all-online-messaging-says-David-Cameron.html">made an interesting statement</a> on legislating for eavesdropping on encrypted communications, which sparked <a href="http://www.bbc.co.uk/news/technology-30794953">arguments</a> on whether it is practical, or even workable.</p>
<p>In favour of reintroducing the <a href="https://en.wikipedia.org/wiki/Draft_Communications_Data_Bill">Communications Data Bill</a> (draft struck down in 2012), Mr Cameron has made the following statement: "...Do we allow terrorists safer spaces for them to talk to each other? I say we don’t – and we should legislate accordingly." He also added: "...in our country, do we want to allow a <strong>means of communication between people</strong> which even <em>in extremis</em>, with a signed warrant from the home secretary personally, <strong>that we cannot read?</strong>" This part of his claim has sparked arguments.</p>
<p>I will now attempt to explain, from the viewpoint of an amateur in cryptography, how this may or may not work.</p>
<p><em>Disclaimer: All the information included is compiled from public knowledge and sources. The author introduces the following knowledge strictly for the public interest in Computer Science and Cryptography. In no way does the author suggest or encourage that the reader conduct any illegal communications; conducting such communications can always be deemed a criminal offence.</em></p>
<h2>Can Your Messages Be Read?</h2>
<p>In most cases, the answer is yes. There is always one way or another to reveal what you have typed into your tiny or big screen. The Electronic Frontier Foundation (EFF, an organisation for the protection of digital rights) has published <a href="https://www.eff.org/secure-messaging-scorecard">a comparison report</a> of popular instant messaging services, covering their security measures and transparency.</p>
<p>It is very clear that most instant messaging services do encrypt your chat messages in transit, mostly through <a href="https://en.wikipedia.org/wiki/Transport_Layer_Security">Transport Layer Security</a> (TLS), which provides certificate identity authentication and data encryption between you and your messaging service provider's servers. You are probably familiar with Skype, WhatsApp or Snapchat. </p>
<p>However, many of these providers don't really encrypt your chat data on their end. A significant <a href="http://www.independent.co.uk/life-style/gadgets-and-tech/news/the-snappening-snapsave-admits-security-breach-but-says-only-500mb-of-images-leaked-9794488.html">security breach</a> happened to Snapchat last year, where attackers leaked a huge amount of user chat logs, and even uploaded photos (well, your photo disappears from your phone after seconds, but it's not erased from existence on their servers). WhatsApp has gone through <a href="https://en.wikipedia.org/wiki/WhatsApp#Security">similar problems</a> with the risk of leaking user information. These sorts of security breaches have nothing to do with intelligence surveillance, but they do reveal how weakly your uploaded data is protected from hacking.</p>
<p>Many providers did implement security measures to encrypt your stored data, in the case of a security breach. However, this does not mean that your data is secure. For example, the aforementioned TLS combines key exchange, cipher encryption and data integrity verification. However, key exchange through protocols like <a href="https://en.wikipedia.org/wiki/RSA_(cryptosystem)">RSA</a> or <a href="https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange">Diffie-Hellman</a> is, on its own, vulnerable to a <a href="https://en.wikipedia.org/wiki/Man-in-the-middle_attack">Man-in-the-middle Attack</a> (MITM), which can be performed by a computer in a more privileged position in your network. This computer can be owned by your network administrator, or, unsurprisingly, intelligence agencies with some control over your Internet Service Provider (ISP)'s network. A MITM attack is performed by impersonating the other party to both you and the server you intended to communicate with, revealing the supposedly encrypted data in between. A famous example would be presenting a fake certificate for your server in the TLS/SSL handshake.</p>
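<p>To make the interception concrete, here is a minimal sketch of an unauthenticated Diffie-Hellman exchange being intercepted. Everything in it is an assumption for illustration: the names Alice, Bob and Mallory, the helper functions, and the toy group (a 127-bit prime with generator 3), which is far too small for real use. It is not how TLS is implemented; it only shows why a key exchange without identity verification cannot detect an interceptor.</p>

```python
import secrets

# Toy parameters for illustration only: real deployments use standardised
# groups of 2048 bits or more. P is the Mersenne prime 2**127 - 1.
P = 2**127 - 1
G = 3


def keypair():
    """Return a (private, public) Diffie-Hellman pair in the toy group."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)


def shared_secret(own_private, received_public):
    """Combine our private value with whatever public value we received."""
    return pow(received_public, own_private, P)


# Honest exchange: both sides derive the same secret.
alice_priv, alice_pub = keypair()
bob_priv, bob_pub = keypair()
honest_ok = shared_secret(alice_priv, bob_pub) == shared_secret(bob_priv, alice_pub)

# Intercepted exchange: Mallory replaces the public value in both directions.
mallory_priv, mallory_pub = keypair()
alice_key = shared_secret(alice_priv, mallory_pub)  # Alice believes this came from Bob
bob_key = shared_secret(bob_priv, mallory_pub)      # Bob believes this came from Alice

# Mallory can compute both keys, so she can decrypt and re-encrypt everything.
mallory_alice_key = shared_secret(mallory_priv, alice_pub)
mallory_bob_key = shared_secret(mallory_priv, bob_pub)

print("honest exchange agrees:", honest_ok)
print("Mallory shares a key with Alice:", alice_key == mallory_alice_key)
print("Mallory shares a key with Bob:", bob_key == mallory_bob_key)
```

<p>Neither Alice nor Bob sees anything unusual, because nothing in the exchange itself proves who sent a public value. That proof has to come from somewhere else, such as a certificate.</p>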
<p>Even though a <a href="https://hg.mozilla.org/mozilla-central/raw-file/tip/security/nss/lib/ckfw/builtins/certdata.txt">pre-defined trust-store</a> of trusted <a href="https://en.wikipedia.org/wiki/Certificate_authority">Certificate Authorities</a> (CA) and <a href="https://www.owasp.org/index.php/Certificate_and_Public_Key_Pinning">certificate/key pinning</a> can largely prevent MITM from happening, problems can still occur. For example, an intelligence agency (somehow) <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=542689">gets hold of a certificate wrongfully issued by a trusted CA</a>; or your corporate / malware-infected machine runs screen/keyboard recording bloatware. (You really shouldn't transmit confidential personal information on your managed work machine.)</p>
<p>One of the solutions to these security issues is, bluntly, let's encrypt data before it is transmitted! Unfortunately, this isn't always successful in keeping the information safe.</p>
<p>A notable example would be Apple's new iMessage service (iOS 8). Apple has implemented what's called <a href="https://en.wikipedia.org/wiki/End-to-end_encryption">end-to-end encryption</a>, which ensures that a user's message has already been encrypted before being sent via Apple's servers, and Apple "cannot" read what the user has written. However, a <a href="http://blog.quarkslab.com/imessage-privacy.html">loophole in Apple's security mechanisms</a> has been discovered by researchers at QuarksLab, demonstrating how the secrecy can be broken. Certificate pinning for the push server is not performed at the client, making MITM attacks possible, not to mention that Apple theoretically could perform such attacks themselves to reveal information. Even worse, the user's Apple ID credentials are sent in clear-text inside the TLS tunnel, meaning that if TLS secrecy is broken, someone else can log in as the user and impersonate them in future conversations.</p>
<p>So far, we have looked at services that have, or potentially have, security flaws. I shall now introduce some possible solutions that are still considered to be secure. (Keep in mind: they might be proven insecure in the future, especially if one of their building blocks, like OpenSSL, turns out to contain a <a href="https://en.wikipedia.org/wiki/Heartbleed">security vulnerability</a>.)</p>
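<p>The idea behind certificate pinning can be sketched in a few lines: the client ships with a fingerprint of the certificate it expects, and refuses anything else, even if a trusted CA signed it. The function names and the placeholder byte strings below are hypothetical; in a real Python client the DER-encoded certificate could be obtained with <code>ssl.SSLSocket.getpeercert(binary_form=True)</code>.</p>

```python
import hashlib
import hmac


def fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as a hex string."""
    return hashlib.sha256(cert_der).hexdigest()


def matches_pin(cert_der: bytes, pinned: str) -> bool:
    """Accept the certificate only if its fingerprint equals the pinned one."""
    return hmac.compare_digest(fingerprint(cert_der), pinned)


# Placeholder byte strings standing in for DER-encoded certificates.
genuine_cert = b"placeholder: genuine server certificate"
forged_cert = b"placeholder: certificate presented by an interceptor"

# The pin is distributed with the client ahead of time, not fetched at runtime.
PINNED = fingerprint(genuine_cert)

print("genuine certificate accepted:", matches_pin(genuine_cert, PINNED))
print("forged certificate accepted:", matches_pin(forged_cert, PINNED))
```

<p>The trade-off is operational: when the server legitimately replaces its certificate, every pinned client has to be updated too.</p>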
<h2>What Could Protect Your Messages?</h2>
<p>We have seen that improperly implemented security measures (<a href="https://en.wikipedia.org/wiki/Dual_EC_DRBG#Software_and_hardware_which_contained_the_possible_backdoor">intentionally</a> or not) can harm the secrecy of your messages, and established that a well-protected transmission must have secure authentication, encryption performed by strong ciphers, and carefully done verification.</p>
<p>It is worth mentioning that public knowledge of how an encryption mechanism works is very important, and its secrecy should be established by public cryptanalytic research.</p>
<p>A good example of doing authentication safely is <a href="https://crypto.cat/">Cryptocat</a>. In addition to implementing <a href="https://en.wikipedia.org/wiki/Off-the-Record_Messaging">Off-the-Record Messaging</a> (OTR) and <a href="https://en.wikipedia.org/wiki/Forward_secrecy#Perfect_forward_secrecy">Perfect Forward Secrecy</a>, it also lets the two parties of a conversation exchange personal questions that only the other party would know how to answer, thus establishing a trusted conversation over public relays. The identity authentication is not done mathematically, but by human nature -- even though this leaves badly designed personal questions vulnerable to <a href="https://en.wikipedia.org/wiki/Social_engineering_(security)">social engineering</a>.</p>
<p>An even better way to communicate securely and effectively is encrypted and authenticated email through <a href="https://en.wikipedia.org/wiki/GNU_Privacy_Guard">GNU Privacy Guard</a> (GnuPG), based on Phil Zimmermann's <a href="https://en.wikipedia.org/wiki/Pretty_Good_Privacy">Pretty Good Privacy</a> (PGP). In fact, you can start sending PGP signed and encrypted emails today.</p>
<p>The authentication of identity is done via digital signatures in a <a href="https://en.wikipedia.org/wiki/Public-key_cryptography">public-key distribution system</a>. Your public key, unique to your secret private key, is signed by people who trust you -- who in practice should have seen you in real life and have verified your identity. This establishes a <a href="https://en.wikipedia.org/wiki/Web_of_trust">web of trust</a> in which the people who signed your key vouch for your ownership of that key. This way, people receiving your encrypted message can decrypt the message with their private key, and verify the authenticity of the message with your trusted public key.</p>
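<p>The sign-then-verify step can be illustrated with textbook RSA and deliberately tiny numbers. This is a toy sketch under loud assumptions: the primes 61 and 53, the exponent 17, and the helper names are chosen for illustration only, there is no padding, and this is not how GnuPG implements signatures. It only shows the shape of the idea: the private key produces a signature, and anyone holding the public key can check it.</p>

```python
import hashlib

# Textbook RSA with tiny primes, for illustration only (trivially breakable).
PRIME_1, PRIME_2 = 61, 53
N = PRIME_1 * PRIME_2                          # public modulus, 3233
E = 17                                         # public exponent
D = pow(E, -1, (PRIME_1 - 1) * (PRIME_2 - 1))  # private exponent (Python 3.8+)


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the toy modulus range."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Only the holder of the private exponent D can compute this."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone with the public values (N, E) can run this check."""
    return pow(signature, E, N) == digest(message)


message = b"Meet me at noon."
signature = sign(message)

print("genuine signature verifies:", verify(message, signature))
print("altered signature verifies:", verify(message, (signature + 1) % N))
```

<p>The web of trust answers the question this sketch leaves open: how the verifier knows that the public values really belong to the person they think they do.</p>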
<p>The encryption ciphers and verification mechanisms in GPG are also proven to be robust. If you use GPG/PGP properly, your authentic email (or other messages) will be encrypted and protected before being sent through your email provider, thus preventing others from knowing the content of your email.</p>
<p>You can check <a href="https://www.gnupg.org/">GnuPG</a> and <a href="http://www.gpg4win.org/">Gpg4win</a> or <a href="https://gpgtools.org/">GPGTools</a> to start using GPG/PGP. Remember: <strong>don't sign keys of people who you don't trust</strong> -- otherwise you are breaking the web of trust; and keep your private key(s) encrypted and secure.</p>
<h2>Back to the Question</h2>
<p><strong>So, can there be messages on the Internet that no outsider can read?</strong></p>
<p>Yes, <strong>but only if you do it properly</strong>, and given that there isn't a court warrant issued to <a href="https://en.wikipedia.org/wiki/Lavabit#Suspension_and_gag_order">take over your private key</a>.</p>
<p>I shall not go into the arguments of whether mass surveillance is beneficial, or whether strict legislation on data retention is good. However, while all of these tools are still legal in the UK (when used legally, of course), I wish to write something that can lead people to more knowledge in this area, which is what I have done today.</p>
<p><em>Content correct at the time of writing, to the best knowledge of the author.</em></p>
<p><em>I do appreciate mistakes being pointed out, please contact doge [AT] ebornet.com in that case.</em></p>