Initial server setup

On this blog, we will focus mostly on the three big players in cloud computing: AWS (Amazon Web Services), Google Cloud, and Microsoft Azure.

While AWS is still the dominant force in the cloud computing market, Google Cloud and Microsoft Azure have a lot to offer. We will use Cloud Dataproc, Amazon EMR, HDInsight, and much more.

Most of the time we are going to deploy our software in containers. Cloud providers offer many managed solutions for this use case, for example Google Kubernetes Engine, Amazon Elastic Container Service, or Microsoft Azure Container Instances.

If you are interested in cloud computing market share, take a look at the market research article from Synergy Research Group.

Cloud Computing market share

But today, I wanted to do something different. For research purposes, we are going to set up a Kubernetes cluster on clean systems.

1. Choosing a hosting provider

First of all, we need to choose where we are going to deploy our cluster. You can do that on a computer in your home, a dedicated server, or a VPS from any cloud provider you like. For example, I really like both DigitalOcean and Linode for personal projects, though lately I've been preferring the latter. I recommend you choose a provider that offers private networking, allowing servers located in the same datacenter to communicate directly. This might require additional firewall configuration, depending on how the private networking is set up.

I’m going to use two dedicated Intel® Core i7-4770 Quad-Core servers with 32 GB of DDR3 RAM each, connected via a 10 Gbit LAN and hosted at Hetzner. You can do the same on VPS servers; there is no difference for our purposes.

There are a few problems with a cluster this small. First of all, failure tolerance. Internally, Kubernetes uses the etcd key-value store, which cannot guarantee cluster health at this size (it won’t be able to function properly if any node dies). Take a look at their FAQ:

CLUSTER SIZE    MAJORITY    FAILURE TOLERANCE
     1             1               0
     2             2               0
     3             2               0
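These numbers follow directly from how etcd counts quorum: a majority is N/2 + 1 (integer division), and the failure tolerance is whatever is left over. A quick shell sketch reproduces the table:

```shell
# Majority (quorum) for an etcd cluster of size N is N/2 + 1 (integer division);
# failure tolerance is N minus the majority.
for n in 1 2 3 4 5; do
  majority=$(( n / 2 + 1 ))
  echo "size=$n majority=$majority tolerance=$(( n - majority ))"
done
```

As the table shows, going from two nodes to three is what first buys you tolerance of a single node failure.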

Also, our Kubernetes cluster will have two kube-masters and two kube-nodes co-located on two machines connected by a private network. This kind of setup is acceptable only for development, but it gives us powerful hardware to work with at a cheap price.

2. Choosing an operating system

A lot of people love Ubuntu, and I’m one of them. It began as a user-friendly desktop system and now has a significant share of the server market. It has a large community, and it’s generally easier to get started with. Using Ubuntu as your development system often gives you the advantage of the latest and greatest software versions, but that also makes it more prone to bugs.

CentOS, on the other hand, is a free clone of Red Hat Enterprise Linux, which has the distinction of being the most widely-supported distribution in corporate IT.

CentOS is (arguably) more stable and secure, since it has less frequent updates, which means the software is tested for a longer period of time before it gets released. Because of that, we are going to use CentOS 7 exclusively in this course.

3. Automation

Let’s imagine we have a few servers for our experiments, each with a clean, minimal CentOS 7.4 installed. What do we do next? What is the first thing you should do?

We don’t want to set up our systems manually. We want to automate our work as much as possible, especially in a multi-server environment. We don’t want to install the same software on every server by hand; it’s tedious work, and it’s not what this course is about.

Ansible is free, open-source software with a huge community that can help us automate our workflow. For example, kubespray (an Ansible playbook) is the easiest way to install a Kubernetes cluster on bare-metal systems.

Also, all new servers usually require the same tasks done over and over again:

  1. Upgrading the system
  2. Setting hostname
  3. Configuring mail delivery
  4. Creating new users/groups
  5. Changing default ssh settings
  6. Setting up firewall
  7. Setting up backups
  8. Installing monitoring system
  9. Setting up WireGuard VPN
  10. and much more …

If you ever move to a different hosting provider, you will have to do all of that over again. So it only makes sense to automate as much work as possible and focus on delivering quality software, without worrying about ops too much.

4. Tutorial

Let’s create a very basic Ansible playbook file named initial.yml that performs just the basic server configuration:

- hosts: ScalableSystemDesign
  user: root
  roles:
    - initial
    - kernel
    - wireguard
    - postfix
    - user
    - yumcron
  tasks:
    # finish all transactions for kubespray
    - yum:
        name: yum-utils
        state: present
        
    - command: yum-complete-transaction --cleanup-only
      become: true

You can find the full version of this playbook here. Let’s take a closer look at each of these roles.

But first, take a look at what our inventory might look like for this setup:

[ScalableSystemDesign]
s1.scalablesystem.design
s2.scalablesystem.design
s3.scalablesystem.design
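Since the playbook later runs kube-masters and kube-nodes on the same machines, you may eventually want to split the inventory into groups. A sketch of how that could look (the group names here are illustrative, not required by the playbook above):

```
[kube-master]
s1.scalablesystem.design
s2.scalablesystem.design

[kube-node]
s1.scalablesystem.design
s2.scalablesystem.design
```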

4.1. initial

This is the most basic role that I like to have in every Ansible setup I use.

First of all, it will try to update the system using yum.

- name: upgrade all packages
  yum:
    name: '*'
    state: latest

Secondly, it will generate the en_US.UTF-8 locale and set it as the default one.

- name: Create en_US.UTF-8 locale
  command: localedef -i en_US -c -f UTF-8 en_US.UTF-8
  changed_when: false

- name: locale fix
  lineinfile:
    dest: /etc/environment
    line: 'LC_ALL="en_US.UTF-8"'
    state: present

Third, and very importantly, we set the hostname of the system.

- name: set hostname
  hostname:
    name: "{{ inventory_hostname }}"

Note that CentOS 7 changed how the hostname is set, introducing the hostnamectl command; the Ansible hostname module takes care of this for us.

Also, we need to disable swap for Kubernetes.

- name: swap - remove current swaps from fstab
  lineinfile:
    dest: /etc/fstab
    regexp: '^/[\S]+\s+none\s+swap '
    state: absent

- name: swap - disable swap
  command: swapoff --all
  ignore_errors: yes
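If you want to sanity-check what the regexp in the lineinfile task will remove, you can translate it to POSIX ERE and run it against a sample fstab (the file path and sample lines below are illustrative):

```shell
# Two sample fstab lines: a swap entry and a regular ext4 entry
printf '/dev/mapper/centos-swap none swap defaults 0 0\n/dev/sda1 /boot ext4 defaults 1 2\n' > /tmp/fstab.sample
# The playbook's pattern '^/[\S]+\s+none\s+swap ' in grep -E syntax;
# only the swap line should match
grep -E '^/[^[:space:]]+[[:space:]]+none[[:space:]]+swap ' /tmp/fstab.sample
```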

As the last step, we will create bind mounts1 for Kubernetes Local Volumes:

- name: Create data directory for mounting
  file:
    owner: root
    group: root
    path: "/data/vol{{ item }}"
    state: directory
  with_sequence: start=1 end={{ volumes }}

- name: Bind-mount local directory for Kubernetes mounting
  mount:
    path: "/mnt/disks/vol{{ item }}"
    src: "/data/vol{{ item }}"
    opts: bind
    fstype: none
    state: mounted
  with_sequence: start=1 end={{ volumes }}
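Because the mount module is called with state: mounted, it both performs the bind mount and persists it. With volumes set to 2, the resulting /etc/fstab entries look roughly like this (a sketch, not verbatim module output):

```
/data/vol1 /mnt/disks/vol1 none bind 0 0
/data/vol2 /mnt/disks/vol2 none bind 0 0
```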

4.2. kernel

Upgrade the kernel from ELRepo so we can install WireGuard VPN, then reboot.

4.3. wireguard

Install WireGuard VPN.

4.4. postfix

I have a fairly complicated email delivery setup that uses a mix of Mandrill, Google G Suite, and Amazon Route 53.

This role will set up a basic Postfix relay through Mandrill.
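I won’t reproduce the whole role here, but a minimal relay configuration in /etc/postfix/main.cf looks roughly like this (a sketch assuming SASL credentials live in /etc/postfix/sasl_passwd; the relay host and port are illustrative):

```
relayhost = [smtp.mandrillapp.com]:587
smtp_use_tls = yes
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
```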

4.5. user

This role will create a user on the remote system and add it to sudoers. It will also add the key_file.pub public key located in the current directory to authorized_keys. It will not disable root SSH login; that will be done in the next lesson. If you don’t have a key pair yet, you can generate one with:

$ ssh-keygen -t rsa -b 4096 -C "your_email@example.com"

4.6. yumcron

Enable automatic security updates.
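Under the hood this usually amounts to installing yum-cron and adjusting /etc/yum/yum-cron.conf; a sketch of the relevant settings (security-only updates, applied automatically):

```
update_cmd = security
apply_updates = yes
```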


5. Running Ansible

Now, let’s run our playbook using the command below, which will prompt you for the root password. The password must be the same for all servers.

$ ansible-playbook -i hosts initial.yml --ask-pass

After the Ansible playbook finishes running, your servers are ready for use.

  1. Using bind mounts you can mount part of an already-mounted filesystem to another location and have the filesystem accessible from both mount points.
