Since writing blog posts takes time, this version is kept simple. In real-world work the tasks would be split into roles and carry more complete error handling. I will publish an improved version of the code in a future post.

Setting up a multi-master, multi-worker Kubernetes cluster with Ansible

kubernetes + istio is currently the most powerful, and also the most approachable, service mesh stack. To use kubernetes + istio, you first need a kubernetes cluster. There are many ways to build one; automating the setup with ansible is a particularly convenient and reliable approach.

Server list

VIP 192.168.2.111

HOST         ROLE         IP             CPU  MEMORY
k8s-lvs-01   LVS MASTER   192.168.2.58   2C   4G
k8s-lvs-02   LVS BACKUP   192.168.2.233  2C   4G
k8s-main-01  K8S MASTER   192.168.2.85   4C   8G
k8s-main-02  K8S MASTER   192.168.2.155  4C   8G
k8s-main-03  K8S MASTER   192.168.2.254  4C   8G
k8s-node-01  K8S WORKER   192.168.2.110  4C   8G
k8s-node-02  K8S WORKER   192.168.2.214  4C   8G
k8s-node-03  K8S WORKER   192.168.2.36   4C   8G

1. Install ansible on the workstation

1.1. Install ansible on Linux

It is a good idea to refresh the apt package index before installing:

sudo apt-get update

Install ansible:

sudo apt-get install ansible

sshpass is required when ansible connects to the cluster with password authentication; if you are not using password mode it can be skipped:

sudo apt-get install sshpass

If apt cannot find ansible or sshpass, configure the appropriate apt sources first and then install. Reference:

1.2. Install ansible on macOS

After installing Xcode from the App Store, run:

xcode-select --install

Install Homebrew:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Install ansible:

brew install --verbose ansible

To install sshpass, the baocang/delicious tap is needed:

brew tap baocang/delicious

Install sshpass:

brew install --verbose sshpass

sshpass is open source; the baocang/delicious tap builds and installs it from source.

1.3. Write the hosts file for ansible

Open hosts.ini in your editor of choice and enter the following:

[all:vars]
kubernetes_vip=192.168.2.111
keepalived_master_ip=192.168.2.58

[lvs]
k8s-lvs-01 ansible_host=192.168.2.58 ansible_ssh_port=22 ansible_ssh_user=ansible ansible_ssh_pass="P@ssw0rd" ansible_sudo_pass="P@ssw0rd"
k8s-lvs-02 ansible_host=192.168.2.233 ansible_ssh_port=22 ansible_ssh_user=ansible ansible_ssh_pass="P@ssw0rd" ansible_sudo_pass="P@ssw0rd"

[main]
k8s-main-01 ansible_host=192.168.2.85 ansible_ssh_port=22 ansible_ssh_user=ansible ansible_ssh_pass="P@ssw0rd" ansible_sudo_pass="P@ssw0rd"

[masters]
k8s-main-02 ansible_host=192.168.2.155 ansible_ssh_port=22 ansible_ssh_user=ansible ansible_ssh_pass="P@ssw0rd" ansible_sudo_pass="P@ssw0rd"
k8s-main-03 ansible_host=192.168.2.254 ansible_ssh_port=22 ansible_ssh_user=ansible ansible_ssh_pass="P@ssw0rd" ansible_sudo_pass="P@ssw0rd"

[workers]
k8s-node-01 ansible_host=192.168.2.110 ansible_ssh_port=22 ansible_ssh_user=ansible ansible_ssh_pass="P@ssw0rd" ansible_sudo_pass="P@ssw0rd"
k8s-node-02 ansible_host=192.168.2.214 ansible_ssh_port=22 ansible_ssh_user=ansible ansible_ssh_pass="P@ssw0rd" ansible_sudo_pass="P@ssw0rd"
k8s-node-03 ansible_host=192.168.2.36 ansible_ssh_port=22 ansible_ssh_user=ansible ansible_ssh_pass="P@ssw0rd" ansible_sudo_pass="P@ssw0rd"

[kubernetes:children]
main
masters
workers

If a server's ssh port is 22, ansible_ssh_port can be omitted.

The inventory above uses [lvs], [main], [masters] and [workers] to group the servers. Each host entry carries these variables:

  • ansible_host — the IP address
  • ansible_ssh_port — the ssh port
  • ansible_ssh_user — the ssh user name
  • ansible_ssh_pass — the password used to log in over ssh
  • ansible_sudo_pass — the password used when running sudo

[kubernetes:children] combines the [main], [masters] and [workers] groups into a single group named kubernetes.

There is also an implicit all group containing every server in the inventory.

[all:vars] defines variables for all hosts: here kubernetes_vip stores the VIP, and keepalived_master_ip stores the IP of the keepalived MASTER node.
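With hosts.ini written, it is worth confirming that ansible parses the inventory and can log in to every host before running any playbook (standard ansible commands; run them from the directory containing hosts.ini):

```shell
# Show which hosts ansible places in the kubernetes group
ansible -i hosts.ini kubernetes --list-hosts

# Log in to every host over ssh and run the built-in ping module
ansible -i hosts.ini all -m ping
```

Every host should answer with "pong"; an unreachable host or a wrong password shows up here rather than halfway through a long playbook.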

1.4. Write a simple ansible playbook

This playbook creates a file named .test.txt on each server in the lvs group:

Later snippets omit their file names; the complete playbook to execute can be found at the end of this post.

File: demo-anisble-playbook.yml

---
- name: Demo
  hosts: lvs
  become: yes
  tasks:
    - name: Write current user to file named .test.txt
      shell: |
        echo `whoami` > .test.txt

Then run:

ansible-playbook -i hosts.ini demo-anisble-playbook.yml

which produces output like the following:

PLAY [Demo] *********************************************************************************

TASK [Gathering Facts] **********************************************************************
ok: [k8s-lvs-02]
ok: [k8s-lvs-01]

TASK [Write current user to file named .test.txt] *******************************************
changed: [k8s-lvs-01]
changed: [k8s-lvs-02]

PLAY RECAP **********************************************************************************
k8s-lvs-01                : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
k8s-lvs-02                : ok=2    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

After it finishes, both lvs servers have a file named .test.txt whose content is root, because become: yes makes tasks run via sudo by default.

To clean up, change echo `whoami` > .test.txt to rm -f .test.txt and run the playbook once more.

2. Configure the LVS servers

2.1. What is a VIP

A VIP (Virtual IP Address) is an IP address that is not bound directly to one specific network interface. Across many network services and applications, VIPs are used to provide high availability, load balancing, and failover.

Where VIPs are used:

  1. Load balancing
    • In a load balancer such as LVS, the VIP is the single externally visible entry point. Clients connect to the VIP, and the load balancer distributes their requests across the backend servers.
    • If one backend fails, the load balancer keeps the service available by redirecting traffic to the remaining healthy servers.
  2. High-availability clusters
    • In an HA cluster the VIP can migrate between nodes, usually driven by a failure-detection mechanism.
    • When the primary node fails, the VIP moves quickly to a standby node, giving a seamless failover.
  3. Network virtualization
    • In virtualized network environments, a VIP abstracts away the physical server's actual network address.
    • This allows virtual machines or services to move flexibly between physical servers.

Advantages of a VIP:

  • Flexibility: services can move between servers or network devices without any client-side reconfiguration.
  • Availability: combined with failure detection and failover, a VIP improves service availability and reliability.
  • Simpler management: a single entry point simplifies both network management and client configuration.

Caveats:

  • A VIP must be configured and managed correctly to work in the intended network scenario.
  • In some setups, security and access control need attention to prevent unauthorized access.
  • Configuring and operating VIPs generally requires networking expertise.

2.2. LVS forwarding modes

Linux Virtual Server (LVS) supports several forwarding modes, each with its own use cases and way of handling traffic:

  1. NAT (Network Address Translation) mode
    • The LVS server rewrites the source or destination IP of packets passing through it.
    • Backend servers see the LVS server's IP as the source of client requests.
    • Replies go back to the LVS server, which forwards them to the original client.
  2. DR (Direct Routing) mode
    • The destination IP of client packets is left unchanged, and packets are routed directly to a backend server.
    • Backend servers reply straight to the client, bypassing the LVS server.
    • The source IP is preserved in this mode.
  3. TUN (IP Tunneling) mode
    • Packets are encapsulated in an IP tunnel and sent to a backend server.
    • The backend decapsulates the packet and replies directly to the client.
    • Both source and destination IPs of the original packet are preserved.
  4. Masquerade (Masq) mode
    • A NAT variant in which the LVS server replaces the packet's source IP with its own.
    • Backend servers see the LVS server's IP instead of the client's.
    • Replies go back to the LVS server first, which then forwards them to the client.

Each mode has its own scenarios and strengths. Which to choose depends on whether the source IP must be preserved, the network topology, and performance requirements. DR and TUN generally perform better because they keep reply traffic off the LVS server, at the cost of more network configuration. NAT and Masquerade are easier to set up but cost performance and do not preserve the original source IP.
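To make DR mode concrete, here is roughly what the equivalent manual ipvsadm commands look like, using the VIP and one worker from this cluster (for illustration only; keepalived generates these rules automatically in section 2.5):

```shell
# Create a TCP virtual server on the VIP, port 80, with weighted least-connections scheduling
sudo ipvsadm -A -t 192.168.2.111:80 -s wlc

# Attach a real server in DR mode (-g = gatewaying/direct routing) with weight 1
sudo ipvsadm -a -t 192.168.2.111:80 -r 192.168.2.110:80 -g -w 1
```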

2.3. LVS load-balancing algorithms

Linux Virtual Server (LVS) offers a range of scheduling algorithms that decide how incoming requests are spread across backend servers. Each has its own characteristics and suitable scenarios:

  1. Round Robin (RR)
    • Requests are assigned to each server in turn.
    • When the end of the list is reached, scheduling starts again from the top.
    • Suitable when all servers have similar capacity.
  2. Weighted Round Robin (WRR)
    • Like round robin, but each server has a weight.
    • Servers with more capacity receive proportionally more requests.
    • Suitable for servers with uneven performance.
  3. Least Connections (LC)
    • New requests go to the server with the fewest active connections.
    • Distributes load more fairly, especially when session lengths vary.
  4. Weighted Least Connections (WLC)
    • Combines least connections with server weights.
    • Chooses the server with the lowest connections-to-weight ratio.
    • Better for server pools with uneven capacity.
  5. Locality-Based Least Connections (LBLC)
    • Aimed at applications needing session affinity or persistence.
    • Tries to send requests from the same client to the same server.
  6. Locality-Based Least Connections with Replication (LBLCR)
    • Like LBLC, but with server weights added.
    • For affinity-sensitive applications on uneven server pools.
  7. Destination Hashing (DH)
    • Assigns requests based on the destination IP address.
    • Each request maps to a fixed server; suits applications needing strong session affinity.
  8. Source Hashing (SH)
    • Assigns requests based on the source IP address.
    • Requests from the same source IP always reach the same server; suits session-affinity scenarios.

Pick the algorithm that matches your workload and servers: with roughly uniform servers, round robin or weighted round robin is a good choice; with uneven capacity, consider weighted least connections; for applications needing session persistence, one of the hashing algorithms may fit best.

Both LVS servers need the ipvsadm and keepalived components installed: ipvsadm is used to manage and inspect IPVS rules, while keepalived manages the VIP, generates the IPVS rules, and performs health checks.
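Once keepalived is running, the rules it generated can be inspected on either LVS node with ipvsadm:

```shell
# List the virtual/real server table (numeric addresses, no DNS lookups)
sudo ipvsadm -Ln

# List the connections currently being forwarded
sudo ipvsadm -Lnc
```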

2.4. Write the keepalived.conf.j2 template

Create keepalived.conf.j2 inside the resources directory next to k8s-setup.yml.

If the machines have no hostnames, you can base the check on ansible_host or ansible_default_ipv4.address instead of ansible_hostname, i.e. compare by IP.

File: resources/keepalived.conf.j2

vrrp_instance VI_1 {
    state {{ 'MASTER' if ansible_host == keepalived_master_ip else 'BACKUP' }}
    interface ens160
    virtual_router_id 51
    priority {{ 255 if ansible_host == keepalived_master_ip else 254 }}
    advert_int 1

    authentication {
        auth_type PASS
        auth_pass 123456
    } 

    virtual_ipaddress {
        {{ kubernetes_vip }}/24
    }
}


# masters with port 6443
virtual_server {{ kubernetes_vip }} 6443 {
    delay_loop 6
    lb_algo wlc
    lb_kind DR
    persistence_timeout 360
    protocol TCP

{% for host in groups['main'] %}

    # {{ host }}
    real_server {{ hostvars[host]['ansible_host'] }} 6443 {
        weight 1
        SSL_GET {
            url {
                path /livez?verbose
                status_code 200
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
{% endfor %}
{% for host in groups['masters'] %}

    # {{ host }}
    real_server {{ hostvars[host]['ansible_host'] }} 6443 {
        weight 1
        SSL_GET {
            url {
                path /livez?verbose
                status_code 200
            }
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
{% endfor %}
}

# workers with port 80
virtual_server {{ kubernetes_vip }} 80 {
    delay_loop 6
    lb_algo wlc
    lb_kind DR
    persistence_timeout 7200
    protocol TCP
{% for host in groups['workers'] %}

    # {{ host }}
    real_server {{ hostvars[host]['ansible_host'] }} 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            connect_port 80
        }
    }
{% endfor %}
}

# workers with port 443
virtual_server {{ kubernetes_vip }} 443 {
    delay_loop 6
    lb_algo wlc
    lb_kind DR
    persistence_timeout 7200
    protocol TCP

{% for host in groups['workers'] %}

    # {{ host }}
    real_server {{ hostvars[host]['ansible_host'] }} 443 {
        weight 1
        TCP_CHECK {
            connect_timeout 10
            connect_port 443
        }
    }
{% endfor %}
}

vrrp_instance declares an instance named VI_1, configured as MASTER on k8s-lvs-01 and as BACKUP on the other machine.

interface ens160 is the name of the node's network interface; find it with ip addr or ifconfig — it is the interface whose output shows the node's IP. On a fresh machine there is typically a loopback lo interface plus one named something like ens160 or en0.

advert_int 1 sets the interval between keepalived's VRRP advertisements to 1 second.

priority on a BACKUP node should generally be lower than on the MASTER.

virtual_ipaddress holds the VIP (Virtual IP).

virtual_server defines a virtual server, which fronts several real servers (real_server).

lb_algo wlc selects the load-balancing algorithm
lb_kind DR selects Direct Routing mode

The {% for host in groups['masters'] %} statements are written flush-left to avoid indentation errors in the rendered file.

2.5. Ansible playbook for ipvsadm and keepalived

---
- name: Setup Load Balancer with IPVS and Keepalived
  hosts: lvs
  become: yes
  tasks:
    # Upgrade all installed packages to their latest versions
    - name: Upgrade all installed apt packages
      apt:
        upgrade: 'yes'
        update_cache: yes
        cache_valid_time: 3600  # Cache is considered valid for 1 hour

    # Install IP Virtual Server (IPVS) administration utility
    - name: Install ipvsadm for IPVS management
      apt:
        name: ipvsadm
        state: present

    # Install keepalived for high availability
    - name: Install Keepalived for load balancing
      apt:
        name: keepalived
        state: present

    # Deploy keepalived configuration from a Jinja2 template
    - name: Deploy keepalived configuration file
      template:
        src: resources/keepalived.conf.j2
        dest: /etc/keepalived/keepalived.conf

    # Restart keepalived to apply changes
    - name: Restart Keepalived service
      service:
        name: keepalived
        state: restarted
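After this playbook completes, a quick manual check on the LVS MASTER confirms that keepalived bound the VIP and created the virtual servers (interface name as in the template):

```shell
# The VIP should be attached to ens160 on the MASTER node
ip addr show dev ens160 | grep 192.168.2.111

# Virtual servers for ports 6443, 80 and 443 should be listed
sudo ipvsadm -Ln
```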

3. Install kubernetes and containerd.io

3.1. What each component does

  1. kubelet
    • kubelet is the agent that runs on every Kubernetes node and manages the containers on that node.
    • It watches the PodSpecs (Pod specifications) handed down by the API server and ensures the running containers match those specs.
    • kubelet maintains the container lifecycle: starting, stopping, and restarting containers, and running health checks.
  2. kubeadm
    • kubeadm is a tool for bootstrapping Kubernetes clusters quickly.
    • It initializes the cluster (sets up the master node), joins new nodes, and performs the configuration needed to get a working cluster.
    • kubeadm does not manage the cluster's nodes or Pods; it only handles cluster bootstrap and growth.
  3. kubectl
    • kubectl is the Kubernetes command-line client that lets users interact with a cluster.
    • It is used to deploy applications, inspect and manage cluster resources, view logs, and more.
    • kubectl talks mainly to the Kubernetes API server to carry out user commands.
  4. containerd.io
    • containerd.io is a container runtime that Kubernetes uses to run containers.
    • It handles image pulls, container execution, and container lifecycle management.
    • containerd is a core component of Docker, but in Kubernetes it is used standalone, without the full Docker toolchain.
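Once the playbook in section 3.2 below has run, the presence and versions of these components can be verified on any node:

```shell
kubelet --version
kubeadm version -o short
kubectl version --client
containerd --version
```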

3.2. Install the Kubernetes dependencies

---
- name: Install kubernetes packages and containerd.io
  hosts: kubernetes
  become: yes
  tasks:

    # Upgrade all installed packages to their latest versions
    - name: Upgrade all installed apt packages
      apt:
        upgrade: 'yes'
        update_cache: yes
        cache_valid_time: 3600  # Cache is considered valid for 1 hour

    # Install required packages for Kubernetes and Docker setup
    - name: Install prerequisites for Kubernetes and Docker
      apt:
        name:
          - ca-certificates
          - curl
          - gnupg
        update_cache: yes
        cache_valid_time: 3600

    # Ensure the keyring directory exists for storing GPG keys
    - name: Create /etc/apt/keyrings directory for GPG keys
      file:
        path: /etc/apt/keyrings
        state: directory
        mode: '0755'

    # Add Docker's official GPG key
    - name: Add official Docker GPG key to keyring
      apt_key:
        url: https://download.docker.com/linux/ubuntu/gpg
        keyring: /etc/apt/keyrings/docker.gpg
        state: present

    # Add Docker's apt repository
    - name: Add Docker repository to apt sources
      apt_repository:
        # repo: "deb [arch={{ ansible_architecture }} signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu {{ ansible_distribution_release }} stable"
        repo: "deb [signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu {{ ansible_distribution_release }} stable"
        filename: docker
        update_cache: yes
      notify: Update apt cache

    # Add Kubernetes' GPG key
    - name: Add Kubernetes GPG key to keyring
      apt_key:
        url: https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key
        keyring: /etc/apt/keyrings/kubernetes-apt-keyring.gpg
        state: present

    # Add Kubernetes' apt repository
    - name: Add Kubernetes repository to apt sources
      lineinfile:
        path: /etc/apt/sources.list.d/kubernetes.list
        line: 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /'
        create: yes
      notify: Update apt cache

    # Install Kubernetes packages
    - name: Install Kubernetes packages (kubelet, kubeadm, kubectl) and containerd.io
      apt:
        name:
          - kubelet
          - kubeadm
          - kubectl
          - containerd.io
        state: present

    # Hold the installed packages to prevent automatic updates
    - name: Hold Kubernetes packages and containerd.io
      dpkg_selections:
        name: "{{ item }}"
        selection: hold
      loop:
        - kubelet
        - kubeadm
        - kubectl
        - containerd.io

  handlers:
    # Handler to update apt cache when notified
    - name: Update apt cache
      apt:
        update_cache: yes

To pin the repository architecture, write the repo line like this (Docker shown as the example):

        repo: "deb [arch={{ ansible_architecture }} signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu {{ ansible_distribution_release }} stable"

3.3. Forward IPv4 and let iptables see bridged traffic

---
- name: Configure Kubernetes prerequisites
  hosts: kubernetes
  become: yes  # to run tasks that require sudo
  tasks:
    - name: Load Kernel Modules
      copy:
        content: |
          overlay
          br_netfilter
        dest: /etc/modules-load.d/k8s.conf
      notify: Load Modules

    - name: Set Sysctl Parameters
      copy:
        content: |
          net.bridge.bridge-nf-call-iptables  = 1
          net.bridge.bridge-nf-call-ip6tables = 1
          net.ipv4.ip_forward                 = 1
        dest: /etc/sysctl.d/k8s.conf
      notify: Apply Sysctl

  handlers:
    - name: Load Modules
      modprobe:
        name: "{{ item }}"
        state: present
      loop:
        - overlay
        - br_netfilter

    - name: Apply Sysctl
      command: sysctl --system
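Whether the modules were loaded and the sysctls applied can be verified on any node:

```shell
# Both modules should appear in the output
lsmod | grep -E 'overlay|br_netfilter'

# All three values should print as 1
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
```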

4. Preload the images Kubernetes needs

containerd itself also uses one of the images Kubernetes needs (possibly an older version of it, which can be aligned to the same version), so the Kubernetes images are preloaded before containerd is configured.

---
- name: Prefetch kubernetes images
  hosts: kubernetes
  become: true
  tasks:
    - name: Get kubeadm version
      command: kubeadm version -o short
      register: kubeadm_version

    - name: List Kubernetes images for the specific kubeadm version
      command: "kubeadm config images list --kubernetes-version={{ kubeadm_version.stdout }}"
      register: kubernetes_images

    - name: Pull and retag Kubernetes images from Aliyun registry
      block:
        - name: List old images in k8s.io namespace
          command: ctr -n k8s.io images list -q
          register: old_images_list

        - name: Pull Kubernetes image from Aliyun
          command: "ctr -n k8s.io images pull registry.aliyuncs.com/google_containers/{{ item.split('/')[-1] }}"
          loop: "{{ kubernetes_images.stdout_lines }}"
          when: item not in old_images_list.stdout
          loop_control:
            label: "{{ item }}"

        - name: Retag Kubernetes image
          command: "ctr -n k8s.io images tag registry.aliyuncs.com/google_containers/{{ item.split('/')[-1] }} {{ item }}"
          loop: "{{ kubernetes_images.stdout_lines }}"
          when: item not in old_images_list.stdout
          loop_control:
            label: "{{ item }}"

        - name: List new images in k8s.io namespace
          command: ctr -n k8s.io images list -q
          register: new_images_list

        - name: Remove images from Aliyun registry
          command: "ctr -n k8s.io images remove {{ item }}"
          loop: "{{ new_images_list.stdout_lines }}"
          when: item.startswith('registry.aliyuncs.com/google_containers')
          loop_control:
            label: "{{ item }}"

        # # Optional: Remove old SHA256 tags if necessary
        # - name: Remove old SHA256 tags
        #   command: "ctr -n k8s.io images remove {{ item }}"
        #   loop: "{{ new_images_list.stdout_lines }}"
        #   when: item.startswith('sha256:')
        #   loop_control:
        #     label: "{{ item }}"

    - name: Restart containerd service
      service:
        name: containerd
        state: restarted

Running this with ansible completes all the image preloading. Alternatively, you can set imageRepository in the kubeadm init config (for example, to a mirror such as registry.aliyuncs.com/google_containers) and skip this step.

The commented-out tasks would delete the images whose names start with sha256:; it is recommended to keep them.

ctr -n k8s.io images ls -q
registry.k8s.io/coredns/coredns:v1.10.1
registry.k8s.io/etcd:3.5.9-0
registry.k8s.io/kube-apiserver:v1.28.4
registry.k8s.io/kube-controller-manager:v1.28.4
registry.k8s.io/kube-proxy:v1.28.4
registry.k8s.io/kube-scheduler:v1.28.4
registry.k8s.io/pause:3.9
sha256:73deb9a3f702532592a4167455f8bf2e5f5d900bcc959ba2fd2d35c321de1af9
sha256:7fe0e6f37db33464725e616a12ccc4e36970370005a2b09683a974db6350c257
sha256:83f6cc407eed88d214aad97f3539bde5c8e485ff14424cd021a3a2899304398e
sha256:d058aa5ab969ce7b84d25e7188be1f80633b18db8ea7d02d9d0a78e676236591
sha256:e3db313c6dbc065d4ac3b32c7a6f2a878949031b881d217b63881a109c5cfba1
sha256:e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c
sha256:ead0a4a53df89fd173874b46093b6e62d8c72967bbf606d672c9e8c9b601a4fc

SHA256 tags play an important role in Docker and container technology, mainly around image integrity and version control. Here is what they do, and the pros and cons of deleting them:

What SHA256 tags are for:

  1. Integrity verification: the SHA256 tag is a cryptographic hash of the image content, used to verify that the image was not tampered with in transit.
  2. Unique identity: every image has a unique SHA256 hash, which distinguishes image versions even when they share the same human-readable tag (such as latest).
  3. Version control: across repeated builds and updates, the SHA256 hash pinpoints a specific image version.

Benefits of deleting SHA256 tags:

  1. Simpler image management: dropping the long hashes leaves only human-readable tags (such as v1.0.0) to identify images.
  2. Less clutter: where precise per-version tracking is unnecessary, removing the hashes reduces visual noise.

Drawbacks of deleting SHA256 tags:

  1. Loss of exact version information: without the hash you lose a precise pointer to a specific image version, which can be a problem for rollback or auditing.
  2. Potential security risk: in some environments, keeping the SHA256 tags is an extra safeguard that the image stays unchanged through pull and deploy.

Overall, whether to delete SHA256 tags depends on your needs and management preferences. In production it is usually best to keep them for versioning and security audits; in development or test environments, if the number of tags becomes unwieldy, removing them is an option.
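The property underlying all of this is that a SHA-256 digest is a pure function of the content: the same bytes always hash to the same value, which is why a digest can serve as both an integrity check and a unique version identifier. This can be seen with any input:

```shell
# Identical content always produces an identical digest
printf 'hello' | sha256sum
```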

5. Configure containerd

containerd's default configuration file disables the cri plugin and sets SystemdCgroup = false. Before configuring kubernetes, the cri plugin must be enabled and SystemdCgroup set to true.

SystemdCgroup = true is a configuration option typically found in container runtime configuration files, particularly in Docker- or Kubernetes-related setups. It concerns cgroup (control group) management in Linux, specifically cgroup v2.

5.1. A brief introduction to cgroups

  • Control groups (cgroups): a Linux kernel feature for allocating, prioritizing, limiting, accounting for, and isolating resource usage (CPU, memory, disk I/O, and so on).
  • cgroup v1 vs v2: cgroup v1 is the original version; cgroup v2 is the newer one, with a more consistent and simplified interface.

5.2. Systemd and cgroups

  • Systemd: a system and service manager that provides init and service management for modern Linux distributions. Systemd can also manage cgroups.
  • Integration: when Systemd acts as the process manager, it can use cgroups to manage system resources; with cgroup v2, Systemd offers tighter integration and better resource management.

5.3. What SystemdCgroup = true means

Setting SystemdCgroup = true in the runtime's configuration file tells the container runtime (such as Docker or containerd) to let Systemd manage the containers' cgroups. The benefits include:

  1. Better resource management: Systemd offers fine-grained control over resource limits and priorities, allowing container resources to be managed more effectively.
  2. Unified management: cgroups managed by Systemd stay consistent with how other services on the system are managed.
  3. Better compatibility: especially on systems using cgroup v2, Systemd management ensures better compatibility and stability.

5.4. When it matters

In Kubernetes, this setting usually pairs with the kubelet configuration, keeping container resource management consistent with the rest of the system — important for efficient and stable resource usage.

5.5. Modify the containerd configuration file

Enable the cri plugin by emptying the disabled_plugins = ["cri"] list, set SystemdCgroup = true so Systemd manages the containers' cgroups, and change sandbox_image so it matches the version Kubernetes expects.
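For reference, the relevant fragments of /etc/containerd/config.toml after these edits look roughly like this (abbreviated; the exact table paths come from the `containerd config default` output of containerd 1.6+):

```toml
# cri removed from the disabled list so Kubernetes can talk to containerd
disabled_plugins = []

[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.k8s.io/pause:3.9"

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
```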

---
- name: Configure containerd
  hosts: kubernetes
  become: true
  tasks:

    - name: Get Kubernetes images list
      command: kubeadm config images list
      register: kubernetes_images

    - name: Set pause image variable
      set_fact:
        pause_image: "{{ kubernetes_images.stdout_lines | select('match', '^registry.k8s.io/pause:') | first }}"

    - name: Generate default containerd config
      command: containerd config default
      register: containerd_config
      changed_when: false

    - name: Write containerd config to file
      copy:
        dest: /etc/containerd/config.toml
        content: "{{ containerd_config.stdout }}"
        mode: '0644'
        
    - name: Replace 'sandbox_image' and 'SystemdCgroup' in containerd config
      lineinfile:
        path: /etc/containerd/config.toml
        regexp: "{{ item.regexp }}"
        line: "{{ item.line }}"
      loop:
        - { regexp: '^\s*sandbox_image\s*=.*$', line: '    sandbox_image = "{{ pause_image }}"' }
        - { regexp: 'SystemdCgroup =.*', line: '            SystemdCgroup = true' }

    - name: Restart containerd service
      service:
        name: containerd
        state: restarted

6. Initialize the main master node

When bootstrapping a k8s cluster with kubeadm, only one master node needs to be initialized; the remaining masters and the workers join with the kubeadm join command. So the main master is initialized first.

6.1. Generate the token for kubeadm init

kubeadm's default token is abcdef.0123456789abcdef; the required format is "[a-z0-9]{6}.[a-z0-9]{16}". A matching token can be generated with:

LC_CTYPE=C tr -dc 'a-z0-9' </dev/urandom | head -c 6; echo -n '.'; LC_CTYPE=C tr -dc 'a-z0-9' </dev/urandom | head -c 16

6.2. Write the kubeadm-init.yaml.j2 template

File: resources/kubeadm-init.yaml.j2

Start from the defaults printed by kubeadm config print init-defaults, then modify the file to the following:

apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- token: {{ token }}
  ttl: 0s
  usages:
    - signing
    - authentication
  description: "kubeadm bootstrap token"
  groups:
    - system:bootstrappers:kubeadm:default-node-token
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: {{ ansible_host }}
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: k8s-main-01
  taints: null
---
apiServer:
  certSANs:
    - {{ kubernetes_vip }}
    - {{ ansible_host }}
{% for host in groups['masters'] %}
    - {{ hostvars[host]['ansible_host'] }}
{% endfor %}
    - k8s-main-01
    - k8s-main-02
    - k8s-main-03
    - kubernetes.cluster
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.k8s.io
kind: ClusterConfiguration
kubernetesVersion: 1.28.4
controlPlaneEndpoint: "{{ kubernetes_vip }}:6443"
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/12
  serviceSubnet: 10.96.0.0/12
scheduler: {}
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs

ttl: 0s means the token never expires.

A better practice is to set a reasonable expiry and later obtain tokens with kubeadm token create --print-join-command.

6.3. Generate the kubeadm init configuration file

The shell script's output is captured with register, and the Generate kubeadm config file task defines a variable named token that receives the content from stdout.

How register works:

  • Capturing output: when a task uses the register keyword, Ansible captures that task's output and stores it in the variable you name.
  • Reuse: the registered variable is available to later tasks in the playbook, letting you base subsequent actions on earlier output.

The ansible task file looks like this:

---
- name: Initialize Kubernetes Cluster on Main Master
  hosts: main
  become: true
  tasks:
    - name: Generate Kubernetes init token
      shell: >
        LC_CTYPE=C tr -dc 'a-z' </dev/urandom | head -c 1;
        LC_CTYPE=C tr -dc 'a-z0-9' </dev/urandom | head -c 5; 
        echo -n '.'; 
        LC_CTYPE=C tr -dc 'a-z0-9' </dev/urandom | head -c 16
      register: k8s_init_token

    - name: Generate kubeadm config file
      template:
        src: resources/kubeadm-init.yaml.j2
        dest: kubeadm-init.yaml
      vars:
        token: "{{ k8s_init_token.stdout }}"

This playbook writes a file named kubeadm-init.yaml into /home/ansible — the home of the user set via ansible_ssh_user for the main group in hosts.ini. You can log in to the server and inspect its contents.

6.4. Create the primary master node with kubeadm

Download the resources/cilium-linux-amd64.tar.gz file as follows.

On Linux, download cilium-cli like this:

CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "aarch64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
# Verify the sha256 on Linux:
# sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
# Verify the sha256 on macOS:
# shasum -a 256 -c cilium-linux-${CLI_ARCH}.tar.gz.sha256sum

On macOS, download cilium-cli like this:

CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
if [ "$(uname -m)" = "arm64" ]; then CLI_ARCH=arm64; fi
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-darwin-${CLI_ARCH}.tar.gz{,.sha256sum}
shasum -a 256 -c cilium-darwin-${CLI_ARCH}.tar.gz.sha256sum

The following extends the playbook shown above.

It is recommended to keep this in the same file as the later steps that join the other masters and workers, so the kubeadm join commands can run directly from the variables stored with set_fact.

The playbook below also saves the join commands needed by masters and workers into .master_join_command.txt and .worker_join_command.txt; use those files when joining nodes individually or running kubeadm join by hand later.

The playbook installs cilium-cli directly on the k8s main master; place the downloaded archive at resources/cilium-linux-amd64.tar.gz:

---
- name: Initialize Kubernetes Cluster on Main Master
  hosts: main
  become: true
  tasks:

    - name: Check if IP address is already present
      shell: "ip addr show dev lo | grep {{ kubernetes_vip }}"
      register: ip_check
      ignore_errors: yes
      failed_when: false
      changed_when: false
    
    - name: Debug print ip_check result
      debug:
        msg: "{{ ip_check }}"

    - name: Add IP address to loopback interface
      command:
        cmd: "ip addr add {{ kubernetes_vip }}/32 dev lo"
      when: ip_check.rc != 0

    - name: Generate Kubernetes init token
      shell: >
        LC_CTYPE=C tr -dc 'a-z0-9' </dev/urandom | head -c 6; 
        echo -n '.'; 
        LC_CTYPE=C tr -dc 'a-z0-9' </dev/urandom | head -c 16
      register: k8s_init_token

    - name: Generate kubeadm config file
      template:
        src: resources/kubeadm-init.yaml.j2
        dest: kubeadm-init.yaml
      vars:
        token: "{{ k8s_init_token.stdout }}"

    - name: Initialize the Kubernetes cluster using kubeadm
      command:
        cmd: kubeadm init --v=5 --skip-phases=addon/kube-proxy --upload-certs --config kubeadm-init.yaml
      register: kubeadm_init
    
    - name: Set fact for master join command
      set_fact:
        master_join_command: "{{ kubeadm_init.stdout | regex_search('kubeadm join(.*\\n)+?.*--control-plane.*', multiline=True) }}"
        cacheable: yes
      run_once: true

    - name: Set fact for worker join command
      set_fact:
        worker_join_command: "{{ kubeadm_init.stdout | regex_search('kubeadm join(.*\\n)+?.*sha256:[a-z0-9]{64}', multiline=True) }}"
        cacheable: yes
      run_once: true

    # - name: Create the target directory if it doesn't exist
    #   file:
    #     path: ~/.kube
    #     state: directory
    #     owner: "{{ ansible_user_id }}"
    #     group: "{{ ansible_user_id }}"
    #     mode: '0755'
    #   when: not ansible_check_mode  # This ensures it only runs when not in check mode

    # - name: Copy kube admin config to ansible user directory
    #   copy:
    #     src: /etc/kubernetes/admin.conf
    #     dest: ~/.kube/config
    #     remote_src: yes
    #     owner: "{{ ansible_user_id }}"
    #     group: "{{ ansible_user_id }}"
    #     mode: '0644'

    - name: Write master join command to .master_join_command.txt
      copy:
        content: "{{ master_join_command }}"
        dest: ".master_join_command.txt"
        mode: '0664'
      delegate_to: localhost

    - name: Append worker join command to .worker_join_command.txt
      lineinfile:
        path: ".worker_join_command.txt"
        line: "{{ worker_join_command }}"
        create: yes
      delegate_to: localhost


- name: Install cilium on Main Master
  hosts: main
  become: true
  tasks:

    - name: Ensure tar is installed (Debian/Ubuntu)
      apt:
        name: tar
        state: present
      when: ansible_os_family == "Debian"

    - name: Check for Cilium binary in /usr/local/bin
      stat:
        path: /usr/local/bin/cilium
      register: cilium_binary

    - name: Transfer and Extract Cilium
      unarchive:
        src: resources/cilium-linux-amd64.tar.gz
        dest: /usr/local/bin
        remote_src: no
      when: not cilium_binary.stat.exists

    - name: Install cilium to the Kubernetes cluster
      environment:
        KUBECONFIG: /etc/kubernetes/admin.conf
      command:
        cmd: cilium install --version 1.14.4 --set kubeProxyReplacement=true
    
    - name: Wait for Kubernetes cluster to become ready
      environment:
        KUBECONFIG: /etc/kubernetes/admin.conf
      command: kubectl get nodes
      register: kubectl_output
      until: kubectl_output.stdout.find("Ready") != -1
      retries: 20
      delay: 30

To keep using kube-proxy, remove the --skip-phases=addon/kube-proxy flag from the kubeadm init command and --set kubeProxyReplacement=true from the cilium install command.

6.5. Join the remaining masters to the cluster

---
- name: Join Masters to the Cluster
  hosts: masters
  become: true
  tasks:
    - name: Joining master node to the Kubernetes cluster
      shell:
        cmd: "{{ hostvars['k8s-main-01']['master_join_command'] }}"
      ignore_errors: yes

    - name: Wait for node to become ready
      environment:
        KUBECONFIG: /etc/kubernetes/admin.conf
      command: kubectl get nodes
      register: kubectl_output
      until: kubectl_output.stdout.find("NotReady") == -1
      retries: 20
      delay: 30

    - name: Check if IP address is already present
      shell: "ip addr show dev lo | grep {{ kubernetes_vip }}"
      register: ip_check
      ignore_errors: yes
      failed_when: false
      changed_when: false
    
    - name: Debug print ip_check result
      debug:
        msg: "{{ ip_check }}"

    - name: Add IP address to loopback interface
      command:
        cmd: "ip addr add {{ kubernetes_vip }}/32 dev lo"
      when: ip_check.rc != 0

6.6. Join the workers to the cluster

---
- name: Join Worker Nodes to the Cluster
  hosts: workers
  become: true
  tasks:
    - name: Joining worker node to the Kubernetes cluster
      shell:
        cmd: "{{ hostvars['k8s-main-01']['worker_join_command'] }}"
      ignore_errors: yes

    - name: Check if IP address is already present
      shell: "ip addr show dev lo | grep {{ kubernetes_vip }}"
      register: ip_check
      ignore_errors: yes
      failed_when: false
      changed_when: false
    
    - name: Debug print ip_check result
      debug:
        msg: "{{ ip_check }}"

    - name: Add IP address to loopback interface
      command:
        cmd: "ip addr add {{ kubernetes_vip }}/32 dev lo"
      when: ip_check.rc != 0
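With all nodes joined, the cluster state can be checked from the main master:

```shell
# All six kubernetes nodes should eventually report Ready
KUBECONFIG=/etc/kubernetes/admin.conf kubectl get nodes -o wide

# Overview of the cilium agents' health
KUBECONFIG=/etc/kubernetes/admin.conf cilium status
```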

The complete ansible playbook

Running this single playbook, together with the three files in the resources directory, performs everything described above.

Files it depends on:

  • the keepalived.conf.j2 template from section 2.4
  • the kubeadm-init.yaml.j2 template from section 6.2
  • the cilium-cli archive downloaded in section 6.4 (the Linux build)
---
- name: Setup Load Balancer with IPVS and Keepalived
  hosts: lvs
  become: yes
  tasks:
    # Upgrade all installed packages to their latest versions
    - name: Upgrade all installed apt packages
      apt:
        upgrade: 'yes'
        update_cache: yes
        cache_valid_time: 3600  # Cache is considered valid for 1 hour

    # Install IP Virtual Server (IPVS) administration utility
    - name: Install ipvsadm for IPVS management
      apt:
        name: ipvsadm
        state: present

    # Install keepalived for high availability
    - name: Install Keepalived for load balancing
      apt:
        name: keepalived
        state: present

    # Deploy keepalived configuration from a Jinja2 template
    - name: Deploy keepalived configuration file
      template:
        src: resources/keepalived.conf.j2
        dest: /etc/keepalived/keepalived.conf

    # Restart keepalived to apply changes
    - name: Restart Keepalived service
      service:
        name: keepalived
        state: restarted


- name: Install kubernetes packages and containerd.io
  hosts: kubernetes
  become: yes
  tasks:

    # Upgrade all installed packages to their latest versions
    - name: Upgrade all installed apt packages
      apt:
        upgrade: 'yes'
        update_cache: yes
        cache_valid_time: 3600  # Cache is considered valid for 1 hour

    # Install required packages for Kubernetes and Docker setup
    - name: Install prerequisites for Kubernetes and Docker
      apt:
        name:
          - ca-certificates
          - curl
          - gnupg
        update_cache: yes
        cache_valid_time: 3600

    # Ensure the keyring directory exists for storing GPG keys
    - name: Create /etc/apt/keyrings directory for GPG keys
      file:
        path: /etc/apt/keyrings
        state: directory
        mode: '0755'

    # Add Docker's official GPG key
    - name: Add official Docker GPG key to keyring
      apt_key:
        url: https://download.docker.com/linux/ubuntu/gpg
        keyring: /etc/apt/keyrings/docker.gpg
        state: present

    # Add Docker's apt repository
    - name: Add Docker repository to apt sources
      apt_repository:
        # repo: "deb [arch={{ ansible_architecture }} signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu {{ ansible_distribution_release }} stable"
        repo: "deb [signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu {{ ansible_distribution_release }} stable"
        filename: docker
        update_cache: yes
      notify: Update apt cache

    # Add Kubernetes' GPG key
    - name: Add Kubernetes GPG key to keyring
      apt_key:
        url: https://pkgs.k8s.io/core:/stable:/v1.28/deb/Release.key
        keyring: /etc/apt/keyrings/kubernetes-apt-keyring.gpg
        state: present

    # Add Kubernetes' apt repository
    - name: Add Kubernetes repository to apt sources
      lineinfile:
        path: /etc/apt/sources.list.d/kubernetes.list
        line: 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.28/deb/ /'
        create: yes
      notify: Update apt cache

    # Install Kubernetes packages
    - name: Install Kubernetes packages (kubelet, kubeadm, kubectl) and containerd.io
      apt:
        name:
          - kubelet
          - kubeadm
          - kubectl
          - containerd.io
        state: present

    # Hold the installed packages to prevent automatic updates
    - name: Hold Kubernetes packages and containerd.io
      dpkg_selections:
        name: "{{ item }}"
        selection: hold
      loop:
        - kubelet
        - kubeadm
        - kubectl
        - containerd.io

  handlers:
    # Handler to update apt cache when notified
    - name: Update apt cache
      apt:
        update_cache: yes


- name: Configure Kubernetes prerequisites
  hosts: kubernetes
  become: yes  # to run tasks that require sudo
  tasks:
    - name: Load Kernel Modules
      copy:
        content: |
          overlay
          br_netfilter
        dest: /etc/modules-load.d/k8s.conf
      notify: Load Modules

    - name: Set Sysctl Parameters
      copy:
        content: |
          net.bridge.bridge-nf-call-iptables  = 1
          net.bridge.bridge-nf-call-ip6tables = 1
          net.ipv4.ip_forward                 = 1
        dest: /etc/sysctl.d/k8s.conf
      notify: Apply Sysctl

  handlers:
    - name: Load Modules
      modprobe:
        name: "{{ item }}"
        state: present
      loop:
        - overlay
        - br_netfilter

    - name: Apply Sysctl
      command: sysctl --system


- name: Prefetch kubernetes images
  hosts: kubernetes
  become: true
  tasks:
    - name: Get kubeadm version
      command: kubeadm version -o short
      register: kubeadm_version

    - name: List Kubernetes images for the specific kubeadm version
      command: "kubeadm config images list --kubernetes-version={{ kubeadm_version.stdout }}"
      register: kubernetes_images

    - name: Pull and retag Kubernetes images from Aliyun registry
      block:
        - name: List old images in k8s.io namespace
          command: ctr -n k8s.io images list -q
          register: old_images_list

        - name: Pull Kubernetes image from Aliyun
          command: "ctr -n k8s.io images pull registry.aliyuncs.com/google_containers/{{ item.split('/')[-1] }}"
          loop: "{{ kubernetes_images.stdout_lines }}"
          when: item not in old_images_list.stdout
          loop_control:
            label: "{{ item }}"

        - name: Retag Kubernetes image
          command: "ctr -n k8s.io images tag registry.aliyuncs.com/google_containers/{{ item.split('/')[-1] }} {{ item }}"
          loop: "{{ kubernetes_images.stdout_lines }}"
          when: item not in old_images_list.stdout
          loop_control:
            label: "{{ item }}"

        - name: List new images in k8s.io namespace
          command: ctr -n k8s.io images list -q
          register: new_images_list

        - name: Remove images from Aliyun registry
          command: "ctr -n k8s.io images remove {{ item }}"
          loop: "{{ new_images_list.stdout_lines }}"
          when: item.startswith('registry.aliyuncs.com/google_containers')
          loop_control:
            label: "{{ item }}"

        # # Optional: Remove old SHA256 tags if necessary
        # - name: Remove old SHA256 tags
        #   command: "ctr -n k8s.io images remove {{ item }}"
        #   loop: "{{ new_images_list.stdout_lines }}"
        #   when: item.startswith('sha256:')
        #   loop_control:
        #     label: "{{ item }}"


- name: Configure containerd
  hosts: kubernetes
  become: true
  tasks:

    - name: Get Kubernetes images list
      command: kubeadm config images list
      register: kubernetes_images

    - name: Set pause image variable
      set_fact:
        pause_image: "{{ kubernetes_images.stdout_lines | select('match', '^registry.k8s.io/pause:') | first }}"

    - name: Generate default containerd config
      command: containerd config default
      register: containerd_config
      changed_when: false

    - name: Write containerd config to file
      copy:
        dest: /etc/containerd/config.toml
        content: "{{ containerd_config.stdout }}"
        mode: '0644'
        
    - name: Replace 'sandbox_image' and 'SystemdCgroup' in containerd config
      lineinfile:
        path: /etc/containerd/config.toml
        regexp: "{{ item.regexp }}"
        line: "{{ item.line }}"
      loop:
        - { regexp: '^\s*sandbox_image\s*=.*$', line: '    sandbox_image = "{{ pause_image }}"' }
        - { regexp: 'SystemdCgroup =.*', line: '            SystemdCgroup = true' }

    - name: Restart containerd service
      service:
        name: containerd
        state: restarted

- name: Initialize Kubernetes Cluster on Main Master
  hosts: main
  become: true
  tasks:

    - name: Check if IP address is already present
      shell: "ip addr show dev lo | grep {{ kubernetes_vip }}"
      register: ip_check
      ignore_errors: yes
      failed_when: false
      changed_when: false
    
    - name: Debug print ip_check result
      debug:
        msg: "{{ ip_check }}"

    - name: Add IP address to loopback interface
      command:
        cmd: "ip addr add {{ kubernetes_vip }}/32 dev lo"
      when: ip_check.rc != 0

    - name: Generate Kubernetes init token
      shell: >
        LC_CTYPE=C tr -dc 'a-z0-9' </dev/urandom | head -c 6; 
        echo -n '.'; 
        LC_CTYPE=C tr -dc 'a-z0-9' </dev/urandom | head -c 16
      register: k8s_init_token

    - name: Generate kubeadm config file
      template:
        src: resources/kubeadm-init.yaml.j2
        dest: kubeadm-init.yaml
      vars:
        token: "{{ k8s_init_token.stdout }}"

    - name: Initialize the Kubernetes cluster using kubeadm
      command:
        cmd: kubeadm init --v=5 --skip-phases=addon/kube-proxy --config kubeadm-init.yaml --upload-certs
      register: kubeadm_init
    
    - name: Set fact for master join command
      set_fact:
        master_join_command: "{{ kubeadm_init.stdout | regex_search('kubeadm join(.*\\n)+?.*--control-plane', multiline=True) }}"
        cacheable: yes
      run_once: true

    - name: Set fact for worker join command
      set_fact:
        worker_join_command: "{{ kubeadm_init.stdout | regex_search('kubeadm join(.*\\n)+?.*sha256:[a-z0-9]{64}', multiline=True) }}"
        cacheable: yes
      run_once: true

    # - name: Create the target directory if it doesn't exist
    #   file:
    #     path: ~/.kube
    #     state: directory
    #     owner: "{{ ansible_user_id }}"
    #     group: "{{ ansible_user_id }}"
    #     mode: '0755'
    #   when: not ansible_check_mode  # This ensures it only runs when not in check mode

    # - name: Copy kube admin config to ansible user directory
    #   copy:
    #     src: /etc/kubernetes/admin.conf
    #     dest: ~/.kube/config
    #     remote_src: yes
    #     owner: "{{ ansible_user_id }}"
    #     group: "{{ ansible_user_id }}"
    #     mode: '0644'

    - name: Write master join command to .master_join_command.txt
      copy:
        content: "{{ master_join_command }}"
        dest: ".master_join_command.txt"
        mode: '0664'
      delegate_to: localhost

    - name: Append worker join command to .worker_join_command.txt
      lineinfile:
        path: ".worker_join_command.txt"
        line: "{{ worker_join_command }}"
        create: yes
      delegate_to: localhost


- name: Install cilium on Main Master
  hosts: main
  become: true
  tasks:

    - name: Ensure tar is installed (Debian/Ubuntu)
      apt:
        name: tar
        state: present
      when: ansible_os_family == "Debian"

    - name: Check for Cilium binary in /usr/local/bin
      stat:
        path: /usr/local/bin/cilium
      register: cilium_binary

    - name: Transfer and Extract Cilium
      unarchive:
        src: resources/cilium-linux-amd64.tar.gz
        dest: /usr/local/bin
        remote_src: no
      when: not cilium_binary.stat.exists

    - name: Install cilium to the Kubernetes cluster
      environment:
        KUBECONFIG: /etc/kubernetes/admin.conf
      command:
        cmd: cilium install --version 1.14.4 --set kubeProxyReplacement=true
    
    - name: Wait for Kubernetes cluster to become ready
      environment:
        KUBECONFIG: /etc/kubernetes/admin.conf
      command: kubectl get nodes
      register: kubectl_output
      until: kubectl_output.stdout.find("Ready") != -1
      retries: 20
      delay: 30


- name: Join Masters to the Cluster
  hosts: masters
  become: true
  tasks:
    - name: Joining master node to the Kubernetes cluster
      shell:
        cmd: "{{ hostvars['k8s-main-01']['master_join_command'] }}"
      ignore_errors: yes


- name: Join Worker Nodes to the Cluster
  hosts: workers
  become: true
  tasks:
    - name: Join worker node to the Kubernetes cluster
      shell:
        cmd: "{{ hostvars['k8s-main-01']['worker_join_command'] }}"
      ignore_errors: yes
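
After all the plays have run, a quick sanity check from the main master confirms that every node has joined and reports Ready (assuming the admin kubeconfig is at its default path):

```shell
# On k8s-main-01: all six kubernetes nodes should eventually show STATUS=Ready
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl get nodes -o wide
```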

Playbook for resetting Kubernetes nodes
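
A minimal sketch of such a reset playbook, following the same conventions as the plays above. The exact cleanup list (CNI config, etcd data, kubeconfig directory, and the loopback VIP) is an assumption; adjust it to match what was actually installed on your nodes:

```yaml
---
- name: Reset Kubernetes Nodes
  hosts: kubernetes
  become: true
  tasks:
    # Tear down kubelet, control-plane static pods and iptables rules
    - name: Reset kubeadm state
      command: kubeadm reset -f
      ignore_errors: yes

    # Remove leftover state that kubeadm reset does not clean up
    - name: Remove CNI, etcd and kubeconfig directories
      file:
        path: "{{ item }}"
        state: absent
      loop:
        - /etc/cni/net.d
        - /var/lib/etcd
        - /etc/kubernetes

    # Undo the VIP added to the loopback interface during setup
    - name: Remove the VIP from the loopback interface
      command: "ip addr del {{ kubernetes_vip }}/32 dev lo"
      ignore_errors: yes
```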
