
Anyone running a web service knows traffic comes in peaks and troughs. Adjusting capacity for those swings used to be a hassle (change a lot and you break a lot; change a little and you break a little; change nothing and nothing breaks), so the usual answer was simply to provision one very powerful machine for everything.

But what we usually want is to run only as much as we actually use, so the architecture needs to support elastic scaling.
First, make sure your application follows the Twelve Factors before reading on; if your architecture isn't elastic to begin with, there is no point attempting horizontal scaling.

# Create the service

[root@master1 ~]# kubectl run php-apache --image=gcr.io/google_containers/hpa-example --requests=cpu=200m --expose --port=80
service "php-apache" created
deployment "php-apache" created
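For reference, the kubectl run one-liner above is roughly equivalent to declaring the two objects yourself. Below is a sketch of the generated Deployment and Service (written against the current apps/v1 API; the 1.7-era client actually produced an extensions/v1beta1 Deployment, so field names here are an approximation). The cpu request is the important part: the HPA's target percentage is measured against it.

```yaml
# Sketch of the objects behind:
#   kubectl run php-apache --image=gcr.io/google_containers/hpa-example \
#     --requests=cpu=200m --expose --port=80
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  replicas: 1
  selector:
    matchLabels:
      run: php-apache
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: gcr.io/google_containers/hpa-example
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 200m   # HPA CPU percentages are computed relative to this request
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
spec:
  selector:
    run: php-apache
  ports:
  - port: 80
```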

# Create the Autoscaler

[root@master1 ~]# kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
deployment "php-apache" autoscaled
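Likewise, kubectl autoscale just creates a HorizontalPodAutoscaler object. A declarative sketch of the same thing, using the stable autoscaling/v1 API:

```yaml
# Declarative equivalent of:
#   kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
```

Applying this file gives the same result as the autoscale command, with the advantage that the scaling policy lives in version control.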

Now we can check the load status:

[root@master1 ~]# kubectl get hpa
NAME         REFERENCE               TARGETS           MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   <unknown> / 50%   1         10        1          19m

If yours looks like mine and the current load shows <unknown>, then your monitoring has a problem.

For more detail, run describe:

[root@master1 ~]# kubectl describe hpa
2017-09-19 17:03:19.381621 I | proto: duplicate proto type registered: google.protobuf.Any
2017-09-19 17:03:19.381665 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-09-19 17:03:19.381676 I | proto: duplicate proto type registered: google.protobuf.Timestamp
Name:                           php-apache
Namespace:                      default
Labels:                         <none>
Annotations:                        <none>
CreationTimestamp:                  Tue, 19 Sep 2017 16:37:39 +0800
Reference:                      Deployment/php-apache
Metrics:                        ( current / target )
  resource cpu on pods  (as a percentage of request): <unknown> / 50%
Min replicas:                       1
Max replicas:                       10
Conditions:
  Type      Status  Reason          Message
  ----      ------  ------          -------
  AbleToScale   True    SucceededGetScale   the HPA controller was able to get the target's current scale
  ScalingActive False   FailedGetResourceMetric the HPA was unable to compute the replica count: unable to get metrics for resource cpu: failed to get pod resource metrics: an error on the server ("unknown") has prevented the request from succeeding (get services http:heapster:)
Events:
  FirstSeen LastSeen    Count   From                SubObjectPath   Type        Reason              Message
  --------- --------    -----   ----                -------------   --------    ------              -------
  25m       9s      51  horizontal-pod-autoscaler           Warning     FailedGetResourceMetric     unable to get metrics for resource cpu: failed to get pod resource metrics: an error on the server ("unknown") has prevented the request from succeeding (get services http:heapster:)
  25m       9s      51  horizontal-pod-autoscaler           Warning     FailedComputeMetricsReplicas    failed to get cpu utilization: unable to get metrics for resource cpu: failed to get pod resource metrics: an error on the server ("unknown") has prevented the request from succeeding (get services http:heapster:)

Skimming the output, the unknown errors and get services http:heapster: make it easy to guess which service is missing. A quick search confirms that heapster is indeed the service that collects CPU usage metrics.
The relevant docs are here: https://github.com/kubernetes/kops/blob/master/docs/addons.md

Scan the system namespace for it:

[root@master1 ~]# kubectl get pods -n kube-system |grep heapster
heapster-1321562559-hmdf5               0/2       Pending   0          12m

A side note: if heapster isn't there at all, install it yourself:

[root@master1 ~]# kubectl create -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/monitoring-standalone/v1.7.0.yaml
deployment "heapster" created
service "heapster" created
serviceaccount "heapster" created
clusterrolebinding "heapster" created
role "system:pod-nanny" created
rolebinding "heapster-binding" created

Notice that our heapster pod is stuck in Pending. Fine... a good excuse to learn some pod debugging along the way.

To inspect a pod's details, use describe; since this pod belongs to the system, we also have to add -n kube-system:

[root@master1 ~]# kubectl describe pod heapster-1321562559-hmdf5 -n kube-system
Name:       heapster-1321562559-hmdf5
Namespace:  kube-system
Node:       <none>
Labels:     k8s-app=heapster
        pod-template-hash=1321562559
        version=v1.7.0
Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"kube-system","name":"heapster-1321562559","uid":"24338a44-9d15-11e7-a663-08002745...
        scheduler.alpha.kubernetes.io/critical-pod=
Status:     Pending
IP:     
Created By: ReplicaSet/heapster-1321562559
Controlled By:  ReplicaSet/heapster-1321562559
Containers:
  heapster:
    Image:  gcr.io/google_containers/heapster:v1.4.0
    Port:   <none>
    Command:
      /heapster
      --source=kubernetes.summary_api:''
    Limits:
      cpu:  100m
      memory:   300Mi
    Requests:
      cpu:      100m
      memory:       300Mi
    Liveness:       http-get http://:8082/healthz delay=180s timeout=5s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from heapster-token-q4ss6 (ro)
  heapster-nanny:
    Image:  gcr.io/google_containers/addon-resizer:2.0
    Port:   <none>
    Command:
      /pod_nanny
      --cpu=80m
      --extra-cpu=0.5m
      --memory=140Mi
      --extra-memory=4Mi
      --deployment=heapster
      --container=heapster
      --poll-period=300000
    Limits:
      cpu:  50m
      memory:   100Mi
    Requests:
      cpu:  50m
      memory:   100Mi
    Environment:
      MY_POD_NAME:  heapster-1321562559-hmdf5 (v1:metadata.name)
      MY_POD_NAMESPACE: kube-system (v1:metadata.namespace)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from heapster-token-q4ss6 (ro)
Conditions:
  Type      Status
  PodScheduled  False 
Volumes:
  heapster-token-q4ss6:
    Type:   Secret (a volume populated by a Secret)
    SecretName: heapster-token-q4ss6
    Optional:   false
QoS Class:  Guaranteed
Node-Selectors: <none>
Tolerations:    CriticalAddonsOnly
Events:
  FirstSeen LastSeen    Count   From            SubObjectPath   Type        Reason          Message
  --------- --------    -----   ----            -------------   --------    ------          -------
  19m       16m     15  default-scheduler           Warning     FailedScheduling    No nodes are available that match all of the following predicates:: Insufficient cpu (2), Insufficient memory (2), PodToleratesNodeTaints (1).
  15m       15m     3   default-scheduler           Warning     FailedScheduling    No nodes are available that match all of the following predicates:: Insufficient memory (2), PodToleratesNodeTaints (1).
  15m       18s     55  default-scheduler           Warning     FailedScheduling    No nodes are available that match all of the following predicates:: Insufficient cpu (1), Insufficient memory (2), PodToleratesNodeTaints (1).

Skimming the events: insufficient resources, right? The scheduler cannot find any node with enough CPU and memory to satisfy the pod's requests.

One more tip: to see all events you can run the following. Events are stored centrally in etcd:

[root@master1 ~]# kubectl get events
LASTSEEN   FIRSTSEEN   COUNT     NAME                              KIND                      SUBOBJECT                         TYPE      REASON                         SOURCE                      MESSAGE
48m        48m         1         load-generator-3044827360-j5kbp   Pod                                                         Normal    Scheduled                      default-scheduler           Successfully assigned load-generator-3044827360-j5kbp to node1
48m        48m         1         load-generator-3044827360-j5kbp   Pod                                                         Normal    SuccessfulMountVolume          kubelet, node1              MountVolume.SetUp succeeded for volume "default-token-vv5cv" 
48m        48m         1         load-generator-3044827360-j5kbp   Pod                       spec.containers{load-generator}   Normal    Pulling                        kubelet, node1              pulling image "busybox"
48m        48m         1         load-generator-3044827360-j5kbp   Pod                       spec.containers{load-generator}   Normal    Pulled                         kubelet, node1              Successfully pulled image "busybox"
48m        48m         1         load-generator-3044827360-j5kbp   Pod                       spec.containers{load-generator}   Normal    Created                        kubelet, node1              Created container
48m        48m         1         load-generator-3044827360-j5kbp   Pod                       spec.containers{load-generator}   Normal    Started                        kubelet, node1              Started container
48m        48m         1         load-generator-3044827360         ReplicaSet                                                  Normal    SuccessfulCreate               replicaset-controller       Created pod: load-generator-3044827360-j5kbp
48m        48m         1         load-generator                    Deployment                                                  Normal    ScalingReplicaSet              deployment-controller       Scaled up replica set load-generator-3044827360 to 1
18m        18m         1         php-apache-593471247-059vb        Pod                                                         Normal    Scheduled                      default-scheduler           Successfully assigned php-apache-593471247-059vb to node2

Events are also isolated by namespace:

# Events in a specific namespace

kubectl get events --namespace=my-namespace
# Events in all namespaces

kubectl get events --all-namespaces

The fix
.
.
.
.
.
Add more resources!! XD
1 GB of RAM per node is really tight, and trimming the resources used by the system components isn't a very practical approach.
So each node now has 2 GB RAM and 2 CPUs. After restarting:

[root@master1 ~]# kubectl get pods -n kube-system
NAME                                    READY     STATUS    RESTARTS   AGE
calico-node-2252h                       1/1       Running   5          5d
calico-node-9frnc                       1/1       Running   5          5d
calico-node-j4p17                       1/1       Running   5          5d
heapster-2066201068-x3n32               2/2       Running   0          45s
kube-apiserver-master1                  1/1       Running   6          5d
kube-controller-manager-master1         1/1       Running   6          5d
kube-dns-3888408129-lsq33               3/3       Running   16         5d
kube-dns-3888408129-s6x33               3/3       Running   15         5d
kube-proxy-master1                      1/1       Running   6          5d
kube-proxy-node1                        1/1       Running   6          5d
kube-proxy-node2                        1/1       Running   5          5d
kube-scheduler-master1                  1/1       Running   6          5d
kubedns-autoscaler-1629318612-ht1xh     1/1       Running   5          5d
kubernetes-dashboard-3941213843-r06v3   1/1       Running   7          5d
nginx-proxy-node1                       1/1       Running   6          5d
nginx-proxy-node2                       1/1       Running   5          5d

All the system pods are running normally now (heapster has only been alive for 45 seconds... it never managed to start before... XD)

Check the load info again: the current CPU utilization can now be read.

[root@master1 ~]# kubectl get hpa
NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   0% / 50%   1         10        1          1h

Start a busybox pod in Kubernetes and attach to it:

[root@master1 ~]# kubectl run -i --tty load-generator --image=busybox /bin/sh
If you don't see a command prompt, try pressing enter.
/ # wget -q -O- http://php-apache
OK!

Inside Kubernetes, a cluster DNS service handles host discovery for you, so you can reach another service just by asking for its name (or its fully qualified form, php-apache.default.svc.cluster.local).

The command to generate load:

# while true; do wget -q -O- http://php-apache.default.svc.cluster.local; done
OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OKOK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!

It takes roughly 30 to 60 seconds before you see a change.

# Traffic starts coming in

[root@master1 ~]# kubectl get hpa
NAME         REFERENCE               TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   208% / 50%   1         10        1          1h

# Replicas being created

[root@master1 ~]# kubectl get hpa
NAME         REFERENCE               TARGETS      MINPODS   MAXPODS   REPLICAS   AGE
php-apache   Deployment/php-apache   208% / 50%   1         10        4          1h
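The jump from 1 to 4 replicas follows the scaling rule documented for the HPA controller: desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization). A quick sanity check in Python (desired_replicas is just an illustrative helper, not a real API; the controller also re-samples utilization every cycle, so the live replica count can lag the instantaneous formula result):

```python
import math

def desired_replicas(current_replicas: int, current_util: float, target_util: float) -> int:
    """HPA scaling rule: ceil(currentReplicas * currentUtilization / targetUtilization)."""
    return math.ceil(current_replicas * current_util / target_util)

# 208% observed utilization against a 50% target, starting from one replica:
print(desired_replicas(1, 208, 50))  # -> 5 (the controller scales up toward this)

# Once the extra replicas spread the load and average utilization falls
# near the target, the count stabilizes:
print(desired_replicas(4, 48, 50))   # -> 4 (no further scaling)
```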

# The created replicas

[root@master1 ~]# kubectl get pods
NAME                              READY     STATUS        RESTARTS   AGE
load-generator-3044827360-8vj8s   1/1       Terminating   0          2m
php-apache-593471247-059vb        1/1       Running       1          1h
php-apache-593471247-5ntxl        1/1       Running       0          1m
php-apache-593471247-637p9        1/1       Running       0          1m
php-apache-593471247-f97fg        1/1       Running       0          1m

Once the traffic is gone, Kubernetes automatically scales the replicas back down. Of course, it waits a few minutes to make sure the load has really dropped before acting; otherwise, repeatedly creating and tearing down replicas would itself waste resources.

reference:
- Horizontal Pod Autoscaling - Kubernetes
- Horizontal Pod Autoscaling Walkthrough - Kubernetes
- Kubernetes Pod capacity scaling - 青蛙小白
- Scale-out under a Deployment - Kubernetes study notes
- Kubernetes user guide (4): application inspection and debugging - 小黑 - CSDN blog
- Kubernetes cluster Monitoring - ChenJian Blog
