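Troubleshooting errors is an unavoidable part of working with Kubernetes (k8s). The following steps can serve as a reference.
First, check the state of the pods with this command: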
kubectl get pods
The output looks like this:
[zzq@localhost zzq]$ kubectl get pods
NAME                          READY   STATUS                  RESTARTS   AGE
report-api-57f64db6c7-6zksv   0/1     Init:0/1                0          1m
report-api-57f64db6c7-mqn7x   0/1     Init:CrashLoopBackOff   2          1m
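When the pod list is long, the pods that need attention can be picked out by filtering on the STATUS column. The following is a minimal sketch, assuming the default column layout shown above (NAME first, STATUS third); `unhealthy_pods` is a hypothetical helper name and not part of kubectl.

```shell
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints
# the NAME of every pod whose STATUS is neither Running nor Completed.
unhealthy_pods() {
  # NR > 1 skips the header row; $1 is NAME, $3 is STATUS.
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Usage against a live cluster:
#   kubectl get pods | unhealthy_pods
```

For the listing above, this would print both report-api pods, since neither has reached Running.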
Take the pod's name and inspect the pod in detail with this command:
kubectl describe pod report-api-57f64db6c7-mqn7x
Here, report-api-57f64db6c7-mqn7x is the value from the NAME column of the kubectl get pods output above.
The output looks like this:
[zzq@localhost zzq]$ kubectl describe pod report-api-57f64db6c7-mqn7x
Name:           report-api-57f64db6c7-mqn7x
Namespace:      default
Node:           ip-10-10-133-37.cn-northwest-1.compute.internal/10.10.133.37
Start Time:     Thu, 13 Sep 2018 14:32:10 +0800
Labels:         app=report-api
                pod-template-hash=1392086273
Annotations:    kubernetes.io/limit-ranger=LimitRanger plugin set: cpu request for container report-api; cpu request for init container pull-lib
Status:         Pending
IP:             100.96.6.2
Controlled By:  ReplicaSet/report-api-57f64db6c7
Init Containers:
  pull-lib:
    Container ID:  docker://41ada0ce00b3c724466abc3f2b945c7e85f59244ac52623ef235216b3adb64f6
    Image:         anigeo/awscli:latest
    Image ID:      docker-pullable://anigeo/awscli@sha256:910a18d43a9e936f38313b0dc44fbd7dc25303fab4ea89c4d7b082fefc654c8d
    Port:          <none>
    Host Port:     <none>
    Args:
      s3
      cp
      s3://general-data-group/lib/report-api/report-api-1.0.0-SNAPSHOT.jar
      /jar/
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Thu, 13 Sep 2018 14:35:52 +0800
      Finished:     Thu, 13 Sep 2018 14:35:52 +0800
    Ready:          False
    Restart Count:  5
    Requests:
      cpu:  100m
    Environment:
      AWS_DEFAULT_REGION:  cn-northwest-1
    Mounts:
      /jar from workdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-gs59h (ro)
Containers:
  report-api:
    Container ID:
    Image:         java:8
    Image ID:
    Port:          9999/TCP
    Host Port:     0/TCP
    Command:
      java
    Args:
      -jar
      /jar/report-api-1.0.0-SNAPSHOT.jar
      --spring.profiles.active=prod
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:        100m
    Environment:  <none>
    Mounts:
      /jar from workdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-gs59h (ro)
Conditions:
  Type           Status
  Initialized    False
  Ready          False
  PodScheduled   True
Volumes:
  workdir:
    Type:    EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
  default-token-gs59h:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-gs59h
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                 Age              From                                                      Message
  ----     ------                 ----             ----                                                      -------
  Normal   Scheduled              4m               default-scheduler                                         Successfully assigned report-api-57f64db6c7-mqn7x to ip-10-10-133-37.cn-northwest-1.compute.internal
  Normal   SuccessfulMountVolume  4m               kubelet, ip-10-10-133-37.cn-northwest-1.compute.internal  MountVolume.SetUp succeeded for volume "workdir"
  Normal   SuccessfulMountVolume  4m               kubelet, ip-10-10-133-37.cn-northwest-1.compute.internal  MountVolume.SetUp succeeded for volume "default-token-gs59h"
  Normal   Pulling                3m (x4 over 4m)  kubelet, ip-10-10-133-37.cn-northwest-1.compute.internal  pulling image "anigeo/awscli:latest"
  Normal   Pulled                 3m (x4 over 4m)  kubelet, ip-10-10-133-37.cn-northwest-1.compute.internal  Successfully pulled image "anigeo/awscli:latest"
  Normal   Created                3m (x4 over 4m)  kubelet, ip-10-10-133-37.cn-northwest-1.compute.internal  Created container
  Normal   Started                3m (x4 over 4m)  kubelet, ip-10-10-133-37.cn-northwest-1.compute.internal  Started container
  Warning  BackOff                3m (x7 over 4m)  kubelet, ip-10-10-133-37.cn-northwest-1.compute.internal  Back-off restarting failed container
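In the describe output, the init container pull-lib is the one that fails (Last State: Terminated, Exit Code: 1, then CrashLoopBackOff), while the main container report-api is still waiting in PodInitializing. A natural follow-up is to read that init container's logs. The following is a minimal sketch: `kubectl logs` with `-c` selects a container by name, and `--previous` asks for the previously terminated instance. The `container_logs` wrapper and the `KUBECTL` override variable are hypothetical, added only so the command line can be dry-run without a cluster.

```shell
# Hypothetical wrapper around `kubectl logs` for one named container.
# $1 = pod name, $2 = container name, remaining arguments are passed through
# (for example --previous to see the instance that crashed before the restart).
container_logs() {
  pod="$1"
  container="$2"
  shift 2
  # KUBECTL is an assumed override; it defaults to the real kubectl binary.
  "${KUBECTL:-kubectl}" logs "$pod" -c "$container" "$@"
}

# Usage against a live cluster:
#   container_logs report-api-57f64db6c7-mqn7x pull-lib
#   container_logs report-api-57f64db6c7-mqn7x pull-lib --previous
```

Without `-c`, kubectl would have to choose a container itself, and report-api has not started yet, so naming the init container explicitly is the safer choice here.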