If you run into problems, please ask in the group instead of leaving comments. The group link is on the About Me page.

App installation troubleshooting workflow

First, check the Events (click the name of the installed app, then click Events).
Errors such as failed image pulls and incorrect permission settings are usually shown here.

Next, check the Logs (click the three dots next to the app's name, then click Logs).
This is the app's full log output; errors in the app's permission settings and errors from the app itself usually show up here.
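The same information is available from the TrueNAS SCALE shell. A minimal example, assuming the app was installed into the usual ix-APPNAME namespace (replace APPNAME with your app's name):

k3s kubectl get events -n ix-APPNAME --sort-by=.metadata.creationTimestamp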

Common errors

nil pointer evaluating interface {}.mode

[EFAULT] Failed to install chart release: Error: INSTALLATION FAILED: template: APPNAME/templates/common.yaml:1:3: executing "APPNAME/templates/common.yaml" at : error calling include: template: APPNAME/charts/common/templates/loader/_all.tpl:6:6: executing "tc.v1.common.loader.all" at : error calling include: template: APPNAME/charts/common/templates/loader/_apply.tpl:47:6: executing "tc.v1.common.loader.apply" at : error calling include: template: APPNAME/charts/common/templates/spawner/_pvc.tpl:25:10: executing "tc.v1.common.spawner.pvc" at : error calling include: template: APPNAME/charts/common/templates/lib/storage/_validation.tpl:18:43: executing "tc.v1.common.lib.persistence.validation" at <$objectData.static.mode>: nil pointer evaluating interface {}.mode

The issue: This error is caused by an old version of Helm; Helm > 3.9.4 is required.

The solution: Upgrade to TrueNAS SCALE Cobia (23.10.x) or newer. System Settings -> Update -> Select Cobia from the dropdown. SCALE Bluefin and Angelfish releases are no longer supported.
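If you want to double-check which SCALE release you are on and which Helm version it ships (assuming the helm binary is on the PATH of the SCALE shell, which it normally is since the middleware bundles it):

cat /etc/version
helm version --short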

cannot patch "APPNAME-redis" with kind StatefulSet

[EFAULT] Failed to update App: Error: UPGRADE FAILED: cannot patch "APPNAME-redis" with kind StatefulSet: StatefulSet.apps "APPNAME-redis" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'ordinals', 'template', 'updateStrategy', 'persistentVolumeClaimRetentionPolicy' and 'minReadySeconds' are forbidden

The solution: Check which apps have statefulsets by running:

k3s kubectl get statefulsets -A | grep "ix-"

Then, to delete the statefulset:

k3s kubectl delete statefulset STATEFULSETNAME -n ix-APPNAME

Example:

k3s kubectl delete statefulset nextcloud-redis -n ix-nextcloud

Once deleted, you can attempt the update again (or, if you were already on the latest version, edit the app and save it without any changes).
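A quick optional check that the StatefulSet was recreated once the update succeeds (using the Nextcloud example above; adjust the namespace for your app):

k3s kubectl get statefulsets -n ix-nextcloud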

Operator-Related Errors

service "cnpg-webhook-service" not found

[EFAULT] Failed to update App: Error: UPGRADE FAILED: cannot patch "APPNAME-cnpg-main" with kind Cluster: Internal error occurred: failed calling webhook "mcluster.cnpg.io": failed to call webhook: Post "https://cnpg-webhook-service.ix-cloudnative-pg.svc/mutate-postgresql-cnpg-io-v1-cluster?timeout=10s": service "cnpg-webhook-service" not found

The solution:

  • Run the following command:
k3s kubectl delete deployment.apps/cloudnative-pg --namespace ix-cloudnative-pg
  • Update Cloudnative-PG to the latest version, or, if you are already on the latest version, edit Cloudnative-PG and save/update it again without any changes.
  • If the app remains stopped, hit the start button in the UI for Cloudnative-PG.
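Afterwards, a quick optional sanity check that the operator pods came back up:

k3s kubectl get pods -n ix-cloudnative-pg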

"monitoring.coreos.com/v1" ensure CRDs are installed first

[EFAULT] Failed to update App: Error: UPGRADE FAILED: unable to build kubernetes objects from current release manifest: resource mapping not found for name: "APPNAME" namespace: "ix-APPNAME" from "": no matches for kind "PodMonitor" in version "monitoring.coreos.com/v1" ensure CRDs are installed first

The solution:

  • Install Prometheus-Operator first, then go back and install the app you were trying to install
  • If you see this error with Prometheus-Operator already installed, delete it and reinstall
  • While deleting Prometheus-Operator, if you encounter the error:

Error: [EFAULT] Unable to uninstall 'prometheus-operator' chart release: b'Error: failed to delete release: prometheus-operator\n'

Run the following command from the TrueNAS SCALE shell as root:

k3s kubectl delete namespace ix-prometheus-operator

Then install Prometheus-Operator again. It will fail on the first install attempt, but the second time it will work.
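Before retrying the app that needed the CRDs, an optional check that the monitoring CRDs now exist:

k3s kubectl get crd | grep monitoring.coreos.com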

Rendered manifests contain a resource that already exists

certificaterequests.cert-manager.io

[EFAULT] Failed to install App: Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: CustomResourceDefinition "certificaterequests.cert-manager.io" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "cert-manager"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "ix-cert-manager"

The solution: The Cert-Manager operator is required in order to use Cert-Manager and ClusterIssuer to issue certificates for chart ingress.

To remove the previously auto-installed operator, run this in the system shell as root:

k3s kubectl delete --grace-period 30 --v=4 -k https://github.com/truecharts/manifests/delete4

https://truecharts.org/manual/FAQ#cert-manager

backups.postgresql.cnpg.io

[EFAULT] Failed to install App: Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: CustomResourceDefinition "backups.postgresql.cnpg.io" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "cloudnative-pg"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "ix-cloudnative-pg"

The solution: The Cloudnative-PG operator is required for the use of any charts that utilize CloudNative Postgresql (CNPG).

DATA LOSS

The following command is destructive and will delete any existing CNPG databases.

Run the following command in the system shell as root to see whether you have any current CNPG databases to migrate:

k3s kubectl get cluster -A

Follow this guide to safely migrate any existing CNPG databases.

To remove the previously auto-installed operator, run this in the system shell as root:

k3s kubectl delete --grace-period 30 --v=4 -k https://github.com/truecharts/manifests/delete2

https://truecharts.org/manual/FAQ#cloudnative-pg

addresspools.metallb.io

[EFAULT] Failed to install App: Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: CustomResourceDefinition "addresspools.metallb.io" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "metallb"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "ix-metallb"

The solution: The MetalLB operator is required in order for MetalLB to give each chart its own unique IP address.

LOSS OF CONNECTIVITY

Installing the MetalLB operator will prevent the use of the TrueNAS Scale integrated load balancer. Only install this operator if you intend to use MetalLB.

To remove the previously auto-installed operator, run this in the system shell as root:

k3s kubectl delete --grace-period 30 --v=4 -k https://github.com/truecharts/manifests/delete

https://truecharts.org/manual/FAQ#metallb

alertmanagerconfigs.monitoring.coreos.com

[EFAULT] Failed to install chart release: Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: CustomResourceDefinition "alertmanagerconfigs.monitoring.coreos.com" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "prometheus-operator"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "ix-prometheus-operator"

The solution: The Prometheus-operator is required for the use of Prometheus metrics and for any charts that utilize CloudNative Postgresql (CNPG).

To remove the previously auto-installed operator, run this in the system shell as root:

k3s kubectl delete --grace-period 30 --v=4 -k https://github.com/truecharts/manifests/delete3

https://truecharts.org/manual/FAQ#prometheus-operator

Operator [traefik] has to be installed first

Failed to install App: Operator [traefik] has to be installed first

The solution: If this error appears while installing Traefik, install Traefik with its own ingress disabled first. Once it is installed, you can enable ingress for Traefik.

Operator [cloudnative-pg] has to be installed first

Failed to install App: Operator [cloudnative-pg] has to be installed first

The solution: Install Cloudnative-PG.

TIP

Ensure the system train is enabled in the TrueCharts catalog under Apps -> Discover Apps -> Manage Catalogs.

Operator [Prometheus-operator] has to be installed first

Failed to install App: Operator [prometheus-operator] has to be installed first

The solution: Install Prometheus-operator.

TIP

Ensure the system train is enabled in the TrueCharts catalog under Apps -> Discover Apps -> Manage Catalogs.

Can't upgrade between ghcr.io/cloudnative-pg/postgresql

[EFAULT] Failed to update App: Error: UPGRADE FAILED: cannot patch "APPNAME-cnpg-main" with kind Cluster: admission webhook "vcluster.cnpg.io" denied the request: Cluster.cluster.cnpg.io "APPNAME-cnpg-main" is invalid: spec.imageName: Invalid value: "ghcr.io/cloudnative-pg/postgresql:16.2": can't upgrade between ghcr.io/cloudnative-pg/postgresql:15.2 and ghcr.io/cloudnative-pg/postgresql:16.2

The solution: Run this in the system shell as root, replacing APPNAME with the name of your CNPG-dependent app, e.g. home-assistant:

k3s kubectl patch configmap APPNAME-cnpg-main-pgversion -n ix-APPNAME -p '{"data": {"version": "15"}}'
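To confirm the pinned major version took effect before retrying the update (an optional check, using the same APPNAME placeholder):

k3s kubectl get configmap APPNAME-cnpg-main-pgversion -n ix-APPNAME -o jsonpath='{.data.version}'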

zfs.csi.openebs.io


The symptom is a PVC stuck waiting for its volume to be created. The real cause is that openebs-zfs-node and openebs-zfs-controller were not deployed successfully; these two components are the plugins that provide PVCs.

Run:

k3s kubectl describe pod openebs-zfs-controller-0 -n kube-system

and check the Events section of the output:

Events:
Type     Reason   Age                       From     Message
Warning  Failed   48m (x12 over 7h23m)      kubelet  Failed to pull image "k8s.gcr.io/sig-storage/snapshot-controller:v4.0.0": rpc error: code = Unknown desc = Error response from daemon: Get "https://k8s.gcr.io/v2/": context deadline exceeded
Warning  Failed   43m (x151 over 7h29m)     kubelet  (combined from similar events): Failed to pull image "k8s.gcr.io/sig-storage/csi-snapshotter:v4.0.0": rpc error: code = Unknown desc = Error response from daemon: Get "https://k8s.gcr.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Normal   Pulling  38m (x72 over 7h39m)      kubelet  Pulling image "k8s.gcr.io/sig-storage/csi-resizer:v1.1.0"
Normal   BackOff  3m33s (x1396 over 7h33m)  kubelet  Back-off pulling image "k8s.gcr.io/sig-storage/csi-resizer:v1.1.0"

You can see that the images are hosted on k8s.gcr.io, Google's registry. From mainland China the pulls fail because access to it is either very slow through a proxy or blocked entirely without one.

Solution

We can manually pull the images from Alibaba's mirror registry inside China and then re-tag them.
From the describe output above, the required images are: csi-resizer:v1.1.0, csi-snapshotter:v4.0.0, snapshot-controller:v4.0.0, csi-provisioner:v2.1.0 and csi-node-driver-registrar:v2.1.0.

Pull and re-tag manually:

docker pull registry.aliyuncs.com/google_containers/csi-resizer:v1.1.0
docker tag registry.aliyuncs.com/google_containers/csi-resizer:v1.1.0 k8s.gcr.io/sig-storage/csi-resizer:v1.1.0
docker rmi registry.aliyuncs.com/google_containers/csi-resizer:v1.1.0
docker pull registry.aliyuncs.com/google_containers/csi-snapshotter:v4.0.0
docker tag registry.aliyuncs.com/google_containers/csi-snapshotter:v4.0.0 k8s.gcr.io/sig-storage/csi-snapshotter:v4.0.0
docker rmi registry.aliyuncs.com/google_containers/csi-snapshotter:v4.0.0
docker pull registry.aliyuncs.com/google_containers/snapshot-controller:v4.0.0
docker tag registry.aliyuncs.com/google_containers/snapshot-controller:v4.0.0 k8s.gcr.io/sig-storage/snapshot-controller:v4.0.0
docker rmi registry.aliyuncs.com/google_containers/snapshot-controller:v4.0.0
docker pull registry.aliyuncs.com/google_containers/csi-provisioner:v2.1.0
docker tag registry.aliyuncs.com/google_containers/csi-provisioner:v2.1.0 k8s.gcr.io/sig-storage/csi-provisioner:v2.1.0
docker rmi registry.aliyuncs.com/google_containers/csi-provisioner:v2.1.0
docker pull registry.aliyuncs.com/google_containers/csi-node-driver-registrar:v2.1.0
docker tag registry.aliyuncs.com/google_containers/csi-node-driver-registrar:v2.1.0 k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.1.0
docker rmi registry.aliyuncs.com/google_containers/csi-node-driver-registrar:v2.1.0

For SCALE 22.02.1:

docker pull registry.aliyuncs.com/google_containers/csi-resizer:v1.2.0
docker tag registry.aliyuncs.com/google_containers/csi-resizer:v1.2.0 k8s.gcr.io/sig-storage/csi-resizer:v1.2.0
docker rmi registry.aliyuncs.com/google_containers/csi-resizer:v1.2.0
docker pull registry.aliyuncs.com/google_containers/csi-snapshotter:v4.0.0
docker tag registry.aliyuncs.com/google_containers/csi-snapshotter:v4.0.0 k8s.gcr.io/sig-storage/csi-snapshotter:v4.0.0
docker rmi registry.aliyuncs.com/google_containers/csi-snapshotter:v4.0.0
docker pull registry.aliyuncs.com/google_containers/snapshot-controller:v4.0.0
docker tag registry.aliyuncs.com/google_containers/snapshot-controller:v4.0.0 k8s.gcr.io/sig-storage/snapshot-controller:v4.0.0
docker rmi registry.aliyuncs.com/google_containers/snapshot-controller:v4.0.0
docker pull registry.aliyuncs.com/google_containers/csi-provisioner:v3.0.0
docker tag registry.aliyuncs.com/google_containers/csi-provisioner:v3.0.0 k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0
docker rmi registry.aliyuncs.com/google_containers/csi-provisioner:v3.0.0
docker pull registry.aliyuncs.com/google_containers/csi-node-driver-registrar:v2.3.0
docker tag registry.aliyuncs.com/google_containers/csi-node-driver-registrar:v2.3.0 k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.3.0
docker rmi registry.aliyuncs.com/google_containers/csi-node-driver-registrar:v2.3.0
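After pulling and re-tagging, you can confirm the images are present locally, and optionally delete the stuck controller pod so it retries immediately instead of waiting for the next back-off:

docker images | grep "k8s.gcr.io/sig-storage"
k3s kubectl delete pod openebs-zfs-controller-0 -n kube-system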

Permission errors

These usually occur when installing a custom app with incorrect permission settings.

The logs contain messages like permission denied or read-only file system.

When you hit this kind of problem, setting the app's run-as user and group to root fixes it.

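If the app writes to a host path dataset, adjusting ownership from the TrueNAS shell is an alternative to running as root; a rough example with a hypothetical dataset path and the default SCALE apps UID/GID of 568:

chown -R 568:568 /mnt/tank/appdata/myapp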

K3s fails to start

This is actually a SCALE bug and the exact cause is unclear. The symptom is that apps will not start and the Installed Applications page keeps spinning, with an alert like:

CRITICAL
Failed to start kubernetes cluster for Applications: 18
2022-03-21 08:02:08 (Asia/Shanghai)

Check the node:

k3s kubectl get node

The node shows NotReady.

Check the k3s service:

systemctl status k3s

It shows failed or similar.
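To see why it failed, the k3s service journal usually has the details (run as root):

journalctl -u k3s -n 100 --no-pager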

Solution

  1. Reboot
  2. Re-select the applications pool

KeyError: 'nodePort'

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 423, in run
    await self.future
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 459, in __run_body
    rv = await self.method(*([self] + args))
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1129, in nf
    res = await f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1261, in nf
    return await func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/chart_release.py", line 404, in do_create
    new_values, context = await self.normalise_and_validate_values(item_details, new_values, False, release_ds)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/chart_release.py", line 332, in normalise_and_validate_values
    dict_obj = await self.middleware.call(
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1318, in call
    return await self._call(
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1275, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/validation.py", line 58, in validate_values
    await self.validate_question(
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/validation.py", line 81, in validate_question
    await self.validate_question(
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/validation.py", line 81, in validate_question
    await self.validate_question(
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/validation.py", line 81, in validate_question
    await self.validate_question(
  [Previous line repeated 1 more time]
  File "/usr/lib/python3/dist-packages/middlewared/plugins/chart_releases_linux/validation.py", line 112, in validate_question
    verrors, parent_value, parent_value[sub_question['variable']], sub_question,
KeyError: 'nodePort'

This error usually appears when installing a custom app.

Solution

Just fill in any value for the nodePort and select 'simple' as the type.

Common commands

TrueNAS SCALE uses K3s, whose commands are the same as standard Kubernetes (k8s).

Before inspecting an app, we first need to know its namespace:

k3s kubectl get ns

Then we can look up the app's pods:

k3s kubectl get pod -n <namespace>

Restart an app (delete its pod and it will be recreated):

k3s kubectl delete pod <pod-name> -n <namespace>

View logs (same as clicking Logs in the web UI):

k3s kubectl logs <pod-name> -n <namespace>

View events and the pod description (same as clicking Events in the web UI):

k3s kubectl describe pod <pod-name> -n <namespace>

Start or stop an app (same as the start/stop buttons in the web UI):

Stop:

k3s kubectl scale deployment <deployment-name> --replicas=0 -n <namespace>

Start:

k3s kubectl scale deployment <deployment-name> --replicas=1 -n <namespace>
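Two more commands that often come in handy (same placeholders as above): follow a pod's log in real time, and list every resource in an app's namespace:

k3s kubectl logs -f <pod-name> -n <namespace>
k3s kubectl get all -n <namespace>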

Parts of this page are taken from TrueCharts.

Last modified: March 24, 2024