EBS StorageClass with VolumeBindingMode Immediate is incompatible with pod topology pinning

We ran into a weird pod scheduling error on Amazon Elastic Kubernetes Service (EKS). Some pods, which scheduled just fine in the past, now stay in Pending with the following event:

Failed to schedule pod, incompatible with nodepool "default", daemonset overhead={"cpu": "300m", "memory": "2096Mi", "pods": "7"}, incompatible requirements, key nodepool, nodepool In [static] not in nodepool In [default]; key topology.kubernetes.io/zone, topology.kubernetes.io/zone DoesNotExist not in topology.kubernetes.io/zone In [eu-west-1a eu-west-1b eu-west-1c]; incompatible with nodepool "static", daemonset overhead={"cpu": "300m", "memory": "2096Mi", "pods": "7"}, incompatible requirements, key topology.kubernetes.io/zone, topology.kubernetes.io/zone DoesNotExist not in topology.kubernetes.io/zone In [eu-west-1a eu-west-1b eu-west-1c]

This error is definitely related to the fact that the pod tries to pin to a specific availability zone. Removing the node selector makes the problem go away.

nodeSelector:
  topology.kubernetes.io/zone: eu-west-1c

The error event is hard to read and gives no hint of the actual cause. Luckily, an Internet search result for this error nudged us in the right direction:

You're trying to mount EBS volumes from three different zones. We add a zonal requirement to the pod for each one, and the intersection of those is the empty set that is represented as topology.kubernetes.io/zone DoesNotExist.

Upon double-checking the pod's volumes, we found a PersistentVolumeClaim that's bound to an EBS PersistentVolume in eu-west-1a, while the pod is pinned to eu-west-1c. That's why the pod cannot be scheduled! An EBS volume can only be attached to a node in the same availability zone as the volume, so a pod that uses it must run in that zone.
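To make the conflict concrete, here is roughly what the zone constraint on such a PersistentVolume looks like. This is a sketch, not output from our cluster: the name, size and volume ID are made up, and it assumes the volume was provisioned by the EBS CSI driver, which may use topology.ebs.csi.aws.com/zone instead of topology.kubernetes.io/zone as the key, depending on its version.

```yaml
# Hypothetical excerpt of a dynamically provisioned EBS PersistentVolume.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-example                        # made-up name
spec:
  capacity:
    storage: 10Gi                          # made-up size
  accessModes:
    - ReadWriteOnce
  csi:
    driver: ebs.csi.aws.com                # assumption: EBS CSI driver
    volumeHandle: vol-0123456789abcdef0    # made-up volume ID
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            # The key may differ depending on the driver version.
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - eu-west-1a               # the zone the volume lives in
```

A pod that mounts this volume can only run on a node in eu-west-1a, while its node selector demands eu-west-1c. No node satisfies both requirements.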

But why did this problem suddenly occur? It turns out to be related to a recent EBS StorageClass change we rolled out: we changed the VolumeBindingMode from WaitForFirstConsumer to Immediate, so that PersistentVolumeClaims always create a PersistentVolume, even when no pod claims them yet. We did this to fix a problem with Velero: Velero would partially fail backups if it encountered PersistentVolumeClaims that were not yet bound to a PersistentVolume.
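For reference, the changed StorageClass looks roughly like the following. Treat it as a sketch under assumptions: the name, provisioner and parameters are placeholders rather than our actual manifest, and only the volumeBindingMode field reflects the change described above.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-example              # hypothetical name
provisioner: ebs.csi.aws.com     # assumption: EBS CSI driver
parameters:
  type: gp3                      # assumption: volume type
# Previously WaitForFirstConsumer, which delays provisioning
# until a pod that uses the claim is being scheduled.
volumeBindingMode: Immediate
```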

With VolumeBindingMode Immediate, the PersistentVolume is created as soon as the PersistentVolumeClaim exists, before any pod using it has been scheduled. The availability zone is therefore chosen without regard to the pod's scheduling constraints, and it may not match the zone that the pod is pinned to. We conclude that for EBS StorageClasses, using VolumeBindingMode Immediate is inherently incompatible with pod topology pinning.