Internals — Reconciliation & Runtime¶
This document describes how the operator behaves at runtime: the reconciliation lifecycle, child controller pattern, status reporting, and resource cleanup. For the structural overview (CRD hierarchy, configuration model, config rendering), see Architecture. For the full lifecycle task reference (pod state machine, retry semantics, upgrade modes), see Lifecycle.
Reconciliation Lifecycle¶
When a Superset CR is created or updated, the parent controller runs through
five sequential phases:
- Preflight — Fetch the Superset CR, check the suspend flag
- Shared Resources — ServiceAccount
- Lifecycle Tasks — Create/update SupersetLifecycleTask child CRs (gates everything below)
- Component Reconciliation — Resolve shared spec (top-level + per-component) into flat child specs, create/update/delete child CRs, reconcile networking/monitoring/network policies
- Status Aggregation — Read child CR statuses, set conditions and phase
Phase 1: Preflight¶
The controller fetches the Superset CR. If it no longer exists, the
reconciler returns gracefully — Kubernetes garbage collection handles cleanup
via owner references.
If spec.suspend is true, the controller sets the Suspended condition to
True, updates status, and returns immediately. No task pods run, no child CRs
are created or updated, and no resources are deleted. This allows users to
pause reconciliation without removing the CR.
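A minimal sketch of this preflight check, assuming hypothetical type and field names (`supersetv1alpha1.Superset`, `Spec.Suspend`, the `SpecSuspend` reason) rather than the operator's actual API:

```go
package sketch

import (
	"context"

	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	supersetv1alpha1 "example.com/superset-operator/api/v1alpha1" // hypothetical import path
)

// preflight fetches the CR and short-circuits when it is gone or suspended.
func preflight(ctx context.Context, c client.Client, req ctrl.Request) (*supersetv1alpha1.Superset, bool, error) {
	var cr supersetv1alpha1.Superset
	if err := c.Get(ctx, req.NamespacedName, &cr); err != nil {
		// CR no longer exists: return gracefully; owner references handle cleanup.
		return nil, true, client.IgnoreNotFound(err)
	}
	if cr.Spec.Suspend {
		// Record Suspended=True and stop: nothing is created, updated, or deleted.
		meta.SetStatusCondition(&cr.Status.Conditions, metav1.Condition{
			Type:   "Suspended",
			Status: metav1.ConditionTrue,
			Reason: "SpecSuspend",
		})
		return &cr, true, c.Status().Update(ctx, &cr)
	}
	return &cr, false, nil
}
```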
Phase 2: Shared Resources¶
ServiceAccount — Created if spec.serviceAccount.create is true (the
default). Uses the name from spec.serviceAccount.name or falls back to the
parent CR name. Owned by the parent CR and garbage-collected on parent deletion.
Phase 3: Lifecycle Tasks¶
The parent controller creates SupersetLifecycleTask child CRs:
{parentName}-clone, {parentName}-migrate, {parentName}-rotate, and {parentName}-init. The parent
uses a Get+Create/Delete pattern (never CreateOrUpdate) to avoid races with the
task controller's status writes. When a task needs to re-run (checksum mismatch),
the parent deletes the old CR and creates a fresh one on the next reconcile.
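The pattern is roughly the following sketch, with a hypothetical `SupersetLifecycleTask` type and `Spec.Checksum` field standing in for the real ones; the key property is that the parent never mutates a task CR in place, so it cannot clobber the task controller's status writes:

```go
package sketch

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"

	supersetv1alpha1 "example.com/superset-operator/api/v1alpha1" // hypothetical import path
)

func ensureTask(ctx context.Context, c client.Client, desired *supersetv1alpha1.SupersetLifecycleTask) error {
	var existing supersetv1alpha1.SupersetLifecycleTask
	err := c.Get(ctx, client.ObjectKeyFromObject(desired), &existing)
	switch {
	case apierrors.IsNotFound(err):
		// First run, or re-created fresh after a previous Delete.
		return c.Create(ctx, desired)
	case err != nil:
		return err
	case existing.Spec.Checksum != desired.Spec.Checksum:
		// Checksum mismatch: delete the stale task CR; the next reconcile
		// creates a fresh one instead of updating it in place.
		return c.Delete(ctx, &existing)
	default:
		return nil // up to date; leave the task controller alone
	}
}
```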
Tasks run sequentially: clone → migrate → rotate → init. Each task can be independently
disabled via disabled: true. Clone also supports periodic re-execution via
cronSchedule. Checksums cascade downstream: a re-clone forces re-migrate,
which forces re-rotate, which forces re-init.
When a task requires drain (requiresDrain: true, the default for clone,
migrate, and rotate), the operator deletes all component child CRs before running that task.
The parent verifies all component pods have terminated (not just Deployments
deleted) before proceeding to task execution. This ensures no application pods
access the metastore during schema changes. If maintenancePage is configured,
the parent brings up a maintenance Deployment and switches the web-server Service
selector before draining. After tasks complete, Phase 4 recreates all components
fresh.
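A rough sketch of the pod-termination check, assuming component pods can be selected by a parent-scoped label (the real selector may differ and would also need to exclude task and maintenance-page pods):

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// componentPodsTerminated reports whether any component pods still exist.
// Deleted Deployments are not enough; pods must actually be gone so no
// application process can touch the metastore during schema changes.
func componentPodsTerminated(ctx context.Context, c client.Client, namespace, parentName string) (bool, error) {
	var pods corev1.PodList
	if err := c.List(ctx, &pods,
		client.InNamespace(namespace),
		client.MatchingLabels{"superset.apache.org/parent": parentName}, // assumed pod label
	); err != nil {
		return false, err
	}
	return len(pods.Items) == 0, nil
}
```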
Components do not deploy until all enabled lifecycle tasks complete (or lifecycle is
explicitly disabled via spec.lifecycle.disabled: true). If a task is in
progress or has failed, Reconcile() returns early with a requeue, skipping
Phase 4.
For the full lifecycle reference including pod state machine, retry/backoff semantics, upgrade modes, and drain verification, see Lifecycle.
Phase 4: Component Reconciliation¶
For each of the six deployment components, the parent controller:
- Checks if the component is enabled (field present in spec)
- If disabled, deletes the child CR (cascade-deletes all owned resources)
- If enabled:
    - Renders component-appropriate `superset_config.py` from the parent's `secretKey`/`secretKeyFrom`, `metastore`, `config`, and per-component `config` fields via `RenderConfig()`
    - Collects secret env vars: when `secretKeyFrom`, `metastore.uriFrom`, or `metastore.passwordFrom` are set, the operator produces env vars with `valueFrom.secretKeyRef` pointing at the referenced Secret (see the sketch after this list). In dev mode, inline values produce plain `value` env vars instead
    - Resolves the shared spec (top-level + per-component) into a flat `FlatComponentSpec` via `ResolveChildSpec()`
    - Computes a config checksum from shared inputs and rendered config
    - Creates or updates the child CR with the fully-flattened spec
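A sketch of the prod-mode secret reference mentioned above; the env var and Secret names are illustrative, and the point is that the operator only points at the Secret, it never reads the value:

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// secretEnvVar builds an env var whose value is pulled from a referenced
// Secret key at container start, so the value never passes through the
// operator or the rendered ConfigMap.
func secretEnvVar(name, secretName, key string) corev1.EnvVar {
	return corev1.EnvVar{
		Name: name,
		ValueFrom: &corev1.EnvVarSource{
			SecretKeyRef: &corev1.SecretKeySelector{
				LocalObjectReference: corev1.LocalObjectReference{Name: secretName},
				Key:                  key,
			},
		},
	}
}

// Illustrative usage, with assumed names:
// envs := []corev1.EnvVar{secretEnvVar("SUPERSET_SECRET_KEY", "superset-secrets", "secret-key")}
```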
After components, the controller reconciles cluster-scoped resources: networking (Ingress or HTTPRoute), monitoring (ServiceMonitor), and network policies (one NetworkPolicy per enabled component).
Phase 5: Status Aggregation¶
The controller reads each child CR's status via unstructured GET (using the
correct GVK per component type), extracts the ready field (format:
"readyReplicas/desiredReplicas"), and aggregates into the parent status.
| All components ready | Phase | Available condition |
|---|---|---|
| Yes | Running | True |
| No | Degraded | False |
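A sketch of one such status read via unstructured GET; the group and version strings are assumptions based on the label prefix used elsewhere in this document:

```go
package sketch

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// childReady fetches a child CR by kind and returns its status.ready string,
// formatted "readyReplicas/desiredReplicas".
func childReady(ctx context.Context, c client.Client, key client.ObjectKey, kind string) (string, error) {
	u := &unstructured.Unstructured{}
	u.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "superset.apache.org", // assumed API group
		Version: "v1alpha1",            // assumed version
		Kind:    kind,                  // e.g. "SupersetWebServer"
	})
	if err := c.Get(ctx, key, u); err != nil {
		return "", err
	}
	ready, _, err := unstructured.NestedString(u.Object, "status", "ready")
	return ready, err
}
```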
Child Controller Pattern¶
Each child CRD (SupersetLifecycleTask, SupersetWebServer, SupersetCeleryWorker, etc.) has its own controller that reconciles the Kubernetes resources for that component.
Scalable components (WebServer, CeleryWorker, CeleryFlower, WebsocketServer,
McpServer) manage a Deployment and support replicas, HPA, and PDB. Their specs
embed ScalableComponentSpec, which has DeploymentTemplate, PodTemplate,
and scaling fields.
Singleton components (SupersetLifecycleTask, CeleryBeat) run exactly one instance.
SupersetLifecycleTask manages bare Pods with retry logic (uses PodTemplate only).
CeleryBeat manages a Deployment but forces replicas: 1 (has both
DeploymentTemplate and PodTemplate but no scaling fields).
All deployment controllers follow the same pattern: reconcile ConfigMap (if applicable), reconcile Deployment, reconcile Service (if the component exposes a port), reconcile scaling (HPA + PDB for scalable components), and update status. The task controller reconciles a ConfigMap and manages bare Pods.
Why ConfigMaps¶
Superset imports superset_config as a standard Python module, which means the
config must exist as a .py file on the filesystem. A ConfigMap volume mount is
the standard Kubernetes mechanism for projecting files into containers:
- Python import requirement — `superset_config.py` must be a real file on disk; environment variables and downward API projections cannot serve as importable Python modules
- Operability — `kubectl get cm` shows exactly what config each component is running, making debugging straightforward
- Clean pod manifests — Without ConfigMaps, the rendered Python config would need to be inlined on the pod spec (as annotations or env vars), making Deployment manifests difficult to read. ConfigMaps keep pod specs focused on container configuration
Ownership and Checksum Flow¶
ConfigMaps are created and owned by the parent Superset controller (not by child CRs). This means:
- ConfigMaps survive child CR deletion (e.g., during drain)
- The parent is the single writer of config content
- Child controllers mount ConfigMaps by conventional name without managing them
The parent computes a ConfigChecksum and passes it to child CRs via
spec.configChecksum. Child controllers stamp this as a pod template annotation
to trigger rolling restarts when config changes. This design follows the
principle that the checksum should be computed by whoever writes the data — since
the parent renders and writes the ConfigMap, it is the authority on when content
changed. Passing the checksum to child CRs avoids requiring child controllers to
watch or read ConfigMaps they don't own.
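A sketch of the hand-off, with illustrative names; the key point is that the hash covers exactly the bytes the parent writes into the ConfigMap:

```go
package sketch

import (
	"crypto/sha256"
	"fmt"
)

// configChecksum hashes the rendered superset_config.py content. Because the
// parent is the only writer of the ConfigMap, this digest is authoritative
// for deciding whether the config changed.
func configChecksum(renderedConfig string) string {
	return fmt.Sprintf("%x", sha256.Sum256([]byte(renderedConfig)))
}

// The parent then sets the child spec's configChecksum field to this digest,
// and the child stamps it onto the pod template (see "Checksum-Driven Rollouts").
```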
What Each Component Creates¶
| Component | ConfigMap | Workload | Service | HPA | PDB |
|---|---|---|---|---|---|
| Migrate (task) | superset_config.py | bare Pod | — | — | — |
| Init (task) | superset_config.py | bare Pod | — | — | — |
| WebServer | superset_config.py | Deployment (gunicorn) | port 8088 | if set | if set |
| CeleryWorker | superset_config.py | Deployment (celery worker) | — | if set | if set |
| CeleryBeat | superset_config.py | Deployment (celery beat) | — | — | — |
| CeleryFlower | superset_config.py | Deployment (celery flower) | port 5555 | if set | if set |
| WebsocketServer | — | Deployment (node.js) | port 8080 | if set | if set |
| McpServer | superset_config.py | Deployment (fastmcp) | port 8088 | if set | if set |
CeleryBeat is a singleton — the controller forces replicas: 1 regardless
of the spec, and does not create an HPA or PDB.
WebsocketServer is Node.js-based and does not get a superset_config.py
ConfigMap.
Deployment Builder¶
All child controllers delegate to buildDeploymentSpec(), which constructs a
complete Deployment spec from the flat FlatComponentSpec and a
component-specific DeploymentConfig:
type DeploymentConfig struct {
    ContainerName  string                 // e.g., "superset-web-server"
    DefaultCommand []string               // e.g., ["/usr/bin/run-server.sh"]
    DefaultArgs    []string               // optional
    DefaultPorts   []corev1.ContainerPort // e.g., [{Name: "http", ContainerPort: 8088}]
    ForceReplicas  *int32                 // non-nil only for beat (=1)
}
Replicas resolution order:
- `ForceReplicas` (beat singleton) — always wins
- `nil` if HPA is configured — HPA manages scaling
- `spec.Replicas` otherwise
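Sketched as a helper, with approximate parameter names for the flat child spec fields:

```go
package sketch

// resolveReplicas applies the resolution order above.
func resolveReplicas(force *int32, hpaConfigured bool, specReplicas *int32) *int32 {
	if force != nil {
		return force // beat singleton: always 1
	}
	if hpaConfigured {
		return nil // leave .spec.replicas unset so the HPA owns scaling
	}
	return specReplicas
}
```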
Idempotent Reconciliation¶
All resource creation uses controllerutil.CreateOrUpdate(): creates the
resource if it doesn't exist, updates it if the spec has drifted. This makes
every reconciliation cycle safe to re-run.
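For example, a sketch of reconciling a Deployment this way; in the real controllers the mutate function builds the spec via the deployment builder and preserves immutable fields such as the selector:

```go
package sketch

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

func reconcileDeployment(ctx context.Context, c client.Client, name, namespace string, desired appsv1.DeploymentSpec) error {
	deploy := &appsv1.Deployment{ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace}}
	// CreateOrUpdate fetches the object, applies the mutate function, and
	// only issues a Create or Update when something actually changed.
	_, err := controllerutil.CreateOrUpdate(ctx, c, deploy, func() error {
		deploy.Spec = desired // simplified; a real mutate sets fields individually
		return nil
	})
	return err
}
```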
Labels and Annotations¶
The operator sets reserved labels on child CRs (SupersetLifecycleTask, SupersetWebServer, etc.) and NetworkPolicies for resource discovery and orphan cleanup.
Operator-Managed Labels¶
| Label | Value | Purpose |
|---|---|---|
| `app.kubernetes.io/name` | `superset` | Application identity |
| `app.kubernetes.io/component` | Component type (e.g., `web-server`) | Component type filtering |
| `superset.apache.org/parent` | Parent Superset CR name | Parent-scoped discovery |
These labels are set by the operator on every reconciliation and cannot be overridden — operator-managed labels are applied last, taking precedence over any existing values.
Sub-resources (Deployments, Services, ConfigMaps) created by child controllers
use the standard app.kubernetes.io/* labels with app.kubernetes.io/instance
set to the child CR name for selector matching.
Orphan Cleanup¶
When a component is disabled, the operator uses label-based discovery to find and delete orphaned child CRs. On each reconcile, it lists all child CRs matching the parent and component type labels, then deletes any whose name does not match the currently desired name. Deleting a child CR cascades to all its owned sub-resources via owner references.
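A sketch of that discovery-and-delete loop, using one child list type as an example; the imported API package is hypothetical:

```go
package sketch

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"

	supersetv1alpha1 "example.com/superset-operator/api/v1alpha1" // hypothetical import path
)

func cleanupOrphans(ctx context.Context, c client.Client, namespace, parent, component, desiredName string) error {
	var children supersetv1alpha1.SupersetWebServerList
	if err := c.List(ctx, &children,
		client.InNamespace(namespace),
		client.MatchingLabels{
			"superset.apache.org/parent":  parent,
			"app.kubernetes.io/component": component,
		},
	); err != nil {
		return err
	}
	for i := range children.Items {
		if children.Items[i].Name != desiredName {
			// Deleting the child CR cascades to its owned sub-resources.
			if err := c.Delete(ctx, &children.Items[i]); err != nil {
				return client.IgnoreNotFound(err)
			}
		}
	}
	return nil
}
```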
Checksum-Driven Rollouts¶
Config changes must trigger pod restarts for the new config to take effect. The operator achieves this through checksum annotations on the pod template.
How It Works¶
- Parent controller computes checksums when building child CRs
- Checksums are stored on the child CR spec
- Child controller stamps them as pod template annotations
- When a checksum changes, the pod template changes, and Kubernetes triggers a rolling restart
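The stamping step, sketched against a Deployment using the annotation key from the table below:

```go
package sketch

import appsv1 "k8s.io/api/apps/v1"

const configChecksumAnnotation = "superset.apache.org/config-checksum"

// stampConfigChecksum writes the checksum onto the pod template, so a changed
// digest changes the template hash and Kubernetes performs a rolling restart.
func stampConfigChecksum(deploy *appsv1.Deployment, checksum string) {
	if deploy.Spec.Template.Annotations == nil {
		deploy.Spec.Template.Annotations = map[string]string{}
	}
	deploy.Spec.Template.Annotations[configChecksumAnnotation] = checksum
}
```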
Checksum Types¶
| Annotation | Source | Scope |
|---|---|---|
| `superset.apache.org/config-checksum` | Rendered superset_config.py | Per-component |
Per-component isolation: Changing a component's config only
changes that component's config checksum -- only its pods restart. Other
components are unaffected.
Secret safety: In prod mode, operator-managed secret values (secretKeyFrom,
metastore.uriFrom, metastore.passwordFrom, valkey.passwordFrom) are never
read by the operator and therefore never appear in checksums, annotations, or
ConfigMaps. In dev mode, inline secret values (secretKey, metastore.password,
valkey.password) influence the shared config checksum (as a hash, not
plaintext) because changes to these values must trigger a rollout.
Garbage Collection¶
The operator uses Kubernetes owner references for automatic cleanup. The parent
Superset CR owns child CRDs (SupersetLifecycleTask, SupersetWebServer, etc.),
the web-server Service, networking resources, ServiceMonitor, and NetworkPolicies.
Each child CR owns its managed resources — deployment CRDs own their Deployment,
ConfigMap, Service (except web-server, which is parent-owned), HPA, and PDB; the
SupersetLifecycleTask CRDs own their ConfigMap and Pods.
Deleting the parent cascades to all child CRs, which cascade to all their
owned resources. Removing a component from the parent spec (e.g. deleting
spec.celeryWorker) deletes its child CR, cascading to all owned resources.
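In controller-runtime terms this cascade is wired with the standard `SetControllerReference` helper, applied to every managed object before it is created. A minimal sketch, with a hypothetical parent type:

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	supersetv1alpha1 "example.com/superset-operator/api/v1alpha1" // hypothetical import path
)

// ownConfigMap wires a parent-owned ConfigMap into the cascade: deleting the
// Superset CR garbage-collects the ConfigMap automatically.
func ownConfigMap(parent *supersetv1alpha1.Superset, cm *corev1.ConfigMap, scheme *runtime.Scheme) error {
	return controllerutil.SetControllerReference(parent, cm, scheme)
}
```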
Maintenance Page (Parent-Owned Service Selector Switch)¶
When spec.lifecycle.maintenancePage is set, the operator serves a maintenance
page during drain and lifecycle tasks. This section documents the design decision
behind the traffic switchover mechanism.
Problem¶
During drain, component child CRs are deleted. GC cascades this to Deployments and Pods. Without intervention, users experience connection errors instead of a friendly maintenance message.
Solution: Parent-Owned Web-Server Service¶
The parent controller owns the web-server Service directly (not the child CR). During lifecycle drain, the parent:
- Creates a maintenance Deployment (parent-owned) running a lightweight HTTP server (nginx:alpine by default or a user-provided image).
- Switches the web-server Service's selector to match the maintenance-page pod labels, instantly routing traffic to maintenance pods.
- Drains all component child CRs (GC cascades to Deployments and Pods, but the Service is unaffected because it belongs to the parent).
- Runs lifecycle tasks (clone → migrate → rotate → init).
- After tasks complete and the web-server child CR is recreated, waits for the web-server Deployment to become ready.
- Switches the Service selector back to the web-server pod labels.
- Deletes the maintenance Deployment and its ConfigMap.
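A sketch of the selector switch used in steps 2 and 6: a merge patch changes only the selector, so the Service object (and its cluster IP) stays put while endpoints repoint within seconds. Label values here are placeholders:

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// switchServiceSelector repoints the parent-owned web-server Service at a new
// set of pod labels without touching any other Service field.
func switchServiceSelector(ctx context.Context, c client.Client, key client.ObjectKey, selector map[string]string) error {
	var svc corev1.Service
	if err := c.Get(ctx, key, &svc); err != nil {
		return err
	}
	patch := client.MergeFrom(svc.DeepCopy())
	svc.Spec.Selector = selector // e.g. maintenance-page labels during drain
	return c.Patch(ctx, &svc, patch)
}

// During drain:   switchServiceSelector(ctx, c, key, maintenanceLabels)
// After recovery: switchServiceSelector(ctx, c, key, webServerLabels)
```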
Why Parent-Owned Service¶
- Service selector changes propagate in ~1 second via the endpoints controller, giving instant traffic switchover regardless of ingress implementation
- Works for all access patterns: Ingress, Gateway API, direct Service
- No orphan deletion complexity — the Service is always owned by the parent, so GC of child CRs never affects it
- The child `SupersetWebServer` reconciler skips Service management (the parent handles it), keeping the child controller simple

Note for developers using `kubectl port-forward`: port-forward establishes a tunnel to a specific pod, not through the Service selector. When that pod is deleted during drain, the tunnel breaks with a "lost connection to pod" error. This does not affect Ingress/Gateway users — they route through EndpointSlices and see seamless transitions. Restart port-forward to reconnect to the maintenance pod.
Alternatives Considered¶
Orphan deletion + selector patch (previous design): Used propagationPolicy:
Orphan when deleting the SupersetWebServer child CR to preserve the Service,
then patched the selector. Rejected because orphan lifecycle was fragile — race
conditions between GC finalization and reconciliation, plus the child had to
detect and re-adopt the orphaned Service on recreation.
Separate maintenance Service + Ingress/HTTPRoute backend swap: Architecturally pure (clean separation, no interaction with web-server resources), but rejected because Ingress/HTTPRoute propagation latency varies significantly by controller implementation — from ~1s (Envoy-based) to 1-3 minutes (cloud load balancers like GCP/AWS). This creates an unacceptable error window where users hit the draining backend. Also doesn't work for users without networking configured.
Status and Conditions¶
Parent Status¶
The parent Superset CR reports aggregate status:
status:
phase: Running
observedGeneration: 3
version: "latest"
components:
webServer:
ready: "2/2"
celeryWorker:
ready: "4/4"
celeryBeat:
ready: "1/1"
conditions:
- type: Available
status: "True"
reason: AllComponentsReady
- type: InitComplete
status: "True"
reason: InitComplete
- type: Suspended
status: "False"
Parent Phase¶
The top-level status.phase reflects the overall instance state:
| Phase | Meaning |
|---|---|
| `Initializing` | First deployment — lifecycle tasks running for the first time |
| `Upgrading` | Image change detected — lifecycle tasks running against new version |
| `Draining` | Drain strategy active — components being removed before running tasks |
| `Running` | All enabled components are ready and lifecycle is complete |
| `Degraded` | One or more components are not fully ready |
| `Suspended` | `spec.suspend: true` — all reconciliation paused |
| `Blocked` | Downgrade detected — lifecycle tasks will not run (manual intervention required) |
| `AwaitingApproval` | Supervised upgrade mode — waiting for approval annotation before proceeding |
Child Status¶
Each child CR reports its own status:
status:
ready: "2/3"
observedGeneration: 5
conditions:
- type: Ready
status: "False"
reason: PartiallyReady
message: "2 of 3 replicas ready"
- type: Progressing
status: "True"
reason: RolloutInProgress
Ready condition states:
| State | Meaning |
|---|---|
| `True` / `AllReplicasReady` | readyReplicas >= desiredReplicas and > 0 |
| `False` / `PartiallyReady` | Some replicas ready, not all |
| `False` / `NotReady` | Zero replicas ready |
Progressing condition states:
| State | Meaning |
|---|---|
| `True` / `RolloutInProgress` | Deployment is rolling out new pods |
| `False` / `RolloutComplete` | New ReplicaSet is fully available |
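A sketch of how a child controller could derive the Ready condition from Deployment replica counts, using the reason strings from the tables above; this is illustrative, not the operator's exact logic:

```go
package sketch

import (
	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setReadyCondition maps readyReplicas/desiredReplicas onto the Ready
// condition states described above.
func setReadyCondition(conditions *[]metav1.Condition, deploy *appsv1.Deployment) {
	desired := int32(1)
	if deploy.Spec.Replicas != nil {
		desired = *deploy.Spec.Replicas
	}
	ready := deploy.Status.ReadyReplicas

	cond := metav1.Condition{Type: "Ready", Status: metav1.ConditionFalse, Reason: "NotReady"}
	switch {
	case ready > 0 && ready >= desired:
		cond.Status, cond.Reason = metav1.ConditionTrue, "AllReplicasReady"
	case ready > 0:
		cond.Reason = "PartiallyReady"
	}
	meta.SetStatusCondition(conditions, cond)
}
```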
Error Handling Summary¶
| Scenario | Behavior |
|---|---|
| Superset CR deleted during reconcile | Graceful return (not found) |
| Init pod fails | Retry with backoff up to maxRetries, then permanent failure |
| Init pod times out | Counts as failed attempt, same retry logic |
| Child CR creation fails | Error propagated, reconcile retried by controller-runtime |
| Optional CRD missing (Gateway API, ServiceMonitor) | Log and continue — feature disabled gracefully |
| Referenced Secret values change | Pods see new values only after restart; update forceReload to trigger rollout |
| Component removed from spec | Child CR deleted, cascade cleans up all resources |
| Suspend enabled | All reconciliation paused, no resources created or deleted |