Architecture Overview
For runtime behavior details — reconciliation lifecycle, parent-owned resource management, and status reporting — see Internals. For lifecycle task orchestration (migrations, upgrades, drain strategies), see Lifecycle.
Single Superset CRD Architecture
The operator exposes one user-facing CRD: Superset. A single Superset
resource defines the complete deployment. The parent controller resolves shared
top-level configuration and per-component overrides into concrete Kubernetes
resources: Deployments, Services, ConfigMaps, HPAs, PDBs, lifecycle task Jobs,
networking, monitoring, and NetworkPolicies.
Why one CRD?
Superset is the reconciliation boundary for one Superset installation. The
runtime components do not have independent desired state: they share instance
configuration, secret material, database migration ordering, drain/maintenance
behavior, rollout gates, and aggregate readiness. A lifecycle task can block or
replace every component, and a component rollout may depend on lifecycle state
that only the parent can evaluate.
Because of that coupling, separate component CRDs would mostly expose internal controller decomposition as Kubernetes APIs. They would imply that components can be created, updated, or observed as independently managed custom resources, even though the controller cannot safely reconcile them without the parent's configuration, lifecycle, and rollout context.
A single CRD matches the actual ownership model:
- Users declare one desired state: the Superset resource.
- The controller reconciles Deployments, Services, ConfigMaps, HPAs, PDBs, lifecycle task Jobs, networking, monitoring, and NetworkPolicies as parent-owned secondary resources.
- The Superset status subresource is the canonical visibility surface for component readiness, resource references, lifecycle task progress, and failure messages.
This keeps the public API small, avoids partially managed intermediate custom
resources, and makes kubectl describe superset <name> the place to inspect the
state of the whole installation.
The implementation remains modular internally. Component descriptors, deployment defaults, config rendering, lifecycle task orchestration, resource reconciliation, and status projection are separated in code and covered by focused tests. The single CRD is an API and ownership decision, not a mandate for one monolithic controller implementation.
How it works
For each enabled component, the parent controller renders any needed
superset_config.py, merges top-level and per-component templates into a flat
runtime spec, reconciles the parent-owned Kubernetes resources, and projects
workload state back into status.components.
Lifecycle tasks follow the same model: the parent resolves the task Job Pod spec,
creates a parent-owned ConfigMap when needed, runs a parent-owned Job, and
stores durable task state in status.lifecycle.
API Shape
Users create one top-level resource:
apiVersion: superset.apache.org/v1alpha1
kind: Superset
metadata:
  name: my-superset
spec:
  image: { tag: "latest" }
  environment: Development
  secretKey: thisIsNotSecure_changeInProduction!
  metastore:
    uri: postgresql+psycopg2://superset:superset@postgres:5432/superset
Components fall into two runtime categories:
Scalable components support replicas, HPA, and PodDisruptionBudgets:
| Parent field | Suffix | Creates |
|---|---|---|
| webServer | -web-server | Deployment, Service, ConfigMap, HPA, PDB |
| celeryWorker | -celery-worker | Deployment, ConfigMap, HPA, PDB |
| celeryFlower | -celery-flower | Deployment, Service, ConfigMap, HPA, PDB |
| websocketServer | -websocket-server | Deployment, Service, HPA, PDB |
| mcpServer | -mcp-server | Deployment, Service, ConfigMap, HPA, PDB |
Singleton components run exactly one instance and don't support scaling:
| Parent field | Suffix | Creates |
|---|---|---|
| lifecycle | -clone, -migrate, -rotate, -init | Jobs, ConfigMap for Superset-image tasks |
| celeryBeat | -celery-beat | Deployment, ConfigMap |
Presence = enabled: Setting celeryWorker: {} deploys workers with
defaults, and omitting celeryWorker entirely means no workers. There are no
enabled: true/false toggles. The exception is lifecycle tasks, which are
enabled by default even when spec.lifecycle is nil; disable them explicitly
with spec.lifecycle.disabled: true.
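The presence rule can be illustrated with a minimal Python sketch. It treats the spec as a plain dict and is only an illustration of the rule described above, not the operator's implementation; the function name `enabled_components` is hypothetical.

```python
from typing import Any

# Parent fields taken from the component tables above.
COMPONENT_FIELDS = (
    "webServer",
    "celeryWorker",
    "celeryBeat",
    "celeryFlower",
    "websocketServer",
    "mcpServer",
)


def enabled_components(spec: dict[str, Any]) -> list[str]:
    """Return the parent fields that are enabled (hypothetical helper)."""
    # Presence = enabled: an empty mapping ({}) still counts as present.
    enabled = [name for name in COMPONENT_FIELDS if spec.get(name) is not None]
    # Lifecycle tasks are the exception: on by default, off only when
    # spec.lifecycle.disabled is true.
    lifecycle = spec.get("lifecycle") or {}
    if not lifecycle.get("disabled", False):
        enabled.append("lifecycle")
    return enabled
```

With this sketch, a spec containing only `webServer: {}` yields the web server plus lifecycle tasks, and adding `lifecycle: {disabled: true}` leaves only the web server.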
Configuration Model
All Deployment, Pod, and container configuration flows through two sibling template fields:
deploymentTemplate     → DeploymentSpec-level (strategy, revisionHistoryLimit, ...)
podTemplate            → PodSpec-level (affinity, tolerations, volumes, ...)
  └── container        → Container-level (resources, env, probes, ...)
Top-level deploymentTemplate and podTemplate provide defaults
inherited by all components. Per-component values are field-level merged
with the top-level — only specify what's different. Scaling fields
(replicas, autoscaling, podDisruptionBudget) are outside the templates
since they interact with operator logic (HPA, Beat singleton).
Merge semantics per field type:
- Scalars/structs (resources, affinity, securityContext, probes, etc.): component wins if set
- Named collections (env, volumes, volumeMounts, sidecars): merge by name, component wins on conflict
- Maps (annotations, labels, nodeSelector): merge by key, component wins on conflict
- Unnamed collections (tolerations, topologySpreadConstraints): append
- command/args: component-only, not inherited from top-level
- Operator-managed labels (app.kubernetes.io/*): applied last, cannot be overridden
Lifecycle tasks use podTemplate only (no deploymentTemplate) since they
create Jobs, not Deployments. See the Configuration guide for
the full field reference and examples.
Example: How resources resolve for celeryWorker
spec:
  podTemplate:
    container:
      resources:
        limits:
          cpu: "2"
          memory: "4Gi"
  celeryWorker:
    podTemplate:
      container:
        resources:
          limits:
            cpu: "8"  # component replaces entire resources struct
Result on the celery worker Deployment: resources.limits = {cpu: "8"}
(resources is a scalar/struct field — component replaces entirely).
Example: How env vars resolve for webServer
spec:
  podTemplate:
    container:
      env:
        - {name: LOG_LEVEL, value: INFO}
  webServer:
    podTemplate:
      container:
        env:
          - {name: GUNICORN_WORKERS, value: "4"}  # merged with top-level
Result on the web server Deployment: both env vars present.
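The merge rules and the two examples above can be approximated in a few lines of Python. This is a sketch under stated assumptions: fields are plain dicts and lists, the helper name `merge_field` is hypothetical, and the operator-managed labels that are applied last are not modeled.

```python
from typing import Any

NAMED = {"env", "volumes", "volumeMounts", "sidecars"}      # merge by name
MAPS = {"annotations", "labels", "nodeSelector"}            # merge by key
APPENDED = {"tolerations", "topologySpreadConstraints"}     # append
COMPONENT_ONLY = {"command", "args"}                        # never inherited


def merge_field(field: str, top: Any, component: Any) -> Any:
    """Resolve one field from top-level and per-component values (sketch)."""
    if field in COMPONENT_ONLY:
        return component            # top-level value is ignored
    if component is None:
        return top                  # nothing to override: inherit
    if top is None:
        return component
    if field in NAMED:
        merged = {item["name"]: item for item in top}
        merged.update({item["name"]: item for item in component})
        return list(merged.values())
    if field in MAPS:
        return {**top, **component}
    if field in APPENDED:
        return list(top) + list(component)
    return component                # scalars/structs: component replaces


# The two examples from the text:
resources = merge_field(
    "resources",
    {"limits": {"cpu": "2", "memory": "4Gi"}},
    {"limits": {"cpu": "8"}},
)
env = merge_field(
    "env",
    [{"name": "LOG_LEVEL", "value": "INFO"}],
    [{"name": "GUNICORN_WORKERS", "value": "4"}],
)
```

Here `resources` ends up as `{"limits": {"cpu": "8"}}` because the struct is replaced as a whole, while `env` carries both variables.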
Configuration Philosophy: Typed Fields vs. Raw Python
Superset is configured through superset_config.py — a Python module that exposes hundreds of knobs across Flask, Flask-AppBuilder, Celery, caching, security, and Superset itself. Mirroring every one of those knobs as a typed CRD field would turn the operator into a partial Superset fork, balloon CRD size, and lag behind upstream changes. Hiding everything behind a single Python blob, on the other hand, gives up the validation, discoverability, and operator-side reasoning that a CRD makes possible.
The operator splits the difference:
- Typed CRD fields are reserved for settings where the operator adds clear value: anything tied to Kubernetes resources (images, ports, replicas, autoscaling), anything sourced from Secrets (secretKey, metastore credentials, Valkey passwords), and managed connectivity that the operator can validate, render uniformly, or wire up across components (metastore URIs, Valkey-backed caches, Celery broker/backend URLs, lifecycle gating). Typed fields earn their place because they can't be expressed as plain Python without re-implementing what the operator already does.
- Raw Python in spec.config and spec.<component>.config is the default home for application behavior: feature flags beyond a curated set, custom security managers, OAuth providers, thumbnail executors, custom Celery imports, routes, beat schedules, task annotations, and anything else that is naturally a Python expression or class. Admins are comfortable writing Python in superset_config.py; forcing those settings through YAML schemas tends to obfuscate rather than clarify.
For Celery, this boundary is explicit: the CRD provides managed connectivity to the broker and result backend; users provide the Celery app behavior they want in Python. The operator does not default task imports, task annotations, acknowledgement behavior, beat schedules, or scheduler expiration, because those are Superset application settings that can change across Superset versions and differ across production deployments.
To make Python-side configuration ergonomic, the operator exposes a few resolved values as env vars (SUPERSET_OPERATOR__INSTANCE_NAME, SUPERSET_OPERATOR__VALKEY_HOST, etc.) so admins can reference them from raw Python without templating. Operator-rendered objects like the CeleryConfig class are regular Python — admins extend them by mutating attributes, subclassing, or replacing the assignment outright.
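As an illustration, raw Python placed in spec.config might read those env vars and extend the rendered class roughly as follows. This is a sketch, not the operator's rendered output: the `CeleryConfig` class below is a stand-in so the snippet runs on its own, and the `redis://` URLs, port 6379, database numbers, default values, and queue name are assumptions made for the example.

```python
import os

# Env var names come from the text above; the fallbacks are assumptions
# so the snippet also runs outside a Pod.
instance = os.environ.get("SUPERSET_OPERATOR__INSTANCE_NAME", "my-superset")
valkey_host = os.environ.get("SUPERSET_OPERATOR__VALKEY_HOST", "localhost")


class CeleryConfig:
    """Stand-in for the operator-rendered class (connectivity only)."""

    broker_url = f"redis://{valkey_host}:6379/0"       # assumed URL shape
    result_backend = f"redis://{valkey_host}:6379/1"   # assumed URL shape


class ExtendedCeleryConfig(CeleryConfig):
    """User-side behavior layered on top by subclassing."""

    imports = ("superset.sql_lab",)
    task_acks_late = True
    task_default_queue = f"{instance}-celery"          # illustrative only


# Superset reads the Celery settings class from CELERY_CONFIG.
CELERY_CONFIG = ExtendedCeleryConfig
```

Subclassing keeps the operator-provided connectivity intact while the application behavior stays under the admin's control; mutating attributes on the rendered class or replacing the assignment outright are the other two options named above.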
This split is intentionally a moving boundary. As specific knobs prove to be widely used, frequently misconfigured, or worth cross-component validation, they migrate from raw Python into typed CRD fields. The starting position favors raw Python; promotions are made deliberately and case by case.
Config Rendering Pipeline
The operator generates per-component superset_config.py files by
concatenating three sections in order. Both spec.config (base) and
spec.<component>.config (component) are appended — they are not mutually
exclusive. If both are set, the component receives all three sections:
How config is built
- Operator-generated configs: SECRET_KEY rendered from the SUPERSET_OPERATOR__SECRET_KEY env var, SQLALCHEMY_DATABASE_URI rendered from operator-internal env vars (both passthrough and structured metastore modes), plus SUPERSET_WEBSERVER_PORT for the web server.
- SQLAlchemy engine options: SQLALCHEMY_ENGINE_OPTIONS dict, computed per component from the resolved sqlaEngineOptions preset and the component's worker/thread configuration (Gunicorn workers × threads for the web server, Celery concurrency for workers). Presets range from conservative (NullPool) through balanced (pool_size=1, max_overflow=-1) to aggressive (pool_size=workers×threads). See SQLAlchemy Engine Options for details.
- Valkey cache config: When spec.valkey is set, the operator renders CACHE_CONFIG, DATA_CACHE_CONFIG, FILTER_STATE_CACHE_CONFIG, EXPLORE_FORM_DATA_CACHE_CONFIG, THUMBNAIL_CACHE_CONFIG, CeleryConfig, and RESULTS_BACKEND backed by Valkey. Connection details are read from SUPERSET_OPERATOR__VALKEY_* env vars at Python runtime. The rendered CeleryConfig contains only broker/backend connectivity and SSL settings; users define Celery app behavior in raw Python. SSL/mTLS cert paths are baked directly into the rendered config.
- Base config (spec.config): Raw Python from the top-level config field, shared by all Python components. Appended after operator-generated configs.
- Component config (spec.<component>.config): Raw Python from the per-component config field. Appended last, so it can override anything above.
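The SQLAlchemy engine options item can be made concrete with a rough sketch of how a preset might map to a dict. Only the three presets and the values named above are used; the function name `engine_options` is hypothetical, any further keys the operator sets are not shown, and NullPool is represented by a placeholder string so the snippet runs without SQLAlchemy installed.

```python
def engine_options(preset: str, workers: int = 1, threads: int = 1) -> dict:
    """Map a preset name to SQLALCHEMY_ENGINE_OPTIONS-style settings (sketch)."""
    if preset == "conservative":
        # A real config would reference sqlalchemy.pool.NullPool here;
        # the string is a placeholder for this standalone example.
        return {"poolclass": "NullPool"}
    if preset == "balanced":
        return {"pool_size": 1, "max_overflow": -1}
    if preset == "aggressive":
        # One pooled connection per concurrent request slot.
        return {"pool_size": workers * threads}
    raise ValueError(f"unknown preset: {preset!r}")


# Web server example: 4 Gunicorn workers with 2 threads each.
web_options = engine_options("aggressive", workers=4, threads=2)
```

For the web server in this example, `web_options` is `{"pool_size": 8}`; for a Celery worker the same product would be replaced by the worker concurrency.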
For example, given a structured metastore configuration:
spec:
  metastore:
    host: db.example.com
    database: superset
    username: superset
    passwordFrom:
      name: db-credentials
      key: password
  config: |
    ROW_LIMIT = 10000
  celeryWorker:
    config: |
      CELERY_ANNOTATIONS = {"tasks.add": {"rate_limit": "10/s"}}
The celery worker's superset_config.py contains all three sections:
# Operator-generated configs
SQLALCHEMY_DATABASE_URI = f"postgresql+psycopg2://..." # assembled from env vars
# Base config (spec.config)
ROW_LIMIT = 10000
# Component config
CELERY_ANNOTATIONS = {"tasks.add": {"rate_limit": "10/s"}}
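The ordering shown above matters because later assignments in a Python module shadow earlier ones. A minimal sketch of the concatenation, assuming each section is already a string (the function name `render_superset_config` and the placeholder SECRET_KEY line are hypothetical):

```python
from typing import Optional


def render_superset_config(
    operator_generated: str,
    base: Optional[str] = None,
    component: Optional[str] = None,
) -> str:
    """Concatenate the config sections in override order (sketch)."""
    sections = [("Operator-generated configs", operator_generated)]
    if base:
        sections.append(("Base config (spec.config)", base))
    if component:
        sections.append(("Component config", component))
    return "\n".join(f"# {title}\n{body.strip()}" for title, body in sections) + "\n"


rendered = render_superset_config(
    'SECRET_KEY = "placeholder"',
    base="ROW_LIMIT = 10000",
    component="ROW_LIMIT = 500",
)

# Executing the rendered module shows that the last assignment wins.
namespace: dict = {}
exec(rendered, namespace)
```

After execution, `namespace["ROW_LIMIT"]` holds the component value, which is why per-component config can override both the base config and the operator-generated settings.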
Note: All operator-managed settings (SECRET_KEY, SQLALCHEMY_DATABASE_URI,
web server port) are rendered into the config file from operator-internal
SUPERSET_OPERATOR__* env vars. Passthrough mode renders
SQLALCHEMY_DATABASE_URI from SUPERSET_OPERATOR__DB_URI; structured mode
assembles it from the SUPERSET_OPERATOR__DB_* variables.
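A rough sketch of how such rendered Python could resolve the URI at runtime follows. Only SUPERSET_OPERATOR__DB_URI and SUPERSET_OPERATOR__DB_PASS are named in this document; the other SUPERSET_OPERATOR__DB_* names, the default port, and the percent-encoding step are assumptions made for illustration.

```python
import os
from typing import Mapping
from urllib.parse import quote_plus


def database_uri(env: Mapping[str, str] = os.environ) -> str:
    """Resolve the database URI from operator-internal env vars (sketch)."""
    # Passthrough mode: the full URI arrives in a single variable.
    uri = env.get("SUPERSET_OPERATOR__DB_URI")
    if uri:
        return uri
    # Structured mode: assemble the URI from individual parts.
    # DB_USER, DB_HOST, DB_PORT, and DB_NAME are hypothetical names.
    user = quote_plus(env["SUPERSET_OPERATOR__DB_USER"])
    password = quote_plus(env["SUPERSET_OPERATOR__DB_PASS"])
    host = env["SUPERSET_OPERATOR__DB_HOST"]
    port = env.get("SUPERSET_OPERATOR__DB_PORT", "5432")
    name = env["SUPERSET_OPERATOR__DB_NAME"]
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{name}"
```

Percent-encoding the user and password keeps characters such as `@` or `/` in a secret from breaking the URI; whether the operator does this is not stated here.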
| Config section | WebServer | CeleryWorker | CeleryBeat | CeleryFlower | McpServer |
|---|---|---|---|---|---|
| SECRET_KEY | yes | yes | yes | yes | yes |
| Passthrough DB URI | if set | if set | if set | if set | if set |
| Structured DB URI (f-string) | if set | if set | if set | if set | if set |
| Web server port (8088) | yes | | | | |
| Top-level config | yes | yes | yes | yes | yes |
| Per-component config | yes | yes | yes | yes | yes |
WebsocketServer is Node.js-based -- it does NOT get superset_config.py.
Optional websocket config.json is handled separately through Development-only
inline config or a Secret-backed configFrom mount.
Secret Handling
In dev mode (environment: Development), secretKey, metastore.uri, and
metastore.password can be set as plain strings directly in the CR. The
operator injects them as environment variables on the container spec.
In prod mode (environment: Production, the default), CRD validation rejects
these inline fields. Instead, use the *From fields to reference Kubernetes
Secrets:
- secretKeyFrom: references a Secret key for the Flask secret key
- metastore.uriFrom: references a Secret key for the full database URI
- metastore.passwordFrom: references a Secret key for the database password (structured mode)
The operator injects the corresponding env vars (SUPERSET_OPERATOR__SECRET_KEY,
SUPERSET_OPERATOR__DB_URI, SUPERSET_OPERATOR__DB_PASS) with
valueFrom.secretKeyRef pointing at the referenced Secret. Secret values
never appear in ConfigMaps or CRD status fields.
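The two injection modes can be sketched as plain dicts in the shape of a Kubernetes container env entry. The helper name `env_from_field` is hypothetical, and the assumption that a *From reference takes precedence over an inline value is made only for this example.

```python
from typing import Any, Optional


def env_from_field(
    var_name: str,
    inline: Optional[str] = None,
    ref: Optional[dict[str, str]] = None,
) -> dict[str, Any]:
    """Build one container env entry for an operator-internal variable (sketch)."""
    if ref is not None:
        # Production path: the value stays in the Secret; only a
        # reference appears in the Pod spec.
        return {
            "name": var_name,
            "valueFrom": {"secretKeyRef": {"name": ref["name"], "key": ref["key"]}},
        }
    if inline is not None:
        # Development path: plain string taken directly from the CR.
        return {"name": var_name, "value": inline}
    raise ValueError(f"{var_name}: neither an inline value nor a Secret reference is set")


# Mirrors metastore.passwordFrom from the structured example earlier.
db_pass = env_from_field(
    "SUPERSET_OPERATOR__DB_PASS",
    ref={"name": "db-credentials", "key": "password"},
)
```

In the reference form the secret value never passes through the operator's rendered output, which is consistent with the statement that secret values do not appear in ConfigMaps or status.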
Config Mount Structure
- /app/pythonpath/: ConfigMap with superset_config.py
Checksum-Driven Rollouts
The parent computes a per-component config checksum and stamps it on the component Deployment pod template. When the checksum changes (due to config or secret reference changes on the CR), Kubernetes triggers a rolling restart of the affected component. Note: rotating a referenced Secret's value without changing the CR does not trigger a rollout — use Force Reload for this case. See Internals for the full checksum table and per-component isolation details.
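The mechanism can be sketched as hashing the rendered config together with the secret references and stamping the digest on the pod template. The hash algorithm, the exact inputs, and the annotation key below are assumptions for illustration; the Internals page holds the actual checksum table.

```python
import hashlib
import json

# Hypothetical annotation key, used only in this example.
CHECKSUM_ANNOTATION = "example.com/config-checksum"


def config_checksum(rendered_config: str, secret_refs: dict) -> str:
    """Hash the inputs that should trigger a rollout when they change (sketch)."""
    payload = json.dumps(
        {"config": rendered_config, "secretRefs": secret_refs},
        sort_keys=True,  # stable ordering gives a stable digest
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def stamp(pod_template: dict, checksum: str) -> dict:
    """Write the checksum into the pod template annotations."""
    annotations = pod_template.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[CHECKSUM_ANNOTATION] = checksum
    return pod_template


refs = {"secretKeyFrom": {"name": "superset-secret", "key": "secret-key"}}
before = config_checksum("ROW_LIMIT = 10000\n", refs)
after = config_checksum("ROW_LIMIT = 500\n", refs)
template = stamp({}, after)
```

Because only the Secret reference is hashed and not the Secret's contents, rotating the value behind an unchanged reference leaves the digest the same, which matches the caveat above about needing Force Reload.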
Resource Ownership
All resources use Kubernetes owner references for automatic cleanup. The parent
Superset CR owns component Deployments, Services, ConfigMaps, HPAs, PDBs,
lifecycle task Jobs, networking resources (Ingress/HTTPRoute), ServiceMonitor,
and NetworkPolicies. Deleting the parent cascades to everything. Removing a
component from the parent spec deletes the resources for that component.
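For illustration, the owner reference on a child resource has the standard Kubernetes shape sketched below. The helper names are hypothetical; composing child names as parent name plus suffix, and setting `controller` and `blockOwnerDeletion` to true, are assumptions based on common controller practice rather than statements from this document.

```python
def owner_reference(parent: dict) -> dict:
    """Build a metadata.ownerReferences entry pointing at the parent CR (sketch)."""
    return {
        "apiVersion": parent["apiVersion"],
        "kind": parent["kind"],
        "name": parent["metadata"]["name"],
        "uid": parent["metadata"]["uid"],
        "controller": True,           # assumed
        "blockOwnerDeletion": True,   # assumed
    }


def child_metadata(parent: dict, suffix: str) -> dict:
    """Metadata for a parent-owned resource such as a component Deployment."""
    return {
        "name": parent["metadata"]["name"] + suffix,   # assumed naming scheme
        "namespace": parent["metadata"]["namespace"],
        "ownerReferences": [owner_reference(parent)],
    }


parent = {
    "apiVersion": "superset.apache.org/v1alpha1",
    "kind": "Superset",
    "metadata": {"name": "my-superset", "namespace": "default", "uid": "example-uid"},
}
web = child_metadata(parent, "-web-server")
```

With the reference in place, Kubernetes garbage collection removes the child once the parent Superset resource is deleted; removing a single component from the spec is handled by the controller itself.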