Configuration

This guide covers the full configuration reference for the Superset operator. For installation instructions, see Installation. For lifecycle (migrations, upgrades), see Lifecycle.

Environment Mode

The environment field controls validation strictness (enforced by CEL rules in the CRD schema):

  • Production (default) — inline secretKey, previousSecretKey, metastore.uri, metastore.password, valkey.password, and websocketServer.config are rejected by CRD validation. Use the corresponding *From fields (secretKeyFrom, previousSecretKeyFrom, metastore.uriFrom, metastore.passwordFrom, valkey.passwordFrom, websocketServer.configFrom) to reference Kubernetes Secrets.
  • Staging — same secret restrictions as Production, but allows lifecycle.clone for database cloning from an external source. lifecycle.clone.source.password must still be supplied via passwordFrom.
  • Development — allows plain-text secretKey, previousSecretKey, metastore.uri, metastore.password, valkey.password, websocketServer.config, and lifecycle.clone.source.password directly in the CR for quick local development. Also permits lifecycle.clone, lifecycle.init.adminUser, and lifecycle.init.loadExamples.
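
The rules above are enforced by CEL expressions in the CRD, not by Python. Purely to illustrate the logic, the sketch below models them with a hypothetical helper, rejected_inline_fields, that takes an environment name and the set of dotted field paths present in a spec. The helper name and the dotted-path representation are assumptions made for this example. Treating lifecycle.clone as rejected in Production, and lifecycle.init.adminUser and lifecycle.init.loadExamples as Development-only, is inferred from the wording of the list rather than stated outright.

```python
# Illustrative model of the environment rules; the real checks are CEL rules
# in the CRD schema. All names below are hypothetical.

INLINE_SECRET_FIELDS = {
    "secretKey",
    "previousSecretKey",
    "metastore.uri",
    "metastore.password",
    "valkey.password",
    "websocketServer.config",
    "lifecycle.clone.source.password",
}

# Inferred from the list: only Development permits these two fields.
DEV_ONLY_FIELDS = {"lifecycle.init.adminUser", "lifecycle.init.loadExamples"}
CLONE_FIELD = "lifecycle.clone"


def rejected_inline_fields(environment: str, fields: set[str]) -> set[str]:
    """Return the dotted field paths the given environment would reject."""
    if environment == "Development":
        return set()  # Development permits every field modeled here.
    rejected = fields & (INLINE_SECRET_FIELDS | DEV_ONLY_FIELDS)
    # Staging allows lifecycle.clone; Production (inferred) does not.
    if environment == "Production" and CLONE_FIELD in fields:
        rejected.add(CLONE_FIELD)
    return rejected
```

For example, rejected_inline_fields("Staging", {"lifecycle.clone", "metastore.uriFrom"}) returns an empty set, while the same call with "Production" returns {"lifecycle.clone"}.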

Dev Mode Example

spec:
  environment: Development
  secretKey: thisIsNotSecure_changeInProduction!
  metastore:
    uri: postgresql+psycopg2://superset:superset@postgres:5432/superset
  featureFlags:
    ENABLE_TEMPLATE_PROCESSING: true
  webServer: {}
  lifecycle:
    init:
      adminUser: {}
      loadExamples: true

Prod Mode Example

Use secretKeyFrom and metastore.uriFrom to reference Kubernetes Secrets. The operator injects the corresponding env vars with valueFrom.secretKeyRef:

spec:
  image:
    tag: "6.1.0"
  secretKeyFrom:
    name: superset-secret
    key: secret-key
  metastore:
    uriFrom:
      name: db-credentials
      key: connection-string
  featureFlags:
    ENABLE_TEMPLATE_PROCESSING: true
    ALERT_REPORTS: true
  config: |
    ROW_LIMIT = 10000
  webServer: {}

Metastore

The metastore field provides database connection configuration. There are two modes:

Passthrough URI — provide the full SQLAlchemy connection string. In Development mode, use uri inline. In Staging or Production, use uriFrom to reference a Kubernetes Secret:

# Development mode: inline URI
spec:
  environment: Development
  metastore:
    uri: postgresql+psycopg2://superset:superset@postgres:5432/superset
---
# Staging/Production: URI from Secret
spec:
  metastore:
    uriFrom:
      name: db-credentials
      key: connection-string

uri and uriFrom are mutually exclusive with each other and with the structured fields below.

Structured fields — the operator sets individual env vars (SUPERSET_OPERATOR__DB_HOST, SUPERSET_OPERATOR__DB_PORT, SUPERSET_OPERATOR__DB_NAME, SUPERSET_OPERATOR__DB_USER, SUPERSET_OPERATOR__DB_PASS) that the generated config assembles into a connection URI. In Staging or Production, use passwordFrom to reference a Secret for the password:

# Development mode: inline password
spec:
  environment: Development
  metastore:
    type: PostgreSQL
    host: db.example.com
    port: 5432
    database: superset
    username: superset
    password: secret
---
# Staging/Production: password from Secret
spec:
  metastore:
    type: PostgreSQL
    host: db.example.com
    port: 5432
    database: superset
    username: superset
    passwordFrom:
      name: db-credentials
      key: password

password and passwordFrom are mutually exclusive.

Structured mode defaults to postgresql+psycopg2 for PostgreSQL and mysql+mysqldb for MySQL. The operator only selects the SQLAlchemy scheme; it does not install Python driver packages into the Superset image. The official lean Superset images do not include database drivers, so production images should add the driver package required by the selected scheme. For the default MySQL scheme, install mysqlclient; for the default PostgreSQL scheme, install psycopg2 or a compatible package. See Superset's Docker Builds and MySQL docs for the upstream driver guidance. If your image installs a different SQLAlchemy driver, set metastore.driver:

spec:
  metastore:
    type: MySQL
    driver: pymysql
    host: mysql.example.com
    database: superset
    username: superset
    passwordFrom:
      name: db-credentials
      key: password

Auto-creating the database

Setting metastore.createDatabase: true instructs the operator to attach a one-shot init container to the migrate Job that issues CREATE DATABASE against the server before superset db upgrade runs. The step is idempotent — existing databases are detected and the init container exits cleanly, so re-applying or re-running migrations is safe.

spec:
  metastore:
    type: PostgreSQL
    host: db.example.com
    database: superset
    username: superset
    passwordFrom:
      name: db-credentials
      key: password
    createDatabase: true

Requirements and caveats:

  • Structured metastore only. Rejected by CRD validation when uri or uriFrom is set — the operator needs the individual host/database/username fields to issue admin-level statements.
  • Privileges. The configured metastore user must have CREATEDB (PostgreSQL) or CREATE (MySQL) privilege on the server. The init container connects to the postgres admin database (PostgreSQL) or runs CREATE DATABASE IF NOT EXISTS (MySQL).
  • Init container image. The operator uses postgres:17-alpine or mysql:8-alpine (matching the clone task) — the Superset image is not assumed to ship database client tools.
  • Resources and securityContext are inherited from spec.lifecycle.podTemplate.container. Whatever you set on spec.lifecycle.podTemplate.container.resources and spec.lifecycle.podTemplate.container.securityContext is applied to the create-database init container. This lets you satisfy strict admission policies (Pod Security Standards restricted, Kyverno, OPA) without a dedicated knob. The init container also defaults to a non-root UID (matching its DB-tool image), so it starts cleanly under a pod-level runAsNonRoot: true even when you don't pin a UID — an explicit runAsUser at the pod or container level is always respected.
  • Redundant with lifecycle.clone. Clone already drops and re-creates its target database every time it runs, so toggling createDatabase on alongside clone is harmless but does no extra work in practice — the init container detects the existing database (created by clone) and no-ops.

Valkey

The valkey field configures Valkey (or Redis) as the cache backend, Celery message broker, and SQL Lab results backend. Setting valkey.host auto-generates all cache, Celery, and results backend configuration with sensible defaults:

# Minimal: one field configures everything
spec:
  valkey:
    host: valkey.default.svc

This generates a superset_config.py with CACHE_CONFIG, DATA_CACHE_CONFIG, FILTER_STATE_CACHE_CONFIG, EXPLORE_FORM_DATA_CACHE_CONFIG, THUMBNAIL_CACHE_CONFIG, DISTRIBUTED_COORDINATION_CONFIG, a connectivity-only CeleryConfig, and RESULTS_BACKEND — each cache section backed by a separate Valkey database for isolation (Celery broker and result backend share database 0 by default). Celery application behavior such as imports, task routes, and beat schedules remains explicit Python config.

Default database assignments:

| Section | Superset Config Key | Valkey DB | Key Prefix | Timeout |
|---|---|---|---|---|
| cache | CACHE_CONFIG | 1 | superset_ | 300s |
| dataCache | DATA_CACHE_CONFIG | 2 | superset_data_ | 86400s |
| filterStateCache | FILTER_STATE_CACHE_CONFIG | 3 | superset_filter_ | 3600s |
| exploreFormDataCache | EXPLORE_FORM_DATA_CACHE_CONFIG | 4 | superset_explore_ | 3600s |
| thumbnailCache | THUMBNAIL_CACHE_CONFIG | 5 | superset_thumbnail_ | 3600s |
| distributedCoordination | DISTRIBUTED_COORDINATION_CONFIG | 7 | coordination_ | 300s |
| celeryBroker | CeleryConfig.broker_url | 0 | | |
| celeryResultBackend | CeleryConfig.result_backend | 0 | | |
| resultsBackend | RESULTS_BACKEND | 6 | superset_results_ | |

distributedCoordination (DISTRIBUTED_COORDINATION_CONFIG) backs Superset's real-time pub/sub messaging, atomic distributed locks (Redis SET NX EX), and Global Task Framework signaling. It is recommended for production deployments and will eventually replace GLOBAL_ASYNC_QUERIES_CACHE_BACKEND as the standard signaling backend.

Instance-scoped key prefixes. Every rendered CACHE_KEY_PREFIX (and the results backend's key_prefix) is automatically prefixed with the parent CR name at runtime — e.g. a Superset named prod produces prod_superset_, prod_superset_data_, prod_coordination_, etc. This prevents key collisions when multiple Superset deployments share a single Valkey instance. The prefix value you set on a section is appended after the instance name; setting keyPrefix: "myapp_" on cache yields prod_myapp_.
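
To make the prefix composition concrete, here is a minimal sketch of how one cache section could be rendered. It is not the operator's actual output. The helper name cache_config is hypothetical, and the dictionary keys (CACHE_TYPE, CACHE_DEFAULT_TIMEOUT, CACHE_KEY_PREFIX, CACHE_REDIS_URL) are the standard Flask-Caching Redis settings, assumed here to be what the generated config uses.

```python
# Default sections from the database assignment table:
# (Valkey DB, key prefix, timeout in seconds).
SECTION_DEFAULTS = {
    "cache": (1, "superset_", 300),
    "dataCache": (2, "superset_data_", 86400),
    "filterStateCache": (3, "superset_filter_", 3600),
    "exploreFormDataCache": (4, "superset_explore_", 3600),
    "thumbnailCache": (5, "superset_thumbnail_", 3600),
}


def cache_config(instance, host, section, port=6379, key_prefix=None):
    """Build a Flask-Caching style dict for one section (hypothetical helper)."""
    database, default_prefix, timeout = SECTION_DEFAULTS[section]
    prefix = default_prefix if key_prefix is None else key_prefix
    return {
        "CACHE_TYPE": "RedisCache",
        "CACHE_DEFAULT_TIMEOUT": timeout,
        # The parent CR name always comes first, then the section prefix.
        "CACHE_KEY_PREFIX": f"{instance}_{prefix}",
        "CACHE_REDIS_URL": f"redis://{host}:{port}/{database}",
    }
```

Calling cache_config("prod", "valkey.default.svc", "cache", key_prefix="myapp_") yields a CACHE_KEY_PREFIX of prod_myapp_, matching the rule described above.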

Each section can be individually tuned or disabled:

spec:
  valkey:
    host: valkey.default.svc
    port: 6380
    passwordFrom:
      name: valkey-credentials
      key: password
    cache:
      defaultTimeout: 600
    dataCache:
      database: 10
      defaultTimeout: 43200
    filterStateCache:
      disabled: true    # fall back to Superset's built-in default
    celeryBroker:
      database: 14
    celeryResultBackend:
      database: 15

In Development mode, valkey.password can be set inline. In Staging or Production, use valkey.passwordFrom to reference a Kubernetes Secret — the operator injects the password via valueFrom.secretKeyRef.

SSL/TLS

Enable SSL by setting the ssl field. For simple SSL (encrypted connection, no client certificates), set ssl: {}. For mTLS, provide certificate file paths:

spec:
  valkey:
    host: valkey.default.svc
    passwordFrom:
      name: valkey-credentials
      key: password
    ssl:
      certRequired: required   # "required" (default), "optional", or "none"
      keyFile: /mnt/tls/client.key.pem
      certFile: /mnt/tls/client.crt.pem
      caCertFile: /mnt/tls/ca.pem

Mount the certificate files via the top-level podTemplate so they are available to all components:

spec:
  podTemplate:
    volumes:
      - name: valkey-tls
        secret:
          secretName: valkey-tls-certs
    container:
      volumeMounts:
        - name: valkey-tls
          mountPath: /mnt/tls
          readOnly: true
  valkey:
    host: valkey.default.svc
    ssl:
      keyFile: /mnt/tls/tls.key
      certFile: /mnt/tls/tls.crt
      caCertFile: /mnt/tls/ca.crt
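
How the operator translates these fields into Python is not shown in this guide. As a rough sketch, assuming the keys that Celery documents for Redis SSL options (ssl_cert_reqs, ssl_keyfile, ssl_certfile, ssl_ca_certs), a hypothetical helper could map the CR values like this:

```python
import ssl

# Maps the certRequired values accepted by the CR to Python's ssl constants.
_CERT_REQS = {
    "required": ssl.CERT_REQUIRED,
    "optional": ssl.CERT_OPTIONAL,
    "none": ssl.CERT_NONE,
}


def ssl_options(cert_required="required", key_file=None, cert_file=None,
                ca_cert_file=None):
    """Build a Celery-style Redis SSL options dict (hypothetical helper)."""
    options = {"ssl_cert_reqs": _CERT_REQS[cert_required]}
    # File paths are only included when set, so `ssl: {}` yields
    # encryption without client certificates.
    if key_file:
        options["ssl_keyfile"] = key_file
    if cert_file:
        options["ssl_certfile"] = cert_file
    if ca_cert_file:
        options["ssl_ca_certs"] = ca_cert_file
    return options
```

With no arguments, which corresponds to ssl: {}, the result contains only ssl_cert_reqs set to ssl.CERT_REQUIRED.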

Reserved Environment Variables

The operator sets certain env vars automatically based on the CR spec. These are organized into tiers:

| Env Var | Tier | Set by | Description |
|---|---|---|---|
| SUPERSET_OPERATOR__INSTANCE_NAME | Operator-internal | Operator (from parent CR metadata.name) | Parent CR name; available for use in raw spec.config (e.g. instance-scoped Celery queue names) |
| SUPERSET_OPERATOR__SECRET_KEY | Operator-internal | Operator (from secretKey or secretKeyFrom) | Superset session signing key |
| SUPERSET_OPERATOR__DB_URI | Operator-internal | Operator (from metastore.uri or metastore.uriFrom) | Full database connection URI |
| SUPERSET_OPERATOR__DB_HOST, SUPERSET_OPERATOR__DB_PORT, SUPERSET_OPERATOR__DB_NAME | Operator-internal | Operator (structured metastore) | Database connection fields |
| SUPERSET_OPERATOR__DB_USER, SUPERSET_OPERATOR__DB_PASS | Operator-internal | Operator (from metastore structured fields or passwordFrom) | Database credentials |
| SUPERSET_OPERATOR__VALKEY_HOST, SUPERSET_OPERATOR__VALKEY_PORT | Operator-internal | Operator (from valkey) | Valkey connection fields |
| SUPERSET_OPERATOR__VALKEY_PASS | Operator-internal | Operator (from valkey.password or valkey.passwordFrom) | Valkey password |
| SUPERSET_OPERATOR__FORCE_RELOAD | Operator-internal | Operator (from spec.forceReload) | Triggers rolling restart |
| SUPERSET_WEBSERVER_PORT | Standard | Rendered in config | Web server port (8088) |

The operator does not set PYTHONPATH — it relies on the upstream Superset image's default (which already includes /app/pythonpath, where the operator mounts the rendered superset_config.py). Custom images must preserve this entry on PYTHONPATH for the rendered config to be picked up.

Tier descriptions:

  • Operator-internal transport vars (SUPERSET_OPERATOR__ prefix) are used by the operator to pass values into the rendered superset_config.py. They are not recognized by Superset directly — the operator renders them as Python assignments (e.g., SECRET_KEY = os.environ['SUPERSET_OPERATOR__SECRET_KEY']).
  • Standard env vars have no special prefix.

Which env vars are set per metastore mode:

| Env Var | metastore.uri | metastore.uriFrom | Structured (host, ...) |
|---|---|---|---|
| SUPERSET_OPERATOR__DB_URI | Set (plain text) | Set (valueFrom) | |
| SUPERSET_OPERATOR__DB_HOST | | | Set |
| SUPERSET_OPERATOR__DB_PORT | | | Set |
| SUPERSET_OPERATOR__DB_NAME | | | Set (if database provided) |
| SUPERSET_OPERATOR__DB_USER | | | Set (if username provided) |
| SUPERSET_OPERATOR__DB_PASS | | | Set (plain text or valueFrom) |

In both passthrough and structured modes, the operator renders SQLALCHEMY_DATABASE_URI in superset_config.py from the operator-internal env vars. Passthrough mode reads from SUPERSET_OPERATOR__DB_URI, while structured mode assembles an f-string URI from the SUPERSET_OPERATOR__DB_* env vars.
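
The two modes can be pictured with the sketch below. It is an approximation, not the rendered superset_config.py. The function name build_database_uri and the default scheme argument are hypothetical. URL-encoding the credentials with quote_plus is also an assumption: this guide does not say whether the operator encodes them.

```python
import os
from urllib.parse import quote_plus


def build_database_uri(env=os.environ, scheme="postgresql+psycopg2"):
    """Resolve the metastore URI from operator-internal env vars (sketch)."""
    # Passthrough mode: the full URI arrives in a single variable.
    if "SUPERSET_OPERATOR__DB_URI" in env:
        return env["SUPERSET_OPERATOR__DB_URI"]

    # Structured mode: assemble the URI from the individual fields.
    # Encoding credentials is an assumption of this sketch.
    user = quote_plus(env.get("SUPERSET_OPERATOR__DB_USER", ""))
    password = quote_plus(env.get("SUPERSET_OPERATOR__DB_PASS", ""))
    host = env["SUPERSET_OPERATOR__DB_HOST"]
    port = env["SUPERSET_OPERATOR__DB_PORT"]
    name = env.get("SUPERSET_OPERATOR__DB_NAME", "")
    return f"{scheme}://{user}:{password}@{host}:{port}/{name}"
```

Passing a plain dict instead of os.environ makes the helper easy to exercise outside a pod.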

Which env vars are set when valkey is configured:

| Env Var | Set when |
|---|---|
| SUPERSET_OPERATOR__VALKEY_HOST | Always (from valkey.host) |
| SUPERSET_OPERATOR__VALKEY_PORT | Always (from valkey.port, default 6379) |
| SUPERSET_OPERATOR__VALKEY_PASS | valkey.password (dev, plain text) or valkey.passwordFrom (prod, valueFrom) |

Custom Python Config

The config field accepts raw Python that is appended after the operator-generated config. It is available at the top level (base config, shared by all Python components) and per component (component config).

The operator exposes a curated set of knobs as typed CRD fields — Kubernetes resources, Kubernetes Secret references, and managed connectivity that the operator can safely validate and wire across components (for example metastore URIs, Valkey-backed caches, Celery broker/backend URLs, lifecycle gating, and feature presets). Application behavior stays in config as Python. Over time, settings that prove broadly useful or error-prone may graduate from raw Python to typed fields. See Configuration Philosophy for the rationale.

spec:
  # Base config: appended to ALL Python components
  config: |
    ROW_LIMIT = 10000
    LOG_LEVEL = "INFO"

  # Component config: appended after base config for this component only
  celeryWorker:
    config: |
      CELERY_ANNOTATIONS = {"tasks.add": {"rate_limit": "10/s"}}

Both fields are concatenated, not mutually exclusive. In this example, the celery worker's superset_config.py contains the operator-generated configs (SECRET_KEY, structured DB URI if applicable), then the base config (ROW_LIMIT, LOG_LEVEL), then the component config (CELERY_ANNOTATIONS). The web server receives only the operator-generated configs and the base config, since it has no component-specific config field set.
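
Because the pieces are concatenated into a single Python file, ordinary top-to-bottom execution applies: a name assigned in a later section replaces an earlier assignment. The sketch below illustrates that ordering with a hypothetical render_config helper and made-up config strings. It does not reproduce the operator's real rendering code.

```python
def render_config(operator_generated, base="", component=""):
    """Join the three sections in order: generated, base, component."""
    parts = [operator_generated, base, component]
    # Skip empty sections and make sure each one ends with a newline.
    return "".join(p if p.endswith("\n") else p + "\n" for p in parts if p.strip())


generated = 'SECRET_KEY = "from-operator"\n'
base = 'ROW_LIMIT = 10000\nLOG_LEVEL = "INFO"\n'
worker_only = "ROW_LIMIT = 500\n"  # hypothetical per-component override

web_config = render_config(generated, base)
worker_config = render_config(generated, base, worker_only)

# Executing the worker file shows that the component section wins.
namespace = {}
exec(worker_config, namespace)
```

After execution, namespace["ROW_LIMIT"] is 500 for the worker, while the web server file still ends with the base value of 10000.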

See Config Rendering Pipeline for the full rendering order and an example of the generated output.

Extensions

Superset loads extension bundles (.supx files) from the directory configured by EXTENSIONS_PATH in superset_config.py. The operator does not currently provide a typed extension field. Use one of two deployment patterns.

Bake Extensions Into The Image

Build a Superset image that already contains the extension bundles, then point EXTENSIONS_PATH at that directory:

spec:
  image:
    repository: example.com/superset-with-extensions
    tag: "6.1.0-extensions-v1"
  config: |
    EXTENSIONS_PATH = "/app/extensions"

This is the most repeatable path when extension contents should version with the Superset image.

Mount Extensions With PodTemplate

Mount the extension directory with native Kubernetes volume fields, then point EXTENSIONS_PATH at the mount path:

spec:
  config: |
    EXTENSIONS_PATH = "/app/extensions"

  podTemplate:
    volumes:
      - name: superset-extensions
        persistentVolumeClaim:
          claimName: superset-extensions
    container:
      volumeMounts:
        - name: superset-extensions
          mountPath: /app/extensions
          readOnly: true

Top-level spec.podTemplate is inherited by every operator-managed Superset workload Pod, so the extension volume is mounted consistently across the deployment and lifecycle workloads.

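Because .supx files are binary zip archives, use a storage source suited for binary artifacts, such as the PersistentVolumeClaim shown above, rather than a size-limited source such as a ConfigMap (capped at 1 MiB).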

When you know mounted extension contents have changed, update spec.forceReload to roll the component pods:

spec:
  forceReload: "extensions-v2"

See Superset's extension deployment documentation for up-to-date guidance on deploying extensions.

Bootstrap Script

bootstrapScript is an escape hatch for the default Python component and lifecycle task commands. When set, the operator writes it as superset_bootstrap.sh in the component or lifecycle ConfigMap and sources it before the default command starts.

spec:
  bootstrapScript: |
    pip install my-superset-plugin

The top-level value applies to web server, Celery worker, Celery Beat, Celery Flower, MCP server, and lifecycle migrate, rotate, and init task Jobs. The websocket server (Node.js) and clone tasks (which run a database-tool image) do not use this script.

Components and lifecycle tasks can override the top-level script. Set the override to an empty string to disable inheritance:

spec:
  bootstrapScript: |
    pip install my-superset-plugin
  celeryWorker:
    bootstrapScript: ""
  lifecycle:
    bootstrapScript: |
      pip install migration-only-helper

If you override podTemplate.container.command or a lifecycle task command, that command is responsible for sourcing /app/pythonpath/superset_bootstrap.sh if it still needs the script. bootstrapScript is trusted shell code and is stored in the generated ConfigMap, so do not place secrets in it. For production dependency installation, a custom image is usually more repeatable than installing packages on every pod start.

Feature Flags

spec.featureFlags is a typed map of Superset feature flags rendered into superset_config.py as FEATURE_FLAGS = {...}. Keys conventionally use UPPER_SNAKE_CASE (e.g. ALERT_REPORTS, THUMBNAILS); values are booleans.

spec:
  featureFlags:
    ALERT_REPORTS: true
    THUMBNAILS: true
    DASHBOARD_NATIVE_FILTERS: true

Keys are emitted in alphabetical order so the rendered config is deterministic and config checksums stay stable across reconciles. Setting featureFlags: {} (or omitting the field) leaves FEATURE_FLAGS unrendered, falling back to upstream Superset defaults.
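
The effect of sorted output can be seen in a small sketch. The render_feature_flags function below is hypothetical, and the exact whitespace of the operator's rendered block may differ. What it demonstrates is that two maps with the same entries in a different order render to identical text, which is what keeps a checksum stable.

```python
def render_feature_flags(flags: dict[str, bool]) -> str:
    """Render FEATURE_FLAGS with keys in alphabetical order (sketch)."""
    if not flags:
        return ""  # empty or omitted map: nothing is rendered
    lines = ["FEATURE_FLAGS = {"]
    for key in sorted(flags):
        lines.append(f'    "{key}": {flags[key]!r},')
    lines.append("}")
    return "\n".join(lines) + "\n"


first = render_feature_flags({"THUMBNAILS": True, "ALERT_REPORTS": True})
second = render_feature_flags({"ALERT_REPORTS": True, "THUMBNAILS": True})
```

Here first and second are equal strings, with ALERT_REPORTS listed before THUMBNAILS in both.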

For feature flags whose values aren't booleans (rare), use raw spec.config instead.

Celery Configuration

Enable Celery workers for background tasks (caching, scheduled reports, long-running queries) by setting celeryWorker and celeryBeat. When spec.valkey is configured, the operator renders the Celery connectivity fields it can derive from the CRD: broker URL, result backend URL, and optional SSL settings.

# With Valkey: connectivity is rendered from the CRD.
spec:
  valkey:
    host: valkey.default.svc
  celeryWorker: {}
  celeryBeat: {}

What the operator renders

When spec.valkey is set, the operator renders a CeleryConfig class assigned to CELERY_CONFIG. This class intentionally contains only managed connectivity:

| Field | Source | Notes |
|---|---|---|
| broker_url | spec.valkey | f-string from operator-internal Valkey env vars |
| result_backend | spec.valkey | same |
| broker_use_ssl / redis_backend_use_ssl | spec.valkey.ssl | rendered when SSL is configured |

Application-level Celery behavior is not defaulted by the operator. Define imports, task routes, task annotations, acknowledgement behavior, beat schedules, scheduler expiration, and other Celery app settings explicitly in spec.config. This keeps the CRD focused on managed connectivity and avoids freezing Superset application defaults into the Kubernetes API.

Because assigning CELERY_CONFIG replaces Superset's own Celery config class, production deployments that enable Celery should include the Celery app settings they rely on. Put settings needed by multiple Python components in top-level spec.config; put Beat-only settings such as beat_schedule in spec.celeryBeat.config so schedule changes roll only the Celery Beat Deployment. The example below is a starting point; review the superset_config.py from the Superset version you deploy and tune it for your environment.

spec:
  valkey:
    host: valkey.default.svc
  celeryWorker: {}
  celeryBeat:
    config: |
      from celery.schedules import crontab

      CELERY_BEAT_SCHEDULER_EXPIRES = 7 * 24 * 60 * 60

      CeleryConfig.beat_schedule = {
          "reports.scheduler": {
              "task": "reports.scheduler",
              "schedule": crontab(minute="*", hour="*"),
              "options": {"expires": CELERY_BEAT_SCHEDULER_EXPIRES},
          },
          "reports.prune_log": {
              "task": "reports.prune_log",
              "schedule": crontab(minute=0, hour=0),
          },
      }
  config: |
    CeleryConfig.imports = (
        "superset.sql_lab",
        "superset.tasks.scheduler",
        "superset.tasks.thumbnails",
        "superset.tasks.cache",
        "superset.tasks.slack",
    )
    CeleryConfig.worker_prefetch_multiplier = 1
    CeleryConfig.task_acks_late = False
    CeleryConfig.task_annotations = {
        "sql_lab.get_sql_results": {
            "rate_limit": "100/s",
        },
    }

Without Valkey

When spec.valkey is unset, the operator emits no CeleryConfig class. Provide both connectivity and app behavior manually via spec.config:

spec:
  config: |
    from celery.schedules import crontab
    class CeleryConfig:
        broker_url = "redis://valkey:6379/0"
        result_backend = "redis://valkey:6379/1"
        imports = ("superset.sql_lab",)
        beat_schedule = {}
    CELERY_CONFIG = CeleryConfig
  celeryWorker: {}
  celeryBeat: {}

Custom Queues And Routes

The operator-rendered CeleryConfig is a regular Python class. Extend it from spec.config by mutating attributes, subclassing, or replacing CELERY_CONFIG outright. For instance-scoped queue naming (preventing cross-instance queue collisions on a shared broker), the operator exposes the parent CR name as SUPERSET_OPERATOR__INSTANCE_NAME:

spec:
  valkey:
    host: valkey.default.svc
  celeryWorker: {}
  celeryBeat: {}
  config: |
    import os
    from kombu import Queue

    INSTANCE = os.environ["SUPERSET_OPERATOR__INSTANCE_NAME"]

    CeleryConfig.task_queues = (
        Queue(f"{INSTANCE}-prio1", routing_key=f"{INSTANCE}-prio1.tasks", priority=0),
        Queue(f"{INSTANCE}-prio2", routing_key=f"{INSTANCE}-prio2.tasks", priority=1),
        Queue(f"{INSTANCE}-prio3", routing_key=f"{INSTANCE}-prio3.tasks", priority=2),
    )
    CeleryConfig.task_default_queue = f"{INSTANCE}-prio1"
    CeleryConfig.task_routes = {
        "sql_lab*": {"queue": f"{INSTANCE}-prio2", "routing_key": f"{INSTANCE}-prio2.tasks"},
        "cache*":   {"queue": f"{INSTANCE}-prio3", "routing_key": f"{INSTANCE}-prio3.tasks"},
        "reports*": {"queue": f"{INSTANCE}-prio3", "routing_key": f"{INSTANCE}-prio3.tasks"},
    }

Use SUPERSET_OPERATOR__INSTANCE_NAME whenever you need the parent CR name inside your Python config.

Gunicorn Configuration

The operator manages Gunicorn worker parameters for the web server by injecting environment variables that Superset's run-server.sh reads. By default, even without an explicit gunicorn field, the operator injects balanced defaults (2 workers, 8 threads, gthread worker class).

Presets control workers, threads, and workerClass. All other fields have static defaults that you can override individually.

| Field | conservative | balanced (default) | performance | aggressive |
|---|---|---|---|---|
| workers | 1 | 2 | 4 | 8 |
| threads | 4 | 8 | 8 | 16 |
| workerClass | gthread | gthread | gthread | gthread |

Set preset: disabled to suppress env var injection entirely — Superset's run-server.sh built-in defaults will apply instead.

spec:
  webServer:
    gunicorn:
      preset: performance      # 4 workers, 8 threads
      timeout: 120             # override static default (60)
      maxRequests: 1000        # enable worker recycling
      maxRequestsJitter: 50

The full set of configurable fields (static defaults in parentheses): timeout (60), keepAlive (2), maxRequests (0 = disabled), maxRequestsJitter (0), limitRequestLine (0 = unlimited), limitRequestFieldSize (0 = unlimited), logLevel (info).
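
When choosing a preset, it can help to estimate how many requests one web server pod can handle at once. With the gthread worker class, each worker process serves up to its thread count concurrently, so workers × threads is a reasonable upper bound. The sketch below encodes the preset table. The dictionary and function names are hypothetical.

```python
# Preset values copied from the Gunicorn preset table.
GUNICORN_PRESETS = {
    "conservative": {"workers": 1, "threads": 4},
    "balanced": {"workers": 2, "threads": 8},
    "performance": {"workers": 4, "threads": 8},
    "aggressive": {"workers": 8, "threads": 16},
}


def max_concurrent_requests(preset="balanced", workers=None, threads=None):
    """Upper bound on in-flight requests per pod for gthread workers."""
    values = GUNICORN_PRESETS[preset]
    # Explicit field values, when given, take the place of preset values.
    effective_workers = workers if workers is not None else values["workers"]
    effective_threads = threads if threads is not None else values["threads"]
    return effective_workers * effective_threads
```

For instance, the balanced default gives 16 and the performance preset gives 32. Multiply by the replica count to estimate the capacity of the whole Deployment.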

Celery Worker Configuration

The operator constructs the celery worker command from structured fields instead of the hardcoded default. Presets control concurrency and pool type:

| Field | conservative | balanced (default) | performance | aggressive |
|---|---|---|---|---|
| concurrency | 2 | 4 | 8 | 16 |
| pool | prefork | prefork | prefork | prefork |

Set preset: disabled to use the operator's built-in fallback command (--pool=prefork -O fair -c 4).

spec:
  celeryWorker:
    celery:
      preset: performance        # 8 concurrency, prefork
      maxTasksPerChild: 1000     # recycle workers after 1000 tasks
      softTimeLimit: 3600        # 1h soft limit (raises SoftTimeLimitExceeded)
      timeLimit: 7200            # 2h hard kill

Additional fields (static defaults in parentheses): optimization (fair), maxTasksPerChild (0 = unlimited), maxMemoryPerChild (0 = disabled), prefetchMultiplier (4), softTimeLimit (0 = disabled), timeLimit (0 = disabled).
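
The exact command line the operator emits is not listed in this guide. The sketch below shows one plausible mapping from the structured fields to standard Celery worker CLI options (--pool, -O, -c, --max-tasks-per-child, --max-memory-per-child, --soft-time-limit, --time-limit). The function name worker_args is hypothetical, and treating a value of 0 as "omit the flag" is an assumption of this sketch.

```python
# Concurrency per preset, copied from the preset table.
CELERY_PRESETS = {"conservative": 2, "balanced": 4, "performance": 8, "aggressive": 16}


def worker_args(preset="balanced", concurrency=None, pool="prefork",
                optimization="fair", max_tasks_per_child=0,
                max_memory_per_child=0, soft_time_limit=0, time_limit=0):
    """Build Celery worker CLI arguments from structured fields (sketch)."""
    effective = concurrency if concurrency is not None else CELERY_PRESETS[preset]
    args = [f"--pool={pool}", "-O", optimization, "-c", str(effective)]
    # A value of 0 means "unlimited" or "disabled", so the flag is left out.
    if max_tasks_per_child:
        args.append(f"--max-tasks-per-child={max_tasks_per_child}")
    if max_memory_per_child:
        args.append(f"--max-memory-per-child={max_memory_per_child}")
    if soft_time_limit:
        args.append(f"--soft-time-limit={soft_time_limit}")
    if time_limit:
        args.append(f"--time-limit={time_limit}")
    return args
```

With default arguments the result matches the fallback flags quoted above: --pool=prefork -O fair -c 4.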

SQLAlchemy Engine Options

The operator renders SQLALCHEMY_ENGINE_OPTIONS in each component's superset_config.py, with pool sizing computed from the component's execution model. By default (balanced preset), all components get sensible pool settings without any explicit configuration.

Presets control poolClass, poolSize, and maxOverflow:

| Preset | Pool class | pool_size | max_overflow |
|---|---|---|---|
| disabled | (no rendering) | | |
| conservative | NullPool | | |
| balanced (default) | QueuePool | 1 (web/celery/flower), 5 (MCP) | -1 (unlimited) |
| performance | QueuePool | workers (web), concurrency (celery), 1 (flower), 10 (MCP) | -1 |
| aggressive | QueuePool | workers × threads (web), concurrency (celery), 1 (flower), 20 (MCP) | -1 |

CeleryBeat and lifecycle tasks always use NullPool regardless of preset (singleton/short-lived components with minimal DB interaction). CeleryFlower uses standard pool sizing (defaults to 1 for performance/aggressive since it has no worker configuration).
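
The sizing rules can be summarized in a short function. This is a reading of the table and the paragraph above, not the operator's implementation. The function name pool_size, the component identifiers, and the choice to return None whenever no QueuePool size applies are assumptions of this sketch.

```python
MCP_POOL_SIZES = {"balanced": 5, "performance": 10, "aggressive": 20}


def pool_size(preset, component, workers=2, threads=8, concurrency=4):
    """Return the computed pool_size, or None when no QueuePool size applies."""
    # disabled renders nothing; conservative, Beat, and lifecycle use NullPool.
    if preset in ("disabled", "conservative"):
        return None
    if component in ("celeryBeat", "lifecycle"):
        return None
    if component == "mcpServer":
        return MCP_POOL_SIZES[preset]
    # Flower has no worker configuration, so it stays at 1 for every preset.
    if preset == "balanced" or component == "celeryFlower":
        return 1
    if component == "webServer":
        return workers if preset == "performance" else workers * threads
    if component == "celeryWorker":
        return concurrency
    raise ValueError(f"unknown component: {component}")
```

For a web server with 4 workers and 8 threads, this gives 4 under performance and 32 under aggressive.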

spec.sqlaEngineOptions sets the baseline for all Python components. Per-component sqlaEngineOptions on webServer, celeryWorker, celeryBeat, mcpServer, or lifecycle replaces the top-level entirely (override semantics, not merge).

spec:
  sqlaEngineOptions:
    preset: balanced             # applies to all components
    poolRecycle: 1800            # override static default (3600)
  webServer:
    gunicorn:
      preset: performance        # 4 workers, 8 threads
    sqlaEngineOptions:
      preset: performance        # pool_size=4 (gunicorn workers)
  celeryWorker:
    celery:
      concurrency: 12
    sqlaEngineOptions:
      preset: aggressive         # pool_size=12 (celery concurrency)
  celeryBeat: {}                 # always NullPool

Static defaults (same regardless of preset, overridable per-field): poolRecycle (3600), poolPrePing (false), poolTimeout (omitted — SQLAlchemy default 30s).

Individual field overrides take precedence over the preset computation:

spec:
  sqlaEngineOptions:
    preset: balanced
    poolSize: 10                 # explicit: overrides preset calculation
    poolPrePing: true            # explicit: overrides static default

Websocket Server

Experimental

The websocket server is experimental and pending security hardening. It is not yet well supported and may exhibit gaps, either in the operator (e.g. unvalidated path-based gateway/ingress routing) or upstream in the Node.js websocket image. It requires a custom Node.js image (below). Treat its spec and behavior as subject to change, and avoid enabling it in production until it is hardened. See the security reference for details.

Enable Superset's async event streaming by setting websocketServer. This deploys a Node.js application (not Python) that pushes real-time updates to dashboards via WebSocket connections.

Requires a dedicated image

The websocket server is a separate Node.js application and does not run from the default Superset image. You must provide an image that contains websocket_server.js — the CRD enforces this with a CEL rule that rejects websocketServer set without an image.repository override. A community-maintained image is available at oneacrefund/superset-websocket (experimental, not officially supported by Apache Superset).

spec:
  websocketServer:
    image:
      repository: oneacrefund/superset-websocket
      tag: "latest"

Because the websocket server is Node.js-based, it does not receive a superset_config.py, and sqlaEngineOptions is not available on this component. Configuration can be provided with environment variables, inline Development-only config, or a Secret-backed configFrom.

spec:
  websocketServer:
    image:
      repository: oneacrefund/superset-websocket
      tag: "latest"
    podTemplate:
      container:
        env:
          - name: SUPERSET_WEBSERVER_URL
            value: "http://my-superset-web-server:8088"

Inline config renders config.json and mounts it at /home/superset-websocket/config.json. It is allowed only in Development mode because websocket config commonly contains jwtSecret or Redis credentials:

spec:
  environment: Development
  websocketServer:
    image:
      repository: oneacrefund/superset-websocket
      tag: "latest"
    config:
      port: 8080
      logLevel: debug
      jwtSecret: CHANGE-ME
      jwtCookieName: async-token
      redis:
        host: redis.default.svc
        port: 6379
        db: 0

In Staging and Production, store config.json in a Secret and reference it:

spec:
  websocketServer:
    image:
      repository: oneacrefund/superset-websocket
      tag: "latest"
    configFrom:
      name: superset-websocket-config
      key: config.json

The operator mounts the Secret key without reading or copying the Secret. If the Secret content changes, update spec.forceReload to roll websocket pods.

The operator creates a Service for the websocket server (default port 8080), and the component supports the same scaling, deployment template, and pod template fields as other scalable components.

MCP Server

Enable the Model Context Protocol server by setting mcpServer. This deploys a Python-based FastMCP server that exposes Superset's API via MCP, allowing AI assistants and LLM-based tools to interact with Superset:

spec:
  mcpServer: {}

The MCP server receives a superset_config.py with core config (SECRET_KEY, structured DB URI if applicable) and top-level/per-component config — but not web server port. It runs as a separate Deployment with its own Service (port 8088). The MCP server supports per-component sqlaEngineOptions with higher default pool sizes than other components (5 for balanced, 10 for performance, 20 for aggressive) to accommodate concurrent tool invocations.

Health Probes

spec:
  webServer:
    podTemplate:
      container:
        livenessProbe:
          httpGet:
            path: /health
            port: 8088
          initialDelaySeconds: 15
          periodSeconds: 15
        readinessProbe:
          httpGet:
            path: /health
            port: 8088
          initialDelaySeconds: 15
          periodSeconds: 15

Security Context

Security context can be set at the top level (shared by all components) or overridden per component:

spec:
  podTemplate:
    podSecurityContext:
      runAsUser: 1000
      runAsNonRoot: true
      fsGroup: 1000
    container:
      securityContext:
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false

Or override per component (replaces the top-level value):

spec:
  webServer:
    podTemplate:
      podSecurityContext:
        runAsUser: 1000
        runAsNonRoot: true

Autoscaling (HPA)

spec:
  webServer:
    autoscaling:
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 75

When HPA is configured, the replicas field is ignored (HPA manages scaling).

Pod Disruption Budget

spec:
  webServer:
    podDisruptionBudget:
      minAvailable: 1
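
Autoscaling and a disruption budget are often set together. The sketch below combines only the fields shown in the two sections above; the numbers are illustrative:

```yaml
spec:
  webServer:
    autoscaling:
      minReplicas: 2             # the HPA never scales below two pods
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 75
    podDisruptionBudget:
      minAvailable: 1            # lower than minReplicas, so a node drain can still evict one pod
```

Keeping minAvailable below minReplicas leaves room for voluntary evictions; if the two were equal, a drain could block at the minimum replica count.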

Component Enable/Disable

All components follow the same rule: presence = enabled, absence = disabled. Set a component's spec to enable it (use {} for defaults); omit it or set it to null to disable it:

spec:
  webServer:
    replicas: 2          # enabled with 2 replicas
  celeryWorker:
    replicas: 3          # enabled with 3 replicas
  celeryBeat: {}         # enabled with defaults
  celeryFlower: {}       # enabled with defaults
  websocketServer: null  # disabled (or omit entirely)
  mcpServer: {}          # enabled with defaults

A Superset CR with no components enabled is valid — the operator will run initialization (if not disabled) but deploy no workloads. The parent status will report Phase: Running with condition reason NoComponentsEnabled.
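
As an illustration, a spec with no component keys might look like the following sketch. The Secret names and keys are reused from the production example earlier in this guide and are placeholders:

```yaml
spec:
  image:
    tag: "6.1.0"
  secretKeyFrom:
    name: superset-secret
    key: secret-key
  metastore:
    uriFrom:
      name: db-credentials
      key: connection-string
  # No webServer, celeryWorker, celeryBeat, celeryFlower, websocketServer,
  # or mcpServer keys: initialization runs (if not disabled), but no
  # workloads are deployed.
```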

Resource Names

Component resources are named {parentName}-{componentType}. For example, a parent named my-superset creates resources such as my-superset-web-server, my-superset-celery-worker, and my-superset-mcp-server. ConfigMaps add the -config suffix, for example my-superset-web-server-config. Lifecycle task Jobs use deterministic names based on {parentName}-{taskName}, such as my-superset-migrate.

The parent name must be a valid DNS label: lowercase alphanumeric and hyphens only (^[a-z0-9]([-a-z0-9]*[a-z0-9])?$), at most 63 characters. Since sub-resource names append a component suffix, the parent name is further constrained by the longest enabled component's suffix. The longest suffix is -websocket-server (17 characters), so parent names must be at most 46 characters when websocket-server is enabled. CRD validation enforces the appropriate limit for each enabled component.
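
To make the naming rules concrete, here is a sketch with a hypothetical parent name, analytics-prod. The resulting names in the comments follow the {parentName}-{componentType} pattern described above:

```yaml
metadata:
  name: analytics-prod     # hypothetical name, 14 characters (limit is 46 with websocketServer enabled)
spec:
  webServer: {}            # -> analytics-prod-web-server, ConfigMap analytics-prod-web-server-config
  websocketServer: {}      # -> analytics-prod-websocket-server (14 + 17 = 31 characters)
```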

Deployment Template

The deploymentTemplate and podTemplate fields configure the Kubernetes Deployment and Pod for each component. They mirror the Kubernetes hierarchy as siblings:

deploymentTemplate                  → DeploymentSpec-level
podTemplate                         → PodSpec-level
└── container                       → main container

deploymentTemplate carries Deployment-level fields: strategy, revisionHistoryLimit, minReadySeconds, progressDeadlineSeconds, and labels/annotations for the Deployment object's own metadata. Deployment labels/annotations are merged by key (component wins over top-level); operator-managed labels are applied last and cannot be overridden. They land on the Deployment metadata only, so changing them does not roll the pods — for pod metadata use podTemplate.labels/annotations.
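
The difference between the two metadata targets can be sketched as follows. The annotation keys are hypothetical and only illustrate where each one lands:

```yaml
spec:
  webServer:
    deploymentTemplate:
      annotations:
        example.com/owner: data-platform    # hypothetical key; Deployment metadata only, changing it does not roll pods
    podTemplate:
      annotations:
        example.com/config-version: "3"     # hypothetical key; pod metadata, so changing it rolls the pods
```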

Three usage patterns

1. Omit entirely — use operator defaults (most users start here):

spec:
  image:
    tag: "6.1.0"
  webServer:
    replicas: 2

2. Set top-level defaults — apply to all components:

spec:
  deploymentTemplate:
    revisionHistoryLimit: 3
  podTemplate:
    terminationGracePeriodSeconds: 60
    nodeSelector:
      workload: superset
    container:
      resources:
        limits:
          cpu: "2"
          memory: "4Gi"
      env:
        - name: LOG_LEVEL
          value: INFO

  webServer:
    replicas: 2
  celeryWorker:
    replicas: 4

All components inherit the deployment template, pod template (node selector, termination grace period), and container template (resources, env vars).

3. Per-component customization — field-level merge with top-level:

spec:
  deploymentTemplate:
    revisionHistoryLimit: 3
  podTemplate:
    container:
      resources:
        limits:
          cpu: "2"
          memory: "4Gi"
      env:
        - name: LOG_LEVEL
          value: INFO

  webServer:
    replicas: 2
    deploymentTemplate:
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1
          maxUnavailable: 0
    podTemplate:
      container:
        resources:
          limits:
            cpu: "4"             # replaces entire resources struct
            memory: "8Gi"
        command: ["gunicorn"]
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 15"]

  celeryWorker:
    replicas: 8
    podTemplate:
      container:
        env:
          - name: CELERY_CONCURRENCY
            value: "8"           # merged with top-level LOG_LEVEL env var

Merge semantics

Per-component deploymentTemplate and podTemplate are each field-level merged independently with the top-level — you only specify what's different.

| Behavior | Fields |
| --- | --- |
| Component wins if set | resources (both pod-level and container-level), affinity, securityContext, podSecurityContext, priorityClassName, strategy, revisionHistoryLimit, probes, lifecycle, dnsPolicy, dnsConfig, runtimeClassName, shareProcessNamespace, enableServiceLinks, terminationGracePeriodSeconds, minReadySeconds, progressDeadlineSeconds |
| Merge by name | env, volumes, volumeMounts, sidecars, initContainers |
| Merge by key | annotations, labels, nodeSelector, hostAliases (by IP) |
| Append | tolerations, topologySpreadConstraints, envFrom |
| No inheritance | command, args (component-only, not inherited from top-level) |

Note on append fields: tolerations, topologySpreadConstraints, and envFrom are concatenated (top-level first, then component-level) without deduplication. To avoid duplicates in the final pod spec, define each entry at one level only — typically top-level for shared entries and component-level for component-specific ones.
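
A sketch of the append behavior for tolerations, following the one-level-per-entry advice. The taint keys and values are hypothetical:

```yaml
spec:
  podTemplate:
    tolerations:
      - key: dedicated           # hypothetical taint key, shared by all components
        operator: Equal
        value: superset
        effect: NoSchedule
  webServer: {}
  celeryWorker:
    podTemplate:
      tolerations:
        - key: spot              # hypothetical taint key, worker-specific
          operator: Exists
          effect: NoSchedule
# Effective tolerations per the append rule:
#   webServer:    dedicated
#   celeryWorker: dedicated, spot   (top-level first, then component-level)
```

Repeating the dedicated toleration under celeryWorker would produce it twice in the worker pod spec, since appended lists are not deduplicated.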

Available fields

Deployment level (deploymentTemplate.*):

| Field | Description |
| --- | --- |
| revisionHistoryLimit | Old ReplicaSets to retain for rollback |
| minReadySeconds | Seconds before a pod is considered available |
| progressDeadlineSeconds | Seconds before a rollout is considered failed |
| strategy | Update strategy (RollingUpdate or Recreate) |

Pod level (podTemplate.*):

| Field | Description |
| --- | --- |
| annotations | Pod annotations (merged with operator-managed annotations) |
| labels | Pod labels (merged; operator labels cannot be overridden) |
| affinity | Pod/node affinity and anti-affinity |
| tolerations | Node tolerations (appended) |
| nodeSelector | Node label selector (merged by key) |
| topologySpreadConstraints | Topology spread constraints (appended) |
| hostAliases | /etc/hosts entries (merged by IP) |
| podSecurityContext | Pod-level security context |
| priorityClassName | Pod priority class |
| volumes | Volumes (merged by name with operator-injected volumes) |
| sidecars | Sidecar containers (merged by name) |
| initContainers | Init containers (merged by name) |
| terminationGracePeriodSeconds | Grace period for pod shutdown |
| dnsPolicy | DNS resolution policy |
| dnsConfig | Custom DNS configuration |
| runtimeClassName | RuntimeClass (e.g., gVisor, Kata) |
| shareProcessNamespace | Share PID namespace between containers |
| enableServiceLinks | Inject service environment variables |
| resources | Pod-level resource requirements (Kubernetes 1.34+, requires PodLevelResources feature gate) |

Container level (podTemplate.container.*):

| Field | Description |
| --- | --- |
| resources | CPU/memory requests and limits |
| env | Environment variables (merged by name) |
| envFrom | ConfigMap/Secret env sources (appended) |
| volumeMounts | Volume mounts (merged by name) |
| ports | Container ports (replaces operator defaults when set; the first resolved port is used as the Service targetPort, the ingress port for the operator-managed NetworkPolicy, and as the target port for any default probes the user did not override) |
| securityContext | Container-level security context |
| command | Container entrypoint (no inheritance) |
| args | Container arguments (no inheritance) |
| livenessProbe | Liveness probe |
| readinessProbe | Readiness probe |
| startupProbe | Startup probe |
| lifecycle | preStop/postStart lifecycle hooks |

Pod-level resources (Kubernetes 1.34+): When podTemplate.resources is set, it defines the total resource budget for the entire pod, enabling resource sharing among containers (main + sidecars). Container-level podTemplate.container.resources remains available for per-container limits.

spec:
  podTemplate:
    resources:
      requests:
        cpu: "4"
        memory: "8Gi"
      limits:
        cpu: "8"
        memory: "16Gi"
    container:
      resources:
        requests:
          cpu: "2"
          memory: "4Gi"

Force Reload

Trigger a rolling restart of all components:

spec:
  forceReload: "2026-03-14T12:00:00Z"

Change the value to any new string to trigger a restart. This is primarily useful for secret rotation: when you update a Kubernetes Secret's data, pods don't restart automatically, because the operator references Secrets via valueFrom.secretKeyRef, which is resolved only at pod creation time. Changing forceReload rolls out new pods that pick up the updated Secret values.

Use the same mechanism when the contents of mounted extensions have changed.

Suspend Reconciliation

Temporarily pause reconciliation without deleting resources:

spec:
  suspend: true

When suspended, the operator stops all reconciliation — no lifecycle task Jobs run, no component resources are created or updated, and no resources are deleted. Set suspend: false (or remove the field) to resume.

Connecting PostgreSQL and Valkey

The operator does not manage database or cache infrastructure. Use one of these approaches:

Managed Services

Set connection details via secretKeyFrom, metastore.uriFrom, and valkey:

spec:
  secretKeyFrom:
    name: superset-secrets
    key: secret-key
  metastore:
    uriFrom:
      name: superset-db
      key: uri
  valkey:
    host: valkey.default.svc
    passwordFrom:
      name: valkey-credentials
      key: password

CloudNativePG

Use CloudNativePG for PostgreSQL:

# CloudNativePG Cluster (separate CR)
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: superset-pg
spec:
  instances: 3
  storage:
    size: 10Gi

Then reference the connection secret via metastore.uriFrom on your Superset CR.
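
A sketch of that reference, assuming CloudNativePG's default application Secret for the cluster above is named superset-pg-app and exposes the connection string under a uri key. Confirm both against the Secret in your cluster, and check that the URI scheme is one Superset's SQLAlchemy driver accepts:

```yaml
spec:
  metastore:
    uriFrom:
      name: superset-pg-app   # assumption: CloudNativePG application Secret, <cluster>-app
      key: uri                # assumption: key that holds the full connection URI
```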

Redis Operator

Use the Redis Operator or the Bitnami Redis Helm chart to run Redis or Valkey, then configure the connection via the valkey field.
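
A minimal sketch of the valkey block for a self-hosted instance. The Service name, Secret name, and key are hypothetical; substitute whatever your chart or operator actually creates:

```yaml
spec:
  valkey:
    host: redis-master.default.svc   # hypothetical Service name
    passwordFrom:
      name: redis                    # hypothetical Secret name
      key: redis-password            # hypothetical key within that Secret
```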