Azure Kubernetes Service (AKS) and Apache Superset: An Overview

What is Azure Kubernetes Service (AKS)?

Azure Kubernetes Service (AKS) is a managed container orchestration service provided by Microsoft Azure. AKS simplifies the deployment, management, and operation of Kubernetes, an open-source platform designed to automate the deployment, scaling, and management of containerized applications.

Key Features of AKS:

  • Managed Kubernetes: Azure operates the Kubernetes control plane, including its upgrades, patching, and scaling, so you only manage the worker nodes and the workloads that run on them.

  • Integrated DevOps: AKS integrates seamlessly with Azure DevOps, providing continuous integration and continuous deployment (CI/CD) capabilities.

  • Scalability and High Availability: AKS enables you to scale your applications seamlessly and maintain high availability with automatic failover and load balancing.

  • Security and Compliance: AKS integrates with Azure Active Directory (AAD, now Microsoft Entra ID) for role-based access control (RBAC) and provides options for network security, data encryption, and compliance with industry standards.

  • Cost-Effectiveness: AKS optimizes resource usage and reduces costs by allowing you to scale your Kubernetes clusters based on demand.

What is Apache Superset?

Apache Superset is an open-source data exploration and visualization platform. It allows users to create and share interactive dashboards, slice and dice data, and build complex visualizations without needing extensive programming skills.

Key Features of Apache Superset:

  • Data Connectivity: Superset connects to a wide range of databases, including relational databases (e.g., MySQL, PostgreSQL) and big data platforms (e.g., Apache Druid, Presto).

  • Interactive Dashboards: Users can create, customize, and share dashboards that are both visually appealing and data-rich.

  • SQL Editor: Superset includes a SQL editor with syntax highlighting, autocomplete, and query history, enabling advanced users to perform in-depth data exploration.

  • Security and Authentication: Superset supports granular access control, allowing you to manage user roles and permissions effectively.

  • Extensibility: The platform is extensible via plugins, allowing developers to add new visualizations and data sources.

How AKS and Apache Superset Work Together

Deploying Apache Superset on Azure Kubernetes Service (AKS) allows you to combine the benefits of a modern BI tool with the robust infrastructure and management capabilities of Azure’s Kubernetes offering.

Why Use AKS for Superset?

  • Scalability: Running Superset on AKS allows you to scale your BI environment according to your needs, adding Superset replicas and cluster nodes as the number of users grows from a few to thousands.

  • High Availability: When Superset runs with multiple replicas on a managed Kubernetes service like AKS, failed pods are rescheduled and updates are rolled out gradually, which helps keep your BI platform available during maintenance or unexpected failures.

  • Security: AKS offers a secure environment for your Superset deployment, with integrated Azure security services like Azure Active Directory (AAD) and Azure Key Vault.

  • Operational Efficiency: AKS automates many of the operational aspects of running Kubernetes, allowing your team to focus on building dashboards and visualizing data rather than managing infrastructure.

Use Cases for Superset on AKS:

  • Enterprise BI Solutions: Large organizations can use Superset on AKS to provide a scalable, secure, and reliable BI platform that integrates with their existing data infrastructure.

  • Cloud-Native Analytics: Companies adopting cloud-native architectures can leverage AKS to run Superset alongside other microservices, ensuring smooth data operations across their cloud environment.

  • Data-Driven Applications: Superset on AKS can be used to embed data visualizations into applications, providing users with real-time insights directly within their workflows.

Conclusion

Using Azure Kubernetes Service (AKS) to deploy and manage Apache Superset provides a powerful combination of scalability, security, and ease of management. Organizations can leverage the capabilities of AKS to ensure their Superset deployment is robust, responsive, and aligned with best practices in cloud-native architecture.

Kubernetes Configuration for Apache Superset Deployment on AKS

The deployment of Apache Superset on Azure Kubernetes Service (AKS) uses Helm, a package manager for Kubernetes that simplifies the management of Kubernetes applications. This configuration is intended to keep the Superset deployment scalable, secure, and manageable within the AKS environment.

Key Components of the Configuration

AKS Cluster

The Kubernetes cluster is provisioned on Azure, using Azure Kubernetes Service (AKS). AKS manages the Kubernetes control plane, while the worker nodes (agent nodes) are managed within your Azure subscription.

Node Pools

The AKS cluster can have multiple node pools, allowing for the separation of workloads and scaling based on demand. Each node pool can be configured with different VM sizes and scaling policies.

Namespace

The deployment of Superset is isolated within a specific Kubernetes namespace, ensuring that the resources are separated from other applications and services running in the cluster. This provides a logical partition for managing Superset’s resources, such as Pods, Services, and ConfigMaps.

Helm Chart Configuration

Helm

Helm is used to deploy Superset on the Kubernetes cluster. A Helm chart is a package of templated Kubernetes manifests that define the deployment, service, ingress, and other resources for Superset.

Values.yaml

The values.yaml file is customized to specify the environment-specific configurations, such as resource limits, replica counts, database connections, and more. This allows for consistent and repeatable deployments across different environments (e.g., development, staging, production).
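As a minimal sketch of such an override file, the snippet below expands a few dotted keys from the chart's Values table into the nested structure a values file expects and prints it as JSON, which Helm should accept with -f because JSON is a subset of YAML. The chosen overrides are illustrative assumptions, not settings taken from an actual deployment.

```python
import json


def nest(flat):
    """Expand dotted keys such as 'supersetNode.replicaCount' into nested dicts."""
    nested = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# Hypothetical production overrides; the key names come from the Values table.
overrides = nest({
    "supersetNode.replicaCount": 2,
    "supersetWorker.replicaCount": 2,
    "init.loadExamples": False,
    "service.type": "ClusterIP",
})

print(json.dumps(overrides, indent=2))
```

The simple split on "." does not handle keys that contain dots inside a quoted segment, such as the init.jobAnnotations entries, so those would need to be written by hand.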

Superset Configuration

Replicas

The number of replicas for the Superset service can be specified to ensure high availability. Multiple instances of Superset can run simultaneously, managed by Kubernetes, to handle load and provide redundancy.

Environment Variables

Superset’s configuration parameters, such as database connection strings, authentication settings, and feature flags, are provided through environment variables. These variables can be securely stored and managed using Kubernetes secrets.
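The following is a hedged sketch of how such variables might be read inside a superset_config.py override. SECRET_KEY and FEATURE_FLAGS are standard Superset settings; the environment variable names and the env_flag helper are assumptions made for illustration.

```python
import os


def env_flag(name, default=False, environ=os.environ):
    """Interpret an environment variable as a boolean switch."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Assumed variable names; in the cluster they would be injected from a Kubernetes secret.
# An empty SECRET_KEY should be treated as a misconfiguration, not as a usable default.
SECRET_KEY = os.environ.get("SUPERSET_SECRET_KEY", "")
FEATURE_FLAGS = {
    "ALERT_REPORTS": env_flag("ENABLE_ALERT_REPORTS"),
}
```

Keeping the parsing in one small helper means every flag is interpreted the same way, whichever secret or values entry supplies it.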

Database Configuration

Persistent Storage

Superset requires a backend database to store metadata, dashboards, and other configurations. This can be deployed as a managed database service (e.g., Azure Database for PostgreSQL) or as a StatefulSet within Kubernetes, with persistent volume claims ensuring data persistence.

Connection Strings

The connection string for the Superset database is specified within the Helm chart’s values.yaml or through Kubernetes secrets. This connection allows Superset to interact with its database securely.
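To illustrate how the individual connection settings combine into one connection string, here is a small sketch that assembles a SQLAlchemy-style URI. The function name, the sample password, and the host (which assumes a Helm release named "superset") are hypothetical; the driver prefix assumes PostgreSQL accessed through psycopg2.

```python
from urllib.parse import quote_plus


def build_database_uri(user, password, host, port, name,
                       driver="postgresql+psycopg2"):
    """Assemble a SQLAlchemy-style URI, escaping characters that are special in URLs."""
    return (
        f"{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{name}"
    )


# The sample password contains '@' and '/' to show why escaping matters.
uri = build_database_uri("superset", "p@ss/word", "superset-postgresql", "5432", "superset")
print(uri)
```

Without the escaping step, a password containing "@" or "/" would be misread as part of the host or path.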

Ingress Controller

Ingress Resource

An ingress resource is configured to manage external access to the Superset service. This typically includes hostname-based routing, TLS termination, and path-based routing.
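A possible shape for the corresponding values block is sketched below, using the ingress.* keys listed in the Values table. The host name, TLS secret name, and ingress class are placeholders, and the choice of the Prefix path type instead of the chart default is an assumption.

```python
import json


def ingress_values(host, tls_secret, ingress_class="nginx"):
    """Return an 'ingress' values block with one host and TLS terminated at the ingress."""
    return {
        "ingress": {
            "enabled": True,
            "ingressClassName": ingress_class,
            "path": "/",
            "pathType": "Prefix",
            "hosts": [host],
            "tls": [{"secretName": tls_secret, "hosts": [host]}],
        }
    }


# Hypothetical host and secret names.
values = ingress_values("superset.example.com", "superset-tls")
print(json.dumps(values, indent=2))
```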

Azure Application Gateway (Optional)

If using Azure’s Application Gateway as the ingress controller, it can be integrated with the AKS cluster to provide advanced routing, SSL offloading, and Web Application Firewall (WAF) features.

TLS/SSL Configuration

Certificates

TLS certificates secure communications between clients and the Superset service. Certificates can be provisioned and managed in Azure Key Vault, with automatic rotation, and synchronized into Kubernetes secrets.

Secret Management

Kubernetes secrets are used to store sensitive information, such as TLS certificates, database credentials, and API keys. These secrets are injected into the Superset containers as environment variables or mounted as files.
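As a rough sketch of what such a secret looks like, the snippet below builds a v1 Secret manifest and base64-encodes its values, as the data field requires. The secret name, namespace, and key are assumed for illustration only.

```python
import base64
import json


def secret_manifest(name, namespace, string_values):
    """Build a v1 Secret manifest; values under 'data' must be base64-encoded."""
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in string_values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": encoded,
    }


# Hypothetical names and a placeholder credential.
manifest = secret_manifest("superset-env", "superset", {"DB_PASS": "superset"})
print(json.dumps(manifest, indent=2))
```

Base64 is an encoding, not encryption, so a manifest like this should not be committed to source control with real credentials.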

Scaling and Autoscaling

Horizontal Pod Autoscaler (HPA)

HPA can be configured to automatically scale the number of Superset pods based on CPU or memory usage. This ensures that the service can handle increased load while optimizing resource usage.
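The scaling decision follows the rule documented for the Kubernetes HPA: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with no change when the ratio is within a small tolerance of 1. The sketch below applies that rule; the replica bounds and the sample utilization figures are assumptions.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Apply the HPA scaling rule and clamp the result to the configured bounds."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: no scaling
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


# Two pods averaging 90% CPU against a 60% target.
print(desired_replicas(2, current_metric=90, target_metric=60))  # prints 3
```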

Cluster Autoscaler

AKS cluster autoscaler can be enabled to automatically adjust the number of nodes in the cluster based on the demands of the workloads, providing additional capacity as needed.

Monitoring and Logging

Azure Monitor

Azure Monitor and Azure Log Analytics can be integrated with the AKS cluster to collect and analyze logs, metrics, and traces from the Superset deployment. This provides insights into application performance, resource utilization, and potential issues.

Prometheus/Grafana (Optional)

If using a custom monitoring stack, Prometheus and Grafana can be deployed alongside Superset to provide real-time monitoring and alerting.

Security and Compliance

Role-Based Access Control (RBAC)

AKS integrates with Azure Active Directory (AAD) to provide RBAC for Kubernetes resources. This ensures that only authorized users can manage and interact with the Superset deployment.

Network Policies

Network policies can be implemented to control the flow of traffic between the Superset service and other services within the cluster. This enhances security by limiting exposure to only necessary components.
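One way to express such a rule is sketched below as a NetworkPolicy manifest that admits traffic to the Superset pods only from pods carrying a given label. The policy name, namespace, and label values are hypothetical; port 8088 matches the service.port default in the Values table.

```python
import json


def allow_ingress_policy(name, namespace, pod_labels, from_labels, port):
    """NetworkPolicy admitting TCP traffic to selected pods only from labelled pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": pod_labels},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [{"podSelector": {"matchLabels": from_labels}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


# Hypothetical labels; check the labels your chart release actually sets.
policy = allow_ingress_policy(
    "superset-allow-ingress", "superset",
    {"app": "superset"}, {"app": "ingress-controller"}, 8088,
)
print(json.dumps(policy, indent=2))
```

A bare podSelector only matches pods in the policy's own namespace, so an ingress controller running in a different namespace would also need a namespaceSelector. Network policies only take effect when the cluster has a network policy engine enabled.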

Azure Policy

Azure Policy can enforce compliance with organizational policies and industry standards across the AKS cluster.

Conclusion

This configuration for deploying Apache Superset on Azure Kubernetes Service (AKS) provides a scalable, secure, and manageable solution for running business intelligence (BI) applications in a cloud-native environment. By leveraging Kubernetes and Helm, the deployment is flexible and can be tailored to meet specific needs, from development to production environments. The integration with Azure services further enhances the deployment, ensuring that it aligns with best practices in cloud infrastructure and security.

Requirements

Repository                            Name         Version
https://charts.bitnami.com/bitnami    postgresql   12.1.6
https://charts.bitnami.com/bitnami    redis        17.9.4

Values

Key Type Default Description
affinity object {}  
bootstrapScript string see values.yaml Install additional packages and do any other bootstrap configuration in this script. For production clusters it is recommended to build your own image with this step done in CI.
configFromSecret string "{{ template \"superset.fullname\" . }}-config" The name of the secret used to generate the superset_config.py file. Note: this secret must contain the key superset_config.py and can include other files as well.
configMountPath string "/app/pythonpath"  
configOverrides object {} A dictionary of overrides to append at the end of superset_config.py; the name does not matter. WARNING: the order is not guaranteed. Files can be passed as helm --set-file configOverrides.my-override=my-file.py
configOverridesFiles object {} Same as above but the values are files
envFromSecret string "{{ template \"superset.fullname\" . }}-env" The name of the secret used to populate env vars in deployed pods. This can be useful for secret keys, etc.
envFromSecrets list [] This can be a list of templated strings
extraConfigMountPath string "/app/configs"  
extraConfigs object {} Extra files to mount on /app/pythonpath
extraEnv object {} Extra environment variables that will be passed into pods
extraEnvRaw list [] Extra environment variables in RAW format that will be passed into pods
extraSecretEnv object {} Extra environment variables to pass as secrets
extraSecrets object {} Extra files to mount on /app/pythonpath as secrets
extraVolumeMounts list []  
extraVolumes list []  
fullnameOverride string nil Provide a name to override the full names of resources
hostAliases list [] Custom hostAliases for all superset pods. See https://kubernetes.io/docs/tasks/network/customize-hosts-file-for-pods/
image.pullPolicy string "IfNotPresent"  
image.repository string "apachesuperset.docker.scarf.sh/apache/superset"  
image.tag string ""  
imagePullSecrets list []  
ingress.annotations object {}  
ingress.enabled bool false  
ingress.extraHostsRaw list []  
ingress.hosts[0] string "chart-example.local"  
ingress.ingressClassName string nil  
ingress.path string "/"  
ingress.pathType string "ImplementationSpecific"  
ingress.tls list []  
init.adminUser.email string "admin@superset.com"  
init.adminUser.firstname string "Superset"  
init.adminUser.lastname string "Admin"  
init.adminUser.password string "admin"  
init.adminUser.username string "admin"  
init.affinity object {}  
init.command list a superset_init.sh command Command
init.containerSecurityContext object {}  
init.createAdmin bool true  
init.enabled bool true  
init.initContainers list a container waiting for postgres List of initContainers
init.initscript string a script to create admin user and initialize roles A Superset init script
init.jobAnnotations."helm.sh/hook" string "post-install,post-upgrade"  
init.jobAnnotations."helm.sh/hook-delete-policy" string "before-hook-creation"  
init.loadExamples bool false  
init.podAnnotations object {}  
init.podSecurityContext object {}  
init.resources object {}  
init.tolerations list []  
init.topologySpreadConstraints list [] TopologySpreadConstraints to be added to init job
initImage.pullPolicy string "IfNotPresent"  
initImage.repository string "apache/superset"  
initImage.tag string "dockerize"  
nameOverride string nil Provide a name to override the name of the chart
nodeSelector object {}  
postgresql object see values.yaml Configuration values for the postgresql dependency. ref: https://github.com/bitnami/charts/tree/main/bitnami/postgresql
redis object see values.yaml Configuration values for the Redis dependency. ref: https://github.com/bitnami/charts/blob/master/bitnami/redis More documentation can be found here: https://artifacthub.io/packages/helm/bitnami/redis
resources object {}  
runAsUser int 0 User ID directive. This user must have enough permissions to run the bootstrap script. Running containers as root is not recommended in production; change this to another UID (e.g. 1000) to be more secure.
service.annotations object {}  
service.loadBalancerIP string nil  
service.nodePort.http int "nil"  
service.port int 8088  
service.type string "ClusterIP"  
serviceAccount.annotations object {}  
serviceAccount.create bool false Create custom service account for Superset. If create: true and serviceAccountName is not provided, superset.fullname will be used.
serviceAccountName string nil Specify service account name to be used
supersetCeleryBeat.affinity object {} Affinity to be added to supersetCeleryBeat deployment
supersetCeleryBeat.command list a celery beat command Command
supersetCeleryBeat.containerSecurityContext object {}  
supersetCeleryBeat.deploymentAnnotations object {} Annotations to be added to supersetCeleryBeat deployment
supersetCeleryBeat.enabled bool false This is only required if you intend to use alerts and reports
supersetCeleryBeat.forceReload bool false If true, forces deployment to reload on each upgrade
supersetCeleryBeat.initContainers list a container waiting for postgres List of init containers
supersetCeleryBeat.podAnnotations object {} Annotations to be added to supersetCeleryBeat pods
supersetCeleryBeat.podLabels object {} Labels to be added to supersetCeleryBeat pods
supersetCeleryBeat.podSecurityContext object {}  
supersetCeleryBeat.resources object {} Resource settings for the CeleryBeat pods - these settings might overwrite existing values from the global resources object defined above.
supersetCeleryBeat.topologySpreadConstraints list [] TopologySpreadConstraints to be added to supersetCeleryBeat deployments
supersetCeleryFlower.affinity object {} Affinity to be added to supersetCeleryFlower deployment
supersetCeleryFlower.command list a celery flower command Command
supersetCeleryFlower.containerSecurityContext object {}  
supersetCeleryFlower.deploymentAnnotations object {} Annotations to be added to supersetCeleryFlower deployment
supersetCeleryFlower.enabled bool false Enables a Celery flower deployment (management UI to monitor celery jobs). WARNING: on Superset 1.x, this requires a Superset image that has flower<1.0.0 installed (which is NOT the case for the default images). flower>=1.0.0 requires Celery 5+, which Superset 1.5 does not support.
supersetCeleryFlower.initContainers list a container waiting for postgres and redis List of init containers
supersetCeleryFlower.livenessProbe.failureThreshold int 3  
supersetCeleryFlower.livenessProbe.httpGet.path string "/api/workers"  
supersetCeleryFlower.livenessProbe.httpGet.port string "flower"  
supersetCeleryFlower.livenessProbe.initialDelaySeconds int 5  
supersetCeleryFlower.livenessProbe.periodSeconds int 5  
supersetCeleryFlower.livenessProbe.successThreshold int 1  
supersetCeleryFlower.livenessProbe.timeoutSeconds int 1  
supersetCeleryFlower.podAnnotations object {} Annotations to be added to supersetCeleryFlower pods
supersetCeleryFlower.podLabels object {} Labels to be added to supersetCeleryFlower pods
supersetCeleryFlower.podSecurityContext object {}  
supersetCeleryFlower.readinessProbe.failureThreshold int 3  
supersetCeleryFlower.readinessProbe.httpGet.path string "/api/workers"  
supersetCeleryFlower.readinessProbe.httpGet.port string "flower"  
supersetCeleryFlower.readinessProbe.initialDelaySeconds int 5  
supersetCeleryFlower.readinessProbe.periodSeconds int 5  
supersetCeleryFlower.readinessProbe.successThreshold int 1  
supersetCeleryFlower.readinessProbe.timeoutSeconds int 1  
supersetCeleryFlower.replicaCount int 1  
supersetCeleryFlower.resources object {} Resource settings for the CeleryFlower pods - these settings might overwrite existing values from the global resources object defined above.
supersetCeleryFlower.service.annotations object {}  
supersetCeleryFlower.service.loadBalancerIP string nil  
supersetCeleryFlower.service.nodePort.http int "nil"  
supersetCeleryFlower.service.port int 5555  
supersetCeleryFlower.service.type string "ClusterIP"  
supersetCeleryFlower.startupProbe.failureThreshold int 60  
supersetCeleryFlower.startupProbe.httpGet.path string "/api/workers"  
supersetCeleryFlower.startupProbe.httpGet.port string "flower"  
supersetCeleryFlower.startupProbe.initialDelaySeconds int 5  
supersetCeleryFlower.startupProbe.periodSeconds int 5  
supersetCeleryFlower.startupProbe.successThreshold int 1  
supersetCeleryFlower.startupProbe.timeoutSeconds int 1  
supersetCeleryFlower.topologySpreadConstraints list [] TopologySpreadConstraints to be added to supersetCeleryFlower deployments
supersetNode.affinity object {} Affinity to be added to supersetNode deployment
supersetNode.command list See values.yaml Startup command
supersetNode.connections.db_host string "{{ .Release.Name }}-postgresql"  
supersetNode.connections.db_name string "superset"  
supersetNode.connections.db_pass string "superset"  
supersetNode.connections.db_port string "5432"  
supersetNode.connections.db_user string "superset"  
supersetNode.connections.redis_host string "{{ .Release.Name }}-redis-headless" Change this if you bring your own Redis, and then also set redis.enabled: false
supersetNode.connections.redis_port string "6379"  
supersetNode.containerSecurityContext object {}  
supersetNode.deploymentAnnotations object {} Annotations to be added to supersetNode deployment
supersetNode.deploymentLabels object {} Labels to be added to supersetNode deployment
supersetNode.env object {}  
supersetNode.extraContainers list [] Launch additional containers into supersetNode pod
supersetNode.forceReload bool false If true, forces deployment to reload on each upgrade
supersetNode.initContainers list a container waiting for postgres Init containers
supersetNode.livenessProbe.failureThreshold int 3  
supersetNode.livenessProbe.httpGet.path string "/health"  
supersetNode.livenessProbe.httpGet.port string "http"  
supersetNode.livenessProbe.initialDelaySeconds int 15  
supersetNode.livenessProbe.periodSeconds int 15  
supersetNode.livenessProbe.successThreshold int 1  
supersetNode.livenessProbe.timeoutSeconds int 1  
supersetNode.podAnnotations object {} Annotations to be added to supersetNode pods
supersetNode.podLabels object {} Labels to be added to supersetNode pods
supersetNode.podSecurityContext object {}  
supersetNode.readinessProbe.failureThreshold int 3  
supersetNode.readinessProbe.httpGet.path string "/health"  
supersetNode.readinessProbe.httpGet.port string "http"  
supersetNode.readinessProbe.initialDelaySeconds int 15  
supersetNode.readinessProbe.periodSeconds int 15  
supersetNode.readinessProbe.successThreshold int 1  
supersetNode.readinessProbe.timeoutSeconds int 1  
supersetNode.replicaCount int 1  
supersetNode.resources object {} Resource settings for the supersetNode pods - these settings might overwrite existing values from the global resources object defined above.
supersetNode.startupProbe.failureThreshold int 60  
supersetNode.startupProbe.httpGet.path string "/health"  
supersetNode.startupProbe.httpGet.port string "http"  
supersetNode.startupProbe.initialDelaySeconds int 15  
supersetNode.startupProbe.periodSeconds int 5  
supersetNode.startupProbe.successThreshold int 1  
supersetNode.startupProbe.timeoutSeconds int 1  
supersetNode.strategy object {}  
supersetNode.topologySpreadConstraints list [] TopologySpreadConstraints to be added to supersetNode deployments
supersetWebsockets.affinity object {} Affinity to be added to supersetWebsockets deployment
supersetWebsockets.command list []  
supersetWebsockets.config object see values.yaml The config.json to pass to the server (see https://github.com/apache/superset/tree/master/superset-websocket). The configuration can also be read from environment variables, which take priority (see https://github.com/apache/superset/blob/master/superset-websocket/src/config.ts for a list of supported variables)
supersetWebsockets.containerSecurityContext object {}  
supersetWebsockets.deploymentAnnotations object {}  
supersetWebsockets.enabled bool false This is only required if you intend to use GLOBAL_ASYNC_QUERIES in ws mode; see https://github.com/apache/superset/blob/master/CONTRIBUTING.md#async-chart-queries
supersetWebsockets.image.pullPolicy string "IfNotPresent"  
supersetWebsockets.image.repository string "oneacrefund/superset-websocket" There is no official image (yet), this one is community-supported
supersetWebsockets.image.tag string "latest"  
supersetWebsockets.ingress.path string "/ws"  
supersetWebsockets.ingress.pathType string "Prefix"  
supersetWebsockets.livenessProbe.failureThreshold int 3  
supersetWebsockets.livenessProbe.httpGet.path string "/health"  
supersetWebsockets.livenessProbe.httpGet.port string "ws"  
supersetWebsockets.livenessProbe.initialDelaySeconds int 5  
supersetWebsockets.livenessProbe.periodSeconds int 5  
supersetWebsockets.livenessProbe.successThreshold int 1  
supersetWebsockets.livenessProbe.timeoutSeconds int 1  
supersetWebsockets.podAnnotations object {}  
supersetWebsockets.podLabels object {}  
supersetWebsockets.podSecurityContext object {}  
supersetWebsockets.readinessProbe.failureThreshold int 3  
supersetWebsockets.readinessProbe.httpGet.path string "/health"  
supersetWebsockets.readinessProbe.httpGet.port string "ws"  
supersetWebsockets.readinessProbe.initialDelaySeconds int 5  
supersetWebsockets.readinessProbe.periodSeconds int 5  
supersetWebsockets.readinessProbe.successThreshold int 1  
supersetWebsockets.readinessProbe.timeoutSeconds int 1  
supersetWebsockets.replicaCount int 1  
supersetWebsockets.resources object {}  
supersetWebsockets.service.annotations object {}  
supersetWebsockets.service.loadBalancerIP string nil  
supersetWebsockets.service.nodePort.http int "nil"  
supersetWebsockets.service.port int 8080  
supersetWebsockets.service.type string "ClusterIP"  
supersetWebsockets.startupProbe.failureThreshold int 60  
supersetWebsockets.startupProbe.httpGet.path string "/health"  
supersetWebsockets.startupProbe.httpGet.port string "ws"  
supersetWebsockets.startupProbe.initialDelaySeconds int 5  
supersetWebsockets.startupProbe.periodSeconds int 5  
supersetWebsockets.startupProbe.successThreshold int 1  
supersetWebsockets.startupProbe.timeoutSeconds int 1  
supersetWebsockets.strategy object {}  
supersetWebsockets.topologySpreadConstraints list [] TopologySpreadConstraints to be added to supersetWebsockets deployments
supersetWorker.affinity object {} Affinity to be added to supersetWorker deployment
supersetWorker.command list a celery worker command Worker startup command
supersetWorker.containerSecurityContext object {}  
supersetWorker.deploymentAnnotations object {} Annotations to be added to supersetWorker deployment
supersetWorker.deploymentLabels object {} Labels to be added to supersetWorker deployment
supersetWorker.extraContainers list [] Launch additional containers into supersetWorker pod
supersetWorker.forceReload bool false If true, forces deployment to reload on each upgrade
supersetWorker.initContainers list a container waiting for postgres and redis Init container
supersetWorker.livenessProbe.exec.command list a celery inspect ping command Liveness probe command
supersetWorker.livenessProbe.failureThreshold int 3  
supersetWorker.livenessProbe.initialDelaySeconds int 120  
supersetWorker.livenessProbe.periodSeconds int 60  
supersetWorker.livenessProbe.successThreshold int 1  
supersetWorker.livenessProbe.timeoutSeconds int 60  
supersetWorker.podAnnotations object {} Annotations to be added to supersetWorker pods
supersetWorker.podLabels object {} Labels to be added to supersetWorker pods
supersetWorker.podSecurityContext object {}  
supersetWorker.readinessProbe object {} No startup/readiness probes by default since we don’t really care about its startup time (it doesn’t serve traffic)
supersetWorker.replicaCount int 1  
supersetWorker.resources object {} Resource settings for the supersetWorker pods - these settings might overwrite existing values from the global resources object defined above.
supersetWorker.startupProbe object {} No startup/readiness probes by default since we don’t really care about its startup time (it doesn’t serve traffic)
supersetWorker.strategy object {}  
supersetWorker.topologySpreadConstraints list [] TopologySpreadConstraints to be added to supersetWorker deployments
tolerations list []  
topologySpreadConstraints list [] TopologySpreadConstraints to be added to all deployments
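
The startup probe defaults in the table imply a time budget for each component to become healthy before Kubernetes restarts its container. A rough way to estimate that budget, ignoring the per-probe timeout, is sketched below; the function name is hypothetical.

```python
def startup_budget_seconds(initial_delay, period, failure_threshold):
    """Approximate time a container has to pass its startup probe before a restart."""
    return initial_delay + period * failure_threshold


# Defaults listed for supersetNode.startupProbe.*
print(startup_budget_seconds(initial_delay=15, period=5, failure_threshold=60))  # prints 315
```

If Superset needs longer than this to start, for example because of a slow bootstrap script, raising failureThreshold is usually the gentler adjustment because it leaves the probing interval unchanged.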