Cassandra schema manager on Kubernetes

Reddeppa S
2 min read · Dec 18, 2021

I recently deployed Cassandra on an Azure Kubernetes Service (AKS) cluster by following the steps at https://docs.k8ssandra.io/install/aks/ . K8ssandra provides a production-ready Apache Cassandra cluster on Kubernetes. The required tools, such as Reaper for repairs, Medusa for backups, and monitoring for Cassandra, are integrated as part of the cluster setup.

K8ssandra also provides good documentation on upgrading Apache Cassandra from 3.11 to 4.0 and above. Since the Helm charts take care of all your deployments, scaling up the cluster and upgrading Apache Cassandra are straightforward.

A couple of things still require a manual trigger: backups and schema upgrades. I have developed a Python script and a Helm chart for managing the Cassandra schema. The script runs as a CronJob and parses the given config maps, which should follow the format given below.

test_schema.json: |-
  {
    "keyspacename": "test",
    "schemaversions": {
      "1": [
        "CREATE TABLE IF NOT EXISTS test.schema_migration (applied_successful boolean, version int, script_name varchar, script text, executed_at timestamp, PRIMARY KEY (applied_successful, version))"
      ],
      "2": [
        "CREATE TABLE IF NOT EXISTS test.schema_migration_leader (keyspace_name text, leader uuid, took_lead_at timestamp, leader_hostname text, PRIMARY KEY (keyspace_name))"
      ]
    }
  }
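To make the format concrete, here is a minimal Python sketch of how one config map entry could be parsed. It is an illustration, not the script from the repo: the function name load_schema_config and the shortened CQL statements are assumptions made for the example. It relies only on the keyspacename and schemaversions keys shown above.

```python
import json

# Example entry in the format shown above. The CQL statements are
# shortened placeholders, not the ones from the article.
EXAMPLE_CONFIG = """
{
  "keyspacename": "test",
  "schemaversions": {
    "1": ["CREATE TABLE IF NOT EXISTS test.t1 (id int PRIMARY KEY)"],
    "2": ["CREATE TABLE IF NOT EXISTS test.t2 (id int PRIMARY KEY)"],
    "10": ["ALTER TABLE test.t1 ADD name text"]
  }
}
"""


def load_schema_config(raw):
    """Parse one config map entry (hypothetical helper).

    Returns the keyspace name and a list of (version, statements)
    tuples sorted by numeric version.
    """
    doc = json.loads(raw)
    keyspace = doc["keyspacename"]
    # JSON object keys are strings, so convert them to int before
    # sorting; as strings, "10" would sort before "2".
    versions = sorted(
        ((int(v), stmts) for v, stmts in doc["schemaversions"].items()),
        key=lambda item: item[0],
    )
    return keyspace, versions


keyspace, versions = load_schema_config(EXAMPLE_CONFIG)
for version, statements in versions:
    print(keyspace, version, len(statements))
```

Sorting by the integer value of the version key matters once there are ten or more versions, because the keys are stored as strings in JSON.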

The Helm chart and script are available in the GitHub repo.

The first time the job runs, it creates a keyspace_versions table in the schema manager keyspace. It then parses the config maps and executes the CQL queries. On later runs, the script updates the versions recorded in the keyspace_versions table. If a new version is available in a config map, the CronJob runs the script and upgrades the schema to the latest version.
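The upgrade logic described here can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual script: the names pending_versions and migrate are hypothetical, the last applied version is passed in as an argument instead of being read from the keyspace_versions table, and execute stands for any callable that runs a CQL string (for example, the execute method of a driver session).

```python
def pending_versions(schema_versions, applied_version):
    """Return (version, statements) pairs newer than applied_version,
    in ascending numeric order (hypothetical helper)."""
    ordered = sorted(schema_versions.items(), key=lambda kv: int(kv[0]))
    return [(int(v), stmts) for v, stmts in ordered if int(v) > applied_version]


def migrate(schema_versions, applied_version, execute):
    """Run every pending statement in order and return the new version.

    In the real script the applied version would be read from and
    written to the keyspace_versions table; here it is only returned.
    """
    for version, statements in pending_versions(schema_versions, applied_version):
        for cql in statements:
            execute(cql)
        applied_version = version
    return applied_version


# Usage: collect the statements in a list instead of sending them to Cassandra.
schema_versions = {
    "1": ["CREATE TABLE IF NOT EXISTS test.t1 (id int PRIMARY KEY)"],
    "2": ["CREATE TABLE IF NOT EXISTS test.t2 (id int PRIMARY KEY)"],
}
executed = []
new_version = migrate(schema_versions, applied_version=1, execute=executed.append)
print(new_version, executed)
```

Because only versions above the recorded one are executed, running the job again with an unchanged config map does nothing.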

The script requires the following environment variables to be set in the CronJob:

CONTACT_POINTS

A list of node hostnames or IP addresses with which to create initial connections. In Kubernetes, the headless service for the Cassandra statefulset can be used instead.

REPLICATION

The replication settings, e.g.,

{'class': 'NetworkTopologyStrategy', 'dc1': 1}
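As a rough sketch of how these two variables might be consumed, the Python below reads them and builds a CREATE KEYSPACE statement. Several details are assumptions made for illustration: that CONTACT_POINTS is a comma-separated string, that REPLICATION is written as a Python-style dict literal as in the example above, and the names read_settings, create_keyspace_cql, schema_manager, and the example service hostname.

```python
import ast
import os


def read_settings(env=None):
    """Read CONTACT_POINTS and REPLICATION (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: hosts are separated by commas.
    contact_points = [h.strip() for h in env["CONTACT_POINTS"].split(",") if h.strip()]
    # literal_eval only accepts Python literals, so arbitrary code is rejected.
    replication = ast.literal_eval(env["REPLICATION"])
    if not isinstance(replication, dict) or "class" not in replication:
        raise ValueError("REPLICATION must be a dict with a 'class' key")
    return contact_points, replication


def create_keyspace_cql(keyspace, replication):
    """Build a CREATE KEYSPACE statement from the replication settings."""
    # repr() quotes strings with single quotes and leaves integers bare,
    # which matches the CQL map literal syntax for simple values.
    options = ", ".join(f"'{k}': {v!r}" for k, v in replication.items())
    return f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = {{{options}}}"


# Usage with an explicit dict so the example runs without real env vars.
# The hostname is a made-up headless service name.
example_env = {
    "CONTACT_POINTS": "cassandra-headless.example.svc.cluster.local",
    "REPLICATION": "{'class': 'NetworkTopologyStrategy', 'dc1': 1}",
}
contact_points, replication = read_settings(example_env)
cql = create_keyspace_cql("schema_manager", replication)
print(contact_points)
print(cql)
```

The contact points list would then be handed to the Cassandra driver when creating the cluster connection.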

I hope this helps manage the Cassandra schema in Kubernetes.
