kubernetes

Run InfluxDB v1 on Kubernetes? Swap the Operator, Keep the URLs.

Your InfluxDB v1 Kubernetes stack is already half the battle—Helm charts, StatefulSets, monitoring, alerting. HyperbyteDB's operator drops in as a replacement. Same InfluxDB v1 API endpoints, same client configs, same Grafana dashboards.

Austin

02 Jun 2026 • 6 min read

Your InfluxDB v1 Kubernetes deployment probably looks like this: a Helm chart for the operator or StatefulSet, a few manifests for Telegraf sidecars, a Grafana datasource pointing at port 8086, and Prometheus scraping the /metrics endpoint. You've tuned the flush intervals, set up PodDisruptionBudgets, maybe wired up alerting for pod restarts. The hard parts are done.

The problem isn't the deployment—it's the database underneath. InfluxDB v1 OSS gives you a single node with no replication. InfluxDB Enterprise gives you replication, but only after a sales call and a licensing conversation. Either way, you're stuck with TSM storage that crawls at high cardinality and a single-writer model that doesn't survive a node failure without manual intervention.

HyperbyteDB solves this with a Kubernetes operator that drops into your existing cluster, exposes the same InfluxDB v1 HTTP endpoints, and lets your Telegraf, Grafana, and custom clients keep working with one change: the URL they point at.

The operator you already know how to deploy

HyperbyteDB's operator installs via Helm from an OCI registry:

kubectl create namespace hyperbytedb-system

helm install hyperbytedb-operator \
  oci://ghcr.io/hyperbyte-cloud/hyperbytedb-operator \
  --namespace hyperbytedb-system

Prerequisites are Kubernetes 1.26+ and Helm 3.12+—the same versions your existing cluster likely runs. The operator registers three CRDs: HyperbytedbCluster, HyperbytedbBackup, and HyperbytedbRestore. No sidecar injection, no webhook gymnastics, no custom admission controllers beyond what the operator itself manages.

You can also install with a single YAML manifest if Helm isn't your thing:

kubectl apply -f https://raw.githubusercontent.com/hyperbyte-cloud/hyperbytedb-operator/main/dist/install.yaml

Declarative clusters with HyperbytedbCluster

The HyperbytedbCluster custom resource is the entire cluster spec. The operator reconciles it into StatefulSets, headless and client Services, PVCs, ConfigMaps, and optional monitoring resources. Here's a three-node cluster to start from:

apiVersion: hyperbytedb.hyperbyte.cloud/v1alpha1
kind: HyperbytedbCluster
metadata:
  name: hyperbytedb-cluster
  namespace: default
spec:
  replicas: 3
  image: hyperbytedb:latest
  version: "1.0.0"
  server:
    port: 8086
    requestTimeoutSecs: 30
    queryTimeoutSecs: 30
  storage:
    backend: local
    volumeClaimTemplate:
      size: 10Gi
  flush:
    intervalSecs: 10
    walSizeThresholdMb: 64
  cluster:
    heartbeatIntervalSecs: 2
    heartbeatMissThreshold: 5
    replicationMaxRetries: 5
    raftHeartbeatIntervalMs: 300
    raftElectionTimeoutMs: 1000
  monitoring:
    enabled: true
    serviceMonitor: true
  failover:
    enabled: true
    maxFailoverCount: 1
    failoverTimeoutSecs: 300

That's it. The operator creates the StatefulSet, the headless Service for peer discovery, a client Service for routing traffic, and wires up Raft consensus from the StatefulSet ordinals. No manual peer registration, no bootstrap scripts.
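
To make "from the StatefulSet ordinals" concrete: Kubernetes gives every StatefulSet pod a stable DNS name of the form pod-name.headless-service.namespace.svc.cluster-domain, where the pod name is the StatefulSet name plus an ordinal. A peer list can therefore be derived from the replica count alone. The Python sketch below shows that derivation. The headless Service name, the assumption that the StatefulSet shares the cluster's name, and the use of port 8086 for peer traffic are all illustrative guesses, not documented operator behavior.

```python
def peer_addresses(statefulset, headless_service, namespace, replicas,
                   port=8086, cluster_domain="cluster.local"):
    """Return the stable per-pod DNS addresses Kubernetes assigns to a StatefulSet."""
    return [
        # Pod names are <statefulset>-<ordinal>, with ordinals starting at 0.
        f"{statefulset}-{ordinal}.{headless_service}.{namespace}.svc.{cluster_domain}:{port}"
        for ordinal in range(replicas)
    ]


if __name__ == "__main__":
    # "hyperbytedb-cluster-headless" is a hypothetical Service name.
    for address in peer_addresses("hyperbytedb-cluster",
                                  "hyperbytedb-cluster-headless", "default", 3):
        print(address)
```

Because the names depend only on the ordinal, a pod that restarts comes back under the same address, which is what makes bootstrap scripts unnecessary.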

Scaling is a kubectl command away:

kubectl scale hyperbytedbcluster hyperbytedb-cluster --replicas=5

The operator adds nodes, triggers Raft membership changes, and waits for each new replica to sync before marking it ready. Rolling upgrades work the same way: change the version field and the operator upgrades one pod at a time, waiting for readiness before proceeding.

Health-aware routing with hyperbytedb-proxy

Here's where HyperbyteDB's Kubernetes story goes beyond a plain drop-in replacement. The operator can deploy hyperbytedb-proxy, a health-aware HTTP reverse proxy that sits between your clients and the database pods.

The proxy does three things your standard Kubernetes Service can't:

Health-aware routing. The proxy probes each backend on the /health endpoint and classifies backends as Active, Draining, Down, or Unknown. Only Active backends receive traffic. A backend returning 503 with a drain-style body is treated as retryable—your request moves to the next healthy pod automatically.
Hold-and-wait during rolling restarts. When all backends are temporarily Down or Draining (say, during a rolling upgrade), the proxy doesn't immediately return 503. It waits up to holdTimeoutSecs (default 10 seconds) for an Active backend to appear. If a pod comes back healthy during the hold window, the request routes through without the client ever seeing an error.
Retry across backends. On transport errors, upstream 502/504, or 503 responses that look like drain/sync events, the proxy retries on another backend up to max_retries (default 2). With the default config, a single request can be forwarded up to three times—initial try plus two retries.
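
If it helps to see the three behaviors together, here is a small Python model of them. It is a sketch of the logic described above, not the proxy's actual code. The "drain-style body" check (looking for the words drain or sync), the polling interval, and the choice to retry only on distinct Active backends are assumptions made for illustration.

```python
import time

RETRYABLE_STATUSES = {502, 504}


def is_retryable(status, body):
    """502/504 always retry; 503 retries only if the body looks like a drain."""
    if status in RETRYABLE_STATUSES:
        return True
    # Assumption: a drain-style body mentions draining or syncing.
    return status == 503 and any(word in body.lower() for word in ("drain", "sync"))


def wait_for_active(list_backends, hold_timeout_secs=10.0, poll_secs=0.1,
                    clock=time.monotonic, sleep=time.sleep):
    """Hold the request until a backend is Active or the hold window expires."""
    deadline = clock() + hold_timeout_secs
    while True:
        active = [name for name, state in list_backends() if state == "Active"]
        if active or clock() >= deadline:
            return active
        sleep(poll_secs)


def forward(list_backends, send, max_retries=2, **hold_kwargs):
    """Return (status, body, attempts) after at most max_retries + 1 forwards."""
    active = wait_for_active(list_backends, **hold_kwargs)
    if not active:
        return 503, "no active backend", 0
    status, body, attempts = 503, "no active backend", 0
    for name in active[: max_retries + 1]:
        attempts += 1
        try:
            status, body = send(name)
        except ConnectionError as exc:  # transport error: move to the next backend
            status, body = 502, str(exc)
            continue
        if not is_retryable(status, body):
            break
    return status, body, attempts
```

Here list_backends is any callable returning (name, state) pairs and send is any callable that forwards the request to one backend. With the default max_retries of 2, forward never calls send more than three times.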

The proxy exposes Kubernetes-native probe endpoints:

/healthz — Liveness probe. Always 200 once the process is up.
/readyz — Readiness probe. Returns 200 only when at least one backend is Active. Configure your Kubernetes readiness probe against this so the proxy Service doesn't route traffic until the database is actually healthy.
/admin/backends — JSON snapshot of the backend pool with health state, inflight request counts, and probe stats.

Enable the proxy in your cluster manifest by setting spec.proxy.enabled: true. The operator creates a separate Deployment and Service for the proxy, configured via environment variables that it injects from the cluster spec.

Your clients don't change

The proxy listens on port 8086 and forwards InfluxDB v1 endpoints unchanged. Point your Telegraf configs at http://hyperbytedb-proxy.default.svc:8086. Your Grafana datasources keep using the InfluxDB v1 plugin, now pointed at that URL. Your custom scripts still POST line protocol to /write?db=mydb and query via /query?db=mydb.

The InfluxDB v1 HTTP API is the surface area. Underneath, HyperbyteDB runs ClickHouse (via embedded chDB) for queries, Parquet for columnar storage, and RocksDB for the write-ahead log and metadata. But that's invisible to your clients. The wire protocol is the contract.

For Telegraf specifically: the standard InfluxDB output plugin works without modification. Batch sizes, write intervals, and timestamps transfer directly. The same is true for any tool that POSTs line protocol to /write?db=name—Python requests, Go net/http, bash curl, it all works.

High availability without the licensing conversation

InfluxDB Enterprise charges per-node for replication and high availability. HyperbyteDB includes master-master replication in the open-source operator. The cluster spec above gives you:

Async or sync_quorum replication. The default is async—writes fan out to all peers, and the client gets a 204 as soon as the local WAL append succeeds. Set cluster.replication.mode: sync_quorum if you need acknowledgment from a quorum of nodes before returning success.
Raft consensus for cluster membership. Schema mutations and membership changes go through Raft. Data writes use master-master HTTP fan-out for throughput.
Automatic failover. The operator detects unhealthy members (configurable via failoverTimeoutSecs, minimum 60 seconds) and deletes the pod so the StatefulSet controller recreates it. If you're using the proxy, the hold-and-wait semantics keep your clients alive during the failover window.
Self-repair. Nodes that rejoin after a failure sync via a manifest comparison protocol—the WAL sequence and Parquet file inventory are compared, and only the gap is transferred. No full resync required.
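
The self-repair idea is easiest to see as a set difference. The Python sketch below is a toy model of a manifest comparison under simple assumptions: each node tracks one highest WAL sequence number and a set of Parquet file names. The real protocol's data structures are not described here, so treat every name in this block as hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    wal_sequence: int         # highest WAL sequence number the node holds
    parquet_files: frozenset  # names of the Parquet files the node holds


def sync_plan(local, peer):
    """Return (missing_files, wal_range) that a rejoining node must fetch."""
    # Only files the peer has and the local node lacks need to be copied.
    missing_files = sorted(peer.parquet_files - local.parquet_files)
    wal_range = None
    if peer.wal_sequence > local.wal_sequence:
        # Inclusive range of WAL entries written while the node was away.
        wal_range = (local.wal_sequence + 1, peer.wal_sequence)
    return missing_files, wal_range
```

A node that was down briefly ends up with a short file list and a narrow WAL range, which is why a full resync is not needed.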

For production workloads, the high-availability manifest adds S3 storage, autoscaling, and zone-aware scheduling:

apiVersion: hyperbytedb.hyperbyte.cloud/v1alpha1
kind: HyperbytedbCluster
metadata:
  name: hyperbytedb-ha
  namespace: default
spec:
  replicas: 5
  storage:
    backend: s3
    s3:
      bucket: hyperbytedb-data
      prefix: "production/"
      region: us-east-1
      credentialsSecretName: hyperbytedb-s3-credentials
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule

S3 storage means your data survives node loss entirely. Autoscaling means you don't overprovision for peak load. Zone-aware scheduling means a single availability zone failure doesn't take out all your replicas.

Monitoring that plugs into your existing stack

HyperbyteDB exposes Prometheus metrics on port 8086, and the operator optionally creates a ServiceMonitor resource when monitoring.serviceMonitor: true is set. Your existing Prometheus Operator picks it up without manual scrape config edits, provided its ServiceMonitor selector matches.

For Grafana, set monitoring.grafanaDashboard: true and the operator creates a ConfigMap with the grafana_dashboard label. If your Grafana instance is configured to auto-discover dashboard ConfigMaps, the dashboards appear without manual import.

The proxy also exposes its own Prometheus metrics at /metrics: hyperbytedb_proxy_requests_total (with outcomes ok, fatal, exhausted), hyperbytedb_proxy_request_duration_seconds, and hyperbytedb_proxy_no_backend_total for tracking hold-window expirations.
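
As a quick way to sanity-check those counters outside Prometheus, the Python sketch below parses the text exposition format and computes the share of requests that did not end in ok. The label name outcome is an assumption (the metric's label key is not stated above), so adjust it to whatever your /metrics output actually shows.

```python
import re

SAMPLE = re.compile(r'^hyperbytedb_proxy_requests_total\{([^}]*)\}\s+([0-9.eE+-]+)')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def outcome_counts(metrics_text, label="outcome"):
    """Sum hyperbytedb_proxy_requests_total samples per value of `label`."""
    counts = {}
    for line in metrics_text.splitlines():
        match = SAMPLE.match(line.strip())
        if not match:
            continue  # comments, other metrics, or samples without labels
        labels = dict(LABEL.findall(match.group(1)))
        key = labels.get(label)
        if key is not None:
            counts[key] = counts.get(key, 0.0) + float(match.group(2))
    return counts


def failure_ratio(counts):
    """Fraction of requests whose outcome was anything other than ok."""
    total = sum(counts.values())
    if not total:
        return 0.0
    return (total - counts.get("ok", 0.0)) / total
```

In practice you would express the same ratio as a PromQL alert rule; the script is only meant for spot checks with the output of a curl against the proxy's /metrics endpoint.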

Backups and restores as first-class CRDs

The operator includes HyperbytedbBackup and HyperbytedbRestore custom resources. Backups can be one-shot or scheduled via cron expression, targeting S3 with configurable retention. Restores handle the full lifecycle: scale down the cluster, copy data from the backup, scale back up. Point-in-time restore is supported via restoreTimestamp.

The migration path

If you're running InfluxDB v1 on Kubernetes today, the migration is a three-step process:

Install the operator. One Helm command. Your existing cluster keeps running.
Deploy a HyperbytedbCluster. Point it at the same namespace, same storage class, same monitoring stack. The operator creates the StatefulSet and Services.
Switch the URL. Update your Telegraf output plugin, Grafana datasource, and custom client configs to point at the HyperbyteDB Service. If you're using the proxy, point at the proxy Service instead.

Existing data in InfluxDB v1 requires export and import—there's no direct TSM file migration. Use influx_inspect export to dump your data to line protocol, then POST it to HyperbyteDB's /write endpoint. The same line protocol format works on both sides.
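
A replay script for that export can be short. The Python sketch below reads an export file, skips metadata, and POSTs the rest to /write in batches. It assumes the export marks its metadata with # comment lines and carries CREATE DATABASE statements in a DDL section, and it assumes the target database already exists on the HyperbyteDB side. The batch size of 5000 lines is an arbitrary starting point, not a recommendation from HyperbyteDB.

```python
import urllib.parse
import urllib.request


def batches(lines, batch_size=5000):
    """Yield newline-joined line protocol batches, skipping comments and DDL."""
    batch = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or line.upper().startswith("CREATE DATABASE"):
            continue
        batch.append(line)
        if len(batch) == batch_size:
            yield "\n".join(batch)
            batch = []
    if batch:  # flush the final partial batch
        yield "\n".join(batch)


def replay(export_path, base_url, db, batch_size=5000):
    """POST every batch to /write and return the number of batches sent."""
    url = f"{base_url}/write?" + urllib.parse.urlencode({"db": db})
    sent = 0
    with open(export_path, encoding="utf-8") as export:
        for body in batches(export, batch_size):
            request = urllib.request.Request(url, data=body.encode("utf-8"), method="POST")
            # urlopen raises HTTPError on 4xx/5xx, so a failed batch stops the replay.
            with urllib.request.urlopen(request, timeout=30):
                sent += 1
    return sent
```

Stopping on the first failed batch is deliberate: line protocol writes with identical series and timestamps overwrite each other, so rerunning the replay from the start after fixing the problem is safe on the InfluxDB side of that contract.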

Your retention policies, downsampling rules, and Telegraf collection intervals all transfer. The storage layer underneath is different (Parquet and ClickHouse instead of TSM files), but the InfluxQL query surface is the same: DERIVATIVE(), FLOOR(), GROUP BY time(), and subqueries all work on the /query endpoint.

What this means for platform teams

The strongest signal for product-market fit isn't a benchmark or a feature list. It's that teams can point their existing Influx-shaped clients at HyperbyteDB and change only the URL. On Kubernetes, this extends to the deployment surface: same InfluxDB v1 API endpoints, same Telegraf configs, same Grafana dashboards, same Prometheus stack.

HyperbyteDB's Kubernetes operator gives you the deployment lifecycle (rolling upgrades, scaling, failover, backups) and the runtime properties (master-master replication, health-aware routing, high cardinality handling) that InfluxDB Enterprise charges enterprise pricing for—in the open-source binary.

Install the operator. Deploy the cluster. Change the URL. Your stack keeps working.