How We Built vm-migrate: A CLI for KubeVirt VM Migration Between Clusters
TL;DR: We created vm-migrate, a tool that streamlines moving KubeVirt VMs between Kubernetes clusters by
orchestrating VirtualMachineExport, cloud storage uploads, and CDI imports. This post covers why we built it, the
architecture decisions, and how it solves real-world migration pain points.
The Problem: KubeVirt VM Migration Was Too Many Steps
When we started running KubeVirt at scale across multiple clusters, we hit a wall: there was no simple way to move a VM from cluster A to cluster B. The manual process looked like this:
- Create a VirtualMachineExport on the source cluster
- Generate a temporary token and download URL
- curl the VM disk image down to a workstation
- Manually decompress and verify the .img file
- Port-forward to the destination cluster’s cdi-uploadproxy
- Run virtctl image-upload with the right flags
- Create a DataVolume and wait for it to populate
- Attach it to a new VirtualMachine
This was error-prone, required multiple tools (kubectl, virtctl, aws/az CLI), and needed persistent network
connections for large disk images. Worse, it didn’t fit into CI/CD pipelines or GitOps workflows.
Design Goals
We wanted a single binary that could:
- Run anywhere: Local workstation, CI runner, or in-cluster as a Job
- Handle the entire lifecycle: Export → compress → upload → import
- Support multiple storage backends: S3, Azure Blob, and generic HTTP/S (see the interface sketch after this list)
- Be idempotent and pipeline-friendly: Environment variables, non-interactive
- Respect security: Short-lived credentials, TLS verification options
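The storage-backend goal above implies a small abstraction that the rest of the tool writes to. Here is a hypothetical sketch of that kind of interface; the names are illustrative, not the actual vm-migrate API:
// Hypothetical storage abstraction; names are illustrative, not the real vm-migrate API.
package migrate

import (
    "context"
    "io"
    "time"
)

// Uploader streams a (possibly very large) object to a backend such as S3,
// Azure Blob, or a plain HTTP endpoint.
type Uploader interface {
    // Upload reads r until EOF and stores it under key.
    Upload(ctx context.Context, key string, r io.Reader) error
    // PresignGet returns a short-lived download URL for a stored object.
    PresignGet(ctx context.Context, key string, ttl time.Duration) (string, error)
}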
Architecture: Just Enough Orchestration
vm-migrate is a thin orchestration layer over existing KubeVirt and CDI primitives. Here’s the flow:
Export Path
PVC → VirtualMachineExport → HTTP Stream → Tar.gz → S3/Azure
- Create VMExport: We generate a VirtualMachineExport CR targeting the PVC. KubeVirt handles the heavy lifting of making the disk readable.
- Download stream: Instead of saving to disk, we stream the HTTP response directly into a compression writer.
- Dual upload: The tarball is written to local temp storage (if --outfile is set) and uploaded to cloud storage in parallel.
- Presigned URL generation: For private storage, we optionally generate a short-lived URL that the destination cluster can use (sketched below).
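For that last step, the presigned URL comes from the same aws-sdk-go v1 family as the upload snippet later in this post. A rough sketch; the 15-minute TTL and the function shape are illustrative, not the exact vm-migrate code:
// Sketch: short-lived presigned GET URL with aws-sdk-go v1 (the same SDK family
// as the uploader snippet shown later). TTL and naming are illustrative.
package migrate

import (
    "fmt"
    "time"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

func presignDownload(sess *session.Session, bucket, key string, ttl time.Duration) (string, error) {
    req, _ := s3.New(sess).GetObjectRequest(&s3.GetObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
    })
    url, err := req.Presign(ttl) // e.g. 15 * time.Minute
    if err != nil {
        return "", fmt.Errorf("presign %s/%s: %w", bucket, key, err)
    }
    return url, nil // hand this to the import side instead of long-lived credentials
}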
Import Path
URL → Decompress → .img detection → virtctl → CDI upload proxy → DataVolume
- Download & sniff: We download the first few KB to determine if it’s a tarball or a raw .img.
- Streaming decompress: If compressed, we extract on the fly to find the first .img file (convention over configuration). A sketch of this follows below.
- Upload via CDI: We shell out to virtctl image-upload after optionally port-forwarding to the cdi-uploadproxy Service. We chose to wrap virtctl rather than reimplement the CDI upload protocol to stay compatible with future CDI changes.
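The sniff-and-extract step is mostly standard library plumbing. Here is a simplified sketch of the idea; it is not the exact vm-migrate implementation, and error handling is trimmed:
// Sketch: peek at the stream to decide gzip vs raw, then walk the tar to find
// the first .img entry. Not the exact vm-migrate code.
package migrate

import (
    "archive/tar"
    "bufio"
    "compress/gzip"
    "fmt"
    "io"
    "strings"
)

// firstImage returns a reader positioned at the first .img entry, or the
// original stream if the payload already looks like a raw image.
func firstImage(body io.Reader) (io.Reader, error) {
    br := bufio.NewReader(body)
    magic, err := br.Peek(2)
    if err != nil {
        return nil, err
    }
    if magic[0] != 0x1f || magic[1] != 0x8b { // not gzip: assume raw .img
        return br, nil
    }
    gz, err := gzip.NewReader(br)
    if err != nil {
        return nil, err
    }
    tr := tar.NewReader(gz)
    for {
        hdr, err := tr.Next()
        if err == io.EOF {
            return nil, fmt.Errorf("no .img file found in archive")
        }
        if err != nil {
            return nil, err
        }
        if strings.HasSuffix(hdr.Name, ".img") {
            return tr, nil // caller streams the image contents from here
        }
    }
}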
Key Technical Decisions
1. Don’t Reinvent the Export Wheel
KubeVirt’s VirtualMachineExport is robust and handles token rotation, link expiration, and multi-volume VMs. We just
consume it.
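To give a sense of how little we add on top, here is a rough sketch of creating the export object with client-go’s dynamic client. The export.kubevirt.io/v1alpha1 version and the naming are assumptions (newer KubeVirt releases ship v1beta1), and token handling is omitted for brevity:
// Sketch: create a VirtualMachineExport for a PVC with client-go's dynamic client.
// The v1alpha1 API version is an assumption; adjust to whatever your KubeVirt ships.
package migrate

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/client-go/dynamic"
)

var vmExportGVR = schema.GroupVersionResource{
    Group:    "export.kubevirt.io",
    Version:  "v1alpha1",
    Resource: "virtualmachineexports",
}

func createPVCExport(ctx context.Context, c dynamic.Interface, ns, pvc string) (*unstructured.Unstructured, error) {
    export := &unstructured.Unstructured{Object: map[string]interface{}{
        "apiVersion": "export.kubevirt.io/v1alpha1",
        "kind":       "VirtualMachineExport",
        "metadata":   map[string]interface{}{"name": pvc + "-export", "namespace": ns},
        "spec": map[string]interface{}{
            "source": map[string]interface{}{
                "apiGroup": "",                      // core API group for PVCs
                "kind":     "PersistentVolumeClaim", // VM and snapshot sources also work
                "name":     pvc,
            },
        },
    }}
    return c.Resource(vmExportGVR).Namespace(ns).Create(ctx, export, metav1.CreateOptions{})
}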
2. Streaming Over Temp Files
VM disks can be 500GB+. We stream through io.Pipe to avoid filling local disk:
// Simplified from the source
pr, pw := io.Pipe()
go func() {
defer pw.Close()
compress(pw, vmExportReader)
}()
uploader.UploadWithContext(ctx, &s3manager.UploadInput{
Bucket: &bucket,
Key: &key,
Body: pr, // Stream directly from pipe
})
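The compress call above is essentially a tar writer wrapped in a gzip writer, chained onto the pipe. Here is a sketch of that side; the name and size parameters are assumptions for illustration (tar headers want the payload size up front), so the exact signature in the source may differ:
// Sketch of the compress side of the pipe: wrap the export stream in tar+gzip.
// The name/size parameters are illustrative; tar headers need the size up front.
package migrate

import (
    "archive/tar"
    "compress/gzip"
    "io"
)

func compress(w io.Writer, r io.Reader, name string, size int64) error {
    gz := gzip.NewWriter(w)
    defer gz.Close()
    tw := tar.NewWriter(gz)
    defer tw.Close()

    if err := tw.WriteHeader(&tar.Header{Name: name, Mode: 0o644, Size: size}); err != nil {
        return err
    }
    _, err := io.Copy(tw, r) // stream the disk image through without buffering it all
    return err
}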
3. Shell Out to virtctl
The CDI upload protocol uses custom HTTP multipart streams and token exchange. Rather than maintain parity, we bundle
virtctl in our container image and exec it. This adds ~20MB but eliminates fragility.
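The wrapper itself is a few lines of os/exec. A sketch of the idea; the flag set shown is illustrative, so check virtctl image-upload --help on your KubeVirt version:
// Sketch: invoke the bundled virtctl for the CDI upload step.
// Flag values are illustrative, not the exact vm-migrate invocation.
package migrate

import (
    "context"
    "os"
    "os/exec"
)

func uploadImage(ctx context.Context, name, namespace, imagePath, size, proxyURL string) error {
    cmd := exec.CommandContext(ctx, "virtctl", "image-upload", "dv", name,
        "--namespace", namespace,
        "--size", size,
        "--image-path", imagePath,
        "--uploadproxy-url", proxyURL,
        "--insecure", // see the TLS discussion below
    )
    cmd.Stdout = os.Stdout // surface virtctl's progress output directly
    cmd.Stderr = os.Stderr
    return cmd.Run()
}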
4. Azure Pseudo-URL Scheme
We invented azure://container/blob syntax to differentiate from generic HTTPS URLs and avoid parsing ambiguity. The
SDK handles SAS token injection.
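Parsing the pseudo-URL falls out of net/url, where the host slot carries the container and the path carries the blob name. A minimal sketch, not necessarily the exact parsing code in vm-migrate:
// Sketch: split the azure://container/blob pseudo-URL with net/url.
package migrate

import (
    "fmt"
    "net/url"
    "strings"
)

func parseAzureURL(raw string) (container, blob string, err error) {
    u, err := url.Parse(raw)
    if err != nil {
        return "", "", err
    }
    if u.Scheme != "azure" {
        return "", "", fmt.Errorf("not an azure:// URL: %s", raw)
    }
    return u.Host, strings.TrimPrefix(u.Path, "/"), nil // host = container, path = blob
}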
Challenges We Hit
TLS Verification in Cluster
The VirtualMachineExport download URL uses a cluster-internal Service certificate, so TLS validation fails from outside
the cluster. We default to --insecure-skip-tls-verify=true, but the docs warn users to set it to false in production
and configure proper CA trust.
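In practice the flag maps onto the Go HTTP transport’s TLS config, and the production-safe path is to load a trusted CA bundle into a cert pool instead. A sketch, not the exact vm-migrate wiring:
// Sketch: HTTP client for the export download URL, either skipping verification
// (the current default) or trusting a provided CA bundle.
package migrate

import (
    "crypto/tls"
    "crypto/x509"
    "fmt"
    "net/http"
)

func exportHTTPClient(insecureSkipVerify bool, caPEM []byte) (*http.Client, error) {
    tlsCfg := &tls.Config{}
    if insecureSkipVerify {
        tlsCfg.InsecureSkipVerify = true // fine for lab use, not for production
    } else if len(caPEM) > 0 {
        pool := x509.NewCertPool()
        if !pool.AppendCertsFromPEM(caPEM) {
            return nil, fmt.Errorf("could not parse CA certificate for export endpoint")
        }
        tlsCfg.RootCAs = pool
    }
    return &http.Client{Transport: &http.Transport{TLSClientConfig: tlsCfg}}, nil
}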
Port-Forwarding Race Conditions
When vm-migrate port-forwards to cdi-uploadproxy, we need to wait for the port to be ready and handle cleanup on
SIGINT. We used os/exec with context cancellation and a small retry loop.
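The pattern is unglamorous: start the forward as a child process tied to a context, then dial the local port until it answers. A sketch of the idea; the kubectl invocation and retry timings are illustrative, not the exact vm-migrate command:
// Sketch: start a port-forward tied to ctx, then poll the local port until it
// accepts connections. Cancelling ctx (e.g. on SIGINT) kills the child process.
package migrate

import (
    "context"
    "fmt"
    "net"
    "os/exec"
    "time"
)

func forwardUploadProxy(ctx context.Context, namespace string, localPort int) (*exec.Cmd, error) {
    cmd := exec.CommandContext(ctx, "kubectl", "-n", namespace, "port-forward",
        "service/cdi-uploadproxy", fmt.Sprintf("%d:443", localPort))
    if err := cmd.Start(); err != nil {
        return nil, err
    }
    addr := fmt.Sprintf("127.0.0.1:%d", localPort)
    for i := 0; i < 50; i++ { // ~5s of retries
        select {
        case <-ctx.Done():
            return nil, ctx.Err()
        default:
        }
        if conn, err := net.DialTimeout("tcp", addr, 100*time.Millisecond); err == nil {
            conn.Close()
            return cmd, nil // caller is responsible for cmd.Wait and process cleanup
        }
        time.Sleep(100 * time.Millisecond)
    }
    return nil, fmt.Errorf("port-forward to %s never became ready", addr)
}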
Multi-Disk VMs
KubeVirt’s VMExport can export all volumes, but we only support the first .img found. For most VMs, this is the boot
disk. Future versions will support --volume-name filtering.
Real-World Usage Pattern
Here’s how we migrate 100 VMs during a cluster upgrade:
# In-cluster export Job (see examples/k8s/job-export.yaml)
for vm in $(kubectl get vm -n prod -o name); do
vm-migrate export \
--namespace prod \
--pvc $(basename $vm) \
--provider s3 \
--bucket migrations-$(date +%Y) \
--key prod/$(basename $vm)-$(date +%s).tar.gz
done
# Destination cluster import (from CI)
for url in $(aws s3 ls s3://migrations-2025/prod/ --recursive | awk '{print $4}'); do
vm-migrate import "s3://migrations-2025/${url}" \
--name "$(basename "$url" | sed -E 's/-[0-9]+\.tar\.gz$//')" \
--size 100Gi \
--namespace prod
done