February 2, 2026

8 min read

Introducing QuMat v0.5.0

Jie-Kai Chang

Apache Mahout PMC Member

Mahout Team

Apache Mahout

We are excited to announce QuMat v0.5.0, the next release of Mahout's quantum machine learning stack.

The main theme of this release is moving QuMat from a circuit abstraction into a more complete quantum ML development platform. The release introduces the first public proof-of-concept of QDP (Quantum Data Plane), adds GPU-accelerated data-to-state encoding, improves QuMat's backend behavior, and tightens the development workflow around testing, packaging, and documentation.

Special thanks to everyone who contributed to this release. We would like to thank PMC members Jie-Kai Chang, Guan-Ming Chiu, Andrew Musselman, Shannon Quinn (PMC Chair), and Trevor Grant. We also thank committers Hsien-Cheng Huang, Kuan-Hao Huang, and Krishna Dave, along with contributors Nai-Jui Yeh (Union.ai), Che-Yu Wu (Synology), Jia-Wei Jiang (TSMC), Guan-Hua Wen, Yu-Chen Lai, and the broader Apache Mahout community.

What's New in QuMat v0.5.0

QuMat v0.5.0 introduces several important changes:

QDP makes its first release appearance. The qumat 0.5.0 package can now pull in the QDP extension through the qumat[qdp] extra; the native extension itself is distributed as qumat-qdp 0.1.0.
GPU state encoding is now part of the QuMat workflow. QDP prepares quantum states with Rust preprocessing and CUDA kernels instead of Python-side loops.
Encoded states can move into PyTorch without a host copy. QDP exposes DLPack-compatible tensors, making it possible to hand CUDA-backed state vectors directly to PyTorch.
QuMat's circuit APIs are more complete and better validated. The release expands gate coverage, improves parameter handling, and fixes backend-specific behavior across Qiskit, Cirq, and Amazon Braket.
The project is easier to develop and test. The release consolidates Python packaging, adds pytest markers for GPU/QDP tests, and expands CI and documentation around QuMat and QDP.

Let's look at the major pieces in more detail.

QDP: The New Data Plane for Quantum ML

A common bottleneck in quantum machine learning is preparing quantum states from classical data. Most QML frameworks need to simulate state-preparation circuits to load data into a simulator. For amplitude encoding, that can mean applying a long sequence of rotation gates before the model has done any useful work.

We call this the Hiking Problem: most of the time goes into hiking up the mountain (simulating gates just to load the data) instead of taking the helicopter to the top and running the actual QML model.

QDP is Mahout's answer to that problem. Instead of simulating a state-preparation circuit, QDP constructs the mathematically equivalent state vector directly in GPU memory. Conceptually, this turns an expensive gate-simulation problem into a memory and vector math problem.

The v0.5.0 release includes QDP as an early proof-of-concept. The target is to reduce data-loading overhead from circuit simulation complexity toward something closer to memory-copy cost, with the long-term goal of 50x-100x faster dataset loading in realistic QML simulation workflows.

The high-level architecture is:

[Disk / Parquet]
      |
      v  Apache Arrow
[CPU RAM]  Rust preprocessing: normalize and pad
      |
      v  PCIe host-to-device
[GPU VRAM]  CUDA kernels: direct state construction
      |
      v  DLPack
[Downstream: PyTorch / Qiskit / PennyLane]
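The CPU stage in the diagram ("normalize and pad") can be pictured with a short NumPy sketch. This is not QDP's Rust code; the function name `normalize_and_pad` and the choice to reject all-zero rows are assumptions made for illustration. It shows why amplitude encoding needs padding: a state on n qubits has exactly 2**n amplitudes, so shorter rows are zero-filled before normalization.

```python
import numpy as np


def normalize_and_pad(rows, num_qubits):
    """Hypothetical helper: zero-pad each row to 2**num_qubits values, then L2-normalize.

    Illustrates the "normalize and pad" CPU stage only; QDP performs this step in Rust.
    """
    dim = 2 ** num_qubits
    batch = np.zeros((len(rows), dim), dtype=np.float64)
    for i, row in enumerate(rows):
        row = np.asarray(row, dtype=np.float64)
        if row.size > dim:
            raise ValueError(f"row {i} has {row.size} values; {num_qubits} qubits hold at most {dim}")
        batch[i, : row.size] = row  # zero-padding fills the remaining amplitudes
    norms = np.linalg.norm(batch, axis=1, keepdims=True)
    if np.any(norms == 0):
        # Assumed policy: an all-zero row has no valid normalized state.
        raise ValueError("cannot amplitude-encode an all-zero row")
    return batch / norms


# Ragged rows, e.g. a list column read from Parquet through Arrow.
rows = [[1.0, 2.0, 3.0], [0.5, 0.5], [3.0, 0.0, 4.0, 0.0]]
states = normalize_and_pad(rows, num_qubits=2)
print(states.shape)                    # (3, 4)
print(np.linalg.norm(states, axis=1))  # every row now has unit norm
```

On the GPU path, the resulting buffer is what gets copied across PCIe for the CUDA kernels to turn into complex state vectors.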

In v0.5.0, QDP supports four encoding families:

Amplitude encoding for normalized state-vector preparation.
Angle encoding for one-value-per-qubit rotation inputs.
Basis encoding for computational basis states.
IQP-style encoding for entangled feature maps used in quantum ML workflows.
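Amplitude encoding amounts to normalization plus zero-padding. For the angle and basis families, the NumPy sketch below shows the state vectors they correspond to. The conventions here are assumptions rather than QDP's documented behavior: angle encoding uses an RY-style rotation per qubit, cos(x/2)|0> + sin(x/2)|1>, and qubit 0 is treated as the most significant bit. IQP-style maps are left out because their exact form depends on the chosen entangling pattern.

```python
from functools import reduce

import numpy as np


def angle_encode(features):
    """One feature per qubit: cos(x/2)|0> + sin(x/2)|1> (RY convention, assumed)."""
    qubits = [np.array([np.cos(x / 2), np.sin(x / 2)]) for x in features]
    # The Kronecker product builds the full 2**n state; qubit 0 is the most significant bit here.
    return reduce(np.kron, qubits).astype(np.complex128)


def basis_encode(bits):
    """Map a bitstring such as [1, 0, 1] to the computational basis state |101>."""
    index = int("".join(str(b) for b in bits), 2)
    state = np.zeros(2 ** len(bits), dtype=np.complex128)
    state[index] = 1.0
    return state


print(np.round(angle_encode([np.pi, 0.0]).real, 6))  # [0. 0. 1. 0.], i.e. |10>
print(np.flatnonzero(basis_encode([1, 0, 1])))       # [5]
```

Angle encoding spends one qubit per feature, while amplitude encoding packs 2**n features into n qubits; that trade-off is one reason a data plane offers several families.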

QDP can read from Python lists, NumPy arrays, PyTorch tensors, and file-backed sources such as Parquet, Arrow, NumPy, and PyTorch files.

Zero-Copy PyTorch Integration

The first QDP extension package shipped with QuMat v0.5.0, qumat-qdp 0.1.0, focuses on a practical workflow: encode data on the GPU, then pass the encoded state into a tensor framework without copying it back through host memory.

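import qumat.qdp as qdp
import torch

# Create an encoding engine bound to CUDA device 0.
engine = qdp.QdpEngine(device_id=0)

# Amplitude-encode four values into a 2-qubit state (2**2 = 4 amplitudes).
qtensor = engine.encode(
    [1.0, 2.0, 3.0, 4.0],
    num_qubits=2,
    encoding_method="amplitude",
)

# Zero-copy handoff: PyTorch wraps the same GPU buffer through DLPack.
tensor = torch.from_dlpack(qtensor)
print(tensor)  # Complex tensor on CUDA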

This DLPack path is the foundation for building faster quantum ML pipelines: preprocessing can happen in QDP, encoded tensors can stay on the GPU, and downstream training code can consume them through PyTorch.
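The zero-copy idea can be tried without a GPU. The sketch below uses NumPy's own DLPack support (`np.from_dlpack`, available in NumPy 1.23 and later) as a CPU analogue of the `torch.from_dlpack(qtensor)` call above. The "producer" array is a stand-in for a QDP-encoded state (here, the amplitude encoding of [1, 2, 3, 4]), not QDP output.

```python
import numpy as np

# Stand-in for an encoded state: amplitude encoding of [1, 2, 3, 4] as complex128.
producer = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.complex128)
producer /= np.linalg.norm(producer)

# np.from_dlpack accepts any object implementing __dlpack__ and wraps its buffer.
consumer = np.from_dlpack(producer)

print(np.shares_memory(producer, consumer))  # True: the buffer was shared, not copied
print(consumer.dtype)                        # complex128
```

With CUDA-backed producers, `torch.from_dlpack` does the same thing on device memory, which is why the encoded state never has to travel back through host RAM.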

The original QDP proposal started from a lower-level, direct-amplitude API:

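# Proposal-era sketch: mahout_qdp is the module name used in the QDP proposal.
import pyarrow as pa
import torch

import mahout_qdp as qdp

# Eight values fill a 3-qubit state (2**3 = 8 amplitudes).
raw_vector = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
input_data = pa.array(raw_vector, type=pa.float64())

# Build the state vector directly in GPU memory, then export it through DLPack.
state_vector_gpu = qdp.encode_amplitude(input_data)
dlpack_handle = state_vector_gpu.to_dlpack()
tensor = torch.utils.dlpack.from_dlpack(dlpack_handle)

print(tensor)
print("device:", tensor.device)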

The packaged v0.5.0 user-facing API is exposed through qumat.qdp.QdpEngine, but the underlying idea is the same: direct state preparation, GPU-resident buffers, and zero-copy handoff to downstream tensor tools.

Installing QuMat with QDP

The base QuMat package can be installed as usual:

pip install qumat==0.5.0

To install QuMat with QDP support:

pip install "qumat[qdp]==0.5.0"

This installs qumat 0.5.0 and resolves the QDP native extension package, qumat-qdp 0.1.0. The two version numbers differ on purpose: QuMat is the top-level Mahout quantum ML package, while qumat-qdp is the first separately published QDP extension behind the qdp extra.

The qumat-qdp 0.1.0 wheels support Python 3.10, 3.11, and 3.12 on manylinux x86_64. The accelerated QDP path requires an NVIDIA CUDA-capable GPU.
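Before installing the extra, it can help to check a machine against those wheel targets. The script below is a hypothetical helper, not something shipped with QuMat; it relies only on the standard library. The `nvidia-smi` check is a rough heuristic for an NVIDIA driver, and the platform check does not inspect the glibc version that manylinux wheels depend on.

```python
import platform
import shutil
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def qdp_environment_report():
    """Hypothetical check of this machine against the qumat-qdp 0.1.0 wheel targets."""
    return {
        "qumat": installed_version("qumat"),
        "qumat-qdp": installed_version("qumat-qdp"),
        # Wheels are published for Python 3.10, 3.11, and 3.12.
        "python_supported": (3, 10) <= sys.version_info[:2] <= (3, 12),
        # Wheels target manylinux x86_64; glibc compatibility is not checked here.
        "platform_supported": platform.system() == "Linux" and platform.machine() == "x86_64",
        # Heuristic only: nvidia-smi on PATH suggests an NVIDIA driver is installed.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
    }


for key, value in qdp_environment_report().items():
    print(f"{key:>20}: {value}")
```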

Both packages are also available directly on PyPI:

QuMat: https://pypi.org/project/qumat/0.5.0/
QDP extension: https://pypi.org/project/qumat-qdp/0.1.0/

For the QDP extension, the development targets are Rust 1.85 or newer, CUDA Toolkit 12.2.x or newer, Python 3.11 or newer, PyTorch 2.2.x or newer, PyArrow 16.x or newer, and PyO3 0.21.

Verification and Benchmarking Strategy

QDP is designed to be verified along two axes: correctness and performance.

For correctness, the plan is to compare QDP against reference circuit-based initialization:

Generate random input vectors across a broad range of sizes.
Use QDP to construct a state vector directly on GPU.
Use a reference framework such as Qiskit to build and simulate a standard amplitude-initialization circuit.
Compare the resulting states for fidelity.
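The correctness loop above can be sketched in NumPy. Both sides are stand-ins: `direct_state` plays the role of QDP (direct normalization), and `rotation_tree_state` replaces the Qiskit reference with a binary tree of RY-style rotation angles, the idea behind standard real-amplitude state-preparation circuits. Fidelity between pure states is |<psi|phi>|^2, so matching states should score 1 up to floating-point error. All function names are hypothetical.

```python
import numpy as np


def direct_state(x):
    """Stand-in for QDP: build the amplitude-encoded state by normalizing directly."""
    x = np.asarray(x, dtype=np.float64)
    return (x / np.linalg.norm(x)).astype(np.complex128)


def rotation_tree_state(x):
    """Stand-in reference: rebuild the state from a binary tree of rotation angles.

    Each level splits every node's amplitude between its two halves with
    cos(theta/2) and sin(theta/2), as a real-amplitude state-prep circuit would.
    """
    x = np.asarray(x, dtype=np.float64)
    n = x.size.bit_length() - 1
    if 1 << n != x.size:
        raise ValueError("length must be a power of two")
    amps = np.array([1.0])
    for level in range(n):
        width = x.size >> level              # entries covered by each node at this level
        nodes = x.reshape(2 ** level, width)
        half = width // 2
        if half == 1:
            left, right = nodes[:, 0], nodes[:, 1]  # signed leaf values carry the signs
        else:
            left = np.linalg.norm(nodes[:, :half], axis=1)
            right = np.linalg.norm(nodes[:, half:], axis=1)
        theta = 2.0 * np.arctan2(right, left)
        amps = np.stack([amps * np.cos(theta / 2), amps * np.sin(theta / 2)], axis=1).ravel()
    return amps.astype(np.complex128)


def fidelity(psi, phi):
    """Pure-state fidelity |<psi|phi>|^2."""
    return abs(np.vdot(psi, phi)) ** 2


rng = np.random.default_rng(seed=7)
fidelities = []
for n in range(1, 11):  # state sizes 2 .. 1024
    x = rng.uniform(-1.0, 1.0, size=2 ** n)
    fidelities.append(fidelity(direct_state(x), rotation_tree_state(x)))
print(f"worst fidelity across sizes: {min(fidelities):.12f}")
```

The reference side computes 2**n - 1 angles, one per tree node; the direct side is a single normalization. That gap is what the Hiking Problem describes.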

For performance, the goal is to isolate the Data-to-State pipeline: the time from having a classical vector in CPU RAM to obtaining a ready-to-use quantum state in GPU memory. This explicitly excludes downstream model training so the benchmark measures state preparation itself.

The key baselines are:

Qiskit initialize() — the circuit-simulation path.
PennyLane AmplitudeEmbedding — a common ML-oriented workflow.
Qiskit Statevector(data) — a raw data-loading path.

The two key metrics are:

Time-to-State latency — how long one vector takes to become a valid GPU-backed quantum state.
Throughput — how many vectors per second the loader can sustain under batched workloads.

The initial benchmark plan targets NVIDIA RTX 30-series GPUs, with scaling tests across state sizes and DataLoader-style batch tests that simulate hybrid QML training loops.
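A minimal timing harness for the two metrics might look like the sketch below. The encoder is a NumPy stand-in (`encode_batch` is hypothetical); any of the baselines or a GPU loader could be passed in as the `encode` callable instead. One caveat for GPU loaders: CUDA kernel launches are asynchronous, so a PyTorch-based encoder should call `torch.cuda.synchronize()` before the clock is read, otherwise the timings only measure launch overhead.

```python
import statistics
import time

import numpy as np


def encode_batch(batch):
    """Stand-in encoder (NumPy amplitude encoding); swap in the loader under test."""
    return batch / np.linalg.norm(batch, axis=1, keepdims=True)


def time_to_state_latency(encode, vector, repeats=50):
    """Median seconds for a single vector to become an encoded state."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        encode(vector[np.newaxis, :])
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def throughput(encode, data, batch_size=256):
    """Vectors per second when data is fed in DataLoader-style batches."""
    start = time.perf_counter()
    for i in range(0, len(data), batch_size):
        encode(data[i : i + batch_size])
    return len(data) / (time.perf_counter() - start)


rng = np.random.default_rng(0)
data = rng.random((4096, 2 ** 10))  # 4096 vectors, 10 qubits' worth of amplitudes each
print(f"latency:    {time_to_state_latency(encode_batch, data[0]) * 1e6:.1f} us")
print(f"throughput: {throughput(encode_batch, data):,.0f} vectors/s")
```

Reporting the median latency rather than the mean keeps one-off warm-up costs from dominating the single-vector number.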

QuMat API and Backend Improvements

QuMat 0.5.0 also improves the core circuit layer. The release expands gate coverage, adds validation, and fixes backend behavior that could previously make circuits behave differently depending on the selected framework.

Highlights include:

Additional gate support, including rotation gates, the T gate, CSWAP, and the U gate.
Better parameter validation, including partially bound parameter checks.
Stronger qubit index validation to catch invalid circuits earlier.
Backend fixes for Qiskit, Cirq, and Amazon Braket.
AWS session support for Amazon Braket backend initialization.
draw_circuit() fixes so circuit visualization returns a usable string.
Measurement improvements, including overlap measurement and swap-test support.
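For readers new to the swap test, the NumPy simulation below shows what it measures. It does not use QuMat's API; the function names are hypothetical, and both input states are assumed to be normalized. An ancilla goes through a Hadamard, a controlled SWAP of the two registers, and a second Hadamard; the probability of reading the ancilla as 0 is (1 + |<psi|phi>|^2) / 2, so the overlap can be recovered from measurement statistics.

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)


def swap_test_p0(psi, phi):
    """Probability that the swap-test ancilla is measured as 0.

    The register is stored as an array of shape (2, d, d): axis 0 is the
    ancilla, axes 1 and 2 hold the two d-dimensional states.
    """
    psi = np.asarray(psi, dtype=np.complex128)
    phi = np.asarray(phi, dtype=np.complex128)
    d = psi.size
    state = np.zeros((2, d, d), dtype=np.complex128)
    state[0] = np.outer(psi, phi)                # |0>|psi>|phi>
    state = np.tensordot(H, state, axes=(1, 0))  # Hadamard on the ancilla
    state[1] = state[1].T.copy()                 # CSWAP: swap registers when the ancilla is 1
    state = np.tensordot(H, state, axes=(1, 0))  # second Hadamard
    return float(np.sum(np.abs(state[0]) ** 2))


def overlap_from_p0(p0):
    """Invert P(0) = (1 + |<psi|phi>|^2) / 2 to recover |<psi|phi>|^2."""
    return 2.0 * p0 - 1.0


plus = np.array([1.0, 1.0]) / np.sqrt(2)
print(swap_test_p0([1, 0], [1, 0]))  # identical states: P(0) = 1
print(swap_test_p0([1, 0], [0, 1]))  # orthogonal states: P(0) = 0.5
print(overlap_from_p0(swap_test_p0([1, 0], plus)))
```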

These changes make QuMat more predictable as a cross-framework API. Users can write circuits at the QuMat layer and move more confidently between simulators and hosted backends.

Better Testing, Packaging, and Documentation

This release also invests in project infrastructure. That work is less visible than QDP, but it matters for contributors and downstream users.

The v0.5.0 cycle includes:

uv-based Python dependency management.
Expanded pytest configuration and custom GPU/QDP markers.
Ruff and pre-commit integration.
Python build workflows and broader CI coverage.
New QDP getting-started, concepts, and benchmark documentation.
Documentation restructuring under the QuMat and QDP namespaces.

Together, these changes make it easier to reproduce local development environments, run CPU-only tests on ordinary machines, and reserve GPU-specific tests for machines with CUDA available.
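One common way to get that split is a `conftest.py` that registers the markers and skips GPU-dependent tests when CUDA is missing. The sketch below illustrates the general pytest pattern and is not the project's actual configuration: the marker names `gpu` and `qdp` and the decision to skip `qdp` tests without CUDA are assumptions.

```python
# conftest.py (hypothetical sketch; marker names "gpu" and "qdp" are assumptions)
import shutil

import pytest


def cuda_available():
    """Best-effort CUDA check: prefer PyTorch, fall back to nvidia-smi on PATH."""
    try:
        import torch
    except ImportError:
        return shutil.which("nvidia-smi") is not None
    return bool(torch.cuda.is_available())


def pytest_configure(config):
    # Register the markers so `pytest --strict-markers` accepts them.
    config.addinivalue_line("markers", "gpu: test needs a CUDA-capable GPU")
    config.addinivalue_line("markers", "qdp: test needs the qumat-qdp extension and a GPU")


def pytest_collection_modifyitems(config, items):
    if cuda_available():
        return  # run everything on CUDA machines
    skip_gpu = pytest.mark.skip(reason="CUDA not available")
    for item in items:
        if "gpu" in item.keywords or "qdp" in item.keywords:
            item.add_marker(skip_gpu)
```

Tests then opt in with `@pytest.mark.gpu`, and a CPU-only run can also exclude them explicitly with `pytest -m "not gpu and not qdp"`.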

What's Next

QuMat v0.5.0 establishes QDP as a first-class part of Mahout's quantum stack. The next step is to build on that foundation: more encoding coverage, better GPU benchmark coverage, stronger end-to-end quantum ML examples, and smoother documentation for users moving from QuMat circuits into QDP-backed training pipelines.

The QDP roadmap after this release focuses on:

Hardening the Rust workspace, CUDA build configuration, pre-commit checks, and integration tests.
Expanding the direct-state-preparation core with CUDA kernels, CPU-side normalization and padding, GPU memory management, and Apache Arrow/Parquet ingestion.
Improving DLPack safety and resource management around tensor ownership and deleters.
Continuing PyO3-based Python binding work.
Adding broader fidelity and performance testing across devices.
Expanding encoder support and future integrations, including TensorFlow/Cirq paths and larger dataset or multi-GPU workflows.
