This page explains the core concepts behind QDP (Quantum Data Plane): the architecture, the supported encoding methods, GPU memory management, DLPack zero-copy integration, and key performance characteristics.
At a high level, QDP is organized as layered components:
- `qumat.qdp` exposes a friendly Python entry point (`QdpEngine`).
- `qdp/qdp-python` builds the `_qdp` module, bridging Python ↔ Rust and implementing Python-side DLPack (`__dlpack__`).
- `qdp/qdp-core` contains the engine (`QdpEngine`), encoder implementations, IO readers, GPU pipelines, and DLPack export.
- `qdp/qdp-kernels` provides the CUDA kernels invoked by the Rust encoders.

Data flow (conceptual):
```
Python (qumat.qdp) → _qdp (PyO3) → qdp-core (Rust) → qdp-kernels (CUDA)
   │                                    │
   └──── torch.from_dlpack(qtensor) ◄───┴── DLPack DLManagedTensor (GPU ptr)
```
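For orientation, here is a minimal Python-side sketch of that flow. The `QdpEngine` constructor and the `encode` call below are illustrative assumptions (parameter names taken from this page), not the exact API:

```python
import torch
from qumat.qdp import QdpEngine

# Hypothetical end-to-end flow; check the real QdpEngine API for exact signatures.
engine = QdpEngine()
qtensor = engine.encode(
    [0.1, 0.4, 0.8, 0.4],           # classical feature vector
    num_qubits=2,
    encoding_method="amplitude",
)
state = torch.from_dlpack(qtensor)  # zero-copy handoff of the GPU buffer
print(state.shape)                  # torch.Size([1, 4]), i.e. [1, 2^n]
```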
QDP encodes classical data into a state vector $\vert\psi\rangle$ for $n$ qubits.
Output shapes:

- Single sample: `[1, 2^n]` (always 2D)
- Batch: `[batch_size, 2^n]`

QDP supports two output precisions:
- `float32`
- `float64`

QDP uses a strategy pattern for encoding methods:
- `encoding_method` is a string, e.g. `"amplitude"`, `"basis"`, `"angle"`.

All encoders perform input validation (at minimum):
- `len(data) <= 2^n`

Note: $n=30$ is already very large; just the output state for a single sample is on the order of $2^{30}$ complex numbers.
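Concretely, the size check amounts to the following (a sketch with an invented helper name, not QDP's internal code):

```python
def check_fits(data, num_qubits):
    # A sample must fit into the 2^num_qubits available amplitudes.
    limit = 2 ** num_qubits
    if len(data) > limit:
        raise ValueError(
            f"input has {len(data)} values but {num_qubits} qubits "
            f"only provide {limit} amplitudes"
        )
```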
Goal: represent a real-valued feature vector $x$ as quantum amplitudes:
\[
\vert\psi\rangle = \sum_{i=0}^{2^{n}-1} \frac{x_i}{\|x\|_2}\,\vert i\rangle
\]

Key properties in QDP:
- If `len(x) < 2^n`, the remaining amplitudes are treated as zeros.

When to use it:

- Dense, real-valued feature vectors, since amplitude encoding packs up to $2^n$ features into $n$ qubits.

Trade-offs:
- Output size is exponential in `num_qubits` ($2^n$), so it can become memory-heavy quickly.
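A small worked example of the formula in plain NumPy (independent of QDP):

```python
import numpy as np

# Amplitude-encode x = [1, 2, 2] into n = 2 qubits (2^2 = 4 amplitudes).
x = np.array([1.0, 2.0, 2.0])            # ||x||_2 = 3
n = 2
state = np.zeros(2 ** n)
state[: len(x)] = x / np.linalg.norm(x)  # normalize; the tail stays zero
print(state)          # [0.3333... 0.6666... 0.6666... 0.]
print(state @ state)  # 1.0, i.e. a valid (normalized) quantum state
```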
Goal: map an integer index $i$ into a computational basis state $\vert i\rangle$.

For $n$ qubits with $0 \le i < 2^n$, the output state has amplitude $1$ at position $i$ and $0$ everywhere else:

\[
\vert\psi\rangle = \vert i\rangle
\]
Key properties in QDP:
- The input is a single integer index, passed as `[index]`.
- Each index produces one state (`sample_size = 1`).
- The index must fit in the register: `0 <= index < 2^num_qubits`.

When to use it:

- Encoding class labels or other discrete indices directly as basis states.
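The same idea in plain NumPy:

```python
import numpy as np

# Basis-encode index i = 5 with n = 3 qubits: |5> corresponds to |101>.
n, i = 3, 5
state = np.zeros(2 ** n)
state[i] = 1.0
print(state)  # [0. 0. 0. 0. 0. 1. 0. 0.]
```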
Angle encoding typically maps features to rotation angles (e.g., via $R_x(\theta)$, $R_y(\theta)$, $R_z(\theta)$) and constructs a state by applying rotations across qubits.
Current status in this codebase:
"angle" encoder exists as a placeholder and returns an error stating it is not implemented yet."amplitude" or "basis" for now.QDP is designed to keep the encoded states on the GPU and to avoid unnecessary allocations/copies where possible.
For each encoded sample, QDP allocates a state vector of size $2^n$, so memory usage grows exponentially with the qubit count.
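For concreteness, assuming one complex amplitude per entry (16 bytes at float64 precision, 8 bytes at float32), a single sample costs roughly:

| `num_qubits` | Amplitudes ($2^n$) | float64 (complex128) | float32 (complex64) |
|---|---|---|---|
| 20 | ~1.05 M | 16 MiB | 8 MiB |
| 25 | ~33.6 M | 512 MiB | 256 MiB |
| 30 | ~1.07 B | 16 GiB | 8 GiB |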
QDP performs pre-flight checks before large allocations to fail fast with an OOM-aware message (e.g., suggesting smaller num_qubits or batch size).
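Conceptually, the pre-flight check boils down to arithmetic like this (an illustrative sketch, not QDP's internal code):

```python
def preflight_check(num_qubits, batch_size, bytes_per_amplitude, free_bytes):
    # Fail fast before asking the allocator for an exponentially large buffer.
    required = batch_size * (2 ** num_qubits) * bytes_per_amplitude
    if required > free_bytes:
        raise MemoryError(
            f"encoding would need {required / 2**30:.1f} GiB on the GPU; "
            "reduce num_qubits or the batch size"
        )
```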
For high-throughput IO → GPU workflows (especially streaming from Parquet), QDP combines the techniques described below.
Streaming Parquet encoding is implemented as a producer/consumer pipeline, so Parquet reading and decoding proceed concurrently with GPU encoding (see the sketch below).
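The shape of that pipeline, as a Python stand-in for the Rust implementation (`read_batches` and `encode_on_gpu` are placeholders):

```python
import queue
import threading

def stream_encode(read_batches, encode_on_gpu):
    batches: queue.Queue = queue.Queue(maxsize=4)  # bounded queue applies backpressure

    def producer():
        for batch in read_batches():  # Parquet IO + decode on a CPU thread
            batches.put(batch)
        batches.put(None)             # sentinel marks end of stream

    threading.Thread(target=producer, daemon=True).start()
    while (batch := batches.get()) is not None:
        encode_on_gpu(batch)          # GPU work overlaps the next read
```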
In the current implementation, streaming Parquet supports:
"amplitude""basis"("angle" is not supported for streaming yet.)
For large workloads, QDP uses multiple CUDA streams and CUDA events to overlap host-to-device transfers with GPU compute.
This reduces time spent waiting on PCIe transfers and can improve throughput substantially for large batches.
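The pattern, shown with PyTorch streams as a stand-in for QDP's internal CUDA streams and events:

```python
import torch

copy_stream = torch.cuda.Stream()
chunks = [torch.randn(1 << 20, pin_memory=True) for _ in range(4)]  # pinned host buffers
device_chunks, events = [], []

# Enqueue all H2D copies on a dedicated stream, recording an event per copy.
for chunk in chunks:
    with torch.cuda.stream(copy_stream):
        device_chunks.append(chunk.to("cuda", non_blocking=True))
        ev = torch.cuda.Event()
        ev.record(copy_stream)
        events.append(ev)

# Compute on the default stream waits only for its own copy to finish, so the
# kernel for chunk 0 runs while chunks 1..3 are still crossing PCIe.
results = []
for d, ev in zip(device_chunks, events):
    torch.cuda.current_stream().wait_event(ev)
    results.append(d * 2.0)
torch.cuda.synchronize()
```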
QDP exposes results using the DLPack protocol, which allows frameworks like PyTorch to consume GPU memory without copying.
Conceptually:
- QDP wraps the GPU state buffer in a DLPack `DLManagedTensor`.
- The returned Python object exposes it via `__dlpack__`.
- PyTorch consumes it with `torch.from_dlpack(qtensor)` and takes ownership via DLPack's deleter.

Important details:

- No copy is made; the PyTorch tensor aliases the same GPU memory.
- Ownership transfers to the consumer: once PyTorch adopts the buffer, it is released through the DLPack deleter when the tensor is dropped.
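The zero-copy semantics can be demonstrated with the protocol itself; below, a plain CUDA tensor plays the producer role that QDP's `qtensor` plays via its own `__dlpack__`:

```python
import torch

producer = torch.ones(4, device="cuda")
consumer = torch.from_dlpack(producer)  # zero-copy: both alias one GPU buffer
consumer[0] = 42.0
print(producer[0])  # tensor(42., device='cuda:0'): the storage is shared
```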
- Keep `num_qubits` realistic. Output size is $2^n$ and becomes the dominant cost quickly.

If you need to understand where time is spent (copy vs. compute), QDP supports NVTX-based profiling. See `qdp/docs/observability/NVTX_USAGE.md`.