Version: next

QDP

Backend detection and selection for QDP.

Available backends:

_qdp (Rust+CUDA) -- native extension, highest performance
torch (PyTorch) -- reference implementation, must be explicitly selected

Auto-detection only activates the Rust backend. To use the PyTorch reference backend, call force_backend(Backend.PYTORCH) or use the .backend("pytorch") builder method on QdpBenchmark / QuantumDataLoader.

Backend

class Backend(enum.Enum)

Available QDP encoding backends.

get_qdp

@lru_cache(maxsize=1)
def get_qdp() -> ModuleType | None

Return the _qdp Rust extension module, or None if unavailable.

get_torch

@lru_cache(maxsize=1)
def get_torch() -> ModuleType | None

Return the torch module, or None if unavailable.

get_backend

def get_backend() -> Backend

Return the active backend.

Only the Rust backend is auto-detected. The PyTorch reference backend must be selected explicitly via force_backend().

force_backend

def force_backend(backend: Backend | None) -> None

Override automatic backend detection.

Pass None to restore auto-detection. Primarily useful for testing and benchmarking.

require_backend

def require_backend() -> Backend

Return the current backend or raise if none is available.

Benchmark API: supports Rust-optimized pipeline and PyTorch reference backend.

Usage:

from qumat_qdp import QdpBenchmark, ThroughputResult, LatencyResult

# Rust backend (default):
result = (QdpBenchmark(device_id=0).qubits(16).encoding("amplitude")
          .batches(100, size=64).warmup(2).run_throughput())

# PyTorch reference (must be explicitly selected):
result = (QdpBenchmark(device_id=0).backend("pytorch").qubits(16)
          .encoding("amplitude").batches(100, size=64).run_throughput())

ThroughputResult

@dataclass
class ThroughputResult()

Throughput benchmark measurement.

Returned by QdpBenchmark.run_throughput(). duration_sec is the measured timed section after any configured warmup batches. vectors_per_sec is computed over total_batches * batch_size encoded input vectors.

LatencyResult

@dataclass
class LatencyResult()

Latency benchmark measurement.

Returned by QdpBenchmark.run_latency(). duration_sec is the same timed interval used for throughput, and latency_ms_per_vector is the average milliseconds per encoded input vector across the measured batches.
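
The relationship between the two results is simple arithmetic over the same timed interval. The sketch below uses stand-in dataclasses with only the fields named above; the summarize helper is hypothetical and the real classes may carry additional fields.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ThroughputResult:
    duration_sec: float
    vectors_per_sec: float


@dataclass
class LatencyResult:
    duration_sec: float
    latency_ms_per_vector: float


def summarize(duration_sec: float, total_batches: int,
              batch_size: int) -> Tuple[ThroughputResult, LatencyResult]:
    """Hypothetical helper deriving both metrics from one timed run."""
    vectors = total_batches * batch_size          # encoded input vectors
    throughput = vectors / duration_sec           # vectors per second
    latency_ms = duration_sec * 1000.0 / vectors  # mean ms per vector
    return (ThroughputResult(duration_sec, throughput),
            LatencyResult(duration_sec, latency_ms))
```

For example, 100 batches of 64 vectors timed at 0.5 s give 6400 vectors, 12800 vectors per second, and 0.078125 ms per vector.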

QdpBenchmark

class QdpBenchmark()

Builder for throughput/latency benchmarks.

Supports two backends:

"rust" (default): Rust-optimized pipeline (no Python for-loop, GIL released).
"pytorch": Pure PyTorch reference implementation (must be explicitly selected).

qubits

def qubits(n: int) -> QdpBenchmark

Set the number of qubits for benchmarked encodings.

Arguments:

n: Number of qubits in each encoded output state.

Returns:

This builder for fluent chaining.

encoding

def encoding(method: str) -> QdpBenchmark

Set the encoding method to benchmark.

Arguments:

method: Encoding name, for example "amplitude", "angle", "basis", "iqp", or "iqp-z".

Returns:

This builder for fluent chaining.

batches

def batches(total: int, size: int = 64) -> QdpBenchmark

Set the number and size of benchmark batches.

Arguments:

total: Number of timed batches to process.
size: Number of vectors in each batch.

Returns:

This builder for fluent chaining.

prefetch

def prefetch(n: int) -> QdpBenchmark

Accept a prefetch setting for fluent API compatibility.

The current Rust benchmark pipeline manages work internally and the PyTorch reference path does not use a Python-side prefetch queue, so n is intentionally ignored.

Arguments:

n: Requested prefetch depth; currently unused.

Returns:

self for fluent builder chaining.

warmup

def warmup(n: int) -> QdpBenchmark

Set the number of warmup batches run before timing.

Arguments:

n: Number of batches to execute before measurements begin.

Returns:

This builder for fluent chaining.

backend

def backend(name: str) -> QdpBenchmark

Select the benchmark execution backend.

"rust" (the default) uses the native optimized pipeline exposed by the _qdp extension and raises at run time if that extension or entry point is unavailable. "pytorch" uses the pure-PyTorch reference implementation on the selected CUDA device when usable, otherwise CPU.

Arguments:

name: Backend name, either "rust" or "pytorch".

Raises:

ValueError: If name is not a supported backend.

Returns:

self for fluent builder chaining.

dtype

def dtype(dtype: str) -> QdpBenchmark

Set pipeline element dtype: 'f64' (default) or 'f32'.

'f32' activates the zero-copy float32 batch path where the encoding supports it; encodings without an f32 kernel automatically fall back to f64 inside the Rust pipeline.

run_throughput

def run_throughput() -> ThroughputResult

Run the configured throughput benchmark.

qubits() and batches() must be configured before calling this method. The default "rust" backend calls the native _qdp pipeline with any configured warmup batches; "pytorch" runs the reference encoder loop and synchronizes CUDA timing when applicable.

Raises:

ValueError: If required benchmark parameters are missing.
RuntimeError: If the Rust backend is selected but unavailable.

Returns:

A ThroughputResult containing elapsed seconds and encoded vectors per second.

run_latency

def run_latency() -> LatencyResult

Run the configured latency benchmark.

qubits() and batches() must be configured before calling this method. The Rust backend reports latency from the native pipeline; the PyTorch backend derives average latency from its throughput run.

Raises:

ValueError: If required benchmark parameters are missing.
RuntimeError: If the Rust backend is selected but unavailable.

Returns:

A LatencyResult containing elapsed seconds and mean milliseconds per encoded vector.
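
The builder contract described for QdpBenchmark (fluent setters, required qubits() and batches(), warmup batches excluded from timing) can be illustrated with a toy class. ToyBenchmark and its run() method are hypothetical stand-ins written for this sketch; the real class dispatches to the native pipeline or the PyTorch reference loop instead of taking a callback.

```python
import time
from typing import Callable, Optional, Tuple


class ToyBenchmark:
    """Hypothetical stand-in mirroring the fluent builder shape."""

    def __init__(self) -> None:
        self._qubits: Optional[int] = None
        self._total: Optional[int] = None
        self._size: int = 64
        self._warmup: int = 0

    def qubits(self, n: int) -> "ToyBenchmark":
        self._qubits = n
        return self

    def batches(self, total: int, size: int = 64) -> "ToyBenchmark":
        self._total, self._size = total, size
        return self

    def warmup(self, n: int) -> "ToyBenchmark":
        self._warmup = n
        return self

    def run(self, encode_batch: Callable[[int, int], None]) -> Tuple[float, int]:
        """Return (duration_sec, timed_vectors) for the measured batches."""
        if self._qubits is None or self._total is None:
            raise ValueError("call qubits() and batches() before running")
        # Warmup batches run first and are excluded from the timed section.
        for _ in range(self._warmup):
            encode_batch(self._qubits, self._size)
        start = time.perf_counter()
        for _ in range(self._total):
            encode_batch(self._qubits, self._size)
        duration_sec = time.perf_counter() - start
        return duration_sec, self._total * self._size
```

A call such as ToyBenchmark().qubits(4).batches(5, size=8).warmup(2).run(fn) invokes fn seven times but reports only the 40 vectors of the five timed batches.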

Unified Python facade for explicit QDP backend selection.

QdpEngine

class QdpEngine()

Select and delegate to a native QDP encoding backend.

QdpEngine is the small public Python facade used by callers that want explicit backend selection. backend="cuda" routes to the Rust/CUDA extension-backed engine. backend="amd" and backend="triton_amd" route to the AMD/Triton implementation. The selected backend is exposed as self.backend ("cuda" or "amd") and all encode() calls are forwarded to that engine.

Arguments:

device_id: GPU device ordinal to use.
precision: Numeric precision requested from the backend, such as "float32" or "float64" when supported by that backend.
backend: Backend selector. Valid values are "cuda", "amd", and "triton_amd".

Raises:

ValueError: If backend is not one of the supported selectors.

__init__

def __init__(device_id: int = 0,
             precision: str = "float32",
             backend: str = "cuda") -> None

Create a backend-selecting QDP engine facade.

Arguments:

device_id: GPU device ordinal to use.
precision: Numeric precision requested from the backend.
backend: Backend selector, either "cuda", "amd", or "triton_amd".

Raises:

ValueError: If backend is not supported.

encode

def encode(data: Any,
           num_qubits: int,
           encoding_method: str = "amplitude") -> Any

Encode input samples with the configured backend.

Arguments:

data: Input samples accepted by the selected backend.
num_qubits: Number of qubits in the output state vector.
encoding_method: Encoding strategy, such as "amplitude", "angle", "basis", "iqp", "iqp-z", or "phase" when supported by the backend.

Raises:

ValueError: If the backend does not support encoding_method.

Returns:

Backend-native encoded tensor or tensor-like result.
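
The selector handling described for QdpEngine ("amd" and "triton_amd" both resolve to the AMD engine, and anything else is rejected) amounts to a small alias table. The function name resolve_engine_backend below is hypothetical and only illustrates that mapping.

```python
# Selector -> value exposed as self.backend, per the description above.
_BACKEND_ALIASES = {"cuda": "cuda", "amd": "amd", "triton_amd": "amd"}


def resolve_engine_backend(backend: str) -> str:
    """Hypothetical helper: normalise a selector or raise ValueError."""
    try:
        return _BACKEND_ALIASES[backend]
    except KeyError:
        valid = ", ".join(sorted(_BACKEND_ALIASES))
        raise ValueError(
            f"unsupported backend {backend!r}; expected one of: {valid}"
        ) from None
```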

Unified tensor facade for backend-native QDP results.

QdpTensor

@dataclass
class QdpTensor()

DLPack-compatible wrapper for backend-native QDP tensor results.

The Rust/CUDA path and other native backends may return objects whose concrete tensor type is backend-specific. QdpTensor preserves that object in value while exposing __dlpack__ and __dlpack_device__ so consumers such as PyTorch can import it without a copy.

Arguments:

value: Backend-native tensor-like object. It must implement the DLPack protocol when converted with to_torch() or torch.from_dlpack.
backend: Human-readable backend name used in error messages.

Raises:

RuntimeError: If value does not implement the required DLPack methods when conversion is attempted.

__dlpack__

def __dlpack__(stream: int | None = None) -> Any

Return a DLPack capsule for the wrapped backend tensor.

Arguments:

stream: Optional consumer stream to pass through to the wrapped tensor's __dlpack__ implementation.

Raises:

RuntimeError: If the wrapped value does not implement __dlpack__.

Returns:

A DLPack capsule representing value.

__dlpack_device__

def __dlpack_device__() -> Any

Return the DLPack device descriptor for the wrapped tensor.

Raises:

RuntimeError: If the wrapped value does not implement __dlpack_device__.

Returns:

The (device_type, device_id) tuple reported by value.

to_torch

def to_torch() -> Any

Convert the wrapped tensor to a PyTorch tensor via DLPack.

Returns:

A torch.Tensor sharing storage with the backend tensor when the backend's DLPack producer supports zero-copy exchange.
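
The delegation pattern behind QdpTensor can be sketched without any GPU library. TensorWrapper and FakeProducer below are hypothetical names; the sketch only shows forwarding the two DLPack protocol methods and raising RuntimeError when the wrapped object lacks them. Forwarding stream only when it is not None is a choice made here, not something the text above specifies.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TensorWrapper:
    """Hypothetical stand-in for a DLPack-forwarding wrapper."""
    value: Any
    backend: str = "unknown"

    def _require(self, name: str) -> Any:
        method = getattr(self.value, name, None)
        if method is None:
            raise RuntimeError(
                f"{self.backend} result does not implement {name}")
        return method

    def __dlpack__(self, stream: Optional[int] = None) -> Any:
        producer = self._require("__dlpack__")
        # Pass the consumer stream through only when one was supplied.
        return producer() if stream is None else producer(stream=stream)

    def __dlpack_device__(self) -> Any:
        return self._require("__dlpack_device__")()


class FakeProducer:
    """Toy producer returning placeholders instead of a real capsule."""

    def __dlpack__(self, stream: Optional[int] = None) -> Any:
        return ("capsule", stream)

    def __dlpack_device__(self) -> Any:
        return (1, 0)  # (kDLCPU, device 0) in DLPack's device enum
```

Because the wrapper itself exposes both protocol methods, a consumer that accepts any DLPack producer, such as torch.from_dlpack, can take the wrapper directly.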

QuantumTensor

Backward-compatible alias for QdpTensor.

Quantum Data Loader: Python builder for Rust-backed batch iterator.

Usage:

import torch
from qumat_qdp import QuantumDataLoader

loader = (QuantumDataLoader(device_id=0).qubits(16).encoding("amplitude")
          .batches(100, size=64).source_synthetic())
for qt in loader:
    batch = torch.from_dlpack(qt)
    ...

QuantumDataLoader

class QuantumDataLoader()

Builder for batched QDP encoding iterators.

QuantumDataLoader can generate synthetic input samples or read supported file formats, then encode each batch with the selected backend. The default "rust" backend returns Rust-backed QuantumTensor batches, while the explicit "pytorch" backend returns torch.Tensor batches. The "auto" backend tries the Rust extension first and falls back to PyTorch when the native extension is unavailable.

__init__

def __init__(device_id: int = 0,
             num_qubits: int = 16,
             batch_size: int = 64,
             total_batches: int = 100,
             encoding_method: str = "amplitude",
             seed: int | None = None) -> None

Create a loader builder with default synthetic batching settings.

Arguments:

device_id: GPU device ordinal used by native and PyTorch backends.
num_qubits: Number of qubits in each encoded output state.
batch_size: Number of samples per emitted batch.
total_batches: Maximum number of batches to emit.
encoding_method: Encoding method name.
seed: Optional synthetic data seed.

Raises:

ValueError: If any initial setting is invalid.

qubits

def qubits(n: int) -> QuantumDataLoader

Set the number of qubits used by subsequent encodings.

n must be a positive integer. The value controls the encoded state size (for example, amplitude and phase-style encodings produce vectors of length 2**n) and the expected input width for encodings such as "angle" and "iqp-z".

Arguments:

n: Positive qubit count.

Raises:

ValueError: If n is not a positive integer.

Returns:

self for fluent builder chaining.

encoding

def encoding(method: str) -> QuantumDataLoader

Set the quantum feature encoding method.

Valid values are "amplitude", "angle", "basis", "iqp", "iqp-z", and "phase". Use these canonical lowercase names because the selected backend receives the string exactly as supplied. The PyTorch reference backend supports the same methods as

Arguments:

method: Encoding method name.

Raises:

ValueError: If method is empty, not a string, or not a supported encoding.

Returns:

self for fluent builder chaining.

batches

def batches(total: int, size: int = 64) -> QuantumDataLoader

Set the number of batches to produce and samples per batch.

Both total and size must be positive integers. For synthetic sources, total is the exact number of generated batches. For file sources handled by the PyTorch fallback, iteration stops at the smaller of total and the number of complete/partial batches available from the loaded file.

Arguments:

total: Positive maximum number of batches to emit.
size: Positive number of samples per encoded batch.

Raises:

ValueError: If either argument is not a positive integer.

Returns:

self for fluent builder chaining.
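
For the file-source case, the stopping rule above reduces to taking the smaller of total and the number of batches the file can fill, counting a trailing partial batch. The helper file_batch_count is hypothetical and assumes the row count of the loaded file is known up front.

```python
def file_batch_count(num_rows: int, total: int, size: int) -> int:
    """Hypothetical helper: batches emitted for a fully loaded file."""
    if total <= 0 or size <= 0:
        raise ValueError("total and size must be positive integers")
    if num_rows < 0:
        raise ValueError("num_rows must be non-negative")
    # Ceiling division counts a final partial batch as one batch.
    available = -(-num_rows // size)
    return min(total, available)
```

With 130 rows and size=64 the file yields two full batches and one partial batch, so total=100 produces 3 batches while total=2 produces 2.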

source_synthetic

def source_synthetic(total_batches: int | None = None) -> QuantumDataLoader

Select the synthetic data source.

Synthetic data is the default when no file source is configured, but calling this method records the source choice explicitly. Use seed() to make generated samples reproducible where the selected backend supports seeded generation. If total_batches is provided, it overrides the current batch count and must be a positive integer. Selecting both source_synthetic() and source_file() on the same loader is rejected when iteration starts.

Arguments:

total_batches: Optional positive replacement for the configured number of batches.

Raises:

ValueError: If total_batches is provided but is not a positive integer.

Returns:

self for fluent builder chaining.

source_file

def source_file(path: str, streaming: bool = False) -> QuantumDataLoader

Use a file data source.

Non-streaming native loading accepts .parquet, .arrow, .feather, .ipc, .npy, .pt, .pth, and .pb files. The PyTorch fallback path supports only .npy, .pt, and .pth inputs because it loads the full tensor into memory before encoding. Streaming mode is native-only and currently accepts .parquet files. Remote s3:// and gs:// paths are accepted when the native remote I/O feature is enabled; remote query strings and fragments are rejected.

Arguments:

path: Local or supported remote input path.
streaming: Whether to request native streaming file loading.

Raises:

ValueError: If path is empty, includes an unsupported remote query/fragment, or requests streaming for an unsupported extension.

Returns:

self for fluent builder chaining.
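
The path rules above can be expressed as a small validator. This is a sketch under assumptions: check_source_path and pytorch_fallback_supports are hypothetical names, the extension sets are copied from the description, and the sketch raises only for the three conditions listed under Raises. How the real method parses paths is not stated here.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

NATIVE_EXTS = {".parquet", ".arrow", ".feather", ".ipc",
               ".npy", ".pt", ".pth", ".pb"}
PYTORCH_EXTS = {".npy", ".pt", ".pth"}
STREAMING_EXTS = {".parquet"}
REMOTE_SCHEMES = {"s3", "gs"}


def check_source_path(path: str, streaming: bool = False) -> str:
    """Hypothetical validator: return the lowercase extension or raise."""
    if not path:
        raise ValueError("path must be a non-empty string")
    parts = urlsplit(path)
    if parts.scheme in REMOTE_SCHEMES:
        if parts.query or parts.fragment:
            raise ValueError("remote paths must not carry a query or fragment")
        name = parts.path
    else:
        name = path  # local path: keep it verbatim, '#' and '?' included
    ext = PurePosixPath(name).suffix.lower()
    if streaming and ext not in STREAMING_EXTS:
        raise ValueError(f"streaming is not supported for {ext or 'this'} files")
    return ext


def pytorch_fallback_supports(ext: str) -> bool:
    """True when the PyTorch fallback path could load this extension."""
    return ext in PYTORCH_EXTS
```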

seed

def seed(s: int | None = None) -> QuantumDataLoader

Set or clear the synthetic data seed.

None leaves the loader unseeded for the native Rust path and maps to the PyTorch reference path's default deterministic seed. Integer seeds must fit Rust u64 so the same configuration can be passed to the native backend.

Arguments:

s: None or an integer in [0, 2**64 - 1].

Raises:

ValueError: If s is not None or a valid Rust u64.

Returns:

self for fluent builder chaining.
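
The u64 constraint on seeds can be checked in a few lines. validate_seed is a hypothetical helper; rejecting bool values (which are technically int in Python) is a choice made for this sketch and is not stated in the description above.

```python
from typing import Optional

U64_MAX = 2**64 - 1


def validate_seed(s: Optional[int]) -> Optional[int]:
    """Hypothetical helper: accept None or an int in [0, 2**64 - 1]."""
    if s is None:
        return None
    # bool is a subclass of int; it is rejected here as a sketch-level choice.
    if isinstance(s, bool) or not isinstance(s, int):
        raise ValueError("seed must be None or an integer")
    if not 0 <= s <= U64_MAX:
        raise ValueError("seed must fit in an unsigned 64-bit integer")
    return s
```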

null_handling

def null_handling(policy: str) -> QuantumDataLoader

Set how nullable file inputs are handled by the native loader.

Valid policies are "fill_zero" (replace nulls with zero before encoding) and "reject" (fail on null input). The policy is passed through to Rust file and synthetic loader creation when available. The PyTorch fallback loaders do not consume this setting because supported .npy/.pt/.pth inputs are loaded as dense tensors.

Arguments:

policy: Null handling policy, either "fill_zero" or "reject".

Raises:

ValueError: If policy is not supported.

Returns:

self for fluent builder chaining.

backend

def backend(name: str) -> QuantumDataLoader

Set encoding backend: 'rust', 'pytorch', or 'auto'.

'auto': tries the Rust backend first and falls back to the PyTorch reference backend if the Rust extension is unavailable, emitting a RuntimeWarning when the fallback occurs. 'rust' raises if the extension is missing. 'pytorch' always uses the pure-PyTorch path. Returns self for chaining.
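
The three-way resolution described above, including the warning on fallback, can be sketched as follows. resolve_loader_backend and its rust_available flag are hypothetical; the exception type raised when 'rust' is requested but missing is assumed to be RuntimeError, matching the benchmark API, since the text above only says that it raises.

```python
import warnings


def resolve_loader_backend(name: str, rust_available: bool) -> str:
    """Hypothetical helper: map a requested backend to the one actually used."""
    if name not in ("rust", "pytorch", "auto"):
        raise ValueError(f"unsupported backend {name!r}")
    if name == "pytorch":
        return "pytorch"
    if rust_available:
        return "rust"
    if name == "rust":
        # Assumed exception type; see the lead-in.
        raise RuntimeError("Rust extension is not available")
    # name == "auto" and the native extension is missing: fall back loudly.
    warnings.warn(
        "Rust backend unavailable; falling back to PyTorch reference backend",
        RuntimeWarning,
        stacklevel=2,
    )
    return "pytorch"
```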

as_torch_dataset

def as_torch_dataset()

Wrap this loader as a torch.utils.data.IterableDataset.

Returns a dataset that yields one encoded batch (torch.Tensor) per iteration step, compatible with torch.utils.data.DataLoader.

Example::

from qumat_qdp import QuantumDataLoader
import torch

dataset = (QuantumDataLoader()
           .qubits(16).encoding("amplitude")
           .batches(100, size=64)
           .source_synthetic()
           .as_torch_dataset())
loader = torch.utils.data.DataLoader(dataset, batch_size=None, num_workers=0)
for batch in loader:
    ...  # batch is torch.Tensor, shape (64, 2**16)

Note: batch_size=None in DataLoader disables DataLoader's own batching; num_workers=0 is required because the Rust backend holds GPU state that cannot be pickled for multi-process workers.

__iter__

def __iter__() -> Iterator[object]

Return iterator that yields one encoded batch per step.

With the default "rust" backend, yields QuantumTensor (use torch.from_dlpack(qt)). With .backend("pytorch"), yields torch.Tensor directly.

Triton AMD backend for QDP encodings on ROCm.

is_triton_amd_available

def is_triton_amd_available() -> bool

Return whether the Triton AMD backend appears usable.

Returns:

True when PyTorch reports a ROCm device, Triton imports successfully, and the active Triton target is either HIP or cannot be queried reliably.

TritonAmdEngine

@dataclass
class TritonAmdEngine()

ROCm/Triton implementation of the QDP encoder interface.

This engine targets AMD GPUs through a PyTorch ROCm runtime plus the Triton Python package. encode() accepts "amplitude", "angle", "basis", "iqp", "iqp-z", and "phase". The phase encoder uses a fused Triton HIP kernel for float32 and 1 <= num_qubits <= 32; other supported cases fall back to vectorized PyTorch operations on the same ROCm device.

precision accepts "float32"/"f32"/"float" and "float64"/"f64"/"double". Runtime availability is checked when encode() is called and raises a descriptive RuntimeError if PyTorch ROCm or Triton is unavailable.

Arguments:

device_id: ROCm device ordinal, addressed through PyTorch as cuda:{device_id}.
precision: Floating-point precision for real inputs and complex outputs.

check_runtime

def check_runtime() -> None

Validate that the process can use the Triton AMD backend.

Raises:

RuntimeError: If PyTorch ROCm support or Triton is unavailable.

encode_amplitude

def encode_amplitude(data: Any, num_qubits: int) -> Any

Encode real-valued samples as normalized amplitudes.

Arguments:

data: One- or two-dimensional samples with width at most 2**num_qubits.
num_qubits: Number of qubits in the encoded state.

Raises:

ValueError: If the sample width exceeds the state length.

Returns:

Complex tensor of shape (batch, 2**num_qubits).
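
As a reference point for a single sample, amplitude encoding can be written in pure Python. This sketch assumes that short samples are zero-padded to 2**num_qubits and that normalization is by the L2 norm; both are the usual conventions but are not spelled out above, and rejecting an all-zero sample is a choice made here. The function name amplitude_encode_sample is hypothetical.

```python
import math
from typing import List, Sequence


def amplitude_encode_sample(sample: Sequence[float],
                            num_qubits: int) -> List[complex]:
    """Hypothetical single-sample reference: pad with zeros, L2-normalise."""
    dim = 2 ** num_qubits
    if len(sample) > dim:
        raise ValueError(
            f"sample width {len(sample)} exceeds state length {dim}")
    norm = math.sqrt(sum(x * x for x in sample))
    if norm == 0.0:
        # Assumption: a zero vector cannot be normalised, so reject it.
        raise ValueError("cannot normalise an all-zero sample")
    padded = list(sample) + [0.0] * (dim - len(sample))
    return [complex(x / norm, 0.0) for x in padded]
```

The squared magnitudes of the result sum to 1, which is what makes the output a valid state vector.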

encode_angle

def encode_angle(data: Any, num_qubits: int) -> Any

Encode samples with product-state angle encoding.

Arguments:

data: One- or two-dimensional angle samples with exactly num_qubits values per sample.
num_qubits: Number of qubits in the encoded state.

Raises:

ValueError: If the sample width is not num_qubits.

Returns:

Complex tensor of shape (batch, 2**num_qubits).

encode_basis

def encode_basis(data: Any, num_qubits: int) -> Any

Encode integer basis-state indices as one-hot quantum states.

Arguments:

data: One-dimensional indices or a two-dimensional column of indices in [0, 2**num_qubits - 1].
num_qubits: Number of qubits in the encoded state.

Raises:

ValueError: If indices are empty, malformed, or out of range.

Returns:

Complex one-hot tensor of shape (batch, 2**num_qubits).
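
Basis encoding maps each index to a one-hot state vector. The pure-Python sketch below illustrates the documented shape and range checks for a one-dimensional list of indices; basis_encode is a hypothetical name and the real method operates on tensors.

```python
from typing import List, Sequence


def basis_encode(indices: Sequence[int], num_qubits: int) -> List[List[complex]]:
    """Hypothetical reference: one one-hot row of length 2**num_qubits per index."""
    dim = 2 ** num_qubits
    if len(indices) == 0:
        raise ValueError("indices must not be empty")
    states = []
    for idx in indices:
        if isinstance(idx, bool) or not isinstance(idx, int):
            raise ValueError(f"index {idx!r} is not an integer")
        if not 0 <= idx < dim:
            raise ValueError(f"index {idx} is outside [0, {dim - 1}]")
        row = [0j] * dim
        row[idx] = 1 + 0j  # amplitude 1 on the selected basis state
        states.append(row)
    return states
```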

encode_iqp

def encode_iqp(data: Any, num_qubits: int, *, enable_zz: bool = True) -> Any

Encode samples with the IQP feature map.

Arguments:

data: IQP parameters. With enable_zz=True, each sample must contain num_qubits + num_qubits * (num_qubits - 1) // 2 values; otherwise each sample must contain num_qubits values.
num_qubits: Number of qubits in the encoded state.
enable_zz: Include pairwise ZZ interactions when True.

Raises:

ValueError: If the parameter width does not match the IQP variant.

Returns:

Complex tensor of shape (batch, 2**num_qubits).
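
The expected parameter count per sample follows directly from the formula in the data argument: one value per qubit, plus one per unordered qubit pair when ZZ terms are enabled. iqp_input_width is a hypothetical helper for checking input shapes before calling the encoder.

```python
def iqp_input_width(num_qubits: int, enable_zz: bool = True) -> int:
    """Hypothetical helper: values required per sample for the IQP feature map."""
    if num_qubits < 1:
        raise ValueError("num_qubits must be a positive integer")
    pairs = num_qubits * (num_qubits - 1) // 2  # unordered qubit pairs
    return num_qubits + pairs if enable_zz else num_qubits
```

For num_qubits=4 this gives 4 + 6 = 10 values with ZZ interactions and 4 values without them.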

encode_phase

def encode_phase(data: Any, num_qubits: int) -> Any

Encode samples as equal-magnitude states with data-dependent phase.

Arguments:

data: One- or two-dimensional phase samples with exactly num_qubits values per sample.
num_qubits: Number of qubits in the encoded state.

Raises:

ValueError: If the sample width is not num_qubits.

Returns:

Complex tensor of shape (batch, 2**num_qubits).

encode

def encode(data: Any,
           num_qubits: int,
           encoding_method: str = "amplitude") -> Any

Encode input samples using a named Triton AMD encoding.

Arguments:

data: Input samples for the selected encoding method.
num_qubits: Number of qubits in the encoded state.
encoding_method: One of "amplitude", "angle", "basis", "iqp", "iqp-z", or "phase".

Raises:

RuntimeError: If the Triton AMD runtime is unavailable.
ValueError: If encoding_method is unsupported or inputs are invalid for the selected encoder.

Returns:

Complex tensor of shape (batch, 2**num_qubits).
