arxiv:2606.00487

TAPS: Target-Aware Prefix Tree Selection for Diffusion-Drafted Speculative Decoding

Published on May 30

Authors:

Abstract

Diffusion models enable parallel token prediction for speculative decoding, but existing methods suffer from verification bottlenecks; TAPS addresses this by using target-aware prefix selection to improve acceptance-cost tradeoff in draft tree verification.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Using a diffusion model for parallel drafting is a promising approach for speculative decoding. By predicting tokens at multiple future positions in a single forward pass, diffusion drafters substantially reduce drafting latency. However, this shifts the bottleneck to verification: verifying a single sequence limits acceptance length, while verifying large draft trees incurs excessive target-model latency. We identify a key mismatch in existing draft-tree methods: existing diffusion-tree methods rank nodes by the marginal probability, ignoring that verification is prefix-conditioned. As a result, they may verify unreachable descendants of rejected prefixes, increasing latency with limited acceptance gains. To address this, we propose TAPS, a target-aware prefix selection method that turns diffusion marginals into path-conditioned acceptance estimates. TAPS then selects a compact prefix-closed subtree under a fixed verification budget, improving the acceptance-cost tradeoff rather than simply expanding the draft tree. Experiments across diverse datasets and model families demonstrate that TAPS achieves up to 7.9x lossless end-to-end speedup over vanilla autoregressive decoding, outperforming state-of-the-art DFlash and DDTree by 1.36x and 1.74x respectively. Our work is available at https://anonymous.4open.science/r/TAPS-EMNLP2026-53DD

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2606.00487

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2606.00487 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2606.00487 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2606.00487 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.