A differentially-private framework that exploits workload predictions to dramatically reduce error on streaming linear queries — without weakening (ε, δ) guarantees.
Modern database workloads are highly predictable — recurring jobs, dashboards, templated drill-downs. Yet most online differentially-private mechanisms treat each query as if it arrived from nowhere. LAPRAS asks a sharper question: given an oracle hint about which queries are likely to arrive, can we spend privacy budget like the offline Matrix Mechanism — and still degrade gracefully when the oracle is wrong?
Online DP mechanisms add independent noise per query, so error scales linearly with stream length. The optimal Matrix Mechanism sits offline, out of reach, because the workload isn't known in advance.
An oracle predicts a likely query set P. LAPRAS runs the Matrix Mechanism on P offline, caches its low-noise releases, and adapts to the unpredicted remainder online from a residual budget.
When the oracle is accurate (high overlap), LAPRAS approaches near-offline-optimal utility — over an order of magnitude lower MAE than online baselines on Adult and Gowalla.
When the oracle is wrong (low overlap), LAPRAS stays comparable to standard online Independent-Noise mechanisms. No catastrophic failure when predictions don't materialize.
An online stopping-time estimator from T = ceil(log2 S)^2 bad-query positions
continuously recalibrates per-query spend — without violating composition.
Step through the algorithm — from oracle prediction to adaptive online streaming.
LAPRAS treats the oracle as a black box — any predictor is admissible. Quality is summarized by prediction overlap ρ: the fraction of stream queries also in P. Queries in P ∩ S are good (G); queries in S \ P are bad (B = S − G). We assume the realized stream is a uniformly random permutation — the load-bearing assumption behind the unbiased stopping-time estimator.
Before the stream begins, LAPRAS solves the Matrix-Mechanism SDP (Li, Hay, Rastogi, Miklau & McGregor, 2010) on P with budget εMM, producing a low-noise correlated release. When a good query later arrives, it is answered from the cache — at zero additional privacy cost by post-processing.
Queries arrive one by one. Good = predicted (free via the cache). Bad = unpredicted (consumes residual budget).
The challenge: the residual budget ε − εMM must cover an unknown number of bad queries. LAPRAS exploits the random-order assumption to estimate that count online (see the next panel and the interactive sandbox below).
Practical guidance. Use query-heavy when ρ ≤ 0.5 (more budget for surprises) and matrix-heavy when ρ ≥ 0.8 (the MM investment dominates). Smooth merges the three online pools, allowing per-query ε to be derived from a single running estimator rather than four pre-committed slices.
B-hat = S * (T - 1) / (n - 1) at the T-th bad arrival.
eps_pool / (B-hat_rem + 1), updated after every bad query.
Both methods converge identically at ρ = 1.0 (no bad queries). The crossover at ρ ≈ 0.5 holds across all four budget strategies — a remarkable invariant we explore in the experiments below.
The novel theoretical contribution of LAPRAS: an unbiased online estimator
B-hat = S * (T - 1) / (n - 1) of the unknown total bad-query count B,
formed from the position n at which the T-th bad query arrives,
with T = ceil(log2 S)^2. Adjust the controls below to see it lock and converge.
B-hat = S * (T - 1) / (n - 1) locks. Static commits to that estimate for the
rest of the stream; Smooth keeps recalibrating B-hat_rem after every subsequent
bad query, redistributing the residual budget on the fly.
Evaluated on Adult, Gowalla, and IDEBench/Flights workloads (ε = 1.0, δ ∈ {10⁻⁵, 10⁻¹⁰}, 5 runs per setting, error bars = ±1σ). Baselines: Offline MM (knows the full stream — theoretical optimum), Online IndepNoise (independent Gaussian noise per query).
~186s @ P=50 → ~875s @ P=500.
Pick P just large enough to cover expected good queries; further padding burns CPU without
significant utility return.
| ρ | LAPRAS Static · NMAE | LAPRAS Smooth · NMAE | Offline MM · NMAE | CacheDP · NMAE |
|---|---|---|---|---|
| 0.0 | 0.0019 | 0.0026 | 0.0002 | 0.2076 |
| 0.5 | 0.0007 | 0.0011 | 0.0001 | 0.1242 |
| 1.0 | 0.0014 | 0.0014 | 0.0002 | 0.0588 |
CacheDP splits ε across many independent releases, accumulating noise per query; LAPRAS jointly optimizes the predicted-set workload via the Matrix Mechanism, achieving roughly two orders of magnitude lower normalized error at the same global budget.