When you build a RAG system over sensitive data (medical records, legal documents, financial reports), every query is a potential privacy leak. The user's question reveals intent; the retrieved chunks reveal data. Privacy-preserving retrieval addresses both threats while keeping the cost in utility as small as possible.

The Threat Model

Before choosing a technique, be precise about what you're protecting against:

  • Query privacy: Server shouldn't learn what the user searched for.
  • Content privacy: Model provider shouldn't see document contents.
  • Membership inference: Adversary shouldn't confirm whether a specific record exists in the index.
  • Reconstruction attacks: Attacker shouldn't be able to reconstruct the original text from embeddings alone.

⚠️ Embeddings are NOT anonymous

Research shows that ~80% of sentences can be reconstructed from their embeddings alone using inversion attacks. Never store raw embeddings of PII without additional protection.

Private Information Retrieval (PIR)

PIR lets a client retrieve a record from a database without the server learning which record was requested. The mathematical guarantees come from cryptography:

  • Computational PIR: Uses homomorphic encryption. The client encrypts the query, the server computes on the ciphertext and returns an encrypted result. The server only ever sees ciphertext, so it learns nothing about which record was requested.
  • ORAM (Oblivious RAM): The client accesses an encrypted data store that is continually re-shuffled, so even the access pattern is hidden.
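
To make the computational PIR flow concrete, here is a minimal sketch using the Paillier cryptosystem, which is additively homomorphic. This is an illustration under stated assumptions, not a production design: the primes are toy-sized, records are small integers, and the helper names (`encrypt`, `decrypt`, `client_query`, `server_answer`) are hypothetical. The client sends an encrypted one-hot selector; the server combines it with every record and never decrypts anything.

```python
import math
import secrets
from typing import List

# Toy Paillier key material. These Mersenne primes are far too small and too
# structured for real use; they only keep the demo fast.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                      # public key
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private key: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)           # private key helper (modular inverse, Python 3.8+)


def encrypt(m: int) -> int:
    """Paillier encryption with generator g = N + 1 and fresh randomness r."""
    r = secrets.randbelow(N - 1) + 1
    return ((1 + m * N) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Paillier decryption using the private values LAM and MU."""
    u = pow(c, LAM, N2)
    return ((u - 1) // N) * MU % N


def client_query(index: int, db_size: int) -> List[int]:
    """Encrypted one-hot selector: Enc(1) at `index`, Enc(0) everywhere else."""
    return [encrypt(1 if i == index else 0) for i in range(db_size)]


def server_answer(db: List[int], query: List[int]) -> int:
    """Compute Enc(sum(selector[i] * db[i])) using only ciphertext operations."""
    acc = 1  # 1 is a valid encryption of 0
    for record, ciphertext in zip(db, query):
        # Raising a ciphertext to `record` multiplies the hidden plaintext by it;
        # multiplying ciphertexts adds the hidden plaintexts.
        acc = (acc * pow(ciphertext, record, N2)) % N2
    return acc


database = [11, 22, 33, 44]                 # server-side records (small integers)
query = client_query(2, len(database))      # client wants record 2
answer = decrypt(server_answer(database, query))
```

Note that the server has to touch every record for each query, since skipping any record would reveal that it was not the one requested. That linear scan is a large part of why PIR is so much slower than a plain lookup.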

Differential Privacy for Embeddings

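Add calibrated Gaussian noise to query embeddings before sending them to a retrieval service. The goal is noise small enough that semantically similar queries still retrieve similar results, yet large enough that the exact query cannot be reliably recovered. The privacy budget epsilon controls this trade-off: a smaller epsilon means more noise and stronger privacy, at the cost of recall: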

```python
import numpy as np

def privatise_embedding(embedding: np.ndarray, epsilon: float = 1.0) -> np.ndarray:
    """Add Gaussian noise calibrated to an (epsilon, delta)-DP guarantee."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    sensitivity = 2.0  # L2 sensitivity: two unit-norm embeddings differ by at most 2
    delta = 1e-5
    # Classical Gaussian-mechanism calibration (the bound is derived for epsilon < 1)
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    noise = np.random.normal(0, sigma, embedding.shape)
    noisy = embedding + noise
    return noisy / np.linalg.norm(noisy)  # re-normalise (post-processing keeps the guarantee)

# `encoder` and `vector_db` stand in for your embedding model and vector store.
query_emb = encoder.encode("Patient John Doe's last HbA1c result")
private_q = privatise_embedding(query_emb, epsilon=0.5)
results = vector_db.search(private_q, top_k=5)
```
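
Before picking an epsilon, it helps to see how large the noise actually is relative to the embedding. The sketch below reuses the same calibration formula and reports the root-mean-square norm of the noise vector. The dimension of 768 is an assumption chosen for illustration, and `gaussian_sigma` and `rms_noise_norm` are hypothetical helper names.

```python
import math


def gaussian_sigma(epsilon: float, delta: float = 1e-5, sensitivity: float = 2.0) -> float:
    """Noise scale from the classical Gaussian-mechanism calibration."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon


def rms_noise_norm(sigma: float, dim: int) -> float:
    """Root-mean-square L2 norm of i.i.d. N(0, sigma^2) noise in `dim` dimensions."""
    return sigma * math.sqrt(dim)


DIM = 768  # assumed embedding dimension, for illustration only

for eps in (0.5, 1.0, 5.0):
    sigma = gaussian_sigma(eps)
    print(f"epsilon={eps}: sigma={sigma:.2f}, RMS noise norm={rms_noise_norm(sigma, DIM):.1f}")
```

A normalised embedding has norm 1, so comparing that with the printed noise norm gives a quick sense of how much signal survives at a given epsilon. Measure recall on your own data before settling on a value.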

Federated RAG

Instead of centralising documents in one vector DB, each data owner runs their own retriever locally. The orchestrator sends the query to all nodes, each node returns anonymised top-k results with confidence scores, and the orchestrator merges them. No raw documents ever leave their origin.
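
The fan-out and merge step can be sketched as follows. This is a toy illustration, not a reference implementation: `LocalNode`, `Hit`, and `federated_search` are hypothetical names, the scoring is simple word overlap rather than a real retriever, and it assumes confidence scores are comparable across nodes. Each node returns only an opaque document reference and a score, never the text.

```python
from dataclasses import dataclass
from heapq import nlargest
from typing import Dict, List


@dataclass(frozen=True)
class Hit:
    node_id: str   # which data owner produced the hit
    doc_ref: str   # opaque reference; the raw text stays on the node
    score: float   # confidence score, assumed comparable across nodes


class LocalNode:
    """A data owner running its own retriever over documents it never shares."""

    def __init__(self, node_id: str, docs: Dict[str, str]) -> None:
        self.node_id = node_id
        self._docs = docs

    def search(self, query: str, top_k: int) -> List[Hit]:
        query_words = set(query.lower().split())
        hits = []
        for ref, text in self._docs.items():
            doc_words = set(text.lower().split())
            union = query_words | doc_words
            if not union:
                continue
            score = len(query_words & doc_words) / len(union)  # toy Jaccard similarity
            if score > 0:
                hits.append(Hit(self.node_id, ref, score))
        return nlargest(top_k, hits, key=lambda h: h.score)


def federated_search(nodes: List[LocalNode], query: str, top_k: int) -> List[Hit]:
    """Fan the query out to every node, then merge the local top-k lists."""
    candidates = [hit for node in nodes for hit in node.search(query, top_k)]
    return nlargest(top_k, candidates, key=lambda h: h.score)


clinic = LocalNode("clinic", {
    "a1": "hba1c result for diabetic patient",
    "a2": "annual flu vaccination schedule",
})
lab = LocalNode("lab", {
    "b1": "lipid panel result",
    "b2": "hba1c test ordering guide",
})
merged = federated_search([clinic, lab], "hba1c result", top_k=3)
```

In this sketch the query itself still reaches every node in the clear, so the design protects document contents rather than the query.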

| Technique | Protection | Utility Cost | Complexity |
|---|---|---|---|
| DP Embeddings | Query privacy | ~2–5% recall drop | Low |
| Federated RAG | Content privacy | Latency overhead | Medium |
| Crypto PIR | Full query privacy | 10–100x slower | High |
| ORAM | Access pattern | Significant overhead | High |