Data flows through applications to users. Service providers (e.g. EMR systems, online dating) hold sensitive data. An adversary may compromise the provider. Encryption protects data, but can we protect data and let the application work (e.g. search, sort)?
| Type | Characteristic |
|---|---|
| Homomorphic | Computation on encrypted data matches plaintext result |
| Property Preserving | Encrypted data in same order as plaintext |
| Searchable | Search using encrypted keywords |
| Secure Computation | Parties compute with private inputs |
| Functional | Secret key reveals function being computed |
PPE is widely deployed (e.g. Microsoft Always Encrypted). It allows SQL queries on encrypted columns without decrypting on the server.
No change to application or database servers
Supports most common SQL queries (equality, range, ORDER BY)
~25% overhead
Leaks nothing except size. Cannot perform equality or range queries on ciphertext.
Preserves equality. Leaks equality and frequency - same plaintext → same ciphertext; attacker sees frequency histogram.
Preserves order. Leaks order and frequency - plaintext order = ciphertext order.
PPE trades security for functionality. Leaked information enables inference attacks.
Electronic Medical Records (EMR) study: 200 hospitals, 12,975 to 121,664 patients. Attributes include sex, race, age, mortality risk, disease severity, length of stay, etc. Inference attack: encrypted data + auxiliary data (e.g. public demographics) → recover plaintext.
Patient-sensitive: Sex, race, age, admission month, payer, disease, etc. Hospital-sensitive: Patient died, length of stay, mortality risk, disease severity, diagnostic category.
Match ciphertext frequency histogram to auxiliary plaintext histogram. Ciphertext with frequency 10 maps to plaintext with frequency 11 (closest match). Classical technique (e.g. Al-Kindi 9th century).
Find assignment from ciphertext to plaintext that minimizes cost (e.g. histogram distance). Reduces to Linear Sum Assignment Problem (LSAP) - solved in O(n³) with Hungarian algorithm. Better recovery than simple frequency matching.
Sex: 100% patients, 95% hospitals. Patient Died: 100% patients, 100% hospitals. Mortality Risk: 100% patients, 99% hospitals. Age: 10% patients, 85% hospitals. Length of Stay: 83% patients, 50% hospitals. Major Diagnostic Category: 40% patients, 28% hospitals.
Uses empirical CDF (cumulative distribution). Ciphertext greater than 90% of values → match to plaintext greater than ~90% of auxiliary data. Combines order and frequency; solves LSAP for best mapping. More effective than sorting attack alone.
Disease Severity, Admission Month, Mortality Risk, Length of Stay: 100% patients, 100% hospitals. Age: 83% patients, 99% hospitals. Admission Type: 79% patients, 100% hospitals.
Equality-preserving: frequency-based attacks. Order-preserving: cumulative attack (order + frequency). Both rely on auxiliary data (public histograms, demographics). Confidentiality of many EMR attributes is compromised.
Even with encryption, access patterns leak information (which files, when). If we don't trust the cloud, we need to hide access patterns.
For any fixed-size request sequence, the storage accesses observed by the cloud are statistically independent of the actual requests. Cloud cannot infer what data is being accessed.
Application: get(key1), put(key1, val1),
get(key2). ORAM Client translates to multiple
download(object32),
upload(object65, data19), etc. Each logical access →
multiple cloud accesses; both read and write per access.
| System | Approach |
|---|---|
| PathORAM | Tree-based (CCS 2013) |
| LayeredORAM | Layered (CCSW 2011) |
| PracticalOS | Large messages (CODASPY 2012) |
| ObliviStore | Partition-based (S&P 2013) |
Client must have private randomness for dummy access patterns. Data must be encrypted. Each remote storage access typically includes both read and write to hide whether the application read or wrote.
Naveed, Kamara, Wright - lp-optimization and cumulative attacks on equality- and order-preserving encryption.