
This is the fourth part of a series on searching on encrypted data. See parts 1, 2 and 3.
In the previous posts we covered two different ways to search on encrypted data. The first was based on property-preserving encryption (in particular, on deterministic encryption); it achieved sub-linear search time but had weak security properties. The second was based on functional encryption; it achieved linear search time but provided stronger security guarantees.
We’ll now see another approach that achieves the strongest possible levels of security! But first, we need to discuss what we mean by security.
Security
So far, I have discussed the security of the encrypted search solutions informally—mostly providing intuition and describing possible attacks. This is partly because I’d like this blog to remain comprehensible to readers who are not cryptographers but also because formally defining the security properties of encrypted search is a bit messy.
So, which security properties should we expect from an encrypted search solution? What about the following:
- the encrypted database generated by the scheme should not leak any information about the user's database;
- the tokens generated by the user should not leak any information about the underlying search terms to the server.
This sounds reasonable but there are several issues. First, this intuition is not precise enough to be meaningful. What I mean is that there are many details that impact security that are not taken into account in this high-level intuition (e.g., what does it mean not to leak, how are the search terms chosen exactly). This is why cryptographers are so pedantic about security definitions—the details really do matter.
Putting aside the issue of formality, another problem with this intuition is that it says nothing about the search results. More precisely, it does not specify whether it is appropriate or not for an encrypted search solution to reveal to the server which encrypted documents match the search term. We usually refer to this information as the client’s access pattern and for concreteness you can think of it as the (matching) encrypted documents’ identifiers or their locations in memory. All we really need as an identifier is some per-document unique string that is independent of the contents of the document and of the keywords associated with it.
So the question is: Is it appropriate to reveal the access pattern? There are two possible answers to this question. On one hand, we could argue that it is fine to reveal the access pattern since the whole point of using encrypted search is so that the server can return the encrypted documents that match the query. And if we expect the server to return those encrypted documents then it clearly has to know which ones to return (though it does not necessarily need to know the contents).
On the other hand, one could argue that, in theory, the access pattern reveals some information to the server. In fact, by observing enough search results the server could use some sophisticated statistical attack to infer something about the client’s queries and data. Note that such attacks are not completely theoretical and in a future post we’ll discuss work that tries to make them practical. Furthermore, the argument that the server needs to know which encrypted documents match the query in order to return the desired documents is not technically true. In fact, we know how to design cryptographic protocols that allow one party to send items to another without knowing which item it is sending (see, e.g., private information retrieval and oblivious transfer).
Similarly, we know how to design systems that allow us to read and write to memory without the memory device knowing which locations are being accessed. The latter are called oblivious RAMs (ORAM) and we could use them to search on encrypted data without revealing the access pattern to the server. The issue, of course, is that using ORAM will slow things down.
So really the answer to our question depends on what kind of tradeoff we are willing to make between efficiency and security. If efficiency is the priority, then revealing the access pattern might not be too much to give up in terms of security for certain applications. On the other hand, if we can tolerate some inefficiency, then it’s always best to be conservative and not reveal anything if possible.
In the rest of this post we’ll explore ORAMs, see how to construct one and how to use it to search on encrypted data.
Oblivious RAM
ORAM was first proposed in a paper on software protection by Goldreich and Ostrovsky [GO96] (the link is actually Ostrovsky's thesis, which has the same content as the journal paper). That work was really ahead of its time, as several of the ideas explored in it later proved relevant to more modern topics like cloud storage.
An ORAM scheme $(\mathrm{Setup}, \mathrm{Read}, \mathrm{Write})$ consists of:

- a $\mathrm{Setup}$ algorithm that takes as input a memory $\mathrm{RAM}$ and outputs a secret key $K$ together with an oblivious memory $\mathrm{ORAM}$, which is stored at the server;
- a $\mathrm{Read}$ protocol between the client and the server with which the client recovers $\mathrm{RAM}[i]$ without revealing the index $i$;
- a $\mathrm{Write}$ protocol between the client and the server with which the client sets $\mathrm{RAM}[i]$ to a value $v$ without revealing $i$ or $v$.
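To make this concrete, here's a minimal sketch of the client-side interface in Python; the class and method names are mine, not from [GO96]:

```python
# A minimal sketch of the client-side ORAM interface (illustrative only).
from abc import ABC, abstractmethod
from typing import Any, List

class ObliviousRAM(ABC):
    """Client-side view of an ORAM scheme."""

    @abstractmethod
    def setup(self, ram: List[Any]) -> List[bytes]:
        """Generate keys, encode `ram`, and return the oblivious memory
        to be stored at the server."""

    @abstractmethod
    def read(self, i: int) -> Any:
        """Interactive protocol: recover RAM[i] without revealing i."""

    @abstractmethod
    def write(self, i: int, v: Any) -> None:
        """Interactive protocol: set RAM[i] = v without revealing i or v."""
```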
Oblivious RAM via FHE
The simplest way to design an ORAM is to use fully-homomorphic encryption (FHE). For an overview of FHE see my previous posts here and here.
Suppose we have an FHE scheme $(\mathrm{Gen}, \mathrm{Enc}, \mathrm{Dec}, \mathrm{Eval})$. Then we can easily construct an ORAM as follows [1]:

- Setup: the client generates a secret key $K$, encrypts each location of $\mathrm{RAM}$ under $K$, and sends the encrypted memory $\mathrm{ORAM}$ to the server.
- Read: to read location $i$, the client sends an encryption of $i$ to the server; the server homomorphically evaluates, over the entire encrypted memory, the function that selects the $i$th location and returns the resulting ciphertext, which the client decrypts.
- Write: to write a value $v$ to location $i$, the client sends encryptions of $i$ and $v$ to the server; the server homomorphically evaluates the function that updates location $i$ to $v$ and replaces the encrypted memory with the result.
The security properties of FHE will guarantee that $\mathrm{ORAM}$ leaks no information about $\mathrm{RAM}$ to the server and that the $\mathrm{Read}$ and $\mathrm{Write}$ protocols reveal no information about the indexes and values either.
The obvious downside of this FHE-based ORAM is efficiency. Let's forget for a second that FHE is not practical yet and let's suppose we had a very fast FHE scheme. This ORAM would still be too slow simply because the homomorphic evaluation steps in the $\mathrm{Read}$ and $\mathrm{Write}$ protocols require $O(n)$ time, i.e., time linear in the size of the memory. Again, assuming we had a super-fast FHE scheme, this would only be usable for small memories.
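To see where the linear cost comes from, here's a toy version of the selection the server would evaluate. The logic below runs on plaintexts; under FHE, the index and the memory would be ciphertexts and the equality tests, multiplications and additions would all be evaluated homomorphically:

```python
# Toy illustration of the FHE-based Read: the server evaluates a selection
# over *every* memory slot, which is why the protocol costs Theta(n).

def eq(a: int, b: int) -> int:
    # Equality test; as an FHE circuit this is a small comparison circuit.
    return 1 if a == b else 0

def oblivious_select(memory: list, index: int) -> int:
    # sum_i eq(i, index) * memory[i]: exactly one term survives, but the
    # server must touch all n slots since it cannot see which one.
    return sum(eq(i, index) * memory[i] for i in range(len(memory)))

assert oblivious_select([7, 3, 9, 4], 2) == 9
```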
Oblivious RAM via Symmetric Encryption
Fortunately, we also know how to design ORAMs using standard encryption schemes and, in particular, using symmetric encryption like AES. ORAM is a very active area of research and we now have many constructions, optimizations and even implementations (e.g., see Emil Stefanov's implementation). Because research is moving so fast, however, there really isn't a good overview of the state of the art. Since ORAMs are fairly complicated, I'll describe here the simplest (non-FHE-based) construction, which is due to Goldreich and Ostrovsky [GO96]. This particular ORAM construction is known as the Square-Root solution and it requires just a symmetric encryption scheme $(\mathrm{Gen}, \mathrm{Enc}, \mathrm{Dec})$ and a pseudo-random function $F$ that maps $O(\log n)$-bit strings to $O(\log n)$-bit strings.
Setup. To setup the ORAM, the client generates two secret keys $K_1$ and $K_2$ for the symmetric encryption scheme and for the pseudo-random function $F$, respectively. It then augments each item in $\mathrm{RAM}$ by appending its address and a random tag to it. We'll refer to the address embedded with the item as its virtual address. More precisely, it creates a new memory $\mathrm{RAM}_2$ such that for all $1 \le i \le n$,

$$\mathrm{RAM}_2[i] = \big\langle \mathrm{RAM}[i], i, \mathrm{tag}_i \big\rangle,$$

where $\langle \cdot \rangle$ denotes concatenation and $\mathrm{tag}_i = F_{K_2}(i)$. It then adds $\sqrt{n}$ dummy items to $\mathrm{RAM}_2$, i.e., it creates a new memory $\mathrm{RAM}_3$ such that for all $1 \le i \le n$, $\mathrm{RAM}_3[i] = \mathrm{RAM}_2[i]$ and such that for all $n+1 \le i \le n+\sqrt{n}$,

$$\mathrm{RAM}_3[i] = \big\langle 0, \infty_1, \mathrm{tag}_i \big\rangle,$$

where $\infty_1$ is some number larger than $n$. It then sorts $\mathrm{RAM}_3$ according to the tags. Notice that the effect of this sorting will be to permute $\mathrm{RAM}_3$ since the tags are (pseudo-)random. It then encrypts each item in $\mathrm{RAM}_3$ under $K_1$. In other words, it generates a new memory $\mathrm{RAM}_4$ such that, for all $1 \le i \le n+\sqrt{n}$,

$$\mathrm{RAM}_4[i] = \mathrm{Enc}_{K_1}(\mathrm{RAM}_3[i]).$$

Finally, it appends $\sqrt{n}$ elements to $\mathrm{RAM}_4$, each of which contains an encryption of $0$ under key $K_1$. Needless to say, all the ciphertexts generated in this process need to be of the same size so the items need to be padded appropriately. The result of this, i.e., the combination of $\mathrm{RAM}_4$ and the encryptions of $0$, is the oblivious memory $\mathrm{ORAM}$ which is sent to the server. It will be useful for us to distinguish between the two parts of $\mathrm{ORAM}$, so we'll refer to the second part (i.e., the encryptions of $0$) as the cache.
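Here's a minimal sketch of this setup in Python, assuming HMAC-SHA256 as the pseudo-random function and eliding the encryption layer; in the real scheme every tuple below, and every cache slot, would be stored as a fixed-length $\mathrm{Enc}_{K_1}$ ciphertext:

```python
# Sketch of the Square-Root setup; encryption under K1 is elided and the
# PRF F is instantiated with HMAC-SHA256. Not a hardened implementation.
import hashlib
import hmac
import math
import secrets

INF1 = 10**9  # the dummies' virtual address; any value larger than n works

def prf(k2: bytes, i: int) -> bytes:
    """tag_i = F_{K2}(i), instantiated with HMAC-SHA256."""
    return hmac.new(k2, i.to_bytes(8, "big"), hashlib.sha256).digest()

def setup(ram: list, k2: bytes):
    n = len(ram)
    s = math.isqrt(n)  # sqrt(n) dummy items and sqrt(n) cache slots
    # RAM_2: append each item's virtual address and pseudo-random tag.
    items = [(ram[i], i, prf(k2, i)) for i in range(n)]
    # RAM_3: add sqrt(n) dummies <0, infinity_1, tag_i>.
    items += [(0, INF1, prf(k2, n + j)) for j in range(s)]
    # Sorting by the pseudo-random tags permutes the items; RAM_4 would
    # additionally encrypt every tuple under K1.
    items.sort(key=lambda item: item[2])
    cache = [None] * s  # stands in for the sqrt(n) encryptions of 0
    return items, cache

k2 = secrets.token_bytes(32)
main, cache = setup(list(range(16)), k2)
```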
Read & write. Now we'll see how to read and write to $\mathrm{ORAM}$ obliviously, i.e., without the server knowing which memory locations we're accessing. First we have to define two basic operations: $\mathrm{get}$ and $\mathrm{put}$.

The $\mathrm{get}$ operation takes an index $i$ as input and works as follows:

- the client requests from the server the item at virtual address $i$ in $\mathrm{ORAM}$. To do this it first re-generates the item's tag $\mathrm{tag}_i = F_{K_2}(i)$. It then does an (interactive) binary search to find the item with virtual address $i$. In other words, it asks the server for the item stored at location $(n+\sqrt{n})/2$ (let's assume $n+\sqrt{n}$ is even), decrypts it and compares its tag with $\mathrm{tag}_i$. If $\mathrm{tag}_i$ is less than the tag of that item, then it asks for the item at location $(n+\sqrt{n})/4$; else it asks for the item at location $3(n+\sqrt{n})/4$; and so on.
- it decrypts the item with $K_1$ to recover $\mathrm{RAM}[i]$;
- it then re-encrypts the item (using new randomness) and asks the server to store it back where it was found.

The $\mathrm{put}$ operation takes an index $i$ and a value $v$ as inputs and works as follows:

- the client requests from the server the item with virtual address $i$ (as above);
- it then encrypts $v$ and asks the server to store it back at the location where the previous item (i.e., the one with virtual address $i$) was found.

Notice that from the server's point of view the two operations look the same. In other words, the server cannot tell whether the client is executing a $\mathrm{get}$ or a $\mathrm{put}$ operation since in either case all it sees is a binary search followed by a request to store a new ciphertext at the same location.
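Here's a sketch of both operations over the structure built in the setup sketch above (reusing `main`, `k2` and `prf`). Each probe of `main` stands for one round trip in which the client fetches and decrypts a single item; re-encryption is elided. Note that `put` literally performs the same search as `get`, which is why the server can't tell them apart:

```python
def get(main: list, tag: bytes):
    """Interactive binary search over the tag-sorted memory; returns the
    item's value and the location where it was found."""
    lo, hi = 0, len(main) - 1
    while lo <= hi:
        mid = (lo + hi) // 2            # ask the server for location mid
        value, vaddr, t = main[mid]     # client decrypts the reply...
        if t == tag:                    # ...and re-encrypts it before
            return value, mid           # storing it back at mid
        if tag < t:                     # bytes compare lexicographically
            hi = mid - 1
        else:
            lo = mid + 1
    raise KeyError("no item with this tag")

def put(main: list, tag: bytes, new_value) -> None:
    """Identical search to get; only the stored ciphertext differs."""
    _, loc = get(main, tag)
    _, vaddr, t = main[loc]
    main[loc] = (new_value, vaddr, t)

assert get(main, prf(k2, 3))[0] == 3
```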
Now suppose for a second that $\mathrm{ORAM}$ only consisted of $\mathrm{RAM}_4$. If that were the case then $\mathrm{ORAM}$ would be one-time oblivious in the sense that we could use it to read or write only once by executing either a $\mathrm{get}$ or a $\mathrm{put}$ operation. Why is this the case? Remember that we randomly permuted and encrypted our memory before sending it to the server. This means that asking the server for the item at some location reveals nothing about that item's real/virtual address $i$. Furthermore, the binary search we do when looking for the item with virtual address $i$ depends only on $\mathrm{tag}_i$, which is (pseudo-)random and therefore reveals nothing about $i$.

Of course, this only works once, because if we want to access $i$ again then we'll ask the server for the same locations, which immediately tells it something: namely, that we asked for the same thing twice.
So how do we hide the fact that we’re asking for the same thing twice? This is really the core difficulty in designing ORAMs and this is where the cache will come in.
We start by initializing a counter $\mathrm{ctr} = 1$. To read location $i$ we execute the following $\mathrm{Read}$ protocol (a code sketch follows the list):

- We $\mathrm{get}$ the entire cache. In other words, we execute $\mathrm{get}(j)$ for all cache locations $n+\sqrt{n}+1 \le j \le n+2\sqrt{n}$.
- If any of the $\mathrm{get}$ operations above results in the $i$th item (i.e., if we get an item with virtual address $i$) then we $\mathrm{get}$ a dummy item by executing $\mathrm{get}(n+\mathrm{ctr})$. Also, we set $v$ to be the item we found in the cache and $\ell$ to be the cache location where we found it.
- If none of the $\mathrm{get}$ operations above resulted in the $i$th item, we execute a modified $\mathrm{get}(i)$ and set $v$ to be the result and $\ell = n+\sqrt{n}+\mathrm{ctr}$. The modified version of $\mathrm{get}$ works like a regular $\mathrm{get}$ operation, except that we update the item's virtual address to $\infty_2$, where $\infty_2 > \infty_1$. In other words, we store an encryption of $\big\langle \mathrm{RAM}[i], \infty_2, \mathrm{tag}_i \big\rangle$ back where we found it. This will be useful for us later when we'll need to re-structure $\mathrm{ORAM}$.
- We then process the entire cache again but slightly differently than before (we do this so that we can store the item in the cache for future accesses). In particular, for all $n+\sqrt{n}+1 \le j \le n+2\sqrt{n}$:
  - if $j \neq \ell$ we execute a $\mathrm{get}(j)$ operation;
  - if $j = \ell$ we execute a $\mathrm{put}(j, v)$.
- We increase $\mathrm{ctr}$ by $1$.
The first thing to notice is that this is correct in the sense that by executing this operation the client will indeed receive $\mathrm{RAM}[i]$.

The more interesting question, however, is why is this oblivious and, in particular, why is this more than one-time oblivious? To see why, it helps to think of things from the server's perspective and see why its view of the execution is independent of (i.e., not affected by) $i$.
First, no matter what index $i$ the client is looking for, it always $\mathrm{get}$s the entire cache, so Step 1 reveals no information about $i$ to the server. We then have two possible cases:

- If the $i$th item is in the cache (at location $\ell$), we $\mathrm{get}$ a dummy item; and $\mathrm{put}$ the $i$th item at location $\ell$ while we re-process the entire cache (in Step 4).
- If the $i$th item is not in the cache, we $\mathrm{get}$ the $i$th item and $\mathrm{put}$ it in the next open location in the cache while we re-process the entire cache.

In either case, the server sees the same thing: a $\mathrm{get}$ for an item at some location between $1$ and $n+\sqrt{n}$, and a sequence of $\mathrm{get}$/$\mathrm{put}$ operations for all addresses in the cache, i.e., between $n+\sqrt{n}+1$ and $n+2\sqrt{n}$. Recall that the server cannot distinguish between $\mathrm{get}$ and $\mathrm{put}$ operations.
operations. The
protocol is similar to the
protocol. The only difference is that in Step
, we set
if the
th item is in the cache and in Step
we execute
and set
. Notice, however, that the
protocol can introduce inconsistencies between the cache and
. More precisely, if the item has been accessed before (say, due to a
operation), then a
operation will update the cache but not the item in
. This is OK, however, as it will be taken care of in the re-structuring step, which we’ll describe below.
So we can now read and write to memory without revealing which location we're accessing, and we can do this more than once! The problem, however, is that we can do it at most $\sqrt{n}$ times, because after that the cache is full so we have to stop.
Re-structuring. So what if we want to do more than $\sqrt{n}$ reads? In that case we need to re-structure our ORAM. By this, I mean that we have to re-encrypt and re-permute all the items in $\mathrm{ORAM}$ and reset our counter $\mathrm{ctr}$ to $1$.

If the client has enough space to store $\mathrm{RAM}$ locally, then the easiest thing to do is just to download $\mathrm{ORAM}$, decrypt it locally to recover $\mathrm{RAM}$, update it (in case there were any inconsistencies) and setup a new ORAM from scratch.
If, on the other hand, the client does not have enough local storage then the problem becomes harder. Here we'll assume the client only has $O(1)$ storage so it can store, e.g., only two items.

Recall that in order to re-structure $\mathrm{ORAM}$, the client needs to re-permute and re-encrypt everything obliviously while using only $O(1)$ space. Also, the client needs to do this in a way that updates the elements that are in an inconsistent state due to $\mathrm{Write}$ operations. The key to doing all this will be to figure out a way for the client to sort elements obliviously while using $O(1)$ space. Once we can obliviously sort, the rest will follow relatively easily.
To do this, Goldreich and Ostrovsky proposed to use a sorting network like Batcher's bitonic network. Think of a sorting network as a circuit composed of comparison gates. A gate takes two inputs $x$ and $y$ and outputs the pair $(x, y)$ if $x \le y$ and the pair $(y, x)$ if $x > y$. Given a set of input values, the sorting network outputs the items in sorted order. Sorting networks have two interesting properties: (1) the comparisons they perform are independent of the input sequence; and (2) each gate in the network is a binary operation (i.e., takes only two inputs). Of course, there is an overhead to sorting obliviously, so Batcher's network requires $O(n\log^2 n)$ work as opposed to the traditional $O(n\log n)$ for sorting.
So to obliviously sort a set of ciphertexts stored at the server, the client will start executing the sorting network and whenever it reaches a comparison gate between the $i$th and $j$th items, it will just request the $i$th and $j$th ciphertexts, decrypt them, compare them, and store them back re-encrypted in the appropriate order. Note that by the first property above, the client's access pattern reveals nothing to the server; and by the second property the client will never need to store more than two items at the same time.
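For concreteness, here's the classic iterative formulation of Batcher's bitonic sorter. The pairs of positions it compares depend only on $n$, never on the data, and each compare-exchange touches exactly two items, which is exactly what the client's $O(1)$ storage requires:

```python
# Batcher's bitonic sorting network (n must be a power of two). In the
# oblivious-sort protocol each compare-exchange below would be: fetch two
# ciphertexts, decrypt, compare, re-encrypt, and write back in order.

def bitonic_sort(a: list) -> None:
    n = len(a)
    k = 2
    while k <= n:                    # size of the bitonic sequences
        j = k // 2
        while j > 0:                 # distance between compared positions
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    # the compared positions depend only on (n, k, j, i),
                    # never on the values: the access pattern is fixed
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2

data = [5, 1, 4, 2, 8, 7, 3, 6]
bitonic_sort(data)
assert data == sorted(data)
```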
Now that we can sort obliviously, let's see how to re-structure the ORAM. We will do it in two phases. In the first phase, we sort all the items in $\mathrm{ORAM}$ according to their virtual addresses. This is how we will get rid of inconsistencies. Remember that the items in $\mathrm{ORAM}$ are augmented to have the form $\langle \mathrm{RAM}[i], i, \mathrm{tag}_i \rangle$ for real items and $\langle 0, \infty_1, \mathrm{tag}_i \rangle$ for dummy items. It follows that all items in the cache have the first form since they are either copies or updates of real items put there during $\mathrm{Read}$ and $\mathrm{Write}$ operations.

So we just execute the sorting network and, for each comparison gate, retrieve the appropriate items, decrypt them, compare their virtual addresses and return them re-encrypted in the appropriate order. The result of this process is that $\mathrm{ORAM}$ will now have the following form:

- the first $n$ items will consist of the most recent versions of the real items, i.e., all the items with virtual addresses other than $\infty_1$ and $\infty_2$;
- the next $\sqrt{n}$ items will consist of dummy items, i.e., all items with virtual address $\infty_1$;
- the final $\sqrt{n}$ items will consist of the old/inconsistent versions of the real items, i.e., all items with virtual address $\infty_2$ (remember that in Step 3 of $\mathrm{Read}$ and $\mathrm{Write}$ we executed a modified $\mathrm{get}$ that updated the item's virtual address to $\infty_2$).
In the second phase, we randomly permute and re-encrypt the first $n+\sqrt{n}$ items of $\mathrm{ORAM}$. We first choose a new key $K_2'$ for $F$. We then access each item from location $1$ to $n+\sqrt{n}$ and update their tags to $\mathrm{tag}_i = F_{K_2'}(i)$. Once we've updated the tags, we sort all the items according to their tags. The result will be a new random permutation of the items. Note that we don't technically have to do this in two passes; but it's easier to explain this way.
At this point, we’re done!
is as good as new and we can start accessing it again safely.
Efficiency. So what is the efficiency of the Square-Root solution? Setup is $O(n\log^2 n)$: $O(n)$ to construct the real, dummy and cache items and $O(n\log^2 n)$ to permute everything through sorting.

Each access operation (i.e., $\mathrm{Read}$ or $\mathrm{Write}$) is $O(\sqrt{n}\log n)$: $O(\sqrt{n})$ total get/put operations to process the cache twice and $O(\log n)$ for each get/put operation due to binary search.

Restructuring is $O(n\log^2 n)$: $O(n\log^2 n)$ to sort by virtual address and $O(n\log^2 n)$ to sort by tag. Restructuring, however, only occurs once every $\sqrt{n}$ accesses. Because of this, we usually average the cost of re-structuring over the number of read/write operations supported to give an amortized access cost. In our case, the amortized access cost is then

$$O\left(\sqrt{n}\log n + \frac{n\log^2 n}{\sqrt{n}}\right),$$

which is $O(\sqrt{n}\log^2 n)$.
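To spell out that last simplification:

$$\sqrt{n}\log n + \frac{n\log^2 n}{\sqrt{n}} = \sqrt{n}\log n + \sqrt{n}\log^2 n = O\big(\sqrt{n}\log^2 n\big).$$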
ORAM-Based Encrypted Search
So now that we know how to build an ORAM, we’ll see how to use it for encrypted search. There are two possible ways to do this.
A naive approach. The first is for the client to just dump all the $m$ documents $D_1, \dots, D_m$ in an array $\mathrm{RAM}$, set up an ORAM from it and send the resulting oblivious memory $\mathrm{ORAM}$ to the server. To search, the client can just simulate a sequential search algorithm via the $\mathrm{Read}$ protocol; that is, replace every read operation of the search algorithm with an execution of the $\mathrm{Read}$ protocol. To update the documents the client can similarly simulate an update algorithm using the $\mathrm{Write}$ protocol.
This will obviously be slow. Let's assume all the documents have bit-length $\ell$ and that $\mathrm{RAM}$ has a block size of $B$ bits. The document collection will then fit in (approximately) $n = m\ell/B$ blocks. The sequential scan algorithm is itself $O(n)$, but on top of that we'll have to execute an entire $\mathrm{Read}$ protocol for every address of memory read.

Remember that if we're using the Square-Root solution as our ORAM then the $\mathrm{Read}$ protocol requires $O(\sqrt{n}\log^2 n)$ amortized work. So in total, search would be $O(n\sqrt{n}\log^2 n)$, which would not scale. Now imagine for a second if we were using the FHE-based ORAM described above, which requires $O(n)$ work for each $\mathrm{Read}$ and $\mathrm{Write}$. In this scenario, a single search would take $O(n^2)$ time!
A better approach. [2] A better idea is for the client to build two arrays $\mathrm{RAM}_1$ and $\mathrm{RAM}_2$ [3]. In $\mathrm{RAM}_1$ it will store a data structure that supports fast searches on the document collection (e.g., an inverted index) and in $\mathrm{RAM}_2$ it will store the documents $D_1, \dots, D_m$ themselves. It then builds and sends $\mathrm{ORAM}_1$ and $\mathrm{ORAM}_2$ to the server. To search, the client simulates a query to the data structure in $\mathrm{ORAM}_1$ via the $\mathrm{Read}$ protocol (i.e., it replaces each read operation in the data structure's query algorithm with an execution of $\mathrm{Read}$) [4]. From this, the client will recover the identifiers of the documents that contain the keyword, and with this information it can just read those documents from $\mathrm{ORAM}_2$.
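Here's a sketch of this two-ORAM search, assuming the inverted index is laid out in $\mathrm{ORAM}_1$ as linked lists of (document id, next position) cells and that the client keeps a small map from keywords to list-head positions; both the layout and the client-side map are simplifications of mine, not a prescription from the literature:

```python
# Sketch of the two-ORAM search. oram1.read / oram2.read stand for full
# executions of the Read protocol; `heads` maps each keyword to the
# position of its inverted-list head in ORAM_1.

def search(keyword: str, heads: dict, oram1, oram2) -> list:
    doc_ids = []
    pos = heads.get(keyword)
    while pos is not None:
        # each cell of the inverted list costs one oblivious Read
        doc_id, pos = oram1.read(pos)   # cell = (document id, next position)
        doc_ids.append(doc_id)
    # fetching the r matching documents costs about r * (len/B) more Reads
    return [oram2.read(d) for d in doc_ids]
```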
Now suppose there are $r$ documents that contain the keyword and that we're using an optimal-time data structure (i.e., a structure with a query algorithm that runs in $O(r)$ time, like an inverted index). Also, assume that the data structure fits in $n_1$ blocks of $B$ bits and that the data collection fits in $n_2$ blocks.

Again, if we were using the Square-Root solution for our ORAMs, then the first step would take $O(r\sqrt{n_1}\log^2 n_1)$ time and the second step would take

$$O\left(\frac{r\ell}{B}\cdot\sqrt{n_2}\log^2 n_2\right).$$

In practice, the size of a fast data structure for keyword search can be large. A very conservative estimate for an inverted index, for example, would be that it is roughly the size of the data collection [5]. Setting $n_1 = n_2 = n$, the total search time would be

$$O\left(r\left(1+\frac{\ell}{B}\right)\sqrt{n}\log^2 n\right),$$

which is $O\left(\frac{r\ell}{B}\sqrt{n}\log^2 n\right)$ (since $\ell \ge B$) compared to the previous approach's $O(n\sqrt{n}\log^2 n)$.
In cases where the search term appears in $r \ll n$ documents, this can be a substantial improvement.
Is This Practical?
If one were to only look at the asymptotics, one might conclude that the two-RAM solution described above might be reasonably efficient. After all, it would take at least $\Omega(r\ell/B)$ time just to retrieve the matching files from (unencrypted) memory, so the two-RAM solution adds just a $\sqrt{n}\log^2 n$ multiplicative factor over the minimum retrieval time.
Also, there are much more efficient ORAM constructions than the Square-Root solution. In fact, in their paper, Goldreich and Ostrovsky also proposed the Hierarchical solution, which achieves $O(\log^3 n)$ amortized access cost. Goodrich and Mitzenmacher [GM11] gave a solution with $O(\log^2 n)$ amortized access cost and, recently, Kushilevitz, Lu and Ostrovsky [KLO12] gave a solution with $O(\log^2 n/\log\log n)$ amortized cost (and there are even more recent papers that improve on this under certain conditions). There are also works that trade off client storage for access efficiency. For example, Williams, Sion and Carbunar [WSC08] propose a solution with $O(\log n\log\log n)$ amortized access cost and $O(\sqrt{n})$ client storage, while Stefanov, Shi and Song [SSS12] propose a solution with $O(\log n)$ amortized overhead for clients that have $O(n)$ local storage, where the underlying constant is very small. There is also a line of work that tries to de-amortize ORAM in the sense that it splits the re-structuring operation so that it happens progressively over each access. This was first considered by Ostrovsky and Shoup in [OS97] and was further studied by Goodrich, Mitzenmacher, Ohrimenko and Tamassia [GMOT11] and by Shi, Chan, Stefanov and Li [SSSL11].
All in all this may not seem that bad and, intuitively, the two-RAM solution might actually be reasonably practical for small to moderate-scale data collections—especially considering all the recent improvements in efficiency that have been proposed. For large- or massive-scale collections, however, I’d be surprised [6].
Conclusions
In this post we went over the ORAM-based solution to encrypted search which provides the most secure solution to our problem since it hides everything—even the access pattern!
In the next post we’ll cover an approach that tries to strike a balance between efficiency and security. In particular, this solution is as efficient as the deterministic-encryption-based solution while being only slightly less secure than the ORAM-based solution.
Notes
[1] I haven’t seen this construction written down anywhere. It’s fairly obvious, however, so I suspect it’s been mentioned somewhere. If anyone knows of a reference, please let me know. Also, as Jon Katz points out in the comments, this approach requires the server to compute whereas traditional ORAM constructions do not. Some recent works have started to distinguish these variants of ORAM (see comments for more).
[2] Like the FHE-based ORAM, I have not seen this construction written down anywhere so if anyone knows of a reference, please let me know.
[3] Of course, the following could be done using a single RAM, but splitting into two makes things easier to explain.
[4] Note that this will reveal to the server some information. If we’re using an inverted index as the underlying search structure, it will reveal the number of documents that contain the keyword (see the comments for a brief discussion). The natural way to address this, of course, is to use an inverted index with lists that are padded to the maximum size.
[5] In practice, this would not be the case and, in addition, we could make use of index compression techniques.
[6] I won’t attempt to draw exact lines between what’s small-, moderate- and large-scale since I think that’s a question best answered by experimental results.
