This is the fourth part of a series on searching on encrypted data. See parts 1, 2 and 3.
In the previous posts we covered two different ways to search on encrypted data. The first was based on property-preserving encryption (in particular, on deterministic encryption), achieved sub-linear search time but had weak security properties. The second was based on functional encryption, achieved linear search time but provided stronger security guarantees.
We’ll now see another approach that achieves the strongest possible levels of security! But first, we need to discuss what we mean by security.
So far, I have discussed the security of the encrypted search solutions informally—mostly providing intuition and describing possible attacks. This is partly because I’d like this blog to remain comprehensible to readers who are not cryptographers but also because formally defining the security properties of encrypted search is a bit messy.
So, which security properties should we expect from an encrypted search solution? What about the following:
- the encrypted database generated by the scheme should not leak any information about the database of the user;
- the tokens generated by the user should not leak any information about the underlying search term to the server.
This sounds reasonable but there are several issues. First, this intuition is not precise enough to be meaningful. What I mean is that there are many details that impact security that are not taken into account in this high-level intuition (e.g., what does it mean not to leak, how are the search terms chosen exactly). This is why cryptographers are so pedantic about security definitions—the details really do matter.
Putting aside the issue of formality, another problem with this intuition is that it says nothing about the search results. More precisely, it does not specify whether it is appropriate or not for an encrypted search solution to reveal to the server which encrypted documents match the search term. We usually refer to this information as the client’s access pattern and for concreteness you can think of it as the (matching) encrypted documents’ identifiers or their locations in memory. All we really need as an identifier is some per-document unique string that is independent of the contents of the document and of the keywords associated with it.
So the question is: Is it appropriate to reveal the access pattern? There are two possible answers to this question. On one hand, we could argue that it is fine to reveal the access pattern since the whole point of using encrypted search is so that the server can return the encrypted documents that match the query. And if we expect the server to return those encrypted documents then it clearly has to know which ones to return (though it does not necessarily need to know the contents).
On the other hand, one could argue that, in theory, the access pattern reveals some information to the server. In fact, by observing enough search results the server could use some sophisticated statistical attack to infer something about the client’s queries and data. Note that such attacks are not completely theoretical and in a future post we’ll discuss work that tries to make them practical. Furthermore, the argument that the server needs to know which encrypted documents match the query in order to return the desired documents is not technically true. In fact, we know how to design cryptographic protocols that allow one party to send items to another without knowing which item it is sending (see, e.g., private information retrieval and oblivious transfer).
Similarly, we know how to design systems that allow us to read and write to memory without the memory device knowing which locations are being accessed. The latter are called oblivious RAMs (ORAM) and we could use them to search on encrypted data without revealing the access pattern to the server. The issue, of course, is that using ORAM will slow things down.
So really the answer to our question depends on what kind of tradeoff we are willing to make between efficiency and security. If efficiency is the priority, then revealing the access pattern might not be too much to give up in terms of security for certain applications. On the other hand, if we can tolerate some inefficiency, then it’s always best to be conservative and not reveal anything if possible.
In the rest of this post we’ll explore ORAMs, see how to construct one and how to use it to search on encrypted data.
ORAM was first proposed in a paper by Goldreich and Ostrovsky [GO96] (the link is actually Ostrovsky’s thesis which has the same content as the journal paper) on software protection. That work turned out to be really ahead of its time as several ideas explored in it turned out to be related to more modern topics like cloud storage.
An ORAM scheme consists of:
Oblivious RAM via FHE
The simplest way to design an ORAM is to use fully-homomorphic encryption (FHE). For an overview of FHE see my previous posts here and here.
Suppose we have an FHE scheme . Then we can easily construct an ORAM as follows :
The security properties of FHE will guarantee that leaks no information about to the server and that the and protocols reveal no information about the index and values either.
The obvious downside of this FHE-based ORAM is efficiency. Let’s forget for a second that FHE is not practical yet and let’s suppose we had a very fast FHE scheme. This ORAM would still be too slow simply because the homomorphic evaluation steps in the and protocols require time, i.e., time linear in the size of the memory. Again, assuming we had a super-fast FHE scheme, this would only be usable for small memories.
Oblivious RAM via Symmetric Encryption
Fortunately, we also know how to design ORAMs using standard encryption schemes and, in particular, using symmetric encryption like AES. ORAM is a very active area of research and we now have many constructions, optimizations and even implementations (e.g., see Emil Stefanov’s implementation). Because research is moving so fast, however, there really isn’t a good overview of the state-of-the-art. Since ORAMs are fairly complicated, I’ll describe here the simplest (non-FHE-based) construction which is due to Goldreich and Ostrovsky [GO96]. This particular ORAM construction is known as the Square-Root solution and it requires just a symmetric encryption scheme , and a pseudo-random function that maps bits to bits.
Setup. To setup the ORAM, the client generates two secret keys and for the symmetric encryption scheme and for the pseudo-random function , respectively. It then augments each item in by appending its address and a random tag to it. We’ll refer to the address embedded with the item as its virtual address. More precisely, it creates a new memory such that for all ,
where denotes concatenation and . It then adds dummy items to , i.e., it creates a new memory such that for all , and such that for all ,
where is some number larger than . It then sorts around according to the tags. Notice that the effect of this sorting will be to permute since the tags are (pseudo-)random. It then encrypts each item in using . In other words, it generates a new memory such that, for all ,
Finally, it appends elements to each of which contains an encryption of under key . Needless to say, all the ciphertexts generated in this process need to be of the same size so the items need to be padded appropriately. The result of this, i.e., the combination of and the encryptions of , is the oblivious memory which is sent to the server. It will be useful for us to distinguish between the two parts of so we’ll refer to the second part (i.e., the encryptions of ) as the cache.
Read & write. Now we’ll see how to read and write to obliviously, i.e., without the server knowing which memory locations we’re accessing. First we have to define two basic operations: and .
The operation takes an index as input and works as follows:
- the client requests from the server the item at virtual addres in . To do this it first re-generates the item’s tag . It then does an (interactive) binary search to find the item with virtual address . In other words, it asks the server for the item stored at location (let ‘s assume is even) decrypts it and compares its tag with . If is less than the tag of item , then it asks for the item at location ; else it asks for the item at location ; and so on.
- it decrypts the item with to recover ,
- it then re-encrypts (using new randomness) and asks the server to store it back where it was found.
The operation takes an index and a value as inputs and works as follows:
- the client requests from the server the item with (as above);
- it then encrypts and asks the server to store it back at the location where the previous item (i.e., the one with ) was found.
Notice that from the server’s point of view the two operations look the same. In other words, the server cannot tell whether the client is executing a or a operation since in either case all it sees is a binary search followed by a request to store a new ciphertext at the same location.
Now suppose for a second that only consisted of . If that were the case then would be one-time oblivious in the sense that we could use it to read or write only once by executing either a or a operation. Why is this the case? Remember that we randomly permuted and encrypted our memory before sending it to the server. This means that asking the server for the item at location reveals nothing about that item’s real/virtual address . Furthermore, the binary search we do when looking for the item with virtual address depends only which is random and therefore reveals nothing about .
Of course, this only works once because if we want to access again then we’ll ask the server for the same location which immediately tells it something: namely, that we asked for the same thing twice.
So how do we hide the fact that we’re asking for the same thing twice? This is really the core difficulty in designing ORAMs and this is where the cache will come in.
We start by initializing a counter . To read location we execute the following protocol:
- We the entire cache. In other words, we execute for all
- If any of the operations above result in the th item (i.e., if we get an item with virtual address ) then we a dummy item by executing . Also, we set to be the item we found in the cache and to be the cache location where we found it.
- If none of the operations above resulted in the th item, we execute a modified and set to be the result and . The modified version of works like a regular operation, except that we update the item’s virtual address to , where . In other words, we store an encryption of back where we found it. This will be useful for us later when we’ll need to re-structure .
- We then process the entire cache again but slightly differently than before (we do this so that we can store the item in the cache for future accesses). In particular, for all ,
- if we execute a operation
- if we execute a .
- We increase by .
The first thing to notice is that this is correct in the sense that by executing this operation the client will indeed receive .
The more interesting question, however, is why is this oblivious and, in particular, why is this more than one-time oblivious? To see why this is oblivious it helps to think of things from the server’s perspective and see why its view of the execution is independent of (i.e., not affected by) .
First, no matter what the client is looking for, it always s the entire cache so Step reveals no information about to the server. We then have two possible cases:
- If the th item is in the cache (at location ), we a dummy item; and the th item at location while we re-process the entire cache (in Step ).
- If the th item is not in the cache, we the th item and it in the next open location in the cache while we re-process the entire cache.
In either case, the server sees the same thing: a for an item at some location between and and a sequence of operations for all addresses in the cache, i.e., between and . Recall that the server cannot distinguish between and operations. The protocol is similar to the protocol. The only difference is that in Step , we set if the th item is in the cache and in Step we execute and set . Notice, however, that the protocol can introduce inconsistencies between the cache and . More precisely, if the item has been accessed before (say, due to a operation), then a operation will update the cache but not the item in . This is OK, however, as it will be taken care of in the re-structuring step, which we’ll describe below.
So we can now read and write to memory without revealing which location we’re accessing and we can do this more than once! The problem, however, is that we can do it at most times because after that the cache is full so we have to stop.
Re-structuring. So what if we want to do more than reads? In that case we need to re-structure our ORAM. By this, I mean that we have to re-encrypt and re-permute all the items in and reset our counter to .
If the client has enough space to store locally then the easiest thing to do is just to download , decrypt it locally to recover , update it (in case there were any inconsistencies) and setup a new ORAM from scratch.
If, on the other hand, the client does not have enough local storage then the problem becomes harder. Here we’ll assume the client only has storage so it can store, e.g., only two items.
Recall that in order to re-structure , the client needs to re-permute and re-encrypt everything obliviously while using only space. Also, the client needs to do this in a way that updates the elements that are in an inconsistent state due to operations. The key to doing all this will be to figure out a way for the client to sort elements obliviously while using space. Once we can obliviously sort, the rest will follow relatively easily.
To do this, Goldreich and Ostrovsky proposed to use a sorting network like Batcher’s Bitonic network. Think of a sorting network as a circuit composed of comparison gates. The gates take two inputs and and output the pair if and the pair if . Given a set of input values, the sorting network outputs the items in sorted order. Sorting networks have two interesting properties: the comparisons they perform are independent of the input sequence; and each gate in the network is a binary operation (i.e., takes only two inputs). Of course, there is an overhead to sorting obviously so Batcher’s network requires work as opposed to the traditional for sorting.
So to obliviously sort a set of ciphertexts stored at the server, the client will start executing the sorting network and whenever it reaches a comparison gate between the th and th item, it will just request the th and th ciphertexts, decrypt them, compare them, and store them back re-encrypted in the appropriate order. Note that by the first property above, the client’s access pattern reveals nothing to the server; and by the second property the client will never need to store more than two items at the same time.
Now that we can sort obliviously, let’s see how to re-structure the ORAM. We will do it in two phases. In the first phase, we sort all the items in according to their virtual addresses. This is how we will get rid of inconsistencies. Remember that the items in are augmented to have the form for real items and for dummy items. It follows that all items in the cache have the first form since they are either copies or updates of real items put there during and operations.
So we just execute the sorting network and, for each comparison gate, retrieve the appropriate items, decrypt them, compare their virtual addresses and return them re-encrypted in the appropriate order. The result of this process is that will now have the following form:
- the first items will consist of the most recent versions of the real items, i.e., all the items with virtual addresses other than and ;
- the next items will consist of dummy items, i.e., all items with virtual address .
- the final items will consist of the old/inconsistent versions of the real items, i.e., all items with virtual address (remember that in Step of and we executed a modified that updated the item’s virtual address to ).
In the second phase, we randomly permute and re-encrypt the first items of . We first choose a new key for . We then access each item from location to and update their tags to . Once we’ve updated the tags, we sort all the items according to their tags. The result will be a new random permutation of items. Note that we don’t technically have to do this in two passes; but it’s easier to explain this way.
At this point, we’re done! is as good as new and we can start accessing it again safely.
Efficiency. So what is the efficiency of the Square-Root solution? Setup is : to construct the real, dummy and cache items and to permute everything through sorting.
Each access operation (i.e., or ) is : total get/put operations to get the cache twice and for each get/put operation due to binary search.
Restructuring is : to sort by virtual address and to sort by tag. Restructuring, however, only occurs once every accesses. Because of this, we usually average the cost of re-structuring over the number read/write operations supported to give an amortized access cost. In our case, the amortized access cost is then
which is .
ORAM-Based Encrypted Search
So now that we know how to build an ORAM, we’ll see how to use it for encrypted search. There are two possible ways to do this.
A naive approach. The first is for the client to just dump all the documents in an array , setup an ORAM and send to the server. To search, the client can just simulate a sequential search algorithm via the protocol; that is, replace every read operation of the search algorithm with an execution of the protocol. To update the documents the client can similarly simulate an update algorithm using the protocol.
This will obviously be slow. Let’ s assume all the documents have bit-length and that has a block size of bits. The document collection will then fit in (approximately) blocks. The sequential scan algorithm is itself , but on top of that we’ll have to execute an entire protocol for every address of memory read.
Remember that if we’re using the Square-Root solution as our ORAM then the protocol requires amortized work. So in total, search would be which would not scale. Now imagine for a second if we were using the FHE-based ORAM described above which requires work for each and . In this scenario, a single search would take time!
A better approach.  A better idea is for the client to build two arrays and . In it will store a data structure that supports fast searches on the document collection (e.g., an inverted index) and in it will store the documents themselves. It then builds and sends and to the server. To search, the client simulates a query to the data structure in via the protocol (i.e., it replaces each read operation in the data structure’s query algorithm with an execution of ) . From this, the client will recover the identifiers of the documents that contain the keyword and with this information it can just read those documents from .
Now suppose there are documents that contain the keyword and that we’re using an optimal-time data structure (i.e., a structure with a query algorithm that runs in time like an inverted index). Also, assume that the data structure fits in blocks of bits and that the data collection fits in blocks.
Again, if we were using the Square-Root solution for our ORAMs, then the first step would take time and the second step will take
In practice, the size of a fast data structure for keyword search can be large. A very conservative estimate for an inverted index, for example, would be that it is roughly the size of the data collection . Setting , the total search time would be
which is (since ) compared to the previous approach’ s .
In cases where the search term appears in documents, this can be a substantial improvement.
Is This Practical?
If one were to only look at the asymptotics, one might conclude that the two-RAM solution described above might be reasonably efficient. After all it would take at least time just to retrieve the matching files from (unencrypted) memory so the two-RAM solution adds just a multiplicative factor over the minimum retrieval time.
Also there are much more efficient ORAM constructions than the Square-Root solution. In fact, in their paper, Goldreich and Ostrovsky also proposed the Hierarchichal solution which achieves amortized access cost. Goodrich and Mitzenmacher [GM11] gave a solution with amortized access cost and, recently, Kushilevitz, Lu and Ostrovsky [KLO12] a solution with amortized cost (and there are even more recent papers that improve on this under certain conditions). There are also works that tradeoff client storage for access efficiency. For example, Williams, Sion and Carbunar [WSC08] propose a solution with amortized access cost and client storage while Stefanov, Shi and Song [SSS12] propose a solution with amortized overhead for clients that have local storage, where the underlying constant is very small. There is also a line of work that tries to de-amortize ORAM in the sense that it splits the re-structuring operation so that it happens progressively over each access. This was first considered by Ostrovsky and Shoup in [OS97] and was further studied by Goodrich, Mitzenmacher, Ohrimenko, Tamassia [GMOT11] and by Shi, Chan, Stefanov and Li [SSSL11].
All in all this may not seem that bad and, intuitively, the two-RAM solution might actually be reasonably practical for small to moderate-scale data collections—especially considering all the recent improvements in efficiency that have been proposed. For large- or massive-scale collections, however, I’d be surprised .
In this post we went over the ORAM-based solution to encrypted search which provides the most secure solution to our problem since it hides everything—even the access pattern!
In the next post we’ll cover an approach that tries to strike a balance between efficiency and security. In particular, this solution is as efficient as the deterministic-encryption-based solution while being only slightly less secure than the ORAM-based solution.
 I haven’t seen this construction written down anywhere. It’s fairly obvious, however, so I suspect it’s been mentioned somewhere. If anyone knows of a reference, please let me know. Also, as Jon Katz points out in the comments, this approach requires the server to compute whereas traditional ORAM constructions do not. Some recent works have started to distinguish these variants of ORAM (see comments for more).
 Like the FHE-based ORAM, I have not seen this construction written down anywhere so if anyone knows of a reference, please let me know.
 Of course, the following could be done using a single RAM, but splitting into two makes things easier to explain.
 Note that this will reveal to the server some information. If we’re using an inverted index as the underlying search structure, it will reveal the number of documents that contain the keyword (see the comments for a brief discussion). The natural way to address this, of course, is to use an inverted index with lists that are padded to the maximum size.
 In practice, this would not be the case and, in addition, we could make use of index compression techniques.
 I won’t attempt to draw exact lines between what’s small-, moderate- and large-scale since I think that’s a question best answered by experimental results.