Managing Large Enclaves in a Data Center

Live migration of applications and VMs in data centers is an old and quintessential problem. Within this large body of work, one important problem remains open: migrating secure enclaves (sandboxes) that run on trusted execution environments (TEEs) such as Intel SGX. For enclaves, the decade-old stop-and-copy method is still used: the entire application's execution is stopped, and its state is collected and transferred. This method incurs an exceedingly long downtime for enclaves with large memory footprints. Better solutions have eluded us because of design limitations of TEEs like Intel SGX. These include the opacity of data within enclaves (it is not visible to the OS/hypervisor) and the lack of mechanisms to track writes to secure pages. We propose a new technique, OptMig, that circumvents these limitations and implements secure enclave migration with near-zero downtime. We rely on a short compiler pass and propose a novel migration mechanism. Our optimizations reduce the total downtime by 77-96% for a suite of Intel SGX applications with multi-GB memory footprints. We show results for our system on a real cloud and in settings that use containers, VMs, and microVMs.
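To see why stop-and-copy scales poorly with enclave size, consider a deliberately simplified model. The application is paused for the whole transfer, so downtime is roughly a fixed overhead plus the footprint divided by the available bandwidth. The sketch below is an illustration under that assumption only, not OptMig or the paper's measurement method. The function name `stop_and_copy_downtime_s` and the 10 Gbit/s link are hypothetical. The model also ignores real costs such as protecting enclave state for transfer, so actual downtime would be higher.

```python
def stop_and_copy_downtime_s(footprint_bytes: int, bandwidth_bytes_per_s: float,
                             fixed_overhead_s: float = 0.0) -> float:
    """Downtime of a naive stop-and-copy migration (hypothetical, simplified model).

    The application stays paused while its entire state is copied, so
    downtime grows linearly with the memory footprint being transferred.
    Encryption/sealing and restore costs are ignored (assumption).
    """
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return fixed_overhead_s + footprint_bytes / bandwidth_bytes_per_s


GIB = 1024 ** 3
link = 10e9 / 8  # hypothetical 10 Gbit/s link, expressed in bytes per second

for gib in (1, 4, 16):
    paused = stop_and_copy_downtime_s(gib * GIB, link)
    print(f"{gib:>2} GiB enclave: ~{paused:.1f} s paused")
```

Under this model, each doubling of the footprint doubles the pause. This is why multi-GB enclaves make stop-and-copy unattractive.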