CipherCore introduces a new way of accessing and using data: the data remains confidential, which allows many data owners to collaborate robustly and securely without disclosing their data to each other.
TL;DR: CipherCore introduces a new way of accessing and using data: the data remains confidential, which allows many data owners to collaborate robustly and securely without disclosing their data to each other.
In a nutshell, CipherCore is a secure computation engine written in Rust (with a Python wrapper) that operates over encrypted data without decrypting it. It forms the foundation of the CipherMode secure data sharing platform.
The efficiency and generality of CipherCore allow us, among other things, to train a neural network that predicts time to failure on the NASA Turbofan Jet Engine dataset (tens of thousands of training data points) in under 5 minutes without access to the plaintext training data. By a very conservative estimate, the best available implementation of Homomorphic Encryption would be at least 5000x slower.
Check out CipherCore on GitHub and if you like what you see, consider giving it a star!
Some quick links:
At CipherMode, we are building a secure data sharing platform that enables analytics (think joins) and machine learning (think training decision trees or neural networks) over datasets that are distributed between parties that are not willing to trust each other or any third party with their data. But how can we run any computation if we can't bring all the necessary data into one place and aren't allowed to leak any information about that data? Isn't this obviously impossible?
It turns out that it is possible with modern cryptography, specifically Secure Multi-Party Computation (SMPC). SMPC allows several parties to jointly compute any function of their inputs in such a way that no information about the inputs (other than what can be inferred from the output) leaks to any other party. This is incredibly powerful: the function computed can be anything from "train this machine learning model" to "run this SQL query".
You can think of SMPC as a "black box", where the parties contribute their data and after that the black box gives back the result of a computation. Normally, such a black box is achieved using a trusted third party, but with SMPC there is no need for it.
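To make the "black box" idea concrete, here is a minimal sketch of one classic SMPC building block, additive secret sharing, used to compute a sum of private inputs. It is a single-process simulation in plain Python and does not use CipherCore; the names `share` and `secure_sum` and the choice of arithmetic modulo 2^64 are assumptions made for illustration.

```python
import secrets

MODULUS = 2**64  # illustrative choice: all arithmetic is done modulo 2^64


def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_inputs):
    """Simulate n parties computing the sum of their private inputs."""
    n = len(private_inputs)
    # Each party splits its input into n shares; share j is sent to party j.
    all_shares = [share(x, n) for x in private_inputs]
    # Party j adds up the shares it received (one from every party).
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Only the partial sums are published; their total is the result.
    return sum(partial_sums) % MODULUS


inputs = [120, 45, 300]  # one private value per party
print(secure_sum(inputs))  # prints 465
```

Each individual share is a uniformly random number, so no single message reveals anything about the input it came from; only the final sum is learned, which is exactly the behavior of the black box described above.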
SMPC is a very active area of academic research with many exciting developments: see this GitHub repo for an overview and pointers. However, so far it has not had mainstream, mass adoption by organizations despite providing a particularly clean, general and powerful solution for private data collaboration. There are two main reasons for this:
At CipherMode, we are actively working on mitigating these two barriers. A part of these efforts is CipherCore: an SMPC engine that we decided to share with the community.
When we were building CipherCore, we spent a lot of time thinking about what a good level of abstraction would be, one that would allow us, at the same time,
It turns out that a good intermediate representation that satisfies all of the above properties is a computation graph. Computation graphs are ubiquitous in machine learning, massive data processing, databases, etc.
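As a generic illustration of what a computation graph is, here is a minimal sketch in plain Python. It is not CipherCore's API: the `Node` and `Graph` classes and their methods are hypothetical and exist only to show how typed inputs and operations become nodes with dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str      # "input" or "matmul"
    shape: tuple  # shape of the array this node produces
    deps: list = field(default_factory=list)  # indices of parent nodes


class Graph:
    def __init__(self):
        self.nodes = []

    def _add(self, op, shape, deps=()):
        self.nodes.append(Node(op, tuple(shape), list(deps)))
        return len(self.nodes) - 1  # a node is referred to by its index

    def input(self, shape):
        return self._add("input", shape)

    def matmul(self, a, b):
        (m, k), (k2, n) = self.nodes[a].shape, self.nodes[b].shape
        if k != k2:
            raise ValueError(f"incompatible shapes: {(m, k)} and {(k2, n)}")
        return self._add("matmul", (m, n), [a, b])

    def evaluate(self, inputs):
        """Evaluate nodes in insertion order; return the last node's value."""
        values, remaining = [], iter(inputs)
        for node in self.nodes:
            if node.op == "input":
                values.append(next(remaining))
            else:  # matmul on nested lists
                x, y = (values[d] for d in node.deps)
                values.append(
                    [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
                     for row in x]
                )
        return values[-1]


g = Graph()
a = g.input((10, 20))
b = g.input((20, 30))
c = g.matmul(a, b)
print(g.nodes[c].shape)  # prints (10, 30)
```

Because the graph is built before any data is supplied, shapes can be checked up front, and the same graph object can later be evaluated, inspected, or transformed into another graph.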
In CipherCore, any user-defined computation is a computation graph. For example, if you want to multiply two matrices of sizes 10x20 and 20x30 securely using SMPC, you can write:
And this code produces the following computation graph:
Not only is the original computation represented as a graph, but the secure protocol obtained by applying SMPC is also simply a graph (albeit a larger one). For instance, if we "compile" the above graph using the ABY3 three-party SMPC protocol, we get the following:
The new graph is functionally identical to the original user-defined computation (we still multiply two matrices). This time, however, the first matrix is provided by "party 0", the second matrix is provided by "party 1", and the product is revealed to "party 2". That product is the only piece of information about the inputs that anyone learns along the way.
The fact that in CipherCore "everything is a computation graph" has the following consequences:
To build a sufficiently generic yet efficient secure computation engine, we also need to think about which basic set of types and operations we would like to support.
After some iterations, we arrived at the following instruction set:
CipherCore can be used directly, but it might still be quite low-level for everyday use. So, using CipherCore as a foundation, we at CipherMode are building two higher-level frameworks:
We are planning to blog about both of these directions in detail soon (will we be able to fine-tune a transformer on encrypted data?). Stay tuned!
If you are interested in digging deeper and getting some hands-on experience with CipherCore, we provide a few resources:
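To give a feel for what such a three-party protocol does under the hood, here is a simplified single-process simulation of matrix multiplication on secret shares, in the spirit of the replicated secret sharing used by ABY3. It is a sketch under stated assumptions and not the graph CipherCore actually produces: all function names are hypothetical, arithmetic is modulo 2^64, the zero-sharing masks are sampled directly rather than derived from shared keys, and all three result shares are simply summed on behalf of party 2.

```python
import secrets

MOD = 2**64  # illustrative choice of ring


def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) % MOD for col in zip(*y)]
            for row in x]


def add(*ms):
    # Entrywise sum of any number of equally sized matrices.
    return [[sum(vals) % MOD for vals in zip(*rows)] for rows in zip(*ms)]


def sub(x, y):
    return [[(a - b) % MOD for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def random_matrix(rows, cols):
    return [[secrets.randbelow(MOD) for _ in range(cols)] for _ in range(rows)]


def share(m):
    """Split m into three additive shares with s0 + s1 + s2 = m (mod MOD)."""
    s0 = random_matrix(len(m), len(m[0]))
    s1 = random_matrix(len(m), len(m[0]))
    return [s0, s1, sub(sub(m, s0), s1)]


def secure_matmul(a, b):
    # Party 0 shares its matrix a, party 1 shares its matrix b.
    # In the replicated scheme, party i holds shares i and i+1 of each input.
    x, y = share(a), share(b)
    rows, cols = len(a), len(b[0])
    # Masks alpha_0 + alpha_1 + alpha_2 = 0 hide each party's local result.
    r = [random_matrix(rows, cols) for _ in range(3)]
    alpha = [sub(r[i], r[(i + 1) % 3]) for i in range(3)]
    z = []
    for i in range(3):
        j = (i + 1) % 3
        # Party i uses only the shares it holds (indices i and j).
        z.append(add(matmul(x[i], y[i]), matmul(x[i], y[j]),
                     matmul(x[j], y[i]), alpha[i]))
    # The result shares are sent to party 2, which sums them to get a @ b.
    return add(*z)


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(secure_matmul(a, b))  # prints [[19, 22], [43, 50]]
```

The three local products together cover all nine cross terms of (x0 + x1 + x2)(y0 + y1 + y2), which is why the masked shares sum to the true product while each share on its own looks random.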