跳转到主要内容

热门内容

今日:


总体:


最近浏览:


Chinese, Simplified

category

在Zama赏金计划的第一季中,我们要求您使用FHE创建一个加密的密钥值数据库。需要一个适当的解决方案来确保密钥和值都被加密,同时进一步遵守赏金描述中提供的一些条件。

Github用户Alpaylan已成功完成此赏金,详情如下。

键值数据库是为通过唯一密钥存储和检索数据而设计的。虽然键值数据库比标准关系数据库更灵活、更高效,但仍然容易发生数据被盗。解决方案?一种加密的密钥值数据库,可以存储和检索数据,而无需解密。当密钥和值都加密时,将数据存储在基于云的设置中,甚至存储在大型开放的内部环境中都不会产生安全问题。

以下加密键值数据库教程解释了三个操作:“插入”、“替换”和“查询”。所有操作都实现为完全同态加密电路。

examples/key-value-database/`中,您可以找到以下文件。将它们用作参考或构建自己的数据库:

  • `static-size.py`:此文件包含一个静态大小数据库实现,这意味着条目数量在开头作为参数给出。
  • `dynamic-size.py`:此文件包含一个动态大小数据库实现,这意味着数据库从零条目数据库开始,并根据需要进行扩展。


首先将检查静态大小的数据库实现。动态数据库实现非常相似,因此建议您查看代码以确定差异。

以下是进口声明。

`concrete.numpy`:用于实现同态电路。

`numpy`:用于数学运算。

import concrete.numpy as cnp
import numpy as np


数据库配置参数如下:

  • 条目数:定义可插入(键/值)对的最大数量。
  • 块大小:定义每个块的大小,即键和值的最小子结构。
  • 密钥大小:定义每个密钥的大小。
  • 值大小:定义每个值的大小。
# The number of entries in the database
NUMBER_OF_ENTRIES = 5
# The number of bits in each chunk
CHUNK_SIZE = 4

# The number of bits in the key and value
KEY_SIZE = 32
VALUE_SIZE = 32


接下来是状态的定义和状态的访问器/索引器。

状态的形状根据键/值的大小定义,如下表所示。

# Key and Value size must be a multiple of chunk size
assert KEY_SIZE % CHUNK_SIZE == 0
assert VALUE_SIZE % CHUNK_SIZE == 0

# Required number of chunks to store keys and values
NUMBER_OF_KEY_CHUNKS = KEY_SIZE // CHUNK_SIZE
NUMBER_OF_VALUE_CHUNKS = VALUE_SIZE // CHUNK_SIZE

# The shape of the state as a tensor
# Shape:
# | Flag Size | Key Size | Value Size |
# | 1         | 32/4 = 8 | 32/4 = 8   |
STATE_SHAPE = (NUMBER_OF_ENTRIES, 1 + NUMBER_OF_KEY_CHUNKS + NUMBER_OF_VALUE_CHUNKS)


下面的切片用于索引该州的某些部分。

# Indexers for each part of the state
FLAG = 0
KEY = slice(1, 1 + NUMBER_OF_KEY_CHUNKS)
VALUE = slice(1 + NUMBER_OF_KEY_CHUNKS, None)


编码/解码功能


编码/解码函数用于在整数和NumPy数组之间进行转换。该接口公开整数,但状态作为NumPy数组存储和处理。

编码。

  • 将数字编码为NumPy数组。
  • 该数字以二进制编码,然后分成块;
  • 然后将每个块转换为整数;


然后将整数存储在NumPy数组中。

def encode(number: int, width: int) -> np.array:
    binary_repr = np.binary_repr(number, width=width)
    blocks = [binary_repr[i:i+CHUNK_SIZE] for i in range(0, len(binary_repr), CHUNK_SIZE)]
    return np.array([int(block, 2) for block in blocks])

# Encode a number with the key size
def encode_key(number: int) -> np.array:
    return encode(number, width=KEY_SIZE)

# Encode a number with the value size
def encode_value(number: int) -> np.array:
    return encode(number, width=VALUE_SIZE)

Decode.

Decodes a NumPy array into a number.

def decode(encoded_number: np.array) -> int:
    result = 0
    for i in range(len(encoded_number)):
        result += 2**(CHUNK_SIZE*i) * encoded_number[(len(encoded_number) - i) - 1]
    return result

Row selection with table lookups

The `keep_selected` function is used to select the correct row of the database for each operation.

Below is the Python definition of the function.

def keep_selected(value, selected):
  if selected:
      return value
  else:
      return 0

This function takes any value, and a boolean flag indicates if value is selected or not. Within homomorphic encryption circuits, we cannot compile this function as encrypted values cannot affect control flow. Instead, we turn this function into a lookup table.

`Selected` is prepended to the value, and the function is modified to act as below.

keep_selected(i=0..15, 1) -> 0 keep_selected(i=16..31, 0) -> i-16

Next comes the Python code for the lookup table.

keep_selected_lut = cnp.LookupTable([0 for _ in range(16)] + [i for i in range(16)])

The most significant bit of the input to the lookup table represents the `select` bit. If select=0 <=> i=0..15 , then the output is 0. If select=1 <=> i=16..31, then the output is i-16, the value itself.

To summarize, you can implement the `keep_selected` function as below.

def keep_selected_using_lut(value, selected):
  packed = (2 ** CHUNK_SIZE) * selected + value
  return keep_selected_lut[packed]

Circuit implementation functions

The following functions are used to implement the key-value database circuits. Three circuits are implemented as:

  • `insert`: Inserts a key value pair into the database
  • `replace`: Replaces the value of a key in the database
  • `query`: Queries the database for a key and returns the value

Insert.

The algorithm of the `insert` function is as follows:

  • Create a selection array to select a certain row of the database;
  • Fill this array by setting the first non-empty row of the database to 1;
  • Create a state update array, where the first non-empty row of the database is set to the new key and value;
  • Add the state update array to the state.

The full implementation is provided here:

# Insert a key value pair into the database
# - state: The state of the database
# - key: The key to insert
# - value: The value to insert
# Returns the updated state
def _insert_impl(state, key, value):
    # Get the used bit from the state
    # This bit is used to determine if an entry is used or not
    flags = state[:, FLAG]

    # Create a selection array
    # This array is used to select the first unused entry
    selection = cnp.zeros(NUMBER_OF_ENTRIES)

    # The found bit is used to determine if an unused entry has been found
    found = cnp.zero()
    for i in range(NUMBER_OF_ENTRIES):
        # The packed flag and found bit are used to determine if the entry is unused
        # | Flag | Found |
        # | 0    | 0     | -> Unused, select
        # | 0    | 1     | -> Unused, skip
        # | 1    | 0     | -> Used, skip
        # | 1    | 1     | -> Used, skip
        packed_flag_and_found = (found * 2) + flags[i]
        # Use the packed flag and found bit to determine if the entry is unused
        is_selected = (packed_flag_and_found == 0)

        # Update the selection array
        selection[i] = is_selected
        # Update the found bit, so all entries will be 
        # skipped after the first unused entry is found
        found += is_selected

    # Create a state update array
    state_update = cnp.zeros(STATE_SHAPE)
    # Update the state update array with the selection array
    state_update[:, FLAG] = selection

    # Reshape the selection array to be able to use it as an index
    selection = selection.reshape((-1, 1))

    # Create a packed selection and key array
    # This array is used to update the key of the selected entry
    packed_selection_and_key = (selection * (2 ** CHUNK_SIZE)) + key
    key_update = keep_selected_lut[packed_selection_and_key]

    # Create a packed selection and value array
    # This array is used to update the value of the selected entry
    packed_selection_and_value = selection * (2 ** CHUNK_SIZE) + value
    value_update = keep_selected_lut[packed_selection_and_value]

    # Update the state update array with the key and value update arrays
    state_update[:, KEY] = key_update
    state_update[:, VALUE] = value_update

    # Update the state with the state update array
    new_state = state + state_update
    return new_state

Replace.

The algorithm of the `replace` function is as follows:

  • Create an equal-rows array to select rows that match the given key in the database;
  • Create a selection array to select the row that is currently used in the database;
  • Set the selection array to 1 for the row that contains the key, and 0 for all other rows;
  • Create an inverse selection array by inverting the selection array;
  • Update rows set to 1 in the selection array – all other values will stay the same. To do this, we multiply the selection array with the new key and value, and the inverse selection array with the old key and value;
  • then, add the two arrays to get the new state.
# Replace the value of a key in the database
#   If the key is not in the database, nothing happens
#   If the key is in the database, the value is replaced
# - state: The state of the database
# - key: The key to replace
# - value: The value to replace
# Returns the updated state
def _replace_impl(state, key, value):
    # Get the flags, keys and values from the state
    flags = state[:, FLAG]
    keys = state[:, KEY]
    values = state[:, VALUE]

    # Create an equal_rows array
    # This array is used to select all entries with the given key
    # The equal_rows array is created by comparing the keys in the state
    # with the given key, and only setting the entry to 1 if the keys are equal
    # Example:
    #   keys = [[1, 0, 1, 0], [0, 1, 0, 1, 1]]
    #   key = [1, 0, 1, 0]
    #   equal_rows = [1, 0]
    equal_rows = (np.sum((keys - key) == 0, axis=1) == NUMBER_OF_KEY_CHUNKS)

    # Create a selection array
    # This array is used to select the entry to change the value of
    # The selection array is created by combining the equal_rows array
    # with the flags array, which is used to determine if an entry is used or not
    # The reason for combining the equal_rows array with the flags array
    # is to make sure that only used entries are selected
    selection = (flags * 2 + equal_rows == 3).reshape((-1, 1))
    
    # Create a packed selection and value array
    # This array is used to update the value of the selected entry
    packed_selection_and_value = selection * (2 ** CHUNK_SIZE) + value
    set_value = keep_selected_lut[packed_selection_and_value]

    # Create an inverse selection array
    # This array is used to pick entries that are not selected
    # Example:
    #   selection = [1, 0, 0]
    #   inverse_selection = [0, 1, 1]
    inverse_selection = 1 - selection

    # Create a packed inverse selection and value array
    # This array is used to keep the value of the entries that are not selected
    packed_inverse_selection_and_values = inverse_selection * (2 ** CHUNK_SIZE) + values
    kept_values = keep_selected_lut[packed_inverse_selection_and_values]

    # Update the values of the state with the new values
    new_values = kept_values + set_value
    state[:, VALUE] = new_values

    return state

Query.

The algorithm of the `query` function is as follows:

  • Create a selection array to select a certain row of the database;
  • Set the selection array to 1 for the row that contains the key;
  • Multiply the selection array with the state to zero all rows that do not contain the key;
  • Sum the rows of the state to get the remaining non-zero row, basically doing a dimension reduction;
  • Prepend the found flag to the value, and return the resulting array;
  • Destructure the resulting array in the non-encrypted query function.
# Query the database for a key and return the value
# - state: The state of the database
# - key: The key to query
# Returns an array with the following format:
#   [found, value]
#   found: 1 if the key was found, 0 otherwise
#   value: The value of the key if the key was found, 0 otherwise
def _query_impl(state, key):
    # Get the keys and values from the state
    keys = state[:, KEY]
    values = state[:, VALUE]

    # Create a selection array
    # This array is used to select the entry with the given key
    # The selection array is created by comparing the keys in the state
    # with the given key, and only setting the entry to 1 if the keys are equal
    # Example:
    #   keys = [[1, 0, 1, 0], [0, 1, 0, 1, 1]]
    #   key = [1, 0, 1, 0]
    #   selection = [1, 0]
    selection = (np.sum((keys - key) == 0, axis=1) == NUMBER_OF_KEY_CHUNKS).reshape((-1, 1))

    # Create a found bit
    # This bit is used to determine if the key was found
    # The found bit is set to 1 if the key was found, and 0 otherwise
    found = np.sum(selection)

    # Create a packed selection and value array
    # This array is used to get the value of the selected entry
    packed_selection_and_values = selection * (2 ** CHUNK_SIZE) + values
    value_selection = keep_selected_lut[packed_selection_and_values]

    # Sum the value selection array to get the value
    value = np.sum(value_selection, axis=0)

    # Return the found bit and the value
    return cnp.array([found, *value])

Key-value database interface

The `KeyValueDatabase` class is the interface that exposes the functionality of the key-value database.

class KeyValueDatabase:
    """
    A key-value database that uses fully homomorphic encryption circuits to store the data.
    """

    # The state of the database, it holds all the 
    # keys and values as a table of entries
    _state: np.ndarray

    # The circuits used to implement the database
    _insert_circuit: cnp.Circuit
    _replace_circuit: cnp.Circuit
    _query_circuit: cnp.Circuit

    # Below is the initialization of the database.

    # First, we initialize the state, and provide the necessary input sets. 
    # In versions later than concrete-numpy.0.9.0, we can use the `direct circuit` 
    # functionality to define the bit-widths of encrypted values rather than using 
    # `input sets`. Input sets are used to determine the required bit-width of the 
    # encrypted values. Hence, we add the largest possible value in the database 
    # to the input sets.

    # Within the initialization phase, we create the required configuration, 
    # compilers, circuits, and keys. Circuit and key generation phase is 
    # timed and printed in the output.

    def __init__(self):
        # Initialize the state to all zeros
        self._state = np.zeros(STATE_SHAPE, dtype=np.int64)

        ## Input sets for initialization of the circuits
        # The input sets are used to initialize the circuits with the correct parameters

        # The input set for the query circuit
        inputset_binary = [
            (
                np.zeros(STATE_SHAPE, dtype=np.int64), # state
                np.ones(NUMBER_OF_KEY_CHUNKS, dtype=np.int64) * (2**CHUNK_SIZE - 1), # key
            )
        ]
        # The input set for the insert and replace circuits
        inputset_ternary = [
            (
                np.zeros(STATE_SHAPE, dtype=np.int64), # state
                np.ones(NUMBER_OF_KEY_CHUNKS, dtype=np.int64) * (2**CHUNK_SIZE - 1), # key
                np.ones(NUMBER_OF_VALUE_CHUNKS, dtype=np.int64) * (2**CHUNK_SIZE - 1), # value
            )
        ]

        ## Circuit compilation

        # Create a configuration for the compiler
        configuration = cnp.Configuration(
            enable_unsafe_features=True,
            use_insecure_key_cache=True,
            insecure_key_cache_location=".keys",
        )

        # Create the compilers for the circuits
        # Each compiler is provided with
        # - The implementation of the circuit
        # - The inputs and their corresponding types of the circuit
        #  - "encrypted": The input is encrypted
        #  - "plain": The input is not encrypted
        insert_compiler = cnp.Compiler(
            _insert_impl,
            {"state": "encrypted", "key": "encrypted", "value": "encrypted"}
        )
        replace_compiler = cnp.Compiler(
            _replace_impl,
            {"state": "encrypted", "key": "encrypted", "value": "encrypted"}
        )
        query_compiler = cnp.Compiler(
            _query_impl,
            {"state": "encrypted", "key": "encrypted"}
        )


        ## Compile the circuits
        # The circuits are compiled with the input set and the configuration

        self._insert_circuit = insert_compiler.compile(inputset_ternary, configuration)
        self._replace_circuit = replace_compiler.compile(inputset_ternary, configuration)
        self._query_circuit = query_compiler.compile(inputset_binary, configuration)

        ## Generate the keys for the circuits
        # The keys are seaparately generated for each circuit

        self._insert_circuit.keygen()
        self._replace_circuit.keygen()
        self._query_circuit.keygen()

    ### The Interface Functions
        
    # The following methods are used to interact with the database. 
    # They are used to insert, replace and query the database. 
    # The methods are implemented by encrypting the inputs, 
    # running the circuit and decrypting the output.

    # Insert a key-value pair into the database
    # - key: The key to insert
    # - value: The value to insert
    # The key and value are encoded before they are inserted
    # The state of the database is updated with the new key-value pair
    def insert(self, key, value):
        self._state = self._insert_circuit.encrypt_run_decrypt(
            self._state, encode_key(key), encode_value(value)
        )

    # Replace a key-value pair in the database
    # - key: The key to replace
    # - value: The new value to insert with the key
    # The key and value are encoded before they are inserted
    # The state of the database is updated with the new key-value pair
    def replace(self, key, value):
        self._state = self._replace_circuit.encrypt_run_decrypt(
            self._state, encode_key(key), encode_value(value)
        )

    # Query the database for a key
    # - key: The key to query
    # The key is encoded before it is queried
    # Returns the value associated with the key or None if the key is not found
    def query(self, key):
        result = self._query_circuit.encrypt_run_decrypt(
            self._state, encode_key(key)
        )

        if result[0] == 0:
            return None

        return decode(result[1:])

Whereas static implementation works with circuits over the entire database, dynamic implementation works with circuits over a single row.

In the dynamic implementation, you can iterate over the rows of the database in a simple Python loop and run the circuits over each row. This difference in implementation is reflected in the `insert`, `replace`, and `query` functions.

The static implementation is more efficient with dense databases as it works with parallelized tensors, resulting in a fixed amount of time for both simple and complex queries. The dynamic implementation is more efficient with sparse databases as it grows with the number of entries, but it doesn't use circuit-level parallelization. Also, insertion is free in the dynamic implementation as it only appends a new item to a Python list.

Usage

The initialization of the database is given below. Parameters are provided globally, so you can simply initialize the database with the following command.

## Test: Initialization
# Initialize the database
db = KeyValueDatabase()

Use the interface functions as provided below.

# Test: Insert/Query
# Insert (key: 3, value: 4) into the database
db.insert(3, 4)
# Query the database for the key 3
# The value 4 should be returned
assert db.query(3) == 4
# Test: Replace/Query
# Replace the value of the key 3 with 1
db.replace(3, 1)
# Query the database for the key 3
# The value 1 should be returned
assert db.query(3) == 1
# Test: Insert/Query
# Insert (key: 25, value: 40) into the database
db.insert(25, 40)
# Query the database for the key 25
# The value 40 should be returned
assert db.query(25) == 40
# Test: Query Not Found
# Query the database for the key 4
# None should be returned
assert db.query(4) == None
# Test: Replace/Query
# Replace the value of the key 3 with 5
db.replace(3, 5)
# Query the database for the key 3
# The value 5 should be returned
assert db.query(3) == 5

Now, we’ll test the limits. Using the hyper-parameters KEY_SIZE and VALUE_SIZE, we ensure that the examples work robustly against changes to the parameters.

# Define lower/upper bounds for the key
minimum_key = 1
maximum_key = 2 ** KEY_SIZE - 1
# Define lower/upper bounds for the value
minimum_value = 1
maximum_value = 2 ** VALUE_SIZE - 1

Below are the usage examples with these bounds.

# Test: Insert/Replace/Query Bounds
# Insert (key: minimum_key , value: minimum_value) into the database
db.insert(minimum_key, minimum_value)

# Query the database for the key=minimum_key
# The value minimum_value should be returned
assert db.query(minimum_key) == minimum_value

# Insert (key: maximum_key , value: maximum_value) into the database
db.insert(maximum_key, maximum_value)

# Query the database for the key=maximum_key
# The value maximum_value should be returned
assert db.query(maximum_key) == maximum_value

# Replace the value of key=minimum_key with maximum_value
db.replace(minimum_key, maximum_value)

# Query the database for the key=minimum_key
# The value maximum_value should be returned
assert db.query(minimum_key) == maximum_value

# Replace the value of key=maximum_key with minimum_value
db.replace(maximum_key, minimum_value)

# Query the database for the key=maximum_key
# The value minimum_value should be returned
assert db.query(maximum_key) == minimum_value

 

结论


FHE为在开放环境中存储和查询数据的问题提供了一种创新的方法,而这种慷慨进一步证明了这一概念。例如,现代购物车和经典会话商店可以依靠这种数据库来更安全地运行,因此您的在线订单和会话历史记录一定会保密。还有许多其他用例可以从加密密钥值数据库中受益,这个解决方案为我们提供了一种可能性。

本文地址
最后修改
星期日, 一月 26, 2025 - 20:45
Article