
Cloudproof Spark Library


The Cloudproof Spark library provides a Spark-friendly API to Cosmian's Cloudproof Encryption.

Cloudproof Encryption secures data repositories and applications in the cloud with advanced application-level encryption and encrypted search.

Licensing

The library is available under a dual licensing scheme: Affero GPL v3 and commercial. See LICENSE.md for details.

Cryptographic primitives

The library is based on:

  • The CoverCrypt algorithm, which allows creating ciphertexts for a set of attributes and issuing user keys with access policies over these attributes. CoverCrypt offers post-quantum resistance (see the illustrative sketch below).
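
As a purely illustrative sketch (the axis, attribute, and policy names below are hypothetical and not taken from this repository), CoverCrypt attributes and access policies are commonly written as boolean expressions over "Axis::Attribute" pairs:

// Hypothetical example: attributes attached to a ciphertext, and a user access
// policy that a user's key must satisfy in order to decrypt it.
val encryptionAttributes = Seq("Department::Finance", "Security::Confidential")
val userAccessPolicy     = "Department::Finance && Security::Confidential"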

Getting Started

Using in Java projects

This library is open-source software and is available on Maven Central.

<dependency>
    <groupId>com.cosmian.cloudproof.spark</groupId>
    <artifactId>cloudproof_spark</artifactId>
    <version>1.0.0</version>
</dependency>
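
For an sbt build, the same coordinates should translate to the following (a sketch assuming the artifact is consumed as a plain Java dependency):

libraryDependencies += "com.cosmian.cloudproof.spark" % "cloudproof_spark" % "1.0.0"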

From this repository

1/ Install SBT

  • For Linux, download and extract the ZIP file

2/ Install Spark

3/ Download the CSV file organizations-2000000.csv from https://www.datablist.com/learn/csv/download-sample-csv-files and place it in the root folder of this repository

wget https://github.com/datablist/sample-csv-files/raw/main/files/organizations/organizations-2000000.csv
7za x organizations-2000000.csv

4/ Execute:

mvn package && spark-submit --class "CloudproofSpark" --master "local[*]" target/cloudproof_spark-1.0.0.jar

or:

sbt assembly && spark-submit --class "CloudproofSpark" --master "local[*]" target/scala-2.12/CloudproofSpark-assembly-1.0.0.jar

Reading the code

  • src/main/scala/com/cosmian/cloudproof/spark/CloudproofSpark.scala is the main entry point; it contains the Spark code that reads the CSV, writes the encrypted Parquet files, and reads the encrypted Parquet files back (with different keys)
  • src/main/java/com/cosmian/cloudproof/spark/CoverCryptCryptoFactory.java is the class responsible for encrypting/decrypting the files and the columns with CoverCrypt (a minimal wiring sketch follows this list)
  • src/main/java/com/cosmian/cloudproof/spark/EncryptionMapping.java is a simple class that encapsulates the mapping in string form (because the Spark configuration only works with strings), reads it back, and selects the correct policy for a specific file/column.
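
The sketch below shows, in a minimal and hedged form, how a crypto factory is typically wired into a Spark job through Parquet Modular Encryption: parquet.crypto.factory.class is the standard Parquet property naming the factory, while the additional properties expected by CoverCryptCryptoFactory (keys, encryption mapping) are repository-specific and only hinted at in comments.

import org.apache.spark.sql.SparkSession

object EncryptedParquetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("EncryptedParquetSketch")
      .master("local[*]")
      .getOrCreate()

    // Standard Parquet Modular Encryption property: which crypto factory to use.
    spark.sparkContext.hadoopConfiguration.set(
      "parquet.crypto.factory.class",
      "com.cosmian.cloudproof.spark.CoverCryptCryptoFactory")

    // The keys and the file/column encryption mapping required by the factory
    // are passed as extra (repository-specific) properties; see
    // CloudproofSpark.scala and EncryptionMapping.java for the actual names.

    val df = spark.read.option("header", "true").csv("organizations-2000000.csv")

    // Write encrypted Parquet files, then read them back.
    df.write.mode("overwrite").parquet("organizations.parquet.encrypted")
    spark.read.parquet("organizations.parquet.encrypted").show(5)

    spark.stop()
  }
}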

Parquet format

Parquet format is described here.

Benchmarks

  • Timings include:
    • Spark boot
    • reading the CSV dataset
    • writing the output in Parquet format
  • Due to JVM execution and Spark boot, timings are unstable: the given values only give an idea of the performance.
  • Parquet encryption applies to both files and columns
  • CPU: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
  • Datasets are CSV files:
    • 100 000 lines (14M)
    • 500 000 lines (71M)
    • 1 000 000 lines (141M)
    • 2 000 000 lines (283M)
  • The size of the output is the total size of all .parquet and .crc files

Quick summary

Without post-quantum resistance, the CoverCrypt scheme's size overhead and performance are equivalent to those of a classic symmetric encryption algorithm such as AES256-GCM, while providing a hybrid cryptographic system with multiple additional benefits.

Parquet without encryption

                 100_000 lines   500_000 lines   1_000_000 lines   2_000_000 lines
Size of output   17M             66M             104M              169M
Timings          7s              24s             31s               31s

Parquet with classic AES256-GCM encryption

                 100_000 lines   500_000 lines   1_000_000 lines   2_000_000 lines
Size of output   19M             77M             117M              183M
Timings          6s              23s             31s               36s

Parquet with CoverCrypt encryption

                 100_000 lines   500_000 lines   1_000_000 lines   2_000_000 lines
Size of output   20M             78M             118M              185M
Timings          9s              24s             33s               40s

Parquet with CoverCrypt encryption (post quantum resistant)

                 100_000 lines   500_000 lines   1_000_000 lines   2_000_000 lines
Size of output   21M             85M             126M              192M
Timings          9s              24s             36s               42s
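
From the tables above, a rough worked comparison at 2_000_000 lines (relative to the 169M unencrypted output) illustrates how close the CoverCrypt overhead stays to AES256-GCM:

AES256-GCM:                183M / 169M  ≈ +8 %
CoverCrypt:                185M / 169M  ≈ +9 %
CoverCrypt (post-quantum): 192M / 169M  ≈ +14 %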

Cryptographic overhead

A description of the cryptographic overhead is given here.

Testing

To run the tests in TestCloudproof.scala, run:

sbt "test:testOnly -- -oD"