Sensitive Data Protection¶
The advent of the EU General Data Protection Regulation (GDPR) highlighted the need to protect sensitive information from leakage.
GPG¶
A basic approach relies on GPG to encrypt single files -- see this tutorial for more details
# File encryption
$ gpg --encrypt [-r <recipient>] <file> # => produces <file>.gpg
$ rm -f <file> # /!\ WARNING: encryption DOES NOT delete the input (clear-text) file
$ gpg --armor --detach-sign <file> # Generate signature file <file>.asc
# Decryption
$ gpg --verify <file>.asc # (optional but STRONGLY encouraged) verify signature file
$ gpg --decrypt <file>.gpg # Decrypt PGP encrypted file
One drawback is that files need to be completely decrypted for processing
Tutorial: Using GnuPG aka Gnu Privacy Guard aka GPG
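The commands above can be chained into a complete round trip. The following is a minimal sketch, not a reference procedure: it assumes GnuPG 2.1 or later, works in a throwaway keyring, and uses a hypothetical identity (demo@example.org) with an empty passphrase, which is acceptable only for a demonstration. It signs the clear-text file before deleting it, and verifies the signature after decryption.

```shell
#!/bin/sh
set -eu

# Throwaway keyring and working directory: nothing touches ~/.gnupg
GNUPGHOME="$(mktemp -d)"; export GNUPGHOME
chmod 700 "$GNUPGHOME"
cd "$(mktemp -d)"

# Hypothetical demo identity with an empty passphrase (never do this for real keys)
gpg --batch --quiet --pinentry-mode loopback --passphrase '' \
    --quick-generate-key 'demo@example.org' default default never

printf 'sensitive content\n' > data.txt
cp data.txt expected.txt                 # kept only to check the round trip

# Sender: sign while the clear text still exists, encrypt, then delete the clear text
gpg --batch --quiet --armor --detach-sign data.txt             # => data.txt.asc
gpg --batch --quiet --encrypt -r 'demo@example.org' data.txt   # => data.txt.gpg
[ -s data.txt.gpg ] && rm -f data.txt

# Recipient: decrypt to a file, then verify the signature against the result
gpg --batch --quiet --output data.txt --decrypt data.txt.gpg
gpg --batch --quiet --verify data.txt.asc data.txt
cmp expected.txt data.txt && echo "round trip OK"

gpgconf --kill gpg-agent                 # stop the agent started for the throwaway keyring
```

Note that gpg --decrypt writes to standard output unless --output is given, which is why the sketch names the output file explicitly.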
File Encryption Frameworks (EncFS, GoCryptFS...)¶
In contrast to disk-encryption software that operates on whole disks (TrueCrypt, dm-crypt, etc.), file encryption operates on individual files that can be backed up or synchronised easily, especially within a Git repository.
- Comparison matrix
- gocryptfs, aspiring successor of EncFS written in Go
- EncFS, mature with known security issues
- eCryptFS, integrated into the Linux kernel
- Cryptomator, strong cross-platform support through Java and WebDAV
- securefs, a cross-platform project implemented in C++.
- CryFS, result of a master thesis at the KIT University that uses chunked storage to obfuscate file sizes.
Assuming you are working from /path/to/my/project, your workflow (described below for EncFS, but it can be adapted to all the other tools) operates on encrypted vaults and would be as follows:
- (optional) if operating within a working copy of a git repository, you should ignore the mounting directory (e.g. vault/*) in the root .gitignore of the repository. This ensures that neither you nor a collaborator will commit any unencrypted version of a file by mistake
- you commit only the EncFS / GocryptFS / eCryptFS / Cryptomator / securefs / CryFS raw directory (e.g. .crypt/) in your repository. Thus only the encrypted form of your files is committed
- You create the EncFS / GocryptFS / eCryptFS / Cryptomator / securefs / CryFS encrypted vault
- You prepare macros/scripts/Makefile/Rakefile tasks to lock/unlock the vault on demand
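Whether the ignore rule from the first step really protects the clear-text view can be checked with git check-ignore. The sketch below is only an illustration: it runs in a throwaway repository, and the file names (secret.txt, ciphertext.bin) are placeholders.

```shell
#!/bin/sh
set -eu

cd "$(mktemp -d)"          # throwaway repository for the demonstration
git init -q .

rawdir=.crypt
mountdir=vault
mkdir -p "$rawdir" "$mountdir"
touch "$mountdir/secret.txt" "$rawdir/ciphertext.bin"   # placeholder files

# Add the ignore rule only if it is not already present
grep -qxF "$mountdir/" .gitignore 2>/dev/null || echo "$mountdir/" >> .gitignore

# git check-ignore exits with 0 when the given path is ignored
if git check-ignore -q "$mountdir/secret.txt"; then ignored_clear=yes; else ignored_clear=no; fi
if git check-ignore -q "$rawdir/ciphertext.bin"; then ignored_raw=yes; else ignored_raw=no; fi

echo "clear-text view ignored: $ignored_clear, encrypted directory ignored: $ignored_raw"
```

The expected outcome is that the clear-text view is ignored while the encrypted directory is not, so that only the latter can be committed.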
Here are, for instance, a few examples of these operations to create an encrypted vault:
$ cd /path/to/my/project
$ rawdir=.crypt # /!\ ADAPT accordingly
$ mountdir=vault # /!\ ADAPT accordingly
#
# (optional) Ignore the mount dir
$ echo $mountdir >> .gitignore
### EncFS: Creation of an EncFS vault (only once)
# EncFS expects absolute paths when it runs as a daemon (the default)
$ encfs --standard $PWD/$rawdir $PWD/$mountdir
You SHOULD be on a computing node to use GoCryptFS.
$ cd /path/to/my/project
$ rawdir=.crypt # /!\ ADAPT accordingly
$ mountdir=vault # /!\ ADAPT accordingly
#
# (optional) Ignore the mount dir
$ echo $mountdir >> .gitignore
### GoCryptFS: load the module - you SHOULD be on a computing node
$ module load tools/gocryptfs
# Creation of a GoCryptFS vault (only once); the raw directory must exist and be empty
$ mkdir -p $rawdir $mountdir
$ gocryptfs -init $rawdir
Then you can mount/unmount the vault as follows:
Tool | OS | Opening/Unlocking the vault | Closing/locking the vault |
---|---|---|---|
EncFS | Linux | encfs -o nonempty --idle=60 $rawdir $mountdir | fusermount -u $mountdir |
EncFS | Mac OS | encfs --idle=60 $rawdir $mountdir | umount $mountdir |
GocryptFS | | gocryptfs $rawdir $mountdir | as above |
The fact that GoCryptFS is available as a module brings the advantage that it can be mounted in a view folder (vault/) where you can read and write the unencrypted files, and which is automatically unmounted upon job termination.
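The mount and unmount commands from the table above can be wrapped in two small shell functions, in the spirit of the lock/unlock tasks mentioned in the workflow. This is a minimal sketch for GoCryptFS: the function names, the RAWDIR/MOUNTDIR variables and the DRY_RUN switch are all hypothetical and only serve the illustration.

```shell
#!/bin/sh
# Hypothetical helper functions (names and DRY_RUN switch are illustrative)
rawdir="${RAWDIR:-.crypt}"      # encrypted directory, the one you commit
mountdir="${MOUNTDIR:-vault}"   # clear-text view, the one you ignore

# With DRY_RUN set, print the command instead of running it
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

vault_unlock() {
    run mkdir -p "$mountdir"
    run gocryptfs "$rawdir" "$mountdir"
}

vault_lock() {
    case "$(uname -s)" in
        Darwin) run umount "$mountdir" ;;          # Mac OS
        *)      run fusermount -u "$mountdir" ;;   # Linux
    esac
}

# Preview what would be executed, without touching anything
DRY_RUN=1 vault_unlock
DRY_RUN=1 vault_lock
```

Once sourced from the project directory, calling vault_unlock before working and vault_lock afterwards keeps the clear-text view mounted only while it is needed.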
File Encryption using SSH [RSA] Key Pairs¶
- Man pages: openssl rsa, openssl rsautl and openssl enc
- Tutorial: Encryption with RSA Key Pairs
- Tutorial: How to encrypt a big file using OpenSSL and someone's public key
- OpenSSL Command-Line HOWTO, in particular the section 'How do I simply encrypt a file?'
If you encrypt/decrypt files or messages on more than a one-off occasion, you should really use GnuPG, as it is much better suited to this kind of operation. But if you already have someone's public SSH key, it can be convenient to use it, and it is safe.
Warning
The instructions below are NOT compliant with the new OpenSSH format, which is used for storing encrypted (or unencrypted) RSA, ECDSA and Ed25519 keys (among others) when you use the -o option of ssh-keygen.
You can recognize these keys by the fact that the private SSH key ~/.ssh/id_rsa starts with -----BEGIN OPENSSH PRIVATE KEY-----
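The check described in this warning can be scripted by looking at the first line of the private key. The sketch below is an addition to the original instructions and rests on a few assumptions: it uses a throwaway key (id_demo_rsa, hypothetical name, empty passphrase), and it relies on ssh-keygen -p -m PEM, available in recent OpenSSH releases, to rewrite a copy of the key in PEM format. The key_format function is illustrative.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"

# Throwaway key for the demonstration (empty passphrase, never use as-is)
ssh-keygen -q -t rsa -b 2048 -N '' -f id_demo_rsa

# Print "openssh", "pem" or "unknown" depending on the first line of a private key
key_format() {
    case "$(head -n 1 "$1")" in
        '-----BEGIN OPENSSH PRIVATE KEY-----') echo openssh ;;
        '-----BEGIN '*'PRIVATE KEY-----')      echo pem ;;
        *)                                     echo unknown ;;
    esac
}

before="$(key_format id_demo_rsa)"

# Work on a copy: ssh-keygen -p rewrites the private key file in place
cp id_demo_rsa id_demo_rsa.pem
ssh-keygen -q -p -P '' -N '' -m PEM -f id_demo_rsa.pem

after="$(key_format id_demo_rsa.pem)"
echo "before: $before, after: $after"
```

Converting a copy rather than the original leaves the key used for SSH logins untouched.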
Encrypt a file using a public SSH key¶
(optional) SSH RSA public key conversion to PEM PKCS8
OpenSSL encryption/decryption operations performed using the RSA algorithm rely on keys following the PEM format [1] (ideally in the PKCS#8 format).
It is possible to convert OpenSSH public keys (private ones are already compliant) to the PEM PKCS8 format (a more secure format).
For that, one can use either the ssh-keygen or the openssl command, the first one being recommended.
# Convert the public key of your collaborator to the PEM PKCS8 format (a more secure format)
$ ssh-keygen -f id_dst_rsa.pub -e -m pkcs8 > id_dst_rsa.pkcs8.pub
# OR use OpenSSL for that; this reads the PRIVATE key, so only the key owner can run it
$ openssl rsa -in id_dst_rsa -pubout -outform PEM > id_dst_rsa.pkcs8.pub
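To convince yourself that the conversion produced something OpenSSL can read, the result can be inspected as sketched below. The key pair generated here is only a stand-in for your collaborator's key (in practice you would only hold their id_dst_rsa.pub), and the check through openssl pkey is one possible way among others.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"

# Stand-in for the collaborator's key pair (demo only, empty passphrase)
ssh-keygen -q -t rsa -b 2048 -N '' -f id_dst_rsa

# Conversion of the public key, as in the commands above
ssh-keygen -f id_dst_rsa.pub -e -m PKCS8 > id_dst_rsa.pkcs8.pub

# A converted key is a PEM block that starts with this header
head -n 1 id_dst_rsa.pkcs8.pub    # -----BEGIN PUBLIC KEY-----

# openssl exits with 0 only if it can parse the public key
openssl pkey -pubin -in id_dst_rsa.pkcs8.pub -noout && echo "public key is readable by OpenSSL"
```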
Generate a 256 bit (32 byte) random symmetric key
There is a limit to the maximum length of a message (i.e. the size of a file) that can be encrypted using asymmetric RSA public key encryption (which is what SSH keys are). For this reason, you should rather use a 256 bit key for symmetric AES encryption, and then encrypt/decrypt that symmetric AES key with the asymmetric RSA keys. This is how encrypted connections usually work, by the way.
Generate the unique symmetric key key.bin of 32 bytes (i.e. 256 bit) as follows:
openssl rand -base64 -out key.bin 32
You should only use this key once. If you send something else to the recipient at another time, you should generate a new key.
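As a small illustration of what the key file contains: 32 random bytes encoded in base64 give 44 characters, written on a single line followed by a newline. The sketch below, which assumes a throwaway directory and adds a restrictive umask as a precaution, generates the key with the byte count placed last (the order documented for openssl rand) and prints the file size.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"

umask 077                               # the key file is readable by its owner only
openssl rand -base64 -out key.bin 32    # options first, number of random bytes last

key_bytes="$(wc -c < key.bin | tr -d ' ')"
echo "key.bin holds $key_bytes bytes"   # 44 base64 characters + 1 newline = 45
```

Since the file is later passed with -pass file:./key.bin, OpenSSL uses this single line as the passphrase from which the AES key is derived.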
Encrypt the (potentially big) file with the symmetric key
openssl enc -aes-256-cbc -salt -in bigdata.dat -out bigdata.dat.enc -pass file:./key.bin
Indicative performance of OpenSSL Encryption time
You can quickly generate random files of 1 or 10 GiB size as follows:
# Random generation of a 1GiB file
$ dd if=/dev/urandom of=bigfile_1GiB.dat bs=64M count=16 iflag=fullblock
# Random generation of a 10GiB file
$ dd if=/dev/urandom of=bigfile_10GiB.dat bs=64M count=160 iflag=fullblock
openssl enc -aes-256-cbc -salt -in bigfile_<N>GiB.dat -out bigfile_<N>GiB.dat.enc -pass file:./key.bin
File | size | Encryption time |
---|---|---|
bigfile_1GiB.dat | 1 GiB | 0m5.395s |
bigfile_10GiB.dat | 10 GiB | 2m50.214s |
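The measurement can be reproduced on a smaller scale before committing to multi-GiB files. The sketch below is an assumption-laden variant of the commands above: it uses a 16 MiB file (hypothetical name bigfile_16MiB.dat), a one-second timer resolution, and a freshly generated key. It also shows the size overhead of the encrypted file: a 16-byte salt header plus one 16-byte padding block.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"

openssl rand -base64 -out key.bin 32

size=$((16 * 1024 * 1024))                        # 16 MiB test file
head -c "$size" /dev/urandom > bigfile_16MiB.dat

start="$(date +%s)"
openssl enc -aes-256-cbc -salt -in bigfile_16MiB.dat -out bigfile_16MiB.dat.enc -pass file:./key.bin
end="$(date +%s)"

enc_size="$(wc -c < bigfile_16MiB.dat.enc | tr -d ' ')"
echo "encrypted $size bytes into $enc_size bytes in about $((end - start)) s"
```

Timings obtained this way depend heavily on the CPU and on the storage backend, so they are only indicative, like the figures in the table.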
Encrypt the symmetric key, using your collaborator's public SSH key in PKCS8 format:
$ openssl rsautl -encrypt -pubin -inkey <(ssh-keygen -e -m PKCS8 -f id_dst_rsa.pub) -in key.bin -out key.bin.enc
# OR, if you have a copy of the PKCS#8 version of their public key
$ openssl rsautl -encrypt -pubin -inkey id_dst_rsa.pkcs8.pub -in key.bin -out key.bin.enc
Delete the unencrypted symmetric key, as you no longer need it (and you should not reuse it):
$ rm key.bin
Now you can transfer the *.enc files, i.e. send the (potentially big) encrypted file <file>.enc and the encrypted symmetric key (i.e. key.bin.enc) to the recipient, i.e. your collaborator.
Note that you are encouraged to send the encrypted file and the encrypted key separately. Although it's not absolutely necessary, it's good practice to separate the two.
If you're allowed to, transfer them by SSH to an agreed remote server. It is even safe to upload the files to a public file sharing service and tell the recipient to download them from there.
Decrypt a file encrypted with a public SSH key¶
First decrypt the symmetric key using the matching private SSH key:
# Decrypt the key -- /!\ ADAPT the path to the private SSH key
$ openssl rsautl -decrypt -inkey ~/.ssh/id_rsa -in key.bin.enc -out key.bin
Enter pass phrase for ~/.ssh/id_rsa:
Now the (potentially big) file can be decrypted, using the symmetric key:
openssl enc -d -aes-256-cbc -in bigdata.dat.enc -out bigdata.dat -pass file:./key.bin
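Put together, the sender and recipient steps form the round trip sketched below. It is a self-contained illustration under stated assumptions rather than a production procedure: both roles run in the same throwaway directory, the recipient's key pair (id_dst_rsa) is generated on the spot in PEM format with an empty passphrase, and expected.dat is a hypothetical name for the sender's own copy, kept only to compare the result. On OpenSSL 3.x, rsautl prints a deprecation warning but still works.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"

# Recipient's key pair: PEM format, empty passphrase (demo only)
ssh-keygen -q -t rsa -b 2048 -m PEM -N '' -f id_dst_rsa
printf 'some confidential payload\n' > bigdata.dat

# --- Sender ---
openssl rand -base64 -out key.bin 32
openssl enc -aes-256-cbc -salt -in bigdata.dat -out bigdata.dat.enc -pass file:./key.bin
ssh-keygen -e -m PKCS8 -f id_dst_rsa.pub > id_dst_rsa.pkcs8.pub
openssl rsautl -encrypt -pubin -inkey id_dst_rsa.pkcs8.pub -in key.bin -out key.bin.enc
mv bigdata.dat expected.dat     # sender's copy, kept for the final comparison
rm -f key.bin                   # the clear-text symmetric key is no longer needed

# --- Recipient (receives bigdata.dat.enc and key.bin.enc) ---
openssl rsautl -decrypt -inkey id_dst_rsa -in key.bin.enc -out key.bin
openssl enc -d -aes-256-cbc -in bigdata.dat.enc -out bigdata.dat -pass file:./key.bin

cmp expected.dat bigdata.dat && echo "round trip OK"
```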
Misc Q&D for small files¶
For a 'quick and dirty' encryption/decryption of small files:
# Encrypt
$ openssl rsautl -encrypt -inkey <(ssh-keygen -e -m PKCS8 -f ~/.ssh/id_rsa.pub) -pubin -in <cleartext_file>.dat -out <encrypted_file>.dat.enc
# Decrypt
$ openssl rsautl -decrypt -inkey ~/.ssh/id_rsa -in <encrypted_file>.dat.enc -out <cleartext_file>.dat
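How small "small" has to be follows from the RSA key size: with the PKCS#1 v1.5 padding that rsautl applies by default, a message can be at most the key size in bytes minus 11, i.e. 245 bytes for a 2048-bit key. The sketch below illustrates this limit with a throwaway key generated by openssl genrsa; the file names and the try_encrypt function are hypothetical.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"

# Throwaway 2048-bit RSA key pair (demo only)
openssl genrsa -out demo_rsa.pem 2048 2>/dev/null
openssl rsa -in demo_rsa.pem -pubout -out demo_rsa.pub.pem 2>/dev/null

# Encrypt $1 random bytes with the public key; print yes on success, no on failure
try_encrypt() {
    head -c "$1" /dev/urandom > "msg_$1.dat"
    if openssl rsautl -encrypt -pubin -inkey demo_rsa.pub.pem \
           -in "msg_$1.dat" -out "msg_$1.dat.enc" 2>/dev/null; then
        echo yes
    else
        echo no
    fi
}

fits_245="$(try_encrypt 245)"
fits_246="$(try_encrypt 246)"
echo "245 bytes: $fits_245, 246 bytes: $fits_246"   # 245 bytes: yes, 246 bytes: no
```

Anything larger has to go through the symmetric key approach described earlier.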
Data Encryption in Git Repository with git-crypt¶
Protecting sensitive data is of course even more important in the context of git repositories, whether public or private, since having a working copy of the repository gives access to the full history of commits, in particular the ones made by mistake (git commit -a) that included sensitive files.
That's where git-crypt comes in.
It is an open source, command line utility that empowers developers to protect specific files within a git repository.
git-crypt enables transparent encryption and decryption of files in a git repository. Files which you choose to protect are encrypted when committed, and decrypted when checked out. git-crypt lets you freely share a repository containing a mix of public and private content. git-crypt gracefully degrades, so developers without the secret key can still clone and commit to a repository with encrypted files. This lets you store your secret material (such as keys or passwords) in the same repository as your code, without requiring you to lock down your entire repository.
The biggest advantage of git-crypt is that private data and public data can live in the same location.
Using Git-crypt to Protect Sensitive Data
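git-crypt decides which files to protect from the .gitattributes file, through the filter=git-crypt and diff=git-crypt attributes. The sketch below only sets up and checks such a file in a throwaway repository; the patterns (secrets/**, *.key) and the file name used for the check are hypothetical, and git-crypt itself is not invoked. In a real repository you would run git-crypt init once before committing protected files, and collaborators would run git-crypt unlock after cloning.

```shell
#!/bin/sh
set -eu
cd "$(mktemp -d)"     # throwaway repository for the demonstration
git init -q .

# Illustrative patterns: everything under secrets/ and every *.key file is protected
cat > .gitattributes <<'EOF'
secrets/** filter=git-crypt diff=git-crypt
*.key filter=git-crypt diff=git-crypt
.gitattributes !filter !diff
EOF

# Ask git which filter applies to a given path (the path does not need to exist)
attr="$(git check-attr filter -- secrets/db.password)"
echo "$attr"          # secrets/db.password: filter: git-crypt
```

Checking the attributes before the first commit matters, because a file committed before its pattern is in place stays in clear text in the history.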
PetaSuite Protect¶
PetaSuite is a compression suite for Next-Generation-Sequencing (NGS) data. It consists of a command-line tool and a user-mode library. The command line tool performs compression and decompression operations on files. The user-mode library allows other tools and pipelines to transparently access the NGS data in their original file formats.
PetaSuite is used within LCSB and provides the following features:
- Encrypt and compress genomic data
- Encryption keys and access managed centrally
- Decryption and decompression on-the-fly using a library that intercepts all FS access
This is commercial software -- contact lcsb.software@uni.lu if you would like to use it.
[1] PEM, defined in RFCs 1421 through 1424, is a container format for public/private keys or certificates used preferentially by open-source software such as OpenSSL. The name comes from Privacy Enhanced Mail (PEM), a failed method for secure email; the container format it used lives on, and is a base64 translation of the X.509 ASN.1 keys.