Wholesale encryption with RSA

24 February 2014

Very short introduction to cryptography

The following schema is used when large data set are encrypted/decrypted.

  1. An initial message exchange is performed using public/private encryption key pairs , note that public and private keys are different. The sending party uses the public key for encryption, while the receiving party is using the private key for decryption; this exchange is very slow. This exchange is secure if no third party is able to guess the private key - the encryption algorithm (such as RSA ) ensures this much, trust the maths.
  1. The sending party generates a symmetric key for encryption and decryption of subsequent data; decryption and encryption of this data is relatively fast, but here the same key is used to both encrypt and to decrypt the data. Security is based on the assumption that no other third party is able to guess this symmetric key.
  1. The sending party is encrypting all further data with the symmetric key; sometimes they swap the symmetric key. Note that the symmetric key is sent by means of the initial message exchange, that means it is encrypted with the public key of the sending party - so the trust that we put into the initial message exchange is the basis for the trust that we place in the security of all further communications.
  1. This two-step schema is used in SSL/TLS , PGP , probably all encryption methods are using this in some way.

The catch is that the whole schema depends on the way that the symmetric key is picked, normally a random number generator is supposed to do this; Security is compromised if a third party can guess the random numbers that were picked; Many attacks on privacy are based on manipulating the way that random numbers are chosen on the computer of the sending party; A somewhat more secure approach would be to encrypt the whole message with public key, and to decrypt it by using the public key - without having access to the private key it is currently impossible to decrypt the message (provided that the keys are sufficiently ‘strong’ - the longer the key, the stronger it is and the harder it is to break by guessing the right combination - one has to try more combinations to be lucky) The secret of the private key is generated only once, and everything is secure if the computer was not compromised at the time that they key was created.

The problem with this approach is that this takes a lot of time to decrypt the message and slightly less time to encrypt the message; however modern computers have an increasing number of processors. This little project explores how multi processing can help to speed up the encryption and decryption of larger messages.

Another thing is that while it is currently practically impossible to break RSA encryption with reasonably strong key, theoretically this might change once they have a working quantum computer; However there are serious doubts that we are anywhere near the development of such a breakthrough, but things might change, you never know.

The tools used in this study and how to use them

For this study command line programs were written that encrypts/sign and decrypt/verify longer data all with a public/private key pair.

First of all the openssl utility is used to create the key pair; the next command creates the super secret private key - with 4096 bits long. Remember that the longer the key, the harder it is to break, 4096 bits is very secure by any standards (of today).

  openssl genrsa 4096 >private.pem

(if on Linux: before doing this move around the mouse in random patterns and type some gibberish on the keyboard; the mouse driver and keyboard driver are some of the sources of entropy used to seed the random number generator used to compute the keys here )

Given this file we extract the public key. the sender of the message must have it.

  openssl rsa -in private.pem -out public.pem -pubout

The message is encrypted and signed with the following command, in this example the file raw_message is encrypted by mean of the public key public.pem, resulting in encrypted file encrypted_file.file . Note that this program splits up the work so that it can be done in parallel.

  rsaencmt -k public.pem -i raw_message.file -o encrypted_file.file 

The encrypted message is later decrypted and verified with the following example, Note that this program splits up the work so that it can be done in parallel.

  rsadecmt -k private.pem -i encrypted_file.file -o decrypted_file.file

Also two programs where written that do all the work serially (not in parallell). these are not very practial, they only exist as a baseline in order to compare the times.

  rsaenc -k public.pem -i raw_message.file -o encrypted_file.file 
  rsadec -k private.pem -i encrypted_file.file -o decrypted_file.file

is it any good?

The RSA algorithm is for this utility because there are some doubts in the validity of elliptic curve cryptography (given all those recent revelations); also they seem to know how to break at least some variants of AES cipher (here )

However RSA has problems of its own: there are known side-channel attacks; if the computer that is encrypting the message has been compromised then you can break RSA encryption if you can measure fluctuations of the power consumption while encrypting/decrypting; Still this program does make side channel attacks more difficult: encryption/decryption is delegated to multiple cores on the system, to the best of my knowledge it is impossible to measure the power consumption of individual hardware cores/hardware threads.

Maybe for practical encryption systems you would better use several encryption step, each step is applied on the input of the previous step. Each encryption step would use a different symmetric cipher with its own key. The keys would be still encrypted with the some kind of asymmetric encryption like RSA - so you still have the problem of possible side channel attacks.

Chaining of several the following advantage: if at least one of the steps has not been compromised then the end result is still safe (given that the encryption keys have not been compromised), also it is harder to apply side channel attacks if all steps are made concurrently - because it is harder to measuring the time of the encryption steps.

the tool described in this text does not do this kind of chaining.

how to download the sources & build the tools

Sources can be downloaded as zip archive or downloaded by the git program.

    git clone https://github.com/MoserMichael/cstuff.git

The project can be build on Linux (checked 32 or 64 bit Intel/Amd) or on Windows with Cygwin. In order to build the program, we need the presence of an internet connection, so that the build procedure can download the openssl libraries (alternatively you can place the openssl tar ball into directory cstuff/rsautl/openssl

Build and install into /usr/local/ directory

    cd build
    ./build-rsautl.sh install

Build and install into /usr/alt directory

    cd build
`   ./build-rsautl.sh install INSTALL_PREFIX=/usr/alt

How it works & file formats

Task of the encryption program

  • A header is put before the input message, The header is of the following form
    • 4 bytes - magic constant (0xABCDEF)
    • 4 bytes - file version (currently 0x1)
    • 4 bytes - length of input data
  • The input message is divided into smaller blocks, each block is equal to the size of the public key, minus length of cryptographic padding.
  • The each block is RSA encrypted with the public key, while using the PKCS1_OAEP cryptographic padding
  • a SHA256 digest is computer over all encrypted output, this digest is used to verify that the encrypted message has not been altered in transition
  • the digest itself is encrypted with the public key

Task of the decryption program

  • The first block is decrypted with the private key
    • verify the file header; if magic constant and file version do not make sense, then this means that the message was encrypted with a non matching public key.
    • check that the length of the file is consistent with the length field of the file header.
  • Break up the input message into chunks equal to the length of the private key; compute the SHA256 digest over the encrypted data (prior to decrypting)
  • Decrypt each data chunk using the private key, while using the PKCAS1_OAEP cryptographic padding
  • when finished with the data: compare the value of the SHA256 digest with the stored value; if it is not equal then fail (show error message)
  • Strip the header from the decrypted message

More internals

The program uses the openssl library for cryptography. The openssl tarball includes instructions on working with multiple threads (File doc\crypto\threads.pod and directory crypto\threads have a sample) Basically the program needs to install an optional callback function that implements locking for sensitive data structures.

The rsaencmt and rsadecmt programs are doing their task in parallel; In order to do so they spawn a number of thread - their number is equal to the number of processor cores on the system times 2. The factor of two is chosen due to hyperthreading - usually a core has two hardware threads, if a given core is stalled then the second thread has the chance to do something useful. Currently I do not set the processor affinity of each thread, as I understand the CPU scheduler of the operating system does its own tricks, and tuning of CPU affinity can be tricky undertaking. The tpool thread pool is used to dispatch requests to a thread pool with fixed number of threads.

At first the multithreaded program did not perform very much faster than the serial program; as it turns out this was due to locks used by the openssl library. However with the openssl library it is relatively easy to fix problems related to locking, a callback function for locking must be implemented and registered by means of the CRYPTO_set_locking_callback function. It turns out that each time that there were multiple locks when calling RSA_public_encrypt and RSA_private_decrypt Turns out that the library performs a trick called Montgomery reduction (this function ) intermedeate values have to be cached somewhere - the openssl library uses the RSA key object for that, now in order to make this library usable with multithreading, every access to these cached values must be locked ! The solution was to

  • Disable this particular lock while building openssl for this project
  • for each thread create a copy off the RSA key object.

However there is no good fix for RSA_public_encrypt . Remember that cryptographic padding is used for each block, the library must create random numbers for each encrypted block, in to do so it must it uses the single instance of the SSLeay random number generator , this is a cryptographically secure random number generator (more here ) I can’t create a random number generator per thread because that could deplete the entropy pool and seeding of the generator may block for some time !)

Therefore multithreaded encryption is often not very much faster than single threaded encryption, but the speedup for decryption (where it counts) is considerable.


The test encrypts and decrypts a 100kb file with keys of various length; here are the results

Laptop, one processor with eight cores | (intel Core i7-2720QM - 2.20 GHZ); | on Windows under Cygwin - program is 32 bit

Key length: 2048 bits |Mode|Encryption|Decryption| |Single thread|0.154|3.463| |Multithreaded|0.116|0.850|

Key length: 4096 bits |Mode|Encryption|Decryption| |Single thread|0.347 sec|11.905 sec| |Multithreaded|0.287 sec|3.0222 sec|

Eight cores (two cpu’s) | Intel® Xeon® CPU E5-2609 0 @ 2.40GHz | (Linux on x86_64)

Key length: 2048 bits |Mode|Encryption|Decryption| |Single thread|0.038|0.898| |Multithreaded|0.013|0.115|

Key length: 4096 bits |Mode|Encryption|Decryption| |Single thread|0.053|2.936| |Multithreaded|0.017|0.382|

24 cores (2 cpus) | Intel® Xeon® CPU E5-2630 0 @ 2.30GHz | (Linux on x86_64)

Key length: 2048 bits |Mode|Encryption|Decryption| |Single thread|0.042|0.790| |Multithreaded|0.023|0.087|

Key length: 4096 bits |Mode|Encryption|Decryption| |Single thread|0.057|2.527| |Multithreaded|0.024|0.223|

On VMWare virtual machine that utilizes two core (Linux on i686)

Key length: 2048 bits

Mode Encryption Decryption
Single thread 0.573 14.645
Multithreaded 0.521 9.297

Key length: 4096 bits |Mode|Encryption|Decryption| |Single thread|0.931|53.61| |Multithreaded|0.897|29.773|

Summary of results / more internals

Results are consistent with predictions. Encrypting everything with RSA may still not be good enough for bulk data that spans many megabytes, but for shorter text it seems good enough.

Now I could have achieved a greater speedup if I were to use the GPU of the graphics card, but this a bit too complex for an evening of work.


Looking at the details of cryptography one can see to what extent technology did progress since the end of World War II. The Enigma and Lorenz machines used electrio-mechanical tricks, several hundreds of moving parts, in its complexity that was very similar to the typewriter that my father was using back when I was a kid. Now compare this complexity to the number of circuits in a modern computer. Amazing, Science fiction ! Also remember that back then everything related to cryptography was very tightly regulated.

… and the only way to break it is by cheating with the random number generator, by means of side channel attacks (such as watching power consumption directly ) ; or by the mythical beast - the quantum computer.