Scaling and benchmarking
Cloudflare’s Keyless SSL technology is designed to scale to workloads of any size through vertical and horizontal scaling and, wherever possible, pre-computation (as with ECDSA signing). The key server's architecture aims to minimize latency while maximizing signing operations per second.
Each key server uses a worker pool model: each incoming client connection is handled by its own pair of reader/writer goroutines, and cryptographic work is done in separate worker goroutines pulled from a global pool.
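The worker pool model described above can be sketched in a few lines of Go. This is a minimal illustration, not the key server's actual code: the names job, startPool, and processAll are hypothetical, and a SHA-256 hash stands in for the private key operation.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"runtime"
	"sync"
)

// job is one unit of cryptographic work; the result is delivered on reply.
type job struct {
	payload []byte
	reply   chan<- [32]byte
}

// startPool launches n workers that drain jobs until the channel is closed.
func startPool(n int, jobs <-chan job) *sync.WaitGroup {
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			for j := range jobs {
				// Stand-in for a private key operation (assumption).
				j.reply <- sha256.Sum256(j.payload)
			}
		}()
	}
	return &wg
}

// processAll submits every payload to a pool of n workers and returns the
// results in submission order.
func processAll(n int, payloads [][]byte) [][32]byte {
	jobs := make(chan job)
	wg := startPool(n, jobs)

	// One buffered reply channel per request, so workers never block on send.
	replies := make([]chan [32]byte, len(payloads))
	for i := range payloads {
		replies[i] = make(chan [32]byte, 1)
	}

	// Plays the role of a connection's reader goroutine: it enqueues work.
	go func() {
		for i, p := range payloads {
			jobs <- job{payload: p, reply: replies[i]}
		}
		close(jobs)
	}()

	// Plays the role of the writer side: it collects results in order.
	out := make([][32]byte, len(payloads))
	for i := range replies {
		out[i] = <-replies[i]
	}
	wg.Wait()
	return out
}

func main() {
	out := processAll(runtime.NumCPU(), [][]byte{[]byte("a"), []byte("b")})
	fmt.Printf("%d results, first starts with %x\n", len(out), out[0][:4])
}
```

Keeping the workers separate from the per-connection goroutines means a slow signing operation occupies a worker, not the connection's read loop, and the pool size caps how much CPU the cryptographic work can claim at once.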
Where needed, you can deploy multiple key servers and balance traffic between them using your preferred ingress load balancing configuration. For full high availability, make sure to deploy sufficient key servers to handle twice the expected workload.
ECDSA signing can be broken down into two steps. Since the first step — generating random values (to be used later with the private key and message to be signed) — represents the majority of the computational cost, we pre-generate these random values to significantly reduce latency. ECDSA signing requests are computationally isolated from RSA signing requests using separate worker pools to keep them as fast as possible.
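The two-step split can be illustrated with textbook ECDSA on P-256. This is a sketch under stated assumptions, not the key server's implementation: the nonce type and the precompute, finish, and signAndVerify functions are hypothetical names, and the code is for explanation only. The expensive step (choosing a random k and computing the curve point k*G) does not depend on the message, so it can run ahead of time; the remaining step is a handful of modular multiplications.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
	"math/big"
)

// nonce holds the message-independent half of an ECDSA signature.
// Each nonce must be used for exactly one signature and then discarded.
type nonce struct {
	r    *big.Int // x-coordinate of k*G, reduced mod n
	kInv *big.Int // k^-1 mod n
}

// precompute performs the expensive step: pick a random k and compute k*G.
func precompute(curve elliptic.Curve) (nonce, error) {
	n := curve.Params().N
	for {
		k, err := rand.Int(rand.Reader, n)
		if err != nil {
			return nonce{}, err
		}
		if k.Sign() == 0 {
			continue // k must be in [1, n-1]
		}
		x, _ := curve.ScalarBaseMult(k.Bytes())
		r := new(big.Int).Mod(x, n)
		if r.Sign() == 0 {
			continue
		}
		return nonce{r: r, kInv: new(big.Int).ModInverse(k, n)}, nil
	}
}

// finish performs the cheap step: s = kInv * (e + r*d) mod n.
// A production signer would also retry in the negligible case s == 0.
func finish(priv *ecdsa.PrivateKey, nc nonce, digest []byte) (r, s *big.Int) {
	n := priv.Curve.Params().N
	// A SHA-256 digest has the same bit length as the P-256 order,
	// so no truncation is needed here.
	e := new(big.Int).SetBytes(digest)
	s = new(big.Int).Mul(nc.r, priv.D)
	s.Add(s, e)
	s.Mul(s, nc.kInv)
	s.Mod(s, n)
	return nc.r, s
}

// signAndVerify generates a throwaway key, signs msg using a precomputed
// nonce, and checks the result with the standard library verifier.
func signAndVerify(msg []byte) (bool, error) {
	priv, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return false, err
	}
	nc, err := precompute(priv.Curve) // can happen before msg is known
	if err != nil {
		return false, err
	}
	digest := sha256.Sum256(msg)
	r, s := finish(priv, nc, digest[:])
	return ecdsa.Verify(&priv.PublicKey, digest[:], r, s), nil
}

func main() {
	ok, err := signAndVerify([]byte("hello"))
	fmt.Println("signature valid:", ok, "error:", err)
}
```

In a server, a background goroutine could keep a buffered channel of such nonces filled so that each signing request only pays for the cheap step. Reusing a nonce for two signatures reveals the private key, so each value must be consumed exactly once.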
c5$ cat /proc/cpuinfo|grep "model name"
model name : Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
model name : Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
model name : Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
model name : Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
By default, bench runs with one worker goroutine per core (4) and a maximum number of operating system threads equal to the total number of cores (in this case, GOMAXPROCS=4). As expected and explained above, ECDSA signature performance far exceeds that of RSA. The results show that each core of this c5.xl machine can perform over 10,000 ECDSA signing operations/second and approximately 200 RSA signing operations/second.
When planning your deployment, determine the maximum number of new TLS connections per second you expect to terminate using a given key server and scale accordingly. For full high availability, each data center running Keyless SSL should be able to terminate the full workload that you anticipate.
c5$ bench -ski $ECDSA_SKI -op ECDSA-SHA256 -bandwidth -duration 60s
Total operations completed: 2661570
Average operation duration: 22.543µs
c5$ bench -ski $RSA_SKI -op RSA-SHA256 -bandwidth -duration 60s
Total operations completed: 46560
Average operation duration: 1.288659ms
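The bench figures above can be turned into a rough sizing estimate. The sketch below is illustrative only: opsPerCore and serversNeeded are hypothetical helpers, the peak of 3,000 new TLS connections per second is an invented example load, and it assumes each new connection costs one signing operation. The headroom factor of 2 reflects the guidance to provision for twice the expected workload.

```go
package main

import (
	"fmt"
	"math"
)

// opsPerCore converts a bench result into signing operations per second
// per core.
func opsPerCore(totalOps, durationSec float64, cores int) float64 {
	return totalOps / durationSec / float64(cores)
}

// serversNeeded returns how many key servers are required to cover the
// peak load multiplied by a headroom factor.
func serversNeeded(peakOpsPerSec, perCore float64, coresPerServer int, headroom float64) int {
	perServer := perCore * float64(coresPerServer)
	return int(math.Ceil(peakOpsPerSec * headroom / perServer))
}

func main() {
	// Totals taken from the 60-second bench runs on the 4-core machine.
	ecdsa := opsPerCore(2661570, 60, 4)
	rsa := opsPerCore(46560, 60, 4)
	fmt.Printf("ECDSA: %.0f ops/s/core, RSA: %.0f ops/s/core\n", ecdsa, rsa)

	// Hypothetical peak: 3000 new TLS connections per second.
	fmt.Println("RSA key servers needed:", serversNeeded(3000, rsa, 4, 2))
	fmt.Println("ECDSA key servers needed:", serversNeeded(3000, ecdsa, 4, 2))
}
```

With these inputs the per-core rates come out at roughly 11,090 ECDSA and 194 RSA operations per second, in line with the figures quoted above. For the example load, that works out to eight 4-core key servers for RSA and one for ECDSA.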