Interesting problem: You have RSA signatures and the signed data, and want to know the RSA public key that can be used to verify the signatures. For older, deterministic signature schemes this is possible if you have at least two signatures (or an oracle that can provide signatures on request).
Math is not my strong suit, but I found the necessary formula in this Cryptography StackExchange post: RSA public key recovery from signatures. It has the general idea, but is light on details and actual code.
Tools:
- OpenSSL to generate examples
- SageMath for the actual calculations. It has an absolutely wonderful Jupyter notebook interface.
First, let’s generate an example key and two example files. We’ll use 512-bit RSA, which is about the minimum key size we can use, just to keep the examples short (in both screen real estate and calculation time). Don’t worry: while the calculation takes ~30 seconds for 512-bit RSA, it only grows to ~2.5 minutes for real-world 2048-bit RSA.
$ echo "Hallo, Welt" > hallowelt.txt
$ echo "Hallo, Otto" > hallootto.txt
$ openssl genrsa 512 > privkey.pem
RSA signatures are complicated beasts. In theory, you only have to hash the input and apply the RSA operation with the private key (that is, ‘decrypt’ it), but for various reasons this is highly insecure and never done in practice.
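To make the "in theory" version concrete, here is a minimal Python 3 sketch of unpadded textbook RSA signing with a deliberately tiny key. The primes 61 and 53, the exponents, and the helper names (toy_hash, toy_sign, toy_verify) are assumptions made up for illustration; they are not part of the examples in this post, and this is exactly the insecure variant that is never used in practice.

```python
import hashlib

# Toy key, assumed for illustration only: far too small for real use.
N = 61 * 53          # modulus, 3233
E = 17               # public exponent
D = 2753             # private exponent, 17 * 2753 = 1 (mod lcm(60, 52))

def toy_hash(data):
    """Reduce SHA-256 into the range [0, N) so it fits the toy modulus."""
    return int(hashlib.sha256(data).hexdigest(), 16) % N

def toy_sign(data):
    """'Decrypt' the hash with the private key: s = h^d mod n."""
    return pow(toy_hash(data), D, N)

def toy_verify(data, signature):
    """'Encrypt' the signature with the public key and compare to the hash."""
    return pow(signature, E, N) == toy_hash(data)

sig = toy_sign(b"Hallo, Welt\n")
print(toy_verify(b"Hallo, Welt\n", sig))   # True
```

Real signatures differ mainly in what gets fed into the private-key operation: not the bare hash, but a padded block.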
Instead, we’ll let OpenSSL handle the generation of signatures for our examples:
$ openssl dgst < hallowelt.txt -out hallowelt.txt.sig -sign privkey.pem
$ openssl dgst < hallootto.txt -out hallootto.txt.sig -sign privkey.pem
The resultant *.sig files are 64 bytes each, matching the 512-bit RSA modulus.
To better understand the RSA signature generation process (and prepare the next step), let’s look ‘into’ the signatures:
$ openssl rsautl -encrypt -inkey privkey.pem -in hallowelt.txt.sig -raw | hd
00000000  00 01 ff ff ff ff ff ff  ff ff ff ff 00 30 31 30  |.............010|
00000010  0d 06 09 60 86 48 01 65  03 04 02 01 05 00 04 20  |...`.H.e....... |
00000020  3e 6f f8 06 a5 b4 e7 e6  d7 4d 26 7f e3 db 90 a2  |>o.......M&.....|
00000030  e2 bc a3 70 e3 db 9b 10  73 fd 55 e1 06 a1 0c 2a  |...p....s.U....*|
$ openssl rsautl -encrypt -inkey privkey.pem -in hallowelt.txt.sig -raw | openssl asn1parse -offset 13 -inform der
    0:d=0  hl=2 l=  49 cons: SEQUENCE
    2:d=1  hl=2 l=  13 cons: SEQUENCE
    4:d=2  hl=2 l=   9 prim: OBJECT            :sha256
   15:d=2  hl=2 l=   0 prim: NULL
   17:d=1  hl=2 l=  32 prim: OCTET STRING      [HEX DUMP]:3E6FF806A5B4E7E6D74D267FE3DB90A2E2BCA370E3DB9B1073FD55E106A10C2A
$ openssl dgst -sha256 hallowelt.txt
SHA256(hallowelt.txt)= 3e6ff806a5b4e7e6d74d267fe3db90a2e2bca370e3db9b1073fd55e106a10c2a
The first step ‘encrypts’ the signature (that is: applies the RSA operation with the public key) and prints a hexdump of the result. In the hexdump we see:
- Some padding: 00 01 ff ff … ff 00
- An ASN.1 structure, consisting of
  - A sequence (tag 30, 49 bytes), of
    - A sequence (tag 30, 13 bytes), of
      - An object identifier (tag 06, 9 bytes) for sha256
      - A NULL value (tag 05, 0 bytes)
    - An octet string (tag 04, 32 bytes) with
      - The SHA-256 hash (3e6ff8…a10c2a) of the signed data
The signature follows the PKCS#1 v1.5 standard for RSA signatures. All the extra stuff serves to distinguish signatures with SHA-256 from signatures with other hashes, and to prevent some attacks on the padding. It’s also the reason why we can’t go much below 512-bit RSA if we want to demo with SHA-256: the padded block needs at least 62 bytes (11 bytes of minimum padding plus the 51-byte ASN.1 structure). (It must be noted that PKCS#1 v1.5 padding shouldn’t be used anymore. The newer scheme is RSASSA-PSS, which has a robust security proof, but is also randomized and completely foils the technique in this blog post.)
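As a cross-check of the layout described above, the following Python 3 sketch splits the 64-byte block from the hexdump back into its parts. The function name split_pkcs1_v15 is made up for this illustration, and the sketch only knows the SHA-256 variant of the ASN.1 prefix.

```python
# Fixed ASN.1 prefix in front of a SHA-256 digest (the 19 bytes seen in the hexdump).
SHA256_PREFIX = bytes.fromhex("3031300d060960864801650304020105000420")

def split_pkcs1_v15(block):
    """Return (number of ff padding bytes, digest) for a padded SHA-256 block."""
    if block[:2] != b"\x00\x01":
        raise ValueError("block does not start with 00 01")
    sep = block.index(b"\x00", 2)          # first 00 after the ff run
    padding = block[2:sep]
    if any(b != 0xFF for b in padding):
        raise ValueError("padding contains bytes other than ff")
    digest_info = block[sep + 1:]
    if not digest_info.startswith(SHA256_PREFIX):
        raise ValueError("not a SHA-256 DigestInfo")
    return len(padding), digest_info[len(SHA256_PREFIX):]

# The block from the hexdump above, rebuilt piece by piece.
block = bytes.fromhex(
    "0001" + "ff" * 10 + "00"
    + "3031300d060960864801650304020105000420"
    + "3e6ff806a5b4e7e6d74d267fe3db90a2e2bca370e3db9b1073fd55e106a10c2a"
)
pad_len, digest = split_pkcs1_v15(block)
print(pad_len, digest.hex())
```

The padding length is whatever is needed to fill the block up to the modulus size, which is why a shorter modulus leaves less room for ff bytes.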
Let’s define the first set of functions to generate this sort of padding:
import hashlib

def pkcs1_padding(size_bytes, hexdigest, hashfn):
    # All values are hex strings; lengths in bytes are len(hex) // 2.
    oid = {hashlib.sha256: '608648016503040201'}[hashfn]
    # Inner sequence: OID of the hash function followed by NULL
    result = '06' + ("%02X" % (len(oid)//2)) + oid + '05' + '00'
    result = '30' + ("%02X" % (len(result)//2)) + result
    # Octet string with the digest, then the outer sequence
    result = result + '04' + ("%02X" % (len(hexdigest)//2)) + hexdigest
    result = '30' + ("%02X" % (len(result)//2)) + result
    # 00 01 ff ... ff 00 padding up to the modulus size
    result = '0001' + ('ff' * int(size_bytes - 3 - len(result)//2) ) + '00' + result
    return result

def hash_pad(size_bytes, data, hashfn):
    hexdigest = hashfn(data).hexdigest()
    return pkcs1_padding(size_bytes, hexdigest, hashfn)
A simple test:
hash_pad(64, "Hallo, Welt\n", hashlib.sha256)
'0001ffffffffffffffffffff003031300D0609608648016503040201050004203e6ff806a5b4e7e6d74d267fe3db90a2e2bca370e3db9b1073fd55e106a10c2a'
To perform the gcd calculation you need, for each signature, the corresponding signed data, plus the hash function used and the public exponent of the RSA key pair. Both hash function and public exponent may need to be guessed, but the hash is usually SHA-256, and the exponent is usually 0x10001 (65537) or 3.
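Why a gcd helps at all: a valid signature s over a padded message m satisfies s^e ≡ m (mod n), so n divides s^e − m. Two such differences therefore share n as a common factor, and their gcd is n times a (usually small) cofactor. The following plain Python 3 sketch shows this with a toy key; the key (n = 61 · 53) and the two message values are assumptions chosen only to keep the numbers small.

```python
from math import gcd

# Toy key, assumed for illustration only (n = 61 * 53).
N, E, D = 3233, 17, 2753

# Two stand-ins for padded message representatives; any values below N work.
m1, m2 = 1234, 2021

# The signer computes s = m^d mod n with the private exponent.
s1, s2 = pow(m1, D, N), pow(m2, D, N)

# The attacker only sees the (m, s) pairs and guesses e. Nothing can be
# reduced modulo n here, because n is unknown: s**e is a full-size integer.
candidate = gcd(s1**E - m1, s2**E - m2)

# The candidate is always a multiple of n, though not necessarily n itself.
print(candidate % N == 0)   # True
```

The need to compute s^e without any modular reduction is also what makes the real calculation slow for e = 65537: the intermediate integers have tens of millions of bits.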
The full code is as follows:
import binascii, hashlib

def message_sig_pair(size_bytes, data, signature, hashfn=hashlib.sha256):
    return ( Integer('0x' + hash_pad(size_bytes, data, hashfn)), Integer('0x' + binascii.hexlify(signature)) )

def find_n(*filenames):
    data_raw = []
    signature_raw = []
    for fn in filenames:
        data_raw.append( open(fn, 'rb').read() )
        signature_raw.append( open(fn+'.sig', 'rb').read() )
    size_bytes = len(signature_raw[0])
    if any(len(s) != size_bytes for s in signature_raw):
        raise Exception("All signature sizes must be identical")

    for hashfn in [hashlib.sha256]:
        pairs = [message_sig_pair(size_bytes, m, s, hashfn) for (m,s) in zip(data_raw, signature_raw)]
        for e in [0x10001, 3, 17]:
            gcd_input = [ (s^e - m) for (m,s) in pairs ]
            result = gcd(*gcd_input)
            if result != 1:
                return (hashfn, e, result)
If we test it, we’ll find:
time hashfn, e, n = find_n('hallowelt.txt', 'hallootto.txt');
CPU times: user 27.3 s, sys: 609 ms, total: 27.9 s
Wall time: 28.4 s
print hex(n)
d9dac509621ed7f27b4868ab1874f649778c63f11000366e827cf18fd70db1e27f39902524e29aa2bfb3167627caaa408e17e907ee3c44e0321dc77fb8890075
And compare to the ground truth of our example:
$ openssl rsa -in privkey.pem -noout -text
Private-Key: (512 bit)
modulus:
    00:d9:da:c5:09:62:1e:d7:f2:7b:48:68:ab:18:74:
    f6:49:77:8c:63:f1:10:00:36:6e:82:7c:f1:8f:d7:
    0d:b1:e2:7f:39:90:25:24:e2:9a:a2:bf:b3:16:76:
    27:ca:aa:40:8e:17:e9:07:ee:3c:44:e0:32:1d:c7:
    7f:b8:89:00:75
publicExponent: 65537 (0x10001)
[...]
Finally, to create a standard format PEM public key from our n and e:
from Crypto.PublicKey import RSA
print RSA.construct( (long(n), long(e)) ).exportKey(format='PEM')
-----BEGIN PUBLIC KEY-----
MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBANnaxQliHtfye0hoqxh09kl3jGPxEAA2
boJ88Y/XDbHifzmQJSTimqK/sxZ2J8qqQI4X6QfuPETgMh3Hf7iJAHUCAwEAAQ==
-----END PUBLIC KEY-----
Which is exactly what we would get from OpenSSL:
$ openssl rsa -in privkey.pem -pubout
writing RSA key
-----BEGIN PUBLIC KEY-----
MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBANnaxQliHtfye0hoqxh09kl3jGPxEAA2
boJ88Y/XDbHifzmQJSTimqK/sxZ2J8qqQI4X6QfuPETgMh3Hf7iJAHUCAwEAAQ==
-----END PUBLIC KEY-----
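The PEM export above relies on the Crypto.PublicKey module and Python 2. Where neither is at hand, the same structure can be assembled by hand, since a PEM public key is just a base64-wrapped DER encoding of the algorithm identifier plus n and e. The following is a minimal Python 3 sketch using only the standard library; the helper names (der_len, der, der_int, rsa_public_pem) are made up for this illustration and it is not meant as a general-purpose ASN.1 encoder.

```python
import base64

def der_len(length):
    """DER length: short form below 128, long form otherwise."""
    if length < 0x80:
        return bytes([length])
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def der(tag, content):
    """Wrap content in a tag-length-value triple."""
    return bytes([tag]) + der_len(len(content)) + content

def der_int(x):
    """Non-negative INTEGER; the extra bit keeps the sign bit clear."""
    return der(0x02, x.to_bytes(x.bit_length() // 8 + 1, "big"))

# SEQUENCE { OID rsaEncryption (1.2.840.113549.1.1.1), NULL }
RSA_ALG_ID = bytes.fromhex("300d06092a864886f70d0101010500")

def rsa_public_pem(n, e):
    rsa_key = der(0x30, der_int(n) + der_int(e))             # SEQUENCE { n, e }
    bit_string = der(0x03, b"\x00" + rsa_key)                # 0 unused bits
    spki = der(0x30, RSA_ALG_ID + bit_string)
    b64 = base64.b64encode(spki).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]  # 64 chars per line
    return ("-----BEGIN PUBLIC KEY-----\n" + "\n".join(lines)
            + "\n-----END PUBLIC KEY-----\n")

# The modulus recovered above, and the exponent that matched.
n = int("d9dac509621ed7f27b4868ab1874f649778c63f11000366e827cf18fd70db1e2"
        "7f39902524e29aa2bfb3167627caaa408e17e907ee3c44e0321dc77fb8890075", 16)
pem = rsa_public_pem(n, 0x10001)
print(pem)
```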
Warning: The gcd method may sometimes return not n but a product k * n for a smallish value of k. You may need to check for small prime factors and remove them.
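Removing such a cofactor can be sketched with plain trial division. The function name strip_small_factors and the bound of 10000 are assumptions for this illustration; the approach relies on a real RSA modulus having no prime factors anywhere near that small.

```python
def strip_small_factors(candidate, bound=10000):
    """Divide out every factor below `bound` from a positive candidate."""
    if candidate <= 0:
        raise ValueError("candidate must be positive")
    for p in range(2, bound):
        # Composite p never divides here: its prime factors are already gone.
        while candidate % p == 0:
            candidate //= p
    return candidate

# Two known Mersenne primes stand in for the RSA primes of a real modulus.
n = (2**31 - 1) * (2**61 - 1)
print(strip_small_factors(2 * 2 * 3 * 7 * n) == n)   # True
```

A quick sanity check afterwards is the size: the cleaned-up value should have the same bit length as the signatures, for example 512 bits for the 64-byte signatures used here.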
The code is available as a SageMath Jupyter notebook: rsa_find_n.ipynb (all example files).
The find_n function is written to accept arbitrary arguments (must be filenames where the file contains the data and the filename appended with .sig contains the signature) but will only work when given exactly two arguments. FactHacks: Batch gcd has a batchgcd_faster function that will work on an arbitrary number of arguments (but is slower in the 2‑argument case).
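For more than two signatures, a simple alternative is to fold a two-argument gcd over all the differences. This is a plain left fold, not the batch gcd algorithm from FactHacks, and the name gcd_many as well as the toy key and message values are assumptions for this Python 3 sketch.

```python
from functools import reduce
from math import gcd

def gcd_many(values):
    """gcd of one or more integers, computed pairwise from left to right."""
    return reduce(gcd, values)

# Toy key, assumed for illustration only (n = 61 * 53).
N, E, D = 3233, 17, 2753

# Three stand-ins for padded messages and their s^e - m differences.
messages = [1234, 2021, 777]
diffs = [pow(m, D, N)**E - m for m in messages]

candidate = gcd_many(diffs)
print(candidate % N == 0)   # True
```

Each additional signature makes a leftover cofactor k less likely, at the price of one more full-size s^e computation.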