Interesting problem: you have RSA signatures and the signed data, and want to know the RSA public key that can be used to verify the signatures. For older, deterministic signature schemes this is possible if you have at least two signatures (or an oracle that can provide signatures on request).

Math is not my strong suit, but I found the necessary formula in this Cryptography StackExchange post: RSA public key recovery from signatures. It has the general idea, but is light on details and actual code.
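Here is the core of that formula in this post's notation. This is my own summary of the standard argument, not a quote from the post. Let $m_i$ be the PKCS#1-padded hash of the $i$-th signed message, read as a big-endian integer. Let $s_i$ be its signature, and $(n, e, d)$ the key pair. Verification means $s_i^e \equiv m_i \pmod n$. So every $s_i^e - m_i$ is a multiple of $n$, and so is their gcd:

```latex
\begin{aligned}
s_i &\equiv m_i^{\,d} \pmod{n}
  \;\Longrightarrow\; s_i^{\,e} \equiv m_i \pmod{n}
  \;\Longrightarrow\; n \mid s_i^{\,e} - m_i \\
\gcd\!\left(s_1^{\,e} - m_1,\; s_2^{\,e} - m_2\right) &= k \cdot n
  \qquad \text{for some (usually small) integer } k \ge 1
\end{aligned}
```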

Tools:

- OpenSSL to generate examples
- SageMath for the actual calculations. It has an absolutely wonderful Jupyter notebook interface.

First, let’s generate an example key and two example files. We’ll use 512-bit RSA for this example, which is about the minimum key size we can use, just to keep the examples short (in both screen real estate and calculation size). Don’t worry: the calculation takes ~30 seconds for 512-bit RSA and only grows to ~2.5 minutes for real-world 2048-bit RSA.

```
$ echo "Hallo, Welt" > hallowelt.txt
$ echo "Hallo, Otto" > hallootto.txt
$ openssl genrsa 512 > privkey.pem
```

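RSA signatures are complicated beasts. In theory, you only have to hash the input and apply the RSA operation with the private key (that is, ‘decrypt’ it). In practice this ‘textbook’ approach is never used, because it is highly insecure. For one thing, it is multiplicative: the product of two signatures is a valid signature on the product of the two signed values.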
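Here is a toy illustration of that multiplicative weakness. It deliberately uses tiny parameters (the classic p = 61, q = 53 textbook example, nowhere near a real key). The helper names `raw_sign` and `raw_verify` are made up for this sketch. Two raw signatures are combined into a third, valid one without ever touching the private exponent:

```python
# Toy "textbook" RSA with tiny, insecure parameters (illustration only).
p, q, e = 61, 53, 17
n = p * q                           # 3233
d = pow(e, -1, (p - 1) * (q - 1))   # 2753; modular inverse needs Python 3.8+

def raw_sign(m):
    # "Decrypt" with the private exponent: no hashing, no padding.
    return pow(m, d, n)

def raw_verify(m, s):
    # "Encrypt" with the public exponent and compare.
    return pow(s, e, n) == m

m1, m2 = 42, 99
s1, s2 = raw_sign(m1), raw_sign(m2)
print(raw_verify(m1, s1), raw_verify(m2, s2))   # True True

# Multiplicative malleability: without knowing d, anyone can combine two
# signatures into a valid signature on the product of the two messages.
forged = (s1 * s2) % n
print(raw_verify((m1 * m2) % n, forged))        # True
```

Padding schemes like the one examined below exist partly to break this algebraic structure.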

Instead, we’ll let OpenSSL handle the generation of signatures for our examples:

```
$ openssl dgst < hallowelt.txt -out hallowelt.txt.sig -sign privkey.pem
$ openssl dgst < hallootto.txt -out hallootto.txt.sig -sign privkey.pem
```

The resulting `*.sig` files are 64 bytes each, matching the 512-bit RSA modulus.

To better understand the RSA signature generation process (and prepare the next step), let’s look ‘into’ the signatures:

```
$ openssl rsautl -encrypt -inkey privkey.pem -in hallowelt.txt.sig -raw | hd
00000000  00 01 ff ff ff ff ff ff  ff ff ff ff 00 30 31 30  |.............010|
00000010  0d 06 09 60 86 48 01 65  03 04 02 01 05 00 04 20  |...`.H.e....... |
00000020  3e 6f f8 06 a5 b4 e7 e6  d7 4d 26 7f e3 db 90 a2  |>o.......M&.....|
00000030  e2 bc a3 70 e3 db 9b 10  73 fd 55 e1 06 a1 0c 2a  |...p....s.U....*|
$ openssl rsautl -encrypt -inkey privkey.pem -in hallowelt.txt.sig -raw | openssl asn1parse -offset 13 -inform der
    0:d=0  hl=2 l=  49 cons: SEQUENCE
    2:d=1  hl=2 l=  13 cons: SEQUENCE
    4:d=2  hl=2 l=   9 prim: OBJECT            :sha256
   15:d=2  hl=2 l=   0 prim: NULL
   17:d=1  hl=2 l=  32 prim: OCTET STRING      [HEX DUMP]:3E6FF806A5B4E7E6D74D267FE3DB90A2E2BCA370E3DB9B1073FD55E106A10C2A
$ openssl dgst -sha256 hallowelt.txt
SHA256(hallowelt.txt)= 3e6ff806a5b4e7e6d74d267fe3db90a2e2bca370e3db9b1073fd55e106a10c2a
```
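The first command above boils down to a single modular exponentiation. Below is a minimal Python sketch of it. It assumes you already know `n` and `e`; in this example they come from `privkey.pem`. The name `rsa_public_op` is made up for illustration:

```python
def rsa_public_op(signature: bytes, n: int, e: int) -> bytes:
    """Raw RSA public-key operation on a big-endian signature: s^e mod n.

    Roughly what `openssl rsautl -encrypt -raw` does; n and e must be known.
    """
    k = len(signature)
    s = int.from_bytes(signature, "big")
    if s >= n:
        raise ValueError("signature representative out of range")
    # Left-pad the result to the signature length so leading 00 bytes survive.
    return pow(s, e, n).to_bytes(k, "big")
```

Applied to `hallowelt.txt.sig` with the example key's `n` and `e = 65537`, the hex form of the result should start with `0001ffff`, matching the hexdump.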

The first command ‘encrypts’ the signature (that is, applies the RSA operation with the public key) and prints a hexdump of the result. In the hexdump we see:

- Some padding: `00 01 ff ff … ff 00`
- An ASN.1 structure, consisting of
  - A sequence (tag `30`, 49 bytes), of
    - A sequence (tag `30`, 13 bytes), of
      - An object identifier (tag `06`, 9 bytes) for `sha256`
      - A NULL value (tag `05`, 0 bytes)
    - An octet string (tag `04`, 32 bytes) with
      - The SHA-256 hash (`3e6ff8…a10c2a`) of the signed data
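The same structure can be picked apart in a few lines of Python. This sketch only handles SHA-256 and assumes the block has already been through the public-key operation. `split_block` and `sha256_from_block` are hypothetical names. The example rebuilds a 64-byte block for `Hallo, Welt\n` from scratch instead of reading the real signature file:

```python
import hashlib

# DER prefix of a SHA-256 DigestInfo: the 19 bytes from offset 13 up to and
# including the OCTET STRING header "04 20" in the hexdump.
SHA256_DIGESTINFO_PREFIX = bytes.fromhex("3031300d060960864801650304020105000420")

def split_block(block: bytes):
    """Split a decoded PKCS#1 v1.5 signature block into (ff count, DigestInfo)."""
    if block[:2] != b"\x00\x01":
        raise ValueError("block does not start with 00 01")
    sep = block.index(b"\x00", 2)          # the 00 that ends the ff run
    if sep - 2 < 8 or any(b != 0xFF for b in block[2:sep]):
        raise ValueError("padding must be at least eight ff bytes")
    return sep - 2, block[sep + 1:]

def sha256_from_block(block: bytes) -> bytes:
    """Return the SHA-256 hash embedded in a decoded signature block."""
    _, digest_info = split_block(block)
    if not digest_info.startswith(SHA256_DIGESTINFO_PREFIX) or len(digest_info) != 51:
        raise ValueError("not a SHA-256 DigestInfo")
    return digest_info[len(SHA256_DIGESTINFO_PREFIX):]

# Example: rebuild the 64-byte block for "Hallo, Welt\n" and parse it again.
data = b"Hallo, Welt\n"
block = (b"\x00\x01" + b"\xff" * 10 + b"\x00"
         + SHA256_DIGESTINFO_PREFIX + hashlib.sha256(data).digest())
print(len(block), split_block(block)[0])                          # 64 10
print(sha256_from_block(block) == hashlib.sha256(data).digest())  # True
```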

The signature follows the PKCS#1 v1.5 scheme for RSA signatures (RSASSA-PKCS1-v1_5). All the extra stuff serves to distinguish signatures with SHA-256 from signatures with other hashes, and to prevent some attacks on the padding. It’s also the reason why we can’t go much below 512-bit RSA if we want to demo with SHA-256. The ASN.1 structure plus hash take 19 + 32 = 51 bytes, and the padding needs at least 11 more (`00 01`, at least eight `ff`, `00`). So the modulus must be at least 62 bytes (489 bits) long. (Note that PKCS#1 v1.5 padding shouldn’t be used anymore. Its successor, RSASSA-PSS, has a robust security proof. It is also randomized, which completely foils the technique in this blog post.)

Let’s define the first set of functions to generate this sort of padding:

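```python
import hashlib

def pkcs1_padding(size_bytes, hexdigest, hashfn):
    # DER-encoded OID for each supported hash (only SHA-256 so far)
    oid = {hashlib.sha256: '608648016503040201'}[hashfn]
    # AlgorithmIdentifier: SEQUENCE { OBJECT IDENTIFIER, NULL }
    result = '06' + ("%02X" % (len(oid) // 2)) + oid + '05' + '00'
    result = '30' + ("%02X" % (len(result) // 2)) + result
    # DigestInfo: SEQUENCE { AlgorithmIdentifier, OCTET STRING hash }
    result = result + '04' + ("%02X" % (len(hexdigest) // 2)) + hexdigest
    result = '30' + ("%02X" % (len(result) // 2)) + result
    # Padding: 00 01, enough ff bytes to fill size_bytes, 00
    result = '0001' + ('ff' * int(size_bytes - 3 - len(result) // 2)) + '00' + result
    return result

def hash_pad(size_bytes, data, hashfn):
    # data must be bytes (e.g. b"..." or the result of open(..., 'rb').read())
    hexdigest = hashfn(data).hexdigest()
    return pkcs1_padding(size_bytes, hexdigest, hashfn)
```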

A simple test:

```python
hash_pad(64, b"Hallo, Welt\n", hashlib.sha256)
```

```
'0001ffffffffffffffffffff003031300D0609608648016503040201050004203e6ff806a5b4e7e6d74d267fe3db90a2e2bca370e3db9b1073fd55e106a10c2a'
```

To perform the `gcd` calculation, you need, for each signature, the corresponding signed data. You also need the hash function and the public exponent of the RSA key pair. Both may need to be guessed, but the hash is usually SHA-256, and the exponent is usually 0x10001 (65537) or 3.
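Here is a toy run of just the gcd step. It assumes textbook RSA with tiny primes, and small integers stand in for the padded hashes, so none of the PKCS#1 details matter here. It shows why the method works: `n` divides the gcd, possibly together with a small cofactor `k`.

```python
from math import gcd

# Toy textbook-RSA key (tiny and insecure; illustration only).
p, q, e = 1009, 1013, 17
n = p * q                          # 1022117: the value we pretend not to know
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (Python 3.8+)

# Stand-ins for padded hashes (real PKCS#1 blocks are as long as n).
messages = [123456, 654321]
signatures = [pow(m, d, n) for m in messages]

# What the attacker computes from (m, s) pairs and a guessed e:
# each s^e - m is a multiple of n, so n divides their gcd.
candidate = gcd(*(s**e - m for m, s in zip(messages, signatures)))
print(candidate % n == 0, candidate // n)   # True, then the cofactor k
```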

The full code is as follows:

```python
import binascii
import hashlib

def message_sig_pair(size_bytes, data, signature, hashfn=hashlib.sha256):
    # m: padded hash of the data, s: raw signature, both as SageMath Integers
    m = Integer('0x' + hash_pad(size_bytes, data, hashfn))
    s = Integer('0x' + binascii.hexlify(signature).decode('ascii'))
    return (m, s)

def find_n(*filenames):
    data_raw = []
    signature_raw = []
    for fn in filenames:
        # The signed data is in fn, its signature in fn + '.sig'
        with open(fn, 'rb') as f:
            data_raw.append(f.read())
        with open(fn + '.sig', 'rb') as f:
            signature_raw.append(f.read())
    size_bytes = len(signature_raw[0])
    if any(len(s) != size_bytes for s in signature_raw):
        raise Exception("All signature sizes must be identical")
    # Try each candidate hash function and public exponent in turn
    for hashfn in [hashlib.sha256]:
        pairs = [message_sig_pair(size_bytes, m, s, hashfn)
                 for (m, s) in zip(data_raw, signature_raw)]
        for e in [0x10001, 3, 17]:
            # In SageMath, ^ is exponentiation; each s^e - m is a multiple of n
            gcd_input = [(s^e - m) for (m, s) in pairs]
            result = gcd(*gcd_input)
            if result != 1:
                return (hashfn, e, result)
    return None  # no candidate produced a common factor
```
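For readers without SageMath, here is a rough plain-Python 3 port. It is my sketch, not the notebook's code, and it differs in a few ways:

- It takes `(data, signature)` byte pairs instead of filenames.
- It hard-codes SHA-256.
- It folds `gcd` with `functools.reduce`, so any number of pairs works.
- It accepts a result only if it is at least as long as the signatures, which rejects small accidental common factors. The original accepts anything other than 1.

The names `emsa_pkcs1_v15_sha256` and `find_n_from_pairs` are made up. For numbers this large, Python's built-in integers are typically slower than SageMath's GMP-backed ones, so expect e = 65537 to take noticeably longer than in Sage.

```python
import hashlib
from functools import reduce
from math import gcd

# DER DigestInfo prefix for SHA-256, as seen in the hexdump earlier.
SHA256_PREFIX = bytes.fromhex("3031300d060960864801650304020105000420")

def emsa_pkcs1_v15_sha256(data: bytes, size_bytes: int) -> int:
    """PKCS#1 v1.5 encoding of SHA-256(data), read as a big-endian integer."""
    t = SHA256_PREFIX + hashlib.sha256(data).digest()
    ps_len = size_bytes - 3 - len(t)
    if ps_len < 8:
        raise ValueError("signature too short for SHA-256 PKCS#1 v1.5")
    return int.from_bytes(b"\x00\x01" + b"\xff" * ps_len + b"\x00" + t, "big")

def find_n_from_pairs(pairs, exponents=(0x10001, 3, 17)):
    """pairs: list of (data, signature) bytes. Returns (e, k*n) or None."""
    if len(pairs) < 2:
        raise ValueError("need at least two (data, signature) pairs")
    size_bytes = len(pairs[0][1])
    if any(len(sig) != size_bytes for _, sig in pairs):
        raise ValueError("All signature sizes must be identical")
    ms = [(emsa_pkcs1_v15_sha256(data, size_bytes), int.from_bytes(sig, "big"))
          for data, sig in pairs]
    for e in exponents:
        # Fold gcd over every s^e - m, so any number of pairs works.
        candidate = reduce(gcd, (s**e - m for m, s in ms))
        # A real modulus (times k) is at least as long as the signatures;
        # this rejects accidental small common factors for a wrong e.
        if candidate.bit_length() > 8 * (size_bytes - 1):
            return e, candidate
    return None
```

Usage would look something like `find_n_from_pairs([(open(fn, 'rb').read(), open(fn + '.sig', 'rb').read()) for fn in ('hallowelt.txt', 'hallootto.txt')])`.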

If we test it, we’ll find:

```python
time hashfn, e, n = find_n('hallowelt.txt', 'hallootto.txt');
```

```
CPU times: user 27.3 s, sys: 609 ms, total: 27.9 s
Wall time: 28.4 s
```

```python
print(n.hex())
```

```
d9dac509621ed7f27b4868ab1874f649778c63f11000366e827cf18fd70db1e27f39902524e29aa2bfb3167627caaa408e17e907ee3c44e0321dc77fb8890075
```

And compare to the ground truth of our example:

```
$ openssl rsa -in privkey.pem -noout -text
Private-Key: (512 bit)
modulus:
    00:d9:da:c5:09:62:1e:d7:f2:7b:48:68:ab:18:74:
    f6:49:77:8c:63:f1:10:00:36:6e:82:7c:f1:8f:d7:
    0d:b1:e2:7f:39:90:25:24:e2:9a:a2:bf:b3:16:76:
    27:ca:aa:40:8e:17:e9:07:ee:3c:44:e0:32:1d:c7:
    7f:b8:89:00:75
publicExponent: 65537 (0x10001)
[...]
```

Finally, to create a standard-format PEM public key from our `n` and `e` (here with PyCryptodome):

```python
from Crypto.PublicKey import RSA
print(RSA.construct((int(n), int(e))).export_key(format='PEM').decode('ascii'))
```

```
-----BEGIN PUBLIC KEY-----
MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBANnaxQliHtfye0hoqxh09kl3jGPxEAA2
boJ88Y/XDbHifzmQJSTimqK/sxZ2J8qqQI4X6QfuPETgMh3Hf7iJAHUCAwEAAQ==
-----END PUBLIC KEY-----
```

Which is exactly what we would get from OpenSSL:

```
$ openssl rsa -in privkey.pem -pubout
writing RSA key
-----BEGIN PUBLIC KEY-----
MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBANnaxQliHtfye0hoqxh09kl3jGPxEAA2
boJ88Y/XDbHifzmQJSTimqK/sxZ2J8qqQI4X6QfuPETgMh3Hf7iJAHUCAwEAAQ==
-----END PUBLIC KEY-----
```
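If PyCryptodome isn't at hand, the PEM can also be assembled with the standard library alone. This is a hand-rolled DER sketch. It assumes the usual SubjectPublicKeyInfo layout: an rsaEncryption AlgorithmIdentifier plus a BIT STRING wrapping the PKCS#1 RSAPublicKey. The function names are my own. For the recovered `n` and `e = 65537` it should reproduce the key above.

```python
import base64

def _der_len(length: int) -> bytes:
    # DER definite length: short form below 128, long form otherwise.
    if length < 0x80:
        return bytes([length])
    body = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def _tlv(tag: int, value: bytes) -> bytes:
    return bytes([tag]) + _der_len(len(value)) + value

def _der_uint(x: int) -> bytes:
    # Non-negative INTEGER; the extra byte is a leading 00 when the top bit is set.
    return _tlv(0x02, x.to_bytes(x.bit_length() // 8 + 1, "big"))

# AlgorithmIdentifier for rsaEncryption (OID 1.2.840.113549.1.1.1), NULL params.
RSA_ALG_ID = bytes.fromhex("300d06092a864886f70d0101010500")

def rsa_public_pem(n: int, e: int) -> str:
    rsa_public_key = _tlv(0x30, _der_uint(n) + _der_uint(e))  # RSAPublicKey
    bit_string = _tlv(0x03, b"\x00" + rsa_public_key)         # 0 unused bits
    spki = _tlv(0x30, RSA_ALG_ID + bit_string)                # SubjectPublicKeyInfo
    b64 = base64.b64encode(spki).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return "\n".join(["-----BEGIN PUBLIC KEY-----", *lines, "-----END PUBLIC KEY-----"])
```

In the notebook, that would be `print(rsa_public_pem(int(n), int(e)))`.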

Warning: The `gcd` method *may* sometimes return not `n` but a product `k * n` for a smallish value of `k`. You may need to check for small prime factors and remove them.
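One way to do that is to divide out all small factors and then check that every signature verifies under the result. This sketch assumes both RSA primes are far larger than the trial-division bound, which holds for real keys. `strip_small_factors`, `verifies` and the default bound of 2^16 are my own choices for illustration:

```python
def strip_small_factors(candidate: int, bound: int = 1 << 16) -> int:
    """Divide out every prime factor below `bound` from a k*n candidate.

    Hypothetical helper; assumes both RSA primes are much larger than `bound`.
    """
    f = 2
    while f < bound:
        while candidate % f == 0:
            candidate //= f
        f += 1 if f == 2 else 2   # 2, then odd numbers only
    return candidate

def verifies(n: int, e: int, pairs) -> bool:
    """True if s^e mod n equals m for every (m, s) integer pair."""
    return all(pow(s, e, n) == m for m, s in pairs)
```

With the values from `find_n`, that could look like `n = strip_small_factors(result)` followed by `verifies(n, e, pairs)` on the `(m, s)` integer pairs.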

The code is available as a SageMath Jupyter notebook: rsa_find_n.ipynb (all example files).

The `find_n` function is written to accept an arbitrary number of arguments. Each must be a filename: the file holds the signed data, and the same filename with `.sig` appended holds the signature. However, it will only work when given exactly two arguments. FactHacks: Batch gcd has a `batchgcd_faster` function that works on an arbitrary number of arguments (but is slower in the two-argument case).