I was talking with a colleague recently who was helping me translate a report generated from a diagnostic tool. He has extensive knowledge in this area and was able to help me find the root cause of the issue I was working through fairly quickly.
After talking with him for a while, it turned out that he had written multiple “scripts” that health check an implementation of this particular software in numerous ways. On top of this, he had built a framework in which you change a few configuration files to choose which batches of scripts to run, launch the tool, and it does its job. Sounds great, right? It was, except that there was one fairly large flaw.
This tool is something he has built up over a very long time (10+ years), either by adding new scripts as needed, or by adjusting existing ones. I can really only imagine the hours that have been spent on it, as there are hundreds of these scripts. The tool is fairly central to what he does and is used daily, not only by him but by many others as well. The problem comes when it is required to run on client machines that are somewhat sensitive and where we are provided very limited access or none at all.
While I personally would love for everything to be open source, and the pros almost always outweigh the cons, it’s just not possible in all situations, nor is it worthwhile in some cases. If you have specific intellectual property that is core to your business, this obviously needs to remain private. Sometimes your software may be so niche that it’s not worth making it open source as nobody else would use it anyway. In this case it was the former.
Releasing this software to a client to have them run it on their sensitive systems and produce the reports gave us two problems:
- Did the scripts actually come from us? That is, could someone drop malicious scripts into this tool or edit the scripts without our knowledge?
- Once you release this software from your control to a client who needs to run this on their sensitive systems, who knows where it could end up.
For many years, the main way this problem was partially tackled was by simply adding a copyright and confidential notice at the top of every single script. While this is a form of legal protection, it obviously isn’t ideal to rely solely upon this. Content is plagiarised all the time, and you can end up in a very grey area when some minor modifications are made; here’s looking at you, Music Industry.
There are two main forms of cryptography we can use to reinforce this tool, protect clients and protect intellectual property:
- Asymmetric Encryption
- Symmetric Encryption
Asymmetric encryption, or Public Key Cryptography, is where a PUBLIC/PRIVATE key pair is generated. Generally the PUBLIC key is used to encrypt data and the PRIVATE key is used to decrypt it again.
So let’s imagine in a hypothetical situation that you have a Twitter account and you want everyone else in the world to be able to publish messages that only you can read. By releasing your PUBLIC key, people can encrypt their messages using this key and only you would be able to decrypt them. Note: this is purely a hypothetical situation. Cryptography is governed by law, and the laws differ based on the country and type of implementation.
It doesn’t matter that you have published the PUBLIC key, as once this key pair has been generated, it’s not practically possible to derive the PRIVATE key from the PUBLIC key. You can, however, very easily derive the PUBLIC key from the PRIVATE key, so if the PRIVATE key is ever shared, the entire system needs to be considered compromised.
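To make the PUBLIC/PRIVATE idea a bit more concrete, here is a minimal Python sketch of “textbook” RSA. It is an illustration only and not the code from the tool: the two primes, the exponent and the function names are my own assumptions, the key is tiny and there is no padding, so it must never be used to protect real data.

```python
# Toy "textbook RSA" for illustration only: tiny key, no padding, NOT secure.
# The primes, exponent and function names are assumptions made for this sketch.
P = 2**61 - 1                        # two known Mersenne primes, picked for convenience
Q = 2**89 - 1
N = P * Q                            # modulus shared by both keys (about 150 bits)
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)

PUBLIC_KEY = (N, E)                  # safe to publish
PRIVATE_KEY = (N, D)                 # must stay secret


def encrypt(message: bytes, public_key) -> int:
    """Anyone holding the PUBLIC key can encrypt."""
    n, e = public_key
    m = int.from_bytes(message, "big")
    if m >= n:
        raise ValueError("message must be smaller than the key modulus")
    return pow(m, e, n)


def decrypt(ciphertext: int, private_key) -> bytes:
    """Only the holder of the PRIVATE key can decrypt."""
    n, d = private_key
    m = pow(ciphertext, d, n)
    return m.to_bytes((m.bit_length() + 7) // 8, "big")


secret = encrypt(b"only you", PUBLIC_KEY)
print(decrypt(secret, PRIVATE_KEY))
```

Notice that `encrypt` refuses anything that does not fit under the modulus, which is the size limitation discussed further down.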
This is, very basically, how HTTPS works as compared to HTTP. On your computer, you have a “key store” where trusted public certificates are stored. When you submit a form, such as entering credit card information on a website, your browser uses the PUBLIC key from the server’s certificate to set up the encrypted connection, and only the server holding the matching PRIVATE key can complete that setup and read what you send. There’s a lot more to it (in practice the bulk of the data is encrypted with a symmetric session key agreed during that setup), and these PUBLIC certificates are verified by external entities, but it gives you the idea. Now you understand why the Heartbleed bug in OpenSSL was such a big problem, since it could leak the contents of a server’s memory, including the PRIVATE key.
While you can encrypt data with the PUBLIC key and decrypt the resulting message with the PRIVATE key, the same is not true in the opposite direction. Instead, you can SIGN a message with the PRIVATE key, which generates a “signature” from a hash, or “message digest”, of the message. This allows you to publish a message, alongside the signature, which can then be VERIFIED with the PUBLIC key. So in the above hypothetical situation all of your Twitter followers can be assured that a particular message was written by you, but it would be displayed in plain text for everyone to read.
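Signing and verifying can be sketched in the same toy fashion. Again, this is only an illustration under my own assumptions: the key pair is a tiny textbook RSA key, the hash is truncated so that it fits under that small key, and the function names are hypothetical. A real implementation would use a vetted library and a proper signature scheme.

```python
import hashlib

# Toy RSA key pair for illustration only (assumed values, NOT secure).
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    # SHA-256 truncated to 16 bytes so the value fits under the toy modulus
    return int.from_bytes(hashlib.sha256(message).digest()[:16], "big")


def sign(message: bytes) -> int:
    """Author side: uses the PRIVATE exponent D."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone: uses only the PUBLIC values N and E."""
    return pow(signature, E, N) == digest_as_int(message)


tweet = b"This really is me"
signature = sign(tweet)
print(verify(tweet, signature))                    # the genuine message
print(verify(b"This really is mE", signature))     # a tampered message
```

The message itself stays in plain text; the signature only proves who produced it and that it has not been changed.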
Unfortunately, asymmetric encryption has one fairly large limitation, and you also need to factor in performance. You can only encrypt data smaller than the key size, and today this is generally 1024 to 4096 bits, which is not a lot. A 2048 bit binary key (2048 0s and 1s) can hold values up to 2^2048, which is approximately 617 decimal digits (3.23E616) or 256 ASCII characters (2048/8).
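If you want to check those numbers yourself, a few lines of Python will do it. The function names here are just ones I made up for the example.

```python
def decimal_digits(key_bits: int) -> int:
    """Number of decimal digits needed to write 2^key_bits."""
    return len(str(2**key_bits))


def ascii_characters(key_bits: int) -> int:
    """Number of 8-bit characters that fit in key_bits bits."""
    return key_bits // 8


print(decimal_digits(2048))     # 617
print(ascii_characters(2048))   # 256
```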
Originally and historically, Twitter was known for having a 140 character limit. So while this might work perfectly well for our hypothetical situation, things have moved on and it certainly wouldn’t work for this tool.
In the case of RSA you also want to make sure that the message is always smaller than the key size and not exactly the same length. This is because the message is “padded” out to fill the key size, and that padding is an important part of the protection. Without the padding, the cryptography is much weaker.
So what do you do when you want to encrypt and decrypt larger volumes of data?
Symmetric encryption is where you generate a KEY that both parties need to have. Data of any length can be encrypted (in this case, if it fits in memory it’s good), but you must share the KEY so that it can be decrypted on the other end.
There’s really not much more to say about symmetric encryption: you generate the key and use it to both encrypt and decrypt data. The power comes when you combine this with asymmetric encryption.
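As a rough sketch of the “same key in both directions” idea, here is a toy symmetric cipher in Python. To keep it dependency-free I have assumed a home-made stream cipher (XOR against a SHAKE-256 keystream); this is purely for illustration and is not what the tool uses. Real code should use an established cipher such as AES from a vetted library.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher (assumption for this sketch, NOT secure).

    XORs the data with a keystream derived from the key, so calling it
    a second time with the same key reverses the first call.
    """
    keystream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))


key = secrets.token_bytes(32)            # the shared KEY both parties need
report = b"A report much longer than any RSA key could hold. " * 100

encrypted = toy_cipher(key, report)      # encrypt
decrypted = toy_cipher(key, encrypted)   # decrypt with the very same key
print(decrypted == report)
```

Unlike the asymmetric case, the length of the data is not limited by the size of the key.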
Combining Asymmetric and Symmetric Encryption
In the context of this tool, let’s imagine that you want to provide it to a client to run a certain collection of scripts. You want the client to be assured that the scripts come from you, while also making it a little more difficult to just openly view the scripts. We also want to make sure that nobody else can “drop” malicious scripts into this tool and have them run.
First we generate an asymmetric PUBLIC and PRIVATE key pair. Then:
- For each script generate a one-time symmetric key. So a different symmetric key for each script.
- Use the symmetric key to encrypt the script.
- Sign the symmetric key with the asymmetric PRIVATE key.
To make it a little more clear, let’s imagine that this has now turned the single file script.txt into 3 files:
- script.enc (the encrypted script)
- script.key (the key to decrypt the script)
- script.sig (the message digest or “signature” of script.key that was generated from the PRIVATE key)
Ok, so now we have these 3 files. The client has only been provided the PUBLIC key alongside the tool, which is enough to VERIFY the SIGNATURE and be assured that any scripts came from us. We’ve also added another layer of protection, which is that the scripts would need to be decrypted to read them. Not perfect, but combined with the fact that the tool will not run any scripts that do not have an accompanying signature for the key, we’re doing quite well.
When this particular tool runs, it generates a report for each script. So what can we do here? Well, we can utilise the PUBLIC key that has been provided with the tool. As each report is generated, we can:
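The script side of this can be sketched roughly as follows. This is not the tool’s actual code: the toy RSA key, the toy XOR cipher and the function names `package_script` and `load_script` are all assumptions made for the example, and the “files” are just entries in a dictionary.

```python
import hashlib
import secrets

# Toy RSA key pair (assumed values, no padding). NOT secure.
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q                             # N and E ship with the tool (PUBLIC)
D = pow(E, -1, (P - 1) * (Q - 1))     # D never leaves the author (PRIVATE)


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHAKE-256 keystream (encrypts and decrypts)."""
    keystream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))


def key_digest(key: bytes) -> int:
    # Truncated SHA-256 so the value fits under the toy modulus
    return int.from_bytes(hashlib.sha256(key).digest()[:16], "big")


def package_script(script: bytes) -> dict:
    """Author side: turn one script into the three files."""
    key = secrets.token_bytes(32)                      # one-time symmetric key
    return {
        "script.enc": toy_cipher(key, script),         # the encrypted script
        "script.key": key,                             # the key to decrypt it
        "script.sig": pow(key_digest(key), D, N),      # signature of script.key
    }


def load_script(files: dict) -> bytes:
    """Client side: refuse to run a script whose key signature does not verify."""
    if pow(files["script.sig"], E, N) != key_digest(files["script.key"]):
        raise ValueError("signature check failed, refusing to run script")
    return toy_cipher(files["script.key"], files["script.enc"])


files = package_script(b"check disk space")
print(load_script(files))
```

One thing the sketch makes visible is that the signature covers only the key, exactly as in the steps above. Signing a digest of the encrypted script as well would tie the signature to the script’s contents.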
- Generate a one-time symmetric key.
- Encrypt the report with the symmetric key.
- Encrypt the symmetric key with the PUBLIC key (since the symmetric key is well under 2048/4096 bits, this is perfectly fine).
To illustrate, imagine now you have 2 files for each report instead of one: the encrypted report, and the symmetric key encrypted with the PUBLIC key.
Without the PRIVATE key, it’s virtually impossible to decrypt these reports, and they need to be sent back to you for decryption before they can be evaluated.
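The report side can be sketched in the same way. As before, this is only an illustration under my own assumptions (a toy RSA key, a toy XOR cipher and hypothetical function names), not the implementation in the tool.

```python
import hashlib
import secrets

# Toy RSA key pair (assumed values, no padding). NOT secure.
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q                             # N and E ship with the tool (PUBLIC)
D = pow(E, -1, (P - 1) * (Q - 1))     # D never leaves the author (PRIVATE)

KEY_BYTES = 16                        # 128-bit key, small enough for the toy modulus


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a SHAKE-256 keystream (encrypts and decrypts)."""
    keystream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))


def encrypt_report(report: bytes):
    """Client side: only the PUBLIC key is needed."""
    key = secrets.token_bytes(KEY_BYTES)                     # one-time symmetric key
    encrypted_report = toy_cipher(key, report)
    encrypted_key = pow(int.from_bytes(key, "big"), E, N)    # key locked with PUBLIC key
    return encrypted_report, encrypted_key


def decrypt_report(encrypted_report: bytes, encrypted_key: int) -> bytes:
    """Author side: the PRIVATE key recovers the symmetric key, then the report."""
    key = pow(encrypted_key, D, N).to_bytes(KEY_BYTES, "big")
    return toy_cipher(key, encrypted_report)


enc_report, enc_key = encrypt_report(b"All health checks passed.")
print(decrypt_report(enc_report, enc_key))
```

The report can be as large as you like, because only the small symmetric key ever goes through the asymmetric step.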
So that’s it. A basic implementation of cryptography to make a tool more robust while protecting some IP. After this was done, I ended up writing 4 small additional tools to help utilise this new implementation:
- encrypt.exe – A tool to simulate the encryption of reports.
- decrypt.exe – A tool to decrypt reports.
- sign.exe – A tool to encrypt and sign scripts.
- verify.exe – A tool to simulate the decryption process as the tool runs the scripts.
Remember, you can have all the data in the world, but without the proper knowledge to translate that data into something meaningful, it’s fairly useless; which is what got me into this in the first place.