RandStream

Copyright (C) 2000 by Zygo Blaxell

Download

Source and Linux-i386 binary tarball and GPG signature.

Introduction

RandStream is a program designed for generating very large amounts of mediocre-quality random data and writing it very fast to a raw disk device.

As part of my site's security policy, all storage media that are moved from one machine to another or that change ownership must be completely erased at the lowest level available to software as a minimum security measure to prevent unauthorized disclosure of private data.

One way to do that is by booting from a floppy and typing something like:

cat /dev/zero > /dev/sda
However, this has a number of drawbacks:
  1. The stream of zeros is a dead giveaway that the hard disk was deliberately erased. It is easy to scan the hard disk and determine that yes, there really isn't any useful data on it.
  2. A few malicious programs (or people!) sometimes search for or leak information from the disk. Examples of such programs include:
  3. It's really boring...all those zeros. The monotony... ;-)

Double your Entropy, Double your Fun...

Instead of erasing hard disks with a stream of zeros, I erase them with streams of random data. This has a number of advantages:
  1. A stream of random data might be random data...or it might be compressed or encrypted data...or it might be executable code. The neat thing about digital data storage is that what a pattern of ones and zeros represents depends on how you want to interpret it.
  2. Malicious programs that look for information on the disk will find no shortage of interesting stuff to look at--or, in some cases, they crash and make themselves visible as a result! Pure entropy contains lots and lots of information, according to classic information theory. Totally useless information, of course, but there's no way to know that without examining each and every useless random bit. IMHO anything that wastes more of your adversary's processing time without using a significant amount of your own is a good thing.
  3. The previous two points appeal to the cypherpunk in me. ;-)
One way to generate a lot of random data on Linux is by typing something like:
cat /dev/urandom > /dev/sda
However, this runs a very slow PRNG, one that is much slower than a modern disk drive. Erasing a 20 gigabyte disk takes hours.
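To put "hours" into numbers, here is a small sketch that estimates erase time from a disk size and a sustained write rate. The function name erase_minutes and the 2 MB/s figure are illustrative assumptions, not measurements of /dev/urandom on any particular machine.

```shell
# erase_minutes SIZE_BYTES RATE_BYTES_PER_SEC
# Prints the estimated time to overwrite SIZE_BYTES at a sustained
# RATE_BYTES_PER_SEC, rounded up to whole minutes.
erase_minutes() {
    size=$1
    rate=$2
    seconds=$(( (size + rate - 1) / rate ))   # round up to whole seconds
    echo $(( (seconds + 59) / 60 ))           # round up to whole minutes
}

# Hypothetical example: a 20 GB disk at an assumed 2 MB/s.
erase_minutes 20000000000 2000000   # prints 167
```

At that assumed rate the erase takes a little under three hours; substitute a rate measured on your own hardware.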

Of course, there is no /dev/urandom device for non-Linux Unixes (yet), so it's not a viable technique for, say, a Solaris machine.

Fast Random Number Generation: RandStream

Enter RandStream. RandStream accepts an entropy input data stream on stdin, feeds that entropy into a pseudo random-number generator (PRNG), and writes an entropy output data stream on stdout. This is done in three separate threads with each thread doing its job at the maximum speed that the hardware will allow.

The input entropy stream can be any source of random data. On Linux, a good source is /dev/random, which is a PRNG that dispenses entropy collected from high-resolution timers and interrupt counters inside the kernel. Another usable source of entropy is /dev/audio attached to an audio source. If you have a multi-user machine you can use:

while sleep 1; do 
	ps aux; netstat -an; df
done | randstream ...
If you have a busy LAN, you can use:
tcpdump -ne | randstream ...
Or all of them together:
(
	cat /dev/random &
	cat /dev/audio &
	tcpdump -ne &
	while sleep 1; do
		ps aux; netstat -an; df
	done
) | randstream ...
If you want to stress-test your hardware, you can use the very hard disk you're erasing as an entropy source. As a last resort, you can use the 'date' command to produce a very small amount of entropy.

If you provide no input to stdin at all, RandStream will generate some entropy for you, using the system time and date, pid, ppid, uid, euid, gid, and egid.
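To get a feel for how little that fallback has to work with, the sketch below prints the same kinds of values from a shell. It only illustrates the inputs listed above; it is not RandStream's actual seeding code, and the function name fallback_seed_material is made up for this example.

```shell
# Print the kinds of values the fallback draws on, one per line.
# Illustration only: this is not how RandStream itself gathers them.
fallback_seed_material() {
    date            # system time and date
    echo "$$"       # pid of this shell
    echo "$PPID"    # parent pid
    id -ru          # real uid
    id -u           # effective uid
    id -rg          # real gid
    id -g           # effective gid
}

fallback_seed_material
```

Every one of these values is easy for an observer to guess, so treat the fallback as a convenience rather than as a real entropy source.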

RandStream has entropy amplification characteristics comparable to /dev/urandom but throughput comparable to /dev/zero. The PRNG in RandStream is quite a bit faster than Linux's /dev/urandom PRNG, although still up to 20 times slower than modern hard disks. Because the output thread writes as fast as it possibly can, and because disk DMA hardware is often faster than the software PRNG, there can be repeating patterns in the output. The non-deterministic nature of three unsynchronized threads in theory adds to the overall entropy produced, but in practice, unless you have an SMP machine, the scheduler will cause entropy to be mixed into the data stream in bursts, with the last of that generated entropy repeated many times in the output.

Remember: RandStream is designed to clobber a disk as quickly as possible. For that purpose it is not necessary to have high-quality cryptographically strong random output, so RandStream simply generates output of as much quality as it can while keeping the output device I/O saturated. Changing this would make RandStream significantly slower--but if you are willing to make that tradeoff, you might as well use GnuPG's "--gen-random 0" feature, or simply "cat /dev/urandom".

How to Use RandStream

WARNING: RandStream, like all pure software disk erasing tools, will not prevent a sufficiently well-funded adversary from extracting erased data from rewritable media. Using tools that can scan the disk a few thousand atoms at a time, it is possible to retrieve dozens or even thousands of generations of written and erased data from the disk. Truly secure erasure of magnetic media requires non-standard magnetic impulses at fairly high energy levels--a procedure that also removes the registration marks that the hard disk controller itself requires to navigate from place to place on the disk, therefore rendering the device useless. If you really want to prevent anyone from extracting data from a hard disk, a much more cost-effective solution is to either shred the device or melt it down. To date, I know of no technology that will allow retrieval of deleted files from a disk platter that has been melted down and turned into costume jewellery.

Typical usage looks like this:

randstream 1048576 < /dev/random > /dev/sda
This uses a 1-megabyte write buffer, reads entropy from /dev/random, and erases the entire first SCSI disk on a Linux system. For ideal results, use a write buffer that is 25% smaller than the total installed RAM on the system. For example, on a machine with 256MB of RAM, use a buffer size of 201326592, and on a machine with 8MB of RAM, use a buffer size of 6291456. In order to further confuse your opponent(s), vary the number of bytes by a small amount. Note that the buffer size may be restricted to even numbers only depending on compilation options.
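The buffer sizes quoted above are simply three quarters of the installed RAM, expressed in bytes. A minimal sketch of that arithmetic follows; the helper name buffer_size_for_ram_mb is hypothetical and not part of RandStream.

```shell
# buffer_size_for_ram_mb RAM_MB
# Prints 75% of RAM_MB megabytes, expressed in bytes.
buffer_size_for_ram_mb() {
    echo $(( $1 * 1024 * 1024 * 3 / 4 ))
}

buffer_size_for_ram_mb 256   # prints 201326592
buffer_size_for_ram_mb 8     # prints 6291456
```

The result can be passed straight to the program, for example randstream "$(buffer_size_for_ram_mb 256)". For whole-megabyte inputs the result is always even, which satisfies the even-number restriction mentioned above.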

RandStream will stop generating output when it encounters an error while writing. If you wish to generate a limited amount of output, then pipe the output of RandStream into head:

randstream 1048576 < /dev/random | head -c 1474560 > floppy-image.bin
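The byte count in that example is the capacity of a standard 1.44MB 3.5-inch floppy. A short sketch of where the number comes from is below; the function name floppy_bytes is an assumption for illustration.

```shell
# Capacity of a 1.44MB floppy: 80 cylinders, 2 heads,
# 18 sectors per track, 512 bytes per sector.
floppy_bytes() {
    cylinders=80
    heads=2
    sectors=18
    sector_size=512
    echo $(( cylinders * heads * sectors * sector_size ))
}

floppy_bytes   # prints 1474560
```

For other media, change the geometry and pass the result to head, as in head -c "$(floppy_bytes)".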
If you wish to erase a disk while ignoring write errors, use dd:
randstream 1048576 < /dev/random | dd bs=512 conv=noerror of=/dev/sda
If RandStream runs out of input (i.e. there is a read error or EOF on stdin), then the PRNG and output threads will simply continue without adding any further random entropy.

If you want to repeatedly rewrite a hard disk, e.g. to stress-test a controller-cable-device chain, use a loop like the following:

while :; do
	randstream 1048576 < /dev/random > /dev/sda
done
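If you would rather stop after a fixed number of passes than interrupt the loop by hand, a bounded variant can be sketched as follows. The helper rewrite_passes is hypothetical. It takes the command to run as arguments so that it can be tried with a harmless command first.

```shell
# rewrite_passes COUNT COMMAND [ARGS...]
# Runs COMMAND COUNT times, announcing each pass on stderr.
# The exit status of COMMAND is deliberately ignored, as in the
# unbounded loop: a writer that stops on a write error at the end
# of the device may well exit non-zero.
rewrite_passes() {
    passes=$1
    shift
    pass=1
    while [ "$pass" -le "$passes" ]; do
        echo "pass $pass of $passes" >&2
        "$@" || true
        pass=$((pass + 1))
    done
}

# Harmless dry run:
rewrite_passes 2 true

# Destructive, shown commented out on purpose:
# rewrite_passes 3 sh -c 'randstream 1048576 < /dev/random > /dev/sda'
```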

Tuning

The major user-serviceable part in the source code is the size of the PRNG data type. The PRNG has an 8-bit mode and a 16-bit mode. This is currently not a run-time option because one or the other usually performs consistently better on a given CPU architecture. To change this, look for the line:
#define USE_RC4_SHORT
and change it to:
#define USE_RC4_CHAR
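To compare the two modes without editing the source by hand, the substitution can be scripted. This is a sketch that assumes the define sits at the start of a line exactly as shown above; the function name use_rc4_char and the file names in the usage comment are illustrative.

```shell
# use_rc4_char SRC DEST
# Copies SRC to DEST with the 16-bit define swapped for the 8-bit one.
# Assumes the define starts in column one, as quoted above.
use_rc4_char() {
    sed 's/^#define USE_RC4_SHORT/#define USE_RC4_CHAR/' "$1" > "$2"
}

# Example (file names are assumptions):
# use_rc4_char randstream.c randstream-char.c
```

Build both variants and compare the "P:" rate each one reports on your CPU.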
While RandStream runs, it produces some statistical output:
$ randstream $[25*1024*1024] < /dev/urandom | head -c 500000000 > /dev/null
#Id: randstream.c,v 1.8 2000/02/11 21:53:04 cvs Exp $
Copyright (C) 2000 Zygo Blaxell 
This is free software released under the Free Software Foundation's GPL.
There is no warranty of any kind.  See the source code for details.
Initializing PRNG...
Collecting initial entropy (size = 256)...Stirring...Done.
Allocating output buffer (size = 26214400)...Obfuscating...Done.
Starting threads...
New thread:  entropy collection function (collector)
New thread:  entropy amplification function (prng)
New thread:  data writer (output)
New thread:  statistics reporter (monitor)
W:   0B (  0.00B/s) E:   0B (  0.00B/s) P:   0B (  0.00B/s) T:   0s
W:   0B (  0.00B/s) E:   0B (  0.00B/s) P:   0B (  0.00B/s) T:   1s
W:  26M ( 13.11M/s) E:   0B (  0.00B/s) P:   0B (  0.00B/s) T:   2s
W:  52M ( 17.48M/s) E:   0B (  0.00B/s) P:   0B (  0.00B/s) T:   3s
W:  78M ( 19.66M/s) E: 131K ( 32.77K/s) P:   0B (  0.00B/s) T:   4s
W:  78M ( 15.73M/s) E: 131K ( 26.21K/s) P:   0B (  0.00B/s) T:   5s
W: 104M ( 17.48M/s) E: 131K ( 21.85K/s) P:   0B (  0.00B/s) T:   6s
W: 131M ( 18.72M/s) E: 131K ( 18.72K/s) P:   0B (  0.00B/s) T:   7s
W: 131M ( 16.38M/s) E: 262K ( 32.77K/s) P:   0B (  0.00B/s) T:   8s
W: 157M ( 17.48M/s) E: 262K ( 29.13K/s) P:   0B (  0.00B/s) T:   9s
W: 183M ( 18.35M/s) E: 262K ( 26.21K/s) P:   0B (  0.00B/s) T:  10s
W: 209M ( 19.07M/s) E: 393K ( 35.75K/s) P:  26M (  2.38M/s) T:  11s
W: 235M ( 19.66M/s) E: 393K ( 32.77K/s) P:  26M (  2.18M/s) T:  12s
W: 235M ( 18.15M/s) E: 393K ( 30.25K/s) P:  26M (  2.02M/s) T:  13s
W: 262M ( 18.72M/s) E: 524K ( 37.45K/s) P:  26M (  1.87M/s) T:  14s
W: 288M ( 19.22M/s) E: 524K ( 34.95K/s) P:  26M (  1.75M/s) T:  15s
W: 314M ( 19.66M/s) E: 524K ( 32.77K/s) P:  26M (  1.64M/s) T:  16s
W: 340M ( 20.05M/s) E: 524K ( 30.84K/s) P:  26M (  1.54M/s) T:  17s
W: 340M ( 18.93M/s) E: 655K ( 36.41K/s) P:  26M (  1.46M/s) T:  18s
W: 367M ( 19.32M/s) E: 655K ( 34.49K/s) P:  26M (  1.38M/s) T:  19s
W: 393M ( 19.66M/s) E: 786K ( 39.32K/s) P:  26M (  1.31M/s) T:  20s
W: 419M ( 19.97M/s) E: 786K ( 37.45K/s) P:  52M (  2.50M/s) T:  21s
W: 445M ( 20.26M/s) E: 786K ( 35.75K/s) P:  52M (  2.38M/s) T:  22s
W: 471M ( 20.52M/s) E: 917K ( 39.89K/s) P:  52M (  2.28M/s) T:  23s
output: write: Broken pipe
output: exiting.
W: 500M ( 20.83M/s) E: 917K ( 38.23K/s) P:  52M (  2.18M/s) T:  24s
W: 500M ( 20.00M/s) E: 917K ( 36.70K/s) P:  52M (  2.10M/s) T:  25s
Read:    917504 bytes in 7 reads  (35288.615/sec), 7 cycles
Write:   500006912 bytes in 20 writes (19231035.077/sec), 20 cycles
Entropy: 917504 bytes in 7 stirs  (35288.615/sec, 131072.000/stir)
PRNG:    78643200 bytes in 3 cycles (3024738.462/sec), 26 sec
$ 
Note the choice of the buffer size: here I have arbitrarily chosen 26214400 bytes (25 megabytes). You want the buffer size to be about 25% smaller than the amount of free memory you have on the system. This will give you maximum throughput while maintaining a large distance between instances of repeated data. So for a machine with 128 megs of RAM, choose an output buffer size of about 96 megabytes.

The "W:" column indicates write throughput (on stdout), giving the total amount of data written and, in parentheses, the average rate per second since startup. The "E:" column indicates entropy input (on stdin), again as a total amount and an average rate. The "P:" column similarly indicates PRNG throughput: the number of random bytes generated by the PRNG. The "T:" column simply indicates elapsed time in seconds.

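The final statistics at the end are printed on four lines:

  1. Read: the number of bytes read from stdin, the number of reads, and the average read rate.
  2. Write: the number of bytes written to stdout, the number of writes, and the average write rate.
  3. Entropy: the number of entropy bytes stirred into the PRNG, the number of stirs, the average rate, and the average number of bytes per stir.
  4. PRNG: the number of bytes generated by the PRNG, the number of cycles, the average rate, and the total elapsed time.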
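Dividing the Write total by the PRNG total gives a rough indicator of how much of the output is repeated data. The sketch below does that division for the sample run shown earlier; the helper name repeat_factor is an assumption, and the ratio is only an approximation because the initial buffer contents are also written out.

```shell
# repeat_factor WRITE_BYTES PRNG_BYTES
# Prints how many bytes were written per byte generated by the PRNG.
repeat_factor() {
    awk -v w="$1" -v p="$2" 'BEGIN { printf "%.2f\n", w / p }'
}

# Totals taken from the Write: and PRNG: lines of the sample run.
repeat_factor 500006912 78643200   # prints 6.36
```

A value close to 1 would mean the PRNG is keeping up with the disk. Larger values mean more of the output is repeated buffer contents.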
