Randomization Algorithms 

What Are Randomization Algorithms

Randomization Algorithms, RAs for short, seem counterintuitive to what an algorithm should be. When we think of algorithms, we think of following the same steps over and over and getting the same output for a given input, this is known as deterministic algorithms. RAs are not deterministic algorithms. They are developed with some randomness inside them to achieve performance improvement over traditional algorithms that would produce the same, or close to the same, results. Notice that I said “close to the same” because when choosing to use RAs, you may need to accept that the output is not always 100% accurate. I’ll explain more on this below as there are two classes of RAs.

Why Use Randomization Algorithms

You now must be thinking, why would I ever choose to reach for RAs. What situation would I find myself in that I would be willing to accept inaccurate results when I can reach for other algorithms that would give me 100% accuracy. There are many complex reasons, but I will stick with two simple examples so we can get the picture.  

Say you send a file over to one of your friends. Then whenever you open the file, it needs to validate that it’s the same as your friend’s file. The Brute Force way would be to send over the whole file again and check it on their end. Then they would send you a message to say if it’s the current version. However, this would take a long time and cause a lot of overhead if the file was huge. Imagine that the file was 100GB. That would take a great deal of time, memory, and network overhead to repeatedly send that file. The RA approach would be to generate a value from a random sampling of the file and use that to compare with your friend’s file. If that value is not the same, we can undoubtedly say that the files are not the same version. If the values are the same, we can say with some confidence that the files are the same. Of course, you are trading off a significant overhead for 100% accuracy.

Another typical example would be Load Balancing. When a Load Balancer needs to decide on what server to send traffic to, it may reach a tie with whatever internal algorithm it has. So, in this case, we introduce some randomness for it to pick which server to send the traffic.

Types of Randomization Algorithms

There are two main classes of RAs, Las Vegas algorithms and Monte Carlo algorithms. Las Vegas algorithms always output the correct answer, but their run time may vary. The most prime examples of such algorithms are quicksort and quickselect. Monte Carlo algorithms have a constant run time but they might output an incorrect answer, typically with a probability less than O(1/n).