Key Ideas
Key Idea #0: UNDERSTAND THE ALGORITHM
After helping students with this class for a year, I’ve found that the biggest difference between the students that ‘get it’ and the students that ‘don’t’ is that those who get it understand the algorithm. You’re never going to be able to debug your code if you don’t know what the intermediate steps within the algorithm are.
For example: We’re going to go over a lot of different sorting algorithms. Obviously we know our input is going to be some randomly arranged set of numbers and that our output is going to be the same set of numbers, but in increasing/decreasing order. I’ve seen people just straight up try to follow the pseudocode in the textbook without understanding how it works and then wonder why their C++ code doesn’t work.
WHITEBOARD. NOTEBOOK THE STEPS OUT. BUY A DECK OF CARDS.
Some of my friends that come to my sessions laugh every time I bring out my deck of cards. (BTW, cards laid out on a table look the same from both sides, but the left-to-right order is mirrored for the person sitting across from you. So when you’re demonstrating an algorithm with a deck of cards, they see the exact same thing, but the output is in decreasing order.)
Insertion Sort
(Geez, that gif took so long to make..) Insertion sort works in the same way that you might sort a hand of cards. We start at the left and consider the first card as our ‘sorted’ section. As we consider the next card, we determine where it should go relative to that first card. For the card after that, we determine where it belongs relative to the first two cards. As this continues, the left side of our hand is sorted before we consider the next card in the right side of our hand. Here is the pseudocode:
\[{ \text{INSERTION-SORT(}A\text{)}\\ 1 \quad\textbf{for } j = 2 \textbf{ to } A.length\\ 2 \quad\quad key = A[j]\\ 3 \quad\quad\text{// Insert A[j] into the sorted sequence A[1...j-1].}\\ 4 \quad\quad i = j - 1\\ 5 \quad\quad \textbf{while } i > 0 \text{ and } A[i] > key\\ 6 \quad\quad\quad A[i+1] = A[i]\\ 7 \quad\quad\quad i = i - 1\\ 8 \quad\quad A[i+1] = key\\ }\]Cool. It works for this input of 2, 3, 4, 5, 6. But how do we know that this algorithm is correct–that it works for any given input \(X\) ?
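If you want to play with it in real code, here is a minimal C++ sketch of the pseudocode above (the function name is mine; also note that C++ arrays are 0-indexed, so the pseudocode’s \(j = 2\) becomes `j = 1`):

```cpp
#include <iostream>
#include <vector>

// A minimal sketch of INSERTION-SORT in C++. The pseudocode is 1-indexed;
// std::vector is 0-indexed, so the outer loop starts at j = 1 instead of 2.
void insertion_sort(std::vector<int>& A) {
    for (std::size_t j = 1; j < A.size(); ++j) {
        int key = A[j];
        // Insert A[j] into the sorted sequence A[0...j-1].
        int i = static_cast<int>(j) - 1;
        while (i >= 0 && A[i] > key) {
            A[i + 1] = A[i];  // shift the larger element one slot right
            --i;
        }
        A[i + 1] = key;
    }
}

int main() {
    std::vector<int> A = {6, 5, 4, 2, 3};
    insertion_sort(A);
    for (int x : A) std::cout << x << ' ';  // prints: 2 3 4 5 6
}
```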
Key Idea #1: Correctness of an Algorithm
We know that the process of insertion sort works on this input, but we want to extrapolate the idea to larger \(N\) (the size of the array). Sounds like some form of induction! Loop invariants go hand in hand with mathematical induction: we want to prove a base case and an inductive step. In this case, we want to prove that the above pseudocode for insertion sort gives the correct output for any input.
Three Parts of Loop Invariants
Every (correct) loop invariant needs to satisfy the following properties (which follow the steps of induction):
- Initialization: The statement is true, prior to the first iteration of the loop.
- Maintenance: If it is true before an iteration of a loop, it remains true before the next iteration. Each iteration maintains the loop invariant.
- Termination: When the loop terminates, the loop invariant gives us a useful property that helps show that the algorithm is correct.
In other words: if this is a test problem, use English as much as you can. Put your thoughts into words and try to explain how your pseudocode helps solve the problem at hand, whether it is a sorting algorithm or whatever.
Example for Insertion Sort. Essentially my description of insertion sort acts as the loop invariant, in an extremely informal way. Once I put it into more concise English, it is easier to translate it into notation: “At the beginning of each iteration, the left side of the array is sorted.” Note that the loop begins at \(\text{j}=2\) (second element, 1-indexed).
But what is the left side of the array? Well, we know that any array with just one element in it is sorted. So if we treat the first element of our \(\text{[6, 5, 4, 2, 3]}\) array as its own subarray \(\text{[6]}\), that subarray is already sorted. We can denote this as \(\text{A[1...j-1]}\).
So a loop invariant to prove the correctness of insertion sort might look like: At the start of each iteration of the for loop of lines 1-8, the subarray \(\text{A[1...j-1]}\) consists of the elements originally in \(\text{A[1...j-1]}\), but in sorted order. Translation: at the start of each iteration of the for loop, the subarray/left side of our hand is already sorted.
Now we need to prove that the properties of the loop invariants hold:
- Initialization: Check the statement before the for loop begins. When \(\text{j=2}\), our loop invariant says that our subarray \(\text{A[1...2-1]} = \text{A[1]}\) is sorted. Since any array of size 1 is sorted, our ‘base case,’ the initialization step, holds.
- Maintenance: The pseudocode moves \(\text{A[j-1], A[j-2], A[j-3] ...}\) to the right until we find the right position for \(\text{A[j]}\) (in the GIF, I swap them until we find the right position for the original \(\text{A[j]}\)). Once we increment \(\text{j}\), the subarray \(\text{A[1...j-1]}\) is still sorted, so our loop invariant satisfies the maintenance property.
- Termination: Essentially, if we keep incrementing our \(\text{j}\), we will eventually come to \(\text{j=N+1}\). At that point, the loop terminates and the subarray we consider is \(\text{A[1...N]}\), the entire array. Our loop invariant says that this subarray is sorted, and therefore, we have sorted our array, proving the correctness of this pseudocode.
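If the induction doesn’t fully convince you, you can also check the invariant empirically. Here’s a debugging sketch (the asserts are mine, not part of the algorithm) that verifies “A[0...j-1] is sorted” at the start of every iteration, and the whole array at termination:

```cpp
#include <cassert>
#include <vector>

// Debugging sketch: insertion sort with the loop invariant asserted at the
// start of every iteration of the for loop (0-indexed, so the invariant
// reads "A[0...j-1] is sorted").
void insertion_sort_checked(std::vector<int>& A) {
    for (std::size_t j = 1; j < A.size(); ++j) {
        // Initialization/Maintenance: A[0...j-1] must be sorted here.
        for (std::size_t k = 1; k < j; ++k)
            assert(A[k - 1] <= A[k]);

        int key = A[j];
        int i = static_cast<int>(j) - 1;
        while (i >= 0 && A[i] > key) {
            A[i + 1] = A[i];
            --i;
        }
        A[i + 1] = key;
    }
    // Termination: j == A.size(), so the invariant covers the whole array.
    for (std::size_t k = 1; k < A.size(); ++k)
        assert(A[k - 1] <= A[k]);
}
```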
Key Idea #2: Analyzing an Algorithm
We study and analyze algorithms because we want to be able to predict how much memory or time running them will take. Obviously, we expect that sorting a 999999999999999-element array will take longer than sorting a 3-element array–but how much longer?
There will be cases where we will consider the space an algorithm takes, but we will primarily focus on the running time. Each computing machine will take a different time to run a single operation, so we don’t want to consider that in our analysis. Instead, we say that running time is going to be how many steps the algorithm takes to complete, which is dependent on the size of an input–how many numbers we want to sort. Consider the linear search problem, which returns the index of a key if it exists in an array:
\[{ \begin{array}{lll} \text{LINEAR-SEARCH(A, key)} & \text{cost} & \text{times}\\ 1\quad\text{index} = NULL & c_1 & 1\\ 2\quad\textbf{for } i = 1 \textbf{ to } A.length & c_2 & n + 1\\ 3\quad\quad \text{if } A[i] = key & c_3 & n\\ 4\quad\quad\quad \text{index} = i & c_4 & 1\\ 5\quad\quad\quad \textbf{break} & c_5 & 1\\ 6\quad\textbf{return }\text{index} & c_6 & 1 \end{array} }\]We start with our condition that \(i = 1\). If you consider the way a for loop works, we check the condition at the beginning of each iteration of the loop, meaning that in the worst case (the key isn’t in the array) we would check that condition \(n + 1\) times. As mentioned before, we don’t want to consider how long it takes for a machine to run a certain check. So we say that each line takes some constant time \(c_k\) to run, and count the number of times it runs; the condition on line 2, for example, runs \(n + 1\) times. At the \((n + 1)\)th check of the condition, we would exit, because our \(i\) would be bigger than \(n\). (Lines 4-5 run at most once, since we break as soon as we find the key.) To find the total running time of this pseudocode, we can just add it all up:
\[{ T(n) = c_1 + c_2(n + 1) + c_3(n) + c_4 + c_5 + c_6\\ \quad= c_1 + c_2n + c_2 + c_3n + c_3 + c_4 + c_5 + c_6\\ \quad= an + b }\]Essentially, the total running time is some linear function, dependent on \(n\)
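To make the counting concrete, here is a C++ sketch of LINEAR-SEARCH with a comparison counter bolted on (the counter and the names are just for illustration, and -1 stands in for the pseudocode’s NULL):

```cpp
#include <iostream>
#include <vector>

// A C++ sketch of LINEAR-SEARCH. The comparisons counter tallies how many
// times A[i] is compared with key (the c3 row in the table above); it is
// only there to make the step counting concrete.
int linear_search(const std::vector<int>& A, int key, long& comparisons) {
    int index = -1;  // -1 stands in for the pseudocode's NULL
    for (std::size_t i = 0; i < A.size(); ++i) {
        ++comparisons;
        if (A[i] == key) {
            index = static_cast<int>(i);
            break;  // lines 4-5: run at most once
        }
    }
    return index;
}

int main() {
    std::vector<int> A = {6, 5, 4, 2, 3};
    long comparisons = 0;
    linear_search(A, 42, comparisons);  // worst case: key not present
    std::cout << comparisons << '\n';   // prints 5, i.e. n comparisons
}
```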
So let’s go through the insertion sort example because things get a little wonky when there’s a double loop:
\[{ \begin{array}{lll} \text{INSERTION-SORT(}A\text{)} & \text{cost} & \text{times}\\ 1 \quad\textbf{for } j = 2 \textbf{ to } A.length & c_1 & n\\ 2 \quad\quad key = A[j] & c_2 & n - 1\\ 3 \quad\quad\text{// Insert A[j] into the sorted sequence A[1...j-1].}\\ 4 \quad\quad i = j - 1 & c_3 & n - 1\\ 5 \quad\quad \textbf{while } i > 0 \text{ and } A[i] > key & c_4 & \sum_{j=2}^{n}{t_j}\\ 6 \quad\quad\quad A[i+1] = A[i] & c_5 & \sum_{j=2}^{n}{(t_j-1)}\\ 7 \quad\quad\quad i = i - 1 & c_6 & \sum_{j=2}^{n}{(t_j-1)}\\ 8 \quad\quad A[i+1] = key & c_7 & n - 1\\ \end{array} }\]This time around, we start at \(j = 2\), so instead of checking the for loop condition \(n + 1\) times, we only check it \(n\) times, and the body of the for loop runs \(n - 1\) times. But if we look at lines 5-7, we can see this awkward \(t_j\): here, \(t_j\) is the number of times the while loop condition on line 5 is checked for that particular value of \(j\). It depends on what your input actually is: is it in decreasing order? Is it already sorted? Something in between? These are the situations we consider as the worst case, best case, and average case, which is obviously situational for the problem we are solving.
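Putting the two columns together (each line’s cost multiplied by the number of times that line runs), the general running time is:
\[{ T(n) = c_1n + c_2(n-1) + c_3(n-1) + c_4\sum_{j=2}^{n}{t_j} + c_5\sum_{j=2}^{n}{(t_j-1)} + c_6\sum_{j=2}^{n}{(t_j-1)} + c_7(n-1) }\]All that changes between the best, worst, and average cases is the value of \(t_j\).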
Let’s first consider the best case situation for insertion sort, when the input array is already sorted. We would check line 5 once per iteration of the for loop (\(n - 1\) times in total), because the while loop terminates immediately: the condition \(A[i] > key\) is false right off the bat, so \(t_j = 1\) for every \(j\). Lines 6-7 never run. Thus, the running time for the best case situation would be a linear function dependent on \(n\).
\[{ T(n) = c_1n + c_2(n-1) + c_3(n-1) + c_4(n-1) + c_5(0) + c_6(0) + c_7(n-1)\\ \quad= \text{algebra stuff}\\ \quad= an + b }\]But what about the worst case situation? This is a little more complicated, but we can still think through how many times the while loop runs. If we think about it, \(j\) essentially describes how many elements we have considered so far: the sorted subarray \(\text{A[1...j-1]}\) plus the new key. In the worst case (decreasing order), every new key has to travel all the way to the first index, so the while condition succeeds \(j - 1\) times (once for each element we shift right) and fails on the final check when \(i\) reaches 0, for \(j\) checks in total. Thus, we can say that \(t_j = j\).
Math tells us that \({\sum_{i=1}^{n}{i} = \frac{n(n+1)}{2}}\) (but if you don’t believe me, use induction) (or the Appendix). Since our sums start at \(j = 2\), this gives us \({\sum_{j=2}^{n}{j} = \frac{n(n+1)}{2} - 1}\) and \({\sum_{j=2}^{n}{(j-1)} = \frac{n(n-1)}{2}}\). Using these, we can then boil the running time down to:
\[{ T(n) = c_1n + c_2(n-1) + c_3(n-1) + c_4(\frac{n(n+1)}{2} - 1) + c_5(\frac{n(n-1)}{2}) + c_6(\frac{n(n-1)}{2}) + c_7(n-1)\\ \quad= \text{algebra stuff}\\ \quad= an^2+bn+c }\]which tells us that the worst case running time of insertion sort is going to be some quadratic function dependent on \(n\).
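If you want to see the gap on your own machine, here is a rough benchmark sketch (all names are mine, and absolute times will vary by machine, which is exactly why the analysis uses abstract constants \(c_k\)). It times the insertion sort sketch from earlier on an already-sorted input versus a reverse-sorted one:

```cpp
#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <vector>

// Rough benchmark sketch: insertion sort on a sorted input (best case,
// roughly linear) vs. a reverse-sorted input (worst case, roughly quadratic).
void insertion_sort(std::vector<int>& A) {
    for (std::size_t j = 1; j < A.size(); ++j) {
        int key = A[j];
        int i = static_cast<int>(j) - 1;
        while (i >= 0 && A[i] > key) {
            A[i + 1] = A[i];
            --i;
        }
        A[i + 1] = key;
    }
}

// Takes the vector by value so each run sorts a fresh copy.
double time_sort_ms(std::vector<int> A) {
    auto start = std::chrono::steady_clock::now();
    insertion_sort(A);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main() {
    const int n = 20000;
    std::vector<int> sorted(n), reversed(n);
    std::iota(sorted.begin(), sorted.end(), 0);  // 0, 1, ..., n-1
    std::reverse_copy(sorted.begin(), sorted.end(), reversed.begin());

    std::cout << "best case (sorted):    " << time_sort_ms(sorted) << " ms\n";
    std::cout << "worst case (reversed): " << time_sort_ms(reversed) << " ms\n";
}
```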
Generally, we want to use the worst case when analyzing algorithms: it gives us an upper bound, so even when the worst case happens, we know the algorithm will take no longer than what we concluded. So for any input of size \(n\), insertion sort will take at most \(an^2 + bn + c\) time. Lower orders of growth are better in the world of algorithms. Especially nowadays, with big data, we want to assume that our input size is going to be very large, that \(n\) will approach infinity. This sets the stage for our next idea: order of growth!