Eric Hennigan had recently pitched a new partitioning problem to ACM: partitioning his discussion sections among two TAs such that students are equally distributed to each TA. Although the problem may be trivial to do by hand, it's easy to decompose into discrete mathematics, and therefore, easy to analyze.

Before reading further, beware: this post is highly dependent on knowledge of discrete mathematics (most importantly, sets). Although the concept is simple, please refer to an authoritative encyclopedia such as Wolfram Mathworld regarding unfamiliar ideas. I will highlight such esoteric ideas with emphasis.

Discussion Section Partioning Problem

Since we can rewrite this problem in terms of discrete mathematics, we will begin by decomposing the problem into sets, the fundamental structure for homogenous data.

You are given a set, $S$, of $n$ discussion sections where each element, $x_i$, represents the number of students in section number $i$. Create two partitions $A$ and $B$ such that the sum of their elements is minimized.

As an example, Let $S$ be

which can be a set of five discussion sections with three, four, five, six and eight students respectively. By staring at this list for a while, it becomes ostensible that we should split the set into

such that their sums are equal. In regards to the problem domain, TA $A$ receives the discussion sections with five and eight students while TA $B$ receives the complementary sections. Consequently, the difference of the sums of their elements are minimized.

Decomposition into Optimization

To define this as an optimization problem, we would like to select a subset of S in the power set of $S$ that minimizes difference in the sum of the subset's elements and the sum of its complement's elements.

Brute Force Solution

Intuitively, we can simply iterate over all subsets of $S$ and try all possibilities in

where $n$ is the number of elements of the set.

This proof is trivial because there are 2n subsets for any set including the empty set (see the cardinality of the power set). Of course, this also means that the algorithm would run in exponential-time which is ridiculously slow (for arbitrarily large sets).

We can attempt to do better.

Dynamic Programming Solution

First, we will take a look at how the we can break up this problem into subproblems. Then, we will see how these subproblems are related and exploit the relationship to develop an efficient solution. Finally, I will provide an example (in case you've become lost in the theory).

I have written about Dynamic Programming before: See my dynamic programming solution for the Josephus Problem if you haven't already.

Subproblem Decomposition

We can begin by reformulating this problem into another familiar problem: the knapsack problem which seeks to find the subset of a set of items with the maximum value given a bag of finite size. To create a mapping from our problem to knapsack, we must think of this backwards.

We must let let the sum of the elements of either partition be equivalent to the bag of finite size. Next, we must determine the maximum bag capacity (or the maximum sum) in our case. Recall that difference is absolutely minimum when the sum of the elements in the partitions are equal. That is, when a partition's sum of elements is equal to exactly half the sum of the elements of the original set:

Subsequently, at each bag size, we can determine the set of elements that are closest to the desired sum.

Of course, with every dynamic programming solution, we must define boundary cases. Thus, we let $V[i]$ be the solution for a sum of $i$. The maximum sum, $i$, should be within the range:

It is trivial then that the maximum sum for the lowerbound is equal to itself:

From here, we define our recurrence relationship and realize that the maximum sum for a current solution, $i$ is an element from the original set, $x$ (that has not yet been selected), summed with a previous solution $V[i-x]$:

Thus, the best fit sum of a partition lay in $V[max_sum]$ where $V[max_sum]$ is half sum of elements of the original set.

Example Application

In the case I have lost you in theory, here is the promised example using the previously used set.

The maximum sum should be

We apply dynamic programming between the range of three and thirteen. Starting from the bottom, we will begin applying our recurrence relationship:

There are no elements better than themselves up to six; subsequently, their solutions are equivalent to the set of themselves.

The next subproblem has a solution which fits with with no duplicates.

The above subproblems continue normally until $V[12]$ is reached.

Although ${3, V[9]}$ comes before ${4, V[8]}$ lexicographically, it does not work because $3$ would be a duplicate in $V[9]$'s solution set.

Finally, we acquire one of two partitions for the solution; the other can be inferred as the complement of this. That is, the other can be calculated by

Consequently, we have the partitions $V[13]$ and $B$ which are the solutions to our problem. That is, if we refer back to the context, TA $A$ should be assigned discussion sections with three, four and six students and TA $B$ should be assigned discussion sections with five and eight students. Because both TAs have thirteen students each, their partitions have the property of minimum difference.

Analysis of Dynamic Programming Solution

Recall that the number of iterations is equivalent to half of the sum,

Each iteration is a subproblem that finds an element in the set which best fits a previous subproblem solution. The time-complexity of each subproblem is thus

which can be implemented with set operations. Recall that the solution must have distinct objects and therefore each element must be checked if they already exist in the solution i.e. find() before the element can be added to the solution set, union().

Hash tables can be used for constant-time operations; however, it can also be memory-intensive. A better solution could have amortized constant-time operations with linear space.

The Union-Find data structure requires that at least one operations of find() and union() must have a logarithmic running time; the latter operation can be constant-time. However, Disjoint Set Forests enable the find() operation to be constant-time, amortized using path compression.

By multiplying the number of iterations with the cost of each iteration, we achieve the final running time:

Thus, I have demonstrated.

Conclusions

Although a dynamic programming solution to this problem exists, it may sometimes be infeasible for arbitrarily large values of the half-sum. Consider the canonical example above. The running time of the dynamic programming solution is 65 time units whereas the power-set solution requires only 32 time units.

Making the decision of selecting either the dynamic programming solution or the power-set solution boils down to solving

Overall, this is a great problem to practice algorithm design and analysis because it can be easily decomposed into discrete mathematics. There may be better solutions to this problem, and if you find one, please inform me.