Logo
Published on
views

MHSCTF 2023: Edmonds’ Blossom Algorithm for Maximum Weighted Matching

Authors
  • Name
    enscribe
    Github

Intro

I was recently invited by the academic team "DN" (the name, surprisingly, has no inappropriate connotation) to compete in Mentor High School's second CTF iteration, MHSCTF 2023. Although the competition ran for 15 days, we maxed out their 11 challenges in just under 16 hours (ignoring solve resets) and managed to take first place. This writeup was part of the verification process, which came with prize-receiving — enjoy!


Matchmaker

solvers:flocto
enscribe author: 0xmmalik
points: 9
category: prog
I've just had the most brilliant idea 😮 I want to write a program that takes all the students and how much they like each other to pair them up so I can maximize the total love in the classroom! Of course, when I say "I," I really mean... "you" 😉
Notes: [SEE BELOW]
nc 0.cloud.chals.io [PORT]
Warning: Discrete math, graph theory, and combinatorial optimization ahead! If you're unfamiliar with the mathematical symbols used in this writeup, reference this dropdown:
Complicated Math Symbols
  • \sum - Summation
  • \in - Element of
  • \notin - Not an element of
  • \subset - Proper subset of
  • \subseteq - Subset of
  • \emptyset - Empty set
  • \forall - For all
  • \exists - There exists
  • \nexists - There does not exist
  • \in - Element of
  • \notin - Not an element of
  • \ni - Contains as member
  • ∌\not\ni - Does not contain as member
  • \setminus - Set minus (drop)
  • \oplus - Direct sum
  • \cup - Union
  • \cap - Intersection
  • xx' - Prime

First blood! Here are the notes the author provided alongside the challenge description, abridged for brevity:

  • The connection times out after 60 seconds, and there will be 3 iterations.
  • The input will be given in NN lines, where NN represents the number of students. The first line represents the zeroth student, the second line represents the first student, and so on (50<N<10050 < N < 100, Nmod2=0N \bmod 2 = 0).
  • Each line of input contains N1N - 1 integers RR (ranged 0<R<1000 < R < 100, inclusive); RR represents a student's rating of another student at that index, repeated for everybody but themselves.

I've cut the notes provided in half to make it a bit more digestable. Before we continue, let's see some sample input from the server for context:

Grabbing Sample Input

We can do a bit of analysis on what we've received so far.

The first line, 86 60 67..., can be translated into:

  • Student 0 -> Student 1 = 86
  • Student 0 -> Student 2 = 60
  • Student 0 -> Student 3 = 67

Let's do the same thing for the second line, 76 0 74...:

  • Student 1 -> Student 0 = 76
  • Student 1 -> Student 2 = 0
  • Student 1 -> Student 3 = 74

Notice how the student will always skip their own index, which aligns with the author's notes detailing how the integers "represent the students ratings of everybody but themselves." Let's continue with the rest of the notes:

  • Determine the pairings that maximize the students' ratings for each other
  • Example: If Student 1 -> Student 7 = 98 and Student 7 -> Student 1 = 87, and Students 1 and 7 are paired, the score of this pairing would be 98+87=18598 + 87 = 185
  • The output should list all maximized pairs (including duplicates; order of the pairs does not matter)
  • Example: If Student 0 -> Student 3, Student 1 -> Student 2, and Student 4 -> Student 5, the desired output is: 0,3;1,2;2,1;3,0;4,5;5,4
  • The connection will close if the output is incorrect, and reiterate if correct

Let's crack on with the actual algorithm which will be used for solving this!

Graph Theory Fundamentals

This challenge is a classic example of a concept called "maximum weight matching," which is fundamental in graph theory and discrete mathematics. Although there is a relatively high prerequisite knowledge floor for understanding these concepts, I'll explain them as best as I can.

Firstly, let's define a graph. A graph is a set of vertices (or nodes), which are connected by edges. Not to be confused with the Cartesian coordinate system, graphs are represented by the ordered pair G=(V, E)G = (V,\ E) in which VV is a set of vertices and EE is a set of edges:

Definition: A set of edges EE is defined as E(x,y)  (x,y)V2 and xyE \subset \\{(x, y)\ |\ (x, y) \in V^2 \textrm{ and } x \neq y\\}. By this definition, xx and yy are the vertices that are connected to the edge ee, called the endpoints. It can also be said that ee is incident to xx and yy.

We can create a matching MEM \subseteq E within the graph GG, in which MM is a collection of edges such that every vertex vv of VV is incident to at most one edge of MM (meaning two edges cannot share the same vertex). When the highest possible cardinality (number of matches) of GG is reached, the matching is considered maximum, and any vertex vv not incident to any edge in MM is considered exposed.

Definition: Formally, a matching MM is said to be maximum if for any other matching MM', MM|M| \geq |M'|. In other words, MM is the largest matching possible.

For example, the following is a maximum matching performed on the graph above:

Definition: Since there is an exposed vertex in this graph (and because the size of VV is odd), GG is not considered perfect. A perfect maximum matching occurs when the size of VV is even (or V/2=E|V|/2 = |E|) and there are no exposed vertices.

Now, let's put weights into consideration (i.e. the students' ratings). With a weighted graph G=(V, E, w)G = (V,\ E,\ w), we can attribute a function ww that assigns a weight w(e)w(e) to each edge eEe \in E (defining w:ENw : E \rightarrow \mathbb{N}):

The total weight of a matching MM, written as w(M)w(M), can be defined as:

w(M)=eMw(e)w(M) = \sum_{e \in M}w(e) \\\\

Solving for w(M)w(M) in the example graph above:

w(M)=w((1,2))+w((3,4))+w((5,6))+w((8,9))w(M)=5+2+9+6=22w(M) = w((1, 2)) + w((3, 4)) + w((5, 6)) + w((8, 9)) \\\\ w(M) = 5 + 2 + 9 + 6 = 22

Our goal is to maximize w(M)w(M) — it is definitely not maximized above, since W(M)W(M) is not at its highest possible value. We will be tackling it with a combination of two different concepts: Edmonds' blossom algorithm, and the primal-dual method. Although the blossom algorithm is typically meant for maximum cardinality matching (maximizing the size of MM itself, and running in O(EV2)\mathcal{O}(|E||V|^2) time), utilizing it as a subroutine alongside the primal-dual method of linear programming creates a maximum weight matching (and running in O(V3)\mathcal{O}(|V|^3) time).

Firstly, let's get started with how it works.

The Blossom Algorithm

The core idea behind the blossom algorithm itself involves two concepts: augmenting paths and blossoms (alongside blossom contraction).

Augmenting Paths

Given a graph G=(V, E)G = (V,\ E), a path PP in GG can be considered an alternating path if edges in the path are alternatively in MM and not in MM. Augmenting paths are a type of alternating, odd-length path that starts and ends with two exposed vertices:

As we can see, PP is considered an augmenting path because it ends with two exposed vertices. Augmenting paths are special in that they can augment ("to improve", as per the name) the size of MM by one edge. To do so, simply swap the edges in PP that are in MM with the edges that are not in MM:

Definition: A matching augmentation along an augmenting path PP is a process by which MM is replaced by MM', defined as such:
M=MP=(MP)(PM)M' = M \oplus P = (M \setminus P) \cup (P \setminus M)
This implies that MP=M+1|M \oplus P| = |M| + 1.

The reason why augmenting paths are so fundamental to the blossom algorithm is that it directly computes a maximum matching — we can reiterate the process of finding augmenting paths until no more are available in GG. This is described well with Berge's Theorem:

Theorem: A matching MM in a graph GG is maximum if and only if there is no MM-augmenting path in GG.

Here is high-level pseudocode which describes this recursive iteration:

Note: Under the context of the above pseudocode, find_augmenting_path() will always start at an exposed vertex, running a breadth-first search to find an augmenting path.

Therefore, the blossom algorithm will always terminate with a maximum matching.

Blossoms (and Blossom Contraction)

The second concept that we need to understand are blossoms. Given a graph GG and a matching MM, a blossom BB is a "cycle" within GG which consists of 2k+12k + 1 edges, of which exactly kk belong to MM (a blossom with two matches would have 2(2)+1=52(2) + 1 = 5 edges).

Let's say the algorithm created matchings in a graph with a blossom — although it isn't the maximum matching possible, we cannot augment it any further because the technically available augmenting path is longer than the shortest available path, therefore terminating subotimally:

This is where blossom contraction comes in. The idea is to contract the blossom BB into a single "super-vertex" vBv_B, and to treat it as a single vertex in the graph:

From there, we can find valid augmenting paths to augment MM:

Finally, we can expand the blossom BB back into its original form, to reveal the maximum matching:

Theorem: If GG' a the graph formed after contraction of a blossom BB in GG, then GG has an augmenting path if and only if GG' has an augmenting path.
(This is called the Blossoms Theorem, or the Main Theorem of Botany)

With the fundamental concepts of this beautiful algorithm covered, the only thing we need to wrap our heads around is how to adopt this with weighted graphs. I'll be referencing Galil's Efficient Algorithms for Finding Maximum Matching in Graphs for this section.

Primal-Dual Method

So... the primal-dual method. I've been personally studying this topic (alongside linear programming in general, from which this method is derived from) for the past couple days after I solved the challenge, and let's just say that I absolutely have no clue what is going on:

Like any other reasonable person would do if they were stuck, I asked ChatGPT-4 about how it worked. Here's what it had to say regarding the process:

Warning: The following text is generated by AI, and may not be 100% accurate. Please refer to the aforementioned paper for more information.
Also note that this information was generated after our team solved the challenge, during the writeup process — the AI was not involved in the process of solving it.
User Profile Picture

Explain the primal-dual method to someone completely new to algorithms and graph theory.

ChatGPT Profile Picture

Alright! Let's start with a simple analogy before diving into the specifics of the primal-dual method:

Analogy: Shopping on a Budget

Imagine you are shopping for fruits with a limited budget. You want to buy as much fruit as possible without exceeding your budget. On the other side, the shopkeeper wants to sell the fruits at a price where he can earn the most profit without losing customers. In this situation:

  • Your goal of maximizing the amount of fruit represents the "primal" problem.
  • The shopkeeper's goal of pricing the fruits represents the "dual" problem.

In an ideal world, the amount you are willing to pay for each fruit (based on your budget and desire) would match exactly with the prices the shopkeeper sets, resulting in the most efficient shopping transaction for both of you.

Now, translating this analogy to the realm of algorithms:

Primal-Dual Method

The primal-dual method is an approach used primarily in optimization problems. In many of these problems, you have a primary (or "primal") problem that you want to solve, but there's also a related "dual" problem that offers a different perspective. Often, by solving the dual problem, we can gain insights into the primal problem or vice versa.

  1. Primal Problem: This is the original problem you want to solve. For example, it might be about finding the shortest path in a graph or maximizing some kind of profit.

  2. Dual Problem: This is a related problem that is derived from the primal problem. The solution to the dual problem often gives a bound or limit to the solution of the primal problem. If the solutions to the primal and dual problems are the same, we have an optimal solution.

The primal-dual method involves:

  • Starting with an initial feasible solution to the dual problem.
  • Using this solution to guide the search for a feasible solution to the primal problem.
  • Repeatedly updating both solutions until they converge to an optimal solution.

Why is it useful?

The power of the primal-dual method lies in the fact that while the primal problem might be hard to solve directly, its dual problem might be easier. Or, solving the dual can give us insights, bounds, or approximations to the primal solution.

Connection to Graph Theory: In graph theory, there are many problems (like network flow problems) where the primal might represent a flow configuration on the graph, and the dual might represent some sort of cost or capacity constraint. The primal-dual method can be applied to get efficient solutions for such problems.

Conclusion: Just like in our shopping analogy, where both you and the shopkeeper are trying to optimize your outcomes (with buying fruits and pricing them), in algorithms, the primal and dual problems offer two sides of the same coin. Solving or approximating one can give us valuable information about the other!

User Profile Picture

How can you combine the Blossom algorithm and the primal-dual method together to create maximum weighted matchings?

ChatGPT Profile Picture

To understand how the Blossom algorithm and the primal-dual method can be combined to create maximum weighted matchings, we first need to briefly review each concept:

  1. Blossom Algorithm: The Blossom algorithm is primarily used to find a maximum cardinality matching in a general graph (i.e., not just bipartite). A "blossom" in this context refers to an odd cycle in the graph, where the cycle has a particular structure that can be "shrunk" into a single vertex, simplifying the graph and allowing for more matchings to be found.

  2. Primal-Dual Method: As we previously discussed, the primal-dual method is a technique in optimization where we deal with a primal problem and its associated dual problem. In the context of weighted matchings, the primal problem is maximizing the weight of the matching, and the dual problem typically involves some cost or capacity constraints that give us a bound or limit to the solution of the primal problem.

Combining the Blossom Algorithm and the Primal-Dual Method:

When we deal with weighted graphs, we aim to find a matching that not only covers as many vertices as possible (maximal cardinality) but also maximizes the sum of the weights of the edges in the matching (maximum weight).

The procedure is as follows:

  1. Initialization:

    • Start with an empty matching.
    • Define a set of "dual variables" for each node. These represent the "prices" or weights we associate with each node. Initially, these can be set based on the weights of the edges connected to each node.
  2. Iterative Process:

    • Use the primal-dual method to adjust the dual variables, aiming to push the solution closer to an optimal one.
    • At each iteration, use the adjusted dual variables to modify the edge weights in the graph. This results in what's known as the "residual graph."
    • On the residual graph, look for augmenting paths (paths that can increase the size of the matching) using the Blossom algorithm. If an augmenting path is found, modify the current matching to include it.
    • If no augmenting paths are found, return to adjusting the dual variables and repeat.
  3. Termination:

    • The process terminates when no more augmenting paths can be found, and further adjustments to the dual variables do not lead to any improvements. At this point, the algorithm has converged to a maximum weighted matching.

In essence, the primal-dual method guides the search by adjusting the weights in the graph based on dual variables, while the Blossom algorithm helps find augmenting paths in the residual graph, which can be tricky due to the presence of blossoms.

By combining these two techniques, we can efficiently find a maximum weighted matching in general graphs, not just bipartite ones.

Take what you will from this information. I'll just sweep it under the rug right now, since we'll simply be "borrowing" an implementation of the above algorithm to solve this challenge. Let me know if you are knowledgable in this area and would like to contribute to this section!

"Borrowing" an Implementation

Yes, the blossom algorithm is the most elegant thing I've ever seen. No, I am absolutely never going to implement it from scratch. I'm not even going to try. Yes, it's super fun to visualize and understand the concepts, but could you even imagine trying to implement this in code, and deal with "neighbors" and "forests" and "visited nodes?" I can't.

Currently, we have tried two implementations which work for this challenge in Python3.9: van Rantwijk's, and the NetworkX Python library. Both run in O(n3)\mathcal{O}(n^3) time, which is still reasonable for our range of 50<N<10050 < N < 100, Nmod2=0N \bmod 2 = 0. Duan and Pettie overview an approximate method for maximum weight matching, which runs in linear time.

We'll begin by parsing the input.

Parsing Input

Let's start with a quick recap of what the netcat server gave us. We have NN lines, with NN representing the amount of students in the classroom (in addition to being an even amount). Each line will contain N1N - 1 integers ww representing the rating that student has for the individual at that index (accommodating for the skipped index representing themselves).

We can pass the input into a variable data with pwntools's recvuntilS method, which will receive data until it sees the specified string. We'll use b'> ' as the string to stop at, since that's the prompt the server gives us.

Once we have the data, let's convert it into something we can work with — we'll make a 2x2 matrix so we can access both the student and their rating of another. We should also insert a 0 at the index which the student skipped themselves so the matchings don't get screwed up:

Parsing Input

Let's do some premature optimization and type hinting because I'm a nerd:

Type Hinting

Testing the script:

Nice. Let's move on to the actual algorithm.

Maximum Weight Matching

This solve utilizes the NetworkX library's algorithms.matching.max_weight_matching function, which takes a NetworkX Graph class as input (the equivalent of GG) and returns a set of tuples (e.g. {(29, 14), (48, 21), (0, 39), (23, 19), ...}) representing the endpoints of the edges in MM.

We'll import networkx.algorithms.matching as matching for our blossom algorithm wrapper, and networkx as nx for our Graph class. In terms of weight, we need to convert the students' ratings into "biweights" (e.g. w(0,1)+w(1,0)w(0, 1) + w(1, 0)) because the ratings they have for each other aren't necessarily symmetric:

Each "biweight" now represents the total score of the match, which was mentioned in the author's notes. We'll be able to pass this into the max_weight_matching function as the weight parameter.

To test the algorithm, we'll use the following dummy input (note that inverse repetition is allowed):

In theory, the following script should pair the zeroth student with the third student, and the first student with the second:

Maximum Weight Matching (test case)

Testing the script:

The script works locally! Let's alter it to include the three iterations (alongside the correct formatting, e.g. 0,3;1,2;2,1;3,0;4,5;5,4) so that it can work with the remote server:

Final Script

Running the script:

We've successfully performed a maximum weight matching utilizing the blossom algorithm and the primal-dual method!

Afterword

Wow, this challenge was definitely a rabbit hole. Even though the author never actually intended for us to go this deep into the math behind it (and for me to skip out on my Calculus classes to learn graph theory and discrete math), I really don't like ignoring concepts (or in this case, a wrapper function) simply because their prerequisite knowledge floors are either too high or too intimidating. Obviously I was a lost sheep when I was initially researching the blossom algorithm (as this is my first algo challenge, ever), but I just love the feeling when you tear through all the layers abstractions and finally get to the juicy, low-level bits.

I'm glad that our team was able to get first blood one this one, and I'm glad this beautiful algorithm was the first one I've learned. I hope you enjoyed this writeup, and I hope you learned something new!

References