Task 1: RL4MaxCut¶

RL4Maxcut is a task that challenges you to build agents capable of solving the MaxCut problem on unseen graphs drawn from diverse distributions, while surpassing industry-level MIP solvers in performance and scalability. Your mission: train agents to effectively explore the vast discrete space of graph partitions and generalize beyond the training set to new, previously unseen graph instances. This task emphasizes distribution-wise generalization — encouraging the development of agents that learn robust, transferable policies applicable to a wide range of graph types (e.g., ER, BA, PowerLaw), rather than overfitting to specific training graphs.

Task Overview¶

In this task, participants are invited to develop MaxCut agents that can produce high-quality solutions and improve the scalability and generalization of reinforcement learning algorithms. We provide a baseline solver, and invite participants to match or surpass its performance — particularly on previously unseen graphs drawn from diverse distributions.

Participants are encouraged to explore a wide range of approaches, including but not limited to:

Improve sampling strategies, such as enhancing solution diversity, improving rollout efficiency, or reducing memory usage.
Apply post-processing filters, such as Local Search or Simulated Annealing, to refine the output from RL agents.
Explore and innovate RL algorithms, such as S2V-DQN, ECO-DQN.
Design a Curriculum Learning schedule to gradually increase problem complexity and accelerate convergence.

Participants are encouraged to propose creative enhancements and hybrid methods that further advance the ability of RL agents to generalize to new and challenging MaxCut instances across different graph distributions.

Datasets¶

We provide a challenging dataset of graphs curated to evaluate the distribution-wise generalization of MaxCut agents. These graphs vary widely in both structural properties and graph sizes, encouraging models to learn transferable policies rather than overfit to specific instances.

Starter Kit¶

This starter kit includes training scripts and environment files for the Maxcut problem. Follow the instructions below to get started.

Commands¶

To run the various methods, follow the commands listed below:

Method	Description	Command
MCPG	Monte Carlo Policy Gradient	N/A
ECO-DQN	Exploratory Combinatorial Optimization Deep Q-Network	N/A
Gurobi	Mixed Integer Programming	N/A

Benchmark ¶

Baseline Solvers:

Gurobi: A mixed-integer programming solver that identifies optimal solutions given an objective function, typically by applying a branch-and-cut algorithm.

RL Methods:

MCPG: Parallel MCMC sampling and a filter scheme to replace the objective function with one with a local search technique. [7]
ECO-DQN: Reinforcement learning method for MaxCut that trains on a distribution of graphs, using a graph neural network to embed states and learn Q-values that guide partition decisions.

Metrics:

Each solution is evaluated based on the cut size, which is the total weight of edges that are cut by the current node partition. The goal is to maximize the cut value—i.e., find the node partition that produces the maximum total weight across the cut. This is equivalent to finding the optimal solution to the MaxCut problem, and serves as the benchmark for comparing different solvers and algorithms.

References