NICE: Robust Scheduling through Reinforcement Learning Guided Integer Programming
AAAI 2022
- Luke Kenworthy MIT-Air Force AI Accelerator
- Siddharth Nayak MIT
- Christopher Chin MIT
- Hamsa Balakrishnan MIT
Abstract
Integer programs provide a powerful abstraction for representing
a wide range of real-world scheduling problems. Despite
their ability to model general scheduling problems,
solving large-scale integer programs (IP) remains a computational
challenge in practice. The incorporation of more complex objectives
such as robustness to disruptions further exacerbates the computational
challenge. We present NICE (Neural network IP Coefficient Extraction),
a novel technique that combines reinforcement learning and
integer programming to tackle the problem of robust scheduling.
More specifically, NICE uses reinforcement learning to approximately
represent complex objectives in an integer programming formulation.
We use NICE to determine assignments of pilots to a flight
crew schedule so as to reduce the impact of disruptions.
We compare NICE with (1) a baseline integer programming
formulation that produces a feasible crew schedule, and
(2) a robust integer programming formulation that explicitly
tries to minimize the impact of disruptions. Our experiments
show that, across a variety of scenarios, NICE produces
schedules resulting in 33% to 48% fewer disruptions than
the baseline formulation. Moreover, in more severely
constrained scheduling scenarios in which the robust integer
program fails to produce a schedule within 90 minutes, NICE
is able to build robust schedules in less than 2 seconds on average.
Motivation
Baseline Integer Programming (IP): IP formulations can be easily constructed for simpler crew-scheduling problems because the objectives and constraints are straightforward to encode. But these methods struggle under uncertainty: if flights are disrupted, a new solution with new assignments has to be created. A simple objective function for assigning pilots to flights could be:

$$\max \sum_{i \in I} \sum_{s \in S} x_{is}$$

where the binary variable $x_{is}$ indicates whether pilot $i$ is assigned to slot $s$, so we maximize the number of pilots assigned to slots in flights. Here $I$ is the set of pilots and $S$ is the set of slots in flights.
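As a concrete illustration (a minimal sketch, not the paper's exact implementation), this baseline objective could be written with the PuLP modelling library; the pilot, slot, and eligibility data below are hypothetical placeholders:

```python
# Minimal sketch of the baseline assignment IP using PuLP (hypothetical data).
import pulp

pilots = ["p1", "p2", "p3"]                          # I: set of pilots
slots = ["f1_s1", "f1_s2", "f2_s1"]                  # S: slots, prefixed by flight
eligible = {(i, s) for i in pilots for s in slots}   # allowed (pilot, slot) pairs

prob = pulp.LpProblem("baseline_schedule", pulp.LpMaximize)
x = {(i, s): pulp.LpVariable(f"x_{i}_{s}", cat="Binary") for (i, s) in eligible}

# Objective: every decision variable gets a coefficient of one,
# i.e. maximize the number of filled slots.
prob += pulp.lpSum(x[i, s] for (i, s) in eligible)

# Each slot is filled by at most one pilot (a real model would also add rest
# rules, qualification constraints, and no-overlap constraints per pilot).
for s in slots:
    prob += pulp.lpSum(x[i, s] for i in pilots if (i, s) in eligible) <= 1

prob.solve()
assignments = [(i, s) for (i, s) in eligible if x[i, s].value() == 1]
```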
Robust Buffer Formulation: one can use a more sophisticated IP formulation to account for these disruptions. To alleviate their effect, one could maximize the buffers between the successive flights a pilot is assigned to, so that flight delays can be absorbed by the spare time in between. Here, one could use an objective function like:

$$\max \sum_{i \in I} \sum_{s \in S} x_{is} \;-\; \sum_{i \in I} \sum_{f, f' \in F} b_{iff'}\, y_{iff'}$$

where the objective is penalized for having smaller buffers: $F$ is the set of flights, $b_{iff'}$ is the buffer penalty for pilot $i$ flying flight $f'$ shortly after flight $f$, and $y_{iff'}$ indicates that pair of assignments. But as the problem becomes more complex, the time required to obtain a solution increases and quickly becomes intractable.
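Continuing the hypothetical sketch above, the buffer idea can be expressed by adding indicator variables for a pilot covering two flights that are close together in time and penalizing them in the objective; the linking constraint and data below are illustrative assumptions rather than the paper's exact formulation:

```python
# Sketch of a buffer-penalized objective on top of the baseline model above.
flights = ["f1", "f2"]
close_pairs = [("f1", "f2")]                          # flight pairs with a small buffer
b = {(i, f, f2): 1.0 for i in pilots for (f, f2) in close_pairs}  # buffer penalties

y = {(i, f, f2): pulp.LpVariable(f"y_{i}_{f}_{f2}", cat="Binary")
     for i in pilots for (f, f2) in close_pairs}

# Linking: y must be 1 whenever pilot i covers a slot on both flights.
for i in pilots:
    for (f, f2) in close_pairs:
        on_f = pulp.lpSum(x[i, s] for s in slots if s.startswith(f + "_"))
        on_f2 = pulp.lpSum(x[i, s] for s in slots if s.startswith(f2 + "_"))
        prob += on_f + on_f2 - 1 <= y[i, f, f2]

# Reward filled slots, but pay a penalty for every tight (small-buffer) pairing.
prob.setObjective(pulp.lpSum(x[i, s] for (i, s) in eligible)
                  - pulp.lpSum(b[i, f, f2] * y[i, f, f2]
                               for i in pilots for (f, f2) in close_pairs))
prob.solve()
```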
Reinforcement Learning (RL): Another possible solution is to use RL to create schedules: the flights are shown sequentially in chronological order, and the agent chooses a pilot for each. Here, the agent can be rewarded according to whatever objective function we want to optimize. Although this works well, the IP methods do better at producing a solution within a reasonable amount of time.
Method
Knowledge Distillation: Hinton et al. (2015) show that the probabilities in a neural network's output layer carry useful information, even if only the maximum-probability value is used for downstream tasks like classification. Instead of letting the RL agent take actions in the environment, we use its probability distribution over the choice of pilots to guide the IP.
The objective function is modified to:

$$\max \sum_{i \in I} \sum_{s \in S} a_{is}\, x_{is}$$

where $a_{is}$ is the probability of assigning pilot $i$ to slot $s$, obtained from the final layer of the RL agent's network.
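In the hypothetical PuLP sketch from the Motivation section, this amounts to swapping the all-ones coefficients for the extracted probabilities (the uniform values used for `a` below are placeholders for the network outputs):

```python
# NICE objective sketch: weight each decision variable by the RL agent's
# probability of assigning pilot i to slot s (placeholder values shown here).
a = {(i, s): 1.0 / len(pilots) for (i, s) in eligible}   # would come from the network

prob.setObjective(pulp.lpSum(a[i, s] * x[i, s] for (i, s) in eligible))
prob.solve()
```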
Probability Weight Extraction: Two different methods are used to extract the weights from the RL agent; a sketch of both appears after this list.
- Blank State Method: This approach exploits the fact that the probability weights produced for the first slot do not depend on any previously taken actions. Each slot is therefore treated as if it were the first scheduled slot: for every slot in the fixed order, a new RL agent with the same underlying neural network is initialized, all states that would have occurred earlier are cut out, and the probability values $a_{is}$ are extracted.
- Monte Carlo Method: The scheduler is run 𝑛 times, shuffling the order of slots each time. Then, the probability values from the output layer are averaged across the 𝑛 runs. These coefficients are then passed to the IP solver to obtain the NICE-generated schedule.
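A minimal sketch of both extraction schemes, assuming a hypothetical agent with a `policy_probs(state)` method that returns a probability vector over pilots and a hypothetical scheduling environment with `reset()`/`step()`:

```python
import random
import numpy as np

def blank_state_coefficients(agent, env, slot_order):
    """Blank State Method: treat every slot as if it were the first one scheduled."""
    a = {}
    for s in slot_order:
        state = env.reset(first_slot=s)       # no earlier assignments are visible
        a[s] = agent.policy_probs(state)      # a[s][i] plays the role of a_is
    return a

def monte_carlo_coefficients(agent, env, slot_order, n=50):
    """Monte Carlo Method: average output-layer probabilities over n shuffled runs."""
    totals = {s: np.zeros(env.num_pilots) for s in slot_order}
    for _ in range(n):
        order = random.sample(slot_order, len(slot_order))
        state = env.reset()
        for s in order:
            probs = agent.policy_probs(state)
            totals[s] += probs
            state = env.step(s, int(np.argmax(probs)))   # assign and move on
    return {s: totals[s] / n for s in slot_order}
```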
Experiments and Results
We perform the experiments as follows (a sketch of this evaluation loop is given after the list):
- Generate one week's worth of flights
- Schedule with baseline IP, RL scheduler, buffer IP and NICE
- One day into the schedule, delay f% of the flights that have not yet departed
- Fix the delays with a disruption-minimizing IP
- Count the disruptions under each scheduling method
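A rough sketch of this evaluation loop; every function named here (`generate_week`, `build_schedule`, `delay_flights`, `repair_schedule`, `count_disruptions`) is a hypothetical placeholder for the corresponding step above:

```python
def evaluate(methods, delay_fraction, n_trials=20):
    """Compare scheduling methods by the number of disruptions after repair."""
    results = {m: [] for m in methods}
    for _ in range(n_trials):
        flights = generate_week()                          # one week's worth of flights
        for method in methods:                             # baseline IP, RL, buffer IP, NICE
            schedule = build_schedule(method, flights)
            # One day in, delay a fraction of the flights that have not yet departed.
            delayed = delay_flights(flights, after_days=1, fraction=delay_fraction)
            repaired = repair_schedule(schedule, delayed)  # disruption-minimizing IP
            results[method].append(count_disruptions(schedule, repaired))
    return results
```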
Conclusions
- NICE uses reinforcement learning to approximate and simplify computationally difficult integer programming problems. It leverages the knowledge learned by the RL agent while retaining the global outlook of integer programming.
- NICE is able to produce schedules significantly better than the ones obtained by baseline integer programming formulations
- NICE is highly generalizable and can be used whenever there is an isomorphism between an integer program and a reinforcement learning solution
Further Experiments on Integer Programming Solutions*
After running further experiments with the crew scheduling
problem, we found that the NICE solver appears to help
reduce bias in the original dataset with respect to reducing
disruptions in the face of flight delays. Despite all possible
pilot assignments having the same objective value, the
baseline integer programming scheduler (henceforth referred
to as the “Original Baseline”) of our original paper tended
to favor solutions that more often included pilots with
a lower ID value in our original dataset. The Original
Baseline, where all pilot-event-slot decision variables
are given a coefficient of one, produces solutions that
are less robust than other, more random baselines. Upon further investigation, we found that this tendency of the Original Baseline stems from the fact that the decision variables were created in order of ascending pilot ID in the original data set. When we randomly shuffle the order of the decision
variable creation, creating a “Shuffled Baseline,” this
modified solver creates schedules that yield either a
similar number of or fewer disruptions than NICE, on
average. A similar baseline, the "Normal Distribution Baseline," which sets each coefficient to a value randomly drawn from a normal distribution centered at 1 with a standard deviation of 0.1, performs comparably. In some instances,
NICE performs better than these random baselines and, in
most, NICE performs worse. The Normal Distribution Baseline
behaves similarly to NICE in that it creates the decision
variables in the same order and only changes the coefficients.
Our results, using the same experiments from our original paper, are shown in Tables 3 and 4.
*(Updated on 17th Feb 2024)
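For concreteness, the two randomized baselines could be set up as follows, continuing the hypothetical PuLP sketches from the Motivation section (all names are illustrative):

```python
import random

# Shuffled Baseline: the all-ones objective is unchanged, but the decision
# variables are created in a random order instead of ascending pilot-ID order.
pairs = list(eligible)
random.shuffle(pairs)
x_shuffled = {(i, s): pulp.LpVariable(f"xs_{i}_{s}", cat="Binary") for (i, s) in pairs}

# Normal Distribution Baseline: the original variable-creation order is kept,
# but each objective coefficient is drawn from N(1, 0.1^2) instead of being 1.
coef = {(i, s): random.gauss(1.0, 0.1) for (i, s) in eligible}
normal_objective = pulp.lpSum(coef[i, s] * x[i, s] for (i, s) in eligible)
```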
Citation
If you find our code or datasets useful in your research, please cite the following:
Related Links
- Knowledge Distillation by Hinton et al. (2015), where the authors show the importance of the knowledge contained in the final layer of neural networks.
- Reinforcement learning based methods for scheduling by Siddharth Nayak (2022).
- The buffer formulation for the robust crew-scheduling problem, introduced by Christopher Chin (2021).
- Integer programming formulations for different objective functions, introduced by Matthew Koch (2021).
- Talk by Christopher Chin on AI-assisted Optimization of Schedules.
Acknowledgements
This research was sponsored by the United
States Air Force Research Laboratory and was accomplished
under Cooperative Agreement Number FA8750-19-2-1000.
The views and conclusions contained in this document are
those of the authors and should not be interpreted as
representing the official policies, either expressed or implied,
of the United States Air Force or the U.S. Government.
The U.S. Government is authorized to reproduce and distribute
reprints for Government purposes notwithstanding
any copyright notation herein.
This website template was borrowed from Michaël Gharbi and Matthew Tancik.