NICE: Robust Scheduling through Reinforcement Learning-Guided Integer Programming
AAAI 2022

Abstract

Integer programs provide a powerful abstraction for representing a wide range of real-world scheduling problems. Despite their ability to model general scheduling problems, solving large-scale integer programs (IP) remains a computational challenge in practice. The incorporation of more complex objectives such as robustness to disruptions further exacerbates the computational challenge. We present NICE (Neural network IP Coefficient Extraction), a novel technique that combines reinforcement learning and integer programming to tackle the problem of robust scheduling. More specifically, NICE uses reinforcement learning to approximately represent complex objectives in an integer programming formulation. We use NICE to determine assignments of pilots to a flight crew schedule so as to reduce the impact of disruptions. We compare NICE with (1) a baseline integer programming formulation that produces a feasible crew schedule, and (2) a robust integer programming formulation that explicitly tries to minimize the impact of disruptions. Our experiments show that, across a variety of scenarios, NICE produces schedules resulting in 33% to 48% fewer disruptions than the baseline formulation. Moreover, in more severely constrained scheduling scenarios in which the robust integer program fails to produce a schedule within 90 minutes, NICE is able to build robust schedules in less than 2 seconds on average.

Motivation

Baseline Integer Programming (IP): Baseline IPs can be constructed easily for simpler crew-scheduling problems because the objectives and constraints are straightforward to encode. However, these methods fail when there is uncertainty: if flights are disrupted, a new solution with new assignments has to be created. A simple objective function for assigning pilots to flights could be:

$$\max \sum_{i \in I} \sum_{s \in S} x_{is}$$

where $x_{is}$ is a binary decision variable equal to 1 if pilot $i$ is assigned to slot $s$, so the objective maximizes the number of pilots assigned to slots in flights. Here $I$ is the set of pilots and $S$ is the set of slots in flights.
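To make this concrete, here is a minimal sketch of the baseline assignment IP using the PuLP modeling library (the pilot and slot data are illustrative placeholders, and the real model includes additional feasibility constraints):

```python
from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum

pilots = ["p1", "p2", "p3"]            # I: set of pilots (illustrative)
slots = ["f1_s1", "f1_s2", "f2_s1"]    # S: set of slots in flights (illustrative)

prob = LpProblem("baseline_crew_schedule", LpMaximize)

# x[i, s] = 1 if pilot i is assigned to slot s
x = {(i, s): LpVariable(f"x_{i}_{s}", cat=LpBinary) for i in pilots for s in slots}

# Baseline objective: maximize the total number of filled slots
prob += lpSum(x[i, s] for i in pilots for s in slots)

# Each slot is filled by at most one pilot (one of several feasibility constraints)
for s in slots:
    prob += lpSum(x[i, s] for i in pilots) <= 1

prob.solve()
```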

Robust Buffer Formulation: A more sophisticated IP formulation can be used to account for these disruptions. To alleviate their effect, one can try to maximize the buffers between the successive flights each pilot is assigned to, so that delays in earlier flights can be absorbed by the buffer time. Here, one could use an objective function like:
$$\max \; \sum_{i \in I} \sum_{s \in S} x_{is} \;-\; \sum_{i \in I} \sum_{f \in F} \sum_{f' \in F} b_{iff'}\, y_{iff'}$$

where the objective is penalized for assignments with smaller buffers. Here $F$ is the set of flights, $b_{iff'}$ is the buffer penalty for pilot $i$ flying flight $f$ followed by flight $f'$ (larger when the buffer between them is smaller), and $y_{iff'}$ is a binary variable indicating that pilot $i$ is assigned to both flights. But as the problem becomes more complex, the time required to find a solution increases and the formulation quickly becomes intractable.
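Continuing the PuLP sketch above, the buffer penalty could be added roughly as follows (the slot-to-flight map, penalty values, and pairing variables are illustrative assumptions; the paper's exact robust formulation may differ):

```python
# Map each slot to its flight, and give each ordered flight pair a buffer penalty
# (illustrative values; in practice the penalty depends on the actual buffer length).
flight_of = {"f1_s1": "f1", "f1_s2": "f1", "f2_s1": "f2"}
buffer_penalty = {("f1", "f2"): 0.8}

# y[i, f, f2] = 1 if pilot i covers a slot on flight f and a slot on flight f2
y = {}
for i in pilots:
    for (f, f2) in buffer_penalty:
        y[i, f, f2] = LpVariable(f"y_{i}_{f}_{f2}", cat=LpBinary)
        for s in slots:
            for s2 in slots:
                if flight_of[s] == f and flight_of[s2] == f2:
                    # Linking constraint: y is forced to 1 when both slots are covered
                    prob += y[i, f, f2] >= x[i, s] + x[i, s2] - 1

# Robust objective (replaces the baseline objective): reward assignments,
# penalize consecutive flights with small buffers
prob += (lpSum(x[i, s] for i in pilots for s in slots)
         - lpSum(buffer_penalty[f, f2] * y[i, f, f2] for (i, f, f2) in y))
```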

Reinforcement Learning (RL): Another possible solution is to use RL to create schedules: the flights are presented sequentially in chronological order and the agent chooses pilots for them. The agent can be rewarded according to whatever objective function we want to optimize. Although this works well, the IP methods perform better at producing a solution in a reasonable amount of time.
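A rough sketch of what such an RL scheduling environment could look like (the state encoding and reward are simplified placeholders, not the paper's exact MDP):

```python
class CrewSchedulingEnv:
    """Toy episodic environment: slots are presented in chronological order and
    the agent picks a pilot for each one (simplified placeholder dynamics)."""

    def __init__(self, pilots, slots):
        self.pilots, self.slots = pilots, slots

    def reset(self):
        self.t = 0
        self.assignment = {}          # slot -> pilot chosen so far
        return self._state()

    def _state(self):
        # A real implementation would also encode pilot availability, buffers, etc.
        return (self.t, tuple(sorted(self.assignment.items())))

    def step(self, pilot):
        slot = self.slots[self.t]
        self.assignment[slot] = pilot
        reward = 1.0                  # the reward can encode any objective, e.g. buffer size
        self.t += 1
        done = self.t == len(self.slots)
        return self._state(), reward, done
```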

Method

Knowledge Distillation: Hinton et al. (2015) show that the probabilities in a neural network's output layer carry useful information, even if only the maximum probability value is used for downstream tasks like classification. Instead of letting the RL agent take actions in the environment, we use its probability distribution over the choice of pilots to guide the IP.

Fig 1: A rough MDP for the RL agent. The RL agent gives a probability distribution over the pilots, which is used by NICE to boost the integer programming formulation.

The objective function is modified to:

$$\max \sum_{i \in I} \sum_{s \in S} a_{is}\, x_{is}$$

where $a_{is}$ are the probabilities for assigning pilot $i$ to slot $s$. The probability values are obtained from the final layer of the RL agent's network.
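In the PuLP sketch above, this modification only changes the objective coefficients (the probability values below are uniform placeholders for the extracted $a_{is}$):

```python
# a[i, s]: probability weight for assigning pilot i to slot s, taken from the
# RL policy's output layer (uniform placeholder values here)
a = {(i, s): 1.0 / len(pilots) for i in pilots for s in slots}

# NICE objective: same variables and constraints, probability-weighted coefficients
prob += lpSum(a[i, s] * x[i, s] for i in pilots for s in slots)
```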


Probability Weight Extraction: Two different methods are used to extract the probability weights from the RL agent:
  1. Blank State Method: This approach exploits the fact that the probability weights produced for the first scheduled slot do not depend on any previous actions. Each slot is therefore treated as if it were the first slot to be scheduled: for every slot in the fixed order, a fresh RL agent with the same trained network is initialized with a blank state (no previously assigned slots), and the probability values $a_{is}$ are extracted from its output layer.
  2. Monte Carlo Method: The scheduler is run n times, shuffling the order of slots each time. The probability values from the output layer are then averaged across the n runs (sketched below). These averaged coefficients are passed to the IP solver to obtain the NICE-generated schedule.
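A minimal sketch of the Monte Carlo extraction, assuming a trained `policy` callable that maps the current partial schedule and slot to a probability distribution over pilots (the interface is an illustrative assumption):

```python
import random
from collections import defaultdict

def monte_carlo_weights(policy, pilots, slots, n_runs=100):
    """Average the policy's pilot probabilities for each slot over n shuffled runs."""
    totals = defaultdict(float)
    for _ in range(n_runs):
        order = list(slots)
        random.shuffle(order)                   # shuffle the slot order each run
        assigned = {}                           # running partial schedule (the state)
        for slot in order:
            probs = policy(assigned, slot)      # dict: pilot -> probability for this slot
            for pilot in pilots:
                totals[pilot, slot] += probs[pilot]
            assigned[slot] = max(probs, key=probs.get)   # greedy action to advance the state
    # a_{is}: averaged coefficients handed to the IP objective
    return {key: total / n_runs for key, total in totals.items()}
```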


Fig 2: The Monte Carlo weight extraction method. Once we train our network, we run the scheduler n times, shuffling the order of slots each time. We then average the probability values in the output layer across the n runs over pilots and slots to obtain our NICE coefficients. Finally, we pass these coefficients to the IP solver to obtain our NICE-generated schedule.


A rough pseudocode of the full NICE pipeline:
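In Python-like form, the pipeline could look as follows (reusing the PuLP-style pieces and the `monte_carlo_weights` sketch from above; names are illustrative assumptions rather than the paper's implementation):

```python
from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum

def nice_schedule(policy, pilots, slots, n_runs=100):
    """End-to-end NICE sketch: extract probability coefficients, then solve the IP."""
    # 1. Monte Carlo extraction of probability weights a_{is} from the RL policy
    a = monte_carlo_weights(policy, pilots, slots, n_runs)

    # 2. Build the assignment IP with probability-weighted objective coefficients
    prob = LpProblem("nice_crew_schedule", LpMaximize)
    x = {(i, s): LpVariable(f"x_{i}_{s}", cat=LpBinary) for i in pilots for s in slots}
    prob += lpSum(a[i, s] * x[i, s] for i in pilots for s in slots)
    for s in slots:
        prob += lpSum(x[i, s] for i in pilots) <= 1   # plus the full feasibility constraints

    # 3. Solve and read the schedule back out
    prob.solve()
    return {s: i for (i, s), var in x.items() if var.value() > 0.5}
```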

Experiments and Results

We perform the experiments as follows:

  • Generate one week's worth of flights
  • Schedule with the baseline IP, the RL scheduler, the buffer IP, and NICE
  • One day into the schedule, delay f% of the flights that have not yet departed
  • Fix the delays with a disruption-minimizing IP
  • Count the disruptions under each scheduling method


Table 1: With the same scheduling density as the training dataset, both NICE and the buffer IP take < 0.85 seconds on average.


Table 2: With a scheduling density twice that of the training dataset, NICE produces a solution in 1.85 to 1.90 seconds on average, whereas the buffer IP fails to produce even a single schedule, timing out after 90 minutes.

Conclusions

  • NICE uses reinforcement learning to approximate and simplify computationally difficult integer programming problems, leveraging the knowledge learned by the RL agent together with the global outlook of integer programming.
  • NICE produces schedules that are significantly better, in terms of robustness to disruptions, than those obtained from the baseline integer programming formulation.
  • NICE is highly generalizable and can be applied whenever there is an isomorphism between an integer program and a reinforcement learning solution to the same problem.

Video at AAAI 2022



Presented at the AAAI 2022 Main track.

 [Slides]




Further Experiments on Integer Programming Solutions*

After running further experiments with the crew-scheduling problem, we found that the NICE solver appears to help by counteracting a bias in the original dataset with respect to reducing disruptions in the face of flight delays. Although all possible pilot assignments have the same objective value, the baseline integer programming scheduler of our original paper (henceforth referred to as the “Original Baseline”) tended to favor solutions that more often included pilots with lower ID values in the original dataset. The Original Baseline, in which all pilot-event-slot decision variables are given a coefficient of one, produces solutions that are less robust than other, more random baselines. Upon further investigation, we found that this tendency of the Original Baseline stems from the fact that the decision variables were created in order of ascending pilot ID from the original dataset.

When we randomly shuffle the order in which the decision variables are created, producing a “Shuffled Baseline,” this modified solver creates schedules that yield, on average, either a similar number of or fewer disruptions than NICE. A similar baseline, the “Normal Distribution Baseline,” which sets each coefficient to a value drawn from a normal distribution centered at 1 with a standard deviation of 0.1, performs comparably. In some instances NICE performs better than these random baselines, and in most it performs worse. The Normal Distribution Baseline behaves most similarly to NICE in that it creates the decision variables in the same order and only changes the coefficients. Our results, using the same experiments from our original paper, are shown in Tables 3 and 4.
*(Updated on 17th Feb 2024)
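For concreteness, a sketch of how these two random baselines could be set up on top of the baseline IP (reusing the pilot and slot placeholders from the earlier snippet; the exact setup in these follow-up experiments may differ):

```python
import random
from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum

# Shuffled Baseline: unit coefficients, but create the decision variables in random order
shuffled_pilots = list(pilots)
random.shuffle(shuffled_pilots)
prob_sb = LpProblem("shuffled_baseline", LpMaximize)
x_sb = {(i, s): LpVariable(f"xs_{i}_{s}", cat=LpBinary)
        for i in shuffled_pilots for s in slots}
prob_sb += lpSum(x_sb[i, s] for i in shuffled_pilots for s in slots)

# Normal Distribution Baseline: keep the original variable order, but draw each
# objective coefficient from a normal distribution centered at 1 (std. dev. 0.1)
prob_nd = LpProblem("normal_dist_baseline", LpMaximize)
x_nd = {(i, s): LpVariable(f"xn_{i}_{s}", cat=LpBinary) for i in pilots for s in slots}
coeff = {(i, s): random.gauss(1.0, 0.1) for i in pilots for s in slots}
prob_nd += lpSum(coeff[i, s] * x_nd[i, s] for i in pilots for s in slots)
```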



Table 3: Average and standard deviation of disruptions across scheduling methods when flights are delayed (lower values are better). Scheduling density of 1.


Table 4: Average and standard deviation of disruptions across scheduling methods when flights are delayed (lower values are better). Scheduling density of 2.

Citation


If you find our code or datasets useful in your research, please cite the following:






Related Links

  1. Knowledge Distillation by Hinton et al. (2015), where the authors show the importance of the knowledge contained in the final layer of neural networks.
  2. Reinforcement learning-based methods for scheduling by Siddharth Nayak (2022).
  3. The buffer formulation was introduced by Christopher Chin (2021) for the robust crew-scheduling problem.
  4. Integer programming formulations for different objective functions were introduced by Matthew Koch (2021).
  5. Talk by Christopher Chin on AI-assisted Optimization of Schedules.

Acknowledgements


This research was sponsored by the United States Air Force Research Laboratory and was accomplished under Cooperative Agreement Number FA8750-19-2-1000. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the United States Air Force or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

This website template was borrowed from Michaël Gharbi and Matthew Tannick.