A few of the projects I worked on!

Identifying Disparities in Sepsis Treatment using Inverse Reinforcement Learning (2022)

Authors: Hyewon Jeong, Siddharth Nayak, Taylor Killian, Sanjat Kanjilal, Marzyeh Ghassemi

Started as a course project for 18.0651-Matrix Methods in Data Analysis, Signal Processing, and Machine Learning at MIT by Prof. Gilbert Strang, and ended up becoming my secondary project at MIT :) Numerous studies have reported that disparities in care for sepsis patients exist across the trajectory of a patient's stay in the emergency department and intensive care unit. Here, we apply a number of reinforcement learning techniques, including behavioral cloning, imitation learning, and inverse reinforcement learning, to learn the optimal policy for the management of septic patient subgroups from expert demonstrations. We then estimate counterfactual optimal policies by applying the learned model to another, unseen patient population and identify differences in care by comparing it to the policy actually followed. Our data comes from the sepsis cohort of MIMIC-IV and the clinical data warehouses of the Mass General Brigham healthcare system. The ultimate objective of this work is to use the learned optimal policy to estimate counterfactual treatment policies and identify deviations across sub-populations of interest. We hope this approach helps us identify disparities in care, as well as changes in care in response to the publication of national sepsis treatment guidelines.
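As a rough sketch of the behavioral-cloning piece of the pipeline (the state and action dimensions below are illustrative assumptions, not the actual cohort's), the policy is just a classifier trained to predict the clinician's recorded action from the patient state:

```python
# Minimal behavioral-cloning sketch. STATE_DIM and N_ACTIONS are assumed
# (e.g. a discretised fluid/vasopressor action grid), not the real setup.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 48, 25

policy = nn.Sequential(
    nn.Linear(STATE_DIM, 128), nn.ReLU(),
    nn.Linear(128, N_ACTIONS),
)
optimiser = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def bc_step(states, clinician_actions):
    """One supervised step: imitate the clinician's recorded action.
    `states` is a float tensor, `clinician_actions` a long tensor of indices."""
    loss = loss_fn(policy(states), clinician_actions)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()
```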

Fairness and Robustness of Mixed Autonomous Traffic Control with Reinforcement Learning (2021)

Authors: Ashay Athalye, Shannon Hwang, Siddharth Nayak

A course project for the 6.884-Computational Sensorimotor Learning Spring 2021 course at MIT by Prof. Pulkit Agrawal. Mixed autonomy (MA) scenarios – where both autonomous vehicles (AVs) and human drivers share the same road – will become increasingly prevalent as autonomous vehicles are deployed into society. From a reinforcement learning perspective, this offers a variety of interesting research opportunities, such as modeling problems with very large state spaces and multiple agents, and exploring reward design with fairness constraints. In this work, we replicate an existing benchmark for the bottleneck environment and investigate how the learned agent policies and performance change when fairness and human-driver-model variation are explicitly considered during training. We find that adding a fairness term to the reward function significantly changes the learned behavior, allowing all vehicles to move through the bottleneck at approximately equal average speeds while decreasing throughput by small and at times insignificant amounts. [PDF] [Code]
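As a minimal sketch of the idea (not the exact reward from the project; the coefficient and normalisation here are made up), a fairness-augmented reward can trade mean speed against the spread of speeds across vehicles:

```python
import numpy as np

def fair_bottleneck_reward(speeds, max_speed=30.0, fairness_coeff=0.5):
    """Throughput-style term (mean normalised speed) minus a fairness
    penalty on the spread of vehicle speeds. Constants are illustrative."""
    speeds = np.asarray(speeds, dtype=float)
    throughput_term = speeds.mean() / max_speed
    fairness_term = speeds.std() / max_speed
    return throughput_term - fairness_coeff * fairness_term
```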

Overhead the Albatross Hangs Motionless Upon the Air with Dynamic Soaring (2021)

Authors: Antonia Bronars, Rebecca Jiang, Siddharth Nayak

A course project for the 6.832-Underactuated Robotics Spring 2021 course at MIT by Prof. Russ Tedrake. Albatrosses are capable of travelling thousands of kilometres a day on very little energy. They utilise a flight strategy called dynamic soaring, which helps them extract propulsive energy from the horizontal wind-shear layers that form a few metres above the ocean surface, allowing them to fly for hours or even days without flapping their wings. We investigate this flight behaviour from a trajectory optimization point of view and try to emulate these zero-cost trajectories numerically. We also analyse how these trajectories vary as the wind parameters change. Finally, we use Linear-Quadratic Regulator (LQR) control to track these trajectories and account for errors and noise in the dynamics. [PDF] [Code]
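For reference, the basic LQR gain computation looks roughly like this (a sketch assuming the dynamics have been linearised; a tracking controller would typically recompute gains along the nominal trajectory):

```python
# Sketch of an LQR gain for linearised dynamics x_dot = A x + B u.
# A, B, Q, R are placeholders here, not the glider model's actual matrices.
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_gain(A, B, Q, R):
    """Solve the continuous-time algebraic Riccati equation and return
    the optimal state-feedback gain K, so that u = -K x."""
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)
```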

Residual Policy Learning (2020)

Authors: Antonia Bronars, Rebecca Jiang, Siddharth Nayak

A course project for the 6.881-Robotic Manipulation Fall 2020 course at MIT by Prof. Russ Tedrake. We implemented a controller-plus-reinforcement-learning agent (residual learning) to improve the sample efficiency of reinforcement learning (RL) agents on robotic tasks with sparse rewards. We design controllers for the MuJoCo push, pick-and-place, and sliding tasks, and the robosuite pick-and-place and peg-in-hole tasks, each with different task success rates. We then train reinforcement learning agents on top of these controllers to improve the success rates. We compare this against training an RL agent from scratch and show that residual learning has better sample efficiency. This project was an implementation of the Residual Policy Learning paper by Silver et al. [PDF] [Code]
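The core idea is tiny: the agent's action is the hand-designed controller's action plus a learned correction. A sketch (the clipping range and residual scale are illustrative; `controller` and `policy` stand in for the hand-designed controller and the learned network):

```python
import numpy as np

def residual_action(obs, controller, policy, scale=0.1):
    """Residual policy learning: execute the hand-designed controller's
    action plus a scaled learned correction, kept small at initialisation."""
    base = controller(obs)           # hand-designed controller's action
    residual = scale * policy(obs)   # learned residual correction
    return np.clip(base + residual, -1.0, 1.0)
```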

PyThor (2020)

Author: Siddharth Nayak

A side project of mine where I combine PyTorch, PyTorch Lightning, Telegrad, and MLFlow for ML-based projects. I took this up during the pandemic lockdown while waiting for graduate school to start. The PyTorch Lightning module takes care of the boilerplate code involved in running experiments. Telegrad sends live updates (when requested) about the training losses, along with plots, to your mobile, so you can monitor your jobs on the go while away from your system; you can also change the learning rates from your mobile. MLFlow manages the experiments so that each one can be reproduced, by logging all the hyper-parameters used. I have provided template code for linear, convolutional, and graph networks, along with some reinforcement learning examples. This was basically done as preparation for my graduate school codebases. [Code]
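A minimal sketch of what the templates look like (illustrative network and dimensions, not the actual template code):

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl

class TemplateNet(pl.LightningModule):
    """Minimal LightningModule in the spirit of the templates."""
    def __init__(self, lr=1e-3):
        super().__init__()
        self.save_hyperparameters()  # hyper-parameters get logged for reproducibility
        self.net = nn.Sequential(nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
        self.loss_fn = nn.CrossEntropyLoss()

    def forward(self, x):
        return self.net(x.view(x.size(0), -1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = self.loss_fn(self(x), y)
        self.log("train_loss", loss)  # logged metrics are what get relayed/plotted
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)
```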

Transfer Matching Networks (2019)

Authors: Abhishek Nair and Siddharth Nayak

A course project for the EE6180-Advanced Topics in Artificial Intelligence (Fall 2019) course by Prof. Abhishek Sinha at IIT Madras. We tried adding an autoencoder module to the reinforcement learning pipeline to help with domain randomisation. As this was a 3-week project, we did not get to investigate the problem as deeply as we wanted. [Final Presentation] [Code]
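The idea, roughly: train an autoencoder on visually randomised observations and feed the latent code, rather than raw pixels, to the RL policy. A sketch with illustrative dimensions (not the actual architecture from the project):

```python
import torch.nn as nn

class ObsAutoencoder(nn.Module):
    """Compress randomised observations into a latent code for the policy.
    Dimensions are illustrative assumptions."""
    def __init__(self, obs_dim=84 * 84, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, obs_dim))

    def forward(self, obs):
        z = self.encoder(obs)
        # Reconstruction loss trains z; the RL policy consumes z, not pixels.
        return self.decoder(z), z
```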

Robust Reinforcement Learning (2018)

Authors: Rishhanth Maanav V and Siddharth Nayak

A course project for the CS6700-Reinforcement Learning course at IIT Madras by Prof. L.A. Prashanth. We implement two robust reinforcement learning algorithms – Robust Adversarial Reinforcement Learning (RARL) by Pinto et al. and Adversarial Robust Policy Learning (ARPL) by Mandlekar et al. – and compare them with vanilla (non-robust) reinforcement learning algorithms under varying amounts of perturbation to the model dynamics. [PDF] [Code]
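At its core, RARL is an alternating two-player optimisation: an adversary applies destabilising forces and is trained to minimise the protagonist's reward. A skeleton of the loop (the `update` and `collect` arguments are hypothetical stand-ins for a policy-gradient step and rollout collection; this is the shape of the algorithm, not our exact code):

```python
def rarl_train(env, protagonist, adversary, update, collect, n_iters=100):
    """Alternating optimisation in the style of RARL (Pinto et al.)."""
    for _ in range(n_iters):
        # 1. Protagonist maximises reward against the frozen adversary.
        update(protagonist, collect(env, protagonist, adversary), maximise=True)
        # 2. Adversary minimises the same reward against the frozen protagonist.
        update(adversary, collect(env, protagonist, adversary), maximise=False)
```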

Hand Gesture Recognition for Soldier Support (2017-18)

Author: Siddharth Nayak

I created a hand-gesture recognition glove which recognises the gestures given in this link. It does not use computer vision; instead, it uses readings from sensors attached to the glove. [PDF] [Documentation] [Code]
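Conceptually, this boils down to matching the sensor-reading vector against stored gesture templates. A hypothetical sketch (the gesture names, calibration values, and sensor count below are made-up illustrations, not the glove's actual setup):

```python
import numpy as np

# Assumed per-gesture calibration vectors, one value per glove sensor.
GESTURE_TEMPLATES = {
    "halt":    np.array([0.9, 0.9, 0.9, 0.9, 0.9]),
    "advance": np.array([0.1, 0.9, 0.9, 0.9, 0.1]),
}

def classify(readings, threshold=0.5):
    """Return the nearest gesture template, or None if nothing is close."""
    readings = np.asarray(readings, dtype=float)
    name, template = min(GESTURE_TEMPLATES.items(),
                         key=lambda kv: np.linalg.norm(readings - kv[1]))
    return name if np.linalg.norm(readings - template) < threshold else None
```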

Automatic Waste Segregation Dustbin (2017)

Project Members: Varun Sundar, Arvind Pujari, Siddharth Nayak, Rohith Srinivas, Vishnu Harshith, Sai Venkat, Iniyan Ram

We created an automatic waste-segregation dustbin which classifies waste into distinct categories using features learned from images together with the capacitance and inductance of the waste. The dustbin also has a mechanical disc, driven by stepper motors, which pushes the waste into the corresponding compartment after classification. [Documentation]
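The classification step, in sketch form (the trained `model` and the feature shapes are assumptions; any scikit-learn-style classifier would fit this interface):

```python
import numpy as np

def classify_waste(image_features, capacitance, inductance, model):
    """Concatenate learned image features with the measured capacitance
    and inductance, then classify with a pre-trained model."""
    x = np.concatenate([np.asarray(image_features, dtype=float),
                        [capacitance, inductance]])
    return model.predict(x.reshape(1, -1))[0]
```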

Self-Balancing Robot (2016-17)

Project Members: Siddharth Nayak, Rahul Chakwate, Adil Shaikh, Anuj Sindgi, Atharva Rajadnya

The Self-Balancing Robot was the first tech project I worked on. It is basically a one-wheeled robot which balances itself to stay upright – like a real-life version of the Cartpole from OpenAI Gym. The robot balances itself using a tuned Proportional-Integral-Derivative (PID) controller, with the angles of inclination and the angular velocities as the state vector. I learnt a lot about electronics, robotics, and programming while working on this project. [Code]
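A sketch of the controller (the gains and timestep below are illustrative, not the tuned values from the robot):

```python
class PID:
    """Discrete PID controller of the kind used to keep the robot upright."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        """Return the control command for the current error."""
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Usage: motor command from the tilt angle, with setpoint 0 rad (upright).
pid = PID(kp=12.0, ki=0.5, kd=0.8, dt=0.01)  # illustrative gains
command = pid.step(0.0 - 0.05)               # e.g. tilt_angle = 0.05 rad
```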

Mini Projects:


A list of small projects, each of which took less than a week of work: