Multi-agent cooperative control of a three-tank liquid system using reinforcement learning algorithms
Abstract
This paper presents a multi-agent cooperative reinforcement learning approach for controlling the highly nonlinear dynamics of a three-tank liquid system. The system’s complexity arises from the interdependence of the tanks, where precise control is required to maintain stable fluid levels. We first deploy two twin-delayed deep deterministic policy gradient agents to manage the inflow valves, followed by two proximal policy optimization agents tasked with controlling the valves. The main goal is to compare the performance of these two types of agents against traditional proportional-integral-derivative controllers. To promote effective collaboration between agents, a cooperative reward structure is implemented, encouraging agents to work together to maintain balanced fluid levels in all three tanks. The reward function penalizes deviations from target levels, accounting for both local performance and system-wide stability. The proposed method also addresses key challenges in multi-agent systems, such as non-stationarity and coordination in decentralized control, by combining a centralized critic during training with decentralized execution. Experimental results show that the twin-delayed deep deterministic policy gradient agents outperform the proportional-integral-derivative controller in terms of settling time, rise time, and robustness, demonstrating their ability to handle the nonlinear nature of the system with minimal tuning.
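The cooperative reward described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' reward function: the absolute-error penalty, the equal weighting in `local_weight`, and the names `cooperative_reward` and `agent_tanks` are all assumptions made for illustration. It only shows the general idea of mixing a local term (the tank an agent is mainly responsible for) with a system-wide term (the mean deviation across all three tanks).

```python
def cooperative_reward(levels, targets, agent_tanks, local_weight=0.5):
    """Return one reward per agent (hypothetical form, for illustration only).

    levels, targets: fluid level and setpoint for each tank.
    agent_tanks: index of the tank each agent is mainly responsible for.
    local_weight: assumed trade-off between local and system-wide error.
    """
    # Absolute deviation of every tank from its target level.
    errors = [abs(level - target) for level, target in zip(levels, targets)]

    # Shared term: mean deviation over all tanks (system-wide stability).
    global_penalty = sum(errors) / len(errors)

    rewards = []
    for tank in agent_tanks:
        # Local term: deviation of the tank this agent is assigned to.
        local_penalty = errors[tank]
        penalty = local_weight * local_penalty + (1.0 - local_weight) * global_penalty
        rewards.append(-penalty)
    return rewards


if __name__ == "__main__":
    # Two agents, assumed here to be responsible for tanks 0 and 2.
    print(cooperative_reward([0.5, 0.4, 0.3], [0.5, 0.5, 0.3], agent_tanks=[0, 2]))
```

Because every agent's reward includes the shared term, an agent is penalized when any tank drifts from its target, even if its own tank is on target. That shared penalty is what makes the structure cooperative in this sketch.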
Authors
Copyright (c) 2026 Lincoln Nobert Munhenzwa, Adlen Kerboua, Godfrey Murairidzi Gotora (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright (c) 2025 International Journal of Applied Resilience and Sustainability (IJARS) 