IoT-driven dynamic replenishment of fresh produce in the presence of seasonal variations: A deep reinforcement learning approach using reward shaping

February 12, 2025

王梓豪

王文隆

刘天军

Jasmine Chang

Jim Shi



Abstract

The Internet of Things (IoT) has been disruptively transforming inventory management by linking and synchronizing inventory products. It is one of the driving forces behind the prevailing innovation in AgriTech. For fresh produce replenishment in the presence of inherent seasonal variations, IoT devices can not only capture bidirectional seasonal information on lead time and demand, but also detect fresh produce loss and waste (FPLW) caused by deterioration. With the aid of the massive data collected by IoT, we propose a data-driven deep reinforcement learning (DRL) approach using reward shaping, called DQN-SV-RS, to optimize the dynamic replenishment policy for a fresh produce wholesaler, specifically addressing the challenge posed by seasonal variations. Experimental results show that our DQN-SV-RS approach yields significant improvements for fresh produce supply chain (FPSC) inventory management, especially achieving a remarkable reduction in FPLW. As a core innovation of our DQN-SV-RS approach, the introduced reward shaping can significantly mitigate lost sales and inventory holding, thereby lowering the total cost. Furthermore, numerical experiments based on real business data show that the proposed approach is plausibly robust and scalable in application.

Type

Journal article

Publication

Omega, 134, 103299


System Framework: IoT-driven Dynamic Replenishment of Fresh Produce.

Introduction

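This study, published in Omega, a leading journal in management science, focuses on applying deep reinforcement learning to dynamic replenishment in fresh produce supply chains. We consider bidirectional seasonal fluctuations in the supply of and demand for fresh products, where both demand and lead time vary over time. To improve dynamic replenishment performance, we design a reward shaping function based on the "zero inventory" management paradigm and use a reinforcement learning algorithm to achieve dynamic optimal control of supply chain inventory.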
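To make the idea of a zero-inventory reward shaping term concrete, here is a minimal sketch in Python. It is not the function used in the paper: it swaps in generic potential-based reward shaping, with a potential that peaks when on-hand inventory is zero. The cost coefficients, the shaping weight, the discount factor, and the names `CostParams`, `potential`, and `shaped_reward` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CostParams:
    # Every value here is an illustrative assumption, not taken from the paper.
    holding: float = 1.0         # cost per unit held at the end of a period
    lost_sale: float = 5.0       # cost per unit of unmet demand
    waste: float = 3.0           # cost per unit lost to deterioration
    shaping_weight: float = 0.5  # strength of the zero-inventory potential
    gamma: float = 0.99          # discount factor of the learning agent


def potential(inventory: float, p: CostParams) -> float:
    """Hypothetical potential: largest (zero) when on-hand inventory is zero."""
    return -p.shaping_weight * abs(inventory)


def shaped_reward(inv_before: float, inv_after: float,
                  lost_sales: float, spoiled: float, p: CostParams) -> float:
    """Base reward (negative period cost) plus a potential-based shaping term."""
    base = -(p.holding * inv_after
             + p.lost_sale * lost_sales
             + p.waste * spoiled)
    # Shaping term F = gamma * Phi(s') - Phi(s): positive when the
    # transition moves inventory toward zero, negative when it moves away.
    shaping = p.gamma * potential(inv_after, p) - potential(inv_before, p)
    return base + shaping


params = CostParams()
toward_zero = shaped_reward(10, 4, lost_sales=0, spoiled=1, p=params)
away_from_zero = shaped_reward(4, 10, lost_sales=0, spoiled=0, p=params)
```

In this sketch, a transition that draws inventory down from 10 to 4 units earns a shaping bonus, while one that builds it up from 4 to 10 is penalized beyond its holding cost. A DQN agent trained on such a signal would be nudged toward lean inventory, which is the intuition behind the zero-inventory paradigm described above.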


Last updated on February 12, 2025