Data source: CROSBI

Reinforcement learning in non-Markov conservative environment using an inductive qualitative model (CROSBI ID 175839)

Journal article | original scientific paper | international peer review

Jović, Franjo ; Slavek, Ninoslav ; Blažević, Damir. Reinforcement learning in non-Markov conservative environment using an inductive qualitative model // International Journal on Artificial Intelligence Tools, 20 (2011), 5; 887-909. doi: 10.1142/S0218213011000425

Authorship information

Jović, Franjo ; Slavek, Ninoslav ; Blažević, Damir

English

Reinforcement learning in non-Markov conservative environment using an inductive qualitative model

The majority of real-world processes, such as power plants, banking and retail businesses, are non-Markov processes, being conservative systems with stochastic supply and demand. As an example, a retail process possesses long-term memory of the customer's experience and market price drift that deviates from the Markov property. Modeling the reward in this process is directed towards actions that have to be executed daily in order to support it. These actions are further severely disturbed by the hidden periodicity of customer behavior on a monthly and weekly basis. Alternative solutions in the retail business are achieved using a retail potential market model and a pricing policy based on demography. The policy of non-Markov behavior has not been intensively studied, although the literature indicates the non-Markov nature of many real process models, such as bank rating migrations. A solution is proposed, based on day-to-day data collection from point-of-sale (POS) locations, synthesizing the reward function from separate sale component rewards using qualitative models and indicating the most outstanding sale groups that form the reward model. The normalization of POS data has been used to eliminate periodicities and non-Markov features of the process data. Reinforcement learning has additionally been supported by artificial corrections of the normalized reward function, and the models thus obtained are used to recognize the most promising and most defective hidden retail product groups. Model data were analyzed for the statistical significance of the obtained results, comparing normalized and non-normalized sales data distributions. The method is simple and effective, being applicable to each POS separately, to a complex retail business network, and to other conservative environments. The obtained qualitative correlations between the model and the reward function lie between 0.72 and 0.95, even for the simple cases presented.
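
The abstract does not give the exact normalization or reward formulas, so the sketch below is only an illustrative reading of the described pipeline, under stated assumptions: daily POS sales per product group are de-periodized by dividing out day-of-week and day-of-month averages, and a per-group component reward is taken as the correlation of the normalized series with a target trajectory. The function names (`normalize_pos_sales`, `component_reward`), the multiplicative normalization, and the correlation-based reward are hypothetical choices for illustration, not the authors' method.

```python
# Illustrative sketch only: the normalization and reward formulas below are
# assumptions, not the procedure published in the paper.
import numpy as np
import pandas as pd

def normalize_pos_sales(sales: pd.Series) -> pd.Series:
    """Flatten weekly and monthly periodicity in a daily POS sales series."""
    df = sales.to_frame("sales")
    df["dow"] = df.index.dayofweek   # weekly cycle
    df["dom"] = df.index.day         # monthly cycle
    # Divide out the average level for each day-of-week and day-of-month,
    # so recurring customer-behaviour patterns no longer dominate the signal.
    dow_mean = df.groupby("dow")["sales"].transform("mean")
    dom_mean = df.groupby("dom")["sales"].transform("mean")
    overall = df["sales"].mean()
    return df["sales"] * overall**2 / (dow_mean * dom_mean)

def component_reward(normalized: pd.Series, target: pd.Series) -> float:
    """Reward for one sale component: correlation of the normalized sales
    with a target trajectory (an assumed stand-in for the qualitative model)."""
    return float(np.corrcoef(normalized, target)[0, 1])

if __name__ == "__main__":
    # Synthetic daily sales for three hypothetical product groups.
    idx = pd.date_range("2011-01-01", periods=120, freq="D")
    rng = np.random.default_rng(0)
    groups = {
        name: pd.Series(100 + 10 * np.sin(2 * np.pi * idx.dayofweek / 7)
                        + rng.normal(0, 5, len(idx)), index=idx)
        for name in ["beverages", "bakery", "dairy"]
    }
    target = pd.Series(np.linspace(90, 110, len(idx)), index=idx)
    # Per-group component rewards; the strongest and weakest groups stand out.
    rewards = {name: component_reward(normalize_pos_sales(s), target)
               for name, s in groups.items()}
    print(rewards)
```

In this reading, the overall reward would be synthesized from the per-group component rewards, and groups with consistently low components would be flagged as the defective hidden product groups mentioned in the abstract.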

retail process; data normalization; periodicity elimination


Publication information

Volume (issue): 20 (5)
Year: 2011
Pages: 887-909
Status: published
ISSN: 0218-2130
DOI: 10.1142/S0218213011000425

Research field

Computer science
