Abstract
Operant conditioning is an essential learning mechanism in organisms and a fundamental theory underlying reinforcement learning in artificial intelligence. This paper proposes a neural network circuit based on non-volatile memristors that mimics operant conditioning, including the effects of reinforcement (positive reward or negative punishment) on the acquisition and maintenance of specific behaviors. The circuit has two components: a reward operant conditioning circuit and a punishment operant conditioning circuit. Together, these circuits not only simulate exploration, acquisition, and satiety, but also reveal how reward delay and punishment intensity affect the acquisition of operant conditioning. This research has potential practical application in training robots to make decisions: by adjusting reward delay and punishment intensity, a robot's learning speed and effectiveness can be improved.
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 1002-1006 |
Number of pages | 5 |
Journal | IEEE Transactions on Circuits and Systems II: Express Briefs |
Volume | 71 |
Issue number | 3 |
Early online date | 5 Oct 2023 |
DOIs | |
Publication status | Published - 1 Mar 2024 |