We have, specifically around using RL for parameter selection. Part of the reason I was brought on board was my background in RL. I fully believe it's the key to turning yes/no domain decisions into numbers for our algorithms to ingest.
Ultimately we would like all of our systems to use RL to interpret user feedback and determine parameters, actions, etc.
Brilliant question by the way!