In game development or scoring algorithms, this is a common pattern to accumulate an "advantage" score into a dynamic programming table.
The frontier lies in learned value functions —using deep learning to discover the AdvF itself from data, as in meta-reinforcement learning. Another frontier is distributional and quantile regression DP , which provides richer uncertainty information. As computational power grows, the old marriage of DP and AdvF will likely evolve into a new synthesis: algorithms that plan by dynamically constructing their own value metrics on the fly. dp advf