... (4) Formulate a Reward Function: Provides numerical feedback to the agent in response to its preceding action.

| Algorithm | References | Count |
| --- | --- | --- |
| | [2], [55], [56], [57], [59], [79], [91], [156], [157], [163], [170], [174], [175], [183], [195], [196], [204], [218], [219], [243], [248], [255], [256], [282], [294], [305], [311], [313], [321], [327], [328], [332], [338], [347], [348], [352], [354] | 37 |
| Q-Learning | [29], [40], [54], [63], [76], [77], [81], [82], [90], [92], [93], [104], [119], [138], [148], [155], [161], [167], [172], [191], [192], [193], [194], [199], [201], [209], [218], [244], [245], [246], [258], [283], [290], [307], [357], [359] | 36 |
| SARSA | [64], [67], [77], [86], [134], [177], [258], [284], [290], [305] | 10 |
| Other | [7], [55], [88], [92], [201], [258] | 6 |

...