MAL Seminar BW-API RL

Code
● Starter code is provided in F:/BWAPI.3.7.4/BWAPI.3.7.4/SeminarAIModule
● Implements SARSA(lambda) with an epsilon-greedy policy

Scenario 1 slide

BWAPI manual
1. Implement the AI agent
2. Rebuild SeminarAIModule in Visual Studio
3. Copy SeminarAIModule.dll from F:/BWAPI.3.7.4/BWAPI.3.7.4/SeminarAIModule/Release to F:/StarCraft/BWAPIdata/AI
4. Start F:/BWAPI.3.7.4/BWAPI.3.7.4/ChaosLauncher and click Start
5. The AI module and map can be specified by selecting Config in the launcher

Configuring trials
● Set trial parameters in F:/StarCraft/trials.cfg
○ First line is the number of trials
○ Each further line specifies the parameters for one trial
○ Format: alpha, lambda, #episodes, doGreedyEvalRuns (0/1)
○ Output is written to the StarCraft folder in trial<nr>_out.txt files
○ Output format: episode, total reward, number of steps

State space
● Current implementation extracts 6 variables from the game info
○ X-coordinate in [0,1000]
○ Y-coordinate in [0,1000]
○ enemy distance in [0,1000]
○ hitpoint difference (unit minus enemy) in [-50,50]
○ enemy attacking/moving (boolean)
○ enemy angle in [-pi,pi]
● Implemented in SeminarAIModule::getState()

Discretization
● Each state variable (except the boolean isAttacking) is discretized into a finite set of values
● Resolution per variable is set in SeminarAIModule::initQ()
● Finer discretization allows more accurate but slower learning

Action space
● Agent has 7 discrete actions
○ 0: stop
○ 1: attack enemy (only if visible)
○ 2: move towards enemy (only if visible)
○ 3-6: move N, E, S, W by 30 units
● Implemented in SeminarAIModule::executeAction()

Rewards
● step_reward (-0.03) for every non-final step
● hitpoint difference when either unit is killed
○ negative for a loss
○ positive for a win
● So the goal is to kill the enemy as quickly as possible, with as many hitpoints left as possible
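
The trials.cfg layout described under Configuring trials (a count on the first line, then one "alpha, lambda, #episodes, doGreedyEvalRuns" line per trial) could be read roughly as sketched here. This is not the starter code's parser: TrialParams and parseTrials are hypothetical names, and the error handling (returning an empty list on malformed input) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one parameter line of trials.cfg.
struct TrialParams {
    double alpha;     // learning rate
    double lambda;    // eligibility trace decay
    int episodes;     // number of episodes in this trial
    bool greedyEval;  // doGreedyEvalRuns (0/1)
};

// Reads the trial count from the first line, then one comma-separated
// parameter line per trial. Returns an empty vector on malformed input.
std::vector<TrialParams> parseTrials(std::istream& in) {
    std::vector<TrialParams> trials;
    std::string line;
    if (!std::getline(in, line)) return trials;

    int count = 0;
    std::istringstream head(line);
    if (!(head >> count) || count < 0) return trials;

    for (int i = 0; i < count; ++i) {
        if (!std::getline(in, line)) return std::vector<TrialParams>();
        // Treat commas as whitespace so stream extraction can split the fields.
        std::replace(line.begin(), line.end(), ',', ' ');
        std::istringstream fields(line);
        TrialParams p;
        int greedy = 0;
        if (!(fields >> p.alpha >> p.lambda >> p.episodes >> greedy))
            return std::vector<TrialParams>();
        p.greedyEval = (greedy != 0);
        trials.push_back(p);
    }
    assert(trials.size() == static_cast<std::size_t>(count));
    return trials;
}
```

With an std::ifstream opened on F:/StarCraft/trials.cfg, a made-up file containing "2", "0.1, 0.9, 500, 1" and "0.05, 0.5, 200, 0" on three lines would yield two trials, the first with greedy evaluation runs enabled.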
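
To make the Discretization slide concrete, here is a minimal sketch of uniform binning and of combining the six bin indices into a single Q-table row. The function names and the resolutions in kBins are assumptions for illustration; the actual resolutions are whatever SeminarAIModule::initQ() sets.

```cpp
#include <cassert>

// Maps a value in [lo, hi] to a bin index in [0, bins - 1].
// Values outside the range are clamped to the nearest edge bin.
int discretize(double value, double lo, double hi, int bins) {
    assert(bins > 0 && hi > lo);
    if (value <= lo) return 0;
    if (value >= hi) return bins - 1;
    int idx = static_cast<int>((value - lo) / (hi - lo) * bins);
    return idx < bins ? idx : bins - 1;  // guard against rounding at the top edge
}

// Hypothetical resolutions for: x, y, enemy distance, hitpoint difference,
// enemy attacking (boolean, so 2 values), enemy angle.
const int kNumVars = 6;
const int kBins[kNumVars] = {10, 10, 10, 5, 2, 8};

// Combines the per-variable bin indices into one flat state index
// (mixed-radix encoding), usable as a row of a tabular Q-function.
int flatIndex(const int bin[kNumVars]) {
    int index = 0;
    for (int i = 0; i < kNumVars; ++i) {
        assert(bin[i] >= 0 && bin[i] < kBins[i]);
        index = index * kBins[i] + bin[i];
    }
    return index;
}
```

With these example resolutions the table has 10 x 10 x 10 x 5 x 2 x 8 = 80,000 states, or 560,000 entries for 7 actions. The size is the product of the resolutions, which is why finer discretization slows learning: every extra bin multiplies the number of entries that must be visited.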
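
The reward scheme and the SARSA(lambda) learner with epsilon-greedy action selection can be sketched together as below. This is a generic tabular version, not the starter code: the struct and function names are hypothetical, the discount factor gamma and the exploration rate epsilon are not given in the slides, and accumulating traces are an assumption (the starter code may use replacing traces).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

const double kStepReward = -0.03;  // reward for every non-final step

// Final step: hitpoint difference (own minus enemy), so a loss is negative
// and a win is positive. Otherwise the constant step reward.
double reward(bool episodeOver, int ownHp, int enemyHp) {
    return episodeOver ? static_cast<double>(ownHp - enemyHp) : kStepReward;
}

// Minimal tabular SARSA(lambda) learner over flat state indices.
struct SarsaLambda {
    int numActions;
    double alpha, gamma, lambda, epsilon;
    std::vector<double> q;  // Q-values, indexed by state * numActions + action
    std::vector<double> e;  // eligibility traces, same layout

    SarsaLambda(int numStates, int numActions_, double alpha_, double gamma_,
                double lambda_, double epsilon_)
        : numActions(numActions_), alpha(alpha_), gamma(gamma_),
          lambda(lambda_), epsilon(epsilon_),
          q(static_cast<std::size_t>(numStates) * numActions_, 0.0),
          e(q.size(), 0.0) {}

    std::size_t at(int s, int a) const {
        assert(a >= 0 && a < numActions);
        return static_cast<std::size_t>(s) * numActions + a;
    }

    int greedyAction(int s) const {
        int best = 0;
        for (int a = 1; a < numActions; ++a)
            if (q[at(s, a)] > q[at(s, best)]) best = a;
        return best;
    }

    // Epsilon-greedy: a random action with probability epsilon, else greedy.
    int selectAction(int s) const {
        if (std::rand() / (RAND_MAX + 1.0) < epsilon)
            return std::rand() % numActions;
        return greedyAction(s);
    }

    // Call at the start of every episode.
    void resetTraces() { e.assign(e.size(), 0.0); }

    // One update for the transition (s, a, r, s2, a2). On a terminal
    // transition the bootstrap term gamma * Q(s2, a2) is dropped.
    void update(int s, int a, double r, int s2, int a2, bool terminal) {
        double target = terminal ? r : r + gamma * q[at(s2, a2)];
        double delta = target - q[at(s, a)];
        e[at(s, a)] += 1.0;  // accumulating trace
        for (std::size_t i = 0; i < q.size(); ++i) {
            q[i] += alpha * delta * e[i];
            e[i] *= gamma * lambda;
        }
    }
};
```

In a greedy evaluation run (doGreedyEvalRuns = 1), one would presumably call greedyAction() instead of selectAction() and skip update(). Because the traces spread the final hitpoint-difference reward back over earlier state-action pairs, a larger lambda credits the moves that led up to the kill more strongly.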