Deep Q Learning for Atari using Neon

A simple deep Q-learning implementation for training an AI to play Atari games. You can read the report in Czech.

Example of the trained AI playing Demon Attack:

Deep Q-learning AI playing Atari Demon Attack, score 8770


You need to install the Arcade Learning Environment and Neon.


Usage: python src/ --{params} {value}


The path to the ROM file is passed as the first argument.

optional arguments:
-h, --help Show this help message and exit.

--display_screen BOOL Display game screen during training and testing.
--sound BOOL Play (or record) sound.
--frame_skip INT How many times to repeat each chosen action.
--screen_width INT Screen width after resize.
--screen_height INT Screen height after resize.
--record_screen_path PATH Record game screens under this path. Subfolder for each game is created.
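For example, --frame_skip works like the following sketch, where `env_step` is a hypothetical stand-in for one emulator step, not the project's actual API:

```python
def act_with_frame_skip(env_step, action, frame_skip=4):
    """Repeat the chosen action frame_skip times, summing the rewards.

    env_step(action) -> (screen, reward, done) is a hypothetical wrapper
    around a single Arcade Learning Environment step.
    """
    total_reward = 0.0
    screen, done = None, False
    for _ in range(frame_skip):
        screen, reward, done = env_step(action)
        total_reward += reward
        if done:
            break  # stop repeating once the episode ends
    return screen, total_reward, done
```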

Replay memory:
--replay_size REPLAY_SIZE Maximum size of replay memory.
--history_length HISTORY_LENGTH How many screen frames form a state.
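A replay memory with a maximum size like --replay_size might look like this minimal sketch (class and method names are illustrative, not the project's actual code):

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, replay_size):
        # deque with maxlen drops the oldest transition once the buffer is full
        self.buffer = deque(maxlen=replay_size)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # draw a random minibatch for one training update
        return random.sample(self.buffer, batch_size)
```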

Deep Q-learning network:
--learning_rate FLOAT Learning rate.
--discount_rate FLOAT Discount rate for future rewards.
--batch_size INT Batch size for neural network.
--optimizer STRING{rmsprop,adam,adadelta} Network optimization algorithm.
--decay_rate FLOAT Decay rate for RMSProp and Adadelta algorithms.
--clip_error FLOAT Clip error term in update between this number and its negative.
--target_steps INT Copy main network into target network after this many steps.
--trained_steps INT This number will be added to the count of trained steps.
--min_reward FLOAT Rewards are clipped to this minimum value.
--max_reward FLOAT Rewards are clipped to this maximum value.
--batch_norm BOOL Use batch normalization in all layers.
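As an illustration of what --discount_rate and --clip_error control, here is a minimal pure-Python sketch of the DQN target computation; the function and argument names are illustrative, not the project's actual code:

```python
def q_learning_targets(q_values, next_q_values, actions, rewards, dones,
                       discount_rate=0.99, clip_error=1.0):
    """Compute DQN training targets with a clipped TD error for a minibatch.

    q_values      -- main-network outputs, one list of action values per sample
    next_q_values -- target-network outputs for the next states
    """
    targets = [row[:] for row in q_values]
    for i, a in enumerate(actions):
        # Bellman target: r + gamma * max_a' Q_target(s', a');
        # terminal states do not bootstrap
        max_next = max(next_q_values[i])
        bellman = rewards[i] + discount_rate * max_next * (0.0 if dones[i] else 1.0)
        error = bellman - targets[i][a]
        # --clip_error: clip the TD error between -clip_error and +clip_error
        error = max(-clip_error, min(clip_error, error))
        targets[i][a] += error
    return targets
```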

--backend STRING{cpu,gpu} Backend type for Neon.
--device_id INT GPU device id (only used with the GPU backend).
--datatype STRING{float16,float32,float64} Default floating-point precision for the backend (float64 for CPU only).
--stochastic_round BOOL Use stochastic rounding (will round to BITS number of bits if specified).

--exploration_rate_start FLOAT Exploration rate at the beginning of decay.
--exploration_rate_end FLOAT Exploration rate at the end of decay.
--exploration_decay_steps FLOAT How many steps to decay the exploration rate.
--exploration_rate_test FLOAT Exploration rate used during testing.
--train_frequency INT Perform training after this many game steps.
--train_repeat INT Number of times to sample minibatch during training.
--random_starts INT Perform at most this many dummy actions after a game restart, to produce more random game dynamics.
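The exploration options describe a linear epsilon-decay schedule, which can be sketched as follows (a hypothetical helper, not the project's code):

```python
def exploration_rate(step, rate_start=1.0, rate_end=0.1, decay_steps=1_000_000):
    """Linearly anneal epsilon from rate_start down to rate_end over decay_steps,
    then hold it at rate_end."""
    if step >= decay_steps:
        return rate_end
    return rate_start - step * (rate_start - rate_end) / decay_steps
```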

Main loop:
--random_steps INT Populate replay memory with random steps before starting learning.
--train_steps INT How many training steps per epoch.
--test_steps INT How many testing steps after each epoch.
--epochs INT How many epochs to run.
--max_computing_time INT Computation will be terminated after this timeout (in seconds).
--play_games INT How many games to play; suppresses training and testing.
--load_weights PATH Load network from file.
--save_weights_prefix PATH Save network to given file. Epoch and extension will be appended.
--csv_file PATH Write training progress to this file.

--random_seed INT Random seed for repeatable experiments.
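Taken together, the main-loop options roughly correspond to this training schedule; the `agent` interface here is hypothetical, not the project's actual class:

```python
def main_loop(agent, epochs, random_steps, train_steps, test_steps):
    # --random_steps: populate the replay memory with random play before learning
    agent.play_random(random_steps)
    for epoch in range(epochs):
        agent.train(train_steps)  # --train_steps of training per epoch
        agent.test(test_steps)    # --test_steps of evaluation after each epoch
```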

Default values are written in



You can run it on MetaCentrum for better performance.

ssh {login}
cd {path/to/your/sources/}
qsub -l walltime=1d -q gpu -l mem=8gb -l scratch=10gb -l nodes=1:ppn=1:gpu=1:x86_64:debian8:cl_doom

where the content of the submitted script can be:
for the first task:
#!/usr/bin/env bash
cd {path/to/your/sources/}
source ./
source ./ ./roms/demon_attack.bin --backend gpu --device_id 0 --random_steps 50000 --train_steps 250000 --test_steps 125000 --epochs 200 --display_screen false --csv_file ./results/${folderName}.csv --save_weights_prefix ./snapshots/${folderName}/${folderName} --max_computing_time 82800 > ./results/${folderName}.log 2>&1

and for continuing the task:
#!/usr/bin/env bash
cd {path/to/your/sources/}
source ./
source ./ ./roms/demon_attack.bin --backend gpu --device_id 0 --random_steps 50000 --train_steps 250000 --test_steps 125000 --epochs 200 --display_screen false --csv_file ./results/${folderName}.csv --save_weights_prefix ./snapshots/${folderName}/${folderName} --load_weights ./snapshots/${folderName}/demon_attack_gpu_run1_88.prm --exploration_rate_end 0.001 --exploration_rate_test 0.001 --max_computing_time 82000 >> ./results/${folderName}.log 2>&1

Or you can run the script on your computer as well.


Only for a pre-trained model: snapshots/{nameOfSnapshot}.prm roms/{nameOfRom}.bin


You can record a video of your pre-trained model playing one game: snapshots/{nameOfSnapshot}.prm roms/{nameOfRom}.bin
The video will be saved as video/{nameOfRom}.mov.


You can plot results from a saved *.csv file: results/{nameOfLog}.csv
The image will be saved as results/{nameOfLog}.eps.
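A minimal plotting sketch, assuming the csv file has `epoch` and `average_reward` columns; these column names are a guess, so check the header of your own log file:

```python
import csv

import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

def plot_results(csv_path, eps_path, x_col="epoch", y_col="average_reward"):
    """Plot one column of the training log against another and save as EPS."""
    xs, ys = [], []
    with open(csv_path) as f:
        for row in csv.DictReader(f):
            xs.append(float(row[x_col]))
            ys.append(float(row[y_col]))
    plt.figure()
    plt.plot(xs, ys)
    plt.xlabel(x_col)
    plt.ylabel(y_col)
    plt.savefig(eps_path)  # format inferred from the .eps extension
```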