Reinforcement Learning

class olympus.metrics.rl.ReinforcementTest(env, model, vis=False, epsilon=0.01)

Bases: olympus.observers.observer.Observer

Compute the average and standard deviation of the reward obtained by a given model in a given environment over a given number of plays
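As a rough illustration of what this metric reports, the sketch below computes the mean and standard deviation of the total reward over several plays. It does not use olympus: the toy reward rule and the names `play_episode`, `reward_stats`, `always_one` and `coin_flip` are hypothetical, and it assumes that a play is one full episode and that the population standard deviation is meant.

```python
import random
import statistics


def play_episode(policy, rng, steps=10):
    """Hypothetical stand-in for one play: sum the reward of `steps` actions."""
    total = 0.0
    for _ in range(steps):
        action = policy(rng)
        # Toy reward rule (assumption): action 1 pays 1.0, anything else pays 0.0
        total += 1.0 if action == 1 else 0.0
    return total


def reward_stats(policy, plays=100, seed=0):
    """Return (mean, sd) of the total reward over `plays` episodes."""
    rng = random.Random(seed)
    rewards = [play_episode(policy, rng) for _ in range(plays)]
    return statistics.mean(rewards), statistics.pstdev(rewards)


def always_one(rng):
    """Deterministic policy: always picks the rewarded action."""
    return 1


def coin_flip(rng):
    """Random policy: picks action 0 or 1 with equal probability."""
    return rng.randint(0, 1)


mean, sd = reward_stats(always_one, plays=20)
```

A deterministic policy yields zero spread, while a random policy such as `coin_flip` yields a non-zero standard deviation, which is why the metric reports both numbers.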

Methods

epsilon_act(task, state) Take a random action from time to time to shake things up
every(*args[, epoch, batch]) Define how often this metric should be called
load_state_dict(state_dict) Load a state dictionary to resume a previous training run
on_end_train(task[, step]) Called at the end of training after the last epoch
on_new_batch(task, step[, input, context]) Called after a batch has been processed
on_new_epoch(step, task, input, context) Called at the end of an epoch, before a new epoch starts
on_new_trial(task, step, parameters, uid) Called after a trial has been processed
on_start_train(task[, step]) Called once the training starts
state_dict() Return a state dictionary used for checkpointing and resuming
value() Return the key values that the metric computes
compute_reward  
compute_rewards  
finish  
plot  
compute_reward(task)
compute_rewards(task)
epsilon_act(task, state)

Take a random action from time to time to shake things up
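The behaviour described here resembles epsilon-greedy action selection. The following is a minimal sketch under that assumption, not the olympus implementation: the function `epsilon_greedy` and its parameters `greedy_action`, `n_actions` and `rng` are hypothetical, and it assumes that `epsilon` is the probability of replacing the model's action with a uniformly random one.

```python
import random


def epsilon_greedy(greedy_action, n_actions, epsilon=0.01, rng=random):
    """With probability `epsilon` return a uniformly random action,
    otherwise return the action proposed by the model."""
    if rng.random() < epsilon:
        # Random branch: ignore the model and sample any valid action
        return rng.randrange(n_actions)
    return greedy_action


rng = random.Random(0)
# With epsilon=0.0 the random branch is never taken
actions = [
    epsilon_greedy(greedy_action=2, n_actions=4, epsilon=0.0, rng=rng)
    for _ in range(5)
]
```

With a small value such as the default `epsilon=0.01`, roughly one action in a hundred would be random under this reading.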

finish(task)
on_new_epoch(step, task, input, context)

Called at the end of an epoch, before a new epoch starts

plot()
value()

Return the key values that the metric computes

class olympus.metrics.rl.Validation(env, trajectory_length, seeds=None, batch_size=128, throws=1000)

Bases: olympus.observers.observer.Observer

Generates random trajectories to validate our model on
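The page does not say how `seeds`, `batch_size` or `throws` are used. As one possible reading, the sketch below builds a fixed validation set by seeding one random trajectory per seed, so every evaluation replays the same trajectories. The helpers `random_trajectory` and `validation_set` and the `n_actions` parameter are hypothetical and are not part of olympus.

```python
import random


def random_trajectory(n_actions, trajectory_length, seed):
    """Return a reproducible list of random actions for one trajectory."""
    rng = random.Random(seed)
    return [rng.randrange(n_actions) for _ in range(trajectory_length)]


def validation_set(n_actions, trajectory_length, seeds):
    """Build one trajectory per seed; the same seeds give the same set."""
    return {
        seed: random_trajectory(n_actions, trajectory_length, seed)
        for seed in seeds
    }


trajectories = validation_set(n_actions=4, trajectory_length=5, seeds=[0, 1, 2])
```

Fixing the seeds keeps the validation trajectories identical across epochs, so changes in the measured reward reflect the model rather than the sampled trajectories.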

Methods

every(*args[, epoch, batch]) Define how often this metric should be called
load_state_dict(state_dict) Load a state dictionary to resume a previous training run
on_end_train(task[, step]) Called at the end of training after the last epoch
on_new_batch(task, step[, input, context]) Called after a batch has been processed
on_new_epoch(task, epoch, context) Called at the end of an epoch, before a new epoch starts
on_new_trial(task, step, parameters, uid) Called after a trial has been processed
on_start_train(task[, step]) Called once the training starts
state_dict() Return a state dictionary used for checkpointing and resuming
value() Return the key values that the metric computes
compute_rewards  
compute_rewards(task)