Reinforcement Learning

class olympus.metrics.rl.ReinforcementTest(env, model, vis=False, epsilon=0.01)

Compute the average and standard deviation of the reward obtained by a given model in a given environment over a given number of plays
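As an illustration of the statistic this metric reports, the following minimal sketch computes the mean and sample standard deviation of the total reward per play. It does not use the Olympus API: `CoinFlipEnv`, `evaluate`, and the `reset`/`step` interface are hypothetical stand-ins, and whether `ReinforcementTest` reports the sample or the population standard deviation is not stated here.

```python
import random
import statistics


class CoinFlipEnv:
    """Hypothetical toy environment: reward 1 when the action matches a hidden coin."""

    def __init__(self, seed=0, horizon=10):
        self.rng = random.Random(seed)
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # single dummy state

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == self.rng.randint(0, 1) else 0.0
        done = self.t >= self.horizon
        return 0, reward, done


def evaluate(env, policy, plays=20):
    """Run `plays` episodes and return (mean, sample sd) of the total reward."""
    totals = []
    for _ in range(plays):
        state, done, total = env.reset(), False, 0.0
        while not done:
            state, reward, done = env.step(policy(state))
            total += reward
        totals.append(total)
    # statistics.stdev needs at least two plays
    return statistics.mean(totals), statistics.stdev(totals)


mean_reward, sd_reward = evaluate(CoinFlipEnv(seed=0), policy=lambda state: 1)
```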

Methods

`epsilon_act`(task, state)	Take a random action from time to time to shake things up
`every`(*args[, epoch, batch])	Define how often this metric should be called
`load_state_dict`(state_dict)	Load a state dictionary to resume a previous training run
`on_end_train`(task[, step])	Called at the end of training after the last epoch
`on_new_batch`(task, step[, input, context])	Called after a batch has been processed
`on_new_epoch`(step, task, input, context)	Called at the end of an epoch, before a new epoch starts
`on_new_trial`(task, step, parameters, uid)	Called after a trial has been processed
`on_start_train`(task[, step])	Called once the training starts
`state_dict`()	Return a state dictionary used for checkpointing and resuming
`value`()	Return the key values that this metric computes

epsilon_act(task, state): Take a random action from time to time to shake things up
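The constructor's `epsilon=0.01` default suggests an epsilon-greedy scheme, although the exact behaviour of `epsilon_act` is not documented here. A minimal sketch of that general technique, assuming a discrete action space, looks like this; `epsilon_greedy`, `greedy_action`, and `n_actions` are hypothetical names and not part of Olympus.

```python
import random


def epsilon_greedy(greedy_action, n_actions, epsilon=0.01, rng=random):
    """Return a uniformly random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return greedy_action


# With a small epsilon the greedy action is kept on most calls
rng = random.Random(0)
actions = [epsilon_greedy(2, n_actions=4, epsilon=0.01, rng=rng) for _ in range(1000)]
```

A small epsilon keeps evaluation close to the model's own policy while still preventing it from getting stuck repeating a single action.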

on_new_epoch(step, task, input, context): Called at the end of an epoch, before a new epoch starts

class olympus.metrics.rl.Validation(env, trajectory_length, seeds=None, batch_size=128, throws=1000)

Generate random trajectories on which to validate the model
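To make the idea concrete, here is a minimal sketch of generating a fixed-length random trajectory that is reproducible from a seed. It is only an assumption about what such a validation set might look like: `RandomWalkEnv`, `random_trajectory`, and the `reset`/`step` interface are hypothetical and do not reflect how `Validation` uses `seeds`, `batch_size`, or `throws`.

```python
import random


class RandomWalkEnv:
    """Hypothetical toy environment: walk on a line, reward 1 on reaching +3."""

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        self.pos += 1 if action == 1 else -1
        done = abs(self.pos) >= 3
        reward = 1.0 if self.pos >= 3 else 0.0
        return self.pos, reward, done


def random_trajectory(env, n_actions, trajectory_length, seed):
    """Collect (state, action, reward) tuples using uniformly random actions."""
    rng = random.Random(seed)  # seeding makes the trajectory reproducible
    state = env.reset()
    trajectory = []
    for _ in range(trajectory_length):
        action = rng.randrange(n_actions)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        # restart the episode when it ends so the length stays fixed
        state = env.reset() if done else next_state
    return trajectory


validation_set = [
    random_trajectory(RandomWalkEnv(), n_actions=2, trajectory_length=8, seed=s)
    for s in range(4)
]
```

Fixing the seeds means every evaluation sees the same trajectories, so changes in the validation score reflect the model rather than sampling noise.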

Methods

`every`(*args[, epoch, batch])	Define how often this metric should be called
`load_state_dict`(state_dict)	Load a state dictionary to resume a previous training run
`on_end_train`(task[, step])	Called at the end of training after the last epoch
`on_new_batch`(task, step[, input, context])	Called after a batch has been processed
`on_new_epoch`(task, epoch, context)	Called at the end of an epoch, before a new epoch starts
`on_new_trial`(task, step, parameters, uid)	Called after a trial has been processed
`on_start_train`(task[, step])	Called once the training starts
`state_dict`()	Return a state dictionary used for checkpointing and resuming
`value`()	Return the key values that this metric computes
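The `state_dict` / `load_state_dict` / `value` trio follows a common checkpointing pattern. The sketch below shows that pattern on a hypothetical `RunningMean` metric; it is not an Olympus class, and the keys stored in the dictionary are illustrative assumptions.

```python
class RunningMean:
    """Hypothetical metric that tracks a running mean of rewards."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, reward):
        self.total += reward
        self.count += 1

    def value(self):
        # key values the metric reports
        mean = self.total / self.count if self.count else 0.0
        return {'mean_reward': mean}

    def state_dict(self):
        # everything needed to resume accumulation later
        return {'total': self.total, 'count': self.count}

    def load_state_dict(self, state_dict):
        self.total = state_dict['total']
        self.count = state_dict['count']


metric = RunningMean()
metric.update(1.0)
metric.update(3.0)
checkpoint = metric.state_dict()

resumed = RunningMean()
resumed.load_state_dict(checkpoint)
resumed.update(5.0)
```

After resuming, the restored metric continues from the saved totals rather than starting from zero.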

compute_rewards