Reinforcement Learning
-
class olympus.metrics.rl.ReinforcementTest(env, model, vis=False, epsilon=0.01)
Bases: olympus.observers.observer.Observer
Compute the average and standard deviation of the reward that a given model obtains in a given environment over a given number of plays.
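The statistic described above (mean and standard deviation of the reward over several plays) can be illustrated with a small standalone sketch. It does not use olympus: `CoinFlipEnv`, `play_episode` and `reward_stats` are hypothetical names, and the `reset()`/`step(action)` interface and the "model maps a state to an action" convention are assumptions made for illustration only.

```python
import random
import statistics
from typing import Callable, List, Tuple


class CoinFlipEnv:
    """Hypothetical toy environment (not part of olympus).

    Every episode lasts `length` steps and has a single dummy state 0.
    Action 1 earns reward 1.0 with probability 0.7, any other action with 0.3.
    """

    def __init__(self, length: int = 10, seed: int = 0) -> None:
        self.length = length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self) -> int:
        self.t = 0
        return 0

    def step(self, action: int) -> Tuple[int, float, bool]:
        p = 0.7 if action == 1 else 0.3
        reward = 1.0 if self.rng.random() < p else 0.0
        self.t += 1
        return 0, reward, self.t >= self.length


def play_episode(env: CoinFlipEnv, model: Callable[[int], int]) -> float:
    """Run one episode with `model` (state -> action) and return the summed reward."""
    state, total, done = env.reset(), 0.0, False
    while not done:
        state, reward, done = env.step(model(state))
        total += reward
    return total


def reward_stats(env: CoinFlipEnv, model: Callable[[int], int], plays: int) -> Tuple[float, float]:
    """Mean and sample standard deviation of the episode reward over `plays` episodes."""
    if plays < 2:
        raise ValueError("need at least two plays to estimate a standard deviation")
    rewards: List[float] = [play_episode(env, model) for _ in range(plays)]
    return statistics.mean(rewards), statistics.stdev(rewards)


# Usage: evaluate a policy that always picks action 1.
mean, sd = reward_stats(CoinFlipEnv(seed=0), model=lambda state: 1, plays=100)
```

Reporting the standard deviation next to the mean shows how much of a difference between two models could be explained by episode-to-episode noise.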
Methods
epsilon_act(task, state): Take a random action from time to time to encourage exploration
every(*args[, epoch, batch]): Define how often this metric should be called
load_state_dict(state_dict): Load a state dictionary to resume a previous training run
on_end_train(task[, step]): Called at the end of training, after the last epoch
on_new_batch(task, step[, input, context]): Called after a batch has been processed
on_new_epoch(step, task, input, context): Called at the end of an epoch, before a new epoch starts
on_new_trial(task, step, parameters, uid): Called after a trial has been processed
on_start_train(task[, step]): Called once training starts
state_dict(): Return a state dictionary used for checkpointing and resuming
value(): Return the key values that the metric computes
compute_reward
compute_rewards
finish
plot
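The behaviour summarised for `epsilon_act` resembles the classic epsilon-greedy rule. The sketch below is a generic epsilon-greedy selector, not the olympus implementation: the `scores` list (one score per action) and the function name `epsilon_greedy` are assumptions, and only the default `epsilon=0.01` is taken from the constructor signature above.

```python
import random
from typing import Sequence


def epsilon_greedy(scores: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Return a random action with probability `epsilon`, else the highest-scoring one."""
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    if rng.random() < epsilon:
        # Exploration: every action, including the greedy one, is equally likely.
        return rng.randrange(len(scores))
    # Exploitation: index of the largest score (the first one on ties).
    return max(range(len(scores)), key=lambda a: scores[a])


# Usage: with epsilon=0.01 almost every choice is the greedy action (index 2).
rng = random.Random(0)
actions = [epsilon_greedy([0.1, 0.5, 0.9], epsilon=0.01, rng=rng) for _ in range(1000)]
```

A small epsilon keeps evaluation close to the model's own policy while still occasionally visiting states the model would never reach on its own.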
-
class olympus.metrics.rl.Validation(env, trajectory_length, seeds=None, batch_size=128, throws=1000)
Bases: olympus.observers.observer.Observer
Generate random trajectories on which to validate the model.
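One way to make randomly generated validation trajectories reproducible is to derive each one from its own seed and then group them into fixed-size batches. The sketch below assumes that reading of `seeds`, `trajectory_length` and `batch_size`; it does not model `throws`, whose meaning is not documented here, and `random_trajectories`, `batched` and `n_actions` are hypothetical names rather than olympus API.

```python
import random
from typing import Iterator, List, Sequence


def random_trajectories(seeds: Sequence[int], trajectory_length: int, n_actions: int) -> List[List[int]]:
    """Return one reproducible random action sequence per seed."""
    trajectories = []
    for seed in seeds:
        rng = random.Random(seed)  # a private generator per seed keeps trajectories independent
        trajectories.append([rng.randrange(n_actions) for _ in range(trajectory_length)])
    return trajectories


def batched(items: Sequence[List[int]], batch_size: int) -> Iterator[List[List[int]]]:
    """Yield consecutive chunks of at most `batch_size` trajectories."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Usage: 300 trajectories of length 20 over 4 actions, batched by 128.
trajectories = random_trajectories(seeds=range(300), trajectory_length=20, n_actions=4)
sizes = [len(batch) for batch in batched(trajectories, batch_size=128)]
```

Fixing the seeds means every validation pass scores the model on the same trajectories, so changes in the metric reflect the model rather than a new random draw.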
Methods
every(*args[, epoch, batch]): Define how often this metric should be called
load_state_dict(state_dict): Load a state dictionary to resume a previous training run
on_end_train(task[, step]): Called at the end of training, after the last epoch
on_new_batch(task, step[, input, context]): Called after a batch has been processed
on_new_epoch(task, epoch, context): Called at the end of an epoch, before a new epoch starts
on_new_trial(task, step, parameters, uid): Called after a trial has been processed
on_start_train(task[, step]): Called once training starts
state_dict(): Return a state dictionary used for checkpointing and resuming
value(): Return the key values that the metric computes
compute_rewards
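The `every`, `on_new_epoch`, `value`, `state_dict` and `load_state_dict` hooks listed above fit together roughly as in the following sketch. `PeriodicRewardObserver` is a hypothetical standalone class, not the olympus `Observer` base class; the assumption that `every(epoch=n)` means "act on every n-th epoch" and the `"reward"` key in `context` are illustrative only.

```python
from typing import Any, Dict, List, Optional


class PeriodicRewardObserver:
    """Hypothetical observer (not the olympus base class).

    Records context["reward"] every `frequency_epoch` epochs and supports checkpointing.
    """

    def __init__(self) -> None:
        self.frequency_epoch = 1
        self.history: List[float] = []

    def every(self, epoch: Optional[int] = None) -> "PeriodicRewardObserver":
        """Set how often (in epochs) the observer records; returns self for chaining."""
        if epoch is not None:
            if epoch < 1:
                raise ValueError("epoch frequency must be positive")
            self.frequency_epoch = epoch
        return self

    def on_new_epoch(self, task: Any, epoch: int, context: Dict[str, Any]) -> None:
        if epoch % self.frequency_epoch == 0:
            self.history.append(float(context["reward"]))

    def value(self) -> Dict[str, float]:
        return {"last_reward": self.history[-1]} if self.history else {}

    def state_dict(self) -> Dict[str, Any]:
        # Copy mutable state so later updates do not alter the checkpoint.
        return {"frequency_epoch": self.frequency_epoch, "history": list(self.history)}

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        self.frequency_epoch = state_dict["frequency_epoch"]
        self.history = list(state_dict["history"])


# Usage: record every second epoch, then resume from a checkpoint.
observer = PeriodicRewardObserver().every(epoch=2)
for epoch in range(1, 7):
    observer.on_new_epoch(task=None, epoch=epoch, context={"reward": epoch * 10})

resumed = PeriodicRewardObserver()
resumed.load_state_dict(observer.state_dict())
```

Returning `self` from `every` allows the frequency to be set in the same expression that creates the observer.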