Actor Critic

class olympus.tasks.reinforcement.a2c.A2C(model: olympus.reinforcement.utils.AbstractActorCritic, dataloader, optimizer, lr_scheduler, device, criterion=None, storage=None, logger=None)

Bases: olympus.tasks.task.Task

Parameters:
    actor_critic: Module
        Torch Module that takes a state and returns an action and a value
    env: Env
        Gym-like environment
    num_steps: int
        Number of simulation/environment steps to accumulate before doing a gradient step

Notes

RL has two batch sizes: the data loader batch size (lbs), which is equivalent to the number of simulations run in parallel, and the gradient batch size.

num_steps simulation steps are accumulated together to perform one gradient update.
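As a rough illustration of how the two batch sizes relate, the sketch below accumulates num_steps steps from lbs parallel simulations before a single update. It is not Olympus code: `fake_env_step` and `collect_rollout` are hypothetical names, and it assumes the gradient batch size is lbs * num_steps, which the notes above suggest but do not state outright.

```python
import random


def fake_env_step(env_id, step):
    """Hypothetical stand-in for one environment step; returns (state, reward)."""
    return (env_id, step), random.random()


def collect_rollout(lbs, num_steps):
    """Accumulate num_steps steps from lbs parallel simulations."""
    rollout = []
    for step in range(num_steps):
        # One "loader batch": every parallel simulation advances by one step.
        loader_batch = [fake_env_step(env_id, step) for env_id in range(lbs)]
        rollout.append(loader_batch)
    return rollout


lbs, num_steps = 4, 5
rollout = collect_rollout(lbs, num_steps)

# Assumed gradient batch size: every transition gathered before the update.
gradient_batch_size = sum(len(batch) for batch in rollout)
print(gradient_batch_size)
```

Under this assumption, changing either lbs or num_steps changes how many transitions feed each gradient step.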

Attributes: device, events, metrics, model

Methods

advantage_actor_critic(current_state, …) A2C Synchronous actor Critic

eval_loss(batch) This is used to compute validation and test loss

fit(epochs[, context]) Execute a single batch

get_space(**fidelities) Return hyper parameter space

init([gamma, optimizer, lr_schedule, model, uid])

load_state_dict(state[, strict]) Try to load a previous unfinished state to resume

state_dict([destination, prefix, keep_vars]) Save a state the task can go back to if an error occurs

compute_returns
parameters
report
resumed
set_device
summary

advantage_actor_critic(current_state, replay_vector)

A2C: synchronous advantage actor-critic

Parameters:
    current_state
        Current state the game was left in
    replay_vector
        List of actions that were performed by the model to reach the current state
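For readers unfamiliar with A2C, the following is a minimal sketch of the usual actor and critic loss terms, written in plain Python so it runs without dependencies. It is not the body of advantage_actor_critic: `a2c_losses` and its arguments are hypothetical, and the source does not say how Olympus weights or combines the two terms.

```python
def a2c_losses(log_probs, values, returns):
    """Return (actor_loss, critic_loss), each averaged over the batch.

    log_probs: log-probability of each action that was taken
    values:    critic's value estimate for each visited state
    returns:   discounted return observed from each state
    """
    n = len(returns)
    # Advantage: how much better the outcome was than the critic expected.
    advantages = [ret - val for ret, val in zip(returns, values)]
    # Actor term: raise the log-probability of actions with positive advantage.
    # In an autograd framework the advantage is usually treated as a constant here.
    actor_loss = -sum(lp * adv for lp, adv in zip(log_probs, advantages)) / n
    # Critic term: mean squared error between value estimates and returns.
    critic_loss = sum(adv ** 2 for adv in advantages) / n
    return actor_loss, critic_loss


actor_loss, critic_loss = a2c_losses(
    log_probs=[-0.5, -1.0], values=[1.0, 2.0], returns=[2.0, 1.0]
)
print(actor_loss, critic_loss)
```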

compute_returns(value, states)

fit(epochs, context=None)

Execute a single batch

Parameters:
    epoch: int
        Current step in the training process
    context: dict
        Optional context

Notes

You should wrap whatever code you have here inside a BadResumeGuard to prevent users from resuming a failed task that may be in a bad state.

To resume a task, you need to create a clean one with the same hyper parameters. It will automatically pick up from its last checkpoint.

get_space(**fidelities)

Return hyper parameter space

init(gamma=0.99, optimizer=None, lr_schedule=None, model=None, uid=None)

Parameters:
    optimizer: Dict
        Optimizer hyper parameters
    lr_schedule: Dict
        lr schedule hyper parameters
    model: Dict
        Model hyper parameters
    gamma: float
        Reward discount factor
    trial: Optional[str]
        Trial id to use for logging. When using orion, it has usually already created a trial for us, so we just need to append to it
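To make the role of gamma concrete, here is a small sketch of a standard discounted-return computation. It is not the library's compute_returns, whose behavior is undocumented above: `discounted_returns` and `bootstrap_value` are hypothetical names used only for illustration.

```python
def discounted_returns(rewards, gamma=0.99, bootstrap_value=0.0):
    """Compute R_t = r_t + gamma * R_{t+1}, walking backwards through the rollout.

    bootstrap_value is the assumed value estimate for the state that follows
    the last reward (0.0 if the episode ended there).
    """
    returns = []
    running = bootstrap_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


example = discounted_returns([1.0, 1.0, 1.0], gamma=0.5)
print(example)  # [1.75, 1.5, 1.0]
```

A gamma close to 1 makes distant rewards count almost as much as immediate ones, while a smaller gamma shortens the effective horizon.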

load_state_dict(state, strict=True)

Try to load a previous unfinished state to resume

Notes

You should wrap whatever code you have here inside a BadResumeGuard to prevent users from resuming a failed task that may be in a bad state.

To resume a task, you need to create a clean one with the same hyper parameters. It will automatically pick up from its last checkpoint.
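The resume pattern described in these notes can be sketched with a toy class. Everything here is hypothetical: `ToyTask` only mimics the state_dict/load_state_dict pair, the JSON checkpoint file is an illustrative choice, and the meaning given to `strict` is an assumption rather than documented Olympus behavior.

```python
import json
import os
import tempfile


class ToyTask:
    """Hypothetical task; not part of Olympus."""

    def __init__(self, gamma=0.99):
        self.gamma = gamma  # hyper parameter, expected to match on resume
        self.epoch = 0      # training progress, restored from the checkpoint

    def fit_one_epoch(self):
        self.epoch += 1

    def state_dict(self):
        return {"gamma": self.gamma, "epoch": self.epoch}

    def load_state_dict(self, state, strict=True):
        # Assumed semantics: strict refuses checkpoints with other hyper parameters.
        if strict and state["gamma"] != self.gamma:
            raise ValueError("hyper parameters differ from the checkpoint")
        self.epoch = state["epoch"]


# First run: train for three epochs, then save a checkpoint.
task = ToyTask(gamma=0.99)
for _ in range(3):
    task.fit_one_epoch()

checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
with open(checkpoint_path, "w") as handle:
    json.dump(task.state_dict(), handle)

# Resume: create a clean task with the same hyper parameters, then load.
resumed = ToyTask(gamma=0.99)
with open(checkpoint_path) as handle:
    resumed.load_state_dict(json.load(handle))
print(resumed.epoch)
```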

model

parameters()

state_dict(destination=None, prefix='', keep_vars=False)

Save a state the task can go back to if an error occurs