Base

class olympus.optimizers.base.OptimizerAdapter(factory, *args, **kwargs)

Bases: olympus.optimizers.base.OptimizerInterface

Wraps an existing PyTorch optimizer into an Olympus optimizer.
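The adapter idea can be illustrated without PyTorch. The following is a minimal sketch, not the Olympus implementation. It assumes that `factory` is a callable (such as an optimizer class) invoked with the remaining `*args` and `**kwargs`, and that the adapter forwards attribute access and method calls to the object it builds. `ToySGD` and `AdapterSketch` are hypothetical names, and parameters are plain dicts standing in for tensors.

```python
class ToySGD:
    """Hypothetical stand-in for a PyTorch optimizer class (not a torch API)."""

    def __init__(self, params, lr=0.1):
        self.param_groups = [{"params": list(params), "lr": lr}]
        self.state = {}

    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            for p in group["params"]:
                if p["grad"] is not None:
                    p["value"] -= group["lr"] * p["grad"]
        return loss

    def zero_grad(self):
        for group in self.param_groups:
            for p in group["params"]:
                p["grad"] = None


class AdapterSketch:
    """Hypothetical adapter: builds the wrapped optimizer with factory(*args, **kwargs)
    and delegates to it (assumed behaviour, not the Olympus source)."""

    def __init__(self, factory, *args, **kwargs):
        self.optimizer = factory(*args, **kwargs)

    @property
    def param_groups(self):
        # Expose the wrapped optimizer's own list, not a copy.
        return self.optimizer.param_groups

    @property
    def state(self):
        return self.optimizer.state

    def step(self, closure=None):
        return self.optimizer.step(closure)

    def zero_grad(self):
        self.optimizer.zero_grad()


param = {"value": 1.0, "grad": 0.5}
opt = AdapterSketch(ToySGD, [param], lr=0.1)
opt.step()       # forwarded to ToySGD.step: value becomes 1.0 - 0.1 * 0.5
opt.zero_grad()  # forwarded to ToySGD.zero_grad: grad becomes None
```

Delegating through properties keeps `param_groups` and `state` pointing at the wrapped optimizer's own objects, so code that inspects them sees live values rather than copies.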

Attributes: param_groups, state

Methods

`add_param_group`(param_group)	Add a param group to the `Optimizer`'s `param_groups`.
`backward`(loss)	This method comes from the FP16 Optimizer; for consistency, it is added to every optimizer.
`defaults`()	Specifies the hyperparameter defaults.
`get_space`()	Specifies the hyperparameters that are supported by this optimizer.
`load_state_dict`(state_dict[, strict])	Loads the optimizer state.
`state_dict`([destination, prefix, keep_vars])	Returns the state of the optimizer as a `dict`.
`step`([closure])	Performs a single optimization step (parameter update).
`zero_grad`()	Sets the gradients of all optimized `torch.Tensor`s to zero.

add_param_group(param_group)

Add a param group to the Optimizer's param_groups.

This can be useful when fine-tuning a pre-trained network, as frozen layers can be made trainable and added to the Optimizer as training progresses.

Args:

param_group (dict): Specifies which Tensors should be optimized, along with group-specific optimization options.
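The bookkeeping behind `add_param_group` can be sketched in plain Python. This is a simplified, hypothetical model loosely based on `torch.optim.Optimizer`: options missing from the new group are filled from the optimizer defaults, and a parameter may appear in at most one group. `ToyOptimizer` and the dict-based parameters are illustrative stand-ins, not Olympus or PyTorch objects.

```python
class ToyOptimizer:
    """Hypothetical minimal optimizer used only to illustrate param groups."""

    def __init__(self, params, lr=0.1, momentum=0.0):
        self.defaults = {"lr": lr, "momentum": momentum}
        self.param_groups = []
        self.add_param_group({"params": list(params)})

    def add_param_group(self, param_group):
        group = dict(param_group)
        group["params"] = list(group["params"])
        # Options the caller did not specify fall back to the optimizer defaults.
        for key, value in self.defaults.items():
            group.setdefault(key, value)
        # A parameter may belong to at most one group.
        seen = {id(p) for g in self.param_groups for p in g["params"]}
        if any(id(p) in seen for p in group["params"]):
            raise ValueError("some parameters appear in more than one parameter group")
        self.param_groups.append(group)


head = [{"value": 0.0}]
backbone = [{"value": 1.0}]

opt = ToyOptimizer(head, lr=0.1)
# Later in fine-tuning: unfreeze the backbone and train it with a smaller learning rate.
opt.add_param_group({"params": backbone, "lr": 0.01})
```

The second group overrides only `lr`; its `momentum` is inherited from the defaults passed to the constructor.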

backward(loss)

This method comes from the FP16 Optimizer; for consistency, it is added to every optimizer.
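Routing the backward pass through the optimizer lets a single training loop serve both FP16 and regular optimizers. The sketch below is hypothetical: it assumes that, for a non-FP16 optimizer, `backward(loss)` simply calls `loss.backward()`, whereas an FP16 optimizer may apply loss scaling first. `Param`, `SquaredError`, and `PlainOptimizer` are toy stand-ins for tensors, a loss, and an optimizer.

```python
class Param:
    """Hypothetical scalar parameter holding a value and its gradient."""

    def __init__(self, value):
        self.value = value
        self.grad = None


class SquaredError:
    """Hypothetical loss (w - target)**2 that can backpropagate into w."""

    def __init__(self, param, target):
        self.param, self.target = param, target
        self.value = (param.value - target) ** 2

    def backward(self):
        # d/dw (w - t)^2 = 2 (w - t); accumulate into .grad like autograd does.
        g = 2 * (self.param.value - self.target)
        self.param.grad = g if self.param.grad is None else self.param.grad + g


class PlainOptimizer:
    def __init__(self, params, lr):
        self.params, self.lr = list(params), lr

    def zero_grad(self):
        for p in self.params:
            p.grad = None

    def backward(self, loss):
        # Assumption: without FP16 loss scaling, backward just delegates to the loss.
        loss.backward()

    def step(self):
        for p in self.params:
            if p.grad is not None:
                p.value -= self.lr * p.grad


w = Param(0.0)
opt = PlainOptimizer([w], lr=0.1)
for _ in range(50):
    opt.zero_grad()
    loss = SquaredError(w, target=3.0)
    opt.backward(loss)  # used in place of loss.backward()
    opt.step()
```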

static defaults()

Specifies the hyperparameter defaults.

static get_space()

Specifies the hyperparameters that are supported by this optimizer.
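Because both hooks are static, the hyperparameter configuration can be queried from the class without constructing an optimizer (which would require parameters). The sketch below is illustrative only: the hyperparameter names, default values, and the prior-string notation returned by `get_space` are assumptions, and `resolve_hyperparameters` is a hypothetical helper, not part of Olympus.

```python
class ToySGD:
    """Hypothetical optimizer showing the two static hyperparameter hooks."""

    @staticmethod
    def defaults():
        # Assumed default values, for illustration only.
        return {"lr": 0.01, "momentum": 0.9, "weight_decay": 0.0}

    @staticmethod
    def get_space():
        # Assumed search-space notation: one prior string per tunable hyperparameter.
        return {
            "lr": "loguniform(1e-5, 1)",
            "momentum": "uniform(0, 1)",
            "weight_decay": "loguniform(1e-10, 1e-3)",
        }


def resolve_hyperparameters(optimizer_cls, **overrides):
    """Hypothetical helper: start from the defaults, apply overrides,
    and reject names the optimizer does not declare in its space."""
    unknown = set(overrides) - set(optimizer_cls.get_space())
    if unknown:
        raise ValueError(f"unsupported hyperparameters: {sorted(unknown)}")
    params = optimizer_cls.defaults()
    params.update(overrides)
    return params


# Called on the class itself: no optimizer instance is needed.
hp = resolve_hyperparameters(ToySGD, lr=0.1)
```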

load_state_dict(state_dict, strict=True)

Loads the optimizer state.

Args:

state_dict (dict): Optimizer state. Should be an object returned from a call to state_dict().
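A typical use of `load_state_dict` is resuming from a checkpoint: save the result of `state_dict()`, then load it into a freshly constructed optimizer. The toy class below is a hypothetical stand-in that only mimics the two-entry layout (`state`, `param_groups`); `pickle` plays the role of writing the checkpoint to disk. It does not model the `strict` argument.

```python
import pickle


class ToyOptimizer:
    """Hypothetical optimizer with one hyperparameter and a step counter."""

    def __init__(self, lr):
        self.param_groups = [{"lr": lr}]
        self.state = {"steps": 0}

    def step(self):
        self.state["steps"] += 1

    def state_dict(self):
        # Return copies so later steps do not mutate a saved checkpoint.
        return {"state": dict(self.state), "param_groups": [dict(g) for g in self.param_groups]}

    def load_state_dict(self, state_dict):
        if len(state_dict["param_groups"]) != len(self.param_groups):
            raise ValueError("loaded state dict has a different number of parameter groups")
        self.state = dict(state_dict["state"])
        self.param_groups = [dict(g) for g in state_dict["param_groups"]]


opt = ToyOptimizer(lr=0.1)
for _ in range(3):
    opt.step()
blob = pickle.dumps(opt.state_dict())  # e.g. written to a checkpoint file

resumed = ToyOptimizer(lr=0.5)  # fresh optimizer with a different lr
resumed.load_state_dict(pickle.loads(blob))
```

After loading, the resumed optimizer carries both the saved step count and the saved learning rate, overriding the value passed to its constructor.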

param_groups

state

state_dict(destination=None, prefix='', keep_vars=False)

Returns the state of the optimizer as a dict.

It contains two entries:

state - a dict holding the current optimization state. Its content differs between optimizer classes.
param_groups - a list containing all parameter groups, where each parameter group is a dict.
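The two-entry layout can be reproduced with plain Python. In this hypothetical sketch, parameter objects are replaced by integer indices so that the returned dict contains only plain data; `torch.optim` optimizers pack their state in a similar way. `Param` and `pack_state` are illustrative names, and the `momentum_buffer` entry is an assumed example of per-parameter state.

```python
class Param:
    """Hypothetical parameter object (hashable by identity, like a tensor)."""

    def __init__(self, value):
        self.value = value


def pack_state(param_groups, state):
    """Build {'state': ..., 'param_groups': ...}, replacing params by indices."""
    index = {}
    packed_groups = []
    for group in param_groups:
        packed = {k: v for k, v in group.items() if k != "params"}
        # Each distinct parameter gets the next integer index on first sight.
        packed["params"] = [index.setdefault(p, len(index)) for p in group["params"]]
        packed_groups.append(packed)
    packed_state = {index[p]: dict(s) for p, s in state.items()}
    return {"state": packed_state, "param_groups": packed_groups}


w, b = Param(1.0), Param(0.0)
groups = [{"params": [w, b], "lr": 0.1, "momentum": 0.9}]
state = {w: {"momentum_buffer": 0.5}}  # per-parameter optimizer state
sd = pack_state(groups, state)
# sd == {'state': {0: {'momentum_buffer': 0.5}},
#        'param_groups': [{'lr': 0.1, 'momentum': 0.9, 'params': [0, 1]}]}
```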

step(closure=None)

Performs a single optimization step (parameter update).

Args:

closure (callable): A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.
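The closure protocol can be illustrated with a toy optimizer: `step` calls the closure to recompute the loss and gradient at the current parameter value, applies the update, and returns the loss. This is a hypothetical sketch (`Param` and `ToyGD` are not library classes). It reads `.grad` but does not modify it, in line with the note above. Some optimizers, such as L-BFGS, may call the closure several times per step.

```python
class Param:
    """Hypothetical scalar parameter holding a value and its gradient."""

    def __init__(self, value):
        self.value, self.grad = value, None


class ToyGD:
    """Hypothetical gradient-descent optimizer whose step accepts a closure."""

    def __init__(self, params, lr):
        self.params, self.lr = list(params), lr

    def step(self, closure=None):
        loss = None
        if closure is not None:
            loss = closure()  # re-evaluates the model and fills in .grad
        for p in self.params:
            if p.grad is not None:
                p.value -= self.lr * p.grad  # reads .grad, never writes it
        return loss


w = Param(0.0)
opt = ToyGD([w], lr=0.25)


def closure():
    # Loss (w - 2)^2 and its gradient 2 (w - 2), evaluated at the current w.
    w.grad = 2 * (w.value - 2.0)
    return (w.value - 2.0) ** 2


losses = [opt.step(closure) for _ in range(5)]
# losses == [4.0, 1.0, 0.25, 0.0625, 0.015625]; w.value == 1.9375
```

With `lr=0.25` each step halves the distance to the minimum at 2, so the returned loss drops by a factor of four per step.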

zero_grad()

Sets the gradients of all optimized torch.Tensors to zero.

Args:

set_to_none (bool): Instead of setting the grads to zero, set them to None. This will in general have a lower memory footprint, and can modestly improve performance. However, it changes certain behaviors. For example:

1. When the user tries to access a gradient and perform manual ops on it, a None attribute and a Tensor full of 0s will behave differently.
2. If the user requests zero_grad(set_to_none=True) followed by a backward pass, .grads are guaranteed to be None for params that did not receive a gradient.
3. torch.optim optimizers behave differently depending on whether the gradient is 0 or None: in the first case they perform the step with a gradient of 0, and in the second they skip the step altogether.
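Point 3 can be made concrete with a toy SGD step that includes weight decay. Like `torch.optim` optimizers, the hypothetical `sgd_step` below skips parameters whose gradient is `None`, so a zeroed gradient still lets weight decay shrink the parameter while a `None` gradient leaves it untouched. `Param`, `sgd_step`, and `zero_grad` are illustrative stand-ins, not the PyTorch implementations.

```python
class Param:
    """Hypothetical scalar parameter holding a value and its gradient."""

    def __init__(self, value):
        self.value, self.grad = value, None


def sgd_step(params, lr=0.1, weight_decay=0.5):
    """Toy SGD with weight decay; params whose grad is None are skipped."""
    for p in params:
        if p.grad is None:
            continue
        d = p.grad + weight_decay * p.value  # decay term applies even if grad == 0
        p.value -= lr * d


def zero_grad(params, set_to_none=False):
    for p in params:
        p.grad = None if set_to_none else 0.0


a, b = Param(1.0), Param(1.0)
zero_grad([a], set_to_none=False)  # a.grad == 0.0
zero_grad([b], set_to_none=True)   # b.grad is None
sgd_step([a, b])
# a.value shrinks to 1.0 - 0.1 * 0.5; b.value stays 1.0
```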

class olympus.optimizers.base.OptimizerInterface(params)

Bases: torch.optim.optimizer.Optimizer

Base class for Olympus optimizers.

Methods

`add_param_group`(param_group)	Add a param group to the `Optimizer`'s `param_groups`.
`backward`(loss)	This method comes from the FP16 Optimizer; for consistency, it is added to every optimizer.
`defaults`()	Specifies the hyperparameter defaults.
`get_space`()	Specifies the hyperparameters that are supported by this optimizer.
`load_state_dict`(state_dict[, strict])	Loads the optimizer state.
`state_dict`([destination, prefix, keep_vars])	Returns the state of the optimizer as a `dict`.
`step`([closure])	Performs a single optimization step (parameter update).
`zero_grad`()	Sets the gradients of all optimized `torch.Tensor`s to zero.

add_param_group(param_group)

Add a param group to the Optimizer's param_groups.

This can be useful when fine-tuning a pre-trained network, as frozen layers can be made trainable and added to the Optimizer as training progresses.

Args:

param_group (dict): Specifies which Tensors should be optimized, along with group-specific optimization options.

backward(loss)

This method comes from the FP16 Optimizer; for consistency, it is added to every optimizer.

static defaults()

Specifies the hyperparameter defaults.

static get_space()

Specifies the hyperparameters that are supported by this optimizer.

load_state_dict(state_dict, strict=True)

Loads the optimizer state.

Args:

state_dict (dict): Optimizer state. Should be an object returned from a call to state_dict().

state_dict(destination=None, prefix='', keep_vars=False)

Returns the state of the optimizer as a dict.

It contains two entries:

state - a dict holding the current optimization state. Its content differs between optimizer classes.
param_groups - a list containing all parameter groups, where each parameter group is a dict.

step(closure=None)

Performs a single optimization step (parameter update).

Args:

closure (callable): A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.

zero_grad()

Sets the gradients of all optimized torch.Tensors to zero.

Args:

set_to_none (bool): Instead of setting the grads to zero, set them to None. This will in general have a lower memory footprint, and can modestly improve performance. However, it changes certain behaviors. For example:

1. When the user tries to access a gradient and perform manual ops on it, a None attribute and a Tensor full of 0s will behave differently.
2. If the user requests zero_grad(set_to_none=True) followed by a backward pass, .grads are guaranteed to be None for params that did not receive a gradient.
3. torch.optim optimizers behave differently depending on whether the gradient is 0 or None: in the first case they perform the step with a gradient of 0, and in the second they skip the step altogether.