Base

class olympus.optimizers.base.OptimizerAdapter(factory, *args, **kwargs)[source]

Bases: olympus.optimizers.base.OptimizerInterface

Wraps an existing PyTorch optimizer in an Olympus optimizer.

Attributes:
param_groups
state

Methods

add_param_group(param_group) Add a param group to the Optimizer's param_groups.
backward(loss) This method comes from FP16 Optimizer; it is added everywhere for consistency.
defaults() Specifies the hyperparameter defaults.
get_space() Specifies the hyperparameters supported by this optimizer.
load_state_dict(state_dict[, strict]) Loads the optimizer state.
state_dict([destination, prefix, keep_vars]) Returns the state of the optimizer as a dict.
step([closure]) Performs a single optimization step (parameter update).
zero_grad() Sets the gradients of all optimized torch.Tensors to zero.
add_param_group(param_group)[source]

Add a param group to the Optimizer's param_groups.

This can be useful when fine-tuning a pre-trained network, as frozen layers can be made trainable and added to the Optimizer as training progresses.

Args:
param_group (dict): Specifies what Tensors should be optimized along with group-specific optimization options.
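
The fine-tuning scenario described above can be sketched as follows. This is a minimal example that calls torch.optim.SGD directly instead of going through an Olympus class; the two-layer model, the variable names, and the learning rates are illustrative assumptions.

```python
import torch
from torch import nn

# A "backbone" that starts frozen and a trainable "head" (both hypothetical).
backbone = nn.Linear(4, 8)
head = nn.Linear(8, 2)
for p in backbone.parameters():
    p.requires_grad = False

# Start by optimizing the head only.
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
groups_before = len(optimizer.param_groups)

# Later in training: unfreeze the backbone and register it as a new
# param group with its own, smaller learning rate.
for p in backbone.parameters():
    p.requires_grad = True
optimizer.add_param_group({"params": list(backbone.parameters()), "lr": 0.01})

groups_after = len(optimizer.param_groups)
backbone_lr = optimizer.param_groups[1]["lr"]
```

Options that the new group does not set explicitly are filled in from the optimizer's defaults, so only the overrides need to appear in the dict.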
backward(loss)[source]

This method comes from FP16 Optimizer; it is added everywhere for consistency.

static defaults()[source]

Specifies the hyperparameter defaults.

static get_space()[source]

Specifies the hyperparameters supported by this optimizer.

load_state_dict(state_dict, strict=True)[source]

Loads the optimizer state.

Args:
state_dict (dict): optimizer state. Should be an object returned
from a call to state_dict().
param_groups
state
state_dict(destination=None, prefix='', keep_vars=False)[source]

Returns the state of the optimizer as a dict.

It contains two entries:

  • state - a dict holding current optimization state. Its content
    differs between optimizer classes.
  • param_groups - a list containing all parameter groups where each
    parameter group is a dict
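
A save-and-restore round trip built on these two entries might look like the sketch below. It uses torch.optim.SGD directly and calls state_dict() and load_state_dict() without the extra destination, prefix, keep_vars, and strict arguments shown in the Olympus signatures, since their behavior is not described here; the model and hyperparameter values are assumptions chosen for illustration.

```python
import torch
from torch import nn

model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Take one step so the optimizer has per-parameter state to save.
loss = model(torch.ones(2, 3)).sum()
loss.backward()
optimizer.step()

checkpoint = optimizer.state_dict()
top_level_keys = sorted(checkpoint.keys())

# Restore into a fresh optimizer built over the same parameters but with
# different hyperparameters; loading overwrites them with the saved ones.
restored = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.0)
restored.load_state_dict(checkpoint)
restored_lr = restored.param_groups[0]["lr"]
restored_momentum = restored.param_groups[0]["momentum"]
```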
step(closure=None)[source]

Performs a single optimization step (parameter update).

Args:
closure (callable): A closure that reevaluates the model and
returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.

zero_grad()[source]

Sets the gradients of all optimized torch.Tensors to zero.

Args:
set_to_none (bool): instead of setting to zero, set the grads to None.
This will in general have lower memory footprint, and can modestly improve performance. However, it changes certain behaviors. For example:

  1. When the user tries to access a gradient and perform manual ops on it, a None attribute or a Tensor full of 0s will behave differently.
  2. If the user requests zero_grad(set_to_none=True) followed by a backward pass, .grads are guaranteed to be None for params that did not receive a gradient.
  3. torch.optim optimizers have a different behavior if the gradient is 0 or None (in one case it does the step with a gradient of 0 and in the other it skips the step altogether).
class olympus.optimizers.base.OptimizerInterface(params)[source]

Bases: torch.optim.optimizer.Optimizer

Base Olympus Optimizer

Methods

add_param_group(param_group) Add a param group to the Optimizer's param_groups.
backward(loss) This method comes from FP16 Optimizer; it is added everywhere for consistency.
defaults() Specifies the hyperparameter defaults.
get_space() Specifies the hyperparameters supported by this optimizer.
load_state_dict(state_dict[, strict]) Loads the optimizer state.
state_dict([destination, prefix, keep_vars]) Returns the state of the optimizer as a dict.
step([closure]) Performs a single optimization step (parameter update).
zero_grad() Sets the gradients of all optimized torch.Tensors to zero.
add_param_group(param_group)[source]

Add a param group to the Optimizer's param_groups.

This can be useful when fine-tuning a pre-trained network, as frozen layers can be made trainable and added to the Optimizer as training progresses.

Args:
param_group (dict): Specifies what Tensors should be optimized along with group-specific optimization options.
backward(loss)[source]

This method comes from FP16 Optimizer; it is added everywhere for consistency.

static defaults()[source]

Specifies the hyperparameter defaults.

static get_space()[source]

Specifies the hyperparameters supported by this optimizer.

load_state_dict(state_dict, strict=True)[source]

Loads the optimizer state.

Args:
state_dict (dict): optimizer state. Should be an object returned
from a call to state_dict().
state_dict(destination=None, prefix='', keep_vars=False)[source]

Returns the state of the optimizer as a dict.

It contains two entries:

  • state - a dict holding current optimization state. Its content
    differs between optimizer classes.
  • param_groups - a list containing all parameter groups where each
    parameter group is a dict
step(closure=None)[source]

Performs a single optimization step (parameter update).

Args:
closure (callable): A closure that reevaluates the model and
returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.
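
The closure contract can be illustrated with a short sketch. It uses torch.optim.SGD directly instead of an Olympus class, and the model, data, seed, and learning rate are assumptions picked for the example. The closure clears old gradients, recomputes the loss, backpropagates, and returns the loss; step() then returns that same value.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(2, 1)
inputs = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
targets = torch.tensor([[1.0], [2.0]])
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def closure():
    # Re-evaluate the model from scratch and return the loss.
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(inputs), targets)
    loss.backward()
    return loss

# step() calls the closure, applies the update, and returns the loss
# that the closure computed before the update.
first_loss = optimizer.step(closure).item()
second_loss = optimizer.step(closure).item()
```

Plain SGD evaluates the closure once per step. Optimizers such as torch.optim.LBFGS may call it several times, which is why the closure has to recompute everything itself.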

zero_grad()[source]

Sets the gradients of all optimized torch.Tensors to zero.

Args:
set_to_none (bool): instead of setting to zero, set the grads to None.
This will in general have lower memory footprint, and can modestly improve performance. However, it changes certain behaviors. For example:

  1. When the user tries to access a gradient and perform manual ops on it, a None attribute or a Tensor full of 0s will behave differently.
  2. If the user requests zero_grad(set_to_none=True) followed by a backward pass, .grads are guaranteed to be None for params that did not receive a gradient.
  3. torch.optim optimizers have a different behavior if the gradient is 0 or None (in one case it does the step with a gradient of 0 and in the other it skips the step altogether).
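
The second and third behaviors can be observed with a small sketch. It calls torch.optim.SGD directly, because the zero_grad() signature listed for the Olympus classes takes no arguments and it is not stated here whether set_to_none is forwarded; the parameter names are hypothetical.

```python
import torch
from torch import nn

used = nn.Parameter(torch.ones(2))
unused = nn.Parameter(torch.ones(2))
optimizer = torch.optim.SGD([used, unused], lr=0.1)

# First backward pass: both parameters receive a gradient.
(used.sum() + unused.sum()).backward()

# Reset the grads to None, then run a backward pass that only touches `used`.
optimizer.zero_grad(set_to_none=True)
(used * 2).sum().backward()

used_grad = used.grad.tolist()
unused_grad_is_none = unused.grad is None

# SGD skips parameters whose grad is None, so `unused` is left unchanged.
optimizer.step()
used_after = used.detach().tolist()
unused_after = unused.detach().tolist()
```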