Optimizers

Module contents

class olympus.optimizers.Optimizer(name=None, *, params=None, optimizer=None, half=False, loss_scale=1, dynamic_loss_scale=False, scale_window=1000, scale_factor=2, min_loss_scale=None, max_loss_scale=16777216.0, **kwargs)[source]

Bases: torch.optim.optimizer.Optimizer

Lazy optimizer that lets you first fetch the supported hyper parameters with get_space and then instantiate the underlying optimizer with init.

Parameters:
name: str

Name of a registered optimizer

optimizer: Optimizer

Custom optimizer, mutually exclusive with name

half: bool

Enable fp16 Optimizer

loss_scale: float (LS)

fp16 optimizer option: loss scale to use

dynamic_loss_scale: bool

fp16 optimizer option: Enable dynamic loss scaling

scale_window: int (SW)

dynamic loss scaling option: increase LS after SW successful iterations

scale_factor: float (SF)

dynamic loss scaling option: divide LS by SF after an overflow, or multiply LS by SF after SW successful iterations

min_loss_scale: float
max_loss_scale: float
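
The dynamic loss scaling rule described by scale_window and scale_factor can be sketched in plain Python. This is only an illustration of the rule as worded above, not the code olympus runs: the class name DynamicLossScale and its update method are hypothetical, and clamping the result to min_loss_scale and max_loss_scale is an assumption, since the bounds are not described here.

```python
class DynamicLossScale:
    """Toy model of the dynamic loss scale rule (hypothetical helper, not olympus code)."""

    def __init__(self, loss_scale=1.0, scale_window=1000, scale_factor=2.0,
                 min_loss_scale=None, max_loss_scale=16777216.0):
        self.loss_scale = float(loss_scale)
        self.scale_window = scale_window
        self.scale_factor = scale_factor
        self.min_loss_scale = min_loss_scale
        self.max_loss_scale = max_loss_scale
        self.good_steps = 0  # successful iterations since the last change of LS

    def update(self, overflow):
        """Update LS after one iteration; overflow is True if inf/NaN gradients were seen."""
        if overflow:
            # Overflow: divide LS by SF and restart the window.
            self.loss_scale /= self.scale_factor
            if self.min_loss_scale is not None:
                # Assumption: the lower bound acts as a simple clamp.
                self.loss_scale = max(self.loss_scale, self.min_loss_scale)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps == self.scale_window:
                # SW successful iterations in a row: multiply LS by SF.
                # Assumption: the upper bound acts as a simple clamp.
                self.loss_scale = min(self.loss_scale * self.scale_factor,
                                      self.max_loss_scale)
                self.good_steps = 0
        return self.loss_scale


scaler = DynamicLossScale(loss_scale=8.0, scale_window=3, scale_factor=2.0)
history = [scaler.update(overflow) for overflow in (False, False, False, True, False)]
print(history)  # [8.0, 8.0, 16.0, 8.0, 8.0]
```

With a window of 3, the scale doubles after the third clean iteration and is halved again by the overflow on the fourth.
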
Raises:
RegisteredOptimizerNotFound

when the given name does not match any known optimizer

MissingArgument:

if neither name nor optimizer is set

WrongParameter

if a wrong hyper parameter is passed in kwargs

Examples

Follows the standard PyTorch optimizer interface

>>> import torch
>>> from olympus.models import Model
>>> from olympus.optimizers import Optimizer
>>> model = Model('resnet18',
...     input_size=(1, 28, 28),
...     output_size=10,)
>>>
>>> x = torch.randn((1, 1, 28, 28))
>>>
>>> optimizer = Optimizer('SGD', params=model.parameters(),  weight_decay=1e-3, lr=0.001, momentum=0.8)
>>>
>>> optimizer.zero_grad()
>>> loss = model(x).sum()
>>> optimizer.backward(loss)
>>> optimizer.step()

Can be lazily initialized for hyper parameter search

>>> optimizer = Optimizer('SGD')
>>> optimizer.get_space()
{'lr': 'loguniform(1e-5, 1)', 'momentum': 'uniform(0, 1)', 'weight_decay': 'loguniform(1e-10, 1e-3)'}
>>> optimizer.init(model.parameters(), weight_decay=1e-3, lr=0.001, momentum=0.8)
>>>
>>> optimizer.zero_grad()
>>> loss = model(x).sum()
>>> optimizer.backward(loss)
>>> optimizer.step()

Switch to a mixed precision optimizer if needed

>>> optimizer = Optimizer('SGD', half=True)
Attributes:
defaults

Returns the default hyper parameters of the underlying optimizer

optimizer
param_groups
state

Methods

add_param_group(param_group) Add a param group to the Optimizer's param_groups.
get_current_space() Get currently defined parameter space
get_space() Return the dimension space of each parameter
init([params, override]) Instantiate the underlying optimizer
load_state_dict(state_dict[, strict, device]) Loads the optimizer state.
state_dict([destination, prefix, keep_vars]) Returns the state of the optimizer as a dict.
step([closure]) Performs a single optimization step (parameter update).
zero_grad() Sets the gradients of all optimized torch.Tensors to zero.
backward  
to  
backward(loss)[source]
defaults

Returns the default hyper parameters of the underlying optimizer

get_current_space()[source]

Get currently defined parameter space

get_space() → Dict[str, str][source]

Return the dimension space of each parameter
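
For a hyper parameter search, the returned strings have to be turned into concrete values. The sketch below shows one possible way to do that with the standard library only. It assumes the dimensions look like the 'uniform(a, b)' and 'loguniform(a, b)' strings shown in the class example; sample_dimension and sample_space are hypothetical helpers and not part of olympus.

```python
import math
import random
import re

# Matches strings such as 'loguniform(1e-5, 1)' or 'uniform(0, 1)'.
_DIMENSION = re.compile(r"^\s*(\w+)\(\s*([^,]+?)\s*,\s*([^)]+?)\s*\)\s*$")


def sample_dimension(spec, rng):
    """Draw one value from a 'uniform(a, b)' or 'loguniform(a, b)' string."""
    match = _DIMENSION.match(spec)
    if match is None:
        raise ValueError(f"cannot parse dimension: {spec!r}")
    kind = match.group(1)
    low, high = float(match.group(2)), float(match.group(3))
    if kind == "uniform":
        return rng.uniform(low, high)
    if kind == "loguniform":
        # Uniform in log space, so every decade is equally likely.
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    raise ValueError(f"unsupported distribution: {kind!r}")


def sample_space(space, seed=0):
    """Sample every dimension of a {name: spec} dictionary."""
    rng = random.Random(seed)
    return {name: sample_dimension(spec, rng) for name, spec in space.items()}


# Same dictionary as the get_space() output in the class example.
space = {
    'lr': 'loguniform(1e-5, 1)',
    'momentum': 'uniform(0, 1)',
    'weight_decay': 'loguniform(1e-10, 1e-3)',
}
hyper_parameters = sample_space(space)
print(hyper_parameters)
```

Following the call pattern of the class example, the sampled dictionary could then be passed as keyword arguments, as in optimizer.init(model.parameters(), **hyper_parameters).
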

half = False
half_args = {}
init(params=None, override=False, **kwargs)[source]

Instantiate the underlying optimizer

Raises:
MissingParameters

if a hyper parameter is missing
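
The two-phase pattern (describe the hyper parameters first, build the optimizer later) can be illustrated with a small stand-alone sketch. Everything in it is hypothetical: LazyWrapper and toy_sgd are illustration names, MissingParameters is redefined locally so the block runs on its own, and the rule that every dimension of the space must be supplied is an assumption. The override argument is not modelled.

```python
class MissingParameters(Exception):
    """Local stand-in for the exception named above, so the sketch is self-contained."""


class LazyWrapper:
    """Hypothetical two-phase wrapper: describe hyper parameters first, build later."""

    def __init__(self, factory, space):
        self.factory = factory      # callable that builds the real object
        self.space = dict(space)    # hyper parameter name -> search dimension
        self.instance = None        # stays None until init() succeeds

    def get_space(self):
        return dict(self.space)

    def init(self, params, **hyper_parameters):
        # Assumption: every dimension listed in the space must be given a value.
        missing = sorted(set(self.space) - set(hyper_parameters))
        if missing:
            raise MissingParameters(f"missing hyper parameters: {missing}")
        self.instance = self.factory(params, **hyper_parameters)
        return self.instance


def toy_sgd(params, lr, momentum):
    # Placeholder factory; a real one would return an optimizer object.
    return {'params': list(params), 'lr': lr, 'momentum': momentum}


lazy = LazyWrapper(toy_sgd, {'lr': 'loguniform(1e-5, 1)', 'momentum': 'uniform(0, 1)'})

try:
    lazy.init([0.0, 1.0], lr=0.01)  # momentum is not supplied
except MissingParameters as error:
    print(error)

built = lazy.init([0.0, 1.0], lr=0.01, momentum=0.9)
print(built)
```
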

load_state_dict(state_dict, strict=True, device=None)[source]

Loads the optimizer state.

Args:
state_dict (dict): optimizer state. Should be an object returned
from a call to state_dict().
optimizer
param_groups
state
state_dict(destination=None, prefix='', keep_vars=False)[source]

Returns the state of the optimizer as a dict.

It contains two entries:

  • state - a dict holding current optimization state. Its content
    differs between optimizer classes.
  • param_groups - a list containing all parameter groups where each
    parameter group is a dict
step(closure=None)[source]

Performs a single optimization step (parameter update).

Args:
closure (callable): A closure that reevaluates the model and
returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.
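
The closure contract can be shown without any framework. The sketch below is a toy gradient descent on a single scalar, written only to illustrate that step calls the closure to re-evaluate the loss and returns that loss. ToyGradientDescent and the dictionary-based parameter are hypothetical and not how olympus or PyTorch store parameters.

```python
class ToyGradientDescent:
    """Hypothetical one-parameter optimizer used to illustrate step(closure)."""

    def __init__(self, param, lr=0.1):
        self.param = param  # dict with 'value' and 'grad' entries
        self.lr = lr

    def step(self, closure=None):
        loss = None
        if closure is not None:
            # The closure re-evaluates the model, refreshes the gradient,
            # and returns the loss.
            loss = closure()
        # The update reads the gradient but does not modify it.
        self.param['value'] -= self.lr * self.param['grad']
        return loss


param = {'value': 3.0, 'grad': 0.0}


def closure():
    # loss = value ** 2, so the gradient is 2 * value
    param['grad'] = 2.0 * param['value']
    return param['value'] ** 2


optimizer = ToyGradientDescent(param, lr=0.25)
losses = [optimizer.step(closure) for _ in range(3)]
print(losses)          # [9.0, 2.25, 0.5625]
print(param['value'])  # 0.375
```
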

to(device)[source]
zero_grad()[source]

Sets the gradients of all optimized torch.Tensors to zero.

Args:
set_to_none (bool): instead of setting to zero, set the grads to None.
This will in general have lower memory footprint, and can modestly improve performance. However, it changes certain behaviors. For example:

  1. When the user tries to access a gradient and perform manual ops on it, a None attribute or a Tensor full of 0s will behave differently.
  2. If the user requests zero_grad(set_to_none=True) followed by a backward pass, .grads are guaranteed to be None for params that did not receive a gradient.
  3. torch.optim optimizers have a different behavior if the gradient is 0 or None (in one case it does the step with a gradient of 0 and in the other it skips the step altogether).

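
The difference between a zero gradient and a None gradient is easiest to see with momentum. The following plain-Python sketch mimics a momentum update on floats; sgd_momentum_step is a hypothetical helper written for illustration, not a torch.optim function. A parameter with a gradient of 0 still moves because of its momentum buffer, while a parameter whose gradient is None is skipped.

```python
def sgd_momentum_step(params, grads, buffers, lr=0.1, momentum=0.9):
    """One SGD-with-momentum update on plain floats; None gradients are skipped."""
    for i, grad in enumerate(grads):
        if grad is None:
            # No gradient: parameter and momentum buffer are left untouched.
            continue
        buffers[i] = momentum * buffers[i] + grad
        params[i] -= lr * buffers[i]
    return params


# Both parameters start at 1.0 with a momentum buffer of 1.0 left over
# from earlier steps.
params = [1.0, 1.0]
buffers = [1.0, 1.0]

# The first parameter sees an explicit zero gradient, the second sees None.
sgd_momentum_step(params, [0.0, None], buffers)
print(params)   # the first parameter moved to about 0.91, the second stayed at 1.0
print(buffers)  # the first buffer decayed to 0.9, the second is unchanged
```
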
exception olympus.optimizers.RegisteredOptimizerNotFound[source]

Bases: Exception

exception olympus.optimizers.UninitializedOptimizer[source]

Bases: Exception

olympus.optimizers.get_optimizers_space()[source]
olympus.optimizers.get_schedules_space()[source]
olympus.optimizers.known_optimizers()[source]
olympus.optimizers.register_optimizer(name, factory, override=False)[source]