Optimizers¶

Module contents¶

class olympus.optimizers.Optimizer(name=None, *, params=None, optimizer=None, half=False, loss_scale=1, dynamic_loss_scale=False, scale_window=1000, scale_factor=2, min_loss_scale=None, max_loss_scale=16777216.0, **kwargs)[source]¶

Bases: torch.optim.optimizer.Optimizer

Lazy optimizer that lets you first fetch the supported hyper parameters using get_space and then initialize the underlying optimizer using init.

Parameters:

name: str: Name of a registered optimizer
optimizer: Optimizer: Custom optimizer, mutually exclusive with name
half: bool: Enable fp16 Optimizer
loss_scale: float (LS): fp16 optimizer option: loss scale to use
dynamic_loss_scale: bool: fp16 optimizer option: Enable dynamic loss scaling
scale_window: int (SW): dynamic loss scaling option: increase LS after SW successful iterations
scale_factor: float (SF): dynamic loss scaling option: divide LS by SF after an overflow, or multiply LS by SF after SW successful iterations
min_loss_scale: float
max_loss_scale: float
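
The dynamic loss scaling options above can be read as a small update rule. The following is a minimal sketch of that rule only, not the Olympus implementation: the class name `DynamicLossScale` and its `update` method are hypothetical, and clamping the scale to `min_loss_scale` and `max_loss_scale` is an assumption inferred from the parameter names.

```python
from typing import Optional


class DynamicLossScale:
    """Hypothetical helper that mirrors the LS / SW / SF description above."""

    def __init__(self, loss_scale: float = 1.0, scale_window: int = 1000,
                 scale_factor: float = 2.0,
                 min_loss_scale: Optional[float] = None,
                 max_loss_scale: float = 16777216.0) -> None:
        self.loss_scale = loss_scale
        self.scale_window = scale_window
        self.scale_factor = scale_factor
        self.min_loss_scale = min_loss_scale
        self.max_loss_scale = max_loss_scale
        self.good_steps = 0  # iterations since the last overflow or increase

    def update(self, overflow: bool) -> float:
        """Adjust the loss scale after one iteration and return the new value."""
        if overflow:
            # Divide LS by SF after an overflow and restart the window.
            self.loss_scale /= self.scale_factor
            if self.min_loss_scale is not None:
                # Assumption: the scale never drops below min_loss_scale.
                self.loss_scale = max(self.loss_scale, self.min_loss_scale)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps == self.scale_window:
                # Multiply LS by SF after SW successful iterations.
                # Assumption: the scale never exceeds max_loss_scale.
                self.loss_scale = min(self.loss_scale * self.scale_factor,
                                      self.max_loss_scale)
                self.good_steps = 0
        return self.loss_scale


scaler = DynamicLossScale(loss_scale=8.0, scale_window=3,
                          min_loss_scale=1.0, max_loss_scale=16.0)
for overflow in [False, False, False, True]:
    scale = scaler.update(overflow)
# Three clean steps double the scale to 16.0, then the overflow halves it to 8.0.
```

A small `scale_window` is used here only to keep the trace short; the default in the signature above is 1000.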

Raises:

RegisteredOptimizerNotFound: when the given name does not match a known optimizer
MissingArgument: if neither name nor optimizer is set
WrongParameter: if a wrong hyper parameter is passed in kwargs

Examples

Follows the standard PyTorch Optimizer interface

>>> import torch
>>> from olympus.models import Model
>>> from olympus.optimizers import Optimizer
>>> model = Model('resnet18',
...     input_size=(1, 28, 28),
...     output_size=10,)
>>>
>>> x = torch.randn((1, 1, 28, 28))
>>>
>>> optimizer = Optimizer('SGD', params=model.parameters(),  weight_decay=1e-3, lr=0.001, momentum=0.8)
>>>
>>> optimizer.zero_grad()
>>> loss = model(x).sum()
>>> optimizer.backward(loss)
>>> optimizer.step()

Can be lazily initialized for hyper parameter search

>>> optimizer = Optimizer('SGD')
>>> optimizer.get_space()
{'lr': 'loguniform(1e-5, 1)', 'momentum': 'uniform(0, 1)', 'weight_decay': 'loguniform(1e-10, 1e-3)'}
>>> optimizer.init(model.parameters(), weight_decay=1e-3, lr=0.001, momentum=0.8)
>>>
>>> optimizer.zero_grad()
>>> loss = model(x).sum()
>>> optimizer.backward(loss)
>>> optimizer.step()
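
For a hyper parameter search, the strings returned by get_space have to be turned into concrete values at some point. The sketch below shows one way to do that by hand for the two distributions that appear in the example output. The helpers `sample_dimension` and `sample_space` are hypothetical and not part of Olympus, and the parsing assumes every entry has the form `name(low, high)`.

```python
import math
import random
import re
from typing import Dict


def sample_dimension(spec: str, rng: random.Random) -> float:
    """Draw one value from a 'uniform(a, b)' or 'loguniform(a, b)' string."""
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", spec)
    if match is None:
        raise ValueError(f"cannot parse dimension: {spec!r}")
    kind = match.group(1)
    low, high = (float(value) for value in match.group(2).split(","))
    if kind == "uniform":
        return rng.uniform(low, high)
    if kind == "loguniform":
        # Sample uniformly in log space, then map back.
        return math.exp(rng.uniform(math.log(low), math.log(high)))
    raise ValueError(f"unsupported distribution: {kind!r}")


def sample_space(space: Dict[str, str], seed: int = 0) -> Dict[str, float]:
    """Sample one value per hyper parameter, reproducibly for a given seed."""
    rng = random.Random(seed)
    return {name: sample_dimension(spec, rng) for name, spec in space.items()}


# The space shown in the example above.
space = {
    'lr': 'loguniform(1e-5, 1)',
    'momentum': 'uniform(0, 1)',
    'weight_decay': 'loguniform(1e-10, 1e-3)',
}
hyper_params = sample_space(space, seed=0)
```

The resulting dictionary could then be passed on as keyword arguments, for example `optimizer.init(model.parameters(), **hyper_params)`.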

Switch to a mixed precision optimizer if needed

>>> optimizer = Optimizer('SGD', half=True)

Attributes:

`defaults`	Returns the default hyper parameter of the underlying optimizer
`optimizer`
`param_groups`
`state`

Methods

`add_param_group`(param_group)	Add a param group to the `Optimizer`'s `param_groups`.
`get_current_space`()	Get currently defined parameter space
`get_space`()	Return the dimension space of each parameter
`init`([params, override])	Instantiate the underlying optimizer
`load_state_dict`(state_dict[, strict, device])	Loads the optimizer state.
`state_dict`([destination, prefix, keep_vars])	Returns the state of the optimizer as a `dict`.
`step`([closure])	Performs a single optimization step (parameter update).
`zero_grad`()	Sets the gradients of all optimized `torch.Tensor`s to zero.

backward
to

backward(loss)[source]¶

defaults¶: Returns the default hyper parameter of the underlying optimizer

get_current_space()[source]¶: Get currently defined parameter space

get_space() → Dict[str, str][source]¶: Return the dimension space of each parameter

half = False¶

half_args = {}¶

init(params=None, override=False, **kwargs)[source]¶

Instantiate the underlying optimizer.

Raises:	MissingParameters if a hyper parameter is missing

load_state_dict(state_dict, strict=True, device=None)[source]¶

Loads the optimizer state.

Args:

state_dict (dict): optimizer state. Should be an object returned from a call to state_dict().

optimizer¶

param_groups¶

state¶

state_dict(destination=None, prefix='', keep_vars=False)[source]¶

Returns the state of the optimizer as a dict.

It contains two entries:

state - a dict holding current optimization state. Its content differs between optimizer classes.
param_groups - a list containing all parameter groups where each parameter group is a dict

step(closure=None)[source]¶

Performs a single optimization step (parameter update).

Args:

closure (callable): A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.

to(device)[source]¶

zero_grad()[source]¶

Sets the gradients of all optimized torch.Tensors to zero.

Args:

set_to_none (bool): instead of setting to zero, set the grads to None. This will in general have lower memory footprint, and can modestly improve performance. However, it changes certain behaviors. For example:
1. When the user tries to access a gradient and perform manual ops on it, a None attribute or a Tensor full of 0s will behave differently.
2. If the user requests zero_grad(set_to_none=True) followed by a backward pass, .grads are guaranteed to be None for params that did not receive a gradient.
3. torch.optim optimizers have a different behavior if the gradient is 0 or None (in one case it does the step with a gradient of 0 and in the other it skips the step altogether).
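
The difference between a gradient of 0 and a gradient of None matters most when the optimizer carries state such as momentum. The sketch below is a simplified pure-Python model of SGD with momentum on scalar parameters, written only to illustrate that difference; the function `sgd_momentum_step` is hypothetical and is not the torch.optim code.

```python
from typing import List, Optional


def sgd_momentum_step(params: List[float], grads: List[Optional[float]],
                      buffers: List[float], lr: float = 0.1,
                      momentum: float = 0.9) -> None:
    """Apply one simplified SGD-with-momentum update in place."""
    for i, grad in enumerate(grads):
        if grad is None:
            # A missing gradient skips the parameter entirely:
            # neither the value nor the momentum buffer changes.
            continue
        # A zero gradient still decays the buffer and moves the parameter.
        buffers[i] = momentum * buffers[i] + grad
        params[i] -= lr * buffers[i]


params = [1.0, 1.0]
buffers = [0.0, 0.0]

# First step: both parameters receive a gradient of 1.0.
sgd_momentum_step(params, [1.0, 1.0], buffers)

# Second step: parameter 0 has a zero gradient, parameter 1 has None.
sgd_momentum_step(params, [0.0, None], buffers)

# Parameter 0 kept moving because of its momentum buffer,
# parameter 1 stayed where the first step left it.
moved_with_zero_grad = params[0]
skipped_with_none_grad = params[1]
```

With these inputs the zero-gradient parameter ends near 0.81 while the skipped one stays near 0.9, which is the behavior difference described in point 3.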

exception olympus.optimizers.RegisteredOptimizerNotFound[source]¶: Bases: Exception

exception olympus.optimizers.UninitializedOptimizer[source]¶: Bases: Exception

olympus.optimizers.get_optimizers_space()[source]¶

olympus.optimizers.get_schedules_space()[source]¶

olympus.optimizers.known_optimizers()[source]¶

olympus.optimizers.register_optimizer(name, factory, override=False)[source]¶