Optimizers
Module contents
-
class olympus.optimizers.Optimizer(name=None, *, params=None, optimizer=None, half=False, loss_scale=1, dynamic_loss_scale=False, scale_window=1000, scale_factor=2, min_loss_scale=None, max_loss_scale=16777216.0, **kwargs)
Bases: torch.optim.optimizer.Optimizer
Lazy optimizer that lets you first fetch the supported hyperparameters using get_space and then initialize the underlying optimizer using init.
Parameters: - name: str
Name of a registered optimizer
- optimizer: Optimizer
Custom optimizer, mutually exclusive with name
- half: bool
Enable fp16 Optimizer
- loss_scale: float (LS)
fp16 optimizer option: loss scale to use
- dynamic_loss_scale: bool
fp16 optimizer option: Enable dynamic loss scaling
- scale_window: int (SW)
dynamic loss scaling option: increase LS after SW successful iterations
- scale_factor: float (SF)
dynamic loss scaling option: divide LS by SF after an overflow, or multiply LS by SF after SW successful iterations
- min_loss_scale: float
- max_loss_scale: float
Raises: - RegisteredOptimizerNotFound
if name does not match a registered optimizer
- MissingArgument
if neither name nor optimizer is set
- WrongParameter
if an unsupported hyperparameter is passed in kwargs
Examples
Follows the standard PyTorch optimizer interface
>>> import torch
>>> from olympus.models import Model
>>> from olympus.optimizers import Optimizer
>>> model = Model('resnet18',
...               input_size=(1, 28, 28),
...               output_size=10,)
>>>
>>> x = torch.randn((1, 1, 28, 28))
>>>
>>> optimizer = Optimizer('SGD', params=model.parameters(), weight_decay=1e-3, lr=0.001, momentum=0.8)
>>>
>>> optimizer.zero_grad()
>>> loss = model(x).sum()
>>> optimizer.backward(loss)
>>> optimizer.step()
Can be lazily initialized for hyperparameter search
>>> optimizer = Optimizer('SGD')
>>> optimizer.get_space()
{'lr': 'loguniform(1e-5, 1)', 'momentum': 'uniform(0, 1)', 'weight_decay': 'loguniform(1e-10, 1e-3)'}
>>> optimizer.init(model.parameters(), weight_decay=1e-3, lr=0.001, momentum=0.8)
>>>
>>> optimizer.zero_grad()
>>> loss = model(x).sum()
>>> optimizer.backward(loss)
>>> optimizer.step()
Switch to a mixed precision optimizer if needed
>>> optimizer = Optimizer('SGD', half=True)
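The dynamic loss scaling options described under Parameters can be illustrated with a small stand-alone sketch. It follows the stated rule (divide LS by SF after an overflow, multiply LS by SF after SW successful iterations). The class name LossScaleState and the clamping to min_loss_scale and max_loss_scale are assumptions made for illustration; the source does not describe how Olympus applies those two bounds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LossScaleState:
    """Hypothetical helper, not part of olympus: tracks LS between steps."""
    loss_scale: float = 1.0                  # LS
    scale_window: int = 1000                 # SW
    scale_factor: float = 2.0                # SF
    min_loss_scale: Optional[float] = None   # assumed lower clamp
    max_loss_scale: float = 16777216.0       # assumed upper clamp (2 ** 24)
    good_steps: int = 0                      # iterations since the last change

    def update(self, overflow: bool) -> float:
        """Apply the scaling rule for one iteration and return the new LS."""
        if overflow:
            # Overflow: shrink the scale and restart the window
            self.loss_scale /= self.scale_factor
            if self.min_loss_scale is not None:
                self.loss_scale = max(self.loss_scale, self.min_loss_scale)
            self.good_steps = 0
        else:
            self.good_steps += 1
            if self.good_steps >= self.scale_window:
                # SW successful iterations in a row: grow the scale
                self.loss_scale = min(self.loss_scale * self.scale_factor,
                                      self.max_loss_scale)
                self.good_steps = 0
        return self.loss_scale


# Three clean iterations, one overflow, then one clean iteration
state = LossScaleState(loss_scale=8.0, scale_window=3)
history = [state.update(overflow)
           for overflow in (False, False, False, True, False)]
```

With scale_window=3, the scale doubles on the third clean iteration and is halved again by the overflow that follows.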
Attributes: - defaults
Returns the default hyperparameters of the underlying optimizer
- optimizer
- param_groups
- state
Methods
- add_param_group(param_group)
Add a param group to the Optimizer's param_groups.
- get_current_space()
Get the currently defined parameter space
- get_space()
Return the dimension space of each parameter
- init([params, override])
Instantiate the underlying optimizer
- load_state_dict(state_dict[, strict, device])
Loads the optimizer state.
- state_dict([destination, prefix, keep_vars])
Returns the state of the optimizer as a dict.
- step([closure])
Performs a single optimization step (parameter update).
- zero_grad()
Sets the gradients of all optimized torch.Tensors to zero.
- backward
- to
-
defaults
Returns the default hyperparameters of the underlying optimizer
-
half = False
-
half_args = {}
-
init(params=None, override=False, **kwargs)
Instantiate the underlying optimizer
Raises: - MissingParameters
if a hyperparameter is missing
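The lazy pattern behind get_space and init can be sketched without Olympus or PyTorch. Everything below is a hypothetical illustration under stated assumptions: LazyOptimizerSketch is an invented name, the MissingParameters class is a local stand-in for the exception named above, and the real implementation may validate hyperparameters differently.

```python
class MissingParameters(Exception):
    """Local stand-in for the exception named above, so the sketch runs alone."""


class LazyOptimizerSketch:
    """Hypothetical illustration of lazy initialization, not the olympus code."""

    def __init__(self, space):
        self.space = dict(space)   # hyperparameter name -> prior description
        self.optimizer = None      # stays None until init() succeeds

    def get_space(self):
        """Return the search space without building anything."""
        return dict(self.space)

    def init(self, params, **kwargs):
        """Build the underlying optimizer once every hyperparameter is given."""
        missing = sorted(set(self.space) - set(kwargs))
        if missing:
            raise MissingParameters(f"missing hyperparameters: {missing}")
        # A real wrapper would construct the torch optimizer here; the sketch
        # only records what it would have been built with.
        self.optimizer = {"params": list(params), "hyper_parameters": dict(kwargs)}


sketch = LazyOptimizerSketch({'lr': 'loguniform(1e-5, 1)',
                              'momentum': 'uniform(0, 1)'})

try:
    sketch.init(params=[], lr=0.001)   # momentum is not provided
    raised = False
except MissingParameters:
    raised = True

sketch.init(params=[], lr=0.001, momentum=0.8)   # all values provided
```

Separating the space query from construction lets a hyperparameter search sample values from get_space before any optimizer state is allocated.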
-
load_state_dict(state_dict, strict=True, device=None)
Loads the optimizer state.
Args:
- state_dict (dict): optimizer state. Should be an object returned from a call to state_dict().
-
optimizer
-
param_groups
-
state
-
state_dict(destination=None, prefix='', keep_vars=False)
Returns the state of the optimizer as a dict.
It contains two entries:
- state: a dict holding the current optimization state. Its content differs between optimizer classes.
- param_groups: a list containing all parameter groups, where each parameter group is a dict.
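The two entries described above can be inspected with a plain torch.optim.SGD optimizer. This sketch deliberately uses PyTorch directly rather than the Olympus wrapper, on the assumption that the wrapper exposes the same structure; the extra destination, prefix, keep_vars, strict, and device arguments in the Olympus signatures are not exercised here.

```python
import torch

model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Take one step so the momentum buffers exist in the optimizer state
model(torch.ones(1, 2)).sum().backward()
optimizer.step()

snapshot = optimizer.state_dict()
entry_names = sorted(snapshot.keys())          # the two entries listed above
saved_lr = snapshot["param_groups"][0]["lr"]   # each group is a plain dict

# Restore into a fresh optimizer that was built with a different learning rate
restored = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.9)
restored.load_state_dict(snapshot)
restored_lr = restored.param_groups[0]["lr"]   # taken from the snapshot
```

Loading the snapshot replaces the hyperparameters of the fresh optimizer, which is why restored_lr matches the saved value rather than the 0.5 it was constructed with.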
-
step(closure=None)
Performs a single optimization step (parameter update).
Args:
- closure (callable): A closure that reevaluates the model and returns the loss. Optional for most optimizers.
Note
Unless otherwise specified, this function should not modify the .grad field of the parameters.
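A closure, as described in the Args above, can be sketched with a plain torch.optim.SGD optimizer. This uses PyTorch directly as a stand-in; whether the Olympus wrapper forwards the closure unchanged is an assumption, and the Olympus examples call optimizer.backward(loss) instead of loss.backward().

```python
import torch

w = torch.nn.Parameter(torch.tensor([2.0]))
optimizer = torch.optim.SGD([w], lr=0.1)


def closure():
    """Reevaluate the model and return the loss, as step() expects."""
    optimizer.zero_grad()
    loss = (w ** 2).sum()
    loss.backward()
    return loss


# step() calls the closure, applies the update, and returns the closure's loss
loss = optimizer.step(closure)
loss_value = loss.item()      # loss evaluated before the update: 2.0 ** 2
updated_w = w.detach().clone()  # 2.0 - 0.1 * gradient, with gradient 2 * 2.0
```

Optimizers that evaluate the objective several times per step, such as torch.optim.LBFGS, require the closure; for SGD it is optional.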
-
zero_grad()
Sets the gradients of all optimized torch.Tensors to zero.
Args:
- set_to_none (bool): instead of setting to zero, set the grads to None. This will in general have a lower memory footprint, and can modestly improve performance. However, it changes certain behaviors. For example:
1. When the user tries to access a gradient and perform manual ops on it, a None attribute or a Tensor full of 0s will behave differently.
2. If the user requests zero_grad(set_to_none=True) followed by a backward pass, .grads are guaranteed to be None for params that did not receive a gradient.
3. torch.optim optimizers have a different behavior if the gradient is 0 or None (in one case it does the step with a gradient of 0 and in the other it skips the step altogether).
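The third point above can be observed with a plain torch.optim.SGD optimizer that uses weight decay. This is a sketch against PyTorch itself: the Olympus zero_grad() signature shown above takes no arguments, so whether the wrapper accepts set_to_none is not confirmed by the source. The helper name step_after_zero_grad is made up for this illustration.

```python
import torch


def step_after_zero_grad(set_to_none):
    """Clear the gradient one way or the other, step, and return the weights."""
    w = torch.nn.Parameter(torch.ones(3))
    optimizer = torch.optim.SGD([w], lr=0.1, weight_decay=0.5)
    (w * 2).sum().backward()                    # populate w.grad
    optimizer.zero_grad(set_to_none=set_to_none)
    grad_is_none = w.grad is None
    optimizer.step()
    return w.detach().clone(), grad_is_none


# Zero gradient: the step still runs, so weight decay shrinks the weights
w_zeroed, zeroed_is_none = step_after_zero_grad(set_to_none=False)

# None gradient: the parameter is skipped and the weights stay untouched
w_skipped, skipped_is_none = step_after_zero_grad(set_to_none=True)
```

With a zero gradient the update is lr * weight_decay * w = 0.1 * 0.5 * 1 = 0.05 per element, while the None case leaves the weights at 1.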