Base
-
class olympus.optimizers.base.OptimizerAdapter(factory, *args, **kwargs)
Bases: olympus.optimizers.base.OptimizerInterface
Wraps an existing PyTorch optimizer into an Olympus optimizer.
Attributes:
- param_groups
- state
Methods
- add_param_group(param_group): Add a param group to the Optimizer's param_groups.
- backward(loss): This method comes from FP16 Optimizer; for consistency we add it everywhere.
- defaults(): Specifies the hyperparameter defaults.
- get_space(): Specifies the hyperparameters that are supported by this optimizer.
- load_state_dict(state_dict[, strict]): Loads the optimizer state.
- state_dict([destination, prefix, keep_vars]): Returns the state of the optimizer as a dict.
- step([closure]): Performs a single optimization step (parameter update).
- zero_grad(): Sets the gradients of all optimized torch.Tensors to zero.
add_param_group(param_group)
Add a param group to the Optimizer's param_groups.
This can be useful when fine-tuning a pre-trained network, as frozen layers can be made trainable and added to the Optimizer as training progresses.
Args:
- param_group (dict): Specifies what Tensors should be optimized, along with group-specific optimization options.
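The fine-tuning scenario described for add_param_group can be sketched as follows. This is a minimal illustration that uses a plain torch.optim.SGD rather than an Olympus optimizer, on the assumption that the Olympus classes forward this call to the wrapped PyTorch optimizer unchanged. The backbone and head modules and the learning rates are hypothetical.

```python
import torch
from torch import nn

# Hypothetical two-part model: a "pre-trained" backbone and a new head.
backbone = nn.Linear(4, 8)
head = nn.Linear(8, 2)

# Freeze the backbone and optimize only the head at first.
for p in backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
n_groups_before = len(optimizer.param_groups)

# Later in training: unfreeze the backbone and register it as a new
# param group with its own, smaller learning rate.
for p in backbone.parameters():
    p.requires_grad = True
optimizer.add_param_group({"params": list(backbone.parameters()), "lr": 0.01})

n_groups_after = len(optimizer.param_groups)
group_lrs = [g["lr"] for g in optimizer.param_groups]
```

Options that the new group does not set (momentum, for example) are filled in from the optimizer's defaults.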
-
load_state_dict(state_dict, strict=True)
Loads the optimizer state.
Args:
- state_dict (dict): optimizer state. Should be an object returned from a call to state_dict().
-
param_groups
-
state
-
state_dict(destination=None, prefix='', keep_vars=False)
Returns the state of the optimizer as a dict.
It contains two entries:
- state: a dict holding the current optimization state. Its content differs between optimizer classes.
- param_groups: a list containing all parameter groups, where each parameter group is a dict.
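A save-and-restore round trip through state_dict() and load_state_dict() might look like the sketch below. It uses a plain torch.optim.SGD and calls state_dict() without arguments; whether the Olympus wrapper behaves identically, and how it uses destination, prefix, and keep_vars, is an assumption not verified here.

```python
import torch
from torch import nn

model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Take one step so the optimizer holds per-parameter state (momentum buffers).
loss = model(torch.ones(2, 3)).sum()
loss.backward()
optimizer.step()

snapshot = optimizer.state_dict()
top_level_keys = sorted(snapshot.keys())

# Restore into a fresh optimizer built over the same parameters.
# The saved hyperparameters replace the ones passed to the constructor.
restored = torch.optim.SGD(model.parameters(), lr=0.5, momentum=0.0)
restored.load_state_dict(snapshot)
restored_lr = restored.param_groups[0]["lr"]
restored_momentum = restored.param_groups[0]["momentum"]
```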
-
step(closure=None)
Performs a single optimization step (parameter update).
Args:
- closure (callable): A closure that reevaluates the model and returns the loss. Optional for most optimizers.
Note
Unless otherwise specified, this function should not modify the .grad field of the parameters.
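To illustrate the closure argument, here is a small sketch that minimizes a one-dimensional quadratic with a plain torch.optim.SGD. The objective, the learning rate, and the number of iterations are arbitrary choices for the example. Passing a closure to an Olympus optimizer is assumed to work the same way, but that is not confirmed by this page.

```python
import torch

# Minimize f(w) = (w - 3)^2, starting from w = 0.
w = torch.nn.Parameter(torch.tensor(0.0))
optimizer = torch.optim.SGD([w], lr=0.1)

def closure():
    # Clear old gradients, recompute the loss, and backpropagate.
    optimizer.zero_grad()
    loss = (w - 3.0) ** 2
    loss.backward()
    return loss

# step() returns the loss computed by the closure.
first_loss = optimizer.step(closure).item()
for _ in range(100):
    last_loss = optimizer.step(closure).item()
```

Optimizers that need to evaluate the loss several times per step (torch.optim.LBFGS is one) require the closure; for SGD it is optional.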
-
zero_grad()
Sets the gradients of all optimized torch.Tensors to zero.
Args:
- set_to_none (bool): instead of setting to zero, set the grads to None. This will in general have a lower memory footprint, and can modestly improve performance. However, it changes certain behaviors. For example:
1. When the user tries to access a gradient and perform manual ops on it, a None attribute or a Tensor full of 0s will behave differently.
2. If the user requests zero_grad(set_to_none=True) followed by a backward pass, .grads are guaranteed to be None for params that did not receive a gradient.
3. torch.optim optimizers behave differently if the gradient is 0 or None (in one case the step is performed with a gradient of 0, and in the other the step is skipped altogether).
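The difference between the two modes can be seen in a short sketch. It uses a plain torch.optim.SGD and assumes a PyTorch version whose zero_grad accepts the set_to_none keyword. The signature listed above takes no arguments, so whether the Olympus method accepts the keyword is not confirmed here.

```python
import torch

w = torch.nn.Parameter(torch.ones(2))
optimizer = torch.optim.SGD([w], lr=0.1)

# Populate w.grad, then zero it in place: the tensor is kept, filled with 0s.
(w * 2).sum().backward()
optimizer.zero_grad(set_to_none=False)
grad_after_zeroing = w.grad.clone()

# Populate w.grad again, then drop it: the attribute becomes None.
(w * 2).sum().backward()
optimizer.zero_grad(set_to_none=True)
grad_after_none = w.grad
```

Code that reads .grad directly after zero_grad therefore needs to handle the None case when set_to_none is enabled.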
-
class olympus.optimizers.base.OptimizerInterface(params)
Bases: torch.optim.optimizer.Optimizer
Base Olympus Optimizer
Methods
- add_param_group(param_group): Add a param group to the Optimizer's param_groups.
- backward(loss): This method comes from FP16 Optimizer; for consistency we add it everywhere.
- defaults(): Specifies the hyperparameter defaults.
- get_space(): Specifies the hyperparameters that are supported by this optimizer.
- load_state_dict(state_dict[, strict]): Loads the optimizer state.
- state_dict([destination, prefix, keep_vars]): Returns the state of the optimizer as a dict.
- step([closure]): Performs a single optimization step (parameter update).
- zero_grad(): Sets the gradients of all optimized torch.Tensors to zero.
add_param_group(param_group)
Add a param group to the Optimizer's param_groups.
This can be useful when fine-tuning a pre-trained network, as frozen layers can be made trainable and added to the Optimizer as training progresses.
Args:
- param_group (dict): Specifies what Tensors should be optimized, along with group-specific optimization options.
-
load_state_dict(state_dict, strict=True)
Loads the optimizer state.
Args:
- state_dict (dict): optimizer state. Should be an object returned from a call to state_dict().
-
state_dict(destination=None, prefix='', keep_vars=False)
Returns the state of the optimizer as a dict.
It contains two entries:
- state: a dict holding the current optimization state. Its content differs between optimizer classes.
- param_groups: a list containing all parameter groups, where each parameter group is a dict.
-
step(closure=None)
Performs a single optimization step (parameter update).
Args:
- closure (callable): A closure that reevaluates the model and returns the loss. Optional for most optimizers.
Note
Unless otherwise specified, this function should not modify the .grad field of the parameters.
-
zero_grad()
Sets the gradients of all optimized torch.Tensors to zero.
Args:
- set_to_none (bool): instead of setting to zero, set the grads to None. This will in general have a lower memory footprint, and can modestly improve performance. However, it changes certain behaviors. For example:
1. When the user tries to access a gradient and perform manual ops on it, a None attribute or a Tensor full of 0s will behave differently.
2. If the user requests zero_grad(set_to_none=True) followed by a backward pass, .grads are guaranteed to be None for params that did not receive a gradient.
3. torch.optim optimizers behave differently if the gradient is 0 or None (in one case the step is performed with a gradient of 0, and in the other the step is skipped altogether).
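Point 3 is easiest to see with weight decay, where a zero gradient still produces an update. The sketch below uses torch.optim.SGD directly. The helper one_step and the chosen lr and weight_decay values are illustrative, and an Olympus optimizer is only assumed to inherit the same behavior from the optimizer it wraps.

```python
import torch

def one_step(make_grad_zero):
    """Run a single SGD step with the gradient either all zeros or None."""
    p = torch.nn.Parameter(torch.tensor([1.0]))
    opt = torch.optim.SGD([p], lr=1.0, weight_decay=0.5)
    if make_grad_zero:
        # Gradient exists but is zero: weight decay is still applied.
        p.grad = torch.zeros_like(p)
    # Otherwise p.grad stays None and the parameter is skipped.
    opt.step()
    return p.item()

with_zero_grad = one_step(True)   # 1.0 - 1.0 * (0 + 0.5 * 1.0) = 0.5
with_none_grad = one_step(False)  # parameter left untouched
```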
-