AMSGrad

class olympus.optimizers.amsgrad.AMSGrad(model_parameters, weight_decay, lr, beta1, beta2, eps=1e-08)[source]

Bases: olympus.optimizers.base.OptimizerAdapter

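Variant of Adam that scales each update by the running maximum of the second moment estimate instead of the estimate itself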

See also :class:`.Adam`

References

[1] Tran Thi Phuong, Le Trieu Phong. “On the Convergence Proof of AMSGrad and a New Version”, 7 Apr 2019
Attributes:
model_parameters: List[Tensor]
weight_decay: float

Adds an L2 penalty to the cost (encourages smaller weights)

learning_rate: float = 0.001
beta1: float ∈ [0, 1) default = 0.9

Exponential decay rate for the first moment estimate

beta2: float ∈ [0, 1) default = 0.999

Exponential decay rate for the second moment estimate

eps: float = 1e-8

Term added to the denominator to improve numerical stability
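To make the role of each attribute concrete, the following is a minimal pure-Python sketch of one AMSGrad update on a flat list of floats. It is not the code behind this class: the names `init_state` and `amsgrad_step` are hypothetical, the `weight_decay=0.0` default is an assumption (the signature above gives none), and the Adam-style bias correction is also an assumption, since some formulations of AMSGrad omit it.

```python
import math


def init_state(n):
    """Create zeroed optimizer state for n scalar parameters (hypothetical helper)."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n, "v_max": [0.0] * n}


def amsgrad_step(params, grads, state, lr=0.001, beta1=0.9, beta2=0.999,
                 eps=1e-8, weight_decay=0.0):
    """Apply one AMSGrad update to params in place (illustrative sketch only)."""
    state["t"] += 1
    t = state["t"]
    for i, (p, g) in enumerate(zip(params, grads)):
        # weight_decay: L2 penalty, folded into the gradient
        g = g + weight_decay * p
        # beta1 / beta2: exponential moving averages of g and g**2
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # AMSGrad: keep the largest second moment estimate seen so far
        state["v_max"][i] = max(state["v_max"][i], state["v"][i])
        # Adam-style bias correction (assumption, see the note above)
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v_max"][i] / (1 - beta2 ** t)
        # eps: keeps the denominator away from zero
        params[i] = p - lr * m_hat / (math.sqrt(v_hat) + eps)


# Usage: one parameter, one step with gradient 1.0
weights = [1.0]
opt_state = init_state(len(weights))
amsgrad_step(weights, [1.0], opt_state)
```

The only difference from plain Adam in this sketch is the `v_max` line: because the denominator is built from a running maximum, a later run of small gradients cannot shrink it.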

Methods

add_param_group(param_group) Add a param group to the Optimizer's param_groups.
backward(loss) This method comes from FP16 Optimizer; it is added to every optimizer for consistency.
defaults() Specifies the hyperparameter defaults
get_space() Specifies the hyperparameters supported by this optimizer
load_state_dict(state_dict[, strict]) Loads the optimizer state.
state_dict([destination, prefix, keep_vars]) Returns the state of the optimizer as a dict.
step([closure]) Performs a single optimization step (parameter update).
zero_grad() Sets the gradients of all optimized torch.Tensors to zero.
static defaults()[source]

Specifies the hyperparameter defaults

static get_space()[source]

Specifies the hyperparameters supported by this optimizer