AMSGrad

class olympus.optimizers.amsgrad.AMSGrad(model_parameters, weight_decay, lr, beta1, beta2, eps=1e-08)

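Variant of Adam that scales each update by the running maximum of the second moment estimate instead of its current value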

See also :class:`.Adam`

References

[1]	Tran Thi Phuong, Le Trieu Phong. “On the Convergence Proof of AMSGrad and a New Version”, 7 Apr 2019

Attributes:

model_parameters: List[Tensor]
weight_decay: float: Add L2 penalty to the cost (encourage smaller weights)
learning_rate: float = 0.001
beta1: float ∈ [0, 1) default = 0.9: Exponential decay rate for the first moment estimate
beta2: float ∈ [0, 1) default = 0.999: Exponential decay rate for the second moment estimate
eps: float = 1e-8: Term added to the denominator to improve numerical stability
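
The attributes above map onto the AMSGrad update rule roughly as in the following sketch. It is a minimal, standard-library illustration operating on plain Python lists, not the code used by this class: the names `init_state` and `amsgrad_step` are hypothetical, bias correction is omitted, and folding `weight_decay` into the gradient as an L2 term is an assumption.

```python
import math


def init_state(n):
    """Hypothetical helper: zero-initialised moment buffers for n parameters."""
    return {"m": [0.0] * n, "v": [0.0] * n, "v_max": [0.0] * n}


def amsgrad_step(params, grads, state, lr=0.001, beta1=0.9, beta2=0.999,
                 eps=1e-8, weight_decay=0.0):
    """Apply one AMSGrad update in place and return the parameter list.

    Illustrative sketch only: no bias correction, and weight decay is
    assumed to be an L2 penalty added to the gradient.
    """
    for i, (p, g) in enumerate(zip(params, grads)):
        g = g + weight_decay * p  # L2 penalty (assumption)
        # Exponential moving averages of the gradient and squared gradient.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # AMSGrad: keep the largest second moment estimate seen so far.
        state["v_max"][i] = max(state["v_max"][i], state["v"][i])
        # eps keeps the denominator away from zero.
        params[i] = p - lr * state["m"][i] / (math.sqrt(state["v_max"][i]) + eps)
    return params


# Usage: a few steps on f(x) = x^2, whose gradient is 2x.
x = [1.0]
opt_state = init_state(len(x))
for _ in range(5):
    amsgrad_step(x, [2.0 * x[0]], opt_state, lr=0.1)
```

Because `v_max` never decreases, the effective per-parameter step size cannot grow after a run of small gradients, which is the property that distinguishes this rule from plain Adam.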

Methods

`add_param_group`(param_group)	Add a param group to the `Optimizer`'s `param_groups`.
`backward`(loss)	This method comes from the FP16 optimizer; it is added to every optimizer for consistency.
`defaults`()	Specifies the hyperparameter defaults
`get_space`()	Specifies the hyperparameters that are supported by this optimizer
`load_state_dict`(state_dict[, strict])	Loads the optimizer state.
`state_dict`([destination, prefix, keep_vars])	Returns the state of the optimizer as a `dict`.
`step`([closure])	Performs a single optimization step (parameter update).
`zero_grad`()	Sets the gradients of all optimized `torch.Tensor`s to zero.

static get_space(): Specifies the hyperparameters that are supported by this optimizer