AMSGrad
class olympus.optimizers.amsgrad.AMSGrad(model_parameters, weight_decay, lr, beta1, beta2, eps=1e-08)
Bases: olympus.optimizers.base.OptimizerAdapter
Variant of Adam
See also: Adam
References
[1] Tran Thi Phuong, Le Trieu Phong. “On the Convergence Proof of AMSGrad and a New Version”, 7 Apr 2019
Attributes:
- model_parameters: List[Tensor]
- weight_decay: float
Coefficient of the L2 penalty added to the cost (encourages smaller weights)
- learning_rate: float = 0.001
- beta1: float ∈ [0, 1) default = 0.9
Exponential decay rate for the first moment estimate
- beta2: float ∈ [0, 1) default = 0.999
Exponential decay rate for the second moment estimate
- eps: float = 1e-8
Term added to the denominator to improve numerical stability
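To make the role of each attribute concrete, here is a minimal pure-Python sketch of one AMSGrad update. It is not the Olympus implementation: `amsgrad_step` and `init_state` are hypothetical helpers written for illustration, and the sketch assumes the commonly cited AMSGrad formulation without bias correction, with `weight_decay` folded into the gradient as an L2 penalty. The wrapped optimizer may differ in details such as bias correction.

```python
import math


def init_state(n):
    """Zero-initialised moment buffers for n scalar parameters (hypothetical helper)."""
    return {"m": [0.0] * n, "v": [0.0] * n, "v_hat": [0.0] * n}


def amsgrad_step(params, grads, state, lr=0.001, beta1=0.9, beta2=0.999,
                 eps=1e-8, weight_decay=0.0):
    """Apply one AMSGrad update to a flat list of floats (illustrative only)."""
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # L2 penalty: weight_decay * p is added to the gradient.
        g = g + weight_decay * p
        # First moment estimate, decayed by beta1.
        m = beta1 * state["m"][i] + (1 - beta1) * g
        # Second moment estimate, decayed by beta2.
        v = beta2 * state["v"][i] + (1 - beta2) * g * g
        # AMSGrad keeps the running maximum of the second moment,
        # so the effective step size never grows back.
        v_hat = max(state["v_hat"][i], v)
        state["m"][i], state["v"][i], state["v_hat"][i] = m, v, v_hat
        # eps is added to the denominator for numerical stability.
        new_params.append(p - lr * m / (math.sqrt(v_hat) + eps))
    return new_params


# Toy problem (an assumption for the demo): minimise f(x) = x^2, gradient 2x.
params = [1.0]
state = init_state(1)
for _ in range(2000):
    grads = [2.0 * p for p in params]
    params = amsgrad_step(params, grads, state, lr=0.01)
```

The only difference from a plain Adam step in this sketch is the `max` on the second moment; everything else (the two decay rates, `eps` in the denominator, the learning rate scaling) maps one-to-one onto the attributes listed above.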
Methods
- add_param_group(param_group)
Add a param group to the Optimizer's param_groups.
- backward(loss)
This method comes from FP16 Optimizer, for consistency we add it everywhere
- defaults()
Specifies the hyper parameters defaults
- get_space()
Specifies the hyper parameters that are supported by this optimizer
- load_state_dict(state_dict[, strict])
Loads the optimizer state.
- state_dict([destination, prefix, keep_vars])
Returns the state of the optimizer as a dict.
- step([closure])
Performs a single optimization step (parameter update).
- zero_grad()
Sets the gradients of all optimized torch.Tensors to zero.