Adam

class olympus.optimizers.adam.Adam(model_parameters, weight_decay, lr, beta1, beta2, eps=1e-08)[source]

Bases: olympus.optimizers.base.OptimizerAdapter

Adam (Adaptive Moment Estimation) is an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyperparameters have intuitive interpretations and typically require little tuning. See the arXiv paper [1] for details.
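For reference, the update rule described in [1] can be written as follows. This is the textbook form from the paper, not a statement about how this class is implemented internally. The symbols are assumptions mapped onto the parameters of this class: \alpha is the learning rate, \beta_1 and \beta_2 are beta1 and beta2, \epsilon is eps, g_t is the gradient at step t, and \theta are the model parameters.

```latex
\begin{aligned}
m_t       &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t       &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t} \\
\hat{v}_t &= \frac{v_t}{1 - \beta_2^t} \\
\theta_t  &= \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
```

Here m_0 = v_0 = 0, and the divisions by 1 - \beta^t correct the bias that this zero initialization introduces in early steps.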

See also: AMSGrad

References

[1] Diederik P. Kingma, Jimmy Ba. “Adam: A Method for Stochastic Optimization”, arXiv:1412.6980, 22 Dec 2014
Attributes:
model_parameters: List[Tensor]
weight_decay: float

Adds an L2 penalty to the cost, which encourages smaller weights

learning_rate: float = 0.001
beta1: float ∈ [0, 1) default = 0.9

Exponential decay rate for the first moment estimate

beta2: float ∈ [0, 1) default = 0.999

Exponential decay rate for the second moment estimate

eps: float = 1e-8

Term added to the denominator to improve numerical stability
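To illustrate how the attributes above enter the update, here is a minimal pure-Python sketch of one Adam step. It is not the Olympus implementation: the function name adam_step, the flat-list representation of parameters, and the choice to fold weight_decay into the gradient as an L2 term are all assumptions made for illustration.

```python
import math


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0):
    """Hypothetical helper: one Adam update over flat lists of floats.

    t is the 1-based step count. Returns new (params, m, v) lists.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Assumption: the L2 penalty is applied by adding weight_decay * p
        # to the gradient.
        g = g + weight_decay * p
        # Exponential moving averages of the gradient and squared gradient.
        m_i = beta1 * m_i + (1 - beta1) * g
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction for the zero-initialized moment estimates.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # eps keeps the denominator away from zero.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Usage: minimize f(x) = x^2, whose gradient is 2x, starting from x = 1.0.
x, m, v = [1.0], [0.0], [0.0]
for t in range(1, 1001):
    grads = [2 * x[0]]
    x, m, v = adam_step(x, grads, m, v, t, lr=0.01)
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly lr regardless of the gradient's scale. This is one way to see the invariance to gradient rescaling mentioned in the description.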

Methods

add_param_group(param_group) Adds a param group to the Optimizer's param_groups.
backward(loss) This method comes from the FP16 optimizer; it is added to every optimizer for consistency.
defaults() Specifies the hyperparameter defaults.
get_space() Specifies the hyperparameters that are supported by this optimizer.
load_state_dict(state_dict[, strict]) Loads the optimizer state.
state_dict([destination, prefix, keep_vars]) Returns the state of the optimizer as a dict.
step([closure]) Performs a single optimization step (parameter update).
zero_grad() Sets the gradients of all optimized torch.Tensors to zero.
static defaults()[source]

Specifies the hyperparameter defaults

static get_space()[source]

Specifies the hyperparameters that are supported by this optimizer