Replay Vector

class olympus.reinforcement.replay.ReplayVector[source]

Bases: object

Holds all the state transitions of the simulation for training purposes.

Notes

Steps:
Number of simulation steps
Simulation:
Number of parallel simulations

Examples

The output below shows the size of each field with num_steps=32 and num_simulation=4, and with a state size of (3, 210, 160) (images of the simulation).

>>> replay.describe()
rewards      : torch.Size([32, 4])
states       : torch.Size([32, 4, 3, 210, 160])
next_states  : torch.Size([32, 4, 3, 210, 160])
critic_values: torch.Size([32, 4])
actions      : torch.Size([32, 4])
log_probs    : torch.Size([32, 4])
mask         : torch.Size([32, 4])
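
One way to arrive at shapes like those above is to stack per-step tensors along a new leading dimension. The following is a minimal sketch, assuming each step yields tensors whose first dimension is the number of parallel simulations. `torch.stack` is a standard PyTorch call, but whether ReplayVector uses it internally is an assumption, and the variable names are illustrative only.

```python
import torch

num_steps, num_simulation = 32, 4
state_size = (3, 210, 160)

# One tensor per step, each covering all parallel simulations (placeholder zeros).
step_states = [torch.zeros(num_simulation, *state_size) for _ in range(num_steps)]
step_rewards = [torch.zeros(num_simulation) for _ in range(num_steps)]

# Stacking adds a leading "steps" dimension in front of the simulation dimension.
states = torch.stack(step_states)
rewards = torch.stack(step_rewards)

print(states.shape)   # torch.Size([32, 4, 3, 210, 160])
print(rewards.shape)  # torch.Size([32, 4])
```
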
Attributes:
transitions:

List of all the stored transitions

state_size:

Size of the simulation state

simulation_batch:

Number of different simulation states in one Transition struct

grad_batch:

Total number of states in this object: grad_batch = simulation_batch * len(transitions)

>>> * <------------------- steps --------------------------------->
>>> ^ [states 0] [states 1] [states 2] [states 3]
>>> | [states 0] [states 1] [states 2]
>>> | [states 0] [states 1] [states 2] [states 3]
>>> v [states 0] [states 1] [states 2] [states 3] [states 4]
>>> * <------------------- steps --------------------------------->
>>>     Batch 0    Batch 1    Batch 2    Batch 3    Batch 4
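
As a worked example of the grad_batch relation, the sketch below uses a toy container that mirrors the documented attributes. `ReplaySketch` and the local `Transition` are hypothetical stand-ins written for illustration; they are not the olympus implementation.

```python
from collections import namedtuple

# Hypothetical stand-ins for illustration; not imported from olympus.
Transition = namedtuple(
    "Transition",
    ["state", "action", "reward", "log_prob", "entropy", "critic", "mask", "next_state"],
)


class ReplaySketch:
    """Toy container that mirrors the documented attributes."""

    def __init__(self, simulation_batch):
        self.transitions = []
        self.simulation_batch = simulation_batch

    def append(self, transition):
        self.transitions.append(transition)

    @property
    def grad_batch(self):
        # Documented relation: simulation_batch * len(transitions)
        return self.simulation_batch * len(self.transitions)


replay = ReplaySketch(simulation_batch=4)
for step in range(32):
    # Every field is left as None; only the number of transitions matters here.
    replay.append(Transition(*[None] * 8))

print(replay.grad_batch)  # 4 * 32 = 128
```
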

Methods

actions(self)
next_states(self)
states(self)
append  
critic_values  
describe  
entropies  
log_probs  
masks  
rewards  
to_dict  
actions(self)[source]
Returns:
A tensor of the actions that were taken, with shape (Steps, Sim, 1)
append(self, transition: olympus.reinforcement.replay.Transition)[source]
critic_values(self)[source]
describe(self)[source]
entropies(self)[source]
grad_batch
log_probs(self)[source]
masks(self)[source]
next_states(self)[source]
Returns:
A tensor of the next simulation states, with shape (Steps, Sim, State size…)
rewards(self)[source]
simulation_batch
state_size
states(self)[source]
Returns:
A tensor of the simulation states, with shape (Steps, Sim, State size…)
to_dict(self)[source]
transitions
class olympus.reinforcement.replay.Transition(state, action, reward, log_prob, entropy, critic, mask, next_state)

Bases: tuple

Attributes:
action

Alias for field number 1

critic

Alias for field number 5

entropy

Alias for field number 4

log_prob

Alias for field number 3

mask

Alias for field number 6

next_state

Alias for field number 7

reward

Alias for field number 2

state

Alias for field number 0
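
The aliases above map each field name to its position in the tuple, following the order of the constructor arguments. A minimal sketch, assuming Transition behaves like a standard `collections.namedtuple`; the class below is a local stand-in rather than an import from olympus, and the field values are placeholders.

```python
from collections import namedtuple

# Local stand-in mirroring the documented signature; not imported from olympus.
Transition = namedtuple(
    "Transition",
    ["state", "action", "reward", "log_prob", "entropy", "critic", "mask", "next_state"],
)

# Placeholder values; real transitions would typically hold tensors.
t = Transition(
    state="s0", action=2, reward=1.5, log_prob=-0.7,
    entropy=0.5, critic=0.3, mask=1.0, next_state="s1",
)

# Named access and positional access refer to the same field.
assert t.action == t[1]
assert t.reward == t[2]
assert t.next_state == t[7]

# The inherited tuple methods count and index work as for any tuple.
print(t.index(1.5))  # 2, the position of reward
print(t.count(1.5))  # 1
```

Because the result is a plain tuple, it can also be unpacked positionally, for example `state, action, *rest = t`.
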

Methods

count(self, value, /) Return number of occurrences of value.
index(self, value[, start, stop]) Return first index of value.