Replay Vector

class olympus.reinforcement.replay.ReplayVector

Bases: object

Holds all the state transitions of the simulation for training purposes.

Notes

Steps: Number of simulation steps
Simulation: Number of parallel simulations

Examples

The output below shows the size of each field with num_steps=32, num_simulation=4, and a state size of (3, 210, 160) (images of the simulation).

>>> replay.describe()
rewards      : torch.Size([32, 4])
states       : torch.Size([32, 4, 3, 210, 160])
next_states  : torch.Size([32, 4, 3, 210, 160])
critic_values: torch.Size([32, 4])
actions      : torch.Size([32, 4])
log_probs    : torch.Size([32, 4])
mask         : torch.Size([32, 4])
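
The listing follows a simple pattern: per-step quantities are laid out as (Steps, Simulation), and the state fields append the state size to that prefix. The sketch below reproduces those shapes with plain tuples. `expected_shapes` is a hypothetical helper written for illustration; it does not call olympus or torch.

```python
def expected_shapes(num_steps, num_simulation, state_size):
    """Return the shapes shown in the describe() listing.

    Hypothetical helper: it only mirrors the documented example.
    """
    prefix = (num_steps, num_simulation)       # (Steps, Simulation)
    with_state = prefix + tuple(state_size)    # state fields add the state size
    return {
        "rewards": prefix,
        "states": with_state,
        "next_states": with_state,
        "critic_values": prefix,
        "actions": prefix,
        "log_probs": prefix,
        "mask": prefix,
    }


shapes = expected_shapes(num_steps=32, num_simulation=4, state_size=(3, 210, 160))
for name, shape in shapes.items():
    print(f"{name:<13}: {list(shape)}")
```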

Attributes:

transitions:

List of all the stored transitions

state_size:

Size of the simulation state

simulation_batch:

Number of different simulation states in one Transition struct

grad_batch:

Total number of states in this object: grad_batch = simulation_batch * len(transitions)

>>> * <------------------- steps --------------------------------->
>>> ^ [states 0] [states 1] [states 2] [states 3]
>>> | [states 0] [states 1] [states 2]
>>> | [states 0] [states 1] [states 2] [states 3]
>>> v [states 0] [states 1] [states 2] [states 3] [states 4]
>>> * <------------------- steps --------------------------------->
>>>     Batch 0    Batch 1    Batch 2    Batch 3    Batch 4
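
As a worked example of the grad_batch formula, the sketch below uses the values from the earlier example (32 steps, 4 parallel simulations). The nested list is a toy stand-in for a (Steps, Simulation) tensor, and the flattening order is an assumption made for illustration.

```python
num_steps = 32         # len(transitions) in the earlier example
simulation_batch = 4   # parallel simulations held by one transition

grad_batch = simulation_batch * num_steps
print(grad_batch)  # 128

# Toy (Steps, Simulation) grid; each entry records its own (step, simulation) index.
grid = [[(step, sim) for sim in range(simulation_batch)] for step in range(num_steps)]

# Flattening step by step yields one entry per stored state.
flat = [entry for row in grid for entry in row]
print(len(flat) == grad_batch)  # True
```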

Methods

actions(self)
next_states(self)
states(self)
append
critic_values
describe
entropies
log_probs
masks
rewards
to_dict

actions(self)

Returns:	A tensor of the actions that were taken (Steps, Sim, 1)

append(self, transition: olympus.reinforcement.replay.Transition)
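
The signature suggests a collection loop that appends one `Transition` per simulation step. The following is a minimal sketch of that usage pattern with a stand-in container. `ToyReplayVector`, the locally defined `Transition`, and the plain-list payloads are all assumptions made for illustration; they do not reflect how olympus stores or stacks tensors.

```python
from collections import namedtuple

# Stand-in with the same field order as the documented Transition.
Transition = namedtuple(
    "Transition",
    ["state", "action", "reward", "log_prob", "entropy", "critic", "mask", "next_state"],
)


class ToyReplayVector:
    """Hypothetical stand-in: stores transitions and exposes a few views."""

    def __init__(self):
        self.transitions = []

    def append(self, transition):
        self.transitions.append(transition)

    @property
    def simulation_batch(self):
        # Number of parallel simulations held by one transition.
        return len(self.transitions[0].reward) if self.transitions else 0

    @property
    def grad_batch(self):
        return self.simulation_batch * len(self.transitions)

    def rewards(self):
        # Nested list laid out as (Steps, Sim).
        return [list(t.reward) for t in self.transitions]


replay = ToyReplayVector()
num_simulation = 4
for step in range(3):
    replay.append(Transition(
        state=[[float(step)] for _ in range(num_simulation)],
        action=[0] * num_simulation,
        reward=[1.0] * num_simulation,
        log_prob=[-0.5] * num_simulation,
        entropy=[0.1] * num_simulation,
        critic=[0.0] * num_simulation,
        mask=[1] * num_simulation,
        next_state=[[float(step + 1)] for _ in range(num_simulation)],
    ))

print(len(replay.transitions), replay.simulation_batch, replay.grad_batch)
```

With 3 appended steps and 4 simulations per step, the toy container reports a grad_batch of 12, matching simulation_batch * len(transitions).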

critic_values(self)

describe(self)

entropies(self)

grad_batch

log_probs(self)

masks(self)

next_states(self)

Returns:	A tensor of the next simulation states (Steps, Sim, State size…)

rewards(self)

simulation_batch

state_size

states(self)

Returns:	A tensor of the simulation states (Steps, Sim, State size…)

to_dict(self)

transitions

class olympus.reinforcement.replay.Transition(state, action, reward, log_prob, entropy, critic, mask, next_state)

Bases: tuple

Attributes:

`action`: Alias for field number 1
`critic`: Alias for field number 5
`entropy`: Alias for field number 4
`log_prob`: Alias for field number 3
`mask`: Alias for field number 6
`next_state`: Alias for field number 7
`reward`: Alias for field number 2
`state`: Alias for field number 0

Methods

`count`(self, value, /)	Return number of occurrences of value.
`index`(self, value[, start, stop])	Return first index of value.

action: Alias for field number 1

critic: Alias for field number 5

entropy: Alias for field number 4

log_prob: Alias for field number 3

mask: Alias for field number 6

next_state: Alias for field number 7

reward: Alias for field number 2

state: Alias for field number 0
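
Because `Transition` derives from `tuple` and each attribute is an alias for a field number, it behaves like a `collections.namedtuple`. The sketch below rebuilds an equivalent type locally to show named and positional access side by side. The local definition and the sample values are assumptions for illustration; in practice the class would be imported from `olympus.reinforcement.replay`.

```python
from collections import namedtuple

# Local stand-in using the documented field order.
Transition = namedtuple(
    "Transition",
    "state action reward log_prob entropy critic mask next_state",
)

t = Transition(
    state="s0", action=2, reward=1.0, log_prob=-0.7,
    entropy=0.3, critic=0.5, mask=1, next_state="s1",
)

# Named access and positional access refer to the same field.
assert t.reward == t[2]
assert t.next_state == t[7]

# Field numbers in the attribute list match positions in _fields.
for number, name in enumerate(Transition._fields):
    print(f"{name}: field number {number}")

# Tuple unpacking follows the same order.
state, action, reward, log_prob, entropy, critic, mask, next_state = t
print(action, critic)
```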