Launcher

The Olympus launcher is the common entry point for all Olympus baselines. While each baseline has its own entry point that can be used independently, the launcher provides a few extra features that help users run the baselines as efficiently as possible.

The two main features the launcher provides are parallelization and monitoring.

Parallelization

  • --devices 0,1,2,3: List of devices available to Olympus.
    If not provided, the launcher queries PyTorch and uses all available GPUs.
  • --device-sharing: Allow GPUs to be shared between workers.
    By default, devices are split between workers. In some cases it can be worthwhile to run multiple workers on the same set of GPUs to improve efficiency through better utilization.
  • --workers: The number of workers the launcher should spawn.
    Each worker runs its own independent experiment.
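The flags above determine which GPUs each worker receives. A minimal sketch of that mapping, assuming devices are split into contiguous, equal-sized chunks when sharing is disabled and that every worker receives the full device list when sharing is enabled. The function `split_devices` is hypothetical; the launcher's own `make_device_groups` may handle details such as uneven device counts differently.

```python
from typing import List


def split_devices(worker_count: int, devices: List[int], shared: bool) -> List[List[int]]:
    """Hypothetical helper: assign a list of GPU ids to each worker."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")

    # Assumption: with sharing enabled, every worker sees every device.
    if shared:
        return [list(devices) for _ in range(worker_count)]

    # Without sharing, devices are split into equal, non-overlapping chunks.
    # Any remainder devices are left unused in this sketch.
    if len(devices) < worker_count:
        raise ValueError("need at least one device per worker without sharing")
    per_worker = len(devices) // worker_count
    return [
        devices[i * per_worker:(i + 1) * per_worker]
        for i in range(worker_count)
    ]


# --devices 0,1,2,3 --workers 2 --no-device-sharing
print(split_devices(2, [0, 1, 2, 3], shared=False))  # [[0, 1], [2, 3]]
# --devices 0,1,2,3 --workers 2 --device-sharing
print(split_devices(2, [0, 1, 2, 3], shared=True))   # [[0, 1, 2, 3], [0, 1, 2, 3]]
```

Returning copies of the device list in the shared case keeps each worker's group independent, so modifying one worker's list cannot affect another's.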

The image below illustrates the configuration --devices 0,1,2,3 --workers 2 --no-device-sharing: two workers are spawned, each using 2 GPUs to train a model with PyTorch's distributed data parallel API.

Parallelization example
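With distributed data parallel, a worker typically starts one process per GPU. A minimal sketch of the per-process environment such a worker could build, assuming PyTorch's `env://` initialization (which reads the standard `MASTER_ADDR`, `MASTER_PORT`, `RANK` and `WORLD_SIZE` variables) and pinning each process to one GPU through `CUDA_VISIBLE_DEVICES`. The helper `make_process_envs` is hypothetical; it only illustrates the roles of the `device_id`, `rank`, `world_size` and `port` parameters that appear in `local_multigpu_launch`.

```python
from typing import Dict, List


def make_process_envs(devices: List[int], port: int, base_env: Dict[str, str]) -> List[Dict[str, str]]:
    """Hypothetical helper: build one environment per GPU for a single worker."""
    world_size = len(devices)
    envs = []
    for rank, device_id in enumerate(devices):
        env = dict(base_env)
        # Each process only sees its own GPU, so it addresses it as cuda:0.
        env["CUDA_VISIBLE_DEVICES"] = str(device_id)
        # Read by torch.distributed.init_process_group(init_method="env://").
        env["MASTER_ADDR"] = "127.0.0.1"
        env["MASTER_PORT"] = str(port)
        env["RANK"] = str(rank)
        env["WORLD_SIZE"] = str(world_size)
        envs.append(env)
    return envs


# A worker assigned GPUs 2 and 3 starts two processes that form one DDP group.
for env in make_process_envs([2, 3], port=29500, base_env={}):
    print(env["CUDA_VISIBLE_DEVICES"], env["RANK"], env["WORLD_SIZE"])
# 2 0 2
# 3 1 2
```

All processes of one worker share the same port, while different workers need different ports so their process groups do not collide.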

The image below illustrates the configuration --devices 0,1,2,3 --workers 2 --device-sharing.

Parallelization example with GPU sharing

Monitoring

Once setup is complete and all workers have been launched, the script monitors GPU usage and displays the results at the end. This can be used to check that the resources are well utilized.

  • --no-mon: Disable GPU monitoring

Below is an example of the monitoring output.

{
  "temperature.gpu": 34.083333333333336,
  "utilization.gpu": 10.333333333333334,
  "utilization.memory": 0.0,
  "memory.total": 32480.0,
  "memory.free": 31672.833333333332,
  "memory.used": 807.1666666666666
}
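The field names above match `nvidia-smi --query-gpu` properties, and the fractional values suggest averages over several samples. A minimal sketch of such a summary, assuming samples are collected with `nvidia-smi --format=csv,noheader,nounits` and each field is averaged across samples. The function `summarize_samples` is hypothetical; the monitor used by `show_resource_stats` may sample and aggregate differently. With `nounits`, nvidia-smi reports memory in MiB and utilization in percent.

```python
from statistics import mean
from typing import Dict, List

FIELDS = [
    "temperature.gpu",
    "utilization.gpu",
    "utilization.memory",
    "memory.total",
    "memory.free",
    "memory.used",
]

# Command that prints one CSV line per GPU for a single sample (not run here).
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=" + ",".join(FIELDS),
    "--format=csv,noheader,nounits",
]


def summarize_samples(lines: List[str]) -> Dict[str, float]:
    """Hypothetical helper: average each field over all CSV sample lines."""
    columns: Dict[str, List[float]] = {name: [] for name in FIELDS}
    for line in lines:
        # Assumes every field is numeric (some GPUs report "[N/A]").
        values = [v.strip() for v in line.split(",")]
        for name, value in zip(FIELDS, values):
            columns[name].append(float(value))
    return {name: mean(vals) for name, vals in columns.items() if vals}


samples = [
    "34, 10, 0, 32480, 31673, 807",
    "35, 12, 0, 32480, 31672, 808",
]
print(summarize_samples(samples))
```

In a real monitor, each sample could be gathered with `subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True).stdout.splitlines()` at a fixed interval while the workers run.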
class olympus.baselines.launch.Worker(worker_id: int, devices: List[int], env: Dict = <factory>, processes: List = <factory>)

Bases: object

Methods

launch(task_name, script_args)
Parameters:
task_name: str

name of the task to run

script_args: List[str]

list of arguments to pass to the task

olympus.baselines.launch.arguments()
olympus.baselines.launch.cleanup(all_processes)
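A launcher that spawns child processes has to stop them when it exits. A minimal sketch of a routine in the spirit of `cleanup(all_processes)`, assuming `all_processes` is a list of `subprocess.Popen` objects (the real argument type is not documented here): ask each child to terminate, wait, and force-kill anything still running. `cleanup_processes` is a hypothetical name.

```python
import subprocess
import sys
from typing import List


def cleanup_processes(processes: List[subprocess.Popen], timeout: float = 5.0) -> None:
    """Hypothetical helper: stop every child process that is still running."""
    # First ask politely (SIGTERM on POSIX, TerminateProcess on Windows).
    for proc in processes:
        if proc.poll() is None:
            proc.terminate()

    # Then wait, escalating to kill() for anything that ignores terminate().
    for proc in processes:
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()


procs = [
    subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    for _ in range(2)
]
cleanup_processes(procs)
print([p.poll() is not None for p in procs])  # [True, True]
```

Signalling all children before waiting on any of them lets them shut down in parallel instead of one timeout after another.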
olympus.baselines.launch.get_available_port()
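The `port` argument of `local_multigpu_launch` and `single_gpu_launch` needs a free port for the distributed rendezvous. A common technique is to bind a socket to port 0 and let the operating system pick one. This is a hypothetical sketch of that technique; `get_available_port` may be implemented differently. The port is only guaranteed free at the moment it is checked, so another process could claim it before the training processes bind it.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Hypothetical helper: ask the OS for an unused TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the OS to choose any free ephemeral port.
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = find_free_port()
print(port)
```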
olympus.baselines.launch.get_available_tasks()
olympus.baselines.launch.get_device_count(devices)
olympus.baselines.launch.local_multigpu_launch(task_name, script_args, job_env, device_id, rank, world_size, port)

Launch the task using multiple GPUs

olympus.baselines.launch.main(argv=None)
olympus.baselines.launch.make_device_groups(worker_count, devices, shared, cpu_mode)
olympus.baselines.launch.run(workers, all_processes, task, script_args)
olympus.baselines.launch.show_resource_stats(monitor)
olympus.baselines.launch.simple_launch(task_name, script_args)

Launch the task without creating another Python interpreter.
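Running a task without a new interpreter means executing it inside the launcher's own process. A minimal sketch of that idea with the standard `runpy` module, assuming the task is a script file that reads its arguments from `sys.argv`. `run_in_process` is hypothetical and does not reflect how Olympus locates or imports tasks.

```python
import os
import runpy
import sys
import tempfile
from typing import Any, Dict, List


def run_in_process(script_path: str, script_args: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: run a script as __main__ in the current interpreter."""
    saved_argv = sys.argv
    # The script sees script_args as if it had been started from the command line.
    sys.argv = [script_path, *script_args]
    try:
        # Returns the globals of the executed script.
        return runpy.run_path(script_path, run_name="__main__")
    finally:
        # Restore the launcher's own arguments even if the task raises.
        sys.argv = saved_argv


# Usage: a throwaway task script that records the arguments it received.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("import sys\nreceived = sys.argv[1:]\n")
    path = f.name
result = run_in_process(path, ["--epochs", "1"])
os.remove(path)
print(result["received"])  # ['--epochs', '1']
```

Because the task shares the launcher's process, any global state it changes (such as `sys.argv` here) has to be restored afterwards, which the `finally` block handles.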

olympus.baselines.launch.single_gpu_launch(task_name, script_args, job_env, device_id, rank, world_size, port)

Launch the task for a given GPU

olympus.baselines.launch.single_worker_single_gpu(task, args, no_mon)