colossalai.initialize
- colossalai.initialize.get_default_parser()[source]
Reads the user command line and uses an argument parser to parse the input arguments. The input arguments include the configuration, host, port, world size, local rank, and backend for torch.distributed.
- Returns
Returns the parser with the default arguments; the user may add customized arguments into this parser.
- Return type
argparse.ArgumentParser
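As a rough sketch of what such a default parser looks like, the block below reconstructs it with plain argparse. The flag names are assumptions inferred from the argument list described above, not taken from the library source:

```python
import argparse

def build_default_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the default parser; the flag names
    # mirror the description (configuration, host, port, world size,
    # local rank, backend) but are not guaranteed to match the library.
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", type=str, help="config file path")
    parser.add_argument("--host", type=str, help="master address for distributed training")
    parser.add_argument("--port", type=int, help="master port for distributed training")
    parser.add_argument("--world_size", type=int, help="world size of the default process group")
    parser.add_argument("--rank", type=int, help="rank of the current process")
    parser.add_argument("--local_rank", type=int, help="rank of the process on its node")
    parser.add_argument("--backend", type=str, default="nccl", help="backend for torch.distributed")
    return parser

# The returned parser can be extended with customized arguments:
parser = build_default_parser()
parser.add_argument("--epochs", type=int, default=10)
args = parser.parse_args(
    ["--host", "127.0.0.1", "--port", "29500", "--world_size", "4", "--rank", "0"]
)
```

Because an ArgumentParser (rather than a parsed Namespace) is returned, user code stays in control of when parsing happens.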
- colossalai.initialize.launch(config, rank, world_size, host, port, backend='nccl', local_rank=None, seed=1024, verbose=True)[source]
This function first parses the configuration arguments, using parse_args() in case one of the input arguments is not given. It then initializes and sets up the distributed environment by calling global_context's functions.
- Parameters
config (Union[str, dict, Config]) – A Config object, a config dict, or a config file path are all acceptable
rank (int) – Rank for the default process group
world_size (int) – World size of the default process group
host (str) – The master address for distributed training
port (str) – The master port for distributed training
backend (str, optional) – Backend for torch.distributed, defaults to nccl
local_rank (int, optional) – Rank of the process on its node; it is used to set the default CUDA device. Defaults to None, in which case the default device ordinal is calculated automatically.
seed (int, optional) – Specified random seed for every process. Defaults to 1024.
verbose (bool, optional) – Whether to print logs. Defaults to True.
- Raises
Exception – Raised when the config type is wrong
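One plausible reading of the automatic device-ordinal calculation mentioned for local_rank can be sketched as follows. This is an illustrative assumption about the placement scheme, not the library's actual implementation:

```python
def default_device_ordinal(rank: int, devices_per_node: int) -> int:
    # Hypothetical sketch: if processes are placed on nodes in rank order,
    # the CUDA device ordinal for a process is its global rank modulo the
    # number of devices per node. NOT taken from the library source.
    return rank % devices_per_node

# With 4 GPUs per node, global rank 5 would land on device 1 of node 1.
ordinal = default_device_ordinal(5, 4)
```

Passing local_rank explicitly bypasses any such calculation and pins the process to that device directly.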
- colossalai.initialize.launch_from_slurm(config, host, port, backend='nccl', seed=1024, verbose=True)[source]
A wrapper of colossalai.launch for the SLURM launcher; it reads the rank and world size from the environment variables set by SLURM.
- Parameters
config (Union[str, dict, Config]) – A Config object, a config dict, or a config file path are all acceptable
host (str) – The master address for distributed training
port (str) – The master port for distributed training
backend (str, optional) – Backend for torch.distributed, defaults to nccl
seed (int, optional) – Specified random seed for every process. Defaults to 1024.
verbose (bool, optional) – Whether to print logs. Defaults to True.
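The wrapper's mechanism can be sketched by reading the variables SLURM sets for every task: SLURM_PROCID (the task's rank) and SLURM_NPROCS (the total task count). The sketch below illustrates only this reading step, not the full launch:

```python
import os

def read_slurm_env():
    # Read rank and world size the way a SLURM wrapper would, from the
    # environment variables SLURM sets for each task in a job.
    rank = int(os.environ["SLURM_PROCID"])       # rank of this task
    world_size = int(os.environ["SLURM_NPROCS"])  # total number of tasks
    return rank, world_size

# Set the variables manually here for illustration; in a real job,
# SLURM sets them before the script starts.
os.environ["SLURM_PROCID"] = "2"
os.environ["SLURM_NPROCS"] = "8"
rank, world_size = read_slurm_env()
```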
- colossalai.initialize.launch_from_openmpi(config, host, port, backend='nccl', seed=1024, verbose=True)[source]
A wrapper of colossalai.launch for the OpenMPI launcher; it reads the rank and world size from the environment variables set by OpenMPI.
- Parameters
config (Union[str, dict, Config]) – A Config object, a config dict, or a config file path are all acceptable
host (str) – The master address for distributed training
port (str) – The master port for distributed training
backend (str, optional) – Backend for torch.distributed, defaults to nccl
seed (int, optional) – Specified random seed for every process. Defaults to 1024.
verbose (bool, optional) – Whether to print logs. Defaults to True.
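Analogously to the SLURM wrapper, the OpenMPI variant can be sketched by reading the variables mpirun exports to every process: OMPI_COMM_WORLD_RANK and OMPI_COMM_WORLD_SIZE. Again, only the reading step is shown:

```python
import os

def read_openmpi_env():
    # Read rank and world size from the environment variables that
    # OpenMPI's mpirun sets for each launched process.
    rank = int(os.environ["OMPI_COMM_WORLD_RANK"])
    world_size = int(os.environ["OMPI_COMM_WORLD_SIZE"])
    return rank, world_size

# Set manually here for illustration; mpirun sets these in real runs.
os.environ["OMPI_COMM_WORLD_RANK"] = "1"
os.environ["OMPI_COMM_WORLD_SIZE"] = "4"
rank, world_size = read_openmpi_env()
```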
- colossalai.initialize.launch_from_torch(config, backend='nccl', seed=1024, verbose=True)[source]
A wrapper of colossalai.launch for torchrun or torch.distributed.launch; it reads the rank and world size from the environment variables set by PyTorch.
- Parameters
config (Union[str, dict, Config]) – A Config object, a config dict, or a config file path are all acceptable
backend (str, optional) – Backend for torch.distributed, defaults to nccl
seed (int, optional) – Specified random seed for every process. Defaults to 1024.
verbose (bool, optional) – Whether to print logs. Defaults to True.
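torchrun (and torch.distributed.launch) exports RANK, WORLD_SIZE, LOCAL_RANK, MASTER_ADDR, and MASTER_PORT to every worker, which is why this wrapper needs no rank or address arguments of its own. A sketch of the reading step:

```python
import os

def read_torch_env():
    # Read the distributed settings from the environment variables that
    # torchrun sets for each worker process.
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])
    local_rank = int(os.environ["LOCAL_RANK"])
    host = os.environ["MASTER_ADDR"]
    port = os.environ["MASTER_PORT"]
    return rank, world_size, local_rank, host, port

# Set manually here for illustration; a real invocation would be e.g.
#   torchrun --nproc_per_node=4 train.py
os.environ.update({
    "RANK": "3", "WORLD_SIZE": "4", "LOCAL_RANK": "3",
    "MASTER_ADDR": "127.0.0.1", "MASTER_PORT": "29500",
})
rank, world_size, local_rank, host, port = read_torch_env()
```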
- colossalai.initialize.initialize(model, optimizer, criterion=None, train_dataloader=None, test_dataloader=None, lr_scheduler=None, ophooks=None, verbose=True)[source]
Core function that wraps the essential training components with our functionality, based on the config loaded into gpc.config.
- Parameters
model (torch.nn.Module or Callable) – Your model instance or a function to build the model.
optimizer (torch.optim.optimizer.Optimizer or Type[torch.optim.optimizer]) – Your optimizer instance.
criterion (torch.nn.modules.loss._Loss, optional) – Your criterion instance.
train_dataloader (torch.utils.data.DataLoader, optional) – Dataloader for training.
test_dataloader (torch.utils.data.DataLoader, optional) – Dataloader for testing.
lr_scheduler (torch.optim.lr_scheduler._LRScheduler, optional) – Your lr scheduler instance.
verbose (bool, optional) – Whether to print logs.
- Returns
A tuple of (engine, train_dataloader, test_dataloader, lr_scheduler) where only engine is guaranteed not to be None.
- Return type
Tuple (engine, train_dataloader, test_dataloader, lr_scheduler)
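Because only engine is guaranteed to be non-None, callers should be prepared for the other tuple members to come back as None when the corresponding argument was omitted. The stand-in below is hypothetical (it is not the library call) and only mimics the documented return contract:

```python
def fake_initialize(model, optimizer, criterion=None,
                    train_dataloader=None, test_dataloader=None,
                    lr_scheduler=None):
    # Hypothetical stand-in mimicking initialize's return contract:
    # a 4-tuple in which only the first element (engine) is guaranteed
    # to be non-None. The body is a placeholder, not the real wrapping.
    engine = (model, optimizer, criterion)  # placeholder for the Engine
    return engine, train_dataloader, test_dataloader, lr_scheduler

# Optional components that were not passed come back as None:
engine, train_dl, test_dl, sched = fake_initialize("model", "optimizer")
```

Unpacking all four members, even when only engine is used, keeps the call site aligned with the documented return type.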