connectomics.data

Datasets

class connectomics.data.dataset.TileDataset(chunk_num=[2, 2, 2], chunk_ind=None, chunk_ind_split=None, chunk_iter=-1, chunk_stride=True, volume_json='path/to/image.json', label_json=None, valid_mask_json=None, mode='train', pad_size=[0, 0, 0], **kwargs)[source]

Dataset class for large-scale tile-based datasets. Large-scale volumetric datasets are usually stored as individual tiles, and loading them as a single array for training or inference is infeasible. This class reads the paths of the tiles and constructs smaller chunks for processing.

Parameters
  • chunk_num (list) – volume splitting parameters in \((z, y, x)\) order. Default: \([2, 2, 2]\)

  • chunk_ind (list) – predefined list of chunks. Default: None

  • chunk_ind_split (list) – rank and world_size for splitting chunk_ind in multi-processing. Default: None

  • chunk_iter (int) – number of iterations on each chunk. Default: -1

  • chunk_stride (bool) – allow overlap between chunks. Default: True

  • volume_json (str) – json file for input image. Default: 'path/to/image.json'

  • label_json (str, optional) – json file for label. Default: None

  • valid_mask_json (str, optional) – json file for valid mask. Default: None

  • mode (str) – 'train', 'val' or 'test'. Default: 'train'

  • pad_size (list) – padding parameters in \((z, y, x)\) order. Default: \([0,0,0]\)
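As a rough illustration of what chunk_num means, the sketch below splits a volume into a regular, non-overlapping grid in \((z, y, x)\) order. The helper chunk_bounds and the volume size are hypothetical and not part of connectomics; the class itself may place chunks differently, for example with overlap when chunk_stride is enabled.

```python
from itertools import product

def chunk_bounds(volume_size, chunk_num):
    """Split a (z, y, x) volume into a regular grid of chunks.

    Returns one [z0, z1, y0, y1, x0, x1] entry per chunk.
    Hypothetical helper for illustration; not part of connectomics.
    """
    # Cut positions along each axis, e.g. size 100 in 2 chunks -> [0, 50, 100].
    edges = [[size * i // n for i in range(n + 1)]
             for size, n in zip(volume_size, chunk_num)]
    bounds = []
    for zi, yi, xi in product(*(range(n) for n in chunk_num)):
        bounds.append([edges[0][zi], edges[0][zi + 1],
                       edges[1][yi], edges[1][yi + 1],
                       edges[2][xi], edges[2][xi + 1]])
    return bounds

chunks = chunk_bounds(volume_size=(100, 4096, 4096), chunk_num=[2, 2, 2])
print(len(chunks))  # 8
print(chunks[0])    # [0, 50, 0, 2048, 0, 2048]
```

With the default chunk_num of \([2, 2, 2]\), the volume is divided into eight chunks, each of which is small enough to be loaded on its own.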

get_coord_name()[source]

Return the filename suffix based on the chunk coordinates.

loadchunk()[source]

Load the chunk based on current coordinates and construct a VolumeDataset for processing.

updatechunk(do_load=True)[source]

Update the coordinates to a new chunk in the large volume.

class connectomics.data.dataset.VolumeDataset(volume, label=None, valid_mask=None, valid_ratio=0.5, sample_volume_size=(8, 64, 64), sample_label_size=(8, 64, 64), sample_stride=(1, 1, 1), augmentor=None, target_opt=['1'], weight_opt=[['1']], erosion_rates=None, mode='train', do_2d=False, iter_num=-1, reject_size_thres=0, reject_diversity=0, reject_p=0.95, data_mean=0.5, data_std=0.5)[source]

Dataset class for volumetric image datasets. At training time, subvolumes are randomly sampled from all the large input volumes with (optional) rejection sampling to increase the frequency of foreground regions in a batch. At inference time, subvolumes are yielded in a sliding-window manner with overlap to counter border artifacts.

Parameters
  • volume (list) – list of image volumes.

  • label (list, optional) – list of label volumes. Default: None

  • valid_mask (list, optional) – list of valid masks. Default: None

  • valid_ratio (float) – volume ratio threshold for valid samples. Default: 0.5

  • sample_volume_size (tuple, int) – model input size.

  • sample_label_size (tuple, int) – model output size.

  • sample_stride (tuple, int) – stride size for sampling.

  • augmentor (connectomics.data.augmentation.composition.Compose, optional) – data augmentor for training. Default: None

  • target_opt (list) – list the model targets generated from segmentation labels.

  • weight_opt (list) – list of options for generating pixel-wise weight masks.

  • mode (str) – 'train', 'val' or 'test'. Default: 'train'

  • do_2d (bool) – load 2d samples from 3d volumes. Default: False

  • iter_num (int) – total number of training iterations (-1 for inference). Default: -1

  • reject_size_thres (int, optional) – threshold to decide if a sampled volume contains foreground objects. Default: 0

  • reject_diversity (int, optional) – threshold to decide if a sampled volume contains multiple objects. Default: 0

  • reject_p (float, optional) – probability of rejecting non-foreground volumes. Default: 0.95
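The rejection sampling controlled by reject_size_thres and reject_p can be pictured with the minimal sketch below. It is an assumption-laden illustration, not the library's code: the helper sample_with_rejection, the max_tries cap, and the strict comparison against the threshold are all hypothetical, and reject_diversity is not modeled.

```python
import random

def sample_with_rejection(candidates, foreground_size, reject_size_thres=0,
                          reject_p=0.95, rng=None, max_tries=100):
    """Draw a candidate, re-drawing most picks that lack foreground.

    candidates: list of subvolume identifiers.
    foreground_size: function mapping a candidate to its foreground voxel count.
    Hypothetical helper for illustration; not part of connectomics.
    """
    rng = rng or random.Random()
    choice = None
    for _ in range(max_tries):
        choice = rng.choice(candidates)
        if foreground_size(choice) > reject_size_thres:
            return choice  # enough foreground: always accept
        if rng.random() >= reject_p:
            return choice  # background: accepted with probability 1 - reject_p
    return choice  # give up after max_tries draws

# Two hypothetical subvolumes: 'a' is empty, 'b' has 120 foreground voxels.
sizes = {'a': 0, 'b': 120}
rng = random.Random(0)
picks = [sample_with_rejection(['a', 'b'], sizes.get, reject_p=1.0, rng=rng)
         for _ in range(20)]
```

With reject_p=1.0 every background-only draw is rejected, so all picks contain foreground; with the default 0.95, a small share of background samples still reaches the batch.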

Note

For relatively small volumes, the total number of possible subvolumes can be smaller than the total number of samples required in training (the product of total iterations and mini-batch size), which raises StopIteration. Therefore the dataset length is also decided by the training settings.
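The comparison in this note can be made concrete with a short calculation. The helper count_subvolumes and the training settings below are hypothetical; the sketch only assumes that sliding-window positions are counted per axis as (volume - sample) // stride + 1.

```python
from math import prod

def count_subvolumes(volume_size, sample_size, stride):
    """Number of sliding-window positions in a (z, y, x) volume.

    Hypothetical helper for illustration; not part of connectomics.
    """
    return prod((v - s) // st + 1
                for v, s, st in zip(volume_size, sample_size, stride))

available = count_subvolumes((16, 128, 128), (8, 64, 64), (1, 1, 1))
required = 100_000 * 8  # assumed: 100k iterations with a mini-batch size of 8
print(available, required, available < required)  # 38025 800000 True
```

Here the volume offers far fewer distinct positions than the training run needs, which is why the dataset length has to follow the training settings rather than the number of possible subvolumes.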

connectomics.data.dataset.build_dataloader(cfg, augmentor, mode='train', dataset=None, rank=None)[source]

Prepare dataloader for training and inference.

connectomics.data.dataset.get_dataset(cfg, augmentor, mode='train', rank=None, dir_name_init=None, img_name_init=None)[source]

Prepare dataset for training and inference.

Parameters
  • dir_name_init (Optional[list]) –

  • img_name_init (Optional[list]) –

Augmentations

class connectomics.data.augmentation.Compose(transforms=[], input_size=(8, 256, 256), smooth=True, keep_uncropped=False, keep_non_smoothed=False, additional_targets=None)[source]

Composing a list of data transforms.

The sample size of the composed augmentor can be larger than the specified input size of the model to ensure that all pixels are valid after center-crop.

Parameters
  • transforms (list) – list of transformations to compose.

  • input_size (tuple) – input size of model in \((z, y, x)\) order. Default: \((8, 256, 256)\)

  • smooth (bool) – smoothing the object mask with Gaussian filtering. Default: True

  • keep_uncropped (bool) – keep uncropped image and label. Default: False

  • keep_non_smoothed (bool) – keep the non-smoothed object mask. Default: False

  • additional_targets (dict, optional) – additional targets to augment. Default: None

Examples::
>>> from connectomics.data.augmentation import (
...     Compose, Rotate, Flip, Elastic, Grayscale, MissingParts)
>>> # specify additional targets besides 'image'
>>> kwargs = {'additional_targets': {'label': 'mask'}}
>>> augmentor = Compose([Rotate(p=1.0, **kwargs),
...                      Flip(p=1.0, **kwargs),
...                      Elastic(alpha=12.0, p=0.75, **kwargs),
...                      Grayscale(p=0.75, **kwargs),
...                      MissingParts(p=0.9, **kwargs)],
...                     input_size=(8, 256, 256), **kwargs)
>>> # `input` and `label` are the image and label volumes to augment
>>> sample = {'image': input, 'label': label}
>>> augmented = augmentor(sample)
>>> out_input, out_label = augmented['image'], augmented['label']

class connectomics.data.augmentation.CutBlur(length_ratio=0.25, down_ratio_min=2.0, down_ratio_max=8.0, downsample_z=False, p=0.5, additional_targets=None)[source]

3D CutBlur data augmentation, adapted from https://arxiv.org/abs/2004.00448.

Randomly downsample a cuboid region in the volume to force the model to learn super-resolution when making predictions. This augmentation is only applied to images.

Parameters
  • length_ratio (float) – the ratio of the cuboid length compared with volume length.

  • down_ratio_min (float) – minimal downsample ratio to generate low-res region.

  • down_ratio_max (float) – maximal downsample ratio to generate low-res region.

  • downsample_z (bool) – downsample along the z axis. Default: False

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None
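To show how length_ratio relates to the affected region, the sketch below picks a random cuboid whose side along each axis is a fixed fraction of the volume. The helper random_cuboid is hypothetical, and the rounding and placement rules are assumptions; the library may choose the region differently.

```python
import random

def random_cuboid(volume_size, length_ratio=0.25, rng=None):
    """Pick (start, stop) bounds per axis for a cuboid inside the volume.

    Hypothetical helper for illustration; not part of connectomics.
    """
    rng = rng or random.Random()
    bounds = []
    for size in volume_size:
        length = max(1, int(length_ratio * size))  # side length along this axis
        start = rng.randint(0, size - length)      # inclusive on both ends
        bounds.append((start, start + length))
    return bounds

region = random_cuboid((8, 256, 256), length_ratio=0.25, rng=random.Random(0))
print([stop - start for start, stop in region])  # [2, 64, 64]
```

Only the voxels inside the selected cuboid would then be degraded, while the rest of the volume stays untouched.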

set_params()[source]

There is no change in sample size.

class connectomics.data.augmentation.CutNoise(length_ratio=0.25, mode='uniform', scale=0.2, p=0.5, additional_targets=None)[source]

3D CutNoise data augmentation.

Randomly add noise to a cuboid region in the volume to force the model to learn denoising when making predictions. This augmentation is only applied to images.

Parameters
  • length_ratio (float) – the ratio of the cuboid length compared with volume length.

  • mode (string) – the distribution of the noise pattern. Default: 'uniform'.

  • scale (float) – scale of the random noise. Default: 0.2.

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

set_params()[source]

There is no change in sample size.

class connectomics.data.augmentation.DataAugment(p=0.5, additional_targets=None)[source]

DataAugment interface. A data augmentor needs to conduct the following steps:

  1. Set sample_params at initialization to compute required sample size.

  2. Randomly generate augmentation parameters for the current transform.

  3. Apply the transform to a pair of images and corresponding labels.

All the real data augmentations (except mix-up augmentor and test-time augmentor) should be a subclass of this class.

Parameters
  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

abstract set_params()[source]

Calculate the appropriate sample size with data augmentation.

Some data augmentations (warping, misalignment, etc.) require a larger sample size than the original, depending on the augmentation parameters that are randomly chosen. This function takes the data augmentation parameters and returns an updated data sampling size accordingly.
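One way to picture how per-transform requirements add up to a single sampling size is sketched below. The representation is an assumption made for this example: each transform is described by a multiplicative 'ratio' and an additive per-side 'add' margin in \((z, y, x)\) order, and composed_sample_size is a hypothetical helper, not the library's implementation.

```python
import math

def composed_sample_size(input_size, transform_params):
    """Combine per-transform requirements into one sampling size.

    transform_params: list of dicts with a per-axis 'ratio' (multiplicative)
    and 'add' (extra voxels on each side), both in (z, y, x) order.
    This layout is an assumption made for the sketch.
    """
    size = list(input_size)
    for params in transform_params:
        size = [math.ceil(s * r) + 2 * a
                for s, r, a in zip(size, params['ratio'], params['add'])]
    return tuple(size)

no_change = {'ratio': (1.0, 1.0, 1.0), 'add': (0, 0, 0)}    # e.g. a flip
xy_margin = {'ratio': (1.0, 1.0, 1.0), 'add': (0, 16, 16)}  # e.g. an xy shift
print(composed_sample_size((8, 256, 256), [no_change, xy_margin]))
# (8, 288, 288)
```

Transforms that report no change leave the size untouched, so only augmentations that need extra context enlarge the sampled region before the final center-crop.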

class connectomics.data.augmentation.Elastic(alpha=16.0, sigma=4.0, p=0.5, additional_targets=None)[source]

Elastic deformation of images as described in [Simard2003] (with modifications). The implementation is based on https://gist.github.com/erniejunior/601cdf56d2b424757de5. This augmentation is applied to both images and masks.

Simard2003

Simard, Steinkraus and Platt, “Best Practices for Convolutional Neural Networks applied to Visual Document Analysis”, in Proc. of the International Conference on Document Analysis and Recognition, 2003.

Parameters
  • alpha (float) – maximum pixel-moving distance of elastic deformation. Default: 16.0

  • sigma (float) – standard deviation of the Gaussian filter. Default: 4.0

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

set_params()[source]

The elastic deformation is only applied to the xy-plane. The required sample size before transformation needs to be larger, as decided by the maximum pixel-moving distance (self.alpha).

class connectomics.data.augmentation.Flip(do_ztrans=0, p=0.5, additional_targets=None)[source]

Randomly flip along z-, y- and x-axes as well as swap y- and x-axes for anisotropic image volumes. For learning on isotropic image volumes set do_ztrans to 1 to swap z- and x-axes (the inputs need to be cubic). This augmentation is applied to both images and masks.

Parameters
  • do_ztrans (int) – set to 1 to swap z- and x-axes for isotropic data. Default: 0

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None
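The individual operations named above can be sketched on a tiny volume stored as nested lists indexed as vol[z][y][x]. The function names are hypothetical and the sketch only illustrates the axis conventions; the library operates on arrays and chooses the operations at random.

```python
def flip_z(vol):
    """Reverse the order of sections along z."""
    return vol[::-1]

def flip_y(vol):
    """Reverse the rows of every section."""
    return [section[::-1] for section in vol]

def flip_x(vol):
    """Reverse the columns of every section."""
    return [[row[::-1] for row in section] for section in vol]

def swap_xy(vol):
    """Transpose every section, exchanging the y- and x-axes."""
    return [[list(col) for col in zip(*section)] for section in vol]

vol = [[[1, 2],
        [3, 4]]]          # one section of size 2 x 2
print(flip_x(vol))        # [[[2, 1], [4, 3]]]
print(flip_y(vol))        # [[[3, 4], [1, 2]]]
print(swap_xy(vol))       # [[[1, 3], [2, 4]]]
```

Each operation is its own inverse, which is also why none of them changes the required sample size.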

set_params()[source]

There is no change in sample size.

class connectomics.data.augmentation.Grayscale(contrast_factor=0.3, brightness_factor=0.3, mode='mix', invert=False, invert_p=0.0, p=0.5, additional_targets=None)[source]

Grayscale intensity augmentation, adapted from ELEKTRONN (http://elektronn.org/).

Randomly adjust contrast/brightness, randomly invert the color space and apply gamma correction. This augmentation is only applied to images.

Parameters
  • contrast_factor (float) – intensity of contrast change. Default: 0.3

  • brightness_factor (float) – intensity of brightness change. Default: 0.3

  • mode (string) – one of '2D', '3D' or 'mix'. Default: 'mix'

  • invert (bool) – whether to invert the images. Default: False

  • invert_p (float) – probability of inverting the images. Default: 0.0

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

set_params()[source]

There is no change in sample size.

class connectomics.data.augmentation.MisAlignment(displacement=16, rotate_ratio=0.0, p=0.5, additional_targets=None)[source]

Mis-alignment data augmentation of image stacks. This augmentation is applied to both images and masks.

Parameters
  • displacement (int) – maximum pixel displacement in xy-plane. Default: 16

  • rotate_ratio (float) – ratio of rotation-based mis-alignment. Default: 0.0

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

set_params()[source]

The mis-alignment augmentation is only applied to the xy-plane. The required sample size before transformation needs to be larger, as decided by self.displacement.

class connectomics.data.augmentation.MissingParts(iterations=64, p=0.5, additional_targets=None)[source]

Missing-parts augmentation of image stacks. This augmentation is only applied to images.

Parameters
  • iterations (int) – number of iterations in binary dilation. Default: 64

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

set_params()[source]

There is no change in sample size.

class connectomics.data.augmentation.MissingSection(num_sections=2, p=0.5, additional_targets=None)[source]

Missing-section augmentation of image stacks. This augmentation is applied to both images and masks.

Parameters
  • num_sections (int) – number of missing sections. Default: 2

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

set_params()[source]

The missing-section augmentation is only applied to the z-axis. The required sample size before transformation needs to be larger, as decided by self.num_sections.
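The reason for the larger sample size along z can be seen in a minimal sketch: removing num_sections sections from a stack shortens it by the same amount, so the stack has to start with that many extra sections. The helper drop_sections is hypothetical, and which sections are eligible for removal is an assumption of this example.

```python
import random

def drop_sections(stack, num_sections=2, rng=None):
    """Remove `num_sections` randomly chosen sections from a z-stack.

    Hypothetical helper for illustration; not part of connectomics.
    """
    rng = rng or random.Random()
    dropped = set(rng.sample(range(len(stack)), num_sections))
    return [section for i, section in enumerate(stack) if i not in dropped]

stack = list(range(10))  # stand-in for 10 sections along z
remaining = drop_sections(stack, num_sections=2, rng=random.Random(0))
print(len(stack), len(remaining))  # 10 8
```

To end up with 8 sections after the augmentation, the sampler therefore has to provide 10.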

class connectomics.data.augmentation.MixupAugmentor(min_ratio=0.7, max_ratio=0.9, num_aug=2)[source]

Mixup augmentor (experimental). Conduct linear interpolation between two image samples. The segmentation mask of the sample with higher weight should be used with the augmented output.

The input can be a numpy.ndarray or torch.Tensor of shape \((B, C, Z, Y, X)\).

Parameters
  • min_ratio (float) – minimal interpolation ratio of the target volume. Default: 0.7

  • max_ratio (float) – maximal interpolation ratio of the target volume. Default: 0.9

  • num_aug (int) – number of volumes to be augmented in a batch. Default: 2
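The linear interpolation itself can be sketched on flat lists of intensities. This is an illustration under stated assumptions rather than the class's implementation: the helper mixup is hypothetical, and the ratio is assumed to be drawn uniformly between min_ratio and max_ratio.

```python
import random

def mixup(target, other, min_ratio=0.7, max_ratio=0.9, rng=None):
    """Blend two equally sized lists of intensities.

    The target keeps the larger weight, so its segmentation mask stays valid.
    Hypothetical helper for illustration; not part of connectomics.
    """
    rng = rng or random.Random()
    ratio = rng.uniform(min_ratio, max_ratio)
    mixed = [ratio * t + (1.0 - ratio) * o for t, o in zip(target, other)]
    return mixed, ratio

mixed, ratio = mixup([1.0, 1.0, 0.0], [0.0, 1.0, 0.0], rng=random.Random(0))
```

Because the ratio stays at or above min_ratio, the target volume dominates the blend, which is why its mask is the one paired with the augmented output.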

Examples::
>>> from connectomics.data.augmentation import MixupAugmentor
>>> mixup_augmentor = MixupAugmentor(num_aug=2)
>>> volume = mixup_augmentor(volume)
>>> pred = model(volume)

class connectomics.data.augmentation.MotionBlur(sections=2, kernel_size=11, p=0.5, additional_targets=None)[source]

Motion blur data augmentation of image stacks. This augmentation is only applied to images.

Parameters
  • sections (int) – number of sections along z dimension to apply motion blur. Default: 2

  • kernel_size (int) – kernel size for motion blur. Default: 11

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

set_params()[source]

There is no change in sample size.

class connectomics.data.augmentation.Rescale(low=0.8, high=1.25, fix_aspect=False, p=0.5, additional_targets=None)[source]

Rescale augmentation. This augmentation is applied to both images and masks.

Parameters
  • low (float) – lower bound of the random scale factor. Default: 0.8

  • high (float) – upper bound of the random scale factor. Default: 1.25

  • fix_aspect (bool) – fix aspect ratio or not. Default: False

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

set_params()[source]

The rescale augmentation is only applied to the xy-plane. The required sample size before transformation needs to be larger, as decided by the lowest scaling factor (self.low).
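A plausible reading of this rule is that the sampled region must still cover the model input after being shrunk by the lowest scale factor, which gives a size of input / low in y and x. The sketch below encodes that assumption; rescale_sample_size is a hypothetical helper and the library's exact rounding may differ.

```python
import math

def rescale_sample_size(input_size, low=0.8):
    """Sample size needed so a shrink by `low` still covers the input.

    Only the y- and x-axes are enlarged; z is left unchanged.
    Hypothetical helper for illustration; not part of connectomics.
    """
    z, y, x = input_size
    return (z, math.ceil(y / low), math.ceil(x / low))

size = rescale_sample_size((8, 256, 256), low=0.8)
print(size)
```

With the default low of 0.8, the xy extent grows by a factor of about 1.25 while the number of sections stays at 8.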

class connectomics.data.augmentation.Rotate(rot90=True, p=0.5, additional_targets=None)[source]

Continuous rotation of the xy-plane.

If the rotation degree is arbitrary, the sample size for x- and y-axes should be at least \(\sqrt{2}\) times larger than the input size to ensure there is no non-valid region after center-crop. This augmentation is applied to both images and masks.

Parameters
  • rot90 (bool) – rotate the sample by only 90 degrees. Default: True

  • p (float) – probability of applying the augmentation. Default: 0.5

  • additional_targets (dict, optional) – additional targets to augment. Default: None

set_params()[source]

The rotation augmentation is only applied to the xy-plane. If self.rot90=True, then there is no change in sample size. For arbitrary rotation degree, the required sample size before transformation needs to be \(\sqrt{2}\) times larger.
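The \(\sqrt{2}\) factor follows from the diagonal of a square: a center-crop of side \(s\) rotated by 45 degrees only stays inside the source if the source side is at least \(s\sqrt{2}\). A minimal sketch of the resulting size, using a hypothetical helper rotation_sample_size and rounding up to whole pixels:

```python
import math

def rotation_sample_size(input_size, rot90=True):
    """Sample size in (z, y, x) order required before rotating in the xy-plane.

    Hypothetical helper for illustration; not part of connectomics.
    """
    z, y, x = input_size
    if rot90:
        return (z, y, x)  # rotations by multiples of 90 degrees need no margin
    scale = math.sqrt(2)
    return (z, math.ceil(y * scale), math.ceil(x * scale))

print(rotation_sample_size((8, 256, 256), rot90=True))   # (8, 256, 256)
print(rotation_sample_size((8, 256, 256), rot90=False))  # (8, 363, 363)
```

The z size is unchanged in both cases because the rotation never leaves the xy-plane.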

class connectomics.data.augmentation.TestAugmentor(mode='mean', do_2d=False, num_aug=None, scale_factors=[1.0, 1.0, 1.0], inference_act=None)[source]

Test-time spatial augmentor.

Our test-time augmentation includes horizontal/vertical flips over the xy-plane, swap of x and y axes, and flip in z-dimension, resulting in 16 variants. Considering inference efficiency, we also provide the option to apply only horizontal/vertical flips over the xy-plane, resulting in 4 variants. The augmentation can also be applied to 2D outputs without the z-flip. By default the test-time augmentor returns the pixel-wise mean value of the predictions.

Parameters
  • mode (str) – one of 'min', 'max' or 'mean'. Default: 'mean'

  • do_2d (bool) – the test-time augmentation is applied to 2d images. Default: False

  • num_aug (int, optional) – number of data augmentation variants: 4, 8 or 16 (3D only). Default: None

  • scale_factors (List[float]) – scale factors for resizing the model output. Default: [1.0, 1.0, 1.0]
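The variant counts quoted above come from combining independent on/off choices: x-flip, y-flip, z-flip and xy-swap give \(2^4 = 16\) combinations, while the two xy-flips alone give \(2^2 = 4\). The sketch below enumerates them and reduces a set of predictions with the chosen mode; tta_variants and aggregate are hypothetical helpers, not the class's methods.

```python
from itertools import product
from statistics import mean

def tta_variants(use_z_flip=True, use_transpose=True):
    """Enumerate (flip_x, flip_y, flip_z, swap_xy) on/off combinations.

    Hypothetical helper for illustration; not part of connectomics.
    """
    on_off = (False, True)
    options = [on_off, on_off,
               on_off if use_z_flip else (False,),
               on_off if use_transpose else (False,)]
    return list(product(*options))

def aggregate(values, mode='mean'):
    """Reduce the per-variant predictions for one pixel."""
    reducers = {'mean': mean, 'min': min, 'max': max}
    return reducers[mode](values)

full = tta_variants()
xy_only = tta_variants(use_z_flip=False, use_transpose=False)
print(len(full), len(xy_only))                 # 16 4
print(aggregate([0.2, 0.4, 0.9], mode='max'))  # 0.9
```

In practice each variant is applied to the input, the prediction is mapped back with the inverse operation, and the aligned predictions are then reduced pixel by pixel.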

Examples::
>>> from connectomics.data.augmentation import TestAugmentor
>>> test_augmentor = TestAugmentor(mode='mean', num_aug=16)
>>> output = test_augmentor(model, inputs) # output is a numpy.ndarray on CPU

classmethod build_from_cfg(cfg, activation=True)[source]

Build a TestAugmentor from configs.

update_name(name)[source]

Update the name of the output file to indicate applied test-time augmentations.

connectomics.data.augmentation.build_train_augmentor(cfg, keep_uncropped=False, keep_non_smoothed=False)[source]

Build the training augmentor based on the options specified in the configuration file.

Parameters
  • cfg (yacs.config.CfgNode) – YACS configuration options.

  • keep_uncropped (bool) – keep uncropped data in the output. Default: False

  • keep_non_smoothed (bool) – keep the masks before smoothing in the output. Default: False

Note

The two arguments, keep_uncropped and keep_non_smoothed, are used only for debugging. Both default to False and cannot be set in the config file.

Utility Functions