nlpstack.data.dataloaders module

class nlpstack.data.dataloaders.BasicBatchSampler(batch_size, shuffle=False, drop_last=False)

Bases: BatchSampler

A basic batch sampler that generates batches of indices from a dataset.

Parameters:
  • batch_size (int) – The batch size.

  • shuffle (bool) – Whether to shuffle the dataset before sampling.

  • drop_last (bool) – Whether to drop the last batch if it is smaller than the batch size.

get_batch_indices(dataset)

Returns an iterator over batches of indices from the dataset.

Parameters:

dataset (Sequence[Instance]) – The dataset to sample from.

Return type:

Iterator[List[int]]

Returns:

An iterator over batches of indices from the dataset.
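The following is a minimal standalone sketch of how a basic sampler might slice a dataset's index range into batches, which shows what the shuffle and drop_last options mean in practice. It does not import nlpstack. The function name basic_batch_indices and its seed argument are hypothetical. The assumption that shuffling permutes the indices once before slicing is ours and is not taken from the library source.

```python
import random
from typing import Iterator, List, Optional


def basic_batch_indices(
    num_instances: int,
    batch_size: int,
    shuffle: bool = False,
    drop_last: bool = False,
    seed: Optional[int] = None,
) -> Iterator[List[int]]:
    """Yield batches of dataset indices (illustrative sketch, not nlpstack's code)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    indices = list(range(num_instances))
    if shuffle:
        # Assumption: permute the index order once, then slice it into batches.
        random.Random(seed).shuffle(indices)
    for start in range(0, num_instances, batch_size):
        batch = indices[start : start + batch_size]
        # Only the final batch can be short; skip it when drop_last is set.
        if drop_last and len(batch) < batch_size:
            break
        yield batch
```

For a five-instance dataset with a batch size of 2, this yields [[0, 1], [2, 3], [4]], or [[0, 1], [2, 3]] when drop_last=True.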

get_batch_size()

Returns the batch size.

Return type:

int

Returns:

The batch size.

get_num_batches(dataset)

Returns the number of batches in the dataset.

Parameters:

dataset (Sequence[Instance]) – The dataset to sample from.

Return type:

int

Returns:

The number of batches in the dataset.
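Suppose get_num_batches counts exactly the batches that get_batch_indices yields. Then, for N instances and batch size B, the count is ⌈N / B⌉, or ⌊N / B⌋ when drop_last=True. The hypothetical helper below computes this using only integer arithmetic.

```python
def num_batches(num_instances: int, batch_size: int, drop_last: bool = False) -> int:
    """Batch count under the assumed slicing semantics (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if drop_last:
        # Only full batches are kept: floor(N / B).
        return num_instances // batch_size
    # The partial final batch, if any, is kept: ceil(N / B) without floats.
    return (num_instances + batch_size - 1) // batch_size
```

For example, 10 instances with a batch size of 3 give 4 batches, or 3 when the partial final batch is dropped.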

class nlpstack.data.dataloaders.BatchIterator(dataset, sampler)

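Bases: object

Iterates over batches of instances from a dataset, using the batches of indices generated by a batch sampler.

Parameters:
  • dataset (Sequence[Instance]) – The dataset to iterate over.

  • sampler (BatchSampler) – The batch sampler that generates batches of indices from the dataset.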
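The sketch below shows one plausible way BatchIterator could combine its two arguments, and it should be read as an assumption: the iterator takes each batch of indices from the sampler, looks those indices up in the dataset, and yields the corresponding instances. So that the sketch stays self-contained, the hypothetical SketchBatchIterator receives the index batches directly (for example, the result of sampler.get_batch_indices(dataset)) rather than a sampler object.

```python
from typing import Generic, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


class SketchBatchIterator(Generic[T]):
    """Hypothetical stand-in for BatchIterator: maps index batches to instances."""

    def __init__(self, dataset: Sequence[T], index_batches: Iterable[List[int]]) -> None:
        self._dataset = dataset
        self._index_batches: Iterator[List[int]] = iter(index_batches)

    def __iter__(self) -> "SketchBatchIterator[T]":
        return self

    def __next__(self) -> List[T]:
        # next() raises StopIteration once the index batches are exhausted.
        indices = next(self._index_batches)
        return [self._dataset[i] for i in indices]
```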

class nlpstack.data.dataloaders.BatchSampler

Bases: object

A batch sampler is responsible for generating batches of indices from a dataset.

get_batch_indices(dataset)

Returns an iterator over batches of indices from the dataset.

Parameters:

dataset (Sequence[Instance]) – The dataset to sample from.

Return type:

Iterator[List[int]]

Returns:

An iterator over batches of indices from the dataset.

get_batch_size()

Returns the batch size.

Return type:

int

Returns:

The batch size.

get_num_batches(dataset)

Returns the number of batches in the dataset.

Parameters:

dataset (Sequence[Instance]) – The dataset to sample from.

Return type:

int

Returns:

The number of batches in the dataset.
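A custom sampler would implement the three methods listed above. The sketch below restates that interface as a hypothetical abstract base class, SketchBatchSampler, which is not nlpstack's actual class. It also includes an illustrative subclass that groups instances of similar length, a common way to reduce padding in NLP batches. Both the subclass and its reliance on calling len() on each instance are assumptions made for illustration.

```python
import abc
from typing import Any, Iterator, List, Sequence


class SketchBatchSampler(abc.ABC):
    """Hypothetical ABC mirroring the three documented BatchSampler methods."""

    @abc.abstractmethod
    def get_batch_indices(self, dataset: Sequence[Any]) -> Iterator[List[int]]:
        """Yield lists of indices into ``dataset``."""

    @abc.abstractmethod
    def get_batch_size(self) -> int:
        """Return the configured batch size."""

    @abc.abstractmethod
    def get_num_batches(self, dataset: Sequence[Any]) -> int:
        """Return how many batches get_batch_indices yields for ``dataset``."""


class SortedLengthBatchSampler(SketchBatchSampler):
    """Hypothetical subclass: batches together instances with similar len()."""

    def __init__(self, batch_size: int) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self._batch_size = batch_size

    def get_batch_indices(self, dataset: Sequence[Any]) -> Iterator[List[int]]:
        # Order indices by instance length (stable sort), then slice into batches.
        order = sorted(range(len(dataset)), key=lambda i: len(dataset[i]))
        for start in range(0, len(order), self._batch_size):
            yield order[start : start + self._batch_size]

    def get_batch_size(self) -> int:
        return self._batch_size

    def get_num_batches(self, dataset: Sequence[Any]) -> int:
        # Every batch is kept, including a short final one: ceil(N / B).
        return (len(dataset) + self._batch_size - 1) // self._batch_size
```

With a batch size of 2, the dataset ["aaa", "a", "aa", "aaaa"] is batched as [[1, 2], [0, 3]].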

class nlpstack.data.dataloaders.DataLoader(sampler)

Bases: object

A data loader is responsible for iterating over batches of instances from a dataset.

Parameters:

sampler (BatchSampler) – The batch sampler to use.

get_batch_size()

Returns the batch size.

Return type:

int

Returns:

The batch size.
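The documented constructor accepts only a sampler, so this section does not show how a DataLoader receives the dataset. The sketch below makes two assumptions. First, get_batch_size delegates to the sampler. Second, a hypothetical iter_batches(dataset) method converts batches of indices into batches of instances. SketchDataLoader, SamplerLike, and iter_batches are illustrative names and are not part of the nlpstack API.

```python
from typing import Any, Iterator, List, Protocol, Sequence


class SamplerLike(Protocol):
    """Structural type for the two BatchSampler methods this sketch uses."""

    def get_batch_size(self) -> int: ...

    def get_batch_indices(self, dataset: Sequence[Any]) -> Iterator[List[int]]: ...


class SketchDataLoader:
    """Hypothetical DataLoader-like wrapper around a batch sampler."""

    def __init__(self, sampler: SamplerLike) -> None:
        self.sampler = sampler

    def get_batch_size(self) -> int:
        # Assumption: the sampler owns the batch size, so the loader delegates.
        return self.sampler.get_batch_size()

    def iter_batches(self, dataset: Sequence[Any]) -> Iterator[List[Any]]:
        # Hypothetical method: turn each batch of indices into a batch of instances.
        for indices in self.sampler.get_batch_indices(dataset):
            yield [dataset[i] for i in indices]
```

Keeping the batch size on the sampler alone, as assumed here, ensures that the loader and the sampler cannot disagree about it.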