Class methods

Module storing the main Tensor class and related methods.

class pygrad.tensor.Tensor(value: list | ~numpy.ndarray, label: str = '', dtype=<class 'numpy.float64'>, learnable: bool = True, leaf: bool = False, _prev: tuple = ())

The main Tensor object.

property T: Tensor

Transposes self.value.

If self is 0 or 1 dimensional, self is returned without modification. Otherwise, the last two dimensions of self are flipped.

__add__(other: int | float | integer | floating | ndarray | Tensor) Tensor

Performs self + other, returning a new Tensor object.

Parameters:

other ((int, float, np.integer, np.floating, np.ndarray, list, Tensor)) – The object to add, with shape broadcastable to self.shape.

__getitem__(idcs)

Fetches self.value[idcs]. Identical to NumPy syntax for fetching indices.

__init__(value: list | ~numpy.ndarray, label: str = '', dtype=<class 'numpy.float64'>, learnable: bool = True, leaf: bool = False, _prev: tuple = ()) None

Initializes a Tensor.

A Tensor at all times holds:
  • A value holding its data.

  • A gradient of the same shape as its value indicating its gradient.

  • A function indicating how to pass a gradient to its children Tensors.

  • A computational graph for all Tensors that eventually resulted in the Tensor.

Tensors store operations performed, allowing them to calculate the gradients of all Tensors in their computational graph via .backward().

Parameters:
  • value (A numeric value, list, or np.ndarray. Complex and boolean values are not supported. The value is automatically recast to dtype.) – The input Tensor value.

  • label – A string giving the Tensor an identifiable name. Defaults to “”.

  • dtype – The dtype to cast the input value to. Must be one of np.bool, np.integer, np.floating.

  • learnable – Optional. A boolean indicating whether to propagate gradients to this Tensor’s _prev Tensors. Setting this to False stops the computational graph at this node; the node itself still has gradients computed.

  • leaf – Optional. A boolean indicating if the Tensor is to be considered a leaf node in the computational graph. Leaf nodes will have gradients tracked, but won’t appear as a weight in self.weights.

  • _prev – Optional. An empty tuple or a tuple of Tensor objects, referencing the objects to pass gradients to during a backwards pass. _prev is filled automatically when performing a Tensor method; manual specification is not necessary.

Returns:

A produced Tensor.
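
For illustration, a minimal construction sketch assuming only the constructor arguments and the .value attribute documented above:

    import numpy as np
    from pygrad.tensor import Tensor

    # A learnable 2x3 Tensor, cast to the default dtype (np.float64).
    W = Tensor([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]], label="W")

    # A non-learnable Tensor: it still receives gradients, but the
    # computational graph stops at this node.
    x = Tensor(np.ones((3, 1)), label="x", learnable=False)

    y = W @ x          # a new Tensor whose _prev references W and x
    print(y.value)     # the underlying NumPy array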

__matmul__(other: ndarray | Tensor) Tensor

Performs matrix multiplication with self and other: self@other.

Matrix multiplication is performed between the last two dimensions of self and other, broadcasting all those remaining.

Parameters:

other – The matrix to perform matrix multiplication against.

__mul__(other: int | float | integer | floating | ndarray | Tensor) Tensor

Performs multiplication between the values of self and other. If self and other are matrices, this is equivalent to the Hadamard product.

Parameters:

other (One of int, float, np.integer, np.floating, np.ndarray, or Tensor.) – The value to multiply against. Must be broadcastable in shape to self.

__neg__() Tensor

Performs -1*self, returning a new Tensor object.

__pow__(n: int | float | integer | floating) Tensor

Raises the Tensor to a power of n.

Parameters:

n – The power to which to raise the current Tensor.

__repr__() str

Return repr(self).

backward(reset_grad=True) None

Computes the gradients of all Tensors in self’s computation graph, storing results in self.topo and self.weights.

self is seeded with a gradient of 1, which is then propagated to increment the gradients of all child Tensors.

This method first creates two topological graphs of self.
  1. A backwards-pass graph including all Tensors contributing to self.

  2. The backwards-pass graph, now omitting all Tensors with leaf=True.

    This is useful for seeing the exact parameters contributing to the current Tensor, ignoring any Tensors produced as intermediary values along the way.

Parameters:

reset_grad – Whether or not to reset the current backwards pass gradients.
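
As a small sketch of the behaviour described above (the .grad attribute name is an assumption based on the gradient description; .value is documented):

    from pygrad.tensor import Tensor

    x = Tensor([[2.0]], label="x")
    y = x ** 2 + 1        # builds the computational graph for y

    y.backward()          # seeds y with gradient 1 and propagates backwards
    print(x.grad)         # expected to hold dy/dx = 2*x = 4 (attribute name assumed)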

conv2D(other: Tensor) Tensor

Applies a 2D convolution on other, using self as the kernel. Strides are set to 1 by default, with no padding. Outputs a new Tensor.

If self.shape = (1, out_channels, in_channels, kH, kW) and other.shape = (bs, in_channels, H, W), then output.shape = (bs, out_channels, H-kH+1, W-kW+1).

Parameters:

other (Tensor.) – A 4D Tensor.
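
A shape-only sketch of this convention with illustrative sizes:

    import numpy as np
    from pygrad.tensor import Tensor

    # Kernel shape (1, out_channels, in_channels, kH, kW) = (1, 8, 3, 3, 3).
    kernel = Tensor(np.random.randn(1, 8, 3, 3, 3), label="kernel")
    # Input shape (bs, in_channels, H, W) = (4, 3, 28, 28).
    image = Tensor(np.random.randn(4, 3, 28, 28), label="image", learnable=False)

    out = kernel.conv2D(image)
    print(out.value.shape)    # expected (4, 8, 26, 26) = (bs, out_channels, H-kH+1, W-kW+1)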

create_graph() tuple[list, list]

Creates two reverse-ordered topological graphs: topo and weights.

topo is the full backwards-pass computational graph, which includes all intermediary Tensors. weights is a subgraph containing only the Tensors that hold learnable weights.

For example, performing y = x**2 + 1 will create the following graphs:

  • topo, containing: x**2 + 1, x**2, 1, and x.

  • weights, containing: x

Although all nodes in topo were responsible for producing a gradient for x, only the x node contains weights that need to be updated by this gradient.

Both graphs are lists that perform a pre-order traversal starting at self as the root node.

Returns:

topo[list], weights[list]
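
A sketch mirroring the y = x**2 + 1 example above (the .label attribute name is an assumption based on the label parameter):

    from pygrad.tensor import Tensor

    x = Tensor([3.0], label="x")
    y = x ** 2 + 1

    topo, weights = y.create_graph()
    print(len(topo))                   # includes x**2 + 1, x**2, 1, and x
    print([t.label for t in weights])  # expected to list only "x"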

log() Tensor

Applies the natural logarithm to self.

Negative values will raise an error.

mask_idcs(mask_idcs: tuple, value: float = 0.0) Tensor

Applies a mask to the Tensor via an array indicating indices of self.value. Outputs a new Tensor.

Parameters:
  • mask_idcs (tuple) – A tuple of indices at which to mask values of self.

  • value (float) – The mask value. Defaults to 0.0, meaning the chosen indices are set to 0.

mean(axis: int | tuple | None = -1, keepdims: bool = True) Tensor

Returns a new Tensor with value being the average of self’s value along a given axis.

Parameters:
  • axis (None, int, tuple of ints) – The axis over which to compute the mean.

  • keepdims (bool) – Whether or not to keep the existing dimensions.

new_value(x) None

Assigns a new value to the Tensor, Tensor.value = x, and resets gradients to 0 without changing computational graph topology.

relu() Tensor

Applies a point-wise ReLU to the Tensor values. Outputs a new Tensor.

reset_grad() None

Resets the gradient of the Tensor to 0, maintaining all other attributes.

reshape(shape: tuple) Tensor

Returns a new Tensor with Tensor.value.shape == shape.

Parameters:

shape – A tuple indicating the new shape self.value has to take.

sigmoid() Tensor

Applies sigmoid activation to self, returning a new Tensor.

self.value has to be of shape (…, 1).

softmax() Tensor

Applies softmax to self. Softmax is performed on the last axis.

self must be either 3 or 4 dimensional:
  • (B, H, W)

  • (B, O, H, W)

Returns a copy of the Tensor, with the softmax’d value.

softmax_log() Tensor

Computes .softmax().log() in a single operation. Use this if chaining .softmax().log() causes numerical issues.

self must be either 3 or 4 dimensional:
  • (B, H, W)

  • (B, O, H, W)

Returns a copy of the Tensor, with the log-softmax value.

std(axis: int, keepdim=True) Tensor

Returns a new Tensor with value being the standard deviation of self along the specified axis. No bias correction is performed.

Parameters:
  • axis (int) – The axis over which to compute the standard deviation.

  • keepdim (bool, defaults to True) – Whether or not to keep the reduced axis dimension in the result.

sum(axis: None | int | tuple = None, keepdims: bool = False) Tensor

Performs a summation on self.value according to the chosen axis. Returns a new Tensor object.

Parameters:
  • axis (None (default), int, or tuple of ints.) – Determines which axis to sum self.value over. Defaults to None: summing over all axes.

  • keepdims – Indicates whether to keep the current shape of self after summation.

tanh() Tensor

Applies a point-wise tanh activation to the Tensor. Outputs a new Tensor.

transpose(axes: None | tuple | list) Tensor

Returns a new Tensor with the same data but transposed axes.

Parameters:

axes – If specified, it must be a tuple or list which contains a permutation of [0,1,…,N-1] where N is the number of axes of self. The ith axis of the returned array will correspond to the axis numbered axes[i] of the input. If not specified, defaults to the reverse of the order of the axes.

pygrad.tensor.array(*args, **kwargs) Tensor

Helper function designed for initializing a Tensor object in the same way as a NumPy array. Ensure inputs match those of Tensor.

Returns:

A Tensor object with fields (*args, **kwargs)

Basic layers:

Module storing class-defined layers.

class pygrad.basics.AddNorm(gain: float = 1.0, bias: float = 0.0, epsilon: float = 1e-09)

Performs AddNorm on an input x and skip connection value skip. The forward pass performs the following, outputting a Tensor:

y = x + skip
mu = mean(y)
sd = std(y)
output = gain * (y - mu) / sd + bias

gain defaults to 1.0. bias defaults to 0.0.
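
For clarity, the same computation written as a plain NumPy reference (not the layer's implementation; the normalization axis and the placement of epsilon are assumptions):

    import numpy as np

    def addnorm_reference(x, skip, gain=1.0, bias=0.0, epsilon=1e-9):
        """Reference AddNorm computation on plain arrays."""
        y = x + skip
        mu = y.mean(axis=-1, keepdims=True)   # mean of y (axis assumed)
        sd = y.std(axis=-1, keepdims=True)    # sd of y (axis assumed)
        return gain * (y - mu) / (sd + epsilon) + bias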

__call__(x: Tensor, skip: Tensor) Tensor

Call self as a function.

__init__(gain: float = 1.0, bias: float = 0.0, epsilon: float = 1e-09)
__weakref__

list of weak references to the object

class pygrad.basics.Conv2D(o_dim: int, i_dim: int, kH: int, kW: int, bias: bool = True, label: None | int | str = 'Conv2D', dtype=<class 'numpy.float64'>)

Performs Conv2D from an input dimension i_dim to an output dimension o_dim using a kernel (kH, kW).

Kernels are initialized using Kaiming Uniform initialization. Only single strides are performed. No output padding is performed.

__call__(x: Tensor) Tensor

Call self as a function.

__init__(o_dim: int, i_dim: int, kH: int, kW: int, bias: bool = True, label: None | int | str = 'Conv2D', dtype=<class 'numpy.float64'>) None

Initialization for the Conv2D class.

Conv2D is a set of kernels that, when called on an input x, computes Cx + B.

C is the convolution, B is the bias.

C is of shape (1, o_dim, i_dim, kH, kW), the leading dimension being the batch dimension.

If bias is True:

B is of shape (1, o_dim, 1, 1)

Weights are initialized via Kaiming Uniform Initialization.

Parameters:
  • o_dim (int) – The output channel dimension of the convolution. This indicates the number of kernels to apply to the input.

  • i_dim (int) – The number of channels of the input.

  • kH (int) – the height of the kernel

  • kW (int) – the width of the kernel

  • bias (bool) – Whether or not to include the bias term after performing the initial convolution. Defaults to True.

  • label (None, int, or str. Defaults to "Conv2D") – A label for the layer.

  • dtype (The data types allowable by the Tensor class.) – The data type of the weights and gradients. Defaults to np.float64.
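
A usage sketch based on the documented signature and shape conventions:

    import numpy as np
    from pygrad.basics import Conv2D
    from pygrad.tensor import Tensor

    # 3 input channels mapped to 8 output channels with a 3x3 kernel.
    conv = Conv2D(o_dim=8, i_dim=3, kH=3, kW=3)

    x = Tensor(np.random.randn(4, 3, 28, 28), label="x", learnable=False)
    out = conv(x)                 # single stride, no padding
    print(out.value.shape)        # expected (4, 8, 26, 26)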

__repr__() str

Return repr(self).

__weakref__

list of weak references to the object

class pygrad.basics.Dropout(rate: float = 0.1)

Dropout Class with specified rate parameter. Randomly masks input values with a probability of rate.

Rate defaults to 0.1.

__call__(x: Tensor, training: bool = True) Tensor

Call self as a function.

__init__(rate: float = 0.1)
__weakref__

list of weak references to the object

class pygrad.basics.Flatten(label: None | int | str = 'Flatten')

Flattens an input by reshaping it into a 1D Tensor.

__call__(x: Tensor) Tensor

Call self as a function.

__init__(label: None | int | str = 'Flatten') None
__repr__() str

Return repr(self).

__weakref__

list of weak references to the object

class pygrad.basics.Linear(i_dim: int, o_dim: int, bias: bool = True, label: None | int | str = 'Linear', dtype=<class 'numpy.float64'>)

Linear 2D layer. Performs Wx + B on an input x.

Inputs and outputs are 3D: (bs, h, w). Weights are initialized using Kaiming Uniform initialization.

__call__(x: Tensor) Tensor

Call self as a function.

__init__(i_dim: int, o_dim: int, bias: bool = True, label: None | int | str = 'Linear', dtype=<class 'numpy.float64'>) None

Initializes a Dense Linear Layer with Kaiming Uniform initialization.

A Dense linear layer is Wx + B.

W is initialized as a Tensor of shape (1, i_dim, o_dim), with the leading dimension indicating the batch dimension. If bias is True, B is initialized as a Tensor of shape (1, 1, o_dim), again with the leading dimension indicating the batch dimension.

Parameters:
  • i_dim (int) – The input data dimension to the layer.

  • o_dim (int) – The output data dimension of the layer.

  • bias (bool.) – Whether or not to include the bias term. Defaults to True.

  • label (None, int, str (defaults to "Linear")) – An optional label to give to the layer.

  • dtype (The data types allowable by the Tensor class.) – The data type of the weights and gradients. Defaults to np.float64.
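
A usage sketch based on the documented signature and the 3D (bs, h, w) convention:

    import numpy as np
    from pygrad.basics import Linear
    from pygrad.tensor import Tensor

    layer = Linear(i_dim=16, o_dim=4)

    x = Tensor(np.random.randn(8, 1, 16), label="x", learnable=False)
    out = layer(x)
    print(out.value.shape)   # expected (8, 1, 4)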

__repr__() str

Return repr(self).

__weakref__

list of weak references to the object

class pygrad.basics.Softmax(label: None | int | str = 'Softmax')

Performs Softmax on an input.

__call__(x: Tensor) Tensor

Call self as a function.

__init__(label: None | int | str = 'Softmax')
__repr__() str

Return repr(self).

__weakref__

list of weak references to the object

Activation functions:

Module storing class-defined activation functions.

class pygrad.activations.ReLU(label: None | int | str = 'ReLU')

Performs ReLU activation, defined as a class.

__call__(x: Tensor) Tensor

Call self as a function.

__init__(label: None | int | str = 'ReLU')
__repr__() str

Return repr(self).

__weakref__

list of weak references to the object

Losses:

Module storing class-defined loss functions.

class pygrad.losses.BCELoss(label: str = 'BCELoss')

Binary Cross Entropy Loss.

__call__(pred: Tensor, target: Tensor) Tensor

Computes the BCE on pred and target, summing over the batch dimension.

Parameters:
  • pred – A Tensor of shape (batch_size, 1, 1) with values in [0, 1].

  • target – A Tensor of shape (batch_size, 1, 1) with values in {0, 1}.
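
A minimal usage sketch with the documented shapes:

    import numpy as np
    from pygrad.losses import BCELoss
    from pygrad.tensor import Tensor

    bce = BCELoss()

    pred = Tensor(np.array([[[0.9]], [[0.2]]]), label="pred")                       # (2, 1, 1), values in [0, 1]
    target = Tensor(np.array([[[1.0]], [[0.0]]]), label="target", learnable=False)  # (2, 1, 1), values in {0, 1}

    loss = bce(pred, target)   # summed over the batch dimension
    loss.backward()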

__init__(label: str = 'BCELoss')
__repr__() str

Return repr(self).

__weakref__

list of weak references to the object

class pygrad.losses.CCELoss(label='CCELoss')

Categorical Cross Entropy Loss.

__call__(pred: Tensor, target: Tensor, mask: bool = False) Tensor

Performs CCE on pred and target, with an optional mask.

Parameters:
  • pred – A Tensor of shape (batch_size, 1, w) with values in [0,1]

  • target – A Tensor of shape (batch_size, 1, w) with values in {0,1}

  • mask – A boolean. If True, the CCE is only computed across values where the target has an output in dimension -1.
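
A minimal usage sketch with the documented shapes (one-hot targets assumed):

    import numpy as np
    from pygrad.losses import CCELoss
    from pygrad.tensor import Tensor

    cce = CCELoss()

    pred = Tensor(np.array([[[0.7, 0.2, 0.1]]]), label="pred")                       # (1, 1, 3), values in [0, 1]
    target = Tensor(np.array([[[1.0, 0.0, 0.0]]]), label="target", learnable=False)  # (1, 1, 3), values in {0, 1}

    loss = cce(pred, target)
    loss.backward()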

__init__(label='CCELoss')
__repr__()

Return repr(self).

__weakref__

list of weak references to the object

Optimizers:

Module storing (gradient descent) optimization methods.

class pygrad.optims.Adam(model_parameters: list, beta1: float = 0.9, beta2: float = 0.999, eps=1e-08, lr: float = 1e-05)

Adam Optimizer.

__init__(model_parameters: list, beta1: float = 0.9, beta2: float = 0.999, eps=1e-08, lr: float = 1e-05)

Initializes Adam.

Parameters:
  • model_parameters (list) – A list of Tensors specifying the pre-order traversal of a Tensor’s computational graph, obtained either from Tensor.create_graph()[1] or, for models subclassing Module, as model.params.

  • beta1 (float. Defaults to 0.9.) – the beta1 parameter to use

  • beta2 (float. Defaults to 0.999.) – the beta2 parameter to use

  • eps (float. Defaults to 1e-8.) – the epsilon to use

  • lr (float. Defaults to 1e-5.) – the learning rate.

__weakref__

list of weak references to the object

step(loss: Tensor)

Performs a single step of Adam on model_parameters according to the loss function’s gradients.

Gradients are averaged across the batch, and Tensor values are modified accordingly.

Parameters:

loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters
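
A training-step sketch assuming the documented pieces (Linear, BCELoss, Tensor.create_graph()[1] as the parameter source); illustrative only:

    import numpy as np
    from pygrad.basics import Linear
    from pygrad.losses import BCELoss
    from pygrad.optims import Adam
    from pygrad.tensor import Tensor

    layer = Linear(i_dim=4, o_dim=1)
    loss_fn = BCELoss()

    x = Tensor(np.random.randn(8, 1, 4), label="x", learnable=False)
    target = Tensor(np.random.randint(0, 2, (8, 1, 1)).astype(float), label="t", learnable=False)

    loss = loss_fn(layer(x).sigmoid(), target)

    # model_parameters taken from the weight graph of the loss Tensor.
    optim = Adam(loss.create_graph()[1], lr=1e-4)
    optim.step(loss)      # averages gradients over the batch and updates values
    optim.zero_grad()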

step_single(loss, batch_size, modify: bool = False)

Performs gradient descent on a loss, with control over value modification.

This function supports gradient descent on arbitrary batch sizes by allowing any number of gradient accumulations before values are updated.

The single step of gradient descent is split into two components.
  1. Model parameter gradients are adjusted according to the average of the loss gradients. This occurs when modify=False.

  2. Model parameter values are updated. This occurs when modify=True.

Parameters:
  • loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters

  • batch_size (int) – The final batch_size to average gradients over.

  • modify (bool, defaults to False.) – Whether or not to modify the model values.
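
A gradient-accumulation sketch reusing the layer, loss_fn, optim, x, and target names from the Adam sketch above; taking batch_size as the number of accumulated steps is an assumption based on the description of this method:

    # Hypothetical list of (input, target) Tensor pairs prepared elsewhere.
    batches = [(x, target)] * 4

    for i, (xb, tb) in enumerate(batches):
        loss = loss_fn(layer(xb).sigmoid(), tb)
        final = (i == len(batches) - 1)
        # Accumulate averaged gradients while modify=False; only the final
        # call (modify=True) writes the update into the parameter values.
        optim.step_single(loss, batch_size=len(batches), modify=final)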

zero_adam()

Resets the momentums and variances stored by Adam for each model parameter.

zero_grad()

Resets the model parameter gradients.

class pygrad.optims.RMSProp(model_parameters: list, beta: float = 0.9, lr: float = 1e-05)

RMS Prop.

__init__(model_parameters: list, beta: float = 0.9, lr: float = 1e-05)

Initializes the RMSProp optimizer.

Parameters:
  • model_parameters (list) – A list of Tensors specifying the pre-order traversal of a Tensor’s computational graph, obtained either from Tensor.create_graph()[1] or, for models subclassing Module, as model.params.

  • beta (float. Defaults to 0.9.) – the beta parameter to use in RMSProp.

  • lr (float. Defaults to 1e-5.) – the learning rate.

__weakref__

list of weak references to the object

step(loss: Tensor)

Performs a single step of RMSProp on model_parameters according to the loss function’s gradients.

Gradients are averaged across the batch, and Tensor values are modified accordingly.

Parameters:

loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters

step_single(loss, batch_size, modify: bool = False)

This function supports gradient descent on arbitrary batch sizes by allowing any number of gradient accumulations before values are updated.

The single step of gradient descent is split into two components.
  1. Model parameter gradients are adjusted according to the average of the loss gradients. This occurs when modify=False.

  2. Model parameter values are updated. This occurs when modify=True.

Parameters:
  • loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters

  • batch_size (int) – The final batch_size to average gradients over.

  • modify (bool, defaults to False.) – Whether or not to modify the model values.

class pygrad.optims.SGD(model_parameters: list, lr: float = 1e-05)

Vanilla Gradient Descent.

__init__(model_parameters: list, lr: float = 1e-05)

Initializes the SGD optimizer.

Parameters:
  • model_parameters (list) – A list of Tensors specifying the pre-order traversal of a Tensor’s computational graph, obtained either from Tensor.create_graph()[1] or, for models subclassing Module, as model.params.

  • lr (float. Defaults to 1e-5.) – the learning rate for SGD

__weakref__

list of weak references to the object

step(loss: Tensor) None

Performs a single step of gradient descent on model_parameters according to the loss function’s gradients.

Gradients are averaged across the batch, and Tensor values are modified accordingly.

Parameters:

loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters

step_single(loss: Tensor, batch_size, modify: bool = False) None

This function supports gradient descent on arbitrary batch sizes by allowing any number of gradient accumulations before values are updated.

The single step of gradient descent is split into two components.
  1. Model parameter gradients are adjusted according to the average of the loss gradients. This occurs when modify=False.

  2. Model parameter values are updated. This occurs when modify=True.

Parameters:
  • loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters

  • batch_size (int) – The final batch_size to average gradients over.

  • modify (bool, defaults to False.) – Whether or not to modify the model values.

zero_grad()

Sets the gradient of each Tensor in model_parameters to 0.

class pygrad.optims.SGD_Momentum(model_parameters: list, beta: float = 0.9, lr: float = 1e-05)

Gradient Descent with Momentum.

__init__(model_parameters: list, beta: float = 0.9, lr: float = 1e-05)

Initializes the SGD with momentum optimizer.

Parameters:
  • model_parameters (list) – A list of Tensors specifying the pre-order traversal of a Tensor’s computational graph, obtained either from Tensor.create_graph()[1] or, for models subclassing Module, as model.params.

  • beta (float. Defaults to 0.9.) – the beta momentum parameter to use

  • lr (float. Defaults to 1e-5.) – the learning rate for SGD

__weakref__

list of weak references to the object

step(loss: Tensor)

Performs a single step of gradient descent with momentum on model_parameters according to the loss function’s gradients.

Gradients are averaged across the batch, and Tensor values are modified accordingly.

Parameters:

loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters

step_single(loss, batch_size, modify: bool = False)

This function supports gradient descent on arbitrary batch sizes by allowing any number of gradient accumulations before values are updated.

The single step of gradient descent is split into two components.
  1. Model parameter gradients are adjusted according to the average of the loss gradients. This occurs when modify=False.

  2. Model parameter values are updated. This occurs when modify=True.

Parameters:
  • loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters

  • batch_size (int) – The final batch_size to average gradients over.

  • modify (bool, defaults to False.) – Whether or not to modify the model values.

Module:

Module storing the Module base class.

class pygrad.module.Module(**kwargs)

Module Class.

Allows batched forward and backwards passes to be performed on a model without modifying the model directly. Subclassed models must perform any required **kwargs type checking.

__call__(**kwargs: Tensor) Tensor

Returns the forward pass output of the model on a batched input.

Further:
  • Creates a batch-friendly version of the original model to do backprop with.

  • Creates topological and weight graphs of the batched model, storing them in self.model_copy.

abstractmethod __init__(**kwargs)
__weakref__

list of weak references to the object

abstractmethod forward(**kwargs)

Ensure this method is defined in the subclass.

model_reset()

Deletes the batched model.
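
A hypothetical subclass sketch showing where __init__ and forward fit; the layer names and the keyword used in the call are illustrative, not part of the documented API:

    from pygrad.basics import Linear
    from pygrad.module import Module

    class TinyMLP(Module):
        """A two-layer model subclassing Module (illustrative)."""

        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            self.l1 = Linear(i_dim=16, o_dim=32)
            self.l2 = Linear(i_dim=32, o_dim=1)

        def forward(self, **kwargs):
            x = kwargs["x"]   # the subclass performs its own kwargs checking
            return self.l2(self.l1(x).relu()).sigmoid()

    model = TinyMLP()
    # out = model(x=batched_input)   # __call__ runs the batched forward pass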