Class methods

Module storing the main Tensor class and related methods.

class pygrad.tensor.Tensor(value: list | ~numpy.ndarray, label: str = '', dtype=<class 'numpy.float64'>, learnable: bool = True, leaf: bool = False, _prev: tuple = ())

The main Tensor object.

property T: Tensor

Transposes self.value.

If self is 0 or 1 dimensional, self is returned without modification. Otherwise, the last two dimensions of self are flipped.

__add__(other: int | float | integer | floating | ndarray | Tensor) Tensor

Performs self + other, returning a new Tensor object.

Parameters:

other ((int, float, np.integer, np.floating, np.ndarray, list, Tensor)) – The object to add, with shape broadcastable to self.shape.

__getitem__(idcs)

Fetches self.value[idcs]. Identical to NumPy syntax for fetching indices.

__init__(value: list | ~numpy.ndarray, label: str = '', dtype=<class 'numpy.float64'>, learnable: bool = True, leaf: bool = False, _prev: tuple = ()) None

Initializes a Tensor.

A Tensor at all times holds:
  • A value holding its data.

  • A gradient of the same shape as its value indicating its gradient.

  • A function indicating how to pass a gradient to its children Tensors.

  • A computational graph for all Tensors that eventually resulted in the Tensor.

Tensors store operations performed, allowing them to calculate the gradients of all Tensors in their computational graph via .backward().

Parameters:
  • value (A numeric value, list, or np.ndarray. Complex and boolean values are not supported. The value is automatically recast to dtype.) – The input Tensor value.

  • label – A string giving the Tensor an identifiable name. Defaults to “”.

  • dtype – The dtype to cast the input value to. Must be one of np.bool, np.integer, np.floating.

  • learnable – Optional. A boolean indicating whether to propagate gradients to this Tensor’s _prev Tensors. Setting this to False stops the computational graph at this node; the node itself still has gradients computed.

  • leaf – Optional. A boolean indicating if the Tensor is to be considered a leaf node in the computational graph. Leaf nodes will have gradients tracked, but won’t appear as a weight in self.weights.

  • _prev – Optional. An empty tuple or a tuple of Tensor objects, referencing the objects to pass gradients to during a backwards pass. _prev is filled automatically when performing a Tensor method; manual specification is not necessary.

Returns:

A produced Tensor.
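
For illustration, a minimal construction sketch assuming only the constructor arguments and the .value attribute documented above:

    import numpy as np
    from pygrad.tensor import Tensor

    # A learnable 2x3 Tensor, cast to the default dtype (np.float64).
    W = Tensor([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]], label="W")

    # A non-learnable Tensor: it still receives gradients, but the
    # computational graph stops at this node.
    x = Tensor(np.ones((3, 1)), label="x", learnable=False)

    y = W @ x          # a new Tensor whose _prev references W and x
    print(y.value)     # the underlying NumPy array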

__matmul__(other: ndarray | Tensor) Tensor

Performs matrix multiplication with self and other: self@other.

Matrix multiplication is performed between the last two dimensions of self and other, broadcasting all those remaining.

Parameters:

other – The matrix to perform matrix multiplication against.

__mul__(other: int | float | integer | floating | ndarray | Tensor) Tensor

Performs multiplication between the values of self and other. If self and other are matrices, this is equivalent to the Hadamard product.

Parameters:

other (One of int, float, np.integer, np.floating, np.ndarray, or Tensor.) – The value to multiply against. Must be broadcastable in shape to self.

__neg__() Tensor

Performs -1*self, returning a new Tensor object.

__pow__(n: int | float | integer | floating) Tensor

Raises the Tensor to a power of n.

Parameters:

n – The power to which to raise the current Tensor.

__repr__() str

Return repr(self).

backward(reset_grad=True) None

Computes the gradients of all Tensors in self’s computation graph, storing results in self.topo and self.weights.

self is seeded with a gradient of 1, which is then propagated to increment the gradients of all child Tensors.

This method first creates two topological graphs of self.
  1. A backwards-pass graph including all Tensors contributing to self.

  2. The backwards-pass graph, now omitting all Tensors with leaf=True.

    This is useful for seeing the exact parameters contributing to the current Tensor, ignoring any Tensors produced as intermediary values along the way.

Parameters:

reset_grad – Whether or not to reset the current backwards pass gradients.
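
As a small sketch of the behaviour described above (the .grad attribute name is an assumption based on the gradient description; .value is documented):

    from pygrad.tensor import Tensor

    x = Tensor([[2.0]], label="x")
    y = x ** 2 + 1        # builds the computational graph for y

    y.backward()          # seeds y with gradient 1 and propagates backwards
    print(x.grad)         # expected to hold dy/dx = 2*x = 4 (attribute name assumed)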

conv2D(other: Tensor) Tensor

Applies a 2D convolution on other, using self as the kernel. Strides are set to 1 by default, with no padding. Outputs a new Tensor.

If self.shape = (1, out_channels, in_channels, kH, kW) and other.shape = (bs, in_channels, H, W), then output.shape = (bs, out_channels, H-kH+1, W-kW+1).

Parameters:

other (Tensor.) – A 4D Tensor.
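
A shape-only sketch of this convention with illustrative sizes:

    import numpy as np
    from pygrad.tensor import Tensor

    # Kernel shape (1, out_channels, in_channels, kH, kW) = (1, 8, 3, 3, 3).
    kernel = Tensor(np.random.randn(1, 8, 3, 3, 3), label="kernel")
    # Input shape (bs, in_channels, H, W) = (4, 3, 28, 28).
    image = Tensor(np.random.randn(4, 3, 28, 28), label="image", learnable=False)

    out = kernel.conv2D(image)
    print(out.value.shape)    # expected (4, 8, 26, 26) = (bs, out_channels, H-kH+1, W-kW+1)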

create_graph() tuple[list, list]

Creates two reverse-ordered topological graphs: topo and weights.

topo is the full backwards-pass computational graph, which includes all intermediary Tensors. weights is a subgraph containing only the Tensors that hold learnable weights.

For example, performing y = x**2 + 1 will create the following graphs:

  • topo, containing: x**2 + 1, x**2, 1, and x.

  • weights, containing: x

Although all nodes in topo were responsible for producing a gradient for x, only the x node contains weights that need to be updated by this gradient.

Both graphs are lists that perform a pre-order traversal starting at self as the root node.

Returns:

topo[list], weights[list]
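
A sketch mirroring the y = x**2 + 1 example above (the .label attribute name is an assumption based on the label parameter):

    from pygrad.tensor import Tensor

    x = Tensor([3.0], label="x")
    y = x ** 2 + 1

    topo, weights = y.create_graph()
    print(len(topo))                   # includes x**2 + 1, x**2, 1, and x
    print([t.label for t in weights])  # expected to list only "x"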

log() Tensor

Applies the natural logarithm to self.

Negative values will raise an error.

mask_idcs(mask_idcs: tuple, value: float = 0.0) Tensor

Applies a mask to the Tensor via an array indicating indices of self.value. Outputs a new Tensor.

Parameters:
  • mask_idcs (tuple) – A tuple of indices at which to mask values of self.

  • value (float) – The mask value. Defaults to 0.0, meaning the chosen indices are set to 0.

mean(axis: int | tuple | None = -1, keepdims: bool = True) Tensor

Returns a new Tensor with value being the average of self’s value along a given axis.

Parameters:
  • axis (None, int, tuple of ints) – The axis over which to compute the mean.

  • keepdims (bool) – Whether or not to keep the existing dimensions.

new_value(x) None

Assigns a new value to the Tensor, Tensor.value = x, and resets gradients to 0 without changing computational graph topology.

relu() Tensor

Applies a point-wise ReLU to the Tensor values. Outputs a new Tensor.

reset_grad() None

Resets the gradient of the Tensor to 0, maintaining all other attributes.

reshape(shape: tuple) Tensor

Returns a new Tensor with Tensor.value.shape == shape.

Parameters:

shape – A tuple indicating the new shape self.value has to take.

sigmoid() Tensor

Applies sigmoid activation to self, returning a new Tensor.

self.value has to be of shape (…, 1).

softmax() Tensor

Applies softmax to self. Softmax is performed on the last axis.

self must be either 3 or 4 dimensional:
  • (B, H, W)

  • (B, O, H, W)

Returns a copy of the Tensor, with the softmax’d value.

softmax_log() Tensor

Computes .softmax().log() in a single operation. Use this if chaining .softmax().log() causes numerical issues.

self must be either 3 or 4 dimensional:
  • (B, H, W)

  • (B, O, H, W)

Returns a copy of the Tensor, with the log-softmax value.

std(axis: int, keepdim=True) Tensor

Returns a new Tensor with value being the standard deviation of self along the specified axis. No bias correction is performed.

Parameters:
  • axis (int) – The axis over which to compute the standard deviation.

  • keepdim (bool, defaults to True) – Whether or not to keep the reduced axis dimension in the result.

sum(axis: None | int | tuple = None, keepdims: bool = False) Tensor

Performs a summation on self.value according to the chosen axis. Returns a new Tensor object.

Parameters:
  • axis (None (default), int, or tuple of ints.) – Determines which axis to sum self.value over. Defaults to None: summing over all axes.

  • keepdims – Indicates whether to keep the current shape of self after summation.

tanh() Tensor

Applies a point-wise tanh activation to the Tensor. Outputs a new Tensor.

transpose(axes: None | tuple | list) Tensor

Returns a new Tensor with the same data but transposed axes.

Parameters:

axes – If specified, it must be a tuple or list which contains a permutation of [0,1,…,N-1] where N is the number of axes of self. The ith axis of the returned array will correspond to the axis numbered axes[i] of the input. If not specified, defaults to the reverse of the order of the axes.

pygrad.tensor.array(*args, **kwargs) Tensor

Helper function designed for initializing a Tensor object in the same way as a NumPy array. Ensure inputs match those of Tensor.

Returns:

A Tensor object with fields (*args, **kwargs)

Basic layers:

Module storing class-defined layers.

class pygrad.basics.AddNorm(gain: float = 1.0, bias: float = 0.0, epsilon: float = 1e-09)

Performs AddNorm on an input x and skip connection value skip. The forward pass performs the following, outputting a Tensor:

y = x + skip
mu = mean(y)
sd = std(y)
output = gain * (y - mu) / sd + bias

gain defaults to 1.0. bias defaults to 0.0.
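
For clarity, the same computation written as a plain NumPy reference (not the layer's implementation; the normalization axis and the placement of epsilon are assumptions):

    import numpy as np

    def addnorm_reference(x, skip, gain=1.0, bias=0.0, epsilon=1e-9):
        """Reference AddNorm computation on plain arrays."""
        y = x + skip
        mu = y.mean(axis=-1, keepdims=True)   # mean of y (axis assumed)
        sd = y.std(axis=-1, keepdims=True)    # sd of y (axis assumed)
        return gain * (y - mu) / (sd + epsilon) + bias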

__call__(x: Tensor, skip: Tensor) Tensor

Call self as a function.

__init__(gain: float = 1.0, bias: float = 0.0, epsilon: float = 1e-09)
__weakref__

list of weak references to the object

class pygrad.basics.Conv2D(o_dim: int, i_dim: int, kH: int, kW: int, bias: bool = True, label: None | int | str = 'Conv2D', dtype=<class 'numpy.float64'>)

Performs Conv2D from an input dimension i_dim to an output dimension o_dim using a kernel (kH, kW).

Kernels are initialized using Kaiming Uniform initialization. Only single strides are performed. No output padding is performed.

__call__(x: Tensor) Tensor

Call self as a function.

__init__(o_dim: int, i_dim: int, kH: int, kW: int, bias: bool = True, label: None | int | str = 'Conv2D', dtype=<class 'numpy.float64'>) None

Initialization for the Conv2D class.

Conv2D is a set of kernels that, when called on an input x, computes Cx + B.

C is the convolution, B is the bias.

C is of shape (1, o_dim, i_dim, kH, kW), the leading dimension being the batch dimension.

If bias is True:

B is of shape (1, o_dim, 1, 1)

Weights are initialized via Kaiming Uniform Initialization.

Parameters:
  • o_dim (int) – The output channel dimension of the convolution. This indicates the number of kernels to apply to the input.

  • i_dim (int) – The number of channels of the input.

  • kH (int) – the height of the kernel

  • kW (int) – the width of the kernel

  • bias (bool) – Whether or not to include the bias term after performing the initial convolution. Defaults to True.

  • label (None, int, or str. Defaults to "Conv2D") – A label for the layer.

  • dtype (The data types allowable by the Tensor class.) – The data type of the weights and gradients. Defaults to np.float64.
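
A usage sketch based on the documented signature and shape conventions:

    import numpy as np
    from pygrad.basics import Conv2D
    from pygrad.tensor import Tensor

    # 3 input channels mapped to 8 output channels with a 3x3 kernel.
    conv = Conv2D(o_dim=8, i_dim=3, kH=3, kW=3)

    x = Tensor(np.random.randn(4, 3, 28, 28), label="x", learnable=False)
    out = conv(x)                 # single stride, no padding
    print(out.value.shape)        # expected (4, 8, 26, 26)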

__repr__() str

Return repr(self).

__weakref__

list of weak references to the object

class pygrad.basics.Dropout(rate: float = 0.1)

Dropout Class with specified rate parameter. Randomly masks input values with a probability of rate.

Rate defaults to 0.1.

__call__(x: Tensor, training: bool = True) Tensor

Call self as a function.

__init__(rate: float = 0.1)
__weakref__

list of weak references to the object

class pygrad.basics.Flatten(label: None | int | str = 'Flatten')

Flattens an input by reshaping it into a 1D Tensor.

__call__(x: Tensor) Tensor

Call self as a function.

__init__(label: None | int | str = 'Flatten') None
__repr__() str

Return repr(self).

__weakref__

list of weak references to the object

class pygrad.basics.Linear(i_dim: int, o_dim: int, bias: bool = True, label: None | int | str = 'Linear', dtype=<class 'numpy.float64'>)

Linear 2D layer. Performs Wx + B on an input x.

Inputs and outputs are 3D: (bs, h, w). Weights are initialized using Kaiming Uniform initialization.

__call__(x: Tensor) Tensor

Call self as a function.

__init__(i_dim: int, o_dim: int, bias: bool = True, label: None | int | str = 'Linear', dtype=<class 'numpy.float64'>) None

Initializes a Dense Linear Layer with Kaiming Uniform initialization.

A Dense linear layer is Wx + B.

W is initialized as a Tensor of shape (1, i_dim, o_dim), with the leading dimension indicating the batch dimension. If bias is True, B is initialized as a Tensor of shape (1, 1, o_dim), again with the leading dimension indicating the batch dimension.

Parameters:
  • i_dim (int) – The input data dimension to the layer.

  • o_dim (int) – The output data dimension of the layer.

  • bias (bool.) – Whether or not to include the bias term. Defaults to True.

  • label (None, int, str (defaults to "Linear")) – An optional label to give to the layer.

  • dtype (The data types allowable by the Tensor class.) – The data type of the weights and gradients. Defaults to np.float64.
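
A usage sketch based on the documented signature and the 3D (bs, h, w) convention:

    import numpy as np
    from pygrad.basics import Linear
    from pygrad.tensor import Tensor

    layer = Linear(i_dim=16, o_dim=4)

    x = Tensor(np.random.randn(8, 1, 16), label="x", learnable=False)
    out = layer(x)
    print(out.value.shape)   # expected (8, 1, 4)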

__repr__() str

Return repr(self).

__weakref__

list of weak references to the object

class pygrad.basics.Softmax(label: None | int | str = 'Softmax')

Performs Softmax on an input.

__call__(x: Tensor) Tensor

Call self as a function.

__init__(label: None | int | str = 'Softmax')
__repr__() str

Return repr(self).

__weakref__

list of weak references to the object

Activation functions:

Module storing class-defined activation functions.

class pygrad.activations.ReLU(label: None | int | str = 'ReLU')

Performs ReLU activation, defined as a class.

__call__(x: Tensor) Tensor

Call self as a function.

__init__(label: None | int | str = 'ReLU')
__repr__() str

Return repr(self).

__weakref__

list of weak references to the object

Losses:

Module storing class-defined loss functions.

class pygrad.losses.BCELoss(label: str = 'BCELoss')

Binary Cross Entropy Loss.

__call__(pred: Tensor, target: Tensor) Tensor

Computes the BCE on pred and target, summing over the batch dimension.

Parameters:
  • pred – A Tensor of shape (batch_size, 1, 1) with values in [0, 1].

  • target – A Tensor of shape (batch_size, 1, 1) with values in {0, 1}.
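
A minimal usage sketch with the documented shapes:

    import numpy as np
    from pygrad.losses import BCELoss
    from pygrad.tensor import Tensor

    bce = BCELoss()

    pred = Tensor(np.array([[[0.9]], [[0.2]]]), label="pred")                       # (2, 1, 1), values in [0, 1]
    target = Tensor(np.array([[[1.0]], [[0.0]]]), label="target", learnable=False)  # (2, 1, 1), values in {0, 1}

    loss = bce(pred, target)   # summed over the batch dimension
    loss.backward()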

__init__(label: str = 'BCELoss')
__repr__() str

Return repr(self).

__weakref__

list of weak references to the object

class pygrad.losses.CCELoss(label='CCELoss')

Categorical Cross Entropy Loss.

__call__(pred: Tensor, target: Tensor, mask: bool = False) Tensor

Performs CCE on pred and target, with an optional mask.

Parameters:
  • pred – A Tensor of shape (batch_size, 1, w) with values in [0,1]

  • target – A Tensor of shape (batch_size, 1, w) with values in {0,1}

  • mask – A boolean. If True, the CCE is only computed across values where the target has an output in dimension -1.
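
A minimal usage sketch with the documented shapes (one-hot targets assumed):

    import numpy as np
    from pygrad.losses import CCELoss
    from pygrad.tensor import Tensor

    cce = CCELoss()

    pred = Tensor(np.array([[[0.7, 0.2, 0.1]]]), label="pred")                       # (1, 1, 3), values in [0, 1]
    target = Tensor(np.array([[[1.0, 0.0, 0.0]]]), label="target", learnable=False)  # (1, 1, 3), values in {0, 1}

    loss = cce(pred, target)
    loss.backward()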

__init__(label='CCELoss')
__repr__()

Return repr(self).

__weakref__

list of weak references to the object

Optimizers:

Module storing (gradient descent) optimization methods.

class pygrad.optims.Adam(model_parameters: list, beta1: float = 0.9, beta2: float = 0.999, eps=1e-08, lr: float = 1e-05)

Adam Optimizer.

__init__(model_parameters: list, beta1: float = 0.9, beta2: float = 0.999, eps=1e-08, lr: float = 1e-05)

Initializes Adam.

Parameters:
  • model_parameters (list) – A list of Tensors specifying the pre-order traversal of a Tensor’s computational graph, obtained either from Tensor.create_graph()[1] or, for models subclassing Module, as model.params.

  • beta1 (float. Defaults to 0.9.) – the beta1 parameter to use

  • beta2 (float. Defaults to 0.999.) – the beta2 parameter to use

  • eps (float. Defaults to 1e-8.) – the epsilon to use

  • lr (float. Defaults to 1e-5.) – the learning rate.

__weakref__

list of weak references to the object

step(loss: Tensor)

Performs a single step of Adam on model_parameters according to the loss function’s gradients.

Gradients are averaged across the batch, and Tensor values are modified accordingly.

Parameters:

loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters
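
A training-step sketch assuming the documented pieces (Linear, BCELoss, Tensor.create_graph()[1] as the parameter source); illustrative only:

    import numpy as np
    from pygrad.basics import Linear
    from pygrad.losses import BCELoss
    from pygrad.optims import Adam
    from pygrad.tensor import Tensor

    layer = Linear(i_dim=4, o_dim=1)
    loss_fn = BCELoss()

    x = Tensor(np.random.randn(8, 1, 4), label="x", learnable=False)
    target = Tensor(np.random.randint(0, 2, (8, 1, 1)).astype(float), label="t", learnable=False)

    loss = loss_fn(layer(x).sigmoid(), target)

    # model_parameters taken from the weight graph of the loss Tensor.
    optim = Adam(loss.create_graph()[1], lr=1e-4)
    optim.step(loss)      # averages gradients over the batch and updates values
    optim.zero_grad()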

step_single(loss, batch_size, modify: bool = False)

Performs gradient descent on a loss, with control over value modification.

This function supports gradient descent on arbitrary batch sizes by allowing any number of gradient accumulations before values are updated.

The single step of gradient descent is split into two components.
  1. Model parameter gradients are adjusted according to the average of the loss gradients. This occurs when modify=False.

  2. Model parameter values are updated. This occurs when modify=True.

Parameters:
  • loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters

  • batch_size (int) – The final batch_size to average gradients over.

  • modify (bool, defaults to False.) – Whether or not to modify the model values.
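
A gradient-accumulation sketch reusing the layer, loss_fn, optim, x, and target names from the Adam sketch above; taking batch_size as the number of accumulated steps is an assumption based on the description of this method:

    # Hypothetical list of (input, target) Tensor pairs prepared elsewhere.
    batches = [(x, target)] * 4

    for i, (xb, tb) in enumerate(batches):
        loss = loss_fn(layer(xb).sigmoid(), tb)
        final = (i == len(batches) - 1)
        # Accumulate averaged gradients while modify=False; only the final
        # call (modify=True) writes the update into the parameter values.
        optim.step_single(loss, batch_size=len(batches), modify=final)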

zero_adam()

Resets the momentums and variances stored by Adam for each model parameter.

zero_grad()

Resets the model parameter gradients.

class pygrad.optims.RMSProp(model_parameters: list, beta: float = 0.9, lr: float = 1e-05)

RMS Prop.

__init__(model_parameters: list, beta: float = 0.9, lr: float = 1e-05)

Initializes the RMSProp optimizer.

Parameters:
  • model_parameters (list) – A list of Tensors specifying the pre-order traversal of a Tensor’s computational graph, obtained either from Tensor.create_graph()[1] or, for models subclassing Module, as model.params.

  • beta (float. Defaults to 0.9.) – the beta parameter to use in RMSProp.

  • lr (float. Defaults to 1e-5.) – the learning rate.

__weakref__

list of weak references to the object

step(loss: Tensor)

Performs a single step of RMSProp on model_parameters according to the loss function’s gradients.

Gradients are averaged across the batch, and Tensor values are modified accordingly.

Parameters:

loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters

step_single(loss, batch_size, modify: bool = False)

This function supports gradient descent on arbitrary batch sizes by allowing any number of gradient accumulations before values are updated.

The single step of gradient descent is split into two components.
  1. Model parameter gradients are adjusted according to the average of the loss gradients. This occurs when modify=False.

  2. Model parameter values are updated. This occurs when modify=True.

Parameters:
  • loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters

  • batch_size (int) – The final batch_size to average gradients over.

  • modify (bool, defaults to False.) – Whether or not to modify the model values.

class pygrad.optims.SGD(model_parameters: list, lr: float = 1e-05)

Vanilla Gradient Descent.

__init__(model_parameters: list, lr: float = 1e-05)

Initializes the SGD optimizer.

Parameters:
  • model_parameters (list) – A list of Tensors specifying the pre-order traversal of a Tensor’s computational graph, obtained either from Tensor.create_graph()[1] or, for models subclassing Module, as model.params.

  • lr (float. Defaults to 1e-5.) – the learning rate for SGD

__weakref__

list of weak references to the object

step(loss: Tensor) None

Performs a single step of gradient descent on model_parameters according to the loss function’s gradients.

Gradients are averaged across the batch, and Tensor values are modified accordingly.

Parameters:

loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters

step_single(loss: Tensor, batch_size, modify: bool = False) None

This function supports gradient descent on arbitrary batch sizes by allowing any number of gradient accumulations before values are updated.

The single step of gradient descent is split into two components.
  1. Model parameter gradients are adjusted according to the average of the loss gradients. This occurs when modify=False.

  2. Model parameter values are updated. This occurs when modify=True.

Parameters:
  • loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters

  • batch_size (int) – The final batch_size to average gradients over.

  • modify (bool, defaults to False.) – Whether or not to modify the model values.

zero_grad()

Sets the gradient of each Tensor in model_parameters to 0.

class pygrad.optims.SGD_Momentum(model_parameters: list, beta: float = 0.9, lr: float = 1e-05)

Gradient Descent with Momentum.

__init__(model_parameters: list, beta: float = 0.9, lr: float = 1e-05)

Initializes the SGD with momentum optimizer.

Parameters:
  • model_parameters (list) – A list of Tensors specifying the pre-order traversal of a Tensor’s computational graph, obtained either from Tensor.create_graph()[1] or, for models subclassing Module, as model.params.

  • beta (float. Defaults to 0.9.) – the beta momentum parameter to use

  • lr (float. Defaults to 1e-5.) – the learning rate for SGD

__weakref__

list of weak references to the object

step(loss: Tensor)

Performs a single step of gradient descent with momentum on model_parameters according to the loss function’s gradients.

Gradients are averaged across the batch, and Tensor values are modified accordingly.

Parameters:

loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters

step_single(loss, batch_size, modify: bool = False)

This function supports gradient descent on arbitrary batch sizes by allowing any number of gradient accumulations before values are updated.

The single step of gradient descent is split into two components.
  1. Model parameter gradients are adjusted according to the average of the loss gradients. This occurs when modify=False.

  2. Model parameter values are updated. This occurs when modify=True.

Parameters:
  • loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters

  • batch_size (int) – The final batch_size to average gradients over.

  • modify (bool, defaults to False.) – Whether or not to modify the model values.

Module:

Module storing the Module base class.

class pygrad.module.Module(**kwargs)

Module Class.

Allows batched forward and backwards passes to be performed on a model without modifying the model directly. Subclassed models must perform any required **kwargs type checking.

__call__(**kwargs: Tensor) Tensor

Returns the forward pass output of the model on a batched input.

Further:
  • Creates a batch-friendly version of the original model to do backprop with.

  • Creates topological and weight graphs of the batched model, storing them in self.model_copy.

abstractmethod __init__(**kwargs)
__weakref__

list of weak references to the object

abstractmethod forward(**kwargs)

Ensure this method is defined in the subclass.

model_reset()

Deletes the batched model.
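
A hypothetical subclass sketch showing where __init__ and forward fit; the layer names and the keyword used in the call are illustrative, not part of the documented API:

    from pygrad.basics import Linear
    from pygrad.module import Module

    class TinyMLP(Module):
        """A two-layer model subclassing Module (illustrative)."""

        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            self.l1 = Linear(i_dim=16, o_dim=32)
            self.l2 = Linear(i_dim=32, o_dim=1)

        def forward(self, **kwargs):
            x = kwargs["x"]   # the subclass performs its own kwargs checking
            return self.l2(self.l1(x).relu()).sigmoid()

    model = TinyMLP()
    # out = model(x=batched_input)   # __call__ runs the batched forward pass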