Class methods¶
Module storing the main Tensor class and related methods.
- class pygrad.tensor.Tensor(value: list | ~numpy.ndarray, label: str = '', dtype=<class 'numpy.float64'>, learnable: bool = True, leaf: bool = False, _prev: tuple = ())¶
The main Tensor object.
- property T: Tensor¶
Transposes self.value.
If self is 0- or 1-dimensional, self is returned without modification. Otherwise, the last two dimensions of self are flipped.
- __add__(other: int | float | integer | floating | ndarray | Tensor) Tensor¶
Performs self + other, returning a new Tensor object.
- Parameters:
other ((int, float, np.integer, np.floating, np.ndarray, list, Tensor)) – the object to add with shape broadcastable to self.shape.
- __getitem__(idcs)¶
Fetches self.value[idcs]. Identical to NumPy syntax for fetching indices.
- __init__(value: list | ~numpy.ndarray, label: str = '', dtype=<class 'numpy.float64'>, learnable: bool = True, leaf: bool = False, _prev: tuple = ()) None¶
Initializes a Tensor.
- A Tensor at all times holds:
A value holding its data.
A gradient of the same shape as its value.
A function indicating how to pass a gradient to its children Tensors.
A computational graph for all Tensors that eventually resulted in the Tensor.
Tensors store operations performed, allowing them to calculate the gradients of all Tensors in their computational graph via .backward().
- Parameters:
value (Is either a numeric value or a list/np.ndarray. Complex or boolean values are not supported. Value is automatically recast to dtype.) – The input Tensor value.
label – A string giving the Tensor an identifiable name. Defaults to “”.
dtype – The dtype to cast the input value. Must be one of np.bool, np.integer, np.floating.
learnable – Optional. A boolean indicating whether or not to compute gradients for _prev Tensors this Tensor has. Setting this to False means the computational graph will stop at this node. This node will still have gradients computed.
leaf – Optional. A boolean indicating if the Tensor is to be considered a leaf node in the computational graph. Leaf nodes will have gradients tracked, but won’t appear as a weight in self.weights.
_prev – Optional. An empty tuple or a tuple of Tensor objects, referencing the objects to pass gradients to during a backward pass. _prev is filled automatically when performing a Tensor method; manual specification is not necessary.
- Returns:
The initialized Tensor.
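A minimal construction sketch, assuming only the constructor documented above; the values and labels are illustrative:

    import numpy as np
    from pygrad.tensor import Tensor

    # A learnable 2x3 Tensor; the value is recast to the default dtype (np.float64).
    x = Tensor(np.arange(6).reshape(2, 3), label="x")

    # learnable=False stops the computational graph at this node (its own gradient
    # is still computed); leaf=True keeps it out of the weights list.
    c = Tensor([[1.0, 1.0, 1.0]], label="c", learnable=False, leaf=True)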
- __matmul__(other: ndarray | Tensor) Tensor¶
Performs matrix multiplication with self and other: self@other.
Matrix multiplication is performed between the last two dimensions of self and other, broadcasting the remaining dimensions.
- Parameters:
other – The matrix to perform matrix multiplication against.
- __mul__(other: int | float | integer | floating | ndarray | Tensor) Tensor¶
Performs multiplication between the values of self and other. If self and other are matrices, this is equivalent to the Hadamard product.
- Parameters:
other (One of int, float, np.integer, np.floating, np.ndarray, or Tensor.) – The value to multiply against. Must be broadcastable in shape to self.
- __pow__(n: int | float | integer | floating) Tensor¶
Raises the Tensor to a power of n.
- Parameters:
n – The power to which to raise the current Tensor.
- __repr__() str¶
Return repr(self).
- backward(reset_grad=True) None¶
Computes the gradients of all Tensors in self’s computation graph, storing results in self.topo and self.weights.
self is seeded with a gradient of 1, and the gradients of all its children are incremented by this multiplier.
- This method first creates two topological graphs of self:
A backwards-pass graph including all Tensors contributing to self.
The same backwards-pass graph with all leaf=True Tensors omitted.
This second graph is useful for seeing exactly which parameters contribute to the current Tensor, ignoring any Tensors produced only as intermediary values.
- Parameters:
reset_grad – Whether or not to reset the current backwards pass gradients.
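A short sketch of a backward pass, using only the operators and attributes documented here; the values are illustrative:

    import numpy as np
    from pygrad.tensor import Tensor

    x = Tensor(np.ones((2, 3)), label="x")
    y = (x * 2.0 + 1.0).sum()   # builds the computational graph ending at y

    y.backward()                # computes gradients for every Tensor in y's graph
    # y.topo holds the full backwards-pass graph; y.weights holds only the learnable parameters.
    print(len(y.topo), len(y.weights))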
- conv2D(other: Tensor) Tensor¶
Applies a 2D convolution on other using self as the kernel. Strides are fixed at 1, with no padding. Outputs a new Tensor.
- If self.shape = (1, out_channels, in_channels, kH, kW) and other.shape = (bs, in_channels, H, W), then output.shape = (bs, out_channels, H-kH+1, W-kW+1).
- Parameters:
other (Tensor.) – A 4D Tensor.
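A shape-oriented sketch of conv2D, using the shapes documented above; the concrete sizes are arbitrary:

    import numpy as np
    from pygrad.tensor import Tensor

    bs, in_ch, out_ch, H, W, kH, kW = 2, 3, 4, 8, 8, 3, 3

    kernel = Tensor(np.random.randn(1, out_ch, in_ch, kH, kW), label="kernel")
    image = Tensor(np.random.randn(bs, in_ch, H, W), label="image", leaf=True)

    out = kernel.conv2D(image)  # stride 1, no padding
    # out.value has shape (bs, out_ch, H - kH + 1, W - kW + 1) == (2, 4, 6, 6)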
- create_graph() tuple[list, list]¶
Creates two reverse-ordered topological graphs: topo and weights.
topo is the full backwards-pass computational graph, which includes all intermediary Tensors. weights is a subgraph containing only the Tensors with learnable weights.
For example, performing y = x**2 + 1 will create the following graphs:
topo, containing: x**2 + 1, x**2, 1, and x.
weights, containing: x.
Although all nodes in topo contribute to producing a gradient for x, only the x node holds weights that would need to be updated by this gradient.
Both graphs are lists that perform a pre-order traversal starting at self as the root node.
- Returns:
topo[list], weights[list]
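The example above, written out with the documented API:

    from pygrad.tensor import Tensor

    x = Tensor([3.0], label="x")
    y = x ** 2 + 1

    topo, weights = y.create_graph()
    # topo contains x**2 + 1, x**2, 1, and x (pre-order from y);
    # weights contains only x, the sole Tensor holding learnable values.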
- mask_idcs(mask_idcs: tuple, value: float = 0.0) Tensor¶
Applies a mask to the Tensor via an array indicating indices of self.value. Outputs a new Tensor.
- Parameters:
mask_idcs (tuple) – tuple of indices from which to mask values of self.
value (float) – The mask value. Defaults to 0.0, meaning the chosen indices are set to 0.
- mean(axis: int | tuple | None = -1, keepdims: bool = True) Tensor¶
Returns a new Tensor with value being the average of self’s value along a given axis.
- Parameters:
axis (None, int, tuple of ints) – The axis to perform a mean over.
keepdims (bool) – Whether or not to keep the existing dimensions; True keeps them. Defaults to True.
- new_value(x) None¶
Assigns a new value to the Tensor, Tensor.value = x, and resets gradients to 0 without changing computational graph topology.
- reset_grad() None¶
Resets the gradient of the Tensor to 0, maintaining all other attributes.
- reshape(shape: tuple) Tensor¶
Returns a new Tensor with Tensor.value.shape == shape.
- Parameters:
shape – A tuple indicating the new shape self.value has to take.
- sigmoid() Tensor¶
Applies sigmoid activation to self, returning a new Tensor.
self.value has to be of shape (…, 1).
- softmax() Tensor¶
Applies softmax to self. Softmax is performed on the last axis.
- self.shape has to be either 3 or 4 dimensional.
(B, H, W)
(B, O, H, W)
Returns a copy of the Tensor, with the softmax’d value.
- softmax_log() Tensor¶
Computes .softmax().log() in a single operation. Use this if applying the two separately causes numerical issues.
- self.shape has to be either 3 or 4 dimensional.
(B, H, W)
(B, O, H, W)
Returns a copy of the Tensor, with the log-softmax value.
- std(axis: int, keepdim=True) Tensor¶
Returns a new Tensor with value being the standard deviation of self along the specified axis. No bias correction performed.
- Parameters:
axis (int) – The axis over which to compute the standard deviation.
keepdim (bool, defaults to True) – Whether or not to keep the reduced axis dimension.
- sum(axis: None | int | tuple = None, keepdims: bool = False) Tensor¶
Performs a summation on self.value according to axis chosen. Returns a new Tensor object.
- Parameters:
axis (None (default), int, or tuple of ints.) – Determines which axis to sum self.value over. Defaults to None: summing over all axes.
keepdims – Indicates whether to keep the current shape of self after summation.
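A small sketch of the reduction methods above; shapes follow the keepdims/keepdim flags as documented:

    import numpy as np
    from pygrad.tensor import Tensor

    x = Tensor(np.random.randn(2, 3, 4), label="x")

    m = x.mean(axis=-1, keepdims=True)  # shape (2, 3, 1)
    s = x.std(axis=-1, keepdim=True)    # shape (2, 3, 1), no bias correction
    t = x.sum()                         # sums over all axes into a single value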
- transpose(axes: None | tuple | list) Tensor¶
Returns a new Tensor with the same data but transposed axes.
- Parameters:
axes – If specified, it must be a tuple or list which contains a permutation of [0,1,…,N-1] where N is the number of axes of self. The ith axis of the returned array will correspond to the axis numbered axes[i] of the input. If not specified, defaults to the reverse of the order of the axes.
- pygrad.tensor.array(*args, **kwargs) Tensor¶
Helper function designed for initializing a Tensor object in the same way as a NumPy array. Ensure inputs match those of Tensor.
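A short sketch, assuming *args and **kwargs are forwarded to the Tensor constructor as described:

    from pygrad.tensor import array

    x = array([[1.0, 2.0], [3.0, 4.0]], label="x")  # equivalent to Tensor([[1.0, 2.0], [3.0, 4.0]], label="x")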
Basic layers:
Module storing class-defined layers.
- class pygrad.basics.AddNorm(gain: float = 1.0, bias: float = 0.0, epsilon: float = 1e-09)¶
Performs AddNorm on an input x and skip connection value skip. The forward pass performs the following, outputting a Tensor:
y = x + skip
mu = mean(y)
sd = std(y)
output = gain * (y - mu) / sd + bias
gain defaults to 1.0. bias defaults to 0.0.
- __init__(gain: float = 1.0, bias: float = 0.0, epsilon: float = 1e-09)¶
- __weakref__¶
list of weak references to the object
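A NumPy illustration of the forward formula above, not the layer's own API; the reduction axis and the placement of epsilon are assumptions:

    import numpy as np

    def addnorm_reference(x, skip, gain=1.0, bias=0.0, epsilon=1e-9):
        """Reference computation of the AddNorm formula on plain arrays."""
        y = x + skip
        mu = y.mean(axis=-1, keepdims=True)   # reduction axis assumed
        sd = y.std(axis=-1, keepdims=True)
        return gain * (y - mu) / (sd + epsilon) + bias  # epsilon placement assumed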
- class pygrad.basics.Conv2D(o_dim: int, i_dim: int, kH: int, kW: int, bias: bool = True, label: None | int | str = 'Conv2D', dtype=<class 'numpy.float64'>)¶
Performs Conv2D from an input dimension i_dim to an output dimension o_dim using a kernel (kH, kW).
Kernels are initialized using Kaiming Uniform initialization. Only single strides are performed. No output padding is performed.
- __init__(o_dim: int, i_dim: int, kH: int, kW: int, bias: bool = True, label: None | int | str = 'Conv2D', dtype=<class 'numpy.float64'>) None¶
Initialization for the Conv2D class.
Conv2D is a set of kernels that, when called on input data x, computes Cx + B.
C is the convolution, B is the bias.
C is of shape (1, o_dim, i_dim, kH, kW), the leading dimension being the batch dimension.
- If bias is True:
B is of shape (1, o_dim, 1, 1)
Weights are initialized via Kaiming Uniform Initialization.
- Parameters:
o_dim (int) – The output channel dimension of the convolution. This indicates the number of kernels to apply to the input.
i_dim (int) – The number of channels of the input.
kH (int) – the height of the kernel
kW (int) – the width of the kernel
bias (bool) – Whether or not to include the bias term after performing the initial convolution. Defaults to True.
label (None, int, or str. Defaults to "Conv2D") – A label for the layer.
dtype (The data types allowable by the Tensor class.) – The data type of the weights and gradients. Defaults to np.float64.
- __repr__() str¶
Return repr(self).
- __weakref__¶
list of weak references to the object
- class pygrad.basics.Dropout(rate: float = 0.1)¶
Dropout Class with specified rate parameter. Randomly masks input values with a probability of rate.
Rate defaults to 0.1.
- __init__(rate: float = 0.1)¶
- __weakref__¶
list of weak references to the object
- class pygrad.basics.Flatten(label: None | int | str = 'Flatten')¶
Flattens an input by reshaping it into a 1D Tensor.
- __init__(label: None | int | str = 'Flatten') None¶
- __repr__() str¶
Return repr(self).
- __weakref__¶
list of weak references to the object
- class pygrad.basics.Linear(i_dim: int, o_dim: int, bias: bool = True, label: None | int | str = 'Linear', dtype=<class 'numpy.float64'>)¶
Linear 2D layer. Performs Wx + B on an input x.
Inputs and outputs are 3D: (bs, h, w). Weights are initialized using Kaiming Uniform initialization.
- __init__(i_dim: int, o_dim: int, bias: bool = True, label: None | int | str = 'Linear', dtype=<class 'numpy.float64'>) None¶
Initializes a Dense Linear Layer with Kaiming Uniform initialization.
A Dense linear layer is Wx + B.
W is initialized as a Tensor of shape (1, i_dim, o_dim), with the leading dimension indicating the batch dimension. If bias is True, B is initialized as a Tensor of shape (1, 1, o_dim), again with the leading dimension indicating the batch dimension.
- Parameters:
i_dim (int) – The input data dimension to the layer.
o_dim (int) – The output data dimension of the layer.
bias (bool.) – Whether or not to include the bias term. Defaults to True.
label (None, int, str (defaults to "Linear")) – An optional label to give to the layer.
dtype (The data types allowable by the Tensor class.) – The data type of the weights and gradients. Defaults to np.float64.
- __repr__() str¶
Return repr(self).
- __weakref__¶
list of weak references to the object
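An illustration of the Wx + B computation the layer performs, written with the documented Tensor operators; the concrete shapes are hypothetical:

    import numpy as np
    from pygrad.tensor import Tensor

    bs, h, i_dim, o_dim = 2, 5, 4, 3

    x = Tensor(np.random.randn(bs, h, i_dim), label="x", leaf=True)
    W = Tensor(np.random.randn(1, i_dim, o_dim), label="W")  # shape documented for Linear's weight
    B = Tensor(np.zeros((1, 1, o_dim)), label="B")           # shape documented when bias=True

    out = x @ W + B  # broadcasts to shape (bs, h, o_dim)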
- class pygrad.basics.Softmax(label: None | int | str = 'Softmax')¶
Performs Softmax on an input.
- __init__(label: None | int | str = 'Softmax')¶
- __repr__() str¶
Return repr(self).
- __weakref__¶
list of weak references to the object
Activation functions:
Module storing class-defined activation functions.
- class pygrad.activations.ReLU(label: None | int | str = 'ReLU')¶
Performs ReLU activation, defined as a class.
- __init__(label: None | int | str = 'ReLU')¶
- __repr__() str¶
Return repr(self).
- __weakref__¶
list of weak references to the object
Losses:
Module storing class-defined loss functions.
- class pygrad.losses.BCELoss(label: str = 'BCELoss')¶
Binary Cross Entropy Loss.
- __call__(pred: Tensor, target: Tensor) Tensor¶
Computes the BCE on pred and target, summing over the batch dimension.
- Parameters:
pred – A Tensor of shape (batch_size, 1, 1) with values in [0, 1].
target – A Tensor of shape (batch_size, 1, 1) with values in {0, 1}.
- __init__(label: str = 'BCELoss')¶
- __repr__() str¶
Return repr(self).
- __weakref__¶
list of weak references to the object
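A usage sketch with the documented shapes; the probability values are illustrative:

    import numpy as np
    from pygrad.tensor import Tensor
    from pygrad.losses import BCELoss

    pred = Tensor(np.array([[[0.9]], [[0.2]]]), label="pred")                 # (batch_size, 1, 1), values in [0, 1]
    target = Tensor(np.array([[[1.0]], [[0.0]]]), label="target", leaf=True)  # (batch_size, 1, 1), values in {0, 1}

    loss = BCELoss()(pred, target)  # summed over the batch dimension
    loss.backward()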
- class pygrad.losses.CCELoss(label='CCELoss')¶
Categorical Cross Entropy Loss.
- __call__(pred: Tensor, target: Tensor, mask: bool = False) Tensor¶
Performs CCE on pred and target, with an optional mask.
- Parameters:
pred – A Tensor of shape (batch_size, 1, w) with values in [0,1]
target – A Tensor of shape (batch_size, 1, w) with values in {0,1}
mask – A boolean. If True, the CCE is only computed across positions where the target has an output in dimension -1.
- __init__(label='CCELoss')¶
- __repr__()¶
Return repr(self).
- __weakref__¶
list of weak references to the object
Module storing (gradient descent) optimization methods.
- class pygrad.optims.Adam(model_parameters: list, beta1: float = 0.9, beta2: float = 0.999, eps=1e-08, lr: float = 1e-05)¶
Adam Optimizer.
- __init__(model_parameters: list, beta1: float = 0.9, beta2: float = 0.999, eps=1e-08, lr: float = 1e-05)¶
Initializes Adam.
- Parameters:
model_parameters (list) – A list of Tensors specifying the pre-order traversal of a Tensor’s computational graph. This is given either by the weights list returned from Tensor.create_graph() or, for models subclassing Module, by model.params.
beta1 (float. Defaults to 0.9.) – the beta1 parameter to use
beta2 (float. Defaults to 0.999.) – the beta2 parameter to use
eps (float. Defaults to 1e-8.) – the epsilon to use
lr (float. Defaults to 1e-5.) – the learning rate.
- __weakref__¶
list of weak references to the object
- step(loss: Tensor)¶
Performs a single step of Adam on model_parameters according to the loss function’s gradients.
Gradients are averaged across the batch, and Tensor values are modified accordingly.
- Parameters:
loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters
- step_single(loss, batch_size, modify: bool = False)¶
Perform gradient descent on a loss, with control over value modification.
This function performs gradient descent on arbitrary batch sizes by allowing any number of gradient accumulations before values are updated.
- The single step of gradient descent is split into two components.
Model parameter gradients are adjusted according to the average of the loss gradients. This happens when modify=False.
Model parameter values are updated. This happens when modify=True.
- Parameters:
loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters
batch_size (int) – The final batch_size to average gradients over.
modify (bool, defaults to False.) – Whether or not to modify the model values.
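One plausible gradient-accumulation pattern; model, loss_fn, and samples are placeholders for objects built elsewhere with this library, the keyword name x is assumed, and folding the final modify=True call into the last iteration is also an assumption:

    from pygrad.optims import Adam

    optim = Adam(model.params, lr=1e-5)   # model.params: the weights list of a Module subclass
    batch_size = len(samples)

    for i, (x, target) in enumerate(samples):
        loss = loss_fn(model(x=x), target)
        # Gradients are accumulated while modify=False; the call with modify=True
        # also writes the averaged update into the parameter values.
        optim.step_single(loss, batch_size, modify=(i == batch_size - 1))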
- zero_adam()¶
Resets the momentums and variances stored by Adam for each model parameter.
- zero_grad()¶
Resets the model parameter gradients.
- class pygrad.optims.RMSProp(model_parameters: list, beta: float = 0.9, lr: float = 1e-05)¶
RMS Prop.
- __init__(model_parameters: list, beta: float = 0.9, lr: float = 1e-05)¶
Initializes the RMS Prop.
- Parameters:
model_parameters (list) – A list of Tensors specifying the pre-order traversal of a Tensor’s computational graph. This is given either by the weights list returned from Tensor.create_graph() or, for models subclassing Module, by model.params.
beta (float. Defaults to 0.9.) – the beta parameter to use in RMSProp.
lr (float. Defaults to 1e-5.) – the learning rate.
- __weakref__¶
list of weak references to the object
- step(loss: Tensor)¶
Performs a single step of RMSProp on model_parameters according to the loss function’s gradients.
Gradients are averaged across the batch, and Tensor values are modified accordingly.
- Parameters:
loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters
- step_single(loss, batch_size, modify: bool = False)¶
This function performs gradient descent on arbitrary batch sizes by allowing any number of gradient accumulations before values are updated.
- The single step of gradient descent is split into two components.
Model parameter gradients are adjusted according to the average of the loss gradients. This happens when modify=False.
Model parameter values are updated. This happens when modify=True.
- Parameters:
loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters
batch_size (int) – The final batch_size to average gradients over.
modify (bool, defaults to False.) – Whether or not to modify the model values.
- class pygrad.optims.SGD(model_parameters: list, lr: float = 1e-05)¶
Vanilla Gradient Descent.
- __init__(model_parameters: list, lr: float = 1e-05)¶
Initializes the SGD optimizer.
- Parameters:
model_parameters (list) – A list of Tensors specifying the pre-order traversal of a Tensor’s computational graph. This is given either by the weights list returned from Tensor.create_graph() or, for models subclassing Module, by model.params.
lr (float. Defaults to 1e-5.) – the learning rate for SGD
- __weakref__¶
list of weak references to the object
- step(loss: Tensor) None¶
Performs a single step of gradient descent on model_parameters according to the loss function’s gradients.
Gradients are averaged across the batch, and Tensor values are modified accordingly.
- Parameters:
loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters
- step_single(loss: Tensor, batch_size, modify: bool = False) None¶
This function performs gradient descent on arbitrary batch sizes by allowing any number of gradient accumulations before values are updated.
- The single step of gradient descent is split into two components.
Model parameter gradients are adjusted according to the average of the loss gradients. This happens when modify=False.
Model parameter values are updated. This happens when modify=True.
- Parameters:
loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters
batch_size (int) – The final batch_size to average gradients over.
modify (bool, defaults to False.) – Whether or not to modify the model values.
- zero_grad()¶
Sets the gradient of each Tensor in model_parameters to 0.
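A minimal training-loop sketch; model, loss_fn, and data are placeholders built elsewhere, the keyword name x is assumed, and whether step() itself triggers the backward pass is not stated in this reference:

    from pygrad.optims import SGD

    optim = SGD(model.params, lr=1e-5)

    for x, target in data:
        optim.zero_grad()                   # clear the parameter gradients from the previous step
        loss = loss_fn(model(x=x), target)
        optim.step(loss)                    # gradient-descent step on model_parameters using this loss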
- class pygrad.optims.SGD_Momentum(model_parameters: list, beta: float = 0.9, lr: float = 1e-05)¶
Gradient Descent with Momentum.
- __init__(model_parameters: list, beta: float = 0.9, lr: float = 1e-05)¶
Initializes the SGD with momentum optimizer.
- Parameters:
model_parameters (list) – A list of Tensors specifying the pre-order traversal of a Tensor’s computational graph. This is given either by the weights list returned from Tensor.create_graph() or, for models subclassing Module, by model.params.
beta (float. Defaults to 0.9.) – the beta momentum parameter to use
lr (float. Defaults to 1e-5.) – the learning rate for SGD
- __weakref__¶
list of weak references to the object
- step(loss: Tensor)¶
Performs a single step of gradient descent with momentum on model_parameters according to the loss function’s gradients.
Gradients are averaged across the batch, and Tensor values are modified accordingly.
- Parameters:
loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters
- step_single(loss, batch_size, modify: bool = False)¶
This function performs gradient descent on arbitrary batch sizes by allowing any number of gradient accumulations before values are updated.
- The single step of gradient descent is split into two components.
Model parameter gradients are adjusted according to the average of the loss gradients. This happens when modify=False.
Model parameter values are updated. This happens when modify=True.
- Parameters:
loss (Tensor) – A Tensor specifying a loss function. This loss needs to have taken the output from the same model which provided self.model_parameters
batch_size (int) – The final batch_size to average gradients over.
modify (bool, defaults to False.) – Whether or not to modify the model values.
Module storing the Module base class.
- class pygrad.module.Module(**kwargs)¶
Module Class.
Allows for performing batched forward and backwards passes on a model without modifying the model directly. The subclassed models must perform any required **kwargs type checking.
- __call__(**kwargs: Tensor) Tensor¶
Returns the forward pass output of the model on a batched input.
- Further:
Creates a batch-friendly version of the original model to do backprop with.
Creates topological and weight graphs of the batched model, storing them in self.model_copy.
- abstractmethod __init__(**kwargs)¶
- __weakref__¶
list of weak references to the object
- abstractmethod forward(**kwargs)¶
Ensure this method is defined in the subclass.
- model_reset()¶
Deletes the batched model.
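A hypothetical subclass sketch; that layers are called directly on Tensors, the keyword name x, and the super().__init__ call are assumptions not stated in this reference:

    from pygrad.module import Module
    from pygrad.basics import Linear
    from pygrad.activations import ReLU

    class MLP(Module):
        def __init__(self, **kwargs):
            super().__init__(**kwargs)            # assumed; base-class __init__ behaviour is not documented here
            self.fc1 = Linear(i_dim=8, o_dim=16)
            self.act = ReLU()
            self.fc2 = Linear(i_dim=16, o_dim=1)

        def forward(self, **kwargs):
            x = kwargs["x"]                       # keyword name is an assumption
            return self.fc2(self.act(self.fc1(x)))  # layer call convention is an assumption

    model = MLP()
    # out = model(x=batch)    # batched forward pass; `batch` is a Tensor built elsewhere
    # model.params can then be handed to an optimizer from pygrad.optims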