pygrad documentation

pygrad is a lightweight automatic differentiation engine:

  • written entirely in Python,

  • relying only on NumPy, Numba, and opt_einsum,

  • verified against PyTorch*,

  • and less than 300 kB in size.

Pygrad will be useful if you are looking to compute gradients and/or perform gradient descent for models with fewer than 1 million parameters.

Pygrad’s Tensor object operates like a NumPy array while additionally storing gradients.

Tensors:
  • Store operations performed on them, with support for broadcasting

  • Perform backpropagation with .backward()

  • Store gradients in .grad

  • Support np.float16 to np.float128 data types
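
For instance, the gradient of y = x**2 at x = 3 can be read straight from .grad after a single backward pass. The snippet below is a minimal sketch that uses only the Tensor calls shown on this page (wrapping a scalar, **, .backward(), .grad):

from pygrad.tensor import Tensor

x = Tensor(3.0)   # wrap a Python scalar in a Tensor
y = x**2          # the operation is recorded on the Tensor
y.backward()      # backpropagate through the recorded operations
x.grad            # dy/dx = 2*x = 6.0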

A simple example of performing gradient descent on a Tensor:

from pygrad.tensor import Tensor

loss_fn = lambda y, yh: (y - yh)**2  # squared error
x       = Tensor(1)                  # parameter (Tensor)
y       = x**2 + 0.25                # model output
yh      = 0.5                        # target (float)

for _ in range(1000):
    y    = x**2 + 0.25               # forward pass
    loss_fn(y, yh).backward()        # populates x.grad
    x.value = x.value - 0.01*x.grad  # gradient descent step

x.value, loss_fn(y, yh).value        # ~0.5, ~0
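
For this model the loss is (x**2 - 0.25)**2, whose analytic derivative is 4*x*(x**2 - 0.25), so x.grad can be checked by hand after a single backward pass. A minimal sketch of that check, assuming x.grad holds a plain float:

from pygrad.tensor import Tensor

x    = Tensor(1)
loss = (x**2 + 0.25 - 0.5)**2       # same loss as above, written out directly
loss.backward()                     # populates x.grad

analytic = 4*1*(1**2 - 0.25)        # 4*x*(x**2 - 0.25) at x = 1, i.e. 3.0
x.grad, analytic                    # expected to agree (assuming x.grad is a plain float)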

This documentation includes examples ranging from using Tensors for gradient descent on the very simplest of functions to training a Vaswani Transformer with Adam.

For installation instructions and a quick glance at usage, see Usage. All classes and functions can be found in API. For in-depth module descriptions, check out Modules.

If you are interested in contributing, please click here. If you are wondering who I am, click here.

Note

*All operations are verified against PyTorch, except for Conv2D gradients when strictly more than one backward pass is performed with reset_grad=False.