Autograd#
An autograd engine is a tool that automatically compute gradients (i.e., the vector of partial derivatives of the function with respect to its inputs) of a target function. This is important because gradient computations are at the core of optimization algorithms, which are used to adjust the parameters of a machine learning model to minimize the difference between the predicted and actual values.
An autograd engine can work with scalar-valued functions like micrograd and with more complex data structures. Conceptually it can be visualized like a computational graph, like the one below:
In the context of machine learning, the target function is the loss function, which is normally the final function applied to the computational graph, and the parameters are also called weights or data. However, these parameters must be contained in some data structure that is convenient and optimal, which is why tensors are used.
Therefore, a decent autograd tensor library is almost all that is needed to build
neural networks and start doing deep learning. In giagrad the autograd engine is
built on top of giagrad.Tensor
and giagrad.tensor.Function
classes, which are the base of giagrad. In reality, giagrad could work only with
giagrad.Tensor
, but to mantain modularity and keeping the code more
readable, giagrad.tensor.Function
exists.
giagrad.Tensor
is the data structure, which is based on a numpy.array
for simplicity, and the computational graph is done with
giagrad.tensor.Function
. To add new functionalities to
giagrad.Tensor
, such as a new activation function, one only needs to
create that activation function class and add a new method to
giagrad.Tensor
, see giagrad.Tensor.comm()
.
giagrad.mlops.ReLU
source code should give an idea:
class ReLU(Function):
def __init__(self):
super().__init__()
def forward(self, t1) -> NDArray:
self.save_for_backward(t1)
return np.maximum(t1.data, 0)
def backward(self, partial: NDArray):
p = self.parents[0]
if p.requires_grad:
p.grad += partial * (p.data > 0).astype(int)