A simpler perspective of how to work with PyTorch can be explained by a simple example.
It's like a Christmas baby (PyTorch) that opens a multi-packed gift until it gets the final product - the desired gift.
The opening operations of the package involve smart moves called: forward and backward passes.
The child's feedback can be called: loss and backpropagate.
In this case, the child will try to remove from his package until he is satisfied and will not be lost (loss and backpropagate functions).
To compute the backward pass for a gradient and every time we backpropagate the gradient from a variable, the gradient is accumulative instead of being reset and replaced (most of networks designs call backward multiple times).
PyTorch comes with many loss functions.
Most of examples code create a mean square error loss function and later backpropagate the gradients based on the loss.
Will you ask me if the gift is shaped? I can tell you that the gift can contain from Verge à Saint-Nicolas (unidimensional) to complex (multidimensional) structures - the most simplistic and worn out is the square one (two-dimensional matrix).
This gift is packed with magic in mathematical functions which allows the child to understand what is in the gift.
But the child is more special. He recognizes forms (matrices, shapes, simple formulas) and this allows him to open parts of the gift.
El poate roti acesta parti din cadou (mm).
The mm is a matrix multiplication.
He can see the corners he can get from the gift.
ReLU stands for "rectified linear unit" and is a type of activation function.
Mathematically, it is defined as y = max(0, x).
He can see which parts of the gift are bigger or smaller so he can understand the gift.
This clamp function clamps all elements in input into the returns[ min, max ] and returns a resulting tensor:
The clamp should only affect gradients for values outside the min and max range.
The pow function power with the exponent.
The clone returns a copy of the self tensor. The copy has the same size and data type as self.
A common example is: clamp(min=0) is exactly ReLU().
PyTorch provides ReLU and its variants through the torch.nn module.
If you run the program to look at the output, you will understand that the child has only five operations left and is already pleased with the way the gift result.
The source code is based on one example from here:
import torch
dtype = torch.float
device = torch.device("cpu")
batch,input,hidden,output = 2,10,2,5
x = torch.randn(batch,input,device=device,dtype=dtype)
y = torch.randn(hidden,output,device=device,dtype=dtype)
w1 = torch.randn(input,hidden,device=device,dtype=dtype)
w2 = torch.randn(hidden,output,device=device,dtype=dtype)
l_r = 1e-6
for t in range(5):
h = x.mm(w1)
h_r = h.clamp(min=0)
y_p = h_r.mm(w2)
loss = (y_p - y).pow(2).sum().item()
print("t=",t,"loss=",loss,"\n")
g_y_p = 2.0 * (y_p -y)
g_w2 = h_r.t().mm(g_y_p)
g_h_r = g_y_p.mm(w2.t())
g_h = g_h_r.clone()
g_h[h<0 -="l_r" 0="" g_w1="" g_w2="" n="" print="" w1=",w1," w2=",w2,">0>
The child's result after five operations....
t= 4 loss= 25.40263557434082
w1= tensor([[ 1.5933, 0.3818],
[-1.0043, -1.3362],
[ 0.5841, -1.9811],
[ 2.3483, 0.5748],
[ 0.5904, -0.2521],
[-0.6612, 2.7945],
[ 0.4841, -0.5894],
[-1.4434, -0.1421],
[-1.2712, -1.4269],
[ 0.7929, 0.2040]]) w2= tensor([[ 1.7389, 0.4337, 0.4557, 1.3704, 0
.3819],
[ 0.2937, 0.0212, -0.4604, -1.0564, -1.5403]])