PyTorch is based on Torch. Torch is a Tensor library like Numpy. Unlike Numpy, Torch has strong GPU support.

You can use Torch either using the Lua programming language or if you like Python you can use PyTorch.

You can use PyTorch together with all major Python packages likes scipy, numpy, matplotlib and Cython and benefit from with PyTorch’s autograd system.

We will check some major PyTorch features in here and provide some feedback on PyTorch tensors, algebra and graphs.

Main PyTorch features

  • Eager execution
  • C++ support
  • Native ONNX Support (open nn exchange)
  • Supported on major cloud platforms
  • Supports all major network model architectures

Data types

We see in here the list of data types currently supported.

Data type Tensor
32-bit floating point torch.FloatTensor
64-bit floating point torch.DoubleTensor
16-bit floating point torch.HalfTensor
8-bit integer (unsigned) torch.ByteTensor
8-bit integer (signed) torch.CharTensor
16-bit integer (signed) torch.ShortTensor
32-bit integer (signed) torch.IntTensor
64-bit integer (signed) torch.LongTensor

What do you do first in PyTorch?

In order to use it you first import the torch library.

import torch

Probable the next thing would be to convert your numpy arryas to PyTorch tensors

pta = torch.from_numpy(a)

The above line would also convert the original data type (DType) from numpy array. For example, float64 will become torch.DoubleTensor

We can also convert PyTorch tensor to numpy array:

a = pta.numpy()

We can override default behaviour if we cast it:

pta = torch.FloatTensor(a)

PyTorch tensor

Has three atributes:

  • torch.dtype
  • torch.device
  • torch.layout

We already listed all dtype attrubutes in a section before. For the device we can have either “cpu” or “cuda”. Cuda is a software layer over hardware GPU unit. Currently it works only on Nvidia hardware. This “cuda” thing is what makes speed improvements.

A torch.layout is an object that represents the memory layout of a torch.Tensor. torch.strided means dense tensors. Experimental support for torch.sparse_coo exists.

You typically use the following ways to create the tensor in PyTorch:

  • torch.Tensor(data)
  • torch.tensor(data)
  • torch.as_tensor(data)
  • torch.from_numpy(data)

The privilege working with GPU

We can use both CPU and GPU with PyTorch. This would be how to move our data from CPU to GPU:

data = data.cuda()

Reshape the tensor

a = torch.range(1, 12)
a = a.view(3, 4)        # reshapes in 3 columns x 4 rows

Note you can use PyTorch reshape() method also but PyTorch uses view() method would return you the copy if possible.

As we know, if we spec. the dimension -1 this will “vectorize” the tensor.


Basic Algebra in PyTorch

Here we define tensors a and b and do basic algebra operations:

a = torch.rand(4,4)
b = torch.rand(4)

mm =       # matrix multiplication
mv =       # matrix-vector multiplication
t= a.t()           # matrix transpose


The output will be like this:

tensor([[0.9216, 0.0812, 0.3326, 0.1223],
        [0.4754, 0.1876, 0.8705, 0.0348],
        [0.8547, 0.2323, 0.1879, 0.7196],
        [0.0683, 0.3799, 0.8813, 0.9155]])
tensor([0.3089, 0.8028, 0.2625, 0.4179])
tensor([[1.1806, 0.2138, 0.5475, 0.4668],
        [1.2737, 0.2893, 0.5156, 0.7229],
        [1.1079, 0.4301, 1.1560, 0.9066],
        [1.0594, 0.6294, 1.3258, 1.4938]])
tensor([0.4883, 0.5404, 0.8006, 0.9401])
tensor([[0.9216, 0.4754, 0.8547, 0.0683],
        [0.0812, 0.1876, 0.2323, 0.3799],
        [0.3326, 0.8705, 0.1879, 0.8813],
        [0.1223, 0.0348, 0.7196, 0.9155]])

Also, we have Hadamard product with * and the dot product

a = torch.Tensor([[1,2],[3,4]])
b = torch.Tensor([5,6])
c = torch.Tensor([7,8])

print(a*b) # element-wise matrix multiplication (Hadamard product)
print(,c)) # not like in numpy, works just on 1D Tensors


tensor([[ 5., 12.],
        [15., 24.]])
tensor([[ 1.,  4.],
        [ 9., 16.]])

The seed

We can set the torch seed using this line:


We can set np.random.seed(seed) for the numpy library. This will help us set the deterministic results. Also note that Python programs need to set the hash seed in order to work deterministic.

The graph, the Variable, and the Function

The invasion of PyTorch tensors is becoming evident, but less evident is the idea of PyTorch variable and function. The interesting thing, if you try to print the variable you will get the output like you printed the Tensors. Let’s check this example:

import torch
from torch.autograd import Variable

def pr(obj):
  print("type:", type(obj))
a = torch.Tensor([[1,2],[3,4]])
a2= a+2
b = Variable(torch.Tensor([[1,2],[3,4]]), requires_grad=True)
b2 = b+2

The output:

type: <class 'NoneType'>
type: <class 'NoneType'>
type: <class 'NoneType'>
<AddBackward0 object at 0x7f857dc19f60>
type: <class 'AddBackward0'>

b2 is a variable that has the function under .grad_fn that has created the variable. The variable b we created doesn’t have the function that has created it, since we created it.

The complete history of computation is saved this way in the interconnected acyclic graph.

As we said the autograd package provides automatic differentiation for all operations in the graph so let’s check that next.


What is PyTorch autograd?

The autograd package provides automatic differentiation for all operations on Tensors. This is needed for the backpropagation algorithm to work.

The import torch.autograd package is the heart of PyTorch. It contains classes and functions implementing automatic differentiation on a computation graph. Computational graph is what you get out-of-the-box in PyTorch once you set your computations.

Important things about the graph:

  • when computing the forwards pass, autograd builds up a graph
  • graph holds functions and variables (tensors)
  • graph encodes a complete history of computation
  • after the backward pass the graph will be freed to save the memory
  • graph is recreated from scratch at every iteration (on forward pass)
  • graph is needed to compute the gradients

Consider this small program:

# Create tensors
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

# Build the graph
y = w * x + b    # y = 2 * x + 3

# Compute gradients

# Print out the gradients.
print(x.grad)    # x.grad = 2 
print(w.grad)    # w.grad = 1 
print(b.grad)    # b.grad = 1 

For the previous program:

  • once we wrote the equation PyTorch creates computational graph on fly…
  • every Tensor with a flag: requires_grad=Fase will freeze backward computation
  • every Tensor with a flag: requires_grad=True will require gradient computation
  • when the forwards pass is done, we evaluate this graph in the backwards pass to compute the gradients.
  • y.backward() will do the backward pass
  • we used .grad attribute to show the tensor gradient

PyTorch computes backward gradients using a computational graph which keeps track of what operations have been done during your forward pass.

In each iteration we:

  • execute the forward pass,
  • compute the derivatives of output with respect to the parameters of the network
  • update the parameters to fit the given examples

After doing the backward pass, the graph will be freed to save memory. In the next iteration, a fresh new graph is created and ready for back-propagation.

The following program:

import torch
from torch.autograd import Variable
a = Variable(torch.rand(1, 4), requires_grad=True)
b = a**2
c = b*2
d = c.mean()
e = c.sum()
d.backward(retain_graph=True) # fine
e.backward(retain_graph=True) # fine
d.backward() # also fine
e.backward() # error will occur!
  • creates a variable (tensor)
  • creates new tensors from operations
  • calls backward on d and e in turns
  • we compute derivatives once we call .backward() on a Tensor
  • when ever we call backward(retain_graph=True) the graph will still be in memory
  • once we called d.backward() on leaf (graph element) graph will be removed
  • last call to e.backward() will show the error that the graph has been removed from memory

How to show the graphs?

I found a package torchviz (pip install torchviz) that prints the graphs. Our previous example will then become:

import torch
import torchviz
from torchviz import make_dot, make_dot_from_trace

# Create tensors.
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

# Build a computational graph.
y = w * x + b    # y = 2 * x + 3

# Compute gradients.

# Print out the gradients.
print(x.grad)    # x.grad = 2 
print(w.grad)    # w.grad = 1 
print(b.grad)    # b.grad = 1 

make_dot(y, {'x': x, 'w':w, 'b': b})

Note how we set the names for the leaf graph elements inside the second make_dot parameter. The output will be like this:


One another way to create the graphs is torch.jit.get_trace_graph.