# Pytorch healthier life

## Create things on GPU

Consider these two lines:

```
torch.zeros(100, device="gpu")
torch.zeros(100).to("cuda")
```

They should effect the same, but first one is faster as it assumes creating GPU tensor directly without copy from CPU, while the second one uses the copy from CPU trick.

## Two different loss functions

If you have two different loss functions, finish the forwards for both of them separately, and then finally you can do `(loss1 + loss2).backward()`

.
It’s a bit more efficient, skips quite some computation.

## Sum the loss

In your code you want to do:

```
loss_sum += loss.item()
```

to make sure you do not keep track of the history of all your losses.

`item()`

will break the graph and thus allow it to be freed from one iteration of the loop to the next.
Also you could use `detach()`

for the same.

## Loss backward and DataParallel

When you do `loss.backward()`

, it is a shortcut for `loss.backward(torch.Tensor([1]))`

.
This in only valid if loss is a tensor containing a single element.

DataParallel returns to you the partial loss that was computed on each GPU, so you usually want to do `loss.backward(torch.Tensor([1, 1]))`

or `loss.sum().backward()`

.
Both will have the exact same behaviour.

## Disable the autograd

If you want to disable the autograd, you should wrap you function in a with `torch.no_grad()`

block.
Based on the PyTorch tutorial, during prediction (after training and evaluation phase), we are supposed to do something like

```
model.eval()
with torch.no_grad():
```

## Vector and matrix multiplication

In PyTorch we use tensors. What if we need to do element vise multiplication?

```
x=torch.rand(2,3,2,2)
y=torch.rand(3)
print(x)
print(y)
y = y.view(1,3,1,1)
out = x*y
print(out)
```

We use `view()`

. Note view is in place operation:

```
x*y.view(1, 3, 1, 1)
```

This is equivalent as:

```
y = y.view(1,3,1,1)
out = x*y
```

## What’s the difference between view() and expand()?

Expand allows you to repeat a tensor along a dimension of size 1. For instance if we have a convolution kernel as a tensor `t = torch.tensor([[1., 2. , 1.], [0., 0., 0.], [-1., -2. , -1.]])`

and we would like to repeat that tensor allong three image channels we would use `expand`

like this: `t.expand(1,3,3,3)`

. What `expand`

will do, it will pretend that the original tensor `t`

is copied three times.

View changes the size of the Tensor without changing the number of elements in it. It actually set the new “view” on the existing data.

## Use model on GPU

You should create your model as usual then call `model.cuda()`

to send all parameters of this model (and other stuff in the model) to the GPU.
Then you need to make sure that your inputs are on the gpu as well:

```
input = input.cuda()
```

Then you can forward on the GPU by doing:

```
model(input)
```

Note that `model.cuda()`

will change the model inplace while `input.cuda()`

will not change input inplace and you need to do `input = input.cuda()`

.

## Use detach()

Replace `output.data`

with `output.detach()`

that is the new way to do it.

## CUDA_LAUNCH_BLOCKING=1 to have clear stack trace

When running on GPU, use `CUDA_LAUNCH_BLOCKING=1`

otherwise the stack trace is going to be wrong.