GPU Acceleration#
Installation and Configuration#
Installing CUDA Toolkit#
The first step in utilizing the computational power of your Nvidia GPU is to install the CUDA toolkit. (If you’ve already configured your GPU for other software, you may skip this step.) To download the installer, visit this link and provide your system information. You must be using either Linux or Windows, and you must be using one of these graphics cards. Once the installer has downloaded, run it and follow the instructions in the installer GUI. Once the installation is complete, restart your computer.
Python and PyTorch GPU configuration#
The next step is to check whether your Python and PyTorch installations are correctly configured to use your GPU. After installing MPoL, you can check whether everything installed correctly by opening a Python interpreter and running
import torch
print(torch.cuda.is_available())
This command should print True. If not, then you may need to use a more specific installation process. Go to the PyTorch Official Site and scroll down the page until you see the Install PyTorch section. Enter the specifications of your system there and use the command that is generated for your install. For example, when making this tutorial, a Windows 10 system with an Nvidia GTX 1080 required a specific pip installation, while another Windows 10 system using an Nvidia GTX 1660Ti worked with the default pip install torch torchvision. Your mileage may vary.
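If the check above prints False, it can help to see a little more about what PyTorch has detected before reinstalling. The following is a minimal sketch, not part of MPoL: the helper name describe_torch_cuda is hypothetical, and it only reports values that PyTorch itself exposes.

```python
import torch


def describe_torch_cuda():
    """Collect basic facts about the local PyTorch/CUDA setup."""
    info = {
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "device_names": [],
    }
    if info["cuda_available"]:
        # one entry per GPU that PyTorch can see
        info["device_names"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return info


for key, value in describe_torch_cuda().items():
    print(f"{key}: {value}")
```

If built_with_cuda is None, the installed PyTorch is a CPU-only build, and the install command generated on the PyTorch site is the likely fix.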
Why use the GPU?#
Using a GPU can accelerate computing speeds by up to 100x over CPUs, especially for operations on large images, as is common in MPoL. The following is a quick example showing the addition of two large vectors. Your exact timing may vary, but on our hardware this calculation took 320 milliseconds on the CPU, while it took only 3.1 milliseconds on the GPU.
import torch
import time
N = int(9.9e7)
A = torch.ones(N)
B = torch.ones(N)
start = time.time()
C = A + B
print(time.time() - start)  # elapsed CPU time, in seconds
torch.cuda.empty_cache()  # empty the GPU cache in case any memory was left over from an earlier operation
A = A.cuda()
B = B.cuda()
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
C = A + B
end.record()
torch.cuda.synchronize()
print(start.elapsed_time(end))  # elapsed GPU time, in milliseconds
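A single timed run can be noisy, so a slightly more careful comparison repeats the operation and averages. The sketch below is one way to do that under our own assumptions: the function name time_addition, the vector length, and the repeat count are all illustrative choices, and it uses a wall-clock timer with an explicit synchronization rather than CUDA events.

```python
import time

import torch


def time_addition(device, n=1_000_000, repeats=10):
    """Return the mean time in milliseconds to add two length-n vectors."""
    a = torch.ones(n, device=device)
    b = torch.ones(n, device=device)

    # warm-up run, excluded from the timing
    _ = a + b
    if a.is_cuda:
        torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(repeats):
        c = a + b
    if a.is_cuda:
        # GPU kernels launch asynchronously; wait for them to finish
        torch.cuda.synchronize()
    return 1000 * (time.perf_counter() - start) / repeats


print(f"CPU: {time_addition('cpu'):.3f} ms")
if torch.cuda.is_available():
    print(f"GPU: {time_addition('cuda'):.3f} ms")
```

The synchronization matters because the Python call returns as soon as the GPU work is queued; without it, the timer would stop before the additions had finished.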
Using the GPU as part of PyTorch and MPoL#
Here is a short example demonstrating how to initialize an MPoL model and run it on the GPU. First we will set our device to the CUDA device.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)
cuda:0
This if-else statement is used just to ensure that we aren’t trying to run PyTorch on the GPU if it isn’t available. The rest of this tutorial will assume that device=cuda:0.
Note
The index in cuda:0 is technically only required if you have more than one GPU. device='cuda' will instruct PyTorch to use the default CUDA device.
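The difference between the two spellings can be seen by inspecting the device objects directly. This is a small illustrative sketch (the variable names are ours); constructing a torch.device does not touch the hardware, so it also runs on a machine without a GPU.

```python
import torch

default_dev = torch.device("cuda")   # no index: use the current default CUDA device
first_dev = torch.device("cuda:0")   # explicit index: always the first GPU

print(default_dev.type, default_dev.index)  # cuda None
print(first_dev.type, first_dev.index)      # cuda 0
```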
Now that we have our device set, we’ll initialize the MPoL dataset as in previous tutorials. This example uses a multi-channel dataset, but for demonstration purposes we will only use the central channel (central_chan=4).
from astropy.utils.data import download_file
import numpy as np
from mpol import gridding, coordinates
fname = download_file(
    'https://zenodo.org/record/4498439/files/logo_cube.npz',
    cache=True,
)
d = np.load(fname)
coords = coordinates.GridCoords(cell_size=0.03, npix=180)
central_chan = 4
gridder = gridding.Gridder(
    coords=coords,
    uu=d['uu'][central_chan],
    vv=d['vv'][central_chan],
    weight=d['weight'][central_chan],
    data_re=d['data_re'][central_chan],
    data_im=d['data_im'][central_chan],
)
dataset = gridder.to_pytorch_dataset()
Next we’ll create a SimpleNet module to train on our data. For more detailed information, see the Optimization Loop tutorial or the MPoL SimpleNet Source Code.
from mpol.precomposed import SimpleNet
model = SimpleNet(coords=coords, nchan=dataset.nchan)
We are now ready to move our model and data to the GPU using the tensor.to(device) functionality common to most PyTorch objects. One can also use tensor.cuda() to move the tensor to the default CUDA device. For tensors, both of these methods return a copy of the object on the GPU; for modules (torch.nn.Module), they move the parameters in place and return the module itself.
We’ve borrowed a config dictionary from the Cross Validation Tutorial, which contains a set of parameters that resulted in a strong cross validation score for this particular dataset. For more details on these variables, see the Cross Validation Tutorial.
dset = dataset.to(device)
model = model.cuda()
config = {'lr':0.5, 'lambda_sparsity':1e-4, 'lambda_TV':1e-4, 'epochs':600}
optimizer = torch.optim.Adam(model.parameters(), lr=config['lr'])
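The difference between moving a tensor and moving a module is easy to trip over, so here is a small device-agnostic sketch of it. It uses a plain torch.nn.Linear layer as a stand-in for an MPoL model (an assumption made purely for illustration) and falls back to the CPU when no GPU is present.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensor.to returns a tensor; the result must be assigned to be used
x = torch.ones(3)
x_dev = x.to(device)

# Module.to moves the parameters in place and returns the same module object
layer = torch.nn.Linear(3, 1)
moved = layer.to(device)

print(x_dev.device.type == device.type)
print(moved is layer)
print(next(layer.parameters()).device.type == device.type)
```

In practice this means a bare x.to(device) with no assignment has no lasting effect on x, whereas model.to(device) would work even without reassigning model.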
We are now ready to train our network on the GPU. We will use a for-loop with 600 iterations (epochs) in which we will calculate the loss and step our optimizer.
from mpol import losses
# set the model to training mode
model.train()
for i in range(config['epochs']):
    # set the model to zero grad
    model.zero_grad()
    # forward pass
    vis = model()
    # get skycube from our forward model
    sky_cube = model.icube.sky_cube
    # compute loss
    loss = (
        losses.nll_gridded(vis, dset)
        + config['lambda_sparsity'] * losses.sparsity(sky_cube)
        + config['lambda_TV'] * losses.TV_image(sky_cube)
    )
    # perform a backward pass
    loss.backward()
    # update the weights
    optimizer.step()
Congratulations! You have now trained a neural network on your GPU. In general, the process for running on the GPU is designed to be simple. Once your CUDA device has been set up, the main changes to a CPU-only run are the steps required to move the data and the model to the GPU for training.
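The same pattern can be tried without MPoL or the dataset download. The sketch below is a toy stand-in, not the MPoL workflow: it fits a one-parameter linear model to made-up data, with the model, learning rate, and iteration count chosen only for illustration. It also shows one step the loop above does not need but that usually follows it: copying results back to the CPU (for example, before plotting with NumPy-based tools).

```python
import torch

torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# toy data for y = 2x, created on the CPU and then moved to the device
x = torch.linspace(-1, 1, 64).unsqueeze(1).to(device)
y = 2 * x

model = torch.nn.Linear(1, 1).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

model.train()
losses_seen = []
for i in range(200):
    model.zero_grad()
    loss = torch.mean((model(x) - y) ** 2)
    loss.backward()
    optimizer.step()
    # .item() copies the scalar loss back to the host as a Python float
    losses_seen.append(loss.item())

# detach from the autograd graph and copy to the CPU before converting to NumPy
weight = model.weight.detach().cpu().numpy()
print(losses_seen[0], losses_seen[-1], weight)
```

Because the device is chosen once at the top and everything else refers to it, the same script runs unchanged on a machine with or without a GPU.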