Connecting to ARC machines

Cascades

The ARC cluster that will be used for this class is ‘Cascades’. Detailed instructions on how to access this machine can be found here. A quick overview of how to log in and submit jobs is given below.

To log in:

ssh username@cascades1.arc.vt.edu

where username is your PID and your password is the VT PID password followed by a comma and the two-factor six-digit code. For example, the password looks like this:

VTPASSWORD,two-factor-six-digit-code

This will take you to the login node. Copy the Python environment files for PyTorch to your environment folder. To do this:

$ module load Anaconda/5.1.0
$ cp -r /groups/srijithr_shared/pytorch_env ~/.conda/envs/
$ conda env list

This method of loading environments is for instructional use only. For regular work, set up the desired environment from an environment file (environment.yml).

The login node is not to be used for computation, so please DO NOT run jobs there. To run a job or program, request a compute node with the following command, which gives you a one-hour interactive session in which you can run commands:

interact -l walltime=01:00:00

The command above gets you a basic interactive session. If you have an allocation that gives you access to GPU resources, the general command to request a single node with one processor and one GPU for an ‘allocation’, using the dev_q on the V100 nodes, looks like this:

interact -l nodes=1:ppn=1:gpus=1 -W group_list=GROUP_NAME -A ALLOCATION_NAME -q v100_dev_q

For more information, visit the general FAQ on HPC jobs: FAQ

On the Linux shell (on the compute node), type:

$ module load Anaconda/5.1.0
$ source activate pytorch_env
$ python

In the Python interpreter, type the following to make sure your PyTorch environment loaded correctly:

>>> import torch
>>> device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
>>> print(device)

If you made it this far, congratulations! Your environment is now set up for use!
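
For a slightly more detailed check than printing the device, a short script along the following lines can report the PyTorch version and the GPUs that are visible on the compute node. This is a minimal sketch and not part of the original instructions: the helper name `describe_torch_environment` is made up for illustration, and it relies only on `torch.__version__`, `torch.cuda.is_available()`, `torch.cuda.device_count()` and `torch.cuda.get_device_name()`.

```python
import importlib.util


def describe_torch_environment():
    """Return a small report about the PyTorch installation (hypothetical helper)."""
    report = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not report["torch_installed"]:
        # Most likely the conda environment was not activated
        return report

    import torch  # imported only after we know the package can be found

    report["torch_version"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    report["gpu_count"] = torch.cuda.device_count()
    if report["cuda_available"]:
        report["gpu_name"] = torch.cuda.get_device_name(0)
    return report


report = describe_torch_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `torch_installed` is reported as `False`, the environment was probably not activated in the current shell. On a node without a GPU, `cuda_available` should be `False` and the device printed earlier should be `cpu`.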

VT Cloud

You can also use PyTorch on the ARC Cloud machines. The JupyterHub server can be accessed here: ‘JupyterHub’. A username and password will be provided in class.

Setting up PyTorch on your local machine

If you would like to set up PyTorch on a local machine, a Conda environment file can be downloaded here: pytorch.yml. Instructions on how to set up a Conda virtual environment from this file can be found in the post ‘Virtual environments for Anaconda Python’.

Introduction to PyTorch: Jupyter Notebook

This Jupyter Notebook introducing PyTorch was originally put together by Dr. Ahmed Ibrahim. Some modifications have been made to account for version changes so that it runs with PyTorch 0.4.1 and up. You can download it here: Intro.ipynb. If the file downloads with a ‘.txt’ extension, please rename it to ‘.ipynb’.

Neural Network for Image Classification on the CIFAR10 dataset using PyTorch

According to the CIFAR website, the CIFAR10 dataset consists of 60,000 32x32 images in 10 classes, with 6,000 images per class. It is split into 50,000 training images and 10,000 test images. The image classification example below is essentially the one from the PyTorch website, with some additional explanations.
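
The dataset figures above also determine how long one pass over the training data takes. The short calculation below is only a sanity check; the batch size of 4 and the reporting interval of 2000 mini-batches are taken from the training listings that follow, and the variable names are chosen here for illustration.

```python
# Figures quoted from the CIFAR website
num_classes = 10
images_per_class = 6000
train_images = 50000
test_images = 10000

total_images = num_classes * images_per_class
assert total_images == train_images + test_images

# Batch size and reporting interval used in the training listings below
batch_size = 4
report_every = 2000

batches_per_epoch = train_images // batch_size
reports_per_epoch = batches_per_epoch // report_every

print("total images:", total_images)
print("mini-batches per epoch:", batches_per_epoch)
print("loss reports per epoch:", reports_per_epoch)
```

With these settings each epoch consists of 12,500 mini-batches, so the training loop prints the running loss six times per epoch.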

Neural Network using only CPUs

import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class Net(nn.Module):
    # Define the layers and parameters
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)  # in_channels, out_channels, kernel_size
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    # Define the connections of the layers created above
    def forward(self, x):   # this gets called during training and evaluation for input x
        layer1_out = self.pool(F.relu(self.conv1(x)))
        layer2_out = self.pool(F.relu(self.conv2(layer1_out)))
        reshaped_layer2 = layer2_out.view(-1, 16 * 5 * 5)
        layer3_out = F.relu(self.fc1(reshaped_layer2))
        layer4_out = F.relu(self.fc2(layer3_out))
        final_result = self.fc3(layer4_out)
        return final_result


# Preprocessing and data loading
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# Initialize the neural network
net = Net()
# Set up a loss function
criterion = nn.CrossEntropyLoss()
# Set up the optimizer and pass the network parameters that are to be optimized
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

for epoch in range(100):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

# Create an iterator over the test dataset and get one mini-batch for testing
dataiter = iter(testloader)
images, labels = next(dataiter)  # the built-in next() works across PyTorch versions

outputs = net(images)

# Argument max to get the predicted class
_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))


# Loop over the rest of the test dataset and get accuracy metric
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
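
The input size of `fc1` in the listing above, `16 * 5 * 5`, follows from how each layer shrinks the 32x32 input image. The sketch below traces those sizes with the standard output-size formula for convolution and pooling layers (no padding, dilation of 1). The helper `conv_out` is a hypothetical name introduced for this illustration and is not a PyTorch function.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                  # CIFAR10 images are 32x32
size = conv_out(size, kernel=5)            # conv1: nn.Conv2d(3, 6, 5)   -> 28
size = conv_out(size, kernel=2, stride=2)  # pool:  nn.MaxPool2d(2, 2)   -> 14
size = conv_out(size, kernel=5)            # conv2: nn.Conv2d(6, 16, 5)  -> 10
size = conv_out(size, kernel=2, stride=2)  # pool:  nn.MaxPool2d(2, 2)   -> 5

channels = 16                              # out_channels of conv2
flattened = channels * size * size         # number of inputs expected by fc1
print(size, flattened)                     # 5 400
```

If the kernel sizes or the input resolution change, the first argument of `fc1` and the `view(-1, 16 * 5 * 5)` call have to be updated to the new flattened size.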

Neural Networks using the CPU and GPU

A version of the above code modified to run on GPUs is shown below.

import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import os

LOAD_TRAINED_MODEL = False
TRAIN_MODEL = True

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # DEFINE DEVICE
#device = torch.device("cpu")

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

net = Net()
if LOAD_TRAINED_MODEL:
    print("Loading saved model mode.net")
    # Load the saved state dictionary into the network; map_location lets a
    # model that was saved from a GPU also load on a CPU-only machine
    net.load_state_dict(torch.load(os.getcwd() + '/mode.net', map_location=device))
net.to(device)                           # MOVE NETWORK TO DEVICE

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
LOW_LOSS = False

if(TRAIN_MODEL == True):
  for epoch in range(5):  # loop over the dataset multiple times

    if (LOW_LOSS == True):
        break
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data
        inputs, labels = inputs.to(device), labels.to(device)  # MOVE DATA TO DEVICE

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            if( running_loss / 2000 < 0.57):
                LOW_LOSS = True
                print("Low loss reached ", running_loss / 2000)
                break
            running_loss = 0.0

  print('Finished Training')

  torch.save(net.state_dict(), os.getcwd()+'/mode.net')

dataiter = iter(testloader)
images, labels = next(dataiter)  # the built-in next() works across PyTorch versions
images = images.to(device)                 # PASS TEST DATA TO DEVICE
print(images.type())

outputs = net(images)
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % (
    100 * correct / total))
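
The save and load steps in the listing above can be tried in isolation without training anything. The sketch below uses a tiny `nn.Linear` layer as a stand-in for `Net` and a temporary file named `example.net`; both are assumptions made for illustration. It passes `map_location` to `torch.load` so that a state dictionary saved from a GPU can also be loaded on a CPU-only machine.

```python
import os
import tempfile

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# A tiny stand-in model (hypothetical); the same calls apply to the Net class above
model = nn.Linear(4, 2).to(device)

path = os.path.join(tempfile.mkdtemp(), "example.net")  # hypothetical file name
torch.save(model.state_dict(), path)                    # saves the parameters only

# Rebuild the architecture first, then load the saved parameters into it.
# map_location remaps stored tensors onto the device that is available now.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path, map_location=device))
restored.to(device)
restored.eval()  # switch to evaluation mode before inference

x = torch.randn(3, 4, device=device)
with torch.no_grad():
    same = torch.allclose(model(x), restored(x))
print("outputs match:", same)
```

Because only the state dictionary is stored, the class definition of the network must be available when the file is loaded again.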

Word Embeddings and Continuous Bag of Words in PyTorch

Refer to the post on ‘Word2Vec using PyTorch’ for implementations of both the Continuous Bag of Words and Skipgram approaches. These implementations omit negative sampling and the other usual optimizations, which makes it easy to illustrate or experiment with the fundamental concepts.
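
As a rough illustration of how the two approaches differ, the sketch below builds training examples from a short sentence: Continuous Bag of Words predicts a word from its surrounding context, while Skipgram predicts each context word from the center word. This is not the code from the referenced post; the function `training_pairs` and the window size are assumptions chosen for this example.

```python
def training_pairs(tokens, window=2):
    """Build CBOW (context, target) examples and skip-gram (target, context word) pairs."""
    cbow, skipgram = [], []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        cbow.append((context, target))                 # CBOW: context words -> target
        skipgram.extend((target, c) for c in context)  # skip-gram: target -> each context word
    return cbow, skipgram


tokens = "the quick brown fox jumps".split()
cbow, skipgram = training_pairs(tokens, window=1)
print(cbow[1])       # (['the', 'brown'], 'quick')
print(skipgram[:3])  # [('the', 'quick'), ('quick', 'the'), ('quick', 'brown')]
```

In a full model, the words would be mapped to integer indices and passed through an embedding layer before the prediction step.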