About

Researcher, educator, and solver of computationally intensive mathematical problems. I currently work as a Computational Scientist at Virginia Tech.

Rclone for Data Transfer - Google Drive

Data backup

Rclone is a tool for data transfer to and from a variety of sources, including your local machine. A few commands for interacting with Google Drive and transferring data to and from a local machine are shown below.
Use the following to set up your remote for Google Drive:
rclone config
To list remotes:
rclone listremotes
remote_google:
To list the directories in this drive:
rclone lsd remote_google:
To list all the files [Read More]
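The commands in this excerpt can also be driven from a script. The following is a minimal sketch, not something taken from the full post: the helper names `rclone_cmd` and `run` are hypothetical, the remote name `remote_google:` is the one shown in the excerpt, and `rclone copy` is added here only as an illustration of a transfer. The `dry_run` default means nothing is executed unless you ask for it.

```python
import shlex
import subprocess

# Remote name as it appears in the excerpt; yours is whatever `rclone config` created.
REMOTE = "remote_google:"


def rclone_cmd(*args):
    """Build an argument list for an rclone invocation (hypothetical helper)."""
    return ["rclone", *args]


def run(cmd, dry_run=True):
    """Return the shell-quoted command when dry_run is True, otherwise execute it.

    Executing requires rclone to be installed and the remote to be configured.
    """
    if dry_run:
        return shlex.join(cmd)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    print(run(rclone_cmd("listremotes")))
    print(run(rclone_cmd("lsd", REMOTE)))
    # Illustrative transfer: copy a local folder into a folder on the remote.
    # Both paths are placeholders.
    print(run(rclone_cmd("copy", "./data", REMOTE + "backup/data")))
```

Passing `dry_run=False` runs the command for real, so it is worth checking the printed command first.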

Backup of Gitlab repositories

Reproducible Git

If you have ever had to download or back up your GitLab repositories, you probably had to do it manually for every repository you own. As of this writing, I had 54, and that was not my idea of a lazy afternoon. So I used the GitLab API and, with the help of a GitLab ‘Private Token’, set up this Python script to do the job for me. import requests import json import os def get_repo(repo): os. [Read More]
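The script itself is cut off in this excerpt, so the following is only a sketch of the general idea under stated assumptions, not the author's code. It uses the standard library's `urllib` instead of `requests`, and the function names `projects_request` and `clone_commands`, the `backup` destination folder, and the choice of `git clone --mirror` are all hypothetical. The endpoint (`/api/v4/projects` with `owned=true`), the `PRIVATE-TOKEN` header, and the `http_url_to_repo` and `path_with_namespace` fields come from the GitLab REST API.

```python
import json
import urllib.request


def projects_request(base_url, token, page=1, per_page=100):
    """Build the request that lists projects owned by the token's user."""
    url = (
        f"{base_url.rstrip('/')}/api/v4/projects"
        f"?owned=true&per_page={per_page}&page={page}"
    )
    return urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})


def clone_commands(projects_json, dest_root="backup"):
    """Turn the JSON project list into one `git clone --mirror` command each."""
    commands = []
    for project in json.loads(projects_json):
        target = f"{dest_root}/{project['path_with_namespace']}"
        commands.append(
            ["git", "clone", "--mirror", project["http_url_to_repo"], target]
        )
    return commands


if __name__ == "__main__":
    # Placeholder token; a real run needs your own 'Private Token'.
    request = projects_request("https://gitlab.com", "YOUR_PRIVATE_TOKEN")
    with urllib.request.urlopen(request) as response:
        for command in clone_commands(response.read().decode("utf-8")):
            print(" ".join(command))
```

A single page of 100 results covers the 54 repositories mentioned above; a larger account would need to loop over `page`. The sketch only prints the clone commands, so running them (for example with `subprocess.run`) is a separate, deliberate step.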

Multi-GPU Computing with Pytorch (Draft)

An overview

1. Introduction
PyTorch provides a few options for multi-GPU/multi-CPU computing, in other words distributed computing. While this is unsurprising for deep learning, what is pleasantly surprising is the support for general-purpose, low-level distributed or parallel computing. Those who have used MPI will find this functionality familiar. PyTorch can be used for the following scenarios: single GPU, single node (multiple CPUs on the same node); single GPU, multiple nodes; multiple GPUs, single node; multiple GPUs, multiple nodes. PyTorch allows ‘Gloo’, ‘MPI’ and ‘NCCL’ as backends for parallelization. [Read More]
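To give a feel for the MPI-like, low-level side of this, here is a minimal sketch that is not taken from the full post. It assumes PyTorch is installed with the ‘Gloo’ backend available (the CPU-friendly one of the three backends named above), and the function names `free_port` and `all_reduce_demo` are hypothetical. Each process contributes `rank + 1` and an all-reduce sums the contributions across the group, much like `MPI_Allreduce`.

```python
import socket

import torch
import torch.distributed as dist


def free_port():
    """Ask the OS for an unused local TCP port for the rendezvous."""
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def all_reduce_demo(rank, world_size, port):
    """Join a process group, sum one value per rank, and return the total."""
    dist.init_process_group(
        backend="gloo",
        init_method=f"tcp://127.0.0.1:{port}",
        rank=rank,
        world_size=world_size,
    )
    try:
        value = torch.tensor([float(rank + 1)])
        # After this call every rank holds the sum over all ranks.
        dist.all_reduce(value, op=dist.ReduceOp.SUM)
        return value.item()
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    # Single-process run: a group of size 1, so the sum is just this rank's value.
    print(all_reduce_demo(rank=0, world_size=1, port=free_port()))
```

With more than one process, every process would call `all_reduce_demo` with its own `rank` and the same `world_size` and `port`; a launcher such as `torchrun` or `torch.multiprocessing.spawn` is the usual way to start them.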

Easiest way toward Multi-GPU training in Tensorflow 2

Quick tip

Overview
Easy parallelization over multiple GPUs can be accomplished in Tensorflow 2 using the ‘MirroredStrategy’ approach, especially if one is using Keras through the Tensorflow integration. This can be used as a replacement for ‘multi_gpu_model’ in Keras. There are a few caveats (bugs) with using this on TF2.0 (see below). An example illustrating its use is shown below where two of the GPU devices are selected. import tensorflow as tf from tensorflow. [Read More]

RVATECH/DataSummit 2020

Introduction to AutoML

The following slides are an overview of AutoML. This is an updated version of the slides presented at SuperComputing18. Additionally, this session covers an introduction to H2O for model selection and Comet.ml for hyperparameter optimization.
Introduction to AutoML
H2O
H2O is a tool that allows you to perform Automated Machine Learning. A Jupyter notebook with an introduction to H2O can be found in the GitHub repository. The binder path to the repository is located here [Read More]

Hyperparameter Optimization with Comet.ml

Data science workflow management tool & collaboration hub

Reader level: Introductory
Introduction to Comet.ml
Comet.ml is an API-driven framework for workflow management in machine learning and data science experiments. Comet’s hyperparameter optimization is roughly based on the Advisor hyperparameter black-box optimization tool. It allows you to add API calls to your code to perform optimization on a selected set of hyperparameters using Comet’s cloud service. This requires that you install the Comet Python package ‘comet_ml’. [Read More]

Tensorflow in Jupyter Notebook for Multi-GPU environments

Options/Best Practices

When running Jupyter notebooks on machines with multiple GPUs, you might want to run individual notebooks on separate GPUs to take advantage of your available resources. Obviously, this is not the only type of parallelism available in TensorFlow, but not knowing how to do this can severely limit your ability to run multiple notebooks simultaneously, since TensorFlow selects your physical device 0 for use by default. If you have two notebooks running and one happens to use up all the GPU memory on physical device 0, then your second notebook will refuse to run, complaining that it is out of memory! [Read More]
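One common way to give each notebook its own GPU is to restrict which devices CUDA exposes before TensorFlow is imported. The excerpt is cut off before it describes its options, so this is a sketch of that general technique and may not match the approach in the full post; the helper name `select_gpu` is hypothetical, while `CUDA_VISIBLE_DEVICES` and `CUDA_DEVICE_ORDER` are standard CUDA environment variables.

```python
import os


def select_gpu(physical_index):
    """Expose only one physical GPU to libraries imported afterwards."""
    # Order devices by PCI bus ID so the index matches what nvidia-smi reports.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(physical_index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Run this in the first cell of the second notebook, before importing TensorFlow.
select_gpu(1)
# import tensorflow as tf  # TensorFlow would now see physical GPU 1 as its device 0
```

The variables must be set before the framework initializes CUDA; changing them later in the same kernel has no effect, so a kernel restart is needed to switch GPUs.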

Data Science with Neptune.ml

Data science workflow management tool & collaboration hub

Reader level: Introductory
Table of Contents
1. Introduction
2. Overview of Neptune UI
3. How I used Neptune in my Keras ML project
4. What I have not covered
Introduction
Neptune.ml is a workflow management and collaboration tool for Data Science and Machine Learning (DS/ML). I have had the pleasure of testing this platform out for my own work and I must admit that I am convinced that every Data Science team needs something like this. [Read More]