SuperComputing18 Presentations

Slides

The slides below were used for presentations at the SuperComputing 2018 conference in Dallas.

Overview of PyTorch

The posts associated with these slides can be found here and here.

Quick introduction to AutoML

The post associated with these slides can be found here. Note that this is still a work in progress and will be updated periodically.

AutoML

An overview of Automated Machine Learning

Reader level: Intermediate Disclaimer: This post is a work in progress and will be updated periodically. This is not meant to be a comprehensive overview of the topic, but more of an introduction to AutoML, some tools and techniques. Overview Finding a model that works for a specific problem or a class of problems can be a time-consuming task. Usually, an engineer or a scientist determines what model class to use either based on their prior knowledge of the problem at hand or by evaluating several models and picking the best one. [Read More]

Gaussian Process Regression (Draft)

Uncertainty quantification

Reader level: Advanced Gaussian Distributions A Gaussian distribution exists over variables, i.e. the distribution explains how (relatively) frequently the values for those variables show up in observations. A Gaussian distribution for an n-dimensional vector variable is fully specified by a mean vector, μ, and a covariance matrix, Σ $$ \mathrm{x} = (x_{1}, \ldots, x_{n})^{T} \sim \mathcal{N}(\mu,\Sigma) $$ A univariate Gaussian distribution is given by $$ p(x|\mu,\sigma^2) = \dfrac{1}{\sqrt{2\pi \sigma^2}} e^{ \dfrac{ -(x - \mu)^2 }{2 \sigma^2} } $$ where μ is the mean and σ is the standard deviation for the Gaussian. [Read More]
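
As a rough illustration of the univariate density, the sketch below evaluates a Gaussian with the standard normalizing constant 1/sqrt(2πσ²) and checks numerically that it integrates to roughly one. The names `gaussian_pdf` and `integrate` are made up for this example and do not come from the post; only the Python standard library is used.

```python
import math


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Density of a univariate Gaussian with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


def integrate(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


# Density at the mean of a standard Gaussian (mu = 0, sigma = 1).
peak = gaussian_pdf(0.0)

# Almost all of the probability mass lies within 8 standard deviations of the mean.
area = integrate(gaussian_pdf, -8.0, 8.0)

print(f"p(0) = {peak:.6f}, area = {area:.6f}")
```

The density peaks at the mean, and widening σ lowers and flattens the curve while the total area stays at one.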

Word2Vec in PyTorch - Continuous Bag of Words and Skipgrams

PyTorch implementation

Reader level: Intermediate Overview of Word Embeddings Word embeddings, in short, are numerical representations of text. They are represented as ‘n-dimensional’ vectors where the number of dimensions ‘n’ is determined by the corpus size and the expressiveness desired. The larger the size of your corpus, the larger you want ‘n’. A larger ‘n’ also allows you to capture more features in the embedding. However, a larger dimension involves a longer and more difficult optimization process, so you want an ‘n’ that is just large enough; determining this size is often problem-specific. [Read More]
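
To make the idea of words as ‘n-dimensional’ vectors concrete, here is a minimal sketch with n = 3. The vectors are hand-picked for illustration rather than trained, and the `embeddings` table and `cosine_similarity` helper are hypothetical names, not part of the post's PyTorch implementation.

```python
import math

# Hypothetical 3-dimensional embeddings, chosen by hand (not learned from a corpus).
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Words used in similar contexts are expected to end up with similar vectors.
sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])

print(f"king~queen: {sim_royal:.3f}, king~apple: {sim_fruit:.3f}")
```

In a trained model the vectors would be learned parameters, and ‘n’ would typically be far larger than three.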

CS4984/5984 Big Data Summarization

Class notes

Connecting to ARC machines Cascades The ARC cluster that will be used for this class is ‘Cascades’. Detailed instructions on how to access this machine can be found here. A quick overview of how to log in and submit jobs is given below. To log in: ssh username@cascades1.arc.vt.edu where username is your PID and your password is the VT PID password followed by a comma and the two-factor six-digit code. For example, the password looks like this: [Read More]

Virtual environments for Anaconda Python

Useful conda commands

Installation using Conda To create a conda environment named ‘myenv’: conda create --name myenv To create an environment from a file ‘test.yml’: conda env create -f test.yml The environment name comes from the line ‘name: tag’ inside the ‘test.yml’ file. To create a named environment from a file ‘test.yml’: conda env create -f test.yml -n pytorch To create an environment from the base environment: conda create --name myenv --clone base To remove an environment named ‘envname’: [Read More]

PEARC 2018 Workshop

Workshop slides and Jupyter Notebooks

This page contains the materials for the workshop ‘Introduction to Machine Learning’, which has been accepted for presentation at PEARC18. Participants will have access to a server running the relevant Python 3 installation along with the tools TensorFlow and Keras. If you would like to install your own environment, please check the bottom of this page to download a conda environment file that can be used for configuration. The Jupyter notebooks can be downloaded from here. [Read More]