About

Researcher, educator, and solver of computationally intensive mathematical problems. I am currently employed as a Computational Scientist at Virginia Tech.

Data Science with Neptune.ml

Data science workflow management tool & collaboration hub

Reader level: Introductory

Table of Contents
1. Introduction
2. Overview of Neptune UI
3. How I used Neptune in my Keras ML project
4. What I have not covered

Introduction

Neptune.ml is a workflow management and collaboration tool for Data Science and Machine Learning (DS/ML). I have had the pleasure of testing this platform for my own work, and I am convinced that every Data Science team needs something like this. [Read More]

Self-attention for Text Analytics

Visualization

Reader level: Intermediate

This post covers the self-attention mechanism as presented in the paper ‘A Structured Self-attentive Sentence Embedding’, which is, IMHO, one of the best papers for illustrating how self-attention works in Natural Language Processing. The structure of self-attention is shown in the image below, courtesy of the paper:

Suppose one has an LSTM of dim ‘u’ that takes as input batches of sentences of ‘n’ words each. [Read More]
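For reference, the attention computation in the paper named above can be summarized as follows. This is a sketch in the paper's notation and assumes the bidirectional LSTM used there, so each of the ‘n’ hidden states has size 2u. The symbols H, W_{s1}, W_{s2}, d_a and r do not appear in the excerpt above.

$$ H \in \mathbb{R}^{n \times 2u}, \qquad A = \mathrm{softmax}\left( W_{s2} \tanh\left( W_{s1} H^{T} \right) \right), \qquad M = A H $$

Here W_{s1} has shape d_a × 2u and W_{s2} has shape r × d_a, where d_a and r are hyperparameters. The softmax is taken over the ‘n’ words, so each of the r rows of A is a set of attention weights that sums to 1. M is the resulting r × 2u sentence embedding.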

Using RQ for scheduling tasks

RQ and remote scheduling

Reader level: Introductory

RQ can be used to set up queues for executing long-running tasks on local or remote machines. Some steps on how to install and get started with RQ are listed below.

Installation

Create a virtual environment, then install the following components:

Redis-server
RQ
RQ-scheduler

Install Redis using the following commands:

    wget http://download.redis.io/redis-stable.tar.gz
    tar xvzf redis-stable.tar.gz
    cd redis-stable
    make

Run ‘make test’ to make sure things are working properly, followed by ‘sudo make install’ to complete the installation. [Read More]
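Once Redis and RQ are installed, queuing a task might look roughly like the sketch below. It is a minimal illustration rather than the code from the full post: `count_words` and `enqueue_count` are hypothetical names, and the host and port are assumed to be the Redis defaults.

```python
def count_words(text):
    """Stand-in for a long-running task; an RQ worker imports and runs it."""
    return len(text.split())


def enqueue_count(text, host="localhost", port=6379):
    """Put count_words(text) on the default RQ queue and return the Job."""
    # Imported here so the task function itself can be imported
    # on machines that do not have Redis or RQ installed.
    from redis import Redis
    from rq import Queue

    # Assumption: a Redis server is reachable at host:port.
    queue = Queue(connection=Redis(host=host, port=port))
    return queue.enqueue(count_words, text)
```

The functions need to live in an importable module (for example a hypothetical `tasks.py`) so that a worker started with `rq worker` can locate `count_words` when it picks up the job.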

Publishing Jupyter Notebooks using Gatsby and Netlify

A quick overview

Reader level: Introductory

Start the Gatsby development server with the following command. This starts a server on port 8000, which you can open in your browser. You can also access the GraphQL query page at localhost:8000/___graphql.

    gatsby develop

Once you are done developing, you can build the website so it can be deployed to a server such as Netlify or GitLab Pages.

    gatsby build

Once you have the above, you can go ahead and set up your Netlify account and link your current folder. [Read More]

SuperComputing18 Presentations

Slides

The slides below were used for presentations at the SuperComputing 2018 conference in Dallas.

Overview of PyTorch

The posts associated with these slides can be found here and here.

Quick introduction to AutoML

The post associated with these slides can be found here. Note that this is still a work in progress and will be updated periodically.

AutoML

An overview of Automated Machine Learning

Reader level: Intermediate

Disclaimer: This post is a work in progress and will be updated periodically. It is not meant to be a comprehensive overview of the topic, but rather an introduction to AutoML and some of its tools and techniques.

Overview

Finding a model that works for a specific problem or a class of problems can be a time-consuming task. Usually, an engineer or a scientist determines which model class to use, either based on their prior knowledge of the problem at hand or by evaluating several models and picking the best one. [Read More]

Gaussian Process Regression (Draft)

Uncertainty quantification

Reader level: Advanced

Gaussian Distributions

A Gaussian distribution is defined over variables, i.e. it describes how (relatively) frequently the values of those variables show up in observations. A Gaussian distribution for an n-dimensional vector variable is fully specified by a mean vector μ and a covariance matrix Σ:

$$ \mathrm{x} = (x_{1},\ldots,x_{n})^{T} \sim \mathcal{N}(\mu,\Sigma) $$

A univariate Gaussian distribution is given by

$$ p(x|\mu,\sigma^2) = \dfrac{1}{\sqrt{2\pi \sigma^2}} e^{ -\dfrac{ (x - \mu)^2 }{2 \sigma^2} } $$

where μ is the mean and σ is the standard deviation of the Gaussian. [Read More]
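As a quick numerical check, the univariate density with the usual normalizing constant 1/sqrt(2πσ²) can be evaluated directly. The sketch below uses only the Python standard library; `gaussian_pdf` is a name chosen for illustration, and the result is compared against `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def gaussian_pdf(x, mu, sigma):
    """Density of a univariate Gaussian with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    variance = sigma ** 2
    normalizer = 1.0 / math.sqrt(2.0 * math.pi * variance)
    return normalizer * math.exp(-((x - mu) ** 2) / (2.0 * variance))


if __name__ == "__main__":
    # Compare the hand-written density with the standard library's version.
    for x in (-1.0, 0.0, 1.0):
        ours = gaussian_pdf(x, mu=0.0, sigma=1.0)
        reference = NormalDist(mu=0.0, sigma=1.0).pdf(x)
        print(f"x={x:+.1f}  pdf={ours:.6f}  stdlib={reference:.6f}")
```

The density peaks at x = μ and is symmetric about the mean, which the printed values for x = -1 and x = +1 illustrate.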

Word2Vec in Pytorch - Continuous Bag of Words and Skipgrams

Pytorch implementation

Reader level: Intermediate

Overview of Word Embeddings

Word embeddings, in short, are numerical representations of text. They are represented as ‘n-dimensional’ vectors, where the number of dimensions ‘n’ is determined by the corpus size and the expressiveness desired. The larger your corpus, the larger you want ‘n’ to be. A larger ‘n’ also allows you to capture more features in the embedding. However, a larger dimension involves a longer and more difficult optimization process, so you want an ‘n’ that is just large enough. Determining this size is often problem-specific. [Read More]
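The two training setups named in the title differ in how they pair a word with its neighbours: skip-gram predicts each context word from the center word, while CBOW predicts the center word from its whole context. The sketch below builds both kinds of training pairs from a tokenized sentence. It is an illustration rather than the post's implementation; the function names and the window size are assumptions.

```python
def context_windows(tokens, window=2):
    """Yield (center, context_words) for every position in tokens."""
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield center, left + right


def skipgram_pairs(tokens, window=2):
    """One (center, context_word) pair per neighbour of each center word."""
    return [
        (center, ctx)
        for center, context in context_windows(tokens, window)
        for ctx in context
    ]


def cbow_pairs(tokens, window=2):
    """One (context_words, center) pair per position in the sentence."""
    return [(context, center) for center, context in context_windows(tokens, window)]


if __name__ == "__main__":
    sentence = "the quick brown fox".split()
    print(skipgram_pairs(sentence, window=1))
    print(cbow_pairs(sentence, window=1))
```

In practice each word would be mapped to an integer index before the pairs are fed to an embedding layer of dimension ‘n’.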

CS4984/5984 Big Data Summarization

Class notes

Connecting to ARC machines

Cascades

The ARC cluster that will be used for this class is ‘Cascades’. Detailed instructions on how to access this machine can be found here. A quick overview of how to log in and submit jobs is given below. To log in:

    ssh username@cascades1.arc.vt.edu

where username is your PID and your password is the VT PID password followed by a comma and the two-factor six-digit code. For example, the password looks like this: [Read More]