Researcher, Educator and solver of computationally-intensive mathematical problems. Currently I am employed as a Computational Scientist at Virginia Tech.

## Data Science with Neptune.ml

### Data science workflow management tool & collaboration hub

Reader level: Introductory

Table of Contents

1. Introduction
2. Overview of Neptune UI
3. How I used Neptune in my Keras ML project
4. What I have not covered

Introduction

Neptune.ml is a workflow management and collaboration tool for Data Science and Machine Learning (DS/ML). I have had the pleasure of testing this platform out for my own work and I must admit that I am convinced that every Data Science team needs something like this. [Read More]

## Self-attention for Text Analytics

### Visualization

Reader level: Intermediate

This post covers the self-attention mechanism as presented in the paper ‘A Structured Self-attentive Sentence Embedding’, which is, IMHO, one of the best papers for illustrating how self-attention works in Natural Language Processing. The structure of self-attention is shown in the image below, courtesy of the paper: Suppose one has an LSTM of dim ‘u’ that takes as input batches of sentences of ‘n’ words each. [Read More]
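The attention computation from the paper can be sketched in a few lines of NumPy. This is a minimal illustration, not the post's own implementation: it assumes a bidirectional LSTM (so each of the ‘n’ words has a hidden state of size 2u), and the names `structured_self_attention`, `W_s1`, `W_s2`, `d_a` and `r`, as well as the random inputs, are chosen here for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def structured_self_attention(H, W_s1, W_s2):
    """Compute A = softmax(W_s2 tanh(W_s1 H^T)) and M = A H.

    H    : (n, 2u) hidden states, one row per word
    W_s1 : (d_a, 2u) first projection
    W_s2 : (r, d_a) second projection, one row per attention hop
    """
    scores = W_s2 @ np.tanh(W_s1 @ H.T)  # shape (r, n)
    A = softmax(scores, axis=1)          # each hop's weights sum to 1 over the words
    M = A @ H                            # shape (r, 2u): the sentence embedding
    return A, M


# Random stand-ins for the LSTM outputs and the learned weights (assumptions).
rng = np.random.default_rng(0)
n, u, d_a, r = 6, 4, 5, 3
H = rng.standard_normal((n, 2 * u))
W_s1 = rng.standard_normal((d_a, 2 * u))
W_s2 = rng.standard_normal((r, d_a))

A, M = structured_self_attention(H, W_s1, W_s2)
print(A.shape, M.shape)  # (3, 6) (3, 8)
```

Each row of `A` is a probability distribution over the words of the sentence, which is what makes the attention weights easy to visualize.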

## Using RQ for scheduling tasks

### RQ and remote scheduling

Reader level: Introductory

RQ can be used to set up queues for executing long-running tasks on local or remote machines. Some steps on how to install and get started with RQ are listed below.

Installation

Create a virtual environment and install the following components:

- Redis-server
- RQ
- RQ-scheduler

Install Redis using the following:

    wget http://download.redis.io/redis-stable.tar.gz
    tar xvzf redis-stable.tar.gz
    cd redis-stable
    make

Run ‘make test’ to make sure things are working properly, followed by ‘sudo make install’ to complete the installation. [Read More]
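Once Redis and RQ are installed, queueing a task might look roughly like the sketch below. It assumes a Redis server running locally on the default port and a worker started with `rq worker`; the task `count_words` and the helper `enqueue_count` are hypothetical names used only for illustration.

```python
def count_words(text):
    """Example task; stands in for a long-running job."""
    return len(text.split())


def enqueue_count(text):
    """Put count_words(text) on the default RQ queue and return the job."""
    # Imported here so that defining the task does not itself require
    # Redis or RQ; in a real project these would sit at the top of the module.
    from redis import Redis
    from rq import Queue

    queue = Queue(connection=Redis())  # assumes Redis on localhost:6379
    return queue.enqueue(count_words, text)
```

RQ workers need to import the task by its module path, so in practice `count_words` would live in its own module rather than in the script that is run directly.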

## Publishing Jupyter Notebooks using Gatsby and Netlify

### A quick overview

Reader level: Introductory

Build a Gatsby website using the following command. This will start a server running at port 8000, which you can navigate to using your browser. You can also access the GraphQL query page at localhost:8000/___graphql.

    gatsby develop

Once you are done developing, you can build this website so it can be deployed to a server such as Netlify or GitLab Pages.

    gatsby build

Once you have the above, you can go ahead and set up your Netlify account and link your current folder. [Read More]

## SuperComputing18 Presentations

### Slides

The slides below were used for presentations at the SuperComputing 2018 conference in Dallas.

### Overview of PyTorch

The posts associated with these slides can be found here and here.

### Quick introduction to AutoML

The post associated with these slides can be found here. Note that this is still a work in progress and will be updated periodically.

## AutoML

### An overview of Automated Machine Learning

Reader level: Intermediate

Disclaimer: This post is a work in progress and will be updated periodically. It is not meant to be a comprehensive overview of the topic, but more of an introduction to AutoML and some of its tools and techniques.

Overview

Finding a model that works for a specific problem or a class of problems can be a time-consuming task. Usually, an engineer or a scientist determines what model class to use either based on prior knowledge of the problem at hand or by evaluating several models and picking the best one. [Read More]

## Gaussian Process Regression (Draft)

### Uncertainty quantification

Reader level: Advanced

Gaussian Distributions

A Gaussian distribution is defined over variables, i.e. the distribution describes how (relatively) frequently the values of those variables show up in observations. A Gaussian distribution for an n-dimensional vector variable is fully specified by a mean vector, μ, and a covariance matrix, Σ:

$$\mathrm{x} = (x_{1}, \ldots, x_{n})^{T} \sim \mathcal{N}(\mu,\Sigma)$$

A univariate Gaussian distribution is given by

$$p(x|\mu,\sigma^2) = \dfrac{1}{\sqrt{2\pi \sigma^2}} e^{ -\dfrac{ (x - \mu)^2 }{2 \sigma^2} }$$

where μ is the mean and σ is the standard deviation of the Gaussian. [Read More]
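As a quick sanity check, the standard univariate Gaussian density, with normalisation 1/sqrt(2πσ²), can be evaluated directly and integrated numerically. This is a small sketch; the function name `gaussian_pdf` and the values of μ and σ are arbitrary choices made here for illustration.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a univariate Gaussian with mean mu and standard deviation sigma."""
    var = sigma ** 2
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


mu, sigma = 1.0, 2.0  # illustrative values

# Riemann sum over [mu - 8*sigma, mu + 8*sigma]; the mass outside is negligible.
dx = 0.001
steps = int(16 * sigma / dx)
area = sum(gaussian_pdf(mu - 8 * sigma + i * dx, mu, sigma) for i in range(steps + 1)) * dx

print(gaussian_pdf(mu, mu, sigma))  # peak height, 1 / sqrt(2 * pi * sigma**2)
print(area)                         # close to 1.0
```

The density integrates to one only with the square root in the normalising constant, which is an easy detail to check this way.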

## Word2Vec in PyTorch - Continuous Bag of Words and Skipgrams

### PyTorch implementation

Reader level: Intermediate

Overview of Word Embeddings

Word embeddings, in short, are numerical representations of text. They are represented as ‘n-dimensional’ vectors, where the number of dimensions ‘n’ is determined by the corpus size and the expressiveness desired. The larger your corpus, the larger you want ‘n’ to be. A larger ‘n’ also allows you to capture more features in the embedding. However, a larger dimension involves a longer and more difficult optimization process, so you want an ‘n’ that is just large enough; determining this size is often problem-specific. [Read More]
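To make the idea of an ‘n-dimensional’ representation concrete, the sketch below builds a tiny embedding table in NumPy: one row of length ‘n’ per vocabulary word. The toy corpus, the choice n = 5 and the random initial values are assumptions for illustration; in Word2Vec the rows would be learned by training rather than left random.

```python
import numpy as np

# Toy corpus (an assumption for illustration).
corpus = "the quick brown fox jumps over the lazy dog".split()

# Map each distinct word to a row index.
vocab = sorted(set(corpus))
word_to_ix = {word: ix for ix, word in enumerate(vocab)}

n = 5  # embedding dimension, chosen arbitrarily here
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((len(vocab), n))  # one n-dimensional row per word


def embed(word):
    """Look up the embedding vector for a word in the vocabulary."""
    return embeddings[word_to_ix[word]]


print(embeddings.shape)    # (8, 5)
print(embed("fox").shape)  # (5,)
```

The table has vocabulary-size times ‘n’ parameters, which is one way to see why a larger ‘n’ makes the optimization longer.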