Researcher, educator and solver of computationally intensive mathematical problems. I am currently a Senior Data Science Developer Advocate at Databricks, where I focus on Data Science and Machine Learning. Prior to this, I was a Computational Scientist at Virginia Tech (2014 - 2020), where I had the privilege of working with some great minds and state-of-the-art science.

SuperComputing18 Presentations

Slides

The slides below were used for presentations at the SuperComputing 2018 conference in Dallas.

Overview of PyTorch

The posts associated with these slides can be found here and here.

Quick introduction to AutoML

The post associated with these slides can be found here. Note that this is still a work in progress and will be updated periodically.

AutoML

An overview of Automated Machine Learning

Reader level: Intermediate

Disclaimer: This post is a work in progress and will be updated periodically. It is not meant to be a comprehensive overview of the topic, but rather an introduction to AutoML and some of its tools and techniques.

Overview

Finding a model that works for a specific problem or a class of problems can be a time-consuming task. Usually, an engineer or a scientist decides which model class to use either from prior knowledge of the problem at hand or by evaluating several models and picking the best one. [Read More]
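To make "evaluating several models and picking the best one" concrete, here is a minimal sketch in plain Python. It is not the approach of any particular AutoML tool: the two candidate models (a constant predictor and a least-squares line), the synthetic data, and the helper names `fit_constant`, `fit_linear`, `mse` and `select_best` are all assumptions made for illustration.

```python
import random
from statistics import mean


def fit_constant(xs, ys):
    """Candidate 1: always predict the mean of the training targets."""
    c = mean(ys)
    return lambda x: c


def fit_linear(xs, ys):
    """Candidate 2: ordinary least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def mse(model, xs, ys):
    """Mean squared error of a fitted model on held-out data."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def select_best(candidates, train, valid):
    """Fit every candidate on the training split and score it on the validation split."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mse(model, *valid)
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic data (an assumption for this sketch): a noisy straight line.
rng = random.Random(0)
xs = [i / 10 for i in range(40)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]

# Alternate points between the training and validation splits.
train = (xs[::2], ys[::2])
valid = (xs[1::2], ys[1::2])

candidates = {"constant": fit_constant, "linear": fit_linear}
best_name, scores = select_best(candidates, train, valid)
print(best_name, scores)
```

An AutoML system automates this loop over a much larger space of model classes and hyperparameters, usually with a smarter search strategy than trying every candidate.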

Gaussian Process Regression (Draft)

Uncertainty quantification

Reader level: Advanced

Gaussian Distributions

A Gaussian distribution is defined over variables, i.e. the distribution describes how (relatively) frequently the values of those variables show up in observations. A Gaussian distribution for an n-dimensional vector variable is fully specified by a mean vector, μ, and a covariance matrix, Σ $$\mathrm{x} = (x_{1}, \ldots, x_{n})^{T} \sim \mathcal{N}(\mu,\Sigma)$$ A univariate Gaussian distribution is given by $$p(x|\mu,\sigma^2) = \dfrac{1}{\sqrt{2\pi \sigma^2}} e^{ -\dfrac{ (x - \mu)^2 }{2 \sigma^2} }$$ where μ is the mean and σ is the standard deviation of the Gaussian. [Read More]
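As a quick numerical check of the univariate density, the sketch below evaluates it directly, using the normalizing constant 1/sqrt(2πσ²), and compares the result with `statistics.NormalDist` from the Python standard library. The function name `gaussian_pdf` and the sample values are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density with mean mu and standard deviation sigma."""
    variance = sigma ** 2
    norm_const = 1.0 / math.sqrt(2.0 * math.pi * variance)
    return norm_const * math.exp(-((x - mu) ** 2) / (2.0 * variance))


# Compare against the standard library at a few illustrative points.
mu, sigma = 1.5, 0.7
points = [-1.0, 0.0, 1.5, 3.0]
ours = [gaussian_pdf(x, mu, sigma) for x in points]
reference = [NormalDist(mu, sigma).pdf(x) for x in points]

for x, a, b in zip(points, ours, reference):
    print(f"x={x:5.2f}  ours={a:.6f}  stdlib={b:.6f}")
```

The density peaks at x = μ, where its value is 1/sqrt(2πσ²), and a smaller σ gives a taller, narrower curve.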

Word2Vec in PyTorch - Continuous Bag of Words and Skipgrams

PyTorch implementation

Reader level: Intermediate

Overview of Word Embeddings

Word embeddings, in short, are numerical representations of text. They are represented as ‘n-dimensional’ vectors, where the number of dimensions ‘n’ is determined by the corpus size and the expressiveness desired. The larger your corpus, the larger you want ‘n’ to be. A larger ‘n’ also allows you to capture more features in the embedding. However, a larger dimension involves a longer and more difficult optimization process, so you want an ‘n’ that is just large enough. Determining this size is often problem-specific. [Read More]
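The idea of an embedding as an ‘n-dimensional’ vector per word can be sketched as a lookup table: each word in the vocabulary gets an index, and the index selects a row of n numbers. The plain-Python sketch below is illustrative only; the toy corpus, the choice n = 4, and the helper names `build_vocab`, `init_embeddings` and `embed` are assumptions. In PyTorch, `torch.nn.Embedding` plays the role of the table, and training (for example with CBOW or skip-gram) adjusts its rows.

```python
import random


def build_vocab(tokens):
    """Map each distinct token to an integer index, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def init_embeddings(vocab_size, n, seed=0):
    """Create a vocab_size x n table of small random values (untrained embeddings)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(vocab_size)]


def embed(word, vocab, table):
    """Look up the n-dimensional vector for a word."""
    return table[vocab[word]]


corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = build_vocab(corpus)
table = init_embeddings(len(vocab), n=4)

vector = embed("fox", vocab, table)
print(len(vocab), len(vector))
```

The table holds `len(vocab) * n` parameters, which is one way to see why a larger ‘n’ makes optimization longer and harder.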