This page contains the slides for the ‘Data Scientist’s Python Toolbox’ workshop that I have taught at the PEARC conference.
Automated Gradient Computation
Overview of Techniques
Reader level: Beginner
Essentials of Chain Rule for Differentiation
This section describes the basics of the chain rule for differentiation, which we will use to explain automatic differentiation.
$$ \dfrac{ d(a \cdot b) }{dx} = b\dfrac{da}{dx} + a\dfrac{db}{dx} $$
For a function ‘f’ that depends on intermediate variables ‘n_i(x)’, the partial derivative with respect to an independent variable ‘x’ is the sum over all ‘n_i’ of the contributions shown below
$$ \dfrac{\partial f}{ \partial x} = \sum_{i} \dfrac{\partial f}{\partial n_{i} } \cdot \dfrac{ \partial n_i }{\partial x} $$
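The sum above can be checked numerically. The following is a minimal sketch, not taken from the original post: the functions `n1`, `n2`, and `f` are hypothetical choices made only for illustration, and the result of the chain-rule sum is compared against a central finite-difference estimate.

```python
import math


# Hypothetical intermediate variables n_1(x) and n_2(x), chosen for illustration.
def n1(x):
    return x ** 2


def n2(x):
    return math.sin(x)


# Hypothetical function f that depends on x only through n_1 and n_2.
def f(a, b):
    return a * b


def chain_rule_derivative(x):
    """Evaluate df/dx as the sum over i of (df/dn_i) * (dn_i/dx)."""
    df_dn1 = n2(x)          # partial of a*b with respect to a, at (n1(x), n2(x))
    df_dn2 = n1(x)          # partial of a*b with respect to b, at (n1(x), n2(x))
    dn1_dx = 2 * x          # derivative of x**2
    dn2_dx = math.cos(x)    # derivative of sin(x)
    return df_dn1 * dn1_dx + df_dn2 * dn2_dx


def finite_difference(x, h=1e-6):
    """Central-difference estimate of d/dx f(n1(x), n2(x))."""
    def g(t):
        return f(n1(t), n2(t))
    return (g(x + h) - g(x - h)) / (2 * h)


x0 = 1.3
analytic = chain_rule_derivative(x0)
numeric = finite_difference(x0)
print(f"chain rule: {analytic:.8f}  finite difference: {numeric:.8f}")
```

The two values should agree to several decimal places. Automatic differentiation applies the same sum mechanically to every intermediate variable, rather than relying on hand-derived partials as this sketch does.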
[Read More]
Continuous Integration for Research Computing
Best Practices for Research
Reader level: Beginner
Continuous Integration
Continuous Integration (CI) is the process of systematically testing code to ensure that changes made to a codebase do not break existing functionality. Although CI is firmly entrenched in the software development lifecycle, research projects rarely take advantage of this important step. While it may be overkill for relatively small single-owner projects, any collaborative project of a reasonable size needs regular testing to avoid absolute pandemonium.
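As a rough illustration of the kind of check a CI service would run on every change, here is a minimal sketch. The `normalize` function and its tests are hypothetical and not from the original post. The tests follow the `test_` naming convention that a test runner such as pytest discovers automatically.

```python
def normalize(values):
    """Scale a list of numbers so that they sum to 1 (hypothetical example)."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [v / total for v in values]


def test_normalize_sums_to_one():
    # A change that breaks the scaling logic would fail this assertion.
    assert normalize([1.0, 3.0]) == [0.25, 0.75]


def test_normalize_rejects_zero_total():
    # The function is expected to refuse input it cannot scale.
    try:
        normalize([0.0, 0.0])
    except ValueError:
        return
    raise AssertionError("expected ValueError for a zero total")
```

When tests like these run automatically on each commit, a collaborator's change that alters existing behavior is flagged before it is merged.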
[Read More]
Virtualization Technology for Research Computing
Virtual Machines, system and application containers
Reader level: Beginner
Virtualization Technology
Some of the virtualization technologies that span Cloud Computing services are listed below, in increasing order of resource usage. This list is not comprehensive; it is simply an overview of the popular options:
Application Containers (e.g. Docker/Rocket): An open platform for running applications in a Linux container. This is usually used to run a single process per container, and workloads requiring multiple processes are encouraged to run in separate containers that ‘talk’ to each other.
[Read More]
Introduction to Python Plot.ly for Data Visualization
Workshop Jupyter Notebook
This page contains the Jupyter notebook for the ‘Introduction to Python Plot.ly for Data Visualization’ workshop that I have taught at Virginia Tech and at the XSEDE conference.
You can download the notebook here. If it downloads with a ‘.txt’ extension, please remove that extension and rename it to ‘Plotly.ipynb’ before opening it.
Introduction to Python for Scientific Computing
Workshop slides
This page contains the slides for the ‘Python for Scientific Computing’ workshop that I have taught at Virginia Tech and at the XSEDE conference.