HPC Software
System Packages
C / C++
BLAST
Blast Example
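A minimal nucleotide search, as a sketch (the file names are placeholders; this assumes the BLAST+ tools are on your PATH, e.g. after a module load blast):

makeblastdb -in reference.fasta -dbtype nucl
blastn -query query.fasta -db reference.fasta -out results.txt

The first command builds a searchable nucleotide database from a FASTA file; the second searches it with your query sequences and writes the hits to results.txt.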
Python
Conda
Conda is one of the best package managers I have used, and it has many uses in an HPC environment.
Conda - List environments
conda env list
Conda - Create a new environment with Python 3.7
conda create -y -n python3.7 python=3.7
Conda - Create a new environment with Python 3.7 and install Jupyter
conda create -y -n jupyter python=3.7 jupyter
Conda - Delete an environment
conda env remove -y -n python3.7
Conda - Activate an environment
source activate <name-of-environment>
Conda - Deactivate the active environment
source deactivate
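Putting the commands above together, a typical session might look like this (the environment name and extra package are just examples):

conda create -y -n analysis python=3.7 numpy
source activate analysis
python -c "import numpy; print(numpy.__version__)"
source deactivate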
Jupyter Notebooks
The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.
Jupyter can be installed on your local laptop or workstation.
It can also be used on many HPC clusters, which is exciting because clusters often provide more software and access to far more CPU cores and memory.
Installing Jupyter Notebooks
If Anaconda is installed, then Jupyter is already installed.
If only conda is installed, run the following to install Jupyter Notebooks.
conda create -y -n jupyter python=3.7 jupyter
source activate jupyter
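On a cluster, the notebook server usually runs on a node with no browser, so you connect to it through an SSH tunnel. A sketch (the port number, username, and hostname are placeholders; check your site's documentation):

jupyter notebook --no-browser --port=8888
ssh -N -L 8888:localhost:8888 user@cluster.example.org

The first command runs on the cluster node; the second runs on your laptop and forwards the port, after which you can open http://localhost:8888 in your local browser.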
DASK
For the Python ecosystem, consider using Dask, which provides advanced parallelism for analytics. Why use Dask versus (or along with) other options? Dask integrates with NumPy, pandas, and scikit-learn, and it also:
- scales up to clusters with multiple nodes
- deploys on job queuing systems like PBS, Slurm, MOAB, SGE, and LSF
- scales down to parallel use of a single node, such as a server or laptop; modern laptops often have a multi-core CPU, 16-32 GB of RAM, and flash-based drives that can stream through data several times faster than the HDDs or SSDs of even a year or two ago
- supports the map-shuffle-reduce pattern popularized by Hadoop and is a smaller, lighter-weight alternative to Spark
- works with MPI via the mpi4py library and is compatible with InfiniBand and other high-speed networks
See this example of Dask Jobqueue on a PBS cluster:
from dask_jobqueue import PBSCluster
cluster = PBSCluster()
cluster.scale(10) # Ask for ten workers
from dask.distributed import Client
client = Client(cluster) # Connect this local process to remote workers
# wait for jobs to arrive, depending on the queue, this may take some time
import dask.array as da
x = ... # Dask commands now use these distributed resources
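The same API also runs on a single machine without any scheduler, using threads on your own cores. A minimal sketch with dask.array (this assumes Dask is installed, e.g. via conda install dask):

```python
import dask.array as da

# Build a lazy array of the integers 0..999999, split into ten chunks;
# nothing is computed until .compute() is called.
x = da.arange(1_000_000, chunks=100_000)

# Per-chunk partial sums run in parallel, then combine into one result.
total = x.sum().compute()
print(total)  # 499999500000
```

The chunks argument controls the unit of parallelism: each chunk becomes a task, so the same code scales from a laptop's threads to the PBS workers requested above.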