View on GitHub

Computational Techniques for Life Sciences

Parallelization

According to the Stanford CPU database, processors haven’t gotten faster since 2005.

No matter how much we’ve spent on the latest and greatest PC, sequential (single-core) programs won’t be going any faster. Even at TACC, Stampede processors ran between 2.7 and 3.5 GHz and Stampede 2 KNL processors run at 1.4 GHz!

However, transistor and core counts are increasing.

Stampede had 16 cores in the main CPU, while Stampede 2 will have 68! We can take advantage of these resources with threaded code and concurrent task scheduling.

Threaded Applications

The most proactive thing we can do is run our applications on multiple threads/processors when possible. This means reading the manual, finding the option for scaling execution, and usually setting it to correspond with the system core count.

TACC System	Cores/Node
Stampede	16
Stampede 2 KNL	68
Stampede 2 SKX	48
Lonestar 5	24
Maverick	20
Wrangler	24

The version of sort on this system does have a parallel option, so lets give it a try using our standard workflow.

$ time sort -S 100M -k1,1 -k2,2n SRR2014925.bed | head
$ time sort --parallel 24 -S 100M -k1,1 -k2,2n SRR2014925.bed | head

We can also use a loop to see where increasing the number of threads becomes ineffective.

for N in {1..6}
do
   time sort --parallel $N -S 100M -k1,1 -k2,2n SRR2014925.bed > /dev/null
done 2>&1 | grep "real" | awk '{ print NR" cores\t"$0 }'

NOTE: time writes to stderr instead of stdout, so we need the 2>&1 redirection to grep out the “real” time.

Do you see a point where extra cores becomes ineffective? Does increasing the memory limit reduce the runtime further?

Workflow scaling

Lets see how well our tophat2 workflow scales on yeast data. I made a convenience script that you can copy to your scratch directory.

$ cp /work/03076/gzynda/stampede2/ctls-public/run_tophat_yeast.sh .

If you take a look at the file

PUBLIC=/work/03076/gzynda/stampede2/ctls-public
VER=Saccharomyces_cerevisiae/Ensembl/EF4
GENES=${PUBLIC}/${VER}/Annotation/Genes/genes.gtf
REF=${PUBLIC}/${VER}/Sequence/Bowtie2Index/genome

SIZE=${1:-500K}
CORES=${2:-8}

# Load standard module
ml tophat/2.1.1 bowtie/2.3.2


for prefix in WT_{C,N}R_A_${SIZE}; do
        OUT=${prefix}_n${CORES}_tophat
        [ -e $OUT ] && rm -rf $OUT
        tophat2 -p ${CORES} -G $GENES -o $OUT --no-novel-juncs $REF ${PUBLIC}/${prefix}.fastq &> ${OUT}.log
done

You’ll notice that it aligns yeast reads against a reference and annotation that I have in my public work directory, and it takes two parameters:

number of reads [500K] or 1M
number of cores [8]

You can run it 500K reads on 4 cores using the commands

$ bash run_tophat_yeast.sh 500K 4

Lets see how well this scales using the time command, and claiming a cell in this google doc to fill out.

Do you see any interesting results?

Explore

Try experimenting with or reading about thread count on other tools you know
Try using /bin/time -v to see if memory scales with input size

Back - Monitoring Jobs — Next - Task Distribution