Parallel computing on Quest (NW cluster)
posted on January 6, 2016


This post covers parallel computing with Matlab on Quest. First, some general tips for writing a good parfor loop:

  1. Pre-define a structure called WhatToFit that specifies what inputs go into the fitting function, what models to fit, and what results to return or write to disk (e.g. which neuron, which model type, the indices of the cross-validation training and test sets, whether to return pseudo-R2s, etc.). This minimizes the inputs to the function, and WhatToFit can be parsed within each parallel execution of the function (see the sketch after this list).
  2. Make your fitting function write its result out to file or collect the results in a cell array. Indexing is complicated within parfor and you want to avoid it as much as possible.
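As a concrete example, here is a minimal sketch of how WhatToFit might be built (the field names, sizes, and output filenames are made up; adapt them to your own fitting code). Each parallel iteration then only needs WhatToFit(i) as input.

nNeurons = 50;                                    % hypothetical number of fits
for n = 1:nNeurons
    WhatToFit(n).neuronId       = n;              % which neuron to fit
    WhatToFit(n).modelType      = 'glm';          % which model type
    WhatToFit(n).trainIdx       = 1:800;          % cross-validation training indices
    WhatToFit(n).testIdx        = 801:1000;       % cross-validation test indices
    WhatToFit(n).returnPseudoR2 = true;           % whether to return pseudo-R2s
    WhatToFit(n).outFile        = sprintf('fit_neuron%03d.mat', n);  % where to write results
end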

Now, make sure your parfor loop runs as expected by testing it on your local machine; this saves a lot of debugging on Quest. Write a simple wrapper.m file that calls the parallelized function doing all the heavy lifting:

matlabpool open 2                         % open a local pool with 2 workers for testing
parfor i = 1:100
    fprintf('Fitting iteration %d\n', i);
    Result{i} = fitMyModel(WhatToFit(i)); % one independent fit per iteration
end
matlabpool close

Once you’re ready, copy all your necessary data and .m files from your local machine to your home directory on Quest. Use an FTP GUI if you have one, or use scp from the shell: type the following and enter your Quest password when prompted. If the paths are right, you should see a list of the files that were successfully copied.

scp -rp <localpath>/<localfile> <user>@quest.it.northwestern.edu:/home/<user>/<questpath>/.
scp Quest1n13c8_prod.settings <user>@quest.it.northwestern.edu:/home/<user>/.

Also make sure your Quest home directory has the Quest2_b1024.settings file, which specifies what the Parallel Computing Toolbox needs to distribute jobs to workers. If it doesn’t exist, you can grab a copy from /home/pry194 on Quest:

cp /home/pry194/Quest2_b1024.settings ~/.

Now ssh into Quest and change directory to your Quest path:

ssh -X <user>@quest.it.northwestern.edu
cd <questpath>

First, launch Matlab from the shell and make a parallel computing profile using the settings file.

module load matlab
matlab &

In the Matlab GUI, go to the parallel settings menu and import the settings file, then set this profile as the default. Strictly speaking, you should also have a validation settings file and validate the parallel computing toolbox, but let’s skip that and assume the toolbox works well.
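If you’d rather skip the GUI, the same thing can be done from the Matlab command line. Here is a sketch, assuming your Parallel Computing Toolbox version provides parallel.importProfile and parallel.defaultClusterProfile, and that the settings file sits in your home directory:

% Import the cluster profile from the settings file and make it the default.
profName = parallel.importProfile(fullfile(getenv('HOME'), 'Quest1n13c8_prod.settings'));
parallel.defaultClusterProfile(profName);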

Ok, now edit your wrapper.m file by replacing the line matlabpool open 2 with matlabpool('open', 'Quest1n13c8_prod'). This makes sure you use the correct profile, which gives you access to all the workers you requested.
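With that change, wrapper.m looks like this:

matlabpool('open', 'Quest1n13c8_prod');   % use the imported profile instead of a local pool
parfor i = 1:100
    fprintf('Fitting iteration %d\n', i);
    Result{i} = fitMyModel(WhatToFit(i));
end
matlabpool close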

Finally, make a simple text file runMyCode.txt with the following content:

#!/bin/bash
module load matlab
matlab -nodisplay -nosplash -r wrapper > log.txt &
exit

And the same for Python:

#!/bin/bash
module load python/anaconda
python simplecode.py > simplecode_result.txt &
exit

Make sure that the txt file is executable. At the shell, type:

chmod u+x runMyCode.txt

Now actually run the file: ./runMyCode.txt

That’s it! Your code should be queued to run on as many workers as you requested. The console output from Matlab will be saved in log.txt.

Ok, now you need some basic tools to monitor what the hell is going on.

showq | grep $USER

gives a list of jobs started by you. You can get your job number (usually an 8-digit number) as well as its status: idle, running, completed, etc. You can get more details on your job using the job number:

checkjob <jobnumber>

If you started a job by mistake and want to cancel it, try:

canceljob <jobnumber>

You can continuously monitor the console output by peering into ‘log.txt’ with more or vi:

more log.txt
vi log.txt