Running MATLAB on a cluster (at IU)

Getting an account

First, get an account (I'm not sure whether a student still needs a professor's approval). I would recommend getting an account on Quarry, as it seems to be less utilized than Big Red II, but suit yourself. Once you have an account, verify that you also have a directory created on the Data Capacitor, e.g. /N/dc2/scratch/your-username

https://itaccounts.iu.edu/ --> https://ams.iu.edu/skit/SkitMain.aspx

General info at:

  • https://kb.iu.edu
  • https://cybergateway.uits.iu.edu/iugateway/index

Help via email: hps-admin@iu.edu

Use Cases

There are probably two primary use cases for running MATLAB on a cluster:

  • generate independent (perhaps parameter-dependent) datasets
  • analyze existing, multiple datasets

We will demonstrate how to do the former first.

Limits on MATLAB licenses

I've been told that we (all of IU) have only 100 licenses for running MATLAB jobs. That's not many, especially if a class is trying to do, say, a parameter study involving thousands of runs. (One more reason why I personally prefer Octave or, better yet, Python. Note that Octave is not currently available on the clusters.)

Copying data

To copy data to/from the cluster, you will need some type of secure copy program. On Mac/Linux, you should have 'scp'. On Windows, you might use PuTTY's 'pscp' or WinSCP.

Batch jobs

Here's an example TORQUE (OpenPBS) script, matlab_mult.pbs, to submit MATLAB jobs. Obviously, you would want to change the email address used for job start/stop notifications.
    
    #!/bin/bash 
    #PBS -l nodes=1:ppn=1,walltime=30:00 
    #PBS -t 1-3
    #PBS -M heiland@iu.edu 
    #PBS -m abe 
    #PBS -N matlabTest 
    #PBS -o matlabTest.out
    #PBS -e matlabTest.err
    
    cd /N/u/heiland/Quarry/matlab
    
    matlab -nojvm -nodisplay -nosplash -r "myfunc($PBS_ARRAYID)" 
    
    
And here's the relevant MATLAB script, myfunc.m, that gets invoked. The 'arrayID' input parameter comes from the index range specified by "#PBS -t [range]". For demonstration purposes, it is used as a parameter controlling the sin function's frequency.

Notice that I create an output data file on shared disk space, in this case the Data Capacitor: /N/dc2/scratch/heiland

    
    function myfunc(arrayID)

    x = 0:pi/100:2*pi;
    y2 = sin(double(arrayID)*x);
    %plot(x,y2)
    mydata = [x; y2];

    % Create a unique filename using the 'arrayID'
    fname = sprintf('/N/dc2/scratch/heiland/data%d.dat', int32(arrayID));
    fileID = fopen(fname,'w');
    % If you want a header line, e.g.:
    %fprintf(fileID,'%6s %12s\n','x','y');
    
    % Create a CSV (comma separated values) ascii file:
    fprintf(fileID,'%6.2f, %6.2f\n', mydata);
    fclose(fileID);
    quit;
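Since I mentioned preferring a Python approach above, here is a minimal sketch of what the same data generation could look like in Python. This is an illustration, not tested on the cluster: the `myfunc` name, the output directory argument, and the use of the PBS_ARRAYID environment variable (mirroring the .pbs script above) are my assumptions.

```python
import math
import os

def myfunc(array_id, out_dir="."):
    """Generate (x, sin(array_id*x)) samples over [0, 2*pi] and
    write them as a two-column CSV, mirroring myfunc.m above."""
    xs = [i * math.pi / 100 for i in range(201)]      # 0 .. 2*pi, step pi/100
    ys = [math.sin(array_id * x) for x in xs]
    # Unique filename per array task, as in the MATLAB version
    fname = os.path.join(out_dir, "data%d.dat" % array_id)
    with open(fname, "w") as f:
        for x, y in zip(xs, ys):
            f.write("%6.2f, %6.2f\n" % (x, y))
    return fname

if __name__ == "__main__":
    # In a PBS array job, the task index would arrive via the
    # PBS_ARRAYID environment variable (an assumption; default 1).
    myfunc(int(os.environ.get("PBS_ARRAYID", "1")))
```

The .pbs script would then invoke `python` instead of `matlab` in its last line.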
    
    
Once you have matlab_mult.pbs and myfunc.m, submit your job via 'qsub':
    
    [heiland@q0143 ~]$ qsub matlab_mult.pbs
    
    
You can check on the status of your job via 'qstat' (e.g. qstat -u [username]):
    
    [heiland@q0143 ~]$ qstat -u heiland
    
    
Assuming the scripts run successfully, you should have output data on the Data Capacitor scratch space. Be warned: "scratch" space is exactly what it sounds like, temporary storage, so copy your data elsewhere as soon as you can.

After copying the data files from the cluster to my Mac:

    
    scp heiland@quarry.uits.indiana.edu:/N/dc2/scratch/heiland/data* .
    
    
I can then use MATLAB/Octave (or Python, etc.) to plot results, e.g.:
    
    m=csvread('data1.dat');
    plot(m(:,1),m(:,2))
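The Python equivalent might look like the sketch below, using the standard csv module to read the file; the `read_xy` helper name is mine, and matplotlib is assumed to be installed for the actual plotting.

```python
import csv

def read_xy(fname):
    """Read the two-column CSV files written by myfunc above,
    returning the x and y values as parallel lists."""
    xs, ys = [], []
    with open(fname) as f:
        for row in csv.reader(f):
            xs.append(float(row[0]))
            ys.append(float(row[1]))
    return xs, ys

# Usage (matplotlib assumed installed):
# import matplotlib.pyplot as plt
# xs, ys = read_xy('data1.dat')
# plt.plot(xs, ys)
# plt.show()
```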