Running MATLAB on a cluster (at IU)

Getting an account

First, get an account (I'm not sure whether a student still needs a professor's approval). I would recommend getting an account on Quarry, as it seems to be less utilized than Big Red II, but suit yourself. Once you have an account, verify that you also have a directory created on the Data Capacitor, e.g. /N/dc2/scratch/your-username

https://itaccounts.iu.edu/ --> https://ams.iu.edu/skit/SkitMain.aspx

General info at:

  • https://kb.iu.edu
  • https://cybergateway.uits.iu.edu/iugateway/index

Help via email: hps-admin@iu.edu

Use Cases

There are probably two primary use cases for running MATLAB on a cluster:

  • generate independent (perhaps parameter-dependent) datasets
  • analyze existing, multiple datasets

We will demonstrate how to do the former first.

Limits on MATLAB licenses

I've been told that we (all of IU) have only 100 licenses for running MATLAB jobs. That's not many, especially if a class is trying to do, say, a parameter study involving thousands of runs. (One more reason why I personally prefer Octave or, better yet, Python. Note that Octave is not currently available on the clusters.)

Copying data

To copy data to/from the cluster, you will need some type of secure copy program. On Mac/Linux, you should have 'scp'. On Windows, you might use PuTTY's 'pscp' or WinSCP.

Batch jobs

Here's an example TORQUE (OpenPBS) script, matlab_mult.pbs, to submit MATLAB jobs. Obviously, you would want to change the email address used for job start/stop notifications.
    
    #!/bin/bash 
    #PBS -l nodes=1:ppn=1,walltime=30:00 
    #PBS -t 1-3
    #PBS -M heiland@iu.edu 
    #PBS -m abe 
    #PBS -N matlabTest 
    #PBS -o matlabTest.out
    #PBS -e matlabTest.err
    
    cd /N/u/heiland/Quarry/matlab
    
    matlab -nojvm -nodisplay -nosplash -r "myfunc($PBS_ARRAYID)" 
    
    
And here's the relevant MATLAB script, myfunc.m, that gets invoked. The 'arrayID' input parameter comes from the index range specified by "#PBS -t [range]". For demonstration purposes, it is used as a parameter controlling the sin function's frequency.

Notice that I create an output data file on shared disk space, in this case the Data Capacitor: /N/dc2/scratch/heiland

    
    function myfunc(arrayID)

    x = 0:pi/100:2*pi;
    y2 = sin(double(arrayID)*x);
    %plot(x,y2)
    mydata = [x; y2];

    % Create a unique filename using the 'arrayID'
    fname = sprintf('/N/dc2/scratch/heiland/data%d.dat', int32(arrayID));
    fileID = fopen(fname,'w');
    % If you want a header line, e.g.:
    %fprintf(fileID,'%6s %12s\n','x','y');
    
    % Create a CSV (comma separated values) ascii file:
    fprintf(fileID,'%6.2f, %6.2f\n', mydata);
    fclose(fileID);
    quit;
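Since I mentioned preferring a Python approach above, here is a minimal sketch of what the same data generation could look like in Python. This is an illustration, not tested on the cluster: the `myfunc` name, the output directory argument, and the use of the PBS_ARRAYID environment variable (mirroring the .pbs script above) are my assumptions.

```python
import math
import os

def myfunc(array_id, out_dir="."):
    """Generate (x, sin(array_id*x)) samples over [0, 2*pi] and
    write them as a two-column CSV, mirroring myfunc.m above."""
    xs = [i * math.pi / 100 for i in range(201)]      # 0 .. 2*pi, step pi/100
    ys = [math.sin(array_id * x) for x in xs]
    # Unique filename per array task, as in the MATLAB version
    fname = os.path.join(out_dir, "data%d.dat" % array_id)
    with open(fname, "w") as f:
        for x, y in zip(xs, ys):
            f.write("%6.2f, %6.2f\n" % (x, y))
    return fname

if __name__ == "__main__":
    # In a PBS array job, the task index would arrive via the
    # PBS_ARRAYID environment variable (an assumption; default 1).
    myfunc(int(os.environ.get("PBS_ARRAYID", "1")))
```

The .pbs script would then invoke `python` instead of `matlab` in its last line.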
    
    
Once you have matlab_mult.pbs and myfunc.m, submit your job via 'qsub':
    
    [heiland@q0143 ~]$ qsub matlab_mult.pbs
    
    
You can check on the status of your job via 'qstat' (e.g. qstat -u [username]):
    
    [heiland@q0143 ~]$ qstat -u heiland
    
    
Assuming the scripts run successfully, you should have output data on the Data Capacitor scratch space. Be warned: "scratch" space is exactly what it sounds like, temporary storage, so copy your data elsewhere as soon as you can.

After copying the data files from the cluster to my Mac:

    
    scp heiland@quarry.uits.indiana.edu:/N/dc2/scratch/heiland/data* .
    
    
I can then use MATLAB/Octave (or Python, etc.) to plot results, e.g.:
    
    m=csvread('data1.dat');
    plot(m(:,1),m(:,2))
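The Python equivalent might look like the sketch below, using the standard csv module to read the file; the `read_xy` helper name is mine, and matplotlib is assumed to be installed for the actual plotting.

```python
import csv

def read_xy(fname):
    """Read the two-column CSV files written by myfunc above,
    returning the x and y values as parallel lists."""
    xs, ys = [], []
    with open(fname) as f:
        for row in csv.reader(f):
            xs.append(float(row[0]))
            ys.append(float(row[1]))
    return xs, ys

# Usage (matplotlib assumed installed):
# import matplotlib.pyplot as plt
# xs, ys = read_xy('data1.dat')
# plt.plot(xs, ys)
# plt.show()
```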