UCSD Triton Resource @ SDSC

Quick Status

Triton Resource Node Status

Saturday, November 21st 2009 11:09:01 AM PST


TCC Rack 3 Nodes Down (1)

tcc-3-71.local

Total TCC Nodes Up: 247

Total 256GB (PDAF) Nodes Up: 20

Total 512GB (PDAFM) Nodes Up: 8

Rack 2 Up Count: 80

Rack 3 Up Count: 77

Rack 4 Up Count: 11

Rack 5 Up Count: 79

Frequently-Asked Questions

This FAQ answers some questions we expect Triton users may have. Please join the Triton Discussion List to ask for details about the policies and services described below. More detailed discussions are also presented on the topic pages of this site. The search function at the top of each page performs a Google Custom Search on this site only. To search the main SDSC site or other UCSD sites, please use search functions on those sites' pages.



Site Search


How can I search for content on the Triton Resource Web site?
Use the search function at the top of each page of the Triton Resource Web site to perform a site-specific search. This will return only pages on this site.
How can I search the Triton Discussion list archives?
Perform a Google site search on the list server site. For example, to find pages that discuss queuing on Triton, enter the following query in Google:

site:lists.sdsc.edu/pipermail/triton-discuss queue

You can also load the archives in your browser (or download the files) and perform a direct text search. Access the complete discussion list history at the Discussion List Archives.

Accounts


Do I need an allocation to run small-scale startup jobs on Triton?
During the Early Adopter phase (prior to October 5, 2009), users can access Triton without an allocation. A UCSD Active Directory account and a Triton account are required. After October 5, an allocation will be required for all users. To request an allocation, use the Triton Affiliates and Partners Program (TAPP).
To obtain a Triton account, you may join the Triton Discussion List and obtain your account by posting a request there.
How do I get an account on Triton?
All users should request access to Triton through TAPP, the Triton Affiliates and Partners Program. For UCSD faculty, staff, and SDSC staff, access is based on the user's UCSD Active Directory account. For users at other UC campuses and for those outside of the UC system who do not have AD accounts at UCSD, accounts will be set up for them after TAPP approval.
See the documentation on this at the Accessing the Triton Resource page.
What is the limit of a default account usage before I am required to obtain an allocation to use Triton?
During the Early Adopter phase, the limit is 1000 SUs. This can be renewed on request to the Triton Discussion List. Once the formal production phase begins on October 5, 2009, users must have a TAPP allocation to run jobs on Triton.
How do I get a formal allocation to use Triton for my research project?
Make a request for a Triton allocation through TAPP, the Triton Affiliates and Partners Program. During the Early Adopter phase, make a request to the Triton Discussion List.
What is a Service Unit (SU)?

A Service Unit is an accounting measure used to calculate the cost of running jobs on Triton. Users receive allocations of SUs, which they may use to purchase or exchange for time to run jobs on the various Triton components. Triton's accounting system automatically deducts SUs from the accounts of users with allocations, based on their login credentials.

Some of Triton's resources are more expensive than others and users are charged a premium for running jobs that use those components.

What is a Base SU?
A Base Service Unit (BaseSU) is an accounting measure equal to one core-hour on Triton's main compute cluster. Jobs run on the PDAF, or large memory nodes, are charged a premium of SUs. One core-hour on a 256-GB node results in a charge of 2 SUs, while a core-hour on a 512-GB node incurs a charge of 4 SUs.
What Accounting and Charging Policies are in Use?
See also the Charge Policy page.
  1. The system uses Gold from Cluster Resources and PNL
  2. Configuration parameters of Gold are being fine-tuned at this time
  3. Charges are based on rounded CPU hours consumed, with premiums for PDAF (x2) and PDAFM (x4) nodes
  4. Rounding is to the nearest hour, and is applied after the number of CPUs is factored. For example, if you use 32 CPUs for two minutes, this is 64 CPU-minutes or one hour and four minutes, and is rounded to one hour. If 32 CPUs are used for three minutes, the 96 CPU-minutes are rounded to two CPU hours. It is assumed that rounding will balance out over time
  5. Single CPU and Small CPU-count jobs
    1. Single and small-count CPU jobs (less than eight processors) submitted to the batch queue will be allocated and charged for all eight processors on a TCC node
    2. A set of ten nodes has been designated as "shared" to accommodate these low CPU-count jobs. The queue, named small, consists of nodes tcc-2-0 through tcc-2-9. In this queue, users will only be charged for the CPUs they request, but the jobs may have to share the node and its memory with other jobs
    3. The goal is to balance single CPU usage with larger jobs to reduce blocking
  6. Charges for reserved resources
    1. Single CPU jobs submitted to the batch queue will be allocated and charged for all processors on the node. To be charged for fewer processors, a job must be submitted to the small queue.
    2. TCC nodes: Jobs larger than eight CPUs will be charged for the number of nodes that are allocated for the job. These jobs get their nodes in an exclusive mode. For example, if a job asks for 60 CPUs, it will incur a charge of 64 (since it blocks 64 CPUs and was allocated eight nodes). Since TORQUE allows specification of fewer than eight CPUs per node via the ppn directive, users are charged for the number of nodes their job is actually allocated, not the number of CPUs requested
    3. PDAF/M nodes: Jobs are charged by the amount of memory allocated, in multiples of 128GB, rounded to the nearest CPU-hour for the number of processing cores allocated. If a job requests 200GB, it will be charged for 256GB, and will get a complete PDAF node and all 32 processing cores that belong to it. Users are urged to request full PDAF/M nodes, since memory is the key asset on these nodes
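
The rounding and whole-node rules above can be sketched in a few lines of bash. This is only an illustration, not part of Gold or any Triton tooling: the function names round_cpu_hours and tcc_charged_cpus are hypothetical, exact half hours are assumed to round up, and the TCC example assumes the job packs eight CPUs onto each node.

```shell
#!/bin/bash
# Hypothetical helpers illustrating the charging rules described above.
# They are not part of Gold or of any tool installed on Triton.

# Round CPU-minutes (CPUs x wall-clock minutes) to the nearest whole CPU-hour.
round_cpu_hours() {
    local cpus=$1 minutes=$2
    local cpu_minutes=$(( cpus * minutes ))
    # Adding 30 before integer division rounds to the nearest hour
    # (assumption: an exact half hour rounds up).
    echo $(( (cpu_minutes + 30) / 60 ))
}

# CPUs charged for an exclusive TCC job: whole nodes of eight cores each
# (assumption: the job requests eight CPUs per node).
tcc_charged_cpus() {
    local requested=$1 cores_per_node=8
    local nodes=$(( (requested + cores_per_node - 1) / cores_per_node ))
    echo $(( nodes * cores_per_node ))
}

round_cpu_hours 32 2    # 64 CPU-minutes -> prints 1
round_cpu_hours 32 3    # 96 CPU-minutes -> prints 2
tcc_charged_cpus 60     # eight nodes    -> prints 64
```

Bash arithmetic is integer-only, which is why the rounding is done by adding half the divisor before dividing.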
What happens when my Early Adopter allocation runs out?
The staff watches your balance and adds SUs to your account as needed. When the system is in full production (after October 5, 2009), you will need a TAPP allocation or a special project allocation in order to run jobs, but until then your account will be renewed for the asking.
Do running jobs get stopped if my allocation runs out?
Jobs will not start unless you have enough credit in your account to run the job to completion. You can check this by running mybalance. If you submit a job when your allocation does not have enough SUs to run it to completion, it will go into the Deferred state. If your job is Deferred, run checkjob <jobid> to find out why.
Running jobs will be allowed to complete.
What are the charge calculation formulas?

Before a job can be scheduled, the system verifies available credits in the user account. It does not actually charge the account at this time, but SUs (CPU-hour credits) equal to the estimated charges must be available. The system uses values from the job script to estimate these charges according to the following formulas:

The formula for batch queue requests is:

#CPUs x #nodes x wall time

The formula for large queue requests is:

ChargeFactor x #CPUs x #nodes x wall time

The ChargeFactor for a 256-GB node is 2; for a 512-GB node it is 4.

Here are some examples:

  1. Queue batch: for a single-node job requesting four CPUs with two hours maximum wall clock time, submitted via this script:

    #PBS -q batch
    #PBS -l nodes=1:ppn=4
    #PBS -l walltime=2:00:00

    For this request to get scheduled, the account must have available

    8 x 1 x 2 = 16 SUs

    Batch queue nodes are charged for all eight CPUs regardless of how many are actually requested or used by the job. To be charged only for CPUs actually used, submit to the small queue.

  2. Queue large: for a single-node job requesting four CPUs with two hours maximum wall clock time, submitted via this script:

    #PBS -q large
    #PBS -l nodes=1:ppn=4
    #PBS -l walltime=2:00:00

    For this request to get scheduled, the account must have available

    2 x 4 x 1 x 2 = 16 SUs

    Adding 256GB memory to this request increases the CPUs required to 32. The ChargeFactor remains 2, the rate for a 256-GB node.

    #PBS -l mem=256GB

    Note: This memory request will result in actual availability of 252 GB due to system overhead.

    For this request to get scheduled, the account must have available

    2 x 32 x 1 x 2 = 128 SUs

    These jobs could be scheduled on either a 256-GB node or a 512-GB node, depending on availability and system load. If the memory-requesting job runs on a 512-GB node, the ChargeFactor would be 4, but the CPUs required would be 16, so the SUs required to schedule it remain 128.

  3. Queue large: changing the memory request to 512 GB increases the CPUs required, and guarantees the job will be scheduled on a 512-GB node with 504 GB memory available (and using a ChargeFactor of 4):

    #PBS -l mem=512GB

    For this request to get scheduled, the account must have available

    4 x 32 x 1 x 2 = 256 SUs

When any job finishes, the account gets debited by actual CPU time used (rounded to the nearest half hour).

If a job runs more than five minutes beyond its requested wall time, it will be canceled by the system. Jobs whose charges exceed the available SUs in the account will not be canceled, but will result in a negative balance that can be credited later.
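
As a rough check before submitting, the estimate formulas above can be reproduced with a small bash function. This is a sketch under stated assumptions: estimate_sus is a hypothetical name, wall time is given in whole hours, and for the large queue the caller supplies the CPU count and ChargeFactor that apply to the request.

```shell
#!/bin/bash
# estimate_sus QUEUE CPUS_PER_NODE NODES HOURS [CHARGE_FACTOR]
# Hypothetical helper that mirrors the estimate formulas above.
estimate_sus() {
    local queue=$1 cpus=$2 nodes=$3 hours=$4 factor=${5:-1}
    case "$queue" in
        batch)
            # Batch nodes are charged for all eight CPUs, whatever ppn requests.
            cpus=8
            factor=1
            ;;
        large)
            # factor is 2 for a 256-GB node and 4 for a 512-GB node.
            ;;
        *)
            echo "unknown queue: $queue" >&2
            return 1
            ;;
    esac
    echo $(( factor * cpus * nodes * hours ))
}

estimate_sus batch 4 1 2       # prints 16
estimate_sus large 4 1 2 2     # prints 16
estimate_sus large 32 1 2 2    # prints 128
estimate_sus large 32 1 2 4    # prints 256
```

The four calls correspond to the worked examples above, in the same order.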


Running Jobs


How do I run my parallel job on Triton?
See the documentation on this at the Running Parallel Jobs and the Quick Start Guide pages.
How do I charge jobs to my account?

By default, user accounts are set to charge jobs against a personal account that matches their username. If preferred, the default can be set to charge against another account, such as a project or shared account based on a TAPP or campus allocation.

If a job is submitted and the account is depleted below the estimated SUs needed to run, it will be deferred until the account is replenished. The qstat -f command will report a message similar to:

cannot debit job account - no funds

After the account balance is adjusted, the job will be able to run without being resubmitted. It will go into the idle state when Moab rechecks balances, and then get scheduled normally.

How do I check my account balance?

You can check your account balance and status by running gbalance -u <username>. This will show your personal account as well as any other accounts you can charge to.

How do I charge a job to a non-default account?

To charge to an authorized non-default account, include the line

#PBS -A <acct_name>

in your submission script, or include -A <acct_name> in your qsub command. You can charge to any account displayed by gbalance -u <your_username>.

If you are running jobs with Star-P, this option would go into your Star-P configuration file on the machine where you are running the Star-P client. In the Star-P configuration editor, select "Workload Manager Overides". Click on "Extra Arguments" and enter

-A <acct_name>

What are the queue policies for running jobs?
  1. Updated Queue Policies
    1. Improved Wait Times for Interactive PDAF and PDAFM Nodes
      1. More predictable wait times for interactive access to large memory nodes
        1. A new queue called express has been provisioned. It is active between 8 a.m. and 8 p.m. Monday through Friday
        2. Jobs submitted to this queue should have more predictable wait times, thereby facilitating usage in interactive mode
        3. Users will be allowed only one job in this queue at a time, with a status of either waiting or running, and a maximum run time of two hours
        4. Initially, the express queue will have one PDAF node and one PDAFM node. The runtime, number of reserved nodes, and hours of the day when this is available may be modified if usage so dictates
        5. This queue can also be used for batch jobs, but the intention is to have non-interactive jobs run in the other queues whenever possible.
      2. Users will always be able to get an interactive session on PDAF/M nodes for more than two hours by running qsub -I -q large, but this has a less predictable wait time before startup

See the Policies page and the Running Jobs page for more details on job queues.
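
To decide whether to try the express queue or go straight to the large queue, a script can test the current time against the express window. The sketch below is illustrative only: express_is_active is a hypothetical function, it assumes the window closes at exactly 8 p.m., and it assumes the clock of the machine running it matches the cluster's local time.

```shell
#!/bin/bash
# express_is_active DAY_OF_WEEK HOUR
#   DAY_OF_WEEK: 1 (Monday) through 7 (Sunday), as printed by `date +%u`
#   HOUR:        00 through 23, as printed by `date +%H`
# Succeeds when the time falls inside the express queue window
# (assumption: 8 a.m. inclusive to 8 p.m. exclusive, Monday through Friday).
express_is_active() {
    local day=$1
    local hour=$(( 10#$2 ))   # force base 10 so "08" and "09" are not read as octal
    [ "$day" -ge 1 ] && [ "$day" -le 5 ] && [ "$hour" -ge 8 ] && [ "$hour" -lt 20 ]
}

if express_is_active "$(date +%u)" "$(date +%H)"; then
    echo "express queue window is open"
else
    echo "express queue window is closed; consider qsub -I -q large"
fi
```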

How can I combine several serial jobs to submit to Triton on one request?
Rather than being submitted individually, jobs can be grouped together and submitted using a single batch script procedure such as the one described on the Bundling Serial Jobs page. Although it's preferable to run parallel codes whenever possible, sometimes that is not cost-effective, or the tasks are simply not parallelizable. In that case, using a procedure like this can save time and effort by organizing multiple serial jobs into a single input file and submitting them all in one step.
How can I determine how many nodes are in use at the current time?

There is a script that captures much of the qstat output and displays it in a convenient summary. The script is available at

/home/beta/scripts/node_usage

Invoke the script by running the command:

/home/beta/scripts/node_usage/nodes_in_use

It displays output similar to the following:

TCC JOBS RUNNING  107  TCC JOBS WAITING  60
PDAF JOBS RUNNING  7  PDAF JOBS WAITING  0
TCC NODES IN USE  217  PDAF NODES IN USE  12
TCC NODES REQUESTED  281  PDAF NODES REQUESTED  0
TCC CORES IN USE  1688  PDAF CORES IN USE  172
AVERAGE NODES/JOB (TCC)   2.0 AVERAGE NODES/JOB (PDAF)   1.7
AVERAGE CORES/JOB (TCC)  15.8 AVERAGE CORES/JOB (PDAF)  24.6
What are the policies for running small and interactive jobs?
  1. Single/Small CPU-count jobs
    1. Single CPU jobs submitted to the batch queue will be allocated and charged for all eight processors on a TCC node
    2. A set of ten nodes is designated "shared" to accommodate low CPU-count jobs. These nodes belong to the queue named small. Users will only be charged for the CPUs they request, but jobs in this queue may share the node with other jobs to balance single CPU usage with larger jobs and limit the idling of unused processors

    See the Policies page for more details.

  2. Improved Wait Times for Interactive PDAF and PDAFM Nodes
    1. More predictable wait times for interactive access to large memory nodes
      1. A new queue called express has been provisioned. It is active between 8 a.m. and 8 p.m. Monday through Friday
      2. Jobs submitted to this queue should have more predictable wait times, thereby facilitating usage in interactive mode
      3. Users will be allowed only one job in this queue at a time, with a status of either waiting or running, and a maximum run time of two hours
      4. Initially, the express queue will have one PDAF node and one PDAFM node. The runtime, number of reserved nodes, and hours of the day when this is available may be modified if usage so dictates
      5. This queue can also be used for batch jobs, but the intention is to have non-interactive jobs run in the other queues whenever possible.
    2. Users will always be able to get an interactive session on PDAF/M nodes for more than two hours by running qsub -I -q large, but this has a less predictable wait time before startup
How can I use a single submission script to utilize all processors on a node?

You can use the bash wait command. This will allow your submission script (and your charge time) to terminate as soon as the last process completes.

The following example puts two processes in the background, one for 10 seconds, one for 20. The whole process will complete in about 20 seconds, since the two sleep processes run simultaneously.

#!/bin/bash
date
sleep 10 &
sleep 20 &
wait
date

You can replace sleep xx with your own programs. Your PBS script will finish when the last process completes.

With batch systems, the script you submit to PBS is executed only on the first node allocated to you. Although the system may allocate additional resources to you, your script must start independent jobs on those resources. This is why in parallel batch submission scripts, mpirun (or an equivalent) is used to get codes running on the additional allocated nodes. In a trivial case, you might only request one node and run several single-CPU jobs simultaneously.
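
Extending the two-process example to fill a whole node might look like the sketch below. It is a minimal illustration rather than a recommended template: run_one is a hypothetical placeholder for your own serial program, the eight-task count assumes one task per core on a TCC node, and the walltime value and temporary output directory are arbitrary choices.

```shell
#!/bin/bash
#PBS -q batch
#PBS -l nodes=1:ppn=8
#PBS -l walltime=0:10:00

# run_one is a hypothetical placeholder; replace its body with your own serial program.
run_one() {
    local task_id=$1
    sleep 1
    echo "task $task_id done"
}

NUM_TASKS=8                 # assumption: one task per core on an eight-core TCC node
outdir=$(mktemp -d)         # scratch directory for per-task output (illustrative)

for i in $(seq 1 "$NUM_TASKS"); do
    run_one "$i" > "$outdir/task_$i.out" &   # start each task in the background
done

wait                        # return only after every background task has exited
echo "all $NUM_TASKS tasks finished; output is in $outdir"
```

Because all eight tasks run at once, the script takes about as long as the slowest task rather than the sum of all of them.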

How do I request time on the large memory Petascale Data Analysis Facility (PDAF and PDAFM)?

Users can request large memory nodes by adding a "memory required" attribute to their job. All memory requests are rounded up to the nearest multiple of 128GB. Large memory nodes are allocated and charged in the following way:

I. Exclusive Access

Memory requests of 256GB or 512GB are scheduled as exclusive-use on nodes. You are charged for 32 cores at the appropriate rate (e.g., 64 SUs per hour for 256GB and 128 SUs per hour for 512GB).

II. Shared Access

128GB and 384GB requests are always allocated on shared access nodes. With Shared Access, there may be contention for CPUs, network bandwidth, or memory bandwidth. 128GB requests cost 32 SUs per hour; 384GB requests cost 96 SUs per hour.
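
The rounding and access rules in sections I and II can be summarized in a short bash sketch. The function names are hypothetical, and the sketch assumes the quoted SU figures are rates per wall-clock hour, which works out to 32 SUs for each 128GB increment.

```shell
#!/bin/bash
# Hypothetical helpers summarizing the large-memory rules above.

# Round a memory request (in GB) up to the next multiple of 128GB.
round_mem_gb() {
    local gb=$1 chunk=128
    echo $(( (gb + chunk - 1) / chunk * chunk ))
}

# Report how an already-rounded request is scheduled.
access_mode() {
    case "$1" in
        256|512) echo exclusive ;;
        128|384) echo shared ;;
        *) echo invalid; return 1 ;;
    esac
}

# SUs charged per hour for an already-rounded request
# (assumption: the quoted figures are hourly, 32 SUs per 128GB increment).
sus_per_hour() {
    echo $(( $1 / 128 * 32 ))
}

mem=$(round_mem_gb 200)     # a 200GB request is rounded up to 256GB
echo "$mem GB, $(access_mode "$mem") access, $(sus_per_hour "$mem") SUs per hour"
```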

III. Access to Specific Memory Configurations

The Triton node allocation policy favors PDAF nodes over more expensive PDAFM nodes, but this will not ensure a job will run on a PDAF node if a PDAFM node fits the scheduler's plan. To require a specific large queue node type, use the memory feature:

#PBS -l nodes=1:mem256gb
or
#PBS -l nodes=1:mem512gb

Note: The first example is not the same as specifying:

#PBS -l mem=256GB

which will only suggest and not require a PDAF node.

Optional Request for 256GB Memory Requirements

A user may, at their discretion, also flag a submission with "Shared OK" to indicate that a 256GB request may be placed onto a shared access node. The only benefit for the user is that queue times may be shorter because the Shared Access 512-GB nodes can also match the request.

Recommendation: It is highly recommended that users of large memory nodes request either 256GB or 512GB memory allocations. In this way, applications are guaranteed exclusive access to nodes and will have more predictable performance.

How many SUs will I be charged to run a 3-hour job that requires 384 GB on the large memory nodes?

Jobs on the PDAF and PDAFM (large memory) nodes are allocated on the basis of processing cores required. There are 32 cores on Triton's PDAF and PDAFM nodes. A 256-GB (PDAF) node has 8 GB of memory per core, and a 512-GB (PDAFM) node has 16 GB per core.

The PDAF and PDAFM nodes are allocated in 128GB chunks (either eight or 16 cores). For a 384-GB job, users must request either 384 GB (24 cores) or 512 GB (32 cores). A 384-GB request may result in sharing the node with another process (to consume the remaining 128 GB). For exclusive access to the node, a user must request 512 GB of memory, and would be allocated 32 cores. The charge would be:

3 hours * 32 Cores * Price/Core = 96 * (4 * BaseSU) = 384 BaseSU

For Shared Access to the node, the calculation would be:

3 hours * 24 Cores * (4 * BaseSU) = 288 BaseSU
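
Both calculations follow the same pattern: convert the memory request into cores using the node's memory per core, then multiply by the hours and the premium. A minimal bash sketch, with large_mem_charge as a hypothetical helper name and whole hours assumed:

```shell
#!/bin/bash
# large_mem_charge HOURS MEM_GB NODE_TYPE
# Hypothetical helper: charge in BaseSUs for a large-memory job.
large_mem_charge() {
    local hours=$1 mem_gb=$2 node_type=$3
    local gb_per_core factor
    case "$node_type" in
        pdaf)  gb_per_core=8;  factor=2 ;;   # 256-GB node: 8 GB per core, x2 premium
        pdafm) gb_per_core=16; factor=4 ;;   # 512-GB node: 16 GB per core, x4 premium
        *) echo "unknown node type: $node_type" >&2; return 1 ;;
    esac
    local cores=$(( mem_gb / gb_per_core ))
    echo $(( hours * cores * factor ))
}

large_mem_charge 3 512 pdafm   # exclusive access: prints 384
large_mem_charge 3 384 pdafm   # shared access:    prints 288
```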

To run a job on the large memory nodes, your batch script should specify the queue "large":

#PBS -q large

One additional complication: due to overhead, a small amount of memory is consumed by the system from each processing core. Thus, for a 128-GB allocation, only about 126 GB is actually available to the job. Similarly, for a 256-GB allocation only 252 GB is available, and for a 512-GB allocation 504 GB is available. One must request the next highest 128-GB increment (and be allocated and charged as such) in order to have the nominal amount of memory actually accessible to the job.
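
The three figures quoted above are all consistent with about 2 GB of overhead per 128-GB increment. Assuming that pattern also holds for a 384-GB allocation (an inference, not a documented figure), the following sketch picks the smallest allocation that leaves a given amount of memory usable. Both function names are hypothetical.

```shell
#!/bin/bash
# Usable memory for an allocation, assuming about 2 GB of system overhead
# per 128-GB increment (inferred from the 126/252/504 GB figures above).
usable_gb() {
    local alloc=$1
    echo $(( alloc - alloc / 128 * 2 ))
}

# Smallest allocation (in GB) that leaves at least NEED GB usable.
alloc_for_usable() {
    local need=$1 alloc
    for alloc in 128 256 384 512; do
        if [ "$(usable_gb "$alloc")" -ge "$need" ]; then
            echo "$alloc"
            return 0
        fi
    done
    echo "no single node offers $need GB of usable memory" >&2
    return 1
}

usable_gb 256           # prints 252
alloc_for_usable 256    # prints 384
```

A job that truly needs 256 GB therefore has to request 384 GB, and is allocated and charged accordingly.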


System Specifications


What are the hardware specifications of Triton?
See the documentation on hardware beginning at the Hardware Overview page.
What software is available on Triton?
Software additions and updates
  • Latest updates of CentOS 5.3
  • New applications and compilers (partial list)
    • Intel Compilers with MPICH, OpenMPI Libraries
    • Star-P for "scalable" Matlab
    • FSA
    • HDF4, HDF5, NetCDF
    • GAMESS, NAMD
    • Additional Python Packages (nose, numpy, scipy)
  • Between-release software area — a new area has been created to "preview" software prior to the next release of Triton. This is located at /home/beta
Also see the documentation on software at the Software Packages page.
What type of jobs are best suited to run on Triton?

Triton is designed to support jobs that require very large amounts of memory. The eight PDAFM nodes each have 512 gigabytes of memory, which is more than most HPC platforms available today anywhere in the research community.

Triton is also best utilized for parallel jobs that use multiple processing cores simultaneously rather than serial jobs that run on single processing cores. However, single-CPU jobs can be run on Triton, either by submitting to the shared-node queue named small or by paying a premium for allocation of multiple processing cores.

To run multiple serial jobs with a single job submission request, see the documentation on bundling at the Bundling Serial Jobs page.


Linking with Libraries


What is an example of linking the FFTW library to my C or Fortran program?

PGI example

Software: FFTW
Roll: fftw_pgi
Location: /opt/pgi/fftw_pgi
Include: /opt/pgi/fftw_pgi/include
Lib: /opt/pgi/fftw_pgi/lib

PGI Usage

C interface:

pgcc -o fftw-testc fftw-test.c -I/opt/pgi/fftw_pgi/include -L/opt/pgi/fftw_pgi/lib -lfftw3

FORTRAN interface:

pgf90 -o fftw-testf fftw-test.f -I/opt/pgi/fftw_pgi/include -L/opt/pgi/fftw_pgi/lib -lfftw3

Intel example

Software: FFTW
Roll: fftw_intel
Location: /opt/intel/fftw_intel
Include: /opt/intel/fftw_intel/include
Lib: /opt/intel/fftw_intel/lib

Intel Usage

C interface:

icc -o fftw-testc fftw-test.c -I/opt/intel/fftw_intel/include -L/opt/intel/fftw_intel/lib -lfftw3

FORTRAN interface:

ifort -o fftw-testf fftw-test.f -I/opt/intel/fftw_intel/include -L/opt/intel/fftw_intel/lib -lfftw3

Find out more about Triton compilers on the Compile Jobs and Compiling Parallel Codes pages.


Debugging


Parallel Debuggers on Triton

What parallel debugging software is available on Triton?
Triton has a license for the DDT debugger.

Debugging with DDT

How can I use DDT to debug my parallel codes?
DDT on Triton may be run as follows:
  1. Log in to Triton with X11 forwarding turned on (-X option to ssh command)
  2. Put this line in your .bash_profile file

    export DDT_LICENSE_FILE=/home/beta/ddt/License.client

  3. Run this command to reload the current shell environment:

    . ~/.bash_profile

  4. Make sure your code is compiled with optimization turned off by compiling with -O0 (that is capital letter "O" followed by number zero), and symbol table information enabled by compiling with the -g option
  5. Run this command to start the DDT client:

    /home/beta/ddt/bin/ddt

  6. To start a debugging session, from the "Session" menu, select "New Session" and then "Run" from the submenu.
  7. In the "Run" window, enter the full path to the executable in the "Application" field and any command line arguments in the "Arguments" field
  8. In the "Run" window, click on "Change" and select the correct "MPI Implementation". If you encounter a problem while debugging, select "generic". Select "none" if you are debugging a serial or non-MPI code.
  9. If you are running an interactive debugging job or plan to attach to a running job, specify a hosts file in the "Attach hosts file" field and add host names to that file, one line per host.
  10. To start a job in the queue
    1. click on the "Job Submission" icon in the "Options" window.
    2. click on the "Submit job through queue or configure own 'mpirun' command" check box.
    3. To use the predefined template for a PBS/TORQUE job, click on the folder (browser) icon and select the /home/beta/ddt/templates/pbs.qtf file.
    4. In the "Submit command" field, enter "qsub". Leave the "Regexp for job id" field blank. In the "Cancel command", enter "qdel". In the "Display command" field, enter "qstat".
    5. Normally you would select the "NUM_NODES_TAG and PROCS_PER_NODE" check box, and enter "8" in the "PROC_PER_NODE_TAG" field.
    6. Next click on the "Edit Queue Submission Parameters..." box.
    7. In the queue submission window you can enter the wall clock time limit (in xx:xx:xx format), the queue name and the full path to mpirun if you are using mpi.
    8. Finally click on the "OK" box to return to the run window.
    9. In the run window, select the number of nodes you want to allocate and then select the "SUBMIT" button.
  11. To attach to a running job
    1. from the "Session" menu, select "New session" and then "Attach" from the submenu.
    2. In the field "Filter for process names containing" enter the name of the executable (just the name is sufficient, do not enter the full path).
    3. Based on the host names in the host file (see step 9), DDT will scan the specified hosts for processes with the given name and attempt to attach to them. If you have submitted a job to the queue, obtain the host list from (for example) checkjob <job number>.
