Saturday, November 21st 2009 11:09:01 AM PST
This FAQ answers some questions we expect Triton users may have. Please join the Triton Discussion List to ask for details about the policies and services described below. More detailed discussions are also presented on the topic pages of this site. The search function at the top of each page performs a Google Custom Search on this site only. To search the main SDSC site or other UCSD sites, please use search functions on those sites' pages.
A Service Unit (SU) is an accounting measure used to calculate the cost of running jobs on Triton. Users receive allocations of SUs, which they exchange for time to run jobs on the various Triton components. Triton's accounting system automatically deducts SUs from the accounts of users with allocations, based on their login credentials.
Some of Triton's resources are more expensive than others, and users are charged a premium for running jobs that use those components.
Before a job can be scheduled, the system verifies available credits in the user account. It does not actually charge the account at this time, but SUs (CPU-hour credits) equal to the estimated charges must be available. The system uses values from the job script to estimate these charges according to the following formulas:
The formula for batch queue requests is:
#CPUs x #nodes x wall time
The formula for large queue requests is:
ChargeFactor x #CPUs x #nodes x wall time
The ChargeFactor for a 256-GB node is 2; for a 512-GB node it is 4.
Here are some examples:
Queue batch: for a single-node job requesting four CPUs with two hours maximum wall clock time, submitted via this script:
#PBS -q batch
#PBS -l nodes=1:ppn=4
#PBS -l walltime=2:00:00
For this request to get scheduled, the account must have available
8 x 1 x 2 = 16 SUs
Batch queue nodes are charged for all eight CPUs regardless of how many are actually requested or used by the job. To be charged only for CPUs actually used, submit to the small queue.
Queue large: for a single-node job requesting four CPUs with two hours maximum wall clock time, submitted via this script:
#PBS -q large
#PBS -l nodes=1:ppn=4
#PBS -l walltime=2:00:00
For this request to get scheduled, the account must have available
2 x 4 x 1 x 2 = 16 SUs
Adding a 256-GB memory request increases the CPUs required to 32, because at 8 GB per core the request occupies an entire 256-GB node. The ChargeFactor remains 2.
#PBS -l mem=256GB
Note: This memory request will result in actual availability of 252 GB due to system overhead.
For this request to get scheduled, the account must have available
2 x 32 x 1 x 2 = 128 SUs
This job could be scheduled on either a 256-GB node or a 512-GB node, depending on availability and system load. If the memory-requesting job runs on a 512-GB node, the ChargeFactor would be 4, but only 16 CPUs would be required (at 16 GB per core), so the SUs required to schedule it remain 128.
Queue large: changing the memory request to 512 GB increases the CPUs required, and guarantees the job will be scheduled on a 512-GB node with 504 GB memory available (and using a ChargeFactor of 4):
#PBS -l mem=512GB
For this request to get scheduled, the account must have available
4 x 32 x 1 x 2 = 256 SUs
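The scheduling formulas and worked examples above can be reproduced with a small bash helper. This is a sketch, not an SDSC tool: the function name estimate_sus is hypothetical, wall time is assumed to be given in whole hours, and the batch branch simply applies the eight-CPUs-per-node rule described above.

```shell
#!/bin/bash
# Sketch of the pre-scheduling SU estimate described on this page.
# Usage: estimate_sus <queue> <cpus> <nodes> <hours> [charge_factor]
# Assumption: wall time is given in whole hours.
estimate_sus() {
  local queue=$1 cpus=$2 nodes=$3 hours=$4 factor=${5:-}
  case $queue in
    batch)
      # Batch nodes are charged for all eight CPUs, whatever <cpus> requests.
      echo $(( 8 * nodes * hours ))
      ;;
    large)
      # ChargeFactor: 2 for a 256-GB node, 4 for a 512-GB node.
      if [[ $factor != 2 && $factor != 4 ]]; then
        echo "large queue needs a ChargeFactor of 2 or 4" >&2
        return 1
      fi
      echo $(( factor * cpus * nodes * hours ))
      ;;
    *)
      echo "unknown queue: $queue" >&2
      return 1
      ;;
  esac
}
```

For example, estimate_sus batch 4 1 2 gives the 16 SUs of the first example, estimate_sus large 32 1 2 2 gives the 128 SUs of the 256-GB example, and estimate_sus large 16 1 2 4 gives the same 128 SUs for that job on a 512-GB node.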
When a job finishes, the account is debited for the actual CPU time used (rounded to the nearest half hour).
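As a rough illustration of the rounding step only, the hypothetical helper below rounds an elapsed time in seconds to the nearest half hour and prints the result in hours. It assumes that a time falling exactly halfway between two half-hour marks rounds up; the page does not say how ties are handled.

```shell
#!/bin/bash
# Sketch: round elapsed seconds to the nearest half hour, printed in hours.
# Assumption: exact ties (15 minutes past a half-hour mark) round up.
round_half_hours() {
  local secs=$1
  local halves=$(( (secs + 900) / 1800 ))   # 1800 s = one half hour
  printf '%d.%d\n' $(( halves / 2 )) $(( (halves % 2) * 5 ))
}
```

For example, round_half_hours 4000 prints 1.0 (about 67 minutes), and round_half_hours 5400 prints 1.5.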
If a job runs more than five minutes beyond its requested wall time, the system cancels it. Job charges that exceed the available SUs in the account do not cause cancellation, but result in a negative balance that can be credited later.
By default, user accounts are set to charge jobs against a personal account that matches their username. If preferred, the default can be set to charge against another account, such as a project or shared account based on a TAPP or campus allocation.
If a job is submitted when the account balance is below the estimated SUs needed to run it, the job is deferred until the account is replenished. The qstat -f command will report a message similar to:
cannot debit job account - no funds
After the account balance is adjusted, the job will be able to run without being resubmitted. It will go into the idle state when Moab rechecks balances, and then get scheduled normally.
You can check your account balance and status by running gbalance -u <username>. This will show your personal account as well as any other accounts you can charge to.
To charge to an authorized non-default account, include the line
#PBS -A <acct_name>
in your submission script, or include -A <acct_name> in your qsub command. You can charge to any account displayed by gbalance -u <your_username>.
If you are running jobs with Star-P, this option goes in your Star-P configuration file on the machine where you run the Star-P client. In the Star-P configuration editor, select "Workload Manager Overrides", click "Extra Arguments", and enter
-A <acct_name>
See the Policies page and the Running Jobs page for more details on job queues.
There is a script that captures much of the qstat output and displays it in a convenient summary. The script is available at
/home/beta/scripts/node_usage
Invoke the script by running the command:
/home/beta/scripts/node_usage/nodes_in_use
It displays output similar to the following:
TCC JOBS RUNNING 107
TCC JOBS WAITING 60
PDAF JOBS RUNNING 7
PDAF JOBS WAITING 0
TCC NODES IN USE 217
PDAF NODES IN USE 12
TCC NODES REQUESTED 281
PDAF NODES REQUESTED 0
TCC CORES IN USE 1688
PDAF CORES IN USE 172
AVERAGE NODES/JOB (TCC) 2.0
AVERAGE NODES/JOB (PDAF) 1.7
AVERAGE CORES/JOB (TCC) 15.8
AVERAGE CORES/JOB (PDAF) 24.6
See the Policies page for more details.
You can start each process in the background with & and then use the bash wait command. This allows your submission script (and therefore your charged time) to end as soon as the last process completes.
The following example puts two processes in the background, one for 10 seconds, one for 20. The whole process will complete in about 20 seconds, since the two sleep processes run simultaneously.
#!/bin/bash
date
sleep 10 &
sleep 20 &
wait
date
You can replace sleep xx with your own programs. Your PBS script will finish when the last process completes.
With batch systems, the script you submit to PBS is executed only on the first node allocated to you. Although the system may allocate additional resources to you, your script must start independent jobs on those resources. This is why in parallel batch submission scripts, mpirun (or an equivalent) is used to get codes running on the additional allocated nodes. In a trivial case, you might only request one node and run several single-CPU jobs simultaneously.
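A minimal sketch of that trivial case, combined with the wait approach above: one node is requested, eight serial tasks are started in the background, and the script waits on each task's PID. In bash, a bare wait returns 0 regardless of how the tasks exited, whereas wait on a specific PID returns that task's exit status, so failures can be counted. The run_task function, the task_N.out file names, and the resource values are hypothetical placeholders for your own program and request.

```shell
#!/bin/bash
#PBS -q batch
#PBS -l nodes=1:ppn=8
#PBS -l walltime=0:10:00
# Sketch: run eight independent single-CPU tasks on the node this script
# runs on, then wait for all of them. run_task is a hypothetical stand-in
# for your own serial program.

run_task() {
  sleep 1                                 # replace with e.g. ./my_serial_code input.$1
  echo "task $1 done" > "task_$1.out"
}

cd "${PBS_O_WORKDIR:-.}"                  # PBS sets PBS_O_WORKDIR; fall back to cwd

pids=()
for i in {1..8}; do
  run_task "$i" &                         # one background task per core
  pids+=($!)
done

failed=0
for pid in "${pids[@]}"; do
  wait "$pid" || failed=$((failed + 1))   # collect each task's exit status
done
echo "all tasks finished, $failed failed"
```

The script, and therefore the charged time, ends when the slowest task exits.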
Users can request large memory nodes by adding a "memory required" attribute to their job. All memory requests are rounded up to the nearest multiple of 128GB. Large memory nodes are allocated and charged in the following way:
I. Exclusive Access
Memory requests of 256GB or 512GB are scheduled as exclusive-use on nodes. You are charged for 32 cores at the appropriate rate (64 SUs per hour for 256GB and 128 SUs per hour for 512GB).
II. Shared Access
128GB and 384GB requests are always allocated on shared access nodes. With Shared Access, there may be resource contention for CPUs, network bandwidth, or memory bandwidth. 128GB requests cost 32 SUs per hour, and 384GB requests cost 96 SUs per hour.
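The allocation rules in sections I and II can be summarized as a lookup. This sketch reads the quoted rates as SUs per hour, which matches the worked examples earlier on this page (128 SUs for a two-hour 256-GB job); the function name large_mem_rate is hypothetical.

```shell
#!/bin/bash
# Sketch: map a large-memory request (GB) to the rounded allocation,
# access mode, and hourly rate listed in sections I and II.
large_mem_rate() {
  local req=$1 rounded mode rate
  if (( req < 1 || req > 512 )); then
    echo "memory request must be between 1 and 512 GB" >&2
    return 1
  fi
  rounded=$(( (req + 127) / 128 * 128 ))   # round up to a multiple of 128 GB
  case $rounded in
    128) mode=shared;    rate=32 ;;
    256) mode=exclusive; rate=64 ;;
    384) mode=shared;    rate=96 ;;
    512) mode=exclusive; rate=128 ;;
  esac
  echo "$rounded GB $mode ${rate} SU/hour"
}
```

For example, large_mem_rate 300 reports a 384-GB shared allocation at 96 SUs per hour.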
III. Access to Specific Memory Configurations
The Triton node allocation policy favors PDAF nodes over the more expensive PDAFM nodes, but this will not ensure a job will run on a PDAF node if a PDAFM node fits the scheduler's plan. To require a specific large queue node type, use the memory feature:
#PBS -l nodes=1:mem256gb
or
#PBS -l nodes=1:mem512gb
Note: The first example is not the same as specifying:
#PBS -l mem=256GB
which will only suggest and not require a PDAF node.
Optional Request for 256GB Memory Requirements
A user may, at his or her discretion, also flag a submission with "Shared OK" to indicate that a 256GB request may be placed on a shared access node. The only benefit to the user is that queue times may be shorter, because the Shared Access 512-GB nodes can then also match the request.
Recommendation: Users of large memory nodes are strongly encouraged to request either 256GB or 512GB memory allocations. This guarantees applications exclusive access to nodes and gives more predictable performance.
Jobs on the PDAF and PDAFM (large memory) nodes are allocated on the basis of processing cores required. There are 32 cores on Triton's PDAF and PDAFM nodes. A 256-GB (PDAF) node has 8 GB of memory per core, and a 512-GB (PDAFM) node has 16 GB per core.
The PDAF and PDAFM nodes are allocated in 128-GB chunks (16 cores on a PDAF node or eight cores on a PDAFM node). For a 384-GB job, users must request either 384 GB (24 cores) or 512 GB (32 cores). A 384-GB request may result in sharing the node with another process (which consumes the remaining 128 GB). For exclusive access to the node, a user must request 512 GB of memory and is allocated 32 cores. For a three-hour job, the charge would be:
3 hours x 32 cores x (4 x BaseSU) = 384 BaseSU
For Shared Access to the node, the calculation would be:
3 hours x 24 cores x (4 x BaseSU) = 288 BaseSU
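The same arithmetic, starting from the memory request instead of a core count, might look like the sketch below. It uses the per-core memory (8 GB on PDAF, 16 GB on PDAFM) and the ChargeFactors (2 and 4) given on this page, expresses the result in BaseSU as in the example above, and assumes whole-hour wall times; large_job_charge is a hypothetical name.

```shell
#!/bin/bash
# Sketch: cores allocated and BaseSU charged for a large-memory job.
#   pdaf  = 256-GB node: 8 GB per core,  ChargeFactor 2
#   pdafm = 512-GB node: 16 GB per core, ChargeFactor 4
# Usage: large_job_charge <mem_gb> <pdaf|pdafm> <hours>
large_job_charge() {
  local mem_gb=$1 node=$2 hours=$3 gb_per_core factor capacity
  case $node in
    pdaf)  gb_per_core=8;  factor=2; capacity=256 ;;
    pdafm) gb_per_core=16; factor=4; capacity=512 ;;
    *) echo "node type must be pdaf or pdafm" >&2; return 1 ;;
  esac
  # Round the request up to the next 128-GB chunk.
  local rounded=$(( (mem_gb + 127) / 128 * 128 ))
  if (( rounded > capacity )); then
    echo "${rounded} GB does not fit on a ${node} node" >&2
    return 1
  fi
  local cores=$(( rounded / gb_per_core ))
  echo "$cores cores, $(( hours * cores * factor )) BaseSU"
}
```

For example, large_job_charge 512 pdafm 3 reproduces the exclusive-access charge above, and large_job_charge 384 pdafm 3 reproduces the shared-access charge.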
To run a job on the large memory nodes, your batch script should specify the queue "large":
#PBS -q large
One additional complication: because of system overhead, a small amount of memory per processing core is reserved by the system. Thus, for a 128-GB allocation, only about 126 GB is actually available to the job. Similarly, a 256-GB allocation provides only 252 GB, and a 512-GB allocation provides 504 GB. To have the nominal amount of memory actually accessible to the job, you must request the next higher 128-GB increment (and be allocated and charged accordingly).
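To pick a request from the memory a job actually needs, a helper can compare that need against the usable memory of each allocation size. The 126, 252 and 504 GB figures come from the paragraph above; the 378 GB usable figure for a 384-GB allocation is an assumption extrapolated from them (about 126 GB per 128-GB chunk), and mem_request_for is a hypothetical name.

```shell
#!/bin/bash
# Sketch: smallest allocation (GB) whose usable memory covers the need.
# 126/252/504 GB usable are from this FAQ; 378 GB for 384 GB is ASSUMED.
mem_request_for() {
  local need=$1 alloc usable
  for alloc in 128 256 384 512; do
    case $alloc in
      128) usable=126 ;;
      256) usable=252 ;;
      384) usable=378 ;;  # assumption, not stated on this page
      512) usable=504 ;;
    esac
    if (( need <= usable )); then
      echo "$alloc"
      return 0
    fi
  done
  echo "no single node provides ${need} GB of usable memory" >&2
  return 1
}
```

For example, mem_request_for 256 prints 384, matching the advice to request the next higher increment.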
Triton is designed to support jobs that require very large amounts of memory. The eight PDAFM nodes each have 512 gigabytes of memory, more per node than most HPC platforms currently available to the research community.
Triton is also best used for parallel jobs that use multiple processing cores simultaneously, rather than serial jobs that run on a single processing core. However, single-CPU jobs can be run on Triton, either by accessing the "shared" queue or by paying a premium for allocation of multiple processing cores.
To run multiple serial jobs with a single job submission request, see the Bundling Serial Jobs page.
| Software | FFTW |
| Roll | fftw_pgi |
| Location | /opt/pgi/fftw_pgi |
| Include | /opt/pgi/fftw_pgi/include |
| Lib | /opt/pgi/fftw_pgi/lib |
C interface:
pgcc -o fftw-testc fftw-test.c -I/opt/pgi/fftw_pgi/include -L/opt/pgi/fftw_pgi/lib -lfftw3
FORTRAN interface:
pgf90 -o fftw-testf fftw-test.f -I/opt/pgi/fftw_pgi/include -L/opt/pgi/fftw_pgi/lib -lfftw3
| Software | FFTW |
| Roll | fftw_intel |
| Location | /opt/intel/fftw_intel |
| Include | /opt/intel/fftw_intel/include |
| Lib | /opt/intel/fftw_intel/lib |
C interface:
icc -o fftw-testc fftw-test.c -I/opt/intel/fftw_intel/include -L/opt/intel/fftw_intel/lib -lfftw3
FORTRAN interface:
ifort -o fftw-testf fftw-test.f -I/opt/intel/fftw_intel/include -L/opt/intel/fftw_intel/lib -lfftw3
Find out more about Triton compilers on the Compile Jobs and Compiling Parallel Codes pages.
export DDT_LICENSE_FILE=/home/beta/ddt/License.client
. ~/.bash_profile
/home/beta/ddt/bin/ddt
