Friday, January 4th 2013 11:25:01 PM PST
Triton Resource supports four queues for job submission. Users must have an account through either the TAPP or a specific project so that the accounting system can charge the job time.
The default charge is per processing core per hour. If a node is allocated, all the cores on that node are charged whether or not the job actually uses them. The base SU is calibrated against the TCC nodes, which run at 2.4 GHz; each TCC node has eight such cores.
If PDAFM nodes are specified, the charge has a premium of two times the base rate. These nodes have 32 cores and run at 2.5 GHz.
The following general policies are in effect for scheduling jobs on Triton:
The table below provides details about available queues and charges.
By default, user accounts are set to charge jobs against a personal account that matches their username. If preferred, the default can be set to charge against another account, such as a project or shared account based on a TAPP or campus allocation.
If a job is submitted and the account is depleted below the estimated SUs needed to run, it will be deferred until the account is replenished. The qstat -f command will report a message similar to:
cannot debit job account - no funds
After the account balance is adjusted, the job will be able to run without being resubmitted. It will go into the idle state when the scheduler rechecks balances, and then get scheduled normally.
You can check your account balance and status by running gbalance -u <username>. This will show your personal account as well as any other accounts you can charge to.
For more information on Triton accounts, please see the FAQ Accounts section.
To specify the account to be charged, use the -A option. It is recommended to use this option with all job submission scripts and qsub commands, to clearly indicate which account the user wants to be charged for the job.
#PBS -A <account name>
Memory requests are for all nodes combined. Node and core requests are per-node.
#PBS -l nodes=2:ppn=16
#PBS -l mem=1008gb
If submitted to the large queue, the above request would not be deferred, since it is possible to match the request with existing resources. However, it would result in a charge factor of 128 (32 cores x 2 nodes x 2) because it must be scheduled on the PDAFM nodes (the only way to satisfy 1008GB on two nodes, which blocks all 32 cores on each node).
If the above request asked for two nodes and 2016GB, it would be reduced to 1008GB, since the system cannot provide more than 504GB per node.
A special case of this example involves requests of more than 20 PDAF nodes. In this case, a combination of PDAF (256GB) and PDAFM (512GB) nodes is required. The scheduler is configured to require that the memory demand is satisfied by the smaller (256GB) nodes for all requested nodes, even though some of the allocated nodes would have 512GB.
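Returning to the two-node requests above, the per-node arithmetic can be sketched in Python. This is an illustration only, not the scheduler's code: the function names are hypothetical, and the 252 GB figure for a PDAF node is an assumption taken from the request sizes used elsewhere in this section.

```python
# Illustrative sketch only; these names are hypothetical, not part of Triton or PBS.
PDAF_USABLE_GB = 252    # assumed requestable memory on a 256-GB PDAF node
PDAFM_USABLE_GB = 504   # the system cannot provide more than 504 GB per node
CORES_PER_NODE = 32


def resolve_large_request(nodes, total_mem_gb):
    """Return (granted_total_mem_gb, node_type) for a large-queue request."""
    # mem= covers all nodes combined, so derive the per-node demand first.
    per_node = total_mem_gb / nodes
    # Requests above the per-node maximum are reduced to that maximum.
    per_node = min(per_node, PDAFM_USABLE_GB)
    node_type = "PDAFM" if per_node > PDAF_USABLE_GB else "PDAF"
    return per_node * nodes, node_type


def whole_node_charge_factor(nodes, node_type):
    """Charge factor when the memory request blocks every core on each node."""
    premium = 2 if node_type == "PDAFM" else 1
    return CORES_PER_NODE * nodes * premium


print(resolve_large_request(2, 1008))        # (1008.0, 'PDAFM')
print(resolve_large_request(2, 2016))        # reduced to (1008.0, 'PDAFM')
print(whole_node_charge_factor(2, "PDAFM"))  # 128
```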
Requests for resources exceeding the available maximums will be deferred and retried by the scheduler. After a limited number of retries, they will be put on hold and require administrator intervention.
A request for an interactive queue made between 8 p.m. and 8 a.m. on weeknights will be deferred until the next 8 a.m. to 8 p.m. weekday window and then scheduled.
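The deferral window can be checked with a short Python sketch. It assumes naive `datetime` values already expressed in Pacific Time, treats 8 p.m. as the exclusive end of the window, and assumes weekend requests are also deferred to Monday; the function names are hypothetical and the real scheduler may handle the boundaries differently.

```python
from datetime import datetime, time, timedelta

WINDOW_OPEN = time(8, 0)    # 8 a.m. PT
WINDOW_CLOSE = time(20, 0)  # 8 p.m. PT (treated as exclusive here, an assumption)


def in_interactive_window(when):
    """True if `when` falls on a weekday between 8 a.m. and 8 p.m."""
    return when.weekday() < 5 and WINDOW_OPEN <= when.time() < WINDOW_CLOSE


def next_window_start(when):
    """Earliest moment at or after `when` that falls inside the window."""
    if in_interactive_window(when):
        return when
    candidate = when.replace(hour=8, minute=0, second=0, microsecond=0)
    if when.time() >= WINDOW_OPEN:
        candidate += timedelta(days=1)  # today's 8 a.m. has already passed
    while candidate.weekday() >= 5:     # skip Saturday and Sunday
        candidate += timedelta(days=1)
    return candidate


# A request made late on a Friday night waits until Monday morning.
print(next_window_start(datetime(2013, 1, 4, 23, 25)))  # 2013-01-07 08:00:00
```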
Requests for more than the maximum number of nodes will not be rejected, as the scheduler makes no assumptions regarding future node availability. Requests that do not specify a memory size will be given the default amount of memory per node (see table below).
| Queue | Cluster | Nodes in Queue (node Max) | Cores per Node (ppn Max) | Charge Premium | Hours Available | Max Node Memory | Default Node Memory | Max Queue Memory |
|---|---|---|---|---|---|---|---|---|
| batch | TCC | 246 | 8 | none | 24x7 | 24GB | 24GB | 5904GB |
| small (shared) | TCC | variable | 8 | none | 24x7 | 24GB | 24GB | 960GB |
| Queue | Cluster | Nodes in Queue (node Max) | Cores per Node (ppn Max) | Charge Premium | Hours Available | Max Node Memory | Default Node Memory | Max Queue Memory |
|---|---|---|---|---|---|---|---|---|
| large | PDAF | 19 | 32 | 1x | 8 a.m. to 8 p.m. PT Monday through Friday | 256GB | 128GB | 4864GB |
| large | PDAFM | 7 | 32 | 2x | 8 a.m. to 8 p.m. PT Monday through Friday | 512GB | 128GB | 3584GB |
| express (interactive) | PDAF | 1 | 32 | 1x | 8 a.m. to 8 p.m. PT Monday through Friday | 256GB | 128GB | 256GB |
| express (interactive) | PDAFM | 1 | 32 | 2x | 8 a.m. to 8 p.m. PT Monday through Friday | 512GB | 128GB | 512GB |
| Queue | Cluster | Nodes in Queue (node Max) | Cores per Node (ppn Max) | Charge Premium | Hours Available | Max Node Memory | Default Node Memory | Max Queue Memory |
|---|---|---|---|---|---|---|---|---|
| large | PDAF | 20 | 32 | 1x | 8 p.m. to 8 a.m. PT Monday through Friday and all weekend | 256GB | 128GB | 5120GB |
| large | PDAFM | 8 | 32 | 2x | 8 p.m. to 8 a.m. PT Monday through Friday and all weekend | 512GB | 128GB | 4096GB |
SUs are charged at the rate of the number of processing cores per hour that are allocated to a job, regardless of the number of cores actually used. The value is rounded to the nearest hour after multiplying the cores and node premium (described below). See the Running Jobs page for more information on how to submit jobs to each queue.
No Premium on PDAF: PDAF (256 GB) nodes are charged at the same rate as TCC node cores; one hour of use incurs a charge of 32 SUs (32 cores x 1).
2x Premium on PDAFM: PDAFM (512 GB) nodes are charged at two times the rate of TCC node cores; one hour incurs a charge of 64 SUs (32 cores x 2).
A job that specifically requests PDAF (see example below) may actually get scheduled on a PDAFM node, but it will be charged at the lower PDAF rate.
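As a rough illustration of the premiums above, the following Python sketch computes the hourly charge for a fully allocated node of each type. The table and function names are hypothetical; the core counts and premiums are the ones quoted in this section.

```python
# Hypothetical lookup table built from the figures quoted in this section.
NODE_TYPES = {
    "TCC":   {"cores": 8,  "premium": 1},
    "PDAF":  {"cores": 32, "premium": 1},
    "PDAFM": {"cores": 32, "premium": 2},
}


def billed_node_type(requested, scheduled):
    """A job that asked for PDAF but landed on PDAFM is billed at the PDAF rate."""
    if requested == "PDAF" and scheduled == "PDAFM":
        return "PDAF"
    return scheduled


def full_node_su_per_hour(node_type):
    """SUs charged per hour when every core on one node is allocated."""
    spec = NODE_TYPES[node_type]
    return spec["cores"] * spec["premium"]


print(full_node_su_per_hour("PDAF"))                              # 32
print(full_node_su_per_hour("PDAFM"))                             # 64
print(full_node_su_per_hour(billed_node_type("PDAF", "PDAFM")))   # 32
```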
The small queue is charged per core, not per node as all the other queues are, and a node in this queue is shared with other jobs that can use it. The number of nodes in this queue varies with demand.
The large queue policy for these nodes is based on the amount of memory requested, and charges are proportional to the relative number of CPUs on the node. For example, if a job requests 126 GB and one node, the charge will be for 16 cores (half of the cores on a 256-GB node). If a job requests 504 GB and two nodes, it will be charged for 64 cores (two entire 256-GB nodes). To get 32 cores on two 512-GB nodes (16 cores on each node), the job would need to request 16 processors per node. The user would still be charged for two full nodes.
The Triton node allocation policy favors PDAF nodes over the more expensive PDAFM nodes. A job that requests 252 GB is charged at the PDAF rate only if it also requests the full complement of 32 processors. To request a specific large queue node type, use the memory attribute:
#PBS -q large
#PBS -l nodes=1:ppn=32
and either
#PBS -l mem=252gb
or
#PBS -l mem=504gb
The express queue is only available between 8 a.m. and 8 p.m. Pacific Time, Monday through Friday. This queue is intended for interactive use; users may have only one job running at a time and no more than two jobs waiting.
This table shows the main factors determining how account charges are generated for each of the main queues and node types of the Triton Resource.
| Queue | Memory (determining factor) | Allocated Cores (on single node of this type) | Charge Factor | SUs Charged (per CPU-hour) |
|---|---|---|---|---|
| *large | 128gb | 16 (PDAF) | 1x | 16 |
| large | 128gb | 8 (PDAFM) | 2x | 16 |
| large | 256gb | 32 (PDAF) | 1x | 32 |
| large | 256gb | 16 (PDAFM) | 2x | 32 |
| large | 384gb | 24 (PDAFM) | 2x | 48 |
| large | 512gb | 32 (PDAFM) | 2x | 64 |
*Note: Large queue requests can be scheduled on either a PDAF or PDAFM node. To explicitly request time on a 256-GB (PDAF) node, use the memory attribute:
#PBS -q large
#PBS -l nodes=1:ppn=32
#PBS -l mem=252gb
To explicitly request time on a 512-GB (PDAFM) node, request > 252 GB/node or use the memory attribute:
#PBS -q large
#PBS -l nodes=1:ppn=32
#PBS -l mem=504gb
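The rows of the large-queue charge table above are consistent with allocating cores in proportion to the share of node memory requested. The Python sketch below reproduces those six rows under that assumption. The function name is hypothetical, and the scheduler may round other memory sizes differently, so treat it as a way to check the table rather than a description of the scheduler.

```python
CORES_PER_NODE = 32
NODE_MEMORY_GB = {"PDAF": 256, "PDAFM": 512}
CHARGE_FACTOR = {"PDAF": 1, "PDAFM": 2}


def large_queue_hourly_charge(mem_gb, node_type):
    """Return (allocated_cores, su_per_hour) for one node of the given type."""
    # Assumption: cores scale with the fraction of the node's memory requested.
    allocated_cores = CORES_PER_NODE * mem_gb // NODE_MEMORY_GB[node_type]
    return allocated_cores, allocated_cores * CHARGE_FACTOR[node_type]


for mem_gb, node_type in [(128, "PDAF"), (128, "PDAFM"), (256, "PDAF"),
                          (256, "PDAFM"), (384, "PDAFM"), (512, "PDAFM")]:
    print(mem_gb, node_type, large_queue_hourly_charge(mem_gb, node_type))
```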
| Queue | Node Type | PPN (Requested) | Node Count | Allocated Cores (determining factor) | SUs Charged (per CPU-hour) |
|---|---|---|---|---|---|
| batch | TCC | 1 | 1 | 8 | 8 |
| batch | TCC | Not specified | 1 | 8 | 8 |
| batch | TCC | Not specified | 2 | 16 | 16 |
| Queue | Node Type | PPN (Requested) | Allocated Cores (determining factor) | SUs Charged (per CPU-hour) |
|---|---|---|---|---|
| *small | TCC | 1 | 1 | 1 |
| small | TCC | 4 | 4 | 4 |
*Note: Small queue requests must share the node with other jobs requesting the small queue. There will be contention for memory, network, and disk space available to the node when sharing with other jobs.
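The difference between the batch and small queue tables can be summarized in a few lines of Python. This is a minimal sketch with a hypothetical function name; it assumes small-queue jobs run on a single shared node, as in the table rows above.

```python
TCC_CORES_PER_NODE = 8


def tcc_su_per_hour(queue, nodes=1, ppn=None):
    """Hourly SU charge on TCC nodes for the batch and small queues."""
    if queue == "batch":
        # Whole nodes are charged, whatever ppn was requested (or omitted).
        return TCC_CORES_PER_NODE * nodes
    if queue == "small":
        if ppn is None:
            raise ValueError("small queue charges depend on the requested ppn")
        # Only the requested cores are charged; the node is shared.
        return ppn
    raise ValueError("unknown TCC queue: " + queue)


print(tcc_su_per_hour("batch", nodes=1, ppn=1))  # 8
print(tcc_su_per_hour("batch", nodes=2))         # 16
print(tcc_su_per_hour("small", ppn=4))           # 4
```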
Before a job can be scheduled, the system verifies available credits in the user account. It does not actually charge the account at this time, but SUs (CPU-hour credits) equal to the estimated charges must be available. The system uses values from the job script to estimate these charges according to the following formulas:
The formula for batch queue requests is:
#CPUs x #nodes x wall time
The formula for large queue requests is:
ChargeFactor x #CPUs x #nodes x wall time
The ChargeFactor for a 256-GB node is 1; for a 512-GB node it is 2.
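The two formulas, together with the funds check described above, can be written as a small Python sketch. The function names are hypothetical, and `cpus` means the CPUs charged per node (eight on a batch node), which is not necessarily the `ppn` value in the job script.

```python
def estimate_su(queue, cpus, nodes, wall_hours, charge_factor=1):
    """Estimated SUs that must be available before the job can be scheduled."""
    if queue == "batch":
        return cpus * nodes * wall_hours
    if queue == "large":
        return charge_factor * cpus * nodes * wall_hours
    raise ValueError("no estimate formula given for queue: " + queue)


def can_be_scheduled(balance, estimate):
    """A job is deferred while the account holds fewer SUs than the estimate."""
    return balance >= estimate


# Two batch nodes for three hours: 8 x 2 x 3 = 48 SUs.
needed = estimate_su("batch", cpus=8, nodes=2, wall_hours=3)
print(needed, can_be_scheduled(balance=40, estimate=needed))  # 48 False
```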
Here are some examples:
Queue batch: for a single-node job requesting four CPUs with two hours maximum wall clock time, submitted via this script:
#PBS -q batch
#PBS -l nodes=1:ppn=4
#PBS -l walltime=2:00:00
For this request to get scheduled, the account must have available
8 x 1 x 2 = 16 SUs
Batch queue nodes are charged for all eight CPUs regardless of how many are actually requested or used by the job. To be charged only for CPUs actually used, submit to the small queue.
Queue large: for a single-node job requesting four CPUs with two hours maximum wall clock time, submitted via this script:
#PBS -q large
#PBS -l nodes=1:ppn=4
#PBS -l walltime=2:00:00
Note: PDAF/M requests will be allocated either 16 or 32 cores per node. No smaller CPU allocations are supported.
For this request to get scheduled, the account must have available
1 x 16 x 1 x 2 = 32 SUs
Adding 252 GB of memory to this request increases the CPUs required. The #CPUs becomes 32 because the request covers the entire memory of the node.
#PBS -l mem=252gb
For this request to get scheduled, the account must have available
1 x 32 x 1 x 2 = 64 SUs
These jobs could be scheduled on either a 256-GB node or a 512-GB node, depending on availability and system load. If the memory-requesting job runs on a 512-GB node, the ChargeFactor would be 2, but the CPUs required would only be 16 (because only half of the PDAFM node's memory is required), so the SUs required to schedule remain 64.
2 x 16 x 1 x 2 = 64 SUs
Queue large: changing the memory request to 504 GB increases the CPUs required, and guarantees the job will be scheduled on a 512-GB node with 504 GB memory available (and using a ChargeFactor of 2):
#PBS -l mem=504gb
For this request to get scheduled, the account must have available
2 x 32 x 1 x 2 = 128 SUs
When any job finishes, the account gets debited by actual CPU time used (rounded to the nearest half hour).
If a job runs more than five minutes beyond its requested wall time, it will be canceled by the system. Job charges that exceed available SUs in the account will not be canceled, but will result in a negative balance that can be credited later.
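The end-of-job rules can be illustrated with a short Python sketch. Several details here are assumptions, not documented behavior: the rounding is applied to the elapsed hours, exact ties round up, and the function names are hypothetical.

```python
import math

GRACE_MINUTES = 5  # jobs are canceled once they exceed wall time by more than this


def round_to_half_hour(hours):
    """Round elapsed hours to the nearest half hour (ties round up, an assumption)."""
    return math.floor(hours * 2 + 0.5) / 2


def is_canceled(elapsed_minutes, requested_minutes):
    """True once the job has run more than five minutes past its wall time."""
    return elapsed_minutes > requested_minutes + GRACE_MINUTES


def balance_after_job(balance, su_per_hour, actual_hours):
    """Debit the account for time actually used; the result may be negative."""
    return balance - su_per_hour * round_to_half_hour(actual_hours)


print(round_to_half_hour(1.7))          # 1.5
print(is_canceled(126, 120))            # True
print(balance_after_job(10, 8, 1.7))    # -2.0
```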
