
UCSD Triton Resource @ SDSC

Quick Status

Triton Resource Node Status

Saturday, November 21st 2009 02:05:01 PM PST


TCC Rack 3 Nodes Down (1)

tcc-3-71.local

Total TCC Nodes Up: 247

Total 256GB (PDAF) Nodes Up: 20

Total 512GB (PDAFM) Nodes Up: 8

Rack 2 Up Count: 80

Rack 3 Up Count: 77

Rack 4 Up Count: 11

Rack 5 Up Count: 79

Charge Policies for Triton Compute Jobs

Overview of Triton Accounting

Triton Resource supports four queues for job submission: batch, small, large, and express. Users must have an account through either TAPP or a specific project so that the accounting system can charge for job time.

The default charge is per processing core per hour. If a node is allocated, all the cores on that node are charged, whether or not the job actually uses them. The base service unit (SU) is anchored to the TCC nodes, which run at 2.4 GHz and have eight cores each, so one SU corresponds to one TCC core-hour.

If PDAF or PDAFM nodes are used, the charge carries a premium of two times (PDAF, 256 GB) or four times (PDAFM, 512 GB) the base rate. These nodes have 32 cores each and run at 2.5 GHz.
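As a rough illustration of these rates, the Python sketch below computes the SUs charged for one hour on a whole node of each type, assuming one SU equals one TCC core-hour. The `NODE_TYPES` table and the `sus_per_node_hour` function are hypothetical names used only for this example.

```python
# Sketch: hourly charge for a whole node, assuming 1 SU = 1 TCC core-hour.
# Core counts and premiums come from the text; the layout is illustrative only.
NODE_TYPES = {
    # name: (cores per node, charge premium relative to a TCC core)
    "TCC": (8, 1),
    "PDAF": (32, 2),   # 256 GB nodes
    "PDAFM": (32, 4),  # 512 GB nodes
}

def sus_per_node_hour(node_type: str) -> int:
    """SUs charged for one hour on a whole node (every core is charged)."""
    cores, premium = NODE_TYPES[node_type]
    return cores * premium

for name in NODE_TYPES:
    print(name, sus_per_node_hour(name))  # TCC 8, PDAF 64, PDAFM 128
```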

The table below provides details about available queues and their associated charges.

Job Charging on Triton

Memory requests (mem=) are totals for all nodes combined, while core requests (ppn=) are per node. For example:

#PBS -l nodes=2:ppn=16
#PBS -l mem=1024GB

If submitted to the large queue, this request would not be deferred, because it can be matched with existing resources. However, it would be charged 256 SUs per hour (32 cores x 2 nodes x 4), because it must be scheduled on PDAFM nodes: 1024GB over two nodes is 512GB per node, which only a PDAFM node can provide, and that blocks all 32 cores on each node.

If the above request asked for two nodes and 2048GB, it would be reduced to 1024GB, since the system cannot provide more than 512GB per node.
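The memory arithmetic in the two paragraphs above can be sketched as follows: the total `mem=` value is divided across the requested nodes, capped at 512GB per node, and matched to the cheapest node type that can hold each node's share. The function name and the "cheapest fitting type" rule are assumptions for illustration; the text only says the scheduler favors PDAF nodes.

```python
NODE_MEM_GB = {"PDAF": 256, "PDAFM": 512}  # max memory per large-queue node
PREMIUM = {"PDAF": 2, "PDAFM": 4}          # charge premium per node type

def place_large_request(nodes: int, total_mem_gb: int) -> tuple[int, int, str]:
    """Return (effective total mem, per-node mem, cheapest fitting node type).

    Hypothetical helper: mem= is a total across all nodes, and no node
    provides more than 512 GB, so larger requests are reduced.
    """
    mem = min(total_mem_gb, nodes * max(NODE_MEM_GB.values()))
    # Keep only node types that can hold mem / nodes (integer comparison).
    fitting = [t for t, cap in NODE_MEM_GB.items() if cap * nodes >= mem]
    node_type = min(fitting, key=PREMIUM.__getitem__)
    return mem, mem // nodes, node_type

print(place_large_request(2, 1024))  # (1024, 512, 'PDAFM')
print(place_large_request(2, 2048))  # (1024, 512, 'PDAFM'): reduced to 1024GB
```

Because each node's share in both cases is a full 512GB, every core on both PDAFM nodes is blocked, which is where the 256 SUs per hour above comes from.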

A special case of this example involves requests for more than 20 PDAF-sized (256GB) nodes. Because there are only 20 PDAF nodes, such a request requires a combination of PDAF (256GB) and PDAFM (512GB) nodes. The scheduler is configured so that the memory requested for every allocated node must fit on the smaller (256GB) nodes, even though some of the allocated nodes have 512GB.
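To see what this rule costs in practice, the sketch below works out a mixed allocation. It assumes the night/weekend node counts from the queue tables (20 PDAF, 8 PDAFM) and that all PDAF nodes are used before any PDAFM node; `mixed_allocation` is a hypothetical helper, not a scheduler function.

```python
PDAF_NODES, PDAF_MEM_GB = 20, 256    # night/weekend large-queue counts
PDAFM_NODES, PDAFM_MEM_GB = 8, 512

def mixed_allocation(nodes: int) -> dict:
    """Sketch of a large-queue request for more than 20 nodes."""
    if not PDAF_NODES < nodes <= PDAF_NODES + PDAFM_NODES:
        raise ValueError("this sketch only covers requests of 21 to 28 nodes")
    pdafm = nodes - PDAF_NODES           # assumption: PDAF nodes are filled first
    usable = nodes * PDAF_MEM_GB         # every node is sized as a 256GB node
    installed = PDAF_NODES * PDAF_MEM_GB + pdafm * PDAFM_MEM_GB
    return {"pdaf": PDAF_NODES, "pdafm": pdafm,
            "usable_gb": usable, "unused_gb": installed - usable}

print(mixed_allocation(24))
# {'pdaf': 20, 'pdafm': 4, 'usable_gb': 6144, 'unused_gb': 1024}
```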

Requests for resources exceeding the available maximums will be deferred and retried by the scheduler. After a limited number of retries, they will be put on hold and require administrator intervention.

A request to the interactive (express) queue made between 8 p.m. and 8 a.m. on a weeknight will be deferred until the next 8 a.m. to 8 p.m. weekday window and then scheduled.
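A small Python sketch of this deferral rule follows. It assumes the submission time is already in Pacific Time and ignores holidays; it also defers weekend submissions, since the express queue is only open Monday through Friday. `express_start` is a hypothetical name.

```python
from datetime import datetime, timedelta

WINDOW_START_HOUR = 8   # 8 a.m. Pacific Time
WINDOW_END_HOUR = 20    # 8 p.m. Pacific Time

def express_start(submitted: datetime) -> datetime:
    """Earliest time an express request could be scheduled (sketch).

    `submitted` is assumed to be a naive datetime already in Pacific Time.
    """
    t = submitted
    while True:
        if t.weekday() < 5:                      # Monday=0 ... Friday=4
            if WINDOW_START_HOUR <= t.hour < WINDOW_END_HOUR:
                return t                         # inside the window: no deferral
            if t.hour < WINDOW_START_HOUR:
                return t.replace(hour=WINDOW_START_HOUR, minute=0,
                                 second=0, microsecond=0)
        # After 8 p.m. or on a weekend: try 8 a.m. on the following day.
        t = (t + timedelta(days=1)).replace(hour=WINDOW_START_HOUR, minute=0,
                                            second=0, microsecond=0)

# A Friday 9 p.m. request is deferred to Monday 8 a.m.
print(express_start(datetime(2009, 11, 20, 21, 0)))  # 2009-11-23 08:00:00
```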

Requests for more than the maximum number of nodes will not be rejected, as the scheduler makes no assumptions regarding future node availability. Requests that do not specify a memory size will be given the default amount of memory per node (see table below).

Job Queues Available Any Time

Queue | Cluster | Nodes in Queue (node max) | Cores per Node (ppn max) | Charge Premium | Hours Available | Max Node Memory | Default Node Memory | Max Queue Memory
batch | TCC | 246 | 8 | none | 24x7 | 24GB | 24GB | 5904GB
small (shared) | TCC | 10 | 8 | none | 24x7 | 24GB | 24GB | 240GB

Job Queues Only Available on Weekdays

Queue | Cluster | Nodes in Queue (node max) | Cores per Node (ppn max) | Charge Premium | Hours Available | Max Node Memory | Default Node Memory | Max Queue Memory
large | PDAF | 19 | 32 | 2x | 8 a.m. to 8 p.m. PT, Monday through Friday | 256GB | 128GB | 4864GB
large | PDAFM | 7 | 32 | 4x | 8 a.m. to 8 p.m. PT, Monday through Friday | 512GB | 128GB | 3584GB
express (interactive) | PDAF | 1 | 32 | 2x | 8 a.m. to 8 p.m. PT, Monday through Friday | 256GB | 128GB | 256GB
express (interactive) | PDAFM | 1 | 32 | 4x | 8 a.m. to 8 p.m. PT, Monday through Friday | 512GB | 128GB | 512GB

Job Queues Available on Nights and Weekends

Queue | Cluster | Nodes in Queue (node max) | Cores per Node (ppn max) | Charge Premium | Hours Available | Max Node Memory | Default Node Memory | Max Queue Memory
large | PDAF | 20 | 32 | 2x | 8 p.m. to 8 a.m. PT, Monday through Friday, and all weekend | 256GB | 128GB | 5120GB
large | PDAFM | 8 | 32 | 4x | 8 p.m. to 8 a.m. PT, Monday through Friday, and all weekend | 512GB | 128GB | 4096GB
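The queue tables can also be encoded as data to check that they are internally consistent: in every row, Max Queue Memory equals the node count times Max Node Memory. The sketch below is a hypothetical Python representation; the row labels that distinguish weekday from night/weekend large-queue rows are ours.

```python
from typing import NamedTuple

class QueueRow(NamedTuple):
    queue: str
    cluster: str
    nodes: int                # node max
    ppn_max: int              # cores per node
    premium: int              # charge multiplier; 1 means "none"
    max_node_mem_gb: int
    default_node_mem_gb: int
    max_queue_mem_gb: int

ROWS = [
    QueueRow("batch", "TCC", 246, 8, 1, 24, 24, 5904),
    QueueRow("small (shared)", "TCC", 10, 8, 1, 24, 24, 240),
    QueueRow("large, weekdays", "PDAF", 19, 32, 2, 256, 128, 4864),
    QueueRow("large, weekdays", "PDAFM", 7, 32, 4, 512, 128, 3584),
    QueueRow("express", "PDAF", 1, 32, 2, 256, 128, 256),
    QueueRow("express", "PDAFM", 1, 32, 4, 512, 128, 512),
    QueueRow("large, nights/weekends", "PDAF", 20, 32, 2, 256, 128, 5120),
    QueueRow("large, nights/weekends", "PDAFM", 8, 32, 4, 512, 128, 4096),
]

def inconsistent_rows(rows):
    """Rows whose queue memory is not nodes x per-node memory."""
    return [r for r in rows if r.nodes * r.max_node_mem_gb != r.max_queue_mem_gb]

print(inconsistent_rows(ROWS))  # []
```

The data also shows that the weekday large and express rows together account for 20 PDAF and 8 PDAFM nodes, the same counts listed for nights and weekends.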

2x Premium on PDAF: PDAF (256 GB) nodes are charged at twice the rate of TCC node cores, so one hour of use of a full node incurs a charge of 64 SUs (32 cores x 2).

4x Premium on PDAFM: PDAFM (512 GB) nodes are charged at four times the rate of TCC node cores, so one hour of use of a full node incurs a charge of 128 SUs (32 cores x 4).

The small queue is charged per core rather than per node, as all the other queues are; in exchange, the node is shared with any other jobs that can use it. There are 10 nodes in this queue.
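To make the difference concrete, the following sketch compares the charge for the same four-core, two-hour, single-node job in the batch and small queues. It assumes the small queue's per-core charge is simply requested cores x nodes x hours; `tcc_charge` is a hypothetical name.

```python
TCC_CORES_PER_NODE = 8

def tcc_charge(queue: str, ppn: int, nodes: int, hours: float) -> float:
    """SUs for a TCC job (sketch; no premium applies on TCC nodes)."""
    if queue == "batch":
        cores = TCC_CORES_PER_NODE   # whole node is charged, whatever ppn is
    elif queue == "small":
        cores = ppn                  # charged per requested core; node is shared
    else:
        raise ValueError(f"not a TCC queue: {queue}")
    return cores * nodes * hours

print(tcc_charge("batch", 4, 1, 2))  # 16
print(tcc_charge("small", 4, 1, 2))  # 8
```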

For the large queue, charges are based on the amount of memory requested: the job is charged for the fraction of the node's memory it requests, expressed as the same fraction of the node's cores. For example, if a job requests 128 GB and one node, the charge will be for 16 cores (half of the cores on a 256-GB node). If a job requests 512 GB and two nodes, it will be charged for 64 cores (two entire 256-GB nodes). To get 32 cores on two 512-GB nodes (16 cores on each node), the job would need to request 16 processors per node; the user would still be charged for two full nodes.
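A minimal sketch of this memory-proportional rule follows. Two details are assumptions not stated above: the memory share is rounded up to a whole core, and when `ppn` exceeds the memory share, the larger of the two is charged. `charged_cores` is a hypothetical helper.

```python
CORES_PER_NODE = 32
NODE_MEM_GB = {"PDAF": 256, "PDAFM": 512}

def charged_cores(node_type: str, nodes: int, ppn: int, total_mem_gb: int) -> int:
    """Total cores charged for a large-queue job (sketch).

    mem= is a total over all nodes, so each node holds total_mem_gb / nodes.
    """
    # Ceiling division: cores equivalent to one node's share of memory.
    mem_cores = -(-CORES_PER_NODE * total_mem_gb // (nodes * NODE_MEM_GB[node_type]))
    per_node = min(CORES_PER_NODE, max(ppn, mem_cores))
    return nodes * per_node

print(charged_cores("PDAF", 1, 1, 128))     # 16: half of a 256-GB node
print(charged_cores("PDAF", 2, 1, 512))     # 64: two whole 256-GB nodes
print(charged_cores("PDAFM", 2, 16, 1024))  # 64: two whole 512-GB nodes
```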

The Triton node allocation policy favors PDAF nodes over the more expensive PDAFM nodes, but this does not guarantee that a job will run on a PDAF node if a PDAFM node fits the scheduler's plan. To require a specific large-queue node type, use the memory feature:

#PBS -l nodes=1:mem256gb
or
#PBS -l nodes=1:mem512gb

Note: The first example is not the same as specifying:

#PBS -l mem=256GB

which will only suggest and not require a PDAF node.
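The difference between requiring and suggesting a node type can be modeled as a filter over candidate node types. This sketch assumes `mem=` only rules out nodes that are too small for the per-node share, while the `mem256gb` and `mem512gb` features pin the node type; `eligible_node_types` is a hypothetical helper, not part of the scheduler.

```python
from typing import Optional

NODE_MEM_GB = {"PDAF": 256, "PDAFM": 512}   # listed in order of preference
FEATURES = {"mem256gb": "PDAF", "mem512gb": "PDAFM"}

def eligible_node_types(feature: Optional[str] = None,
                        mem_per_node_gb: Optional[int] = None) -> list[str]:
    """Node types a one-node large-queue job could land on (sketch)."""
    if feature is not None:
        # nodes=1:mem256gb or nodes=1:mem512gb is a hard requirement.
        return [FEATURES[feature]]
    # mem= only excludes nodes that are too small; PDAF is preferred, not guaranteed.
    return [t for t, mem in NODE_MEM_GB.items()
            if mem_per_node_gb is None or mem >= mem_per_node_gb]

print(eligible_node_types(feature="mem256gb"))   # ['PDAF']
print(eligible_node_types(mem_per_node_gb=256))  # ['PDAF', 'PDAFM']
print(eligible_node_types(mem_per_node_gb=512))  # ['PDAFM']
```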

The express queue is only available between 8 a.m. and 8 p.m. Pacific Time, Monday through Friday. It is intended for interactive use: users may have only one job running at a time and no more than two jobs waiting.

Job Charging Examples

Before a job can be scheduled, the system verifies available credits in the user account. It does not actually charge the account at this time, but SUs (CPU-hour credits) equal to the estimated charges must be available. The system uses values from the job script to estimate these charges according to the following formulas:

The formula for batch queue requests is:

#CPUs x #nodes x wall time

The formula for large queue requests is:

ChargeFactor x #CPUs x #nodes x wall time

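The ChargeFactor for a 256-GB node is 2; for a 512-GB node it is 4. In both formulas, #CPUs is the number of cores charged per node, and wall time is the requested wall time in hours.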
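Both formulas can be folded into one function by treating the batch queue as having a ChargeFactor of 1. The sketch below also converts a PBS `walltime` value in HH:MM:SS form to hours; the function names are hypothetical, and formats with a days field are not handled.

```python
def walltime_hours(walltime: str) -> float:
    """Convert a PBS walltime such as "2:00:00" (HH:MM:SS) to hours."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours + minutes / 60 + seconds / 3600

def estimate_sus(cpus: int, nodes: int, walltime: str, charge_factor: int = 1) -> float:
    """SUs that must be available before the job is scheduled (sketch).

    batch: charge_factor=1 and cpus=8 (the whole TCC node is charged).
    large: charge_factor=2 on a 256-GB node or 4 on a 512-GB node.
    """
    return charge_factor * cpus * nodes * walltime_hours(walltime)

print(estimate_sus(8, 1, "2:00:00"))                   # 16.0 (Example 1)
print(estimate_sus(4, 1, "2:00:00", charge_factor=2))  # 16.0 (Example 2)
```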

Here are some examples:

  1. Queue batch: for a single-node job requesting four CPUs with two hours maximum wall clock time, submitted via this script:

    #PBS -q batch
    #PBS -l nodes=1:ppn=4
    #PBS -l walltime=2:00:00

    For this request to get scheduled, the account must have available

    8 x 1 x 2 = 16 SUs

    Batch queue nodes are charged for all eight CPUs regardless of how many are actually requested or used by the job. To be charged only for CPUs actually used, submit to the small queue.

  2. Queue large: for a single-node job requesting four CPUs with two hours maximum wall clock time, submitted via this script:

    #PBS -q large
    #PBS -l nodes=1:ppn=4
    #PBS -l walltime=2:00:00

    For this request to get scheduled, the account must have available

    2 x 4 x 1 x 2 = 16 SUs

    Adding a 256GB memory request increases the CPUs required to 32 (an entire 256-GB node), while the ChargeFactor stays at 2:

    #PBS -l mem=256GB

    Note: This memory request will result in actual availability of 252 GB due to system overhead.

    For this request to get scheduled, the account must have available

    2 x 32 x 1 x 2 = 128 SUs

    This job could be scheduled on either a 256-GB node or a 512-GB node, depending on availability and system load. If it runs on a 512-GB node, the ChargeFactor would be 4 but only 16 CPUs would be required, so the SUs required for scheduling remain 128.

  3. Queue large: changing the memory request to 512 GB increases the CPUs required, and guarantees the job will be scheduled on a 512-GB node with 504 GB memory available (and using a ChargeFactor of 4):

    #PBS -l mem=512GB

    For this request to get scheduled, the account must have available

    4 x 32 x 1 x 2 = 256 SUs
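The observation in Example 2, that the job needs the same SUs on either node type, follows from the fact that both large-queue node types charge the same amount per gigabyte of requested memory per hour: 2 x 32 / 256 = 4 x 32 / 512 = 1/4 SU. The sketch below uses that observation; it only applies when the memory request, not `ppn`, determines the cores charged, and the helper names are hypothetical.

```python
from fractions import Fraction

CORES_PER_NODE = 32
LARGE_NODES = {"PDAF": (256, 2), "PDAFM": (512, 4)}  # (node memory in GB, ChargeFactor)

def sus_per_gb_hour(node_type: str) -> Fraction:
    """SUs per GB of requested memory per hour of wall time."""
    node_mem_gb, charge_factor = LARGE_NODES[node_type]
    return Fraction(charge_factor * CORES_PER_NODE, node_mem_gb)

def memory_bound_sus(node_type: str, mem_gb: int, hours: int) -> Fraction:
    """Estimated SUs when the memory request sets the number of cores charged."""
    return mem_gb * hours * sus_per_gb_hour(node_type)

print(sus_per_gb_hour("PDAF"), sus_per_gb_hour("PDAFM"))                     # 1/4 1/4
print(memory_bound_sus("PDAF", 256, 2), memory_bound_sus("PDAFM", 256, 2))  # 128 128
print(memory_bound_sus("PDAFM", 512, 2))                                    # 256 (Example 3)
```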

When any job finishes, the account is debited for the actual CPU time used (rounded to the nearest half hour).

If a job runs more than five minutes beyond its requested wall time, the system will cancel it. Job charges that exceed the available SUs in the account will not cause the job to be canceled, but will result in a negative balance that can be credited later.
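The end-of-job accounting can be sketched as below. The debit formula (charged cores x ChargeFactor x elapsed hours), the round-half-up rule for ties, and the helper names are assumptions for illustration; only the half-hour rounding, the five-minute grace period, and the possibility of a negative balance come from the text above.

```python
from datetime import timedelta
from decimal import Decimal, ROUND_HALF_UP

GRACE = timedelta(minutes=5)

def rounded_hours(elapsed: timedelta) -> Decimal:
    """Elapsed time rounded to the nearest half hour (ties assumed to round up)."""
    halves = (Decimal(elapsed.total_seconds()) / Decimal(1800)).quantize(
        Decimal(1), rounding=ROUND_HALF_UP)
    return halves / 2

def exceeds_walltime(elapsed: timedelta, requested: timedelta) -> bool:
    """True once a job has run more than five minutes past its requested wall time."""
    return elapsed > requested + GRACE

def settle(balance, cores_charged: int, charge_factor: int, elapsed: timedelta) -> Decimal:
    """New balance after the final debit; it is allowed to go negative."""
    return balance - cores_charged * charge_factor * rounded_hours(elapsed)

print(rounded_hours(timedelta(hours=1, minutes=20)))  # 1.5
print(settle(10, 8, 1, timedelta(hours=2)))           # -6
```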

Contact Us

Open a Ticket with Triton Resource Support using the Support Ticket Form.
