UCSD Triton Resource @ SDSC

Quick Status

Triton Resource Node Status

Saturday, November 7th 2009 07:01:01 AM PST


TCC Rack 3 Nodes Down (1)

tcc-3-1.local

Total TCC Nodes Up: 250

Total 256GB (PDAF) Nodes Up: 20

Total 512GB (PDAFM) Nodes Up: 8

Rack 2 Up Count: 80

Rack 3 Up Count: 79

Rack 4 Up Count: 11

Rack 5 Up Count: 80
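
The per-rack counts above can be cross-checked against the reported TCC total. The following is a minimal sketch, assuming that racks 2 through 5 hold the TCC nodes and that the status lines keep exactly the wording shown here; the function names are hypothetical and not part of any Triton tool.

```python
import re

# Status lines copied from the Quick Status block above.
STATUS_TEXT = """\
Total TCC Nodes Up: 250
Rack 2 Up Count: 80
Rack 3 Up Count: 79
Rack 4 Up Count: 11
Rack 5 Up Count: 80
"""

# Patterns assume the exact line wording used on this page.
RACK_LINE = re.compile(r"^Rack (\d+) Up Count: (\d+)$", re.MULTILINE)
TOTAL_LINE = re.compile(r"^Total TCC Nodes Up: (\d+)$", re.MULTILINE)


def rack_counts(text):
    """Return {rack_number: nodes_up} for every 'Rack N Up Count' line."""
    return {int(rack): int(up) for rack, up in RACK_LINE.findall(text)}


def reported_total(text):
    """Return the 'Total TCC Nodes Up' figure, or None if the line is absent."""
    match = TOTAL_LINE.search(text)
    return int(match.group(1)) if match else None


counts = rack_counts(STATUS_TEXT)
total = reported_total(STATUS_TEXT)
# Assumption: the rack counts should add up to the TCC total.
print("sum of rack counts:", sum(counts.values()), "reported total:", total)
```

For this snapshot the rack counts add up to the reported total of 250 TCC nodes.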

The Triton Resource provides easily accessible, affordable, high-performance and data-intensive compute resources to UCSD researchers, faculty, affiliates, government and commercial partners through innovative, locally supported, scalable hardware and software over multiple 10-gigabit networks extending from campus laboratories to the UC network, the state, and the entire nation.

Triton Resource Certified for Production

Triton Availability Update

Current Daily Status Update: Saturday, November 7, 2009


Early Adopter Phase Ended October 5

Production Phase Announcement: The full production phase of the Triton Resource began on Monday, October 5, 2009. The Early Adopter phase ended at that time.

What this means for users

Triton's migration to the charged-for usage model was completed on Monday, October 5 with the implementation of the usage accounting service. Early Adopter accounts are no longer being created or renewed, and TAPP or project allocations are now required to run jobs. This marks the beginning of the full production phase of Triton.

If an allocation runs out of SUs, follow the TAPP procedures to extend or renew the account. Triton system administrators are no longer authorized to replenish accounts as they did during the Early Adopter phase.

Refunds for certain failed jobs and system errors will be considered on a case-by-case basis. Please direct requests to the discussion list.

To see what their calculations cost and to view their usage statements, users can run the mybalance and gstatement -u $USER commands.
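
For users who want to script these checks, the following is a minimal sketch. The only commands taken from this page are mybalance and gstatement -u $USER; the helper names are hypothetical, and the script assumes it runs on a Triton login node where both commands are on the PATH. Elsewhere it reports that they were not found.

```python
import os
import shutil
import subprocess


def accounting_commands(user):
    """Build the two accounting commands named on this page for one user."""
    return [["mybalance"], ["gstatement", "-u", user]]


def run_if_available(cmd):
    """Run cmd and return its stdout, or None if the program is not installed."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # Mirrors the $USER variable used in the command above.
    user = os.environ.get("USER", "unknown")
    for cmd in accounting_commands(user):
        output = run_if_available(cmd)
        label = " ".join(cmd)
        if output is None:
            print(label + ": command not found on this machine")
        else:
            print(label + ":\n" + output)
```

The script does not interpret the output of either command, since the output format is not described on this page.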

Details on the latest changes and policy decisions can be found on the FAQ pages.

Maintenance History (Partial)

RC2 System Software Upgrade 18 Sept 2009

Both TCC and PDAF were upgraded to Release Candidate 2 in preparation for full production usage and accounting. The system then remained in Early Adopter mode until full production began on October 5. The upgrade went smoothly and required about six hours of downtime to update the login node and all compute nodes.

Security Patch 14 Aug 2009

A security patch was applied to Triton on August 14 between 2 and 3 PM PDT. This patch was necessary to close a local privilege security vulnerability first reported on August 11. The RHEL patch became available on August 14 and was installed on the Triton login node almost immediately. Details of the patch are available on Bugzilla. The login node update began at approximately 14:20 PDT and was completed by about 15:10 PDT.

The security patch had the following results:

  • The security hole was closed.
  • Running and queued batch jobs were not affected.
  • The Bio roll could be added to Triton.
  • Batch nodes were updated to the same level at the first opportunity after the login node was updated.

Cluster Reinstallation 23 Jul 2009 (TRITON_RC1)

A full reinstallation of Triton was performed on July 23 and was completed within the expected 2-3 hour window, after which Triton was again running normally.

The cluster's public IP addresses were changed during this maintenance. The IP address of the login node was changed to 132.249.122.43.

User home data areas were restored intact.

Mirage Installation 20 Jul 2009

During a planned outage on July 20, the Mirage Lustre servers were physically moved to a new rack and connected to new power. Two dead LUNs were also recovered, so all 100 storage targets are available. Mirage is now mounted on the login node and all compute nodes, and all 100 TB are available on /mirage.

General Status of Triton Resource

The Triton Resource is in full production. TRITON_RC2 (Release Candidate 2) is installed, and full job accounting is in effect.

This site is kept up to date as node statuses change and when the system has scheduled maintenance. The Quick Status section lists the nodes currently in service and available via the scheduler. When nodes undergo unplanned maintenance, this site will be updated and messages will be posted on the discussion list and Triton's Twitter feed.

Early Adopter User Accounts

Early Adopter accounts were reset with a complimentary 1000 SU balance on October 5. Send usage and general access questions to the Mailman discussion list (triton-discuss@sdsc.edu). You can join the list here.

Triton's exceptional data-intensive computing power is now available to the University of California HPC research community.

If you have an account and are ready to access the Triton Resource, please visit the User Access page for details and to obtain login information. For information on first-time logins to the Triton Resource, please read the New User page. To request an account, please use TAPP.

To read about the current hardware status and get details of the system building process, read the Triton Resource blog.

Triton Full Production as of 5 Oct

Triton's compute components moved to production on October 5, 2009. Early Adopters helped to identify software needs and support requirements starting in July. Users and potential users are encouraged to continue sending feedback and suggestions to the Triton support team.

The 28 large-memory nodes of the PDAF provide some of the most extensive data analysis power available commercially or at any research institution in the country. The cluster includes four special nodes dedicated to database server interaction.

The 256-node TCC is a Rocks cluster with 24 gigabytes of memory and eight processing cores on each node.

Normal Triton Access via TAPP

For general and long-term access to the Triton Resource, users are asked to request an allocation through the Triton Affiliates and Partners Program, or TAPP. Now that the Early Adopter phase is complete and Triton is in full production, this is the primary way for users to gain access to Triton for running jobs and conducting research.

Triton in Full Production

The Triton compute resources are now in full production. The resource received its production certification with the deployment and acceptance of TRITON_RC2 on October 5, 2009.

All 28 PDAF nodes and all 256 TCC nodes are generally available. One or more of the compute nodes are occasionally set aside for staff testing and development of OS and software packages.

Contact Us

Open a Ticket with Triton Resource Support using the Support Ticket Form.

Join the Discussion Forum: Sign up for our Email Discussion List.

Follow Triton on Twitter

FAQ: Read the FAQ Page.