Atlas Status Update
Chris Fuson

Atlas Update - Timeline

• March 06, 2014 – Installed patches that targeted memory contention on the metadata server to address server-side performance problems
• February 26, 2014 – Installed patch to reduce impact of close operations to address server-side metadata performance problems
• February 10, 2014 – Titan’s Lustre client rolled back to 1.8.6 to address client-side performance problems
• January 28, 2014 – Titan’s Lustre client upgraded to 2.4. Unmounted Widow.
• January 10, 2014 – As the user load from this transition increased, we began to see problems with both the Lustre server and client (compute node) performance
• January 07, 2014 – Widow[1-3] became read-only
• December 05, 2013 – Atlas was mounted on all OLCF systems, announced, and opened for use

Atlas Update - Current

• Following the March 06, 2014 change to reduce memory contention on the metadata server, we continue to see qualified improvements in the interaction with Atlas.
• Improvements have been substantial for several applications that were negatively affected before.
• We encourage users to continue testing their application performance in light of these changes and to report their results.
• We will continue to pursue the remaining issues, and will intentionally address them outside of the production environment so as to minimize further interruption to the Atlas file systems.
• Your feedback is incredibly valuable. Please continue to report problems related to the file system, including any specific timings for I/O operations, to help@olcf.ornl.gov.

Atlas Update – Stripe Count Warning

• Warning: Stripe Counts Greater than 160 Not Currently Supported
• Warning: “-1” should NOT be used while setting up striping patterns
• The 1.8 Lustre clients running on Titan do not support stripe counts greater than 160. Interaction from Titan (including ‘lfs getstripe’) with files that have a stripe count greater than 160 is problematic.
• If ‘lfs setstripe’ was used to set the stripe of a directory or file and the stripe count was set to a value greater than 160 or ‘-1’, you should reduce the stripe value.

    titan-ext3 1004> lfs setstripe -c -1 test.file
    titan-ext3 1005> lfs getstripe test.file | grep stripe_count
    lmm_stripe_count: 1008
    *** glibc detected *** lfs: munmap_chunk(): invalid pointer: 0x0000067fed0 ***

• Please note the stripe count is only an issue on Titan; the count is not an issue on Eos, Rhea, or the Data Transfer Nodes due to the more recent Lustre client version in use on those systems.
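
As a rough illustration of the warning above, the following shell sketch classifies a stripe count read from `lfs getstripe` output. The function name `classify_stripe_count` is hypothetical and not an OLCF tool. It also assumes the count is the second whitespace-separated field on the first line containing `stripe_count:`, as in the sample output on this slide. Because `lfs getstripe` itself is reported as problematic on Titan for such files, a check like this would more plausibly be run from Eos, Rhea, or a Data Transfer Node.

```shell
#!/bin/sh
# Sketch only: classify a stripe count against the 160 limit noted above.
# classify_stripe_count is a hypothetical helper name, not an OLCF tool.
# It reads `lfs getstripe` output on stdin and prints one word:
#   ok      - count is between 0 and 160
#   reduce  - count is -1 or greater than 160
#   unknown - no numeric stripe count was found
classify_stripe_count() {
    # Assumption: the count is field 2 on the first "stripe_count:" line.
    count=$(awk '/stripe_count:/ { print $2; exit }')

    case "$count" in
        -1)          echo "reduce"; return 0 ;;
        ''|*[!0-9]*) echo "unknown"; return 0 ;;
    esac

    if [ "$count" -gt 160 ]; then
        echo "reduce"
    else
        echo "ok"
    fi
}

# Possible usage on a system with a recent Lustre client:
#   lfs getstripe test.file | classify_stripe_count
```

Printing a single word rather than returning a non-zero status keeps the helper easy to use in a loop over many files; that is a design choice of this sketch, not something the slides prescribe.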

Atlas Update – Reduce Stripe Count

• Create new directory with reduced striping
• Copy data into new directory
  – cp for small data amounts
  – dcp from the Data Transfer Nodes for larger amounts of data

    dtn04 115> mkdir New.Dir
    dtn04 116> lfs setstripe -c 128 New.Dir
    dtn04 117> cp test.file New.Dir/.
    dtn04 118> lfs getstripe New.Dir/test.file | grep stripe_count
    lmm_stripe_count: 128
    dtn04 119>
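
The steps above could be wrapped in a small shell function along these lines. This is a minimal sketch, not an OLCF-provided script. The name `restripe_copy`, the default of 128, and the `LFS` override variable are all assumptions made for illustration; `LFS` exists only so the function can be exercised on a host without Lustre. The sketch uses `cp`, so it suits small data amounts only; larger transfers would use `dcp` from the Data Transfer Nodes as noted above.

```shell
#!/bin/sh
# Sketch only: create a directory with a reduced stripe count and copy
# one file into it, mirroring the mkdir / lfs setstripe / cp steps above.
# restripe_copy and the LFS override are hypothetical, for illustration.
#
# usage: restripe_copy SOURCE_FILE NEW_DIR [STRIPE_COUNT]
restripe_copy() {
    src=$1
    newdir=$2
    count=${3:-128}          # assumed default, taken from the slide example
    lfs_cmd=${LFS:-lfs}      # assumption: allows substituting a stub for lfs

    # Reject -1 and anything else that is not a plain non-negative integer.
    case "$count" in
        ''|*[!0-9]*)
            echo "stripe count must be an integer between 1 and 160" >&2
            return 1 ;;
    esac
    if [ "$count" -lt 1 ] || [ "$count" -gt 160 ]; then
        echo "stripe count must be an integer between 1 and 160" >&2
        return 1
    fi

    # -p tolerates an existing directory; the slide uses plain mkdir.
    mkdir -p "$newdir" || return 1
    "$lfs_cmd" setstripe -c "$count" "$newdir" || return 1

    # The copy is a newly created file, so it picks up the directory's striping.
    cp "$src" "$newdir"/ || return 1
}

# Possible usage from a Data Transfer Node:
#   restripe_copy test.file New.Dir 128
#   lfs getstripe New.Dir/test.file | grep stripe_count
```

Validating the count before any directory is created means a mistyped value such as `-1` leaves nothing behind to clean up.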

Questions?

• More information:
  – www.olcf.ornl.gov/kb_articles/atlas-update/
  – www.olcf.ornl.gov/kb_articles/lustre-basics/
• Email:
  – help@olcf.ornl.gov