
I/O 3x Slower

4/26/16 4:56 AM
Everything seems to be the same, but I/O is taking 3x longer - why might that be?


I have a series of tests I ran on a system in December. The program hasn't changed since then and, as far as I know, the data hasn't either; I have sole user-level access to the data.

The directory containing the data was created with a plain `mkdir`, without any special striping settings, and holds the 13,389 files of 39MB each that make up the dataset.

The directory is located within: /oasis/scratch/comet/$USER/temp_project

I ran the dataset again today and it posted a much longer time-to-completion.

Calculation times are the same to within 3%.

I/O time has increased from 24815s (total) to 680862s (total), or 2.7x longer. The writing portion seems to be taking disproportionately longer: 2.7x longer versus 2.1x longer.

I have a similar data set of 1023 files of 468MB each comprising a second dataset. In this case I/O is taking 1.1x longer (8272s versus 7497s).

The only real hypothesis I have is that all my files are on the same drive (not sure what the default policy is) or a set of drives and someone else happens to be making heavy use of those.

Is there a way to verify this?

Could something else be the cause?
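
One way to test the "all on one drive" hypothesis is to ask Lustre which storage target (OST) holds each file and count how many files land on each one. The following is only a sketch: it assumes the `lfs` client tool is on the PATH and that `lfs getstripe -i FILE` prints the index of the file's first OST as a single integer. The helper names (`starting_ost`, `tally`, `survey`) are made up for illustration.

```python
import subprocess
from collections import Counter
from pathlib import Path


def starting_ost(path):
    """Return the index of the first OST holding `path`.

    Assumption: `lfs getstripe -i FILE` prints one integer.
    """
    out = subprocess.run(
        ["lfs", "getstripe", "-i", str(path)],
        check=True, capture_output=True, text=True,
    ).stdout
    return int(out.strip())


def tally(ost_indices):
    """Count files per OST index, most heavily used OST first."""
    return Counter(ost_indices).most_common()


def survey(directory, lookup=starting_ost):
    """Tally the first OST of every regular file directly in `directory`."""
    files = (p for p in Path(directory).iterdir() if p.is_file())
    return tally(lookup(p) for p in files)
```

Calling `survey("/oasis/scratch/comet/$USER/temp_project")` with the real path expanded would return `(ost_index, file_count)` pairs. If nearly every file maps to one or two indices, the hypothesis is plausible; an even spread points elsewhere. Note that spawning one `lfs` process per file is itself slow for 13,389 files, so a sample of the directory may be enough.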

RE: I/O 3x Slower
4/26/16 9:58 PM as a reply to Richard Barnes.
From your description, it looks like you have a lot of small files (13K+). I think you are likely hitting a metadata bottleneck, most likely because the user load is much higher now. Lustre has just two metadata servers, and all the metadata load goes through them. December was pretty lightly loaded, and now we are running at 85-90%. The load on the Lustre filesystem is higher too. With the larger files, the I/O is more bandwidth limited (metadata is fine because you only have 1023 files), hence the lower impact.
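
To make the file-count argument concrete, here is a quick back-of-the-envelope calculation using only the figures quoted in the original post. It assumes decimal units (1GB = 1000MB); the function and variable names are illustrative.

```python
# (file_count, size_per_file_MB) as reported in the original post
datasets = {
    "small files": (13389, 39),
    "large files": (1023, 468),
}


def total_gb(file_count, size_mb):
    """Total dataset volume in decimal GB."""
    return file_count * size_mb / 1000


for name, (count, size_mb) in datasets.items():
    print(f"{name}: {count} files, about {total_gb(count, size_mb):.0f} GB")

# Each file needs at least its own open/close, so file count is a rough
# proxy for metadata load.
file_ratio = datasets["small files"][0] / datasets["large files"][0]
print(f"file-count ratio: {file_ratio:.1f}x")
```

The two datasets move a similar volume of data (roughly 522GB versus 479GB), but the first touches about 13 times as many files, which is consistent with it being far more sensitive to metadata server load.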

Given that the files are small (39MB), it might be best to try this run out of the local scratch on our compute nodes. The normal compute nodes have close to 225GB of SSD space available local to each node. The IOPS you get should be much higher, which helps with this kind of I/O. We also have some nodes with 1.5TB of SSD space if needed. Can you give us more info on the code and how it does I/O? I can set up the scripts for you to use the local storage. This might be better done via a ticket, so please send an email to help@xsede.org with the details.
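
As a rough illustration of the local-scratch approach (not the actual scripts, which depend on how the code does its I/O), a staging wrapper might look like the sketch below. The `work` callable and the `local_root` argument are hypothetical stand-ins for the real program and the node-local SSD path. Note that 13,389 files of 39MB come to roughly 522GB, which is more than the 225GB on a normal node, so the full dataset would have to be staged in batches or run on the 1.5TB nodes.

```python
import shutil
import tempfile
from pathlib import Path


def run_staged(src_dir, dest_dir, work, local_root=None):
    """Copy inputs to node-local storage, run `work`, copy results back.

    `work(in_dir, out_dir)` is a hypothetical callable standing in for the
    real program. `local_root` should point at the node-local SSD; if it is
    None, the system default temporary directory is used instead.
    """
    with tempfile.TemporaryDirectory(dir=local_root) as tmp:
        local_in = Path(tmp) / "in"
        local_out = Path(tmp) / "out"
        shutil.copytree(src_dir, local_in)  # each input is read from Lustre once
        local_out.mkdir()
        work(local_in, local_out)           # repeated small-file I/O stays local
        # Copy results back to the shared filesystem before tmp is deleted.
        shutil.copytree(local_out, dest_dir, dirs_exist_ok=True)
```

The design choice here is that the shared filesystem sees one pass of reads and one pass of writes, while any repeated opens, seeks, and small writes hit the SSD. `dirs_exist_ok` requires Python 3.8 or later.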

Thanks,
Mahidhar