This repository has been archived by the owner on Dec 17, 2024. It is now read-only.

Examine data repository integration

Summary

The Lustre file system uses a client software package to mount and interact with the file system. The Lustre client includes the lfs helper utility, which manages aspects of the file system. Always use the lfs helper utility, rather than native Linux utilities, to interact with the file system. With lfs you can query the file system's object storage targets (OSTs) and metadata targets (MDTs). All file data in Lustre is stored on the OST storage volumes. All file metadata, including file names, timestamps, and permissions, is stored on the MDT.
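As a rough illustration of querying MDTs and OSTs, the sketch below totals the Used column of `lfs df` output per target type. The function name `summarize_targets` is hypothetical, and the sketch assumes the default `lfs df` layout: sizes in 1K blocks, Used as the third column, and a `[MDT:n]` or `[OST:n]` label at the end of each target line.

```shell
# Hypothetical helper: sum the "Used" column (1K blocks) of `lfs df` output
# per target type. Reads the `lfs df` text on stdin.
# Assumption: each target line ends with a label such as [MDT:0] or [OST:3].
summarize_targets() {
  awk '
    /\[MDT:[0-9]+\]/ { mdt += $3 }          # metadata targets
    /\[OST:[0-9]+\]/ { ost += $3; n++ }     # object storage targets
    END {
      printf "MDT used KiB: %d\n", mdt
      printf "OST used KiB: %d (across %d OSTs)\n", ost, n
    }
  '
}

# On a Lustre client, run without -h so the values stay numeric:
#   lfs df /fsx | summarize_targets
```

Reading from stdin keeps the parsing separate from the `lfs` call, so the function can be tried on saved output from a machine without the Lustre client.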

This section will examine Amazon FSx for Lustre data repository integration with Amazon S3. The workshop environment created an FSx for Lustre file system with a data repository integrated with the "nasanex" bucket in the US West (Oregon) region. NASA NEX is a part of the Registry of Open Data on AWS project and is a collection of Earth science datasets maintained by NASA, including climate change projections and satellite images of the Earth’s surface.

Duration

Note
It will take approximately 15 minutes to complete this section.

Step-by-step Guide

Important
Read through all steps below before continuing.

Connect to Linux Instance 0

  1. Open the Amazon EC2 console.

    Tip
    Context-click (right-click) the link above and open it in a new tab or window to make it easy to navigate between this GitHub workshop and the Amazon EC2 console.
    Note
    Make sure you are in the AWS Region of your workshop environment. If you need to change the AWS Region of the Amazon EC2 console, in the top right corner of the browser window click the region name next to Support and click the appropriate AWS Region from the drop-down menu.
  2. Click Instances (running).

  3. Click the check box next to the instance with the name Linux Instance 0.

  4. Click the Connect button.

  5. Connect using AWS Systems Manager - select the Session Manager tab and click the Connect button to open a session.

Examine s3://nasanex data repository integration

Copy, paste, then execute the shell commands below in the Session Manager terminal session of Linux Instance 0 to answer the following questions:

  1. Is the FSx for Lustre file system mounted?

    mount -t lustre
  2. How long does it take to list the entire file system?

    Note
    The lfs client helper utility is used to work with the Lustre file system.
    time lfs find /fsx > /dev/null
  3. What file types did you see? How many files?

    lfs find /fsx --type f | wc -l
  4. How many directories?

    lfs find /fsx --type d | wc -l
  5. How many small files (< 512 KiB)?

    lfs find /fsx --type f --size -512k | wc -l
  6. How many large files (> 100 MiB)?

    lfs find /fsx --type f --size +100M | wc -l
  7. How many .nc, .hdf, .tif, .gz files?

    lfs find /fsx --type f | rev | cut -d '.' -f 1 | rev | sort | uniq -c | grep -E ' (nc|hdf|tif|gz)$'
  8. How much metadata (MDT) has been loaded into the file system?

    lfs df -h
    Note
    The metadata target (MDT) holds the Lustre file system's directory structure, permissions, timestamps, namespace, and other file system details.

    How much data (all the OSTs) has been loaded into the file system?

    Note
    The object storage targets (OSTs) are the storage volumes where data is stored within the Lustre file system.

    How much data storage capacity is available?
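The extension count in step 7 can also be sketched as a single awk pass that produces the full per-extension breakdown, which helps when answering "What file types did you see?". The function name `count_extensions` is hypothetical. Unlike the `rev | cut | rev` pipeline, this version looks only at the file name, so a dot in a directory name is not mistaken for an extension; files without a dot are grouped under `(none)`.

```shell
# Hypothetical helper: count files per extension from a list of paths on stdin.
# Assumption: one path per line (file names containing newlines are not handled).
count_extensions() {
  awk -F/ '
    {
      name = $NF                          # file name: text after the last "/"
      n = split(name, parts, ".")
      ext = (n > 1) ? parts[n] : "(none)" # no dot in the name: no extension
      count[ext]++
    }
    END { for (e in count) printf "%d %s\n", count[e], e }
  ' | sort -k1,1nr -k2,2                  # most common extension first
}

# On a Lustre client:
#   lfs find /fsx --type f | count_extensions
```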

Verify your results

The results of your queries should match the following:

Query Results

Is the FSx for Lustre file system mounted?

10.0.1.193@tcp:/fsx on /fsx type lustre (rw,lazystatfs) (you will have a different IP address)

How long does it take to list the entire file system?

~real 1m10.618s

What file types did you see?

.hdf .nc .gz .tif .json .md5 .txt .pdf

How many files?

373572

How many directories?

42242

How many small files (< 512 KiB)?

23692

How many large files (> 100 MiB)?

169617

How many .nc, .hdf, .tif, .gz files?

.nc = 87002; .hdf = 207552; .tif = 11095; .gz = 42009

How much storage is used by the metadata target (MDT)?

1.9G

How much storage is used by all the object storage targets (OSTs)?

27.M

How much data storage capacity is available?

6.6T
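If you prefer to check the counts with a script rather than by eye, a minimal sketch follows. The helper name `check_count` is hypothetical; the expected values are the ones listed above, and the queries are the same `lfs find` commands used in the steps. The guard skips the queries on machines without the Lustre client.

```shell
# Hypothetical helper: compare an observed count with the expected value
# from the results listed above.
check_count() {
  label=$1
  expected=$2
  actual=$3
  if [ "$actual" -eq "$expected" ]; then
    echo "OK   $label: $actual"
  else
    echo "DIFF $label: expected $expected, got $actual"
  fi
}

# Only run the queries where the Lustre client tools are installed.
if command -v lfs >/dev/null 2>&1; then
  check_count "files"       373572 "$(lfs find /fsx --type f | wc -l)"
  check_count "directories" 42242  "$(lfs find /fsx --type d | wc -l)"
  check_count "small files" 23692  "$(lfs find /fsx --type f --size -512k | wc -l)"
  check_count "large files" 169617 "$(lfs find /fsx --type f --size +100M | wc -l)"
fi
```

A `DIFF` line does not necessarily mean something went wrong: the contents of the linked bucket may have changed since these results were recorded.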

Next section

Click the button below to go to the next section.

load data from repository