Enable data deduplication

In this section, you enable data deduplication on the Amazon FSx file system.

Note

It will take approximately 20 minutes to complete this section.

Important

Read through all steps below and watch the quick video before continuing.

enable-data-dedup.gif

  1. Copy the script below into your favorite text editor.

    $WindowsRemotePowerShellEndpoint = "windows_remote_powershell_endpoint" # e.g. "fs-0123456789abcdef.example.com"
    Enter-PSSession -ComputerName ${WindowsRemotePowerShellEndpoint} -ConfigurationName FsxRemoteAdmin
    
  2. From the Amazon FSx console, click the link to the STG326 - SAZ file system and select the Network & security tab. Copy the Windows Remote PowerShell Endpoint of the file system to the clipboard (e.g. fs-0123456789abcdef.example.com).

  3. Return to your favorite text editor and replace “windows_remote_powershell_endpoint” with the Windows Remote PowerShell Endpoint of STG326 - SAZ. Copy the updated script.

  4. Go to the remote desktop session for your Windows Instance 0.

  5. Click Start >> Windows PowerShell.

  6. Run the updated script in the Windows PowerShell window.

NOTE: Complete the next few steps using the remote PowerShell session to the FSx file server.

  1. Review the data deduplication commands available in the Amazon FSx CLI for remote management on PowerShell.

    • Run the command in the Remote Windows PowerShell Session.
    Get-Command *-FSxDedup*
    
  2. What commands are available?

  3. Enable data deduplication for the entire FSx file system.

    • Run the command in the Remote Windows PowerShell Session.
    Enable-FSxDedup
    
  4. Examine your data deduplication environment using the commands in the table below.

Command
Get-FSxDedupConfiguration
Get-FSxDedupStatus
Get-FSxDedupJob
Get-FSxDedupMetadata
Get-FSxDedupSchedule
Measure-FSxDedupFileMetadata -Path "D:\share"
  • Were all of these commands successful? If not, why not?

  • When is the next scheduled “Optimization” task?

  5. End the remote PowerShell session. Run Exit-PSSession.

  6. Close the PowerShell window. Run exit.

Create new data deduplication optimization schedule

Important

Read through all steps below and watch the quick video before continuing.

new-data-dedup-schedule.gif

  1. Copy the script below into your favorite text editor.

    $WindowsRemotePowerShellEndpoint = "windows_remote_powershell_endpoint" # e.g. "fs-0123456789abcdef.example.com"
    Enter-PSSession -ComputerName ${WindowsRemotePowerShellEndpoint} -ConfigurationName FsxRemoteAdmin
    
  2. From the Amazon FSx console, click the link to the STG326 - SAZ file system and select the Network & security tab. Copy the Windows Remote PowerShell Endpoint of the file system to the clipboard (e.g. fs-0123456789abcdef.example.com).

  3. Return to your favorite text editor and replace “windows_remote_powershell_endpoint” with the Windows Remote PowerShell Endpoint of STG326 - SAZ. Copy the updated script.

  4. Go to the remote desktop session for your Windows Instance 0.

  5. Click Start >> Windows PowerShell.

  6. Run the updated script in the Windows PowerShell window.

Important

Complete the next few steps using the remote PowerShell session to the FSx file server.

  1. Create a new data deduplication optimization schedule.

    • Run the command in the Remote Windows PowerShell Session.
    New-FSxDedupSchedule
    
    • Use the table values when prompted.
    Prompt    Value
    Name      DailyOptimization
    Type      Optimization
  2. What time will the optimization start?

  3. Examine the different options available to data deduplication jobs.

    • Run the command in the Remote Windows PowerShell Session.
    Set-FSxDedupSchedule -?
    
  4. Copy the command below into your favorite text editor. Look at the clock in the bottom-right corner of the remote desktop window, add 2 minutes to that time, and replace the start_time parameter with the result (e.g. 5:32pm). This time is in UTC.

Set-FSxDedupSchedule -Name DailyOptimization -Start start_time
  • Run the updated command in the Remote Windows PowerShell Session.

  • Wait for the start time of the DailyOptimization scheduled job to pass (e.g. 1 minute after the start_time you entered above), then run the command below to check the status.

  • Run the command in the Remote Windows PowerShell Session.

Get-FSxDedupStatus
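
The start_time value for the Set-FSxDedupSchedule step above can also be computed instead of read from the clock. The following is a minimal sketch, not part of the original tutorial. It assumes you run it in an ordinary local Windows PowerShell window on Windows Instance 0 (not inside the FsxRemoteAdmin session, which may not permit general expressions) and that -Start accepts a 24-hour HH:mm string. The tutorial's own example uses the 5:32pm form, so adjust the format string if the command rejects the value. The variable names $startTime and $command are illustrative.

```powershell
# Current UTC time plus two minutes, formatted as 24-hour HH:mm.
# InvariantCulture keeps ':' as the separator regardless of regional settings.
$culture   = [System.Globalization.CultureInfo]::InvariantCulture
$startTime = (Get-Date).ToUniversalTime().AddMinutes(2).ToString('HH:mm', $culture)

# Build the command text to paste into the remote PowerShell session.
# Assumption: -Start accepts this time format.
$command = "Set-FSxDedupSchedule -Name DailyOptimization -Start $startTime"
Write-Output $command
```

Copy the printed line into the Remote Windows PowerShell Session and run it there.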
  1. Did the optimization schedule run?

    • Look at the LastOptimizationTime value of the Get-FSxDedupStatus output.
  2. How many files were optimized and how much space is saved?

    • In the Get-FSxDedupStatus output, find the values of the attributes listed in the table below.
    Attribute
    LastOptimizationResult
    OptimizedFilesCount
    OptimizedFilesSavingsRate
    OptimizedFilesSize
    SavedSpace
  3. Do you see any optimization? If not, why not?

  4. Quickly read the Enabling data deduplication section of the Amazon FSx for Windows File Server User Guide to find the answer.

    • Run the command in the Remote Windows PowerShell Session.
    Get-FSxDedupConfiguration
    
  5. What is the MinimumFileAgeDays attribute value?

  6. Update the data deduplication configuration and set the minimum file age days attribute to 0.

    • Run the command in the Remote Windows PowerShell Session.
    Set-FSxDedupConfiguration -MinimumFileAgeDays 0
    
  7. Update the DailyOptimization data deduplication schedule to run in 2 minutes.

  8. Copy the command below into your favorite text editor. Look at the clock in the bottom-right corner of the remote desktop window, add 2 minutes to that time, and replace the start_time parameter with the result (e.g. 5:32pm). This time is in UTC.

Set-FSxDedupSchedule -Name DailyOptimization -Start start_time
  • Run the updated command in the Remote Windows PowerShell Session.

  • Wait for the start time of the DailyOptimization scheduled job to pass (e.g. 1 minute after the start_time you entered above), then run the command below to check the status.

  • Run the command in the Remote Windows PowerShell Session.

Get-FSxDedupStatus
  1. Did the optimization schedule run?

    • Look at the LastOptimizationTime value of the Get-FSxDedupStatus output.
  2. The active data deduplication job may still be running. Run the following command in the Remote Windows PowerShell Session to check the status of the data deduplication job.

    Get-FSxDedupJob
    
  3. Re-run the Get-FSxDedupJob command every few minutes to check the status of the job. This may take 5-10 minutes, depending on the amount of data you created during the test performance section.

  4. Continue with the tutorial while the data deduplication job runs in the background.

  5. If the Get-FSxDedupJob command returns an error, then there are no more active jobs and the job has completed.

  6. Run the command in the Remote Windows PowerShell Session.

Get-FSxDedupStatus
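
The re-check loop described in the steps above (re-run Get-FSxDedupJob until it reports no active job) can be sketched as a small polling helper. This is an illustration under stated assumptions, not part of the original tutorial: Wait-FSxDedupJobDone and its parameters are hypothetical names, not Amazon FSx CLI commands, and the helper is meant to run in a local PowerShell window rather than inside the restricted remote session. It treats either an error or empty output as "no active job", so a connection failure would also look like completion; check Get-FSxDedupStatus afterwards to confirm.

```powershell
# Hypothetical helper (not part of the Amazon FSx CLI): polls until the
# supplied script block reports no active data deduplication job.
function Wait-FSxDedupJobDone {
    param(
        [Parameter(Mandatory)] [scriptblock] $GetJobs,  # returns active jobs, or nothing/an error when idle
        [int] $IntervalSeconds = 60,                    # pause between checks
        [int] $MaxAttempts = 15                         # stop checking after this many tries
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        # A terminating error or empty output is treated as "no active job".
        $jobs = try { & $GetJobs } catch { $null }
        if (-not $jobs) { return $true }
        Write-Host "Attempt ${attempt}: a data deduplication job is still running."
        Start-Sleep -Seconds $IntervalSeconds
    }
    return $false  # gave up waiting; the job may still be running
}

# Usage sketch (assumes $WindowsRemotePowerShellEndpoint is set as in the
# connection script, and that one-off commands are run with Invoke-Command):
#
#   Wait-FSxDedupJobDone -GetJobs {
#       Invoke-Command -ComputerName $WindowsRemotePowerShellEndpoint `
#           -ConfigurationName FsxRemoteAdmin `
#           -ScriptBlock { Get-FSxDedupJob } -ErrorAction Stop
#   }
```

The helper returns $true once no job is reported and $false if the job is still active after the last attempt. Passing the check as a script block keeps the loop separate from how the command reaches the file server.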
  1. How many files were optimized and how much space is saved?

    • In the Get-FSxDedupStatus output, find the values of the attributes listed in the table below.
    Attribute
    LastOptimizationResult
    OptimizedFilesCount
    OptimizedFilesSavingsRate
    OptimizedFilesSize
    SavedSpace
  2. End the remote PowerShell session. Run Exit-PSSession.

  3. Close the PowerShell window. Run exit.