How to Count Duplicate Lines in a Text File Using PowerShell

Recently, I was working with some text files and needed to count the duplicate lines in them using PowerShell. I tried several methods to do this. In this tutorial, we'll explore three ways to count duplicate lines in a text file using PowerShell: the Group-Object cmdlet, a hashtable, and a custom function.

Method 1: Using Group-Object

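PowerShell provides the Group-Object cmdlet, which groups objects that share the same value and is therefore a natural fit for counting duplicate lines in a text file. This method reads the file, groups identical lines, and then reports the number of occurrences of each line. By default, Group-Object compares strings case-insensitively; add the -CaseSensitive switch if lines such as "Error" and "error" should be counted separately.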

Here is a complete script.

# Path to the text file
$filePath = "C:\MyFolder\MyExample.txt"

# Read the file and group by each line
$lineGroups = Get-Content $filePath | Group-Object

# Display the results
$lineGroups | ForEach-Object {
    [PSCustomObject]@{
        Line = $_.Name
        Count = $_.Count
    }
} | Sort-Object -Property Count -Descending

Here is how the script works:

  • Get-Content $filePath reads the content of the file, one line at a time.
  • Group-Object groups identical lines together.
  • ForEach-Object iterates over each group to create a custom object with the line and its count.
  • Sort-Object -Property Count -Descending sorts the results by the count in descending order.

I executed the above script, and it produced a report listing each distinct line with its count, as you can see in the screenshot below:

[Screenshot: Count Duplicate Lines in a Text File Using PowerShell]
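
If you only want the lines that occur more than once, or you need an exact, case-sensitive match, the Group-Object approach from Method 1 can be adapted. The following is a minimal sketch rather than a definitive script: the sample data, the temporary file name, and the variable names are made up for illustration. It uses the -NoElement switch (which skips storing the grouped lines themselves) and the -CaseSensitive switch.

```powershell
# Create a small sample file in the temp folder (illustrative data only)
$samplePath = Join-Path ([System.IO.Path]::GetTempPath()) "duplicate-sample.txt"
Set-Content -Path $samplePath -Value @("apple", "banana", "apple", "Apple", "cherry", "banana", "apple")

# Default grouping ignores case, so "apple" and "Apple" land in the same group
$ignoreCase = Get-Content -Path $samplePath | Group-Object -NoElement

# -CaseSensitive keeps "apple" and "Apple" as separate groups
$matchCase = Get-Content -Path $samplePath | Group-Object -NoElement -CaseSensitive

# Keep only the groups that occur more than once, most frequent first
$duplicates = $matchCase | Where-Object { $_.Count -gt 1 } | Sort-Object -Property Count -Descending

$duplicates | Format-Table -Property Name, Count -AutoSize

# Clean up the sample file
Remove-Item -Path $samplePath
```

With this sample data, the case-insensitive pass produces three groups and the case-sensitive pass produces four, of which only "apple" (3) and "banana" (2) are duplicates.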

Method 2: Using a Hashtable

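Using a hashtable is another efficient way to count duplicate lines in a text file in PowerShell. This method iterates through each line of the file and increments that line's count in a hashtable. Keep in mind that a hashtable created with @{} compares string keys case-insensitively, so lines that differ only in letter case are counted together.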

Here is a complete example:

# Path to the text file
$filePath = "C:\MyFolder\MyExample.txt"

# Initialize a hashtable to store line counts
$lineCounts = @{}

# Read the file and count each line
Get-Content $filePath | ForEach-Object {
    if ($lineCounts.ContainsKey($_)) {
        $lineCounts[$_]++
    } else {
        $lineCounts[$_] = 1
    }
}

# Display the results
$lineCounts.GetEnumerator() | ForEach-Object {
    [PSCustomObject]@{
        Line = $_.Key
        Count = $_.Value
    }
} | Sort-Object -Property Count -Descending

Here is how the script works:

  • @{} initializes an empty hashtable.
  • Get-Content $filePath | ForEach-Object reads the file and iterates through each line.
  • The script checks if the line exists in the hashtable. If it does, it increments the count; otherwise, it adds the line with a count of 1.
  • GetEnumerator() retrieves the key-value pairs from the hashtable.
  • The results are sorted by count in descending order.

You can see the output in the screenshot below:

[Screenshot: How to Count Duplicate Lines in a Text File Using PowerShell]
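
Once the hashtable from Method 2 is populated, it can also answer summary questions, such as how many distinct lines are duplicated and how many redundant copies the file contains. The sketch below is one possible way to do that. To stay self-contained, it assumes an in-memory array of sample lines in place of Get-Content, and the names $duplicateEntries, $duplicatedLineCount, and $extraCopies are illustrative.

```powershell
# Sample lines standing in for the output of Get-Content (illustrative data only)
$lines = @("alpha", "beta", "alpha", "gamma", "beta", "alpha")

# Count every line, using the same logic as the hashtable method
$lineCounts = @{}
foreach ($line in $lines) {
    if ($lineCounts.ContainsKey($line)) {
        $lineCounts[$line]++
    } else {
        $lineCounts[$line] = 1
    }
}

# Keep only the entries that occur more than once
$duplicateEntries = @($lineCounts.GetEnumerator() | Where-Object { $_.Value -gt 1 })

# Number of distinct lines that are duplicated
$duplicatedLineCount = $duplicateEntries.Count

# Number of redundant copies: every occurrence beyond the first
$extraCopies = 0
foreach ($entry in $duplicateEntries) {
    $extraCopies += $entry.Value - 1
}

"Distinct duplicated lines: $duplicatedLineCount"
"Redundant copies: $extraCopies"
```

For the sample data, "alpha" appears three times and "beta" twice, so there are 2 distinct duplicated lines and 3 redundant copies.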

Method 3: Using a Custom Function

You can also create a custom function to count duplicate lines in a text file. The function reads the content of the file, groups identical lines, and then keeps only the groups whose count is greater than one, that is, the lines that appear more than once in the file.

Here is the complete script:

function Get-DuplicateLines {
    param (
        [Parameter(Mandatory = $true)]
        [string]$filePath
    )

    # Check that the path exists and points to a file, not a folder
    if (-not (Test-Path -Path $filePath -PathType Leaf)) {
        Write-Error "The file '$filePath' does not exist."
        return
    }

    # Read the file and group identical lines together
    $lineGroups = Get-Content -Path $filePath | Group-Object

    # Keep only groups with a count greater than 1 (duplicates) and select the line and count
    $duplicates = $lineGroups | Where-Object { $_.Count -gt 1 } | Select-Object Name, Count

    # Output the duplicate lines and their counts
    return $duplicates
}

# Example usage of the function
$filePath = "C:\MyFolder\MyExample.txt"
$duplicateLines = Get-DuplicateLines -filePath $filePath

# Display the duplicate lines
if ($duplicateLines) {
    $duplicateLines | Format-Table -AutoSize
} else {
    Write-Output "No duplicate lines found."
}

You can see the output in the screenshot below, after I executed the PowerShell script in VS Code.

[Screenshot: Find Duplicate Lines in a Text File Using PowerShell Function]
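
In real files, lines that look identical may differ in leading or trailing whitespace, and blank lines often show up as "duplicates". As an optional variation on the custom function above, the sketch below trims each line and skips blank ones before grouping. The function name Get-TrimmedDuplicateLine, its -Path parameter, and the sample data are hypothetical and only serve the illustration.

```powershell
# Hypothetical helper: like Get-DuplicateLines, but trims each line and skips blank ones
function Get-TrimmedDuplicateLine {
    param (
        [Parameter(Mandatory = $true)]
        [string]$Path
    )

    # Check that the path exists and points to a file
    if (-not (Test-Path -Path $Path -PathType Leaf)) {
        Write-Error "The file '$Path' does not exist."
        return
    }

    Get-Content -Path $Path |
        ForEach-Object { $_.Trim() } |          # Remove leading and trailing whitespace
        Where-Object { $_ -ne "" } |            # Skip blank lines
        Group-Object -NoElement |               # Group identical lines, keep counts only
        Where-Object { $_.Count -gt 1 } |       # Keep only duplicated lines
        Select-Object -Property Name, Count
}

# Example usage with a temporary sample file (illustrative data only)
$samplePath = Join-Path ([System.IO.Path]::GetTempPath()) "trim-sample.txt"
Set-Content -Path $samplePath -Value @("error: disk full", "  error: disk full  ", "", "ok", "", "warning")

$result = @(Get-TrimmedDuplicateLine -Path $samplePath)
$result | Format-Table -AutoSize

# Clean up the sample file
Remove-Item -Path $samplePath
```

Here the two "error: disk full" lines are reported as one duplicate with a count of 2, while the two blank lines are ignored.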

Conclusion

In this PowerShell tutorial, I explained how to count duplicate lines in a text file using three methods, each with an example: the Group-Object cmdlet, a hashtable, and a custom function.
