Suppose you have a large number of files, but most of them are identical. For example, the files’ contents are: A A A B B C C C C.
You’d like to find one file of each kind and, for example, compare them.
mkdir cmp
cd cmp/
# Hash every copy, sort by hash, and write each line to a file named after its hash
find .. -name stupid-page.html | xargs md5sum | sort | awk '{print > $1}'
# Take the first file from each list and open them all in diffuse
head -q -n 1 [0-9a-f]* | awk '{print $2}' | xargs diffuse
Or, if you just need the list, replace the last line with:
head -q -n 1 [0-9a-f]* | awk '{print $2}'
Note that you are left with lists of files in the cmp directory. Each list is named after the MD5 hash of the content of the files listed in it.
Like this:
> ls -1
09b37d3089b1c1837e4741973df1e67e
4d701e2420bf49c85dd21c9b1dbb10e1
6135d23fcb0113ab9a2f574d7f0bf703
> cat 09b37d3089b1c1837e4741973df1e67e
09b37d3089b1c1837e4741973df1e67e  ../some-folder/stupid-page.html
09b37d3089b1c1837e4741973df1e67e  ../some-other-folder/stupid-page.html
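The commands above split paths on whitespace, so a folder name containing a space would break them. Below is a minimal sketch of a more defensive variant, run end to end on a throwaway tree that mirrors the A A A B B C C C C example. It assumes GNU coreutils (md5sum, head -q) and a find/xargs pair that supports -print0/-0. The demo directory, the file name page.html, and the variable names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical demo tree: nine copies of page.html with three distinct contents.
demo=$(mktemp -d)
i=0
for content in A A A B B C C C C; do
    i=$((i + 1))
    mkdir -p "$demo/dir$i"
    printf '%s\n' "$content" > "$demo/dir$i/page.html"
done

mkdir "$demo/cmp"
cd "$demo/cmp" || exit 1

# Same grouping as before, but NUL-delimited so paths with spaces survive xargs.
find .. -name page.html -print0 | xargs -0 md5sum | sort | awk '{ print > $1 }'

# One representative per group. An md5sum line is 32 hex digits plus two
# separator characters, so the path starts at column 35; cutting by column
# keeps any spaces inside the path intact.
representatives=$(head -q -n 1 [0-9a-f]* | cut -c 35-)
printf '%s\n' "$representatives"
```

This should leave one list file per distinct content in cmp and print one path per list. Paths containing backslashes or newlines are still not handled, because md5sum escapes those in its output.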