
Find one file of each kind (on Linux)

Thursday, November 18th, 2010

Suppose you have a large number of files, most of which are identical. For example, the files' contents are: A A A B B C C C C.
You'd like to pick one file of each kind, for example to compare them.

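mkdir cmp
cd cmp/
# Hash every copy; awk appends each "<md5>  <path>" line to a file named after the hash
find .. -name stupid-page.html -exec md5sum {} + | sort | awk '{print > $1}'
# Take the first path from each hash-named list and open them all in diffuse
head -q -n 1 [0-9a-f]* | awk '{print $2}' | xargs diffuse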

Or, if you only need the list of representative files, replace the last line with:

head -q -n 1 [0-9a-f]* | awk '{print $2}'
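If you only need that list and would rather not leave hash-named files behind, a single pass can keep the first path seen for each hash. This is a minimal sketch, not the method above: `one_of_each` is a hypothetical helper name, and it assumes GNU md5sum output ("<hash>  <path>") and ordinary paths (no newlines or backslashes, which md5sum escapes).

```shell
#!/bin/sh
# one_of_each DIR NAME (hypothetical helper): print one path for each
# distinct content among the files called NAME under DIR.
# Assumes GNU md5sum text-mode output: "<hash>  <path>".
one_of_each() {
    find "$1" -type f -name "$2" -exec md5sum {} + |
        sort |
        # !seen[$1]++ is true only the first time a hash appears;
        # sub() strips "<hash>  " so paths containing spaces survive intact
        awk '!seen[$1]++ { sub(/^[0-9a-f]+  /, ""); print }'
}
```

For example, `one_of_each .. stupid-page.html | xargs diffuse` would open one copy of each version, the same way the pipeline above does.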

Note that the cmp directory is left holding the lists of files: each list is named after the MD5 hash of the content shared by every file listed in it.
For example:

> ls -1
09b37d3089b1c1837e4741973df1e67e
4d701e2420bf49c85dd21c9b1dbb10e1
6135d23fcb0113ab9a2f574d7f0bf703
> cat 09b37d3089b1c1837e4741973df1e67e
09b37d3089b1c1837e4741973df1e67e  ../some-folder/stupid-page.html
09b37d3089b1c1837e4741973df1e67e  ../some-other-folder/stupid-page.html
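Because every list holds one line per file, the lists themselves show how the copies are distributed. The sketch below prints, for each list, the number of files sharing that content, the hash, and one example path, largest group first. `summarize_groups` is a hypothetical helper name, and it assumes each line is GNU md5sum output (32 hex digits, two spaces, then the path).

```shell
#!/bin/sh
# summarize_groups DIR (hypothetical helper): for each MD5-named list file in
# DIR, print "<count> <hash> <example path>", largest group first.
# Assumes list lines look like "<32 hex digits>  <path>", so the path starts
# at column 35.
summarize_groups() {
    for list in "$1"/[0-9a-f]*; do
        [ -f "$list" ] || continue              # skip if the glob matched nothing
        count=$(wc -l < "$list" | tr -d ' ')    # one line per file with this content
        example=$(head -n 1 "$list" | cut -c35-)
        printf '%s %s %s\n' "$count" "${list##*/}" "$example"
    done | sort -rn
}
```

Run as `summarize_groups .` from inside cmp; the example paths stay relative to cmp, just like the entries in the lists.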