Snippets – Ilya's blog

Suppose you have a large number of files, most of which are identical. For example, the files’ contents are: A A A B B C C C C.
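To try the snippet on this example, the following minimal sketch builds it in a scratch directory: nine copies of stupid-page.html with three distinct contents. The demo/1 to demo/9 layout is an assumption made only for illustration.

```shell
# Recreate the A A A B B C C C C example: nine copies of stupid-page.html
# in three distinct versions. The demo/1 ... demo/9 layout is hypothetical.
mkdir -p demo
i=1
for content in A A A B B C C C C; do
    mkdir -p "demo/$i"
    printf '%s\n' "$content" > "demo/$i/stupid-page.html"
    i=$((i + 1))
done
```

Running the commands from a new directory demo/cmp should then leave three lists in it, one per version.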
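You’d like to find one file of each kind, for example to compare them. The commands below hash every file with md5sum, and awk then writes each md5sum line into a file named after its hash, so every such file lists the copies of one version. The first entry of each list is a representative, and diffuse opens all the representatives for comparison: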
mkdir cmp
cd cmp/
find .. -name stupid-page.html | xargs md5sum | sort | awk '{print > $1}'
head -q -n 1 [0-9a-f]* | awk '{print $2}' | xargs diffuse
Or, if you just need the list of representatives, replace the last line with:
head -q -n 1 [0-9a-f]* | awk '{print $2}'
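If you only need the representatives and not the lists, a single pipeline can also do it. This is a minimal sketch assuming GNU coreutils: uniq -w 32 compares only the first 32 characters of each line (the MD5 hash), and cut -c 35- strips the hash plus the two separator characters md5sum prints. The helper name one_of_each is hypothetical, and file names are assumed to contain no newlines or backslashes (md5sum escapes those).

```shell
# one_of_each DIR NAME (hypothetical helper, not from the original post):
# print one path per distinct content among the files called NAME under DIR.
# Assumes GNU coreutils and file names without newlines or backslashes.
one_of_each() {
    # md5sum prints "<32-char hash><space><space or *><path>" per file
    find "$1" -name "$2" -exec md5sum {} + |
        sort |            # identical hashes become adjacent
        uniq -w 32 |      # keep the first line of each hash group
        cut -c 35-        # drop the hash and separator, keep the path
}
```

For example, one_of_each .. stupid-page.html | xargs -d '\n' diffuse (GNU xargs) should open one file per version. Unlike the commands above, this leaves no per-hash lists behind.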
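Note that you are left with the lists of files: each list is named after the MD5 hash of the content shared by the files it lists. The commands assume file names without whitespace, since both awk and xargs split on it.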
Like this:

> ls -1
09b37d3089b1c1837e4741973df1e67e
4d701e2420bf49c85dd21c9b1dbb10e1
6135d23fcb0113ab9a2f574d7f0bf703
> cat 09b37d3089b1c1837e4741973df1e67e
09b37d3089b1c1837e4741973df1e67e  ../some-folder/stupid-page.html
09b37d3089b1c1837e4741973df1e67e  ../some-other-folder/stupid-page.html
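Since each list has one line per file, the lists also tell you how many copies of each version exist; for the A A A B B C C C C example you would expect counts of 3, 2 and 4, in hash order rather than in that order. A minimal sketch, run from inside the cmp directory after the commands above; the helper name count_kinds is hypothetical:

```shell
# count_kinds (hypothetical helper): run inside the cmp directory.
# For each per-hash list, print the number of files sharing that content,
# followed by the path of the first one.
count_kinds() {
    for list in [0-9a-f]*; do
        [ -f "$list" ] || continue   # the glob matched nothing
        n=$(wc -l < "$list")         # one md5sum line per file
        first=$(head -n 1 "$list" | cut -c 35-)
        printf '%s %s\n' "$n" "$first"
    done
}
```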


Systems and software engineering

Category: Snippets

Find one file of each kind (on Linux)