Find one file of each kind (on Linux)

Suppose you have a large number of files but most of them are identical. For example, the files’ contents are: A A A B B C C C C.
You’d like to find one file of each kind and, for example, compare them.

mkdir cmp
cd cmp/
find .. -name stupid-page.html -print0 | xargs -0 md5sum | sort | awk '{print > $1}'
head -q -n 1 [0-9a-f]* | awk '{print $2}' | xargs diffuse

Or, if you just need the list, replace the last line with:

head -q -n 1 [0-9a-f]* | awk '{print $2}'

Note that you are left with lists of files in the cmp directory. Each list is named after the MD5 checksum of the content of the files listed in it.
Like this:

> ls -1
09b37d3089b1c1837e4741973df1e67e
4d701e2420bf49c85dd21c9b1dbb10e1
6135d23fcb0113ab9a2f574d7f0bf703
> cat 09b37d3089b1c1837e4741973df1e67e
09b37d3089b1c1837e4741973df1e67e  ../some-folder/stupid-page.html
09b37d3089b1c1837e4741973df1e67e  ../some-other-folder/stupid-page.html
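If the per-checksum list files are not needed, the same idea fits into a single pipeline. What follows is only a sketch, assuming GNU find, xargs and md5sum; the function name one_of_each is made up for illustration. It prints the first path seen for every distinct checksum and copes with spaces in file names (paths containing newlines or backslashes are not handled).

```shell
#!/bin/sh
# one_of_each DIR NAME: print one path per distinct content among the
# files called NAME below DIR (hypothetical helper, not part of any tool).
one_of_each() {
    find "$1" -type f -name "$2" -print0 |
        xargs -0 -r md5sum |
        sort |
        # md5sum prints 32 hex digits and 2 separator characters, so the
        # path starts at column 35. Keep only the first line per checksum.
        awk '!seen[$1]++ { print substr($0, 35) }'
}
```

Something like `one_of_each .. stupid-page.html` should then give the same list as the head/awk line above, without leaving files behind.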

Editing YAML with VI

Put the following in your .vimrc file and you are set:

au BufNewFile,BufRead *.yaml,*.yml set et ts=2 sw=2

Whenever you edit YAML files, you’ll automatically have the following behaviour:

  • et – expand tabs – inserts spaces whenever you press Tab
  • ts=2 – tab stop – a tab is 2 columns wide
  • sw=2 – shift width – the < and > commands shift by 2 spaces


Debian kernel upgrade to 2.6.30

Debian testing just got kernel 2.6.30. The previous version was 2.6.26.
I will summarize here the new features that come with the upgrade, but only the ones I find interesting.

2.6.30

  • POHMELFS – kernel client for a distributed parallel internet filesystem that is still being developed
  • DST – network block device storage
  • LZMA/BZIP2 kernel image compression. The kernel size is about 10 per cent smaller with bzip2 in comparison to gzip, and about 33 per cent smaller with lzma.
  • SR-IOV support – PCI virtualization helper
  • Networking: Allow more than 64k connections and heavily optimize bind(0) time.
  • Virtualization: virtio_net: Allow setting the MAC address of the NIC
  • Many Bluetooth improvements

2.6.29

  • WiMAX (Intel Wireless WiMAX/Wi-Fi Link 5x50 USB/SDIO devices)
  • Filesystem freeze (LWN has a very technical article about this)
  • Support for multiple instances of devpts
  • MD: Allow md devices to be created by name, make devices disappear when they are no longer needed
  • Xen: add xenfs to allow usermode and Xen interaction
  • USB: storage: recognizing and enabling Nokia 5200 cell phones
  • Bluetooth: Add suspend/resume support to btusb driver

2.6.28

  • Memory management scalability improvements (technical details on LWN)
  • NFS: authenticated deep mounting
  • Port redirection support for TCP
  • gre: Add Transparent Ethernet Bridging
  • Support discard requests on SSD devices to improve wear-leveling
  • Add generic ATA/ATAPI disk driver

2.6.27

  • Multiqueue networking
  • ftrace, sysprof support – another tracing mechanism for kernel (not related to SystemTap)
  • Improved video camera support with the gspca driver. (List of devices is at LWN)
  • Add HTC Shift Touchscreen Driver
  • Bluetooth: Track status of Simple Pairing mode and remote Simple Pairing mode
  • HP iLO driver

The sources (full lists of changes):

  1. http://kernelnewbies.org/Linux_2_6_30
  2. http://kernelnewbies.org/Linux_2_6_29
  3. http://kernelnewbies.org/Linux_2_6_28
  4. http://kernelnewbies.org/Linux_2_6_27

What’s wrong with the Internet – SMTP

SMTP is a text-based protocol. I have already mentioned that text protocols are evil. It was phrased more nicely in previous posts. Well, it can no longer be phrased nicely. This is stupid! In SMTP it means that binary attachments are encoded using base64, which is part of MIME. Base64 turns every 3 bytes into 4 characters, so any binary attachment (image, presentation, document, …) you’ve ever sent takes about a third more space in the email. That means it’s slower to send, slower to receive, and wastes more space on the server, space that some people still pay for. I have to remind the reader that such encoding also takes additional CPU cycles, which in turn increases electricity bills.

And the additional bonus: ever heard “I have a 10M mailbox but I can’t get an email with an 8M attachment from my friend. What’s the problem?” The correct answer would be: stupidity. Tech support time gets wasted explaining that an 8M attachment, once encoded, cannot fit in a 10M mailbox. Try explaining this to someone sometime. Have fun. Extra bonus: someone pays for this wasted tech support time. Exercise for the reader: figure out who’s paying.
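To put a number on the mailbox story, here is a back-of-the-envelope sketch. It assumes the usual MIME layout for base64 (4 output characters per 3 input bytes, lines of at most 76 characters, each ended by CRLF), takes “M” to mean 1024 × 1024 bytes, and ignores the message headers; the variable names are mine.

```shell
#!/bin/sh
# Estimate the size of an 8M attachment after MIME base64 encoding.
raw_bytes=$((8 * 1024 * 1024))

# Every group of 3 input bytes (rounded up) becomes 4 output characters.
encoded_chars=$(( (raw_bytes + 2) / 3 * 4 ))

# The encoded text is wrapped at 76 characters; each line ends with CRLF.
lines=$(( (encoded_chars + 75) / 76 ))
encoded_bytes=$(( encoded_chars + 2 * lines ))

mailbox_bytes=$((10 * 1024 * 1024))
echo "raw attachment:     $raw_bytes bytes"
echo "encoded attachment: $encoded_bytes bytes"
echo "mailbox size:       $mailbox_bytes bytes"
```

Under these assumptions the encoded attachment comes out larger than the whole mailbox, even before any headers or message text are added.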

What’s wrong with the Internet – FTP

If I had the power, I would make it unlawful to use FTP. It is one of the troublesome protocols. Not only is it text based, the semantics are totally screwed. Active and passive mode. Yeah, that totally solves all the problems, right. Especially the 2 sockets (network connections) needed for a file transfer, one for commands and one for data. Is it intentionally so f*cked up to make firewall software much harder to get right? In short, it’s broken. Don’t use it. Let it die slowly.

Use SFTP wherever you can. If you are a system administrator, do the world a favour: never enable FTP on your servers.

What’s wrong with the Internet – HTTP

Why in the world would one want to use a text-based protocol? Really. WTF Dudes?
Yes, you can telnet to a server on port 80 and debug… maybe. That’s about it.
Wikipedia says: “Binary protocols have the advantage of terseness, which translates into speed of transmission and interpretation”.
A binary protocol would lower costs: less electricity used, cheaper hardware at the ends and along the way, less bandwidth.
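The telnet trick works because an HTTP request is just a few lines of text. A minimal sketch, assuming a POSIX shell; the function name http_get_request and the host example.com are placeholders of mine:

```shell
#!/bin/sh
# http_get_request HOST [PATH]: print a minimal HTTP/1.1 GET request.
# Each header line ends with CRLF and an empty line ends the request.
http_get_request() {
    host=$1
    path=${2:-/}
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$path" "$host"
}
```

Piping the output into a netcat binary, for example `http_get_request example.com / | nc example.com 80`, should print the raw response, assuming nc is installed and the network is reachable.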

I would also expect programs to be written in better ways just because they would have to handle a binary protocol. A special library would always be used (I hope). There would probably be fewer stupid Perl scripts each implementing their own parsing of the query string, HTTP headers, and MIME POST body instead of using existing libraries, because rolling your own parser would be much harder. There wouldn’t be fewer stupid people though… I mean that the same people who wrote those scripts would write some other stupid scripts.

HTTP does not support two-way communication in the way required by current internet applications. Wake up! The Internet is mostly about applications these days and much less about documents.

Unfortunately, I guess we are stuck because of the costs of upgrading to something better. I predict that we will continue to see an increasing number of clever hacks to overcome the limitations of this prehistoric protocol.

What’s wrong with UNIX – configuration

All the programs use their own configuration files. The bad part is that the files have different syntax. This is stupid. Let’s assume one favors multiple files, one or several per application. I’m neutral about this. This could be OK if only they all had the same syntax. I would expect one library to be used across all applications to read and write the configuration.

If I understand correctly, Gnome is trying to solve this by using a library. But looking at ~/.gconf/apps I saw a big FAIL: it’s XML based.
One could argue about XML, but I’m all anti-XML. You could search the internet for “XML sucks” to read more about that. Maybe I’ll post about it later.
Anyhow, the fact that Gnome is doing it differently and not “The UNIX way” just highlights the problem.

Looking at the one-big-stupid-file-to-rule-them-all solution, also known as the Registry, makes it clear that this implementation is plain and simple a failure. So it’s probably not it. Not the way it was implemented anyway.

Just to be clear, I’m not sure what the correct solution is. I just know the solutions I’m aware of suck.

ADSL on Debian

Hi.

If you are tired of lengthy manuals about ADSL setup, here is a copy+paste (with a bit of editing) from an email I sent back in 2005.

apt-get install ppp pppoe

### summary of changes ###
file /etc/ppp/options:
    change "auth" to "noauth"
    add "plugin /usr/lib/pppd/2.4.3/rp-pppoe.so"
file /etc/ppp/pap-secrets:
    "MYUSER@MYPROVIDER" * "MYPASSWORD" *
file /etc/ppp/peers/dsl-provider:
    "user" -> "MYUSER@MYPROVIDER"
    on line with "pty" - change eth0 to correct interface
        (eth1 in my case, but should usually be eth0)

manual start:
    pppd call dsl-provider
auto start:
    file /etc/network/interfaces:
        ###
        auto ppp0
        iface ppp0 inet ppp
            provider dsl-provider
        ###
### end of changes ###
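The two edits to /etc/ppp/options can also be scripted. This is only a sketch, assuming GNU sed (which Debian ships); the function name ppp_options_for_pppoe is made up, and the file and plugin path are passed in as arguments so it can be tried on a copy first.

```shell
#!/bin/sh
# ppp_options_for_pppoe FILE PLUGIN: apply the two options-file edits
# from the summary to FILE (hypothetical helper, try it on a copy first).
ppp_options_for_pppoe() {
    file=$1
    plugin=$2
    # Turn a bare "auth" line into "noauth".
    sed -i 's/^auth[[:space:]]*$/noauth/' "$file"
    # Append the plugin line unless exactly that line is already present.
    grep -qxF "plugin $plugin" "$file" || printf 'plugin %s\n' "$plugin" >> "$file"
}
```

For example, copy /etc/ppp/options to a scratch file, run `ppp_options_for_pppoe scratch-file /usr/lib/pppd/2.4.3/rp-pppoe.so`, and diff the result against the original before touching the real file.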

Use at your own risk.
Hope that helps.