What’s wrong with the Internet – HTTP

Why in the world would one want to use a text-based protocol? Really. WTF, dudes?
Yes, you can telnet to a server on port 80 and debug… maybe. That’s about it.
Wikipedia says: “Binary protocols have the advantage of terseness, which translates into speed of transmission and interpretation”.
The lower costs would come from less electricity used, cheaper hardware at both ends and along the way, and less bandwidth.
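For a concrete sense of the terseness Wikipedia mentions, here is a sketch comparing the plain-text request you could type over telnet with a hypothetical binary framing (the binary wire format below is invented for illustration; it is not any real standard):

```python
import struct

# A plain-text HTTP request: readable, and debuggable by hand over telnet/netcat.
text_request = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "\r\n"
)

# A hypothetical binary equivalent: method and version as one-byte codes,
# length-prefixed path and host. This framing is made up for the comparison.
METHOD_GET, VERSION_1_1 = 1, 11
path, host = b"/index.html", b"example.com"
binary_request = struct.pack(
    f"!BBB{len(path)}sB{len(host)}s",
    METHOD_GET, VERSION_1_1, len(path), path, len(host), host,
)

print(len(text_request.encode()))  # 47 bytes as text
print(len(binary_request))         # 26 bytes as binary
```

The binary form is almost half the size, but you can no longer read it off the wire without a decoder.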

I would also expect programs to be written better simply because they would have to handle a binary protocol. A dedicated library would always be used (I hope). There would probably be fewer stupid Perl scripts, each implementing its own parsing of the query string, HTTP headers, and MIME POST body instead of using existing libraries, because hand-rolling that would be much harder. There wouldn’t be fewer stupid people, though… I mean that the same people who wrote those scripts would write some other stupid scripts.
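To illustrate the point about hand-rolled query-string parsing, here is a sketch in Python rather than Perl: the naive one-liner misses percent-decoding and repeated keys, while the standard library gets both right.

```python
from urllib.parse import parse_qs

# Hand-rolled parsing, in the spirit of those scripts: split on & and =.
def naive_parse(qs):
    return dict(pair.split("=") for pair in qs.split("&"))

query = "name=J%C3%B8rgen&tags=a&tags=b"

# Neither percent-decodes nor keeps repeated keys (the dict drops "tags=a"):
print(naive_parse(query))   # {'name': 'J%C3%B8rgen', 'tags': 'b'}

# The library version handles both:
print(parse_qs(query))      # {'name': ['Jørgen'], 'tags': ['a', 'b']}
```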

HTTP does not support two-way communication in the way current internet applications require. Wake up! The Internet is mostly about applications these days and much less about documents.

Unfortunately, I guess we are stuck because of the cost of upgrading to something better. I predict that we will continue to see an increasing number of clever hacks to overcome the limitations of this prehistoric protocol.

11 thoughts on “What’s wrong with the Internet – HTTP”

  1. Your chain of thought is based on a faulty assumption of scarcity of resources. If you dismiss that assumption, it might occur to you that efficiency is a specialization technique. You can look at various other protocols and stacks, for example the notorious SS7 telecom stack, in use world-wide, that trade off extensibility and ease of troubleshooting for an efficient binary implementation. You can also see that these protocols aren’t in much general use, and writing a compatible implementation is quite painful. Generally, efficiency is a short-term goal, and compatibility is long-term. In protocol design, you’re better off with compatibility.
    The shorter the term, the more efficient the design. Assembly programming provides you stupidly-fast execution, but on only one type (or even one family) of processors. Python/perl/php/ruby are not very fast, but run very nicely almost everywhere and describe the program in a much more understandable way. You can argue that C is a middle ground keeping efficiency while being quite compatible all-around. Compare to human languages, which have very high descriptive power, but interpretation and execution performance is problematic.
    Short-term thinking has no place in protocol design. If you want a protocol for your specific application, use whatever you want, but if you want it to last, slow down and contemplate extension mechanisms etc.
    SS7 is a glaring example of failed protocol design. It’s very complex and hard to get right, and being binary-only doesn’t help it. Another famous one: IPv4. You can argue that IPv6 is even worse, though. HTTP, on the other hand, is interesting: you can use compression if you want to, but it’s very easy to implement and debug. There is a limit to this “compatibility” thinking, though, and, at least for now, XML looks a bit too descriptive, although it might be praised for its design in 100 years (not very likely).

    If you really NEED that extra bit of performance, do the magic and get that last ounce of efficiency – it’s actually fun to search for it too. Not many of your users/community members will be able to understand your solution, though, so don’t expect someone to extend or debug it.
    If you need extensibility and lasting support, deal with the overall design, and only optimize the parts that you don’t expect to be touched by anyone except yourself.
    You can see examples of this in linux kernel source – there are different parts written in different styles – some are always being developed and extended, and some rot without being looked at by anyone (tty etc).

    It might cost you less of your own resources in bandwidth or CPU time to run your program/protocol, but it will cost more resources in human time for your users/partners/community to implement something compatible with your solution. If you want the community to extend/support your program/protocol, you’re better off spending some more of your time to save theirs.

    Inability to process strings efficiently on modern computers is another problem altogether.

    Btw, there are many binary protocol design/debug instruments, did you ever use one of those ? What about telnet/netcat ?


  2. > Your chain of thought is based on a faulty assumption of scarcity of resources.
    Incorrect, it’s not that you can’t have your resources. It’s about the costs. You have inefficient protocol – you pay more for hardware and electricity.

    > the notorious SS7 telecom stack
    Can’t comment on SS7 thing as I haven’t read about it yet.

    > compatibility is long-term
    I agree that ease of writing compatible and extensible implementations is very important. I don’t think that binary vs text has much weight here.

    > Short-term thinking has no place in protocol design.
    Efficiency is not short-term thinking. It’s a life-long gain in hardware and electricity costs.

    > Not many of your users/community members will be able to understand your solution.
    I do agree that simplicity and clarity of implementation should be traded off against everything else. Having more developers can lead to financial gains too, not only protocol efficiency.

    > Btw, there are many binary protocol design/debug instruments, did you ever use one of those ?
    I used openssl to look at ASN.1-encoded certificates and to perform encrypted HTTP requests.

    > What about telnet/netcat ?
    I use them much more frequently. I think that’s related to the real-world share of text-based protocols vs binary ones 😉

    Bottom line:
    It’s all about costs. Historically we are stuck with many text based protocols with very high costs of changing them. I guess that it could be done cheaper if it was given more thought from the beginning.


  3. Regarding your last 2 points – “real world’s share of text based protocols” and “stuck with many text based protocols” – it “might” have something to do with “text” being the most efficient consciously human-comprehensible communication mechanism on earth ? There just isn’t any simpler way to communicate (well, visualization is even better, but much harder to do).
    So, anything made in text is quite easily parseable by a human.
    Thus, it all comes down to the amount of time needed to create something that can provide you important data.
    For HTTP/SMTP/etc the tool you need can be written in a few lines of code.
    Of course, this assumes TCP/IP is done by the OS. If it’s not, you will feel all the pain of binary protocols.
    If you want _another_ human being to look at, understand, and use your protocol in self-created software, you’re better off with something human-comprehensible.
    This is why this message is not written in a text file, rar’d, and then base64- or uuencoded before being pasted here. It would take less space, but at what price ? You would probably know how to deal with it by its headers, but why complicate things ?
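    A quick sketch of this trade-off, using zlib as a stand-in for rar: compression shrinks the text considerably, but base64-encoding the result for pasting grows it back by a third and makes it unreadable.

```python
import base64
import zlib

message = b"This comment, repeated to simulate a longer post. " * 20

compressed = zlib.compress(message)     # stand-in for rar'ing the text
encoded = base64.b64encode(compressed)  # made 7-bit-safe for pasting

print(len(message), len(compressed), len(encoded))
```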
    Scarcity is a myth. Unless you have _real_ reasons to save on some resource, it will most often cost you much less to find a way around the scarcity. If your game runs at 20fps and you need 30, yea, go find a good programmer to bring it up. If you feel like it and know what you’re doing, and like to deal with that stuff, go on, make him work nights and bring the performance to 100fps (so poor folks with old computers can also play). Just don’t fool yourself that you’re doing that for some kind of economics reasons, you’re doing it as a personal indulgence, which btw is generally quite well known in r&d and elsewhere : perfection should be diagnosed as a proper addiction 🙂

    For a better example, why don’t people talk with more coded messages ? Why text ? Is it really that efficient ? (hint: yes, for its purposes, it’s very efficient and very descriptive)

    Btw, why are you saving electricity ? It costs almost nothing compared to human worktime ? And don’t even start about the global warming etc – one live human does more damage to the environment in a year than a full farm of nice hot servers.

    About costs, do you know the definition of “cost” ? The resources don’t really _cost_ anything. 10% of what you pay is human labor costs, 90% is markup. The resource doesn’t have any inherent cost. The only resource you might pay for is human labor, which is exactly what you’re trying to waste to fuel the addiction to perfection. Again, nothing wrong with that, human labor is also not scarce at all, it’s just that the assumption that this actually matters is false. There isn’t much anything scarce, by human metrics, in this world.


  4. It looks like you are putting much more weight on ease of development than on operating costs (a function of efficiency). Is it cheaper to buy new hardware than to spend one month of one employee’s time to improve efficiency by 1% and reduce the operating costs? It looks like you would go for the hardware anyway.
    I argue that it depends on the scale, and hence the price. If that 1% reduces the number of servers by 1000, I would go for the development, because it’s cheaper. On the other hand, I do use scripting languages as opposed to C, for example, because that is cheaper in the majority of the cases I’ve seen.

    > Unless you have _real_ reasons to save on some resource,
    I’m not trying to save the resources. I’m trying to save my money. Think that everyone using the Internet would save 1 cent per month (if we had better protocols). Isn’t this huge pile of money worth working a bit harder on the protocols?

    > Just don’t fool yourself that you’re doing that for some kind of economics reasons.
    What I would like to happen (I’m not doing anything about it yet, except writing this blog) _is_ mostly for the economy. I am well aware that most programmers do things that are financially wrong. I do it sometimes too, just because it’s fun, but I do it in my spare time. This is _not_ the case here.

    > For a better example, why don’t people talk with more coded messages ?
    That’s trade-off between “network” (communication time) and CPU (brain usage). It will take too much processing power in our brains. You wouldn’t want to wait half a minute for an answer for a simple yes/no question. (You can look up for designed languages, people tried something like this; I don’t think any of these languages really took off). On the other hand, why would computers ever “want” to communicate in text-based protocols? It’s less efficient in both network and CPU!

    > Btw, why are you saving electricity ?
    Again, saving my money, nothing else.

    > It costs almost nothing compared to human worktime ?
    A big “gotcha” is here. It depends on the amount of work and the savings, and savings largely depend on scale. I bet Google works a _lot_ on saving electricity (which equals money here). From my personal experience I can tell that there are a lot more cases where hardware and electricity are cheaper than the labor. It’s _not_ the case here, not at this scale.

    > global warming
    Last time I checked, global warming was a fraud to impose world-wide carbon taxes as one more step towards world government.

    > definition of “cost”
    … is “the amount of money given up in exchange for something else”, which means whatever I pay is the cost, whether it’s for labor or resources.


  5. re: buying hardware “brute-force” instead of software “wisdom” –
    It all depends on whether your software itself is the only problem that needs fixing. For example, the hardware would also be useful for other software, and it will be quite easy to value in case you want to sell it. Not so easy with software, especially if you’re making it mostly for your own needs. Thus, your investment in software is very nice – it’s generally a very cool thing to do – but it is only more “efficient” when looked at from a very narrow point of view. Other than the experience your developers gain from the process, you won’t gain much benefit when you need to scrap the software. New hardware, on the other hand, would provide gains running both new and old versions of whatever you wanted to run.

    Since there is no real scarcity of resources (including money), it’s only a matter of personal preference which path to choose. The choice doesn’t really matter in the face of the astounding amounts of value wasted every second. Not even talking about wars etc; just look at art, or music.

    Now you can ask why is money “non-scarce” ? Just look at the amount of money in circulation (trillions) compared to the amount of real money backed by something tangible (billions at most). Money is not scarce. It’s not a zero-sum game – most banks don’t have any financial ratio limits, other than those set arbitrarily by the oversight committees, and thus influenced by personal “friendships” which are sometimes called “corruption” by the masses.

    re: everyone using the ‘net saving 1 cent per month:
    They don’t really care. You wouldn’t really care to save 1 cent per month. And nobody sane would. And that’s because there is a much bigger amount of money _created_ every month. You don’t need to _save_.
    As a general idea, go with this: “Don’t save. Earn”. Which also means that spending is a good merit. Saving is not.

    re: cpu-limited communication
    Computers communicate using text because humans operate them.
    If aliens were operating computers, they might have brought some other paradigms. But as long as a human needs to understand the protocol, the human needs text. Whether it’s a binary protocol with a text manual or a text protocol from the start doesn’t really matter; the manuals are generally hard to read and decipher, so a text protocol is a good middle ground.

    > Btw, why are you saving electricity ?
    Again, saving my money, nothing else.
    Oh no, you’re saving electricity to save money ?
    Why not save on food instead ? Or on luxury items ?
    Did you compare your electricity bill to any of other resources you use ?

    re: google working on saving electricity
    How does google try to save on power, do tell ? By creating more efficient protocols ? And considering most of the equipment is on all the time anyway, how does making it sit idle help ? The way you can help save electricity is by making power-efficient hardware and investing in that is good for the general population too. Obviously, Google has enough profits to pay its electricity bill. And while at it, what would be a better way to spend time: finding new ways to profit, creating new interesting projects, or trying to save opex on existing ones ? Generally, the companies that start saving enter a vicious cycle leading to fatalities. Innovative companies spend and earn; they don’t save beyond reason.

    Of course, if you have factual arguments supporting your point, and actual experience at the scale of your focus, bring it on.

    re: cost “definition”
    here you enter a cycle, trying to define “cost” with “money”. next thing you should do is define “money” 🙂


  6. re: opex focus
    Operational expenses are generally planned before you incur them. That means you already have enough money or other resources to cover your opex. So, unless you get drastic savings (>20%) with very small investment, you should go on and do something new. Why delve into the past when it’s already working quite well ? Leave some of your profits for internal updates, gaining that 1% efficiency per month and supporting the thing.
    Go on and focus on something else.


  7. It looks like you completely ignore the weights, so I’ll take it to the extreme. You are a startup company and you have the following two options: 3 days of development, after which you need 10 servers, or 4 days of development, after which you need 1 server for your POC and first 100 users. You have enough money only for the latter. Would you go for the 3 days and look for another investment round just to complete your POC? This is not a real case; it’s just to show that it’s not always the same answer and things do have weights.

    Don’t you think there are cases when extra development is justified to save later on operational costs?

    > you won’t gain much benefit when you need to scrap the software
    If the decision was correct, the extra development will have paid for itself by then.

    > “Don’t save. Earn”
    If you are talking about a company, of course it should decide whether its resources can be put to better use (earning more than X on something else) than saving X. If you are talking about a person, that’s a completely private issue and the person should decide. I’m pretty sure that most people, given two equal products that differ only in cost, would generally prefer the cheaper one rather than work extra hours for the expensive one. The average Joe has to work for his money, and he does not care about the amount of money in circulation (not until he’s robbed by inflation).
    One cent was just an example. Imagine two providers offering exactly the same service, with exactly the same market position, and one is a bit cheaper.
    Do you think people would prefer to pay less for the Internet? … or work more?

    > spending is a good merit. Saving is not.
    I totally disagree. It looks like something big corporations would like us to believe.

    > as long as a human needs to understand the protocol
    Again, you completely ignore the weights. It is important for humans to understand the protocol, but in some cases there are more important things. It looks like you have the same answer for all cases.

    > Oh no, you’re saving electricity to save money ? Why not save on food instead ? Or on luxury items ?
    Would you go for (A) a $50 toaster + a $100 bill per year, or (B) a $75 toaster + a $75 bill per year ? Do you think the answer is somehow affected by the fact that you would like to visit a restaurant tonight? For me, those are completely independent.
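    The toaster arithmetic is easy to check; a minimal sketch with the prices from the example:

```python
def total_cost(purchase_price, yearly_bill, years):
    """Total cost of ownership after a number of years."""
    return purchase_price + yearly_bill * years

# (A) $50 toaster + $100/year vs (B) $75 toaster + $75/year
for years in (1, 2, 5):
    print(years, total_cost(50, 100, years), total_cost(75, 75, years))
```

    The options break even at exactly one year ($150 each); from the second year on, (B) is strictly cheaper.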

    > How does google try to save on power, do tell ?
    I’ll Google on it later.

    > And considering most of the equipment is on all the time anyway, how does making it sit idle help ?
    It’s not busy vs idle, it’s less servers vs more servers for the same work.

    > The way you can help save electricity is by making power-efficient hardware and investing in that is good for general population too.
    I’m sure people are working on this too but that’s not my field. Again, it does not have to be this vs more efficient code.

    > what would be a better way to spend time: finding new ways to profit, creating new interesting projects, or trying to save opex on existing ones ?
    Again you are “global”. It depends on each case.

    > Of course, if you have factual arguments supporting your point,
    I’ll have to do some research for that. Thanks for a good idea for another post.

    > actual experience at the scale of your focus, bring it on
    No. Do you?

    > re: cost “definition”
    > here you enter a cycle
    I don’t see any cycle here.

    > So, unless you get drastic savings (>20%) with very small investment, you should go on and do something new.
    Again, you are very “global”. What if I could save 1%, which would be $3K per month, using one day of one developer who is currently working on another non-critical task that would bring the company $3K a year?

    It looks like I should research and write about large-scale operational cost savings.


  8. re: spending in restaurant has nothing to do with $75/year utility bills:
    First point: you’re talking about less than 0.1% of your total annual expenses. Do you really need to optimize at that level ?
    Second point: the more efficient product costs more. Your example is unrealistic, but say the new product for $150 saves you $10/year. If you can use your money well, the $50 you save this year can bring you much more good than that $10 a year. Especially since it’s a nice idea to change toasters once in a while. But look at it from an even better perspective: if you have enough money to eat and buy luxury items, just get that toaster for $150, feel that you need more money, and go and earn it. Very simple.

    re: 2 providers with one of them cheaper:
    Clearly the point is not entering the price war – it will bring both the competitors down. A good manager at one of the providers will decide to differentiate somehow – and that’s where the resources should be going to. Unless the service costs a significant portion of person’s money flow, the person won’t really try to save $5 and get a worse service. The thought-flow goes something like this: “Hmm.. this one costs $X, that one costs $X+5, but the second provider gives free Y and Z, and they have nice advertisement campaign, and I get the right to call the clients of the first provider ‘cheapskates’… sounds like a deal !”
    On the scale of ISP fees, for example, the money amounts are so small that people don’t really care about them. It’s much more of a hassle for them to call the provider and get updated prices and discounts than to pay for the service at last year’s rates. And this is the only scale where your “optimization” logic really works, as you can’t bring it into the high-end luxury services market.

    Re: the startup thing:
    You forgot a VERY important element. When opening the startup you probably had a plan about the specific project and how to make it work.
    When some r&d provides the benefit of moving from a 10-server requirement to a 1-server requirement in the space of one day, that’s not “performance optimization”, that’s just plain old “development”. That should have been part of your original plan for that startup, and if it wasn’t – shame on you. But once you’ve done that and become profitable, and the next month of optimization brings you single-digit percentage improvements – why bother focusing on it ? Leave some developers to optimize it slowly and support it. Move most of your force to new and exciting projects. You won’t believe how much developers like new code !
    And if you didn’t become profitable, and are just planning to become so after some magic “performance optimization” fairy helps you, then your startup was planned badly and your investors should be very sad.

    At any scale, in a risk-taking world – the one that exists right now – risk-averse behavior will always lose in the long term against risk-taking. Performance optimization after the law of diminishing returns has kicked in, when it has ceased to be real r&d, is an example of risk-averse behavior. Investing in new projects is risk-taking. Spending is risk-taking, or even risk-seeking. Saving is risk-averse.


  9. Actually, you’re right.

    Optimization is a very nice thing.
    And binary protocols are really good. They really _are_ that good.
    If you took the amount of raw HTML, SMTP, FTP etc. flowing through the “intertubes” every second and rar’d it, the savings would be quite considerable.

    If you’re already good in performance optimization, might as well use it for your benefit !

    The days when humans really needed to look at this stuff at such a low level are long gone. Just as the first data storage devices were punch cards, and the first protocols were real “protocols” – pen, paper, and a human moderator – the “human” touch, with its text-based descriptive attributes, will most probably move higher and higher up the protocol stack. It’s already being pushed out of the 4th OSI layer 🙂

    So, if you can find a way to promote this trend efficiently, do it !
    A middle ground might be a standard for protocol binarization, which would allow any protocol to be binary while keeping a one-to-one, human-readable text mapping. Binary XML ?
    Or some kind of a universal interpreter, that, given a standardized human-readable protocol description would provide a working binary adaptation module.

    btw, google for “wikimization” – you might find it interesting.

    If you have ideas about creating the above, let me know – I might provide funding for your new startup !


  10. It looks like you largely ignored my questions.
    Trying to understand you fully, I have several questions:
    1. Do you think that some level of optimization pays off?
    2. Do you have any specific criteria to distinguish between “development” and optimization when the changes make the code more efficient?
    3. Do you think that the money saved by optimization will always have an alternative use that provides more gain than the optimization?
    4. Will you take a risk over a “safe” bet?

    What’s that sudden change in your last post? Is it sarcasm? If it’s not, here is the reply.

    Regarding RAR:
    The trade-off of network vs CPU must be considered. Actually, it is, so we do have “Content-Encoding: gzip”.
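    As a minimal sketch of that trade-off, this is essentially the compression step a server performs before answering with a “Content-Encoding: gzip” header (extra CPU spent to save network bytes):

```python
import gzip

html = b"<html><body>" + b"<p>Hello, world.</p>" * 100 + b"</body></html>"

# Spend CPU to shrink the response body before it hits the network.
body = gzip.compress(html)

print(len(html), len(body))  # the repeated markup compresses very well
```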

    > Or some kind of a universal interpreter, that, given a standardized human-readable protocol description would provide a working binary adaptation module.
    If I understand correctly, ASN.1 is heading in that general direction. Take a look at http://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One .
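    DER, ASN.1’s usual wire encoding, is a plain type-length-value scheme; as a sketch, here is just the INTEGER case – enough to reproduce the `02 03 01 00 01` bytes you see for the RSA public exponent 65537 when dumping certificates:

```python
def der_integer(n):
    """DER-encode a non-negative ASN.1 INTEGER: tag 0x02, a length byte,
    then the big-endian value, with a leading 0x00 when the high bit would
    otherwise be read as a sign bit."""
    content = n.to_bytes((n.bit_length() + 7) // 8 or 1, "big")
    if content[0] & 0x80:
        content = b"\x00" + content
    return bytes([0x02, len(content)]) + content

print(der_integer(65537).hex())  # 0203010001
```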

    > wikimization
    Looks very math-oriented.


  11. The last post is genuine.

    re: questions:
    1. Optimization always pays off. Where to stop optimizing and move on to something different is a personal preference. Some may say that the time to move on is when you have everything working at the level needed to be profitable, or at the level the plan required. There’s nothing wrong with wanting to optimize indefinitely. If you are implying that optimization doesn’t have to have that “diminishing returns” factor, you might also be right. Systemic thinking along with unlimited r&d time sometimes leads to astounding breakthroughs. This is not very “startup”-ish, though.

    2. When you _plan_ your optimization, and you see its goal clearly, you can estimate the time spent and the results you will get. This is what happens in that “startup” theme, for example. You wouldn’t have started that project if you thought it wasn’t feasible. You knew, or you had an instinctive belief, that you could get it down to a 1-server requirement; that’s why you started it. That’s development. Optimization for fun starts when you don’t know what you’re looking for; you’re just poking at various random parts to check whether they can be made more efficient. Or you might as well create a full benchmark suite and work from there. In any case, you’re doing it not because you needed it from the start and planned for it, but because it’s fun, or it’s your personal preference to do that.

    3. It depends on whether the optimization is part of a planned process that can be estimated well, or an “exploratory” process of finding stuff and fixing it. In the former case, you know the costs and benefits exactly, so you can compare them to other stuff you might wanna use the money for. In the latter case, since you can’t estimate the benefits or even the resources required, you might as well go and do something more “controllable”. So it depends on the amounts of money you’re talking about. If you don’t have a near-infinite amount of money, in most cases controlled processes bring more profit. Again, this is a matter of personal preference, and sometimes continued optimization suddenly brings undreamed-of benefits. It’s actually lots of fun doing stuff you enjoy, and if you really enjoy it, it will bring benefits sooner or later. And for that “sooner or later” part you’d better have a near-infinite money supply 🙂

    4. There are strategies in taking risks. Read up on it.

    re: content-encoding:
    You’re talking about doing compression at level 5 OSI. Was talking about 4th layer, or something between 3rd and 4th. In any case, it’s not really a solution, since the underlying content is still textual.

    re: asn.1:
    oh, don’t go there. it’s near the top in the list of the worst things EVER created on earth.

    You are probably right generally about the “binary vs text” issue, but the ASN.1 or text-compression solution doesn’t really cut it.

