These pages are currently incomplete and are for illustration only for the time being.


The Web is for Machines: a layman's guide to a healthy web

Explanation and purpose

The short version

The web is chaos. Much of it is badly made: sites load ten times slower than they could, your site won't come up on search engines and will look bad in another browser, all because most people who order websites don't know all the details that are required and get cheated. Be sure to ask the people who build your site about the semantic web, valid markup, optimisation, and the separation of presentation and structure.

The longer version

The web is for machines. That sounds like a strange idea, but really, the web is for machines; there are very, very few humans on the web. What is a machine, then, that this would make sense? In this context a machine is not something with cogs or circuitry; a machine is simply a piece of software that reads, just like a human. The one big difference is that machines are at the same time a thousand times dumber and a thousand times smarter than humans; they are savants, if you like. They interpret everything you say literally and lack the creativity to work out what you want them to do, but if you tell them exactly what you want, they can do just about any task you describe.

But still, aren't there a lot of humans on the web too? No, actually very, very few; the people to whom this document is addressed have never browsed the web. Their browser browses the web. What a browser does is translate the web into a form people can read: an image, a picture. It is just a visualisation of the web. If you view the source of this page (most browsers do that via Ctrl+U), you can see something closer to the web, but it's still just a picture of it, still a representation for humans.

The web in the eyes of a machine

If you see something like this:

<p>The web is for machines, sounds like a strange idea but really, the web is for machines, there are very, very few humans on the web. But first what is a <span class="keyword">machine</span>? A machine is not to be thought of like something with cogs or circuitry in this context, a machine is simply a piece of software that reads, just like a human, the one big difference between both is that machines are at the same time a thousand times dumber and a thousand times smarter than humans, they are <em>savants</em> if you like. They interpret everything you say literally, lack any creativity to realize what you want them to do, but if you tell them exactly what you want them to do they can do about any task you describe to them.</p>

a machine does not read it as that text, but as something more like this:

BEGIN PARAGRAPH
PLACE TEXT: "The web is for machines, sounds like a strange idea but really, the web is for machines, there are very, very few humans on the web. But first what is a "
BEGIN SPAN UNDER CLASS: "keyword"
PLACE TEXT: "machine"
END SPAN
... et cetera
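The same kind of reading can be sketched in Python with the standard library's html.parser module. This is a minimal illustration, not how any particular browser or search engine works: the class InstructionPrinter and the function describe are hypothetical names, and the instruction wording simply mirrors the listing above.

```python
from html.parser import HTMLParser


class InstructionPrinter(HTMLParser):
    """Hypothetical helper: records parse events as BEGIN / PLACE TEXT / END lines."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    def handle_starttag(self, tag, attrs):
        line = f"BEGIN {tag.upper()}"
        # attrs is a list of (name, value) pairs; report the class if there is one
        classes = [value for name, value in attrs if name == "class"]
        if classes:
            line += f' UNDER CLASS: "{classes[0]}"'
        self.instructions.append(line)

    def handle_endtag(self, tag):
        self.instructions.append(f"END {tag.upper()}")

    def handle_data(self, data):
        self.instructions.append(f'PLACE TEXT: "{data}"')


def describe(markup):
    """Return the list of instructions a machine 'sees' in the markup."""
    parser = InstructionPrinter()
    parser.feed(markup)
    parser.close()
    return parser.instructions


snippet = '<p>What is a <span class="keyword">machine</span>?</p>'
for instruction in describe(snippet):
    print(instruction)
```

Run on that short snippet, it prints BEGIN P, then the text before the span, the span with its class, the word "machine", END SPAN, the question mark and END P, one per line: the markup as a sequence of instructions rather than as a picture.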

That's what that part means. A human being can of course read this and probably get the idea, but after a while it gives them a headache. So a browser has a handy component called a layout engine (some browsers actually have none) that translates all of this into a form humans find extremely easy to interpret but machines find very hard to make sense of. A blank line inside a body of text tells a human, as naturally as breathing, that a new paragraph has started, but a machine already has the greatest trouble realizing that those black marks are text at all. So unless you read source code a lot, you're not technically on the web; your browser is, and it tells you what it finds there in a way you find intuitive. You just ask your browser what's at a certain URL, and it tells you in your own language.

Now, teaching machines to translate machine language into human language is one thing; the reverse is effectively impossible. That's because the translation is lossy: you lose information as you do it. After all:

BEGIN PARAGRAPH

or

PLACE LINEEND
PLACE LINEEND

Both end up looking exactly the same. A human can, in most situations, tell from context which it is; a machine can't. So if a machine encounters this, assuming it can even recognise text, it has no way to know how to translate it back.
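To make the lossiness concrete, here is a deliberately tiny "layout engine" in Python. It rests on an assumption of its own (a new paragraph is drawn as a blank line, and a <br> as a line end) and is nothing like how a real browser lays out text; ToyRenderer and render are hypothetical names. It shows a paragraph break and two line ends collapsing into the same output.

```python
from html.parser import HTMLParser


class ToyRenderer(HTMLParser):
    """Hypothetical, tiny 'layout engine' that turns markup into plain text.

    Assumption for this sketch: a new paragraph after earlier content is shown
    as a blank line, and <br> is shown as a single line end.
    """

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and self.out:
            self.out.append("\n\n")   # BEGIN PARAGRAPH -> blank line
        elif tag == "br":
            self.out.append("\n")     # PLACE LINEEND -> line end

    def handle_data(self, data):
        self.out.append(data)


def render(markup):
    renderer = ToyRenderer()
    renderer.feed(markup)
    renderer.close()
    return "".join(renderer.out)


as_paragraphs = render("<p>First thought.</p><p>Second thought.</p>")
as_line_ends = render("First thought.<br><br>Second thought.")
print(repr(as_paragraphs))            # 'First thought.\n\nSecond thought.'
print(as_paragraphs == as_line_ends)  # True
```

Going the other way, a program handed only the text 'First thought.\n\nSecond thought.' has nothing to tell it which of the two sources produced it; that information was thrown away during rendering.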


Why is this important?

The semantics

Reversing the idea

The web is made by people far more than by machines, and, even more interesting, it is mostly made by people other than those who own the part of the web they make (the site), and who thus have no interest in it working beyond getting paid. Even worse, the web is generally ordered by people who are far from understanding it; in fact, most people who make the web were never taught to understand it during their education either. The web is now mostly made for humans and not for machines. It's made for how it looks, the so-called presentation, instead of what it means, the so-called structure. The web is unhealthy, and it's worse than people think.

Some people might think, 'but in the end it's to be delivered to people, yes?' That's true, but ask yourself how you got to most sites. Through friends? By looking them up on search engines? Wonder how those sites got to your friends and you realize that it all starts with search engines, which are operated by machines. And a machine can't understand a thing about your site from its looks. A machine goes through the source, follows its instructions and tries to interpret the meaning of the site, and this doesn't go as well as it should. Currently the web isn't healthy at all: the people who order sites often don't know this and just check whether it looks good before paying, and often the people who make sites don't know it either, because their education never told them. Ultimately, it matters far more to make clear to machines than to humans what a site is, what it does and what information it has.
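As a rough sketch of why structure matters to these machines: a program trying to work out what a page is about might start with its headings. The Python below is a hypothetical, heavily simplified heading extractor (HeadingCollector and find_headings are invented names), not how any real search engine works. It only finds a heading that is marked up as one, so the same headline built purely from presentational <font> and <b> tags is invisible to it, however big it looks on screen.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Hypothetical helper: collects the text of elements marked up as headings."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # how many heading elements we are currently inside
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.depth += 1
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Text only counts if it sits inside a heading element.
        if self.depth:
            self.headings[-1] += data


def find_headings(markup):
    collector = HeadingCollector()
    collector.feed(markup)
    collector.close()
    return [text.strip() for text in collector.headings]


# Two hypothetical pages that could look alike in a browser.
semantic = "<h1>Fresh bread daily</h1><p>Our bakery opens at seven.</p>"
presentational = '<font size="7"><b>Fresh bread daily</b></font><br>Our bakery opens at seven.'

print(find_headings(semantic))        # ['Fresh bread daily']
print(find_headings(presentational))  # []
```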

The time

Another part is that, in forgetting the machine and caring only about the translation it offers, many designers simply stop once it looks good and forget the quality of the code underneath. There are many flavours of 'bad' code. The simplest is code that is not optimal, using far more code than needed. That is not the worst problem, but it still matters: larger sites take longer to load into your browser, and the browser takes more time to translate them for you. It's like translating 'I saw a man, the man that I saw was quite big, the man that I saw that was quite big waved to me.' into German instead of 'A quite big man I saw waved to me.': first you have to read a long sentence, then work out what it means, and only then translate it into 'Ein ziemlich großer Mann winkte mir zu.'
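A crude way to see the cost of verbose markup is simply to count bytes. The Python below compares two hypothetical versions of the same three-link menu: one with presentation mixed into the markup through a layout table and <font> tags, the other with plain structural markup whose look would come from an external stylesheet. That stylesheet would add bytes of its own, but it is typically downloaded once and reused across pages. The markup and the helper size_in_bytes are invented for illustration, and the exact numbers depend entirely on the example; only the direction of the difference matters.

```python
# Two hypothetical versions of the same three-link menu, for comparison only.

# Presentation mixed into the structure: layout table plus <font> and <b> tags.
VERBOSE = (
    '<table border="0" cellpadding="0" cellspacing="0"><tr>'
    '<td><font face="Arial" size="2" color="#003366"><b><a href="/">Home</a></b></font></td>'
    '<td><font face="Arial" size="2" color="#003366"><b><a href="/news">News</a></b></font></td>'
    '<td><font face="Arial" size="2" color="#003366"><b><a href="/contact">Contact</a></b></font></td>'
    '</tr></table>'
)

# Structure only; the look would come from a shared external stylesheet (not shown).
LEAN = (
    '<ul class="menu">'
    '<li><a href="/">Home</a></li>'
    '<li><a href="/news">News</a></li>'
    '<li><a href="/contact">Contact</a></li>'
    '</ul>'
)


def size_in_bytes(markup):
    """Hypothetical helper: number of bytes the markup takes when sent as UTF-8."""
    return len(markup.encode("utf-8"))


verbose_size = size_in_bytes(VERBOSE)
lean_size = size_in_bytes(LEAN)
print(f"verbose: {verbose_size} bytes, lean: {lean_size} bytes")
print(f"the verbose version is {verbose_size / lean_size:.1f} times as large")
```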

Another, far more consequential problem is that, besides being unnecessarily verbose, most source code is simply not grammatical. Languages for machines, like HTML, which tells your browser the structure of a page, have a very strict grammar: you can't just be creative with word order or use poetic licence. As explained before, machines have a lot of trouble with this sort of thing. When there is a grammatical error in your HTML code, some very advanced algorithms set to work guessing what you probably meant. These algorithms take your computer time to carry out, make your browser take up more disk space than would otherwise be needed, and slow your computer down by using processing power other applications could use. Meanwhile, browser vendors spend a lot of time studying bad HTML on the web and devising algorithms to anticipate people's errors.
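To give a feel for what such guessing involves, here is a toy repair strategy in Python. It is an assumption-laden sketch, not the standardised algorithm real browsers use: ToyRepairer and repair are hypothetical names, and the rules in its docstring are invented for illustration. Real browsers follow a far more elaborate algorithm and can reach a different guess for the same input. The point is only that every error forces extra work and a guess.

```python
from html import escape
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ToyRepairer(HTMLParser):
    """Hypothetical, much simplified repair strategy (not the real HTML algorithm):
    - an end tag for an element that is open closes everything opened after it;
    - an end tag for an element that is not open is dropped;
    - anything still open at the end is closed.
    """

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.out = []
        self.guesses = 0  # how many times the repairer had to guess

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # keep the tag as written
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>: nothing to close

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            self.guesses += 1  # stray end tag: ignore it
            return
        while self.open_tags[-1] != tag:
            self.guesses += 1  # mis-nesting: implicitly close the inner element
            self.out.append(f"</{self.open_tags.pop()}>")
        self.out.append(f"</{self.open_tags.pop()}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))  # re-escape decoded text

    def close(self):
        super().close()
        while self.open_tags:
            self.guesses += 1  # unclosed element: close it at the end
            self.out.append(f"</{self.open_tags.pop()}>")


def repair(markup):
    """Return (repaired markup, number of guesses made)."""
    repairer = ToyRepairer()
    repairer.feed(markup)
    repairer.close()
    return "".join(repairer.out), repairer.guesses


print(repair("<p><b>bold <i>both</b> italic</i>"))
# ('<p><b>bold <i>both</i></b> italic</p>', 3)
```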

The sad part is that it's a vicious circle. Some browsers first tried to gain market share by offering advanced algorithms that could work around basic errors. Then competing browsers had to join in, web designers got more lax about errors, and the cycle went on and on to the point that virtually no site on the web has completely grammatical code. This is unique among computer languages: most compilers just stop at the first error, tell you where it is and ask you to fix it before you try again. Technically, in computer languages, after the first error the rest of the code means nothing any more, nada, nihil. If your site has completely valid code, you're part of an elite nowadays. Amazing. If you look down, you'll see the blue badges certifying this page as valid HTML, with no grammatical error anywhere; you can click them to check, and enter other sites you know into the validator for a laugh. Yep, even the biggest search engine on the planet has errors. With the connection speeds and processing power computers have nowadays, you shouldn't be able to watch pages load from top to bottom; they should appear instantly. Instead, your browser is busy applying complex algorithms to figure out what the author of the page meant, and it takes its time.
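XML, the stricter sibling of HTML that XHTML is built on, behaves the way the paragraph above describes for other computer languages: the XML specification treats a well-formedness error as fatal, and a conforming parser must stop normal processing at that point. Python's xml.etree.ElementTree shows this; the function first_error is a hypothetical helper. Note that this only checks XML well-formedness, which is a different and much smaller test than the HTML 4.01 validation the W3C validator performs.

```python
import xml.etree.ElementTree as ET


def first_error(markup):
    """Hypothetical helper: return None if the markup is well-formed XML,
    otherwise the (line, column) position where the parser gave up.
    Nothing after that point is interpreted."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as error:
        return error.position
    return None


well_formed = "<p>\n<b>bold <i>both</i></b>\n</p>"
mis_nested = "<p>\n<b>bold <i>both</b></i>\n</p>"

print(first_error(well_formed))  # None
print(first_error(mis_nested))   # position of the mismatched </b> on line 2
```

Unlike the guessing approach browsers take with HTML, the parser makes no attempt to work out what was meant: it reports where the first error is and stops.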

But now any browser that comes without these algorithms is useless, as there are virtually no sites on the web it could make anything of. And that means more and more errors keep popping up.


niarch
email: mail@nihilarchitect.net
msn: half.live@passport.com
jabber: temporalabstraction@gmail.com

The web is for Machines, last updated on Tue, 13 Oct 2009, 11:11:26 (UTC)

Valid HTML 4.01 Strict | Valid CSS | Level Triple-A conformance, W3C-WAI Web Content Accessibility Guidelines 1.0