Archive for 2007
Reading Between Apple’s Lines
October 26th, 2007 • General
Apple is trying to recreate 1984, only this time, in 2007, technology is a fashion statement. Technology trends ebb and flow. G33ks are cool now, and soon they’ll be nicknamed “Applesites”.
What we’re seeing now is what Mac enthusiasts have been hoping to see for 20 years: more people deciding to buy a Mac.
John Gruber, Everything’s Coming Up Milhouse
I find it a little ironic how Apple makes most of its money. Apple started out as a computer company, a Mac company. Now it sells gadgets, while the Mac sells almost solely by association.
Ten years ago, Apple Kremlinologists read into the PowerMac, iMac and eMac. Now they’re reading into the iPhone, iPod and iTunes. Curiously, the more we try to read Apple, the more diverse our conclusions become:
Apple has deliberately driven all contracted phones out of its stores, so now it offers only the hardware. This means that the goal (10 million units sold) will be achieved at any cost, even if the company’s partners, carriers in this particular case, will suffer.
— Eldar Murtazin, Apple iPhone – apocalypse today or business strategies
For once, I found an interesting view on Apple’s strategy, and it’s not John Gruber’s tea leaf reading.
Because A Language Is Irrelevant
October 16th, 2007 • General
The purpose of a language is to communicate purpose. It is most commonly used among human beings, until proven otherwise.
I don’t talk to my computer. I don’t intend to until it can butter my morning bread, figure out my coffee mood, and give a better answer than 42.
Every time I punch in code, I implicitly accept a settlement: I will meet computers halfway and use a structured language to communicate my purpose, provided that my fellow human beings can still discern that purpose. The clearer the structured language I use, the better the aftertaste it leaves.
I couldn’t care less about a computer’s feelings. For all I know, it can be improved to better understand my less structured, more natural language. However, I do care about my fellow human beings, and I’d rather spare their brain muscles.
Sometimes a linguist is better qualified to design a less structured, more natural language. Of course, sometimes things get so screwed up that the very same linguist gets cursed at. Sometimes it helps to know a language with a different grammar. But often none of it matters, as long as you can communicate your purpose to a fellow human being first, and to a computer second.
French used to be a dominant language. So were Arabic and Russian. I speak English most of the time now, not because English is the trend nowadays, but because it carries my intent and purpose to a wider audience.
Language itself is a vehicle, a protocol, a mechanism. Language in itself is irrelevant.
P.S. Inspired by Matt’s post.
Here You’re Screwed
October 13th, 2007 • General
I miss Kathy’s blogging. It taught us a lot, now didn’t it?
Facebooks Will Rise
October 11th, 2007 • General
It is in every company’s interest to lock in its customers. Lotus did. Microsoft did. Adobe did. AOL did. Facebook does.
Eventually, people figure out how to export and exchange proprietary formats. Sometimes they invent a whole new format, and internationally standardize it. Sometimes they create programs to convert proprietary formats to existing standards. Sometimes they start from scratch, reluctantly, determinedly.
I don’t live in the States, so I haven’t experienced AOL’s iron grip. I can relate, however, because each Middle Eastern country has a dominant service provider, usually the one that owns the copper. Facebook, on the other hand, isn’t tied to a region anymore, so you’re free to experience its iron grip no matter where you are.
Facebook’s biggest competitor isn’t another site with the same features. It isn’t better software. It isn’t the desktop. It’s the “export” button, just as Facebook’s most important feature is the “import” button.
Think of the web, of the Internet itself, as water. Proprietary platforms based on the web are ice cubes. They can, for a time, suspend themselves above the web at large. But over time, they only ever melt into the water. And maybe they make it better when they do.
— Anil Dash, Blackbird, Rainman, Facebook and the Watery Web
Thanks Anil.
The Smallest DocBook Big Picture
October 2nd, 2007 • General
So you’re a writer, looking to make loads of money off your next best-seller. You fire up Microsoft Word to spend a few weeks drafting and crafting, carefully formatting your examples, properly aligning your figures and images, “bolding” this and italicizing that. Well into your hundredth page, your publisher contacts you regarding that draft you sent.
“This won’t work”, Bobby annoyingly says, “Didn’t we agree on a different line spacing? Oh, and your tables should be a little narrower, and this, and that, and…”. You sigh, knowing that you will be spending the next few days on coffee and cheap nicotine.
What happened there? Was it because you used Microsoft Word? Was it because you didn’t pay attention to Bobby’s every word? Nope. You mashed together what you want to say with how you want it to look, and then stuffed both into a proprietary format.
Single-Source Publishing
A book, an article, or any document in general consists of three layers of information:
- Content, which is the flesh of the document, and the reason it exists.
- Semantic information, or data about the content, such as the author, copyright information, list of figures and side notes.
- Formatting information, which describes how content will look according to its semantic information.
Most text processors don’t distinguish between semantic and formatting information. For instance, we usually use bold to emphasize a phrase. We know bold means emphasized, but text processors have no idea. They know what bold looks like, but not what it means.
Formatting information also changes across different types of media. For example, a website’s address wouldn’t normally be displayed on a web page, because we can turn any word into a link. However, asking the reader of a printed book to visit a website is a different story: the address has to be printed out in full.
DocBook can distinguish between the above-mentioned layers of information, and can generate different media outputs from the same DocBook document. So now, instead of working on a Word version for my copy-editor, an HTML version for the Web, and a PDF for my publisher, I can use a single DocBook version, and tell DocBook what to do with it and how.
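The single-source idea is easier to see in code. Below is a minimal sketch in Python, not a real DocBook toolkit: the functions `inline_text`, `to_html` and `to_print` are hypothetical, and they handle only three DocBook 4.x elements (`para`, `emphasis` and `ulink`). The same source yields a web version, where the link is clickable, and a print-style version, where the URL is spelled out, matching the link example above.

```python
import html
import xml.etree.ElementTree as ET

# A tiny DocBook 4.x-style fragment: content plus semantic markup, no formatting.
SOURCE = """<article>
  <para>DocBook is <emphasis>semantic</emphasis> markup.
  See <ulink url="http://www.docbook.org/">the DocBook site</ulink>.</para>
</article>"""


def inline_text(elem, render_child, escape=lambda s: s):
    """Join an element's text, its rendered children and their tails."""
    parts = [escape(elem.text or "")]
    for child in elem:
        parts.append(render_child(child))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def to_html(elem):
    """Web output: emphasis becomes <em>, a ulink becomes a clickable <a>."""
    if elem.tag == "article":
        return "".join(to_html(child) for child in elem)
    body = inline_text(elem, to_html, html.escape)
    if elem.tag == "para":
        return f"<p>{body}</p>"
    if elem.tag == "emphasis":
        return f"<em>{body}</em>"
    if elem.tag == "ulink":
        url = html.escape(elem.get("url", ""))
        return f'<a href="{url}">{body}</a>'
    raise ValueError(f"unsupported element: <{elem.tag}>")


def to_print(elem):
    """Print output: emphasis becomes *stars*, a ulink spells out its URL."""
    if elem.tag == "article":
        return "\n\n".join(to_print(child) for child in elem)
    body = inline_text(elem, to_print)
    if elem.tag == "para":
        return " ".join(body.split())  # collapse source line breaks and indentation
    if elem.tag == "emphasis":
        return f"*{body}*"
    if elem.tag == "ulink":
        return f"{body} ({elem.get('url', '')})"
    raise ValueError(f"unsupported element: <{elem.tag}>")


if __name__ == "__main__":
    doc = ET.fromstring(SOURCE)
    print(to_html(doc))
    print(to_print(doc))
```

Real toolkits get this effect with XSL or DSSSL stylesheets rather than hand-written Python; the point is only that the source itself never mentions formatting.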
What is DocBook?
The DocBook site says it “is a schema (available in several languages including RELAX NG, SGML and XML DTDs, and W3C XML Schema) maintained by the DocBook Technical Committee of OASIS.”
To put it in plain English, DocBook is a list of markup elements suitable for writing Documents and Books, hence the name. It’s like HTML in the sense that it has tags and uses plain text files, but unlike HTML, it’s designed for writing books. DocBook has no idea what your final document will look like. That’s the job of the DocBook toolkit.
When we want to start a new paragraph in HTML, we use a <p>paragraph</p> tag. The browser knows how to display a paragraph tag; it knows that it should insert a margin above and below. DocBook, following the same concept, uses a <para> tag.
DocBook uses a toolkit to read DocBook files and generate final documents. A toolkit has tools to validate DocBook files, apply formatting and import images, according to a set of rules defined in stylesheets.
Schema Jungle
The first version of DocBook was based on SGML. SGML is the mother of all markup languages, and most of the markup we use descends from it: HTML was defined as an SGML application, XML is a simplified subset of SGML, and SVG and MathML are built on XML.
Later on, the DocBook people realized that SGML isn’t a very practical option. Well, it works, but it complicates authors’ lives. So an XML version was created. The only difference between the SGML and XML versions is the toolkit, and in most cases the two are interchangeable.
When you start reading more about DocBook, you’ll bump into all sorts of geeky acronyms. RELAX NG, DTD and XML Schema are the most frequent ones. The bad news is that these three look nothing alike, and they come up all the time. The good news is that they’re none of your business. You don’t need to learn any of them. I promise.
RELAX NG, DTD and XML Schema are three different ways of expressing the same thing: the list of valid DocBook elements, and where each one may appear. These versions exist for technical reasons, for toolkit developers and DocBook validators. As an author, you should be basking at Starbucks, crafting your masterpiece, instead of worrying about TLAs (Three Letter Acronyms).
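To give a feel for what that list of valid elements means to a validator, here is a toy sketch in Python. The `ALLOWED_CHILDREN` table and the `validate` function are hypothetical and cover only a handful of elements; a real validator reads the actual DTD, RELAX NG or XML Schema instead of a hand-written table. As an author you never write this; it only shows what the TLAs are for.

```python
import xml.etree.ElementTree as ET

# Hypothetical, drastically reduced content model: for each element, the child
# elements it may contain. Real DocBook schemas express the same kind of rule
# for hundreds of elements, plus attributes, ordering and required children.
ALLOWED_CHILDREN = {
    "article": {"title", "section", "para"},
    "section": {"title", "section", "para"},
    "title": {"emphasis"},
    "para": {"emphasis", "ulink"},
    "emphasis": set(),
    "ulink": set(),
}


def validate(elem, path=""):
    """Return a list of error messages; an empty list means the tree passed."""
    here = f"{path}/{elem.tag}"
    if elem.tag not in ALLOWED_CHILDREN:
        return [f"{here}: unknown element"]
    errors = []
    for child in elem:
        if child.tag not in ALLOWED_CHILDREN[elem.tag]:
            errors.append(f"{here}: <{child.tag}> not allowed inside <{elem.tag}>")
        # Recurse so problems deeper in the tree are reported too.
        errors.extend(validate(child, here))
    return errors


if __name__ == "__main__":
    good = ET.fromstring("<article><title>T</title><para>Hi <emphasis>there</emphasis></para></article>")
    bad = ET.fromstring("<article><p>Hi</p></article>")
    print(validate(good))
    print(validate(bad))
```

Here `validate(good)` returns an empty list, while `validate(bad)` reports that `<p>`, an HTML habit, is neither allowed inside `<article>` nor a known element.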
DocBook Versions
When the DocBook folks mention versions, they mean the schema version, not the toolkit version. Different versions of DocBook have different markup elements. Some were added in version 4.0 but deprecated in 5.0. Some exist only in a minor version like 4.5, but not in 4.1.
It used to be more confusing a couple of years ago, because customization is DocBook’s best and worst feature. O’Reilly, for example, created a subset called DocBookLite, which was suitable for writing their books. Simplified DocBook gets rid of elements related to book-writing, and leaves you with a subset suitable for technical documents. Other schemas extend DocBook and add more markup elements.
The most prominent DocBook customization is Simplified DocBook, which you should take a look at if you’re writing white papers rather than books. But the best news is that you only need to know one DocBook version. That’s right, only one. Just one. Don’t go out of your way learning every version just to write a book. Read the Definitive Guide for version 4.x, pay attention to what’s changed between minor 4.x releases, and you’re set. You can convert 4.x to 5.x later. Next year I may well recommend learning version 5.x, but it’s not final yet, so I’d wait.
If you’re eyeing O’Reilly, keep in mind that they’ve deprecated DocBookLite in favor of the full DocBook schema. So make sure you’ve read more about DocBook before you contact them.
The DocBook Toolkit
When it’s time to make a book from your DocBook sources, things tend to get messy. Since DocBook is just an XML file with restrictions, there are dozens upon dozens of tools you can use. However, there are two main toolkits.
Toolkits are a combination of commonly-used tools, which validate your DocBook sources, apply stylesheets to them, and generate final, publishable versions of your book.
DocBook stylesheets have nothing to do with CSS. These stylesheets describe how the toolkit should convert DocBook elements to the desired output. So to convert a DocBook file to HTML, the stylesheet would tell the toolkit to replace every <para> with <p>, and apply similar conversions to the rest of the elements. The same applies to a PDF-generating stylesheet, which would take a <para> tag and replace it with a paragraph boundary, sort of.
There are two types of stylesheets: DSSSL and XSL, used with the SGML and XML versions respectively. SGML can be converted to XML, and OpenJade, the tool that works with DSSSL, has an XML adapter, so it works with XML just as easily. If you’re a developer, though, you’re probably more comfortable with xsltproc, Saxon and Xalan than with Jade.
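The renaming described above can be sketched as a table of rules applied to a tree. This is a hypothetical Python illustration, not the DocBook XSL stylesheets themselves: `HTML_RULES` and `apply_rules` are made-up names, and a rule here can only rename an element, so anything that needs attribute changes (a `ulink`'s `url` becoming an `href`, say) is rejected.

```python
import xml.etree.ElementTree as ET

# Hypothetical "stylesheet": one rule per DocBook element, naming the HTML
# element it becomes. Real DocBook XSL stylesheets are XSLT templates that do
# far more (numbering, tables of contents, cross-references, links).
HTML_RULES = {
    "article": "body",
    "title": "h1",
    "para": "p",
    "emphasis": "em",
}


def apply_rules(elem, rules):
    """Build a new tree, renaming every element according to the rules."""
    if elem.tag not in rules:
        raise ValueError(f"no rule for <{elem.tag}>")
    out = ET.Element(rules[elem.tag])
    out.text = elem.text  # the content is copied untouched
    out.tail = elem.tail
    for child in elem:
        out.append(apply_rules(child, rules))
    return out


if __name__ == "__main__":
    source = ET.fromstring(
        "<article><title>Hello</title>"
        "<para>A <emphasis>DocBook</emphasis> paragraph.</para></article>"
    )
    print(ET.tostring(apply_rules(source, HTML_RULES), encoding="unicode"))
```

Run as a script, this prints `<body><h1>Hello</h1><p>A <em>DocBook</em> paragraph.</p></body>`. Swapping in a different rule table while leaving the source alone is, in spirit, what choosing a different stylesheet does.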
PDF generation seems to be the most troubled part of the whole DocBook process. DocBook’s stylesheets can’t generate PDFs directly. Instead, they generate XSL-FO documents, which are then used by an FO processor to generate PDFs. Explaining FO is outside the scope of this article, but fortunately you only need to read the first paragraph of the Wikipedia entry.
Apache FOP is the most commonly used FO processor, but mostly because it’s free and competing products are expensive.
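The intermediate XSL-FO step can be sketched too. The Python below is a hypothetical stand-in for what the FO stylesheets do: `docbook_to_fo` and `fo` are made-up names, and the function wraps each DocBook `<para>` in an `<fo:block>` inside a minimal XSL-FO layout (one A4 page master, one flow). Unlike the real stylesheets, it flattens inline markup such as `<emphasis>` to plain text.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialize with the usual "fo:" prefix


def fo(tag):
    """Qualify a tag name with the XSL-FO namespace."""
    return f"{{{FO_NS}}}{tag}"


def docbook_to_fo(article):
    """Build a minimal XSL-FO document with one fo:block per DocBook <para>."""
    root = ET.Element(fo("root"))

    # Page geometry lives here, in the formatting layer, not in the DocBook source.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "page",
        "page-width": "21cm",
        "page-height": "29.7cm",
    })
    ET.SubElement(page, fo("region-body"))

    # The content flows into the body region of pages built from that master.
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "page"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    for para in article.iter("para"):
        block = ET.SubElement(flow, fo("block"), {"space-after": "6pt"})
        # itertext() flattens inline markup, so <emphasis> loses its styling here.
        block.text = "".join(para.itertext())
    return root


if __name__ == "__main__":
    source = ET.fromstring("<article><para>One</para><para>Two <emphasis>too</emphasis></para></article>")
    print(ET.tostring(docbook_to_fo(source), encoding="unicode"))
```

The resulting document is the kind of input an FO processor such as Apache FOP turns into a PDF, for example with `fop -fo book.fo -pdf book.pdf`.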
Authoring Tools
DocBook has an awful lot of authoring tools, yet most of them require an expert, and they look funny to Windows users (I’m looking at you, Emacs). Of all the editors I looked at, I found the best option to be XMLmind’s XML Editor, XXE for short. There’s a free version, and the commercial license isn’t insanely overpriced. It can be extensively customized, and you get the source code when you buy a commercial license. Oh, and O’Reilly uses it.
More Information
I hope this article tickled your fancy, however odd that phrase might sound. If you’d like to learn more about DocBook, try sifting through Mark Johnson’s DocBookmarks, get brainwashed, and dip your toes into the details.

