Archive for General
Facebooks Will Rise
October 11th, 2007 • General
It is in every company’s interest to lock in its customers. Lotus did. Microsoft did. Adobe did. AOL did. Facebook does.
Eventually, people figure out how to export and exchange proprietary formats. Sometimes they invent a whole new format, and internationally standardize it. Sometimes they create programs to convert proprietary formats to existing standards. Sometimes they start from scratch, reluctantly but determinedly.
I don’t live in The States, so I haven’t experienced AOL’s iron grip. I can relate, however, because each of the Middle East countries has a dominant service provider, usually the one that owns the copper. Facebook, on the other hand, isn’t tied to a region anymore, so you’re free to experience its iron grip no matter where you are.
Facebook’s biggest competitor isn’t another site with the same features. It isn’t better software. It isn’t the desktop. It’s the “export” button; just like Facebook’s most important feature is the “import” button.
Think of the web, of the Internet itself, as water. Proprietary platforms based on the web are ice cubes. They can, for a time, suspend themselves above the web at large. But over time, they only ever melt into the water. And maybe they make it better when they do.
— Anil Dash, Blackbird, Rainman, Facebook and the Watery Web
Thanks Anil.
The Smallest DocBook Big Picture
October 2nd, 2007 • 10 comments General
So you’re a writer, looking to make loads of money off your next best-seller. You fire up Microsoft Word to spend a few weeks drafting and crafting, carefully formatting your examples, properly aligning your figures and images, “bolding” this and italicizing that. Well into your hundredth page, your publisher contacts you regarding that draft you sent.
“This won’t work”, Bobby annoyingly says, “Didn’t we agree on a different line spacing? Oh, and your tables should be a little narrower, and this, and that, and…”. You sigh, knowing that you will be spending the next few days on coffee and cheap nicotine.
What happened there? Was it because you used Microsoft Word? Was it because you didn’t pay attention to Bobby’s every word? Nope. You mashed together what you want to say with how you want it to look, and then stuffed both into a proprietary format.
Single-Source Publishing
A book, an article, or any document in general consists of three layers of information:
- Content, which is the flesh of the document, and the reason it exists.
- Semantic information, or data about the content, such as the author, copyright information, list of figures and side notes.
- Formatting information, which describes how content will look according to semantic information.
Most text processors don’t distinguish between semantic and formatting information. For instance, we usually use bold to emphasize a phrase. We know bold means emphasized, but text processors have no idea. They know what bold looks like, but not what it means.
Formatting information also changes from one medium to another. For example, a website’s address wouldn’t normally be spelled out on a web page, because we can turn any word into a link. Asking the reader of a printed book to visit a website is a different story: the address has to appear in the text.
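The web-versus-book example can be sketched in a few lines of Python. This is only an illustration of the idea, not how any DocBook toolkit is written; the function name `render_link`, the medium names and the example.org address are all made up for the sketch.

```python
def render_link(text: str, url: str, medium: str) -> str:
    """Render one semantic link for a given output medium (hypothetical helper)."""
    if medium == "web":
        # On a web page the address hides behind the linked words.
        return f'<a href="{url}">{text}</a>'
    if medium == "print":
        # On paper nothing is clickable, so the address is spelled out.
        return f"{text} ({url})"
    raise ValueError(f"unknown medium: {medium}")


# Same content, same semantics, two different formattings.
print(render_link("the project site", "http://example.org/", "web"))
print(render_link("the project site", "http://example.org/", "print"))
```

The content (the words) and the semantics (this is a link to that address) stay put; only the formatting decision changes with the medium.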
DocBook can distinguish between the above-mentioned layers of information, and is able to generate different media outputs from the same DocBook document. So now, instead of working on a Word version for my copy-editor, an HTML version for the Web, and a PDF for my publisher, I can use a single DocBook version, and tell DocBook what to do with it and how.
What is DocBook?
The DocBook site says it “is a schema (available in several languages including RELAX NG, SGML and XML DTDs, and W3C XML Schema) maintained by the DocBook Technical Committee of OASIS.”
To put it in plain English, DocBook is a list of markup elements suitable for writing Documents and Books, hence the name. It’s like HTML in the sense that it has tags and uses plain text files, but unlike HTML, it’s more suitable for writing books. DocBook has no idea how your final document will look. That’s the job of the DocBook toolkit.
When we want to start a new paragraph in HTML, we wrap it in a <p> tag, as in <p>paragraph</p>. The browser knows how to display a paragraph tag: it knows that it should insert a margin above and below. DocBook, following the same concept, uses a <para> tag.
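To make that concrete, here is a tiny DocBook-style document held in a Python string and read back with the standard library. It is a sketch under assumptions: the article text is invented, and the DOCTYPE declaration a real DocBook 4.x file would carry is left out, so this only checks that the markup is well-formed, not that it is valid DocBook.

```python
import xml.etree.ElementTree as ET

# A made-up article using four common DocBook elements.
DOCBOOK_SOURCE = """\
<article>
  <title>The Smallest Article</title>
  <para>DocBook marks up <emphasis>meaning</emphasis>, not looks.</para>
  <para>A toolkit decides how each element is rendered.</para>
</article>
"""

root = ET.fromstring(DOCBOOK_SOURCE)

# Pull out the title and the plain text of every paragraph.
title = root.findtext("title")
paragraphs = ["".join(para.itertext()) for para in root.iter("para")]

print(title)
for text in paragraphs:
    print("-", text)
```

Notice that nothing in the source says how a title or an emphasized word should look. That decision is deferred to whatever processes the file.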
DocBook uses a toolkit to read DocBook files and generate final documents. A toolkit has tools to validate DocBook files, apply formatting, import images, according to a set of rules defined in stylesheets.
Schema Jungle
The first version of DocBook was based on SGML. SGML is the mother of most of the markup languages we use: HTML started out as an SGML application, XML is a simplified subset of SGML, and SVG and MathML are in turn built on XML.
Later on, the DocBook people realized that SGML wasn’t a very viable option. Well, it was, but it complicated authors’ lives. So an XML version was created. The only difference between the SGML and XML versions is the toolkit, and in most cases the two are interchangeable.
When you start reading more about DocBook, you’ll be bumping into all sorts of geeky acronyms. RELAX NG, DTD and XML Schema are the most frequent ones. The bad news is that these three look nothing like each other, and they’re mentioned all the time. The good news is it’s none of your business. You don’t need to learn any of them. I promise.
RELAX NG, DTD and XML Schema are three different ways of expressing the same thing: the list of valid DocBook elements. These different versions exist for technical reasons, for toolkit developers and DocBook validators. As an author, you should be basking at Starbucks, crafting your masterpiece, instead of worrying about TLAs (Three Letter Acronyms).
DocBook Versions
When the DocBook people mention versions, they mean the schema version, not the toolkit versions. Different versions of DocBook have different markup elements. Some were added in version 4.0, but deprecated in 5.0. Some exist only in a minor version like 4.5, but not in 4.1.
It used to be more confusing a couple of years ago, because customization is DocBook’s best and worst feature. O’Reilly, for example, created a subset called DocBookLite, which was suitable for writing their books. Simplified DocBook gets rid of elements related to book-writing, and leaves you with a subset suitable for technical documents. Other schemas extend DocBook and add more markup elements.
The most prominent DocBook customization is Simplified DocBook, which you should take a look at if you’re writing white papers rather than books. But the best news is that you only need to know one DocBook version. That’s right, only one. Just one. Don’t go out of your way to write a book. Read the Definitive Guide for version 4.x, pay attention to what’s changed between minor 4.x releases, and you’re set. You can convert 4.x to 5.x. Next year I would probably recommend that you learn version 5.x, but it’s not final yet, so I’d wait.
If you’re eyeing O’Reilly, keep in mind that they’ve deprecated DocBookLite in favor of the full DocBook schema. So make sure you’ve read more about DocBook before you contact them.
The DocBook Toolkit
When it’s time to make a book from your DocBook sources, things tend to get messy. Since DocBook is just an XML file with restrictions, there are dozens upon dozens of different tools you can use. However, there are two main toolkits.
Toolkits are a combination of commonly-used tools, which validate your DocBook sources, apply stylesheets to them, and generate final, publishable versions of your book.
DocBook stylesheets have nothing to do with CSS. These stylesheets describe how the toolkit should convert DocBook elements to the desired output. So in order to convert a DocBook file to HTML, the stylesheet would tell the toolkit to replace every <para> with <p>, and apply other conversions for the rest of the elements. Of course the same applies to a PDF-converting stylesheet, which would take a <para> tag and replace it with a paragraph boundary, sort of.
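The <para>-to-<p> idea can be imitated with a toy "stylesheet" in Python. To be clear, this is not DSSSL or XSL and it is nowhere near what the real DocBook stylesheets do; the `TO_HTML` mapping and the `to_html` function are hypothetical names invented for this sketch.

```python
import xml.etree.ElementTree as ET

# Toy "stylesheet": DocBook element name -> HTML element name.
TO_HTML = {"article": "div", "title": "h1", "para": "p", "emphasis": "em"}


def to_html(docbook_xml: str) -> str:
    """Rename every DocBook element to its HTML counterpart."""
    root = ET.fromstring(docbook_xml)
    for element in root.iter():
        # Elements the mapping doesn't know fall back to a neutral <span>.
        element.tag = TO_HTML.get(element.tag, "span")
    return ET.tostring(root, encoding="unicode")


source = "<article><title>Hi</title><para>Some <emphasis>text</emphasis>.</para></article>"
print(to_html(source))
# prints: <div><h1>Hi</h1><p>Some <em>text</em>.</p></div>
```

Swap the mapping for a different one and the same source comes out in a different shape, which is the whole point of keeping stylesheets separate from the document.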
There are two types of stylesheets: DSSSL and XSL, used with SGML and XML versions respectively. SGML can be converted to XML, and OpenJade, the tool that works with DSSSL, has an XML adapter, so it will work with XML just as easily. If you are a developer though, you’re probably more comfortable with xsltproc, Saxon and Xalan, instead of Jade.
PDF generation seems to be the most troubled part of the whole DocBook process. DocBook’s stylesheets can’t generate PDFs directly. Instead, they generate XSL-FO documents, which are then used by an FO processor to generate PDFs. Explaining FO is outside the scope of this article, but fortunately you only need to read the first paragraph of the Wikipedia entry.
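The two-step route to a PDF can be sketched as a pair of commands, built here in Python without running them. This assumes an XML toolchain with xsltproc and Apache FOP on your path; the stylesheet location passed in is a placeholder that depends on where the DocBook XSL stylesheets live on your system, and `pdf_pipeline` is a hypothetical helper name.

```python
from pathlib import Path


def pdf_pipeline(source: str, fo_stylesheet: str) -> list[list[str]]:
    """Build the two commands that turn a DocBook XML file into a PDF."""
    fo_file = str(Path(source).with_suffix(".fo"))
    pdf_file = str(Path(source).with_suffix(".pdf"))
    return [
        # Step 1: the FO stylesheet turns DocBook into an XSL-FO document.
        ["xsltproc", "--output", fo_file, fo_stylesheet, source],
        # Step 2: an FO processor (FOP here) turns XSL-FO into a PDF.
        ["fop", "-fo", fo_file, "-pdf", pdf_file],
    ]


# The stylesheet path below is an assumption; adjust it to your installation.
for command in pdf_pipeline("book.xml", "docbook-xsl/fo/docbook.xsl"):
    print(" ".join(command))
```

To actually run the steps you could hand each list to `subprocess.run(command, check=True)`, once both tools are installed.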
Apache FOP is the most commonly used FO processor, but mostly because it’s free, and competing products are expensive.
Authoring Tools
DocBook has an awful lot of authoring tools, yet most of them require an expert, and they look funny to Windows users (I’m looking at you, Emacs). Of all the editors I looked at, I found the best option to be XMLmind’s XML Editor, XXE for short. There’s a free version, and the commercial license isn’t insanely over-priced. It can be extensively customized, and you can get the source code when you buy a commercial license. Oh, and O’Reilly use it.
More Information
I hope this article tickled your fancy, however odd that phrase might sound. If you’d like to learn more about DocBook, try to sift through Mark Johnson’s DocBookmarks, get brain-washed, and dip your toes into the details.
Ruby vs. Java Wars
September 26th, 2007 • General
Is it just me, or did “framewar” hell break loose this September? Cedric Otaku calls Obie Fernandez a Rails bigot. Obie, perhaps in jest, ticks people off. Then, realizing the error of his ways, he attributes it to humor. Obie, surprisingly, gets a decent, objective response from Daniel Spiewak. Daniel, wary of the Ruby community’s reactions, clarifies his “self-avowed Rubyism”.
Point #9 in Daniel’s post caught my eye. I have two colleagues: one is a die-hard Rubyist, the other is a Perl/PHP coder (Perl for fun, PHP for profit). When they asked me how I think Ruby compares to Perl, this was essentially my response:
Ok, admit it, Ruby’s syntax is complex. If you don’t think so, then you don’t know the language as well as you think you do. All that DSL goodness comes at a price. Flexibility is hard to build into a language, and as a result Ruby has all sorts of weird constructs and cryptic tricks required to get things done.
— Daniel Spiewak on Ruby vs. Java
Rails is The PAT of Ruby
September 25th, 2007 • General
“There must be a reason for such strong reactions” was all I could think of following yesterday’s flamewars. These spiteful comments couldn’t have come out of nowhere. And then I remembered PHP, and how active it was during 2002-2005. I remember being excited about every single release, and every new template class, and every tiny framework. But what I remembered most was the PHP community’s attitude towards Perl.
PHP people knew that Perl wasn’t competition, that PHP filled a different niche than Perl, and more importantly, that the Perl ecosystem is a rich source of information, proven code and tried ideas. PHP learned from Perl instead of fighting it. Just like Perl learned from awk, which learned from grep.
The PHP community engaged in flamewars only when the pace of PHP development declined. When PHP stopped being as good as it used to be. Many concentrated on deflating Rails’ hype instead of doing any real work; many have failed.
And this wasn’t as obvious within the PHP community as it was within Java’s. Ruby was a real threat to Java purists. Here came a language, with a clean object-oriented design, fun to write with, and supported by many libraries; what’s not to like? If you pay attention, you’ll notice a pattern of superfluous arguments: Ruby is not as fast as Java, Ruby is still immature, Ruby isn’t enterprisey. None hold any water.
It’s pointless to argue against a programming language, even more so a programming framework. There’s a right tool for every job, and some right tools are destined to give way to newer, better tools.
I’m sure my PHP readers can remember PAT and notably patTemplate. This collection of libraries is one of the oldest in PHP history. In its time, PAT was widely used by many developers, especially those who liked templates and wanted to keep their View clean of PHP code.
PAT grew old and slow, while new libraries emerged doing what PAT did in different ways. But there was never a reason to despise PAT. If anything, we respected PAT even more.
Rails is a tool that will give way to a newer, better tool which will do exactly what Rails does, but in a different way. Rails feels to me like the PAT of Ruby, except “Ruby Application Tools” doesn’t sound as good an acronym.
On Rails Hype, Bias, and Hypocrisy
September 24th, 2007 • 1 comment General
Derek Sivers, probably unintentionally, struck a sentimental nerve in the Rails community.
The odd and immature reactions piled up quickly: PHP people hailing their savior, and misty-eyed Rails followers fiercely debunking the “myth” of Mr. Sivers. The chaos overshadowed a serious concern: an obscure suicide note, published by none other than David Heinemeier Hansson, the originator of Rails.
Rails’ hype was over in July 2007. Of course that wasn’t, and shouldn’t be, a reason to abandon such a curious project. We’ve learned, over the course of years, that rewriting is bad. That it’s a thing you shouldn’t do. It’s interesting that nobody complained a couple of years earlier, when they really should have; but then again, one man’s poison is another man’s meat.
The Rails community backlash against Derek sounds like it came fresh from the ’80s, when a lonely teenager would trick his phone into carrying a name-calling response to some absurd BBS group over a noisy line. Haven’t we learned that calling others stupid backfires? We are seven years into the 21st century; I thought lousy telephony was dead.
Derek’s post was awfully misinterpreted. It feels like he’s been cast out of a sacred cult called Railsotology, and condemned to glance back in regret at the puffy pillows of social praise. There’s a strange divide between the friendliness of the Ruby community and that of Rails. The latter was able to stand on Ruby’s shoulders, but never learned why Ruby really took off. I realize that blind arrogance would suggest Rails pulled Ruby into the light, but I beg to differ.
The language of bias is exactly what made Rails interesting. “Opinionated”, they said, and we liked it. I liked it, and I’m the most opinionated person I know.
Bias sold Rails. It was the very essence of its marketing efforts. Denouncing bias is bias in itself. So is condoning a rewrite, and condemning another.
