The Bush Files: XML
Posted by the markITeer on May 23, 2007
This post is the first in a series about things ‘everybody-assumes-you-know-but-actually-you-don’t-have-a-clue’.
I call them: the Bush Files.
Today: XML.
XML stands for eXtensible Markup Language.
Let’s start with the last part first: it’s a language, a way of communicating and sharing data, mostly between two applications. But it’s not just any language, it’s a ‘markup’ language. This means that it combines text with extra information about that text. In the case of XML, that extra information describes the structure of the text. Finally, it’s extensible, meaning that you can extend the language and invent new ‘words’, so you can make it say exactly what you want it to say.
Let’s take a look at a simple example:
```xml
<books>
  <book ISBN="1400079179">
    <title>The Da Vinci Code</title>
    <author>Dan Brown</author>
  </book>
  <book ISBN="0345340426">
    <title>The Lord Of The Rings</title>
    <author>J.R.R. Tolkien</author>
  </book>
</books>
```
What does this tell us?
1. XML is readable: it’s just text, so you can open it in any text editor, such as Notepad. Handy!
2. The extra information about our text is put in descriptive ‘tags’ between ‘<’ and ‘>’.
3. Text is put between an opening tag and a closing tag, which has the same name preceded by a slash: e.g. <title> … </title>. The opening tag, the text and the closing tag together are called an element. An element can also contain other elements, like <book> … </book> does in our example.
4. There is exactly one element containing all the data, called the root element. In our example, this is the <books> … </books> element.
5. Inside an opening tag, you can also put some extra information in an attribute, e.g. <book ISBN="1400079179">. The value of an attribute is always put between quotes.
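Because XML is so regular, a program can pick these pieces apart for you. Here is a minimal sketch using Python’s standard-library `xml.etree.ElementTree` module (Python is used purely as an illustration; the names `BOOKS_XML` and `list_books` are made up for this example). It parses the book list, reports the root element, and reads each `ISBN` attribute plus the text of the `title` and `author` elements.

```python
import xml.etree.ElementTree as ET

# The book list from the example, as a string.
BOOKS_XML = """<books>
  <book ISBN="1400079179">
    <title>The Da Vinci Code</title>
    <author>Dan Brown</author>
  </book>
  <book ISBN="0345340426">
    <title>The Lord Of The Rings</title>
    <author>J.R.R. Tolkien</author>
  </book>
</books>"""


def list_books(xml_text):
    """Return (isbn, title, author) tuples for every <book> element."""
    root = ET.fromstring(xml_text)  # parse and return the root element: <books>
    books = []
    for book in root.findall("book"):  # direct child elements named "book"
        isbn = book.get("ISBN")  # attribute value from the opening tag
        title = book.findtext("title")  # text between <title> and </title>
        author = book.findtext("author")  # text between <author> and </author>
        books.append((isbn, title, author))
    return books


if __name__ == "__main__":
    print("root element:", ET.fromstring(BOOKS_XML).tag)
    for isbn, title, author in list_books(BOOKS_XML):
        print(f"{isbn}: {title} by {author}")
```

Running it should print the root element (`books`) followed by one line per book. Note that `findall("book")` only looks at the direct children of the root; for `<book>` elements nested deeper you would use a path such as `".//book"`.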
The fun thing is, you can invent any tags and attributes you want, as long as whoever you’re sending the XML to ‘speaks’ the same language… That’s why the sender(s) and receiver(s) of the XML usually agree on a language first and describe it in a schema, written in a schema language such as DTD or XML Schema. That way, everybody knows which words can be used, and everyone can understand what the others say.
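To get a feel for what a schema does, here is a toy sketch in Python. It is not a real schema language: the dictionary format, `BOOKS_SCHEMA` and `schema_errors` are all invented for this example, and a real schema would be written in DTD or XML Schema instead. It only checks which child elements and attributes each element may use, not their order, how many there are, or whether they are required.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written "schema": for each element name, the child
# elements and attributes it is allowed to contain.
BOOKS_SCHEMA = {
    "books": {"children": {"book"}, "attributes": set()},
    "book": {"children": {"title", "author"}, "attributes": {"ISBN"}},
    "title": {"children": set(), "attributes": set()},
    "author": {"children": set(), "attributes": set()},
}


def schema_errors(element, schema):
    """Return a list of the places where `element` uses words the schema doesn't define."""
    rules = schema.get(element.tag)
    if rules is None:
        return [f"unknown element <{element.tag}>"]
    errors = []
    for name in element.attrib:  # check every attribute in the opening tag
        if name not in rules["attributes"]:
            errors.append(f"unknown attribute {name} on <{element.tag}>")
    for child in element:  # check every direct child element
        if child.tag not in rules["children"]:
            errors.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        else:
            errors.extend(schema_errors(child, schema))  # recurse into allowed children
    return errors


if __name__ == "__main__":
    doc = ET.fromstring(
        '<books><book ISBN="1400079179">'
        "<title>The Da Vinci Code</title><writer>Dan Brown</writer>"
        "</book></books>"
    )
    print(schema_errors(doc, BOOKS_SCHEMA))
    # prints ['<writer> not allowed inside <book>']
```

Here the sender used the made-up word `<writer>` where the agreed language says `<author>`, so the receiver can immediately tell the two aren’t speaking the same language.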
If your XML is syntactically correct (see items 3 & 4 above), it is said to be well-formed. If, on top of that, it only uses the words that you and the other people or programs working with the XML have agreed on (as described in the schema), it is said to be valid. So valid XML is always well-formed, but well-formed XML isn’t necessarily valid.
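Python’s standard library can check the well-formed part for you. The sketch below (the function name `is_well_formed` is made up) simply tries to parse a string with `xml.etree.ElementTree` and reports whether that raised a `ParseError`. Keep in mind that this checks well-formedness only: `ElementTree` does not validate against a schema, which takes a validating parser, for example the one in the third-party lxml library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


examples = {
    # every tag closed, one root element
    "well-formed": "<books><book><title>The Da Vinci Code</title></book></books>",
    # <book> is opened but never closed
    "missing closing tag": "<books><book><title>The Da Vinci Code</title></books>",
    # no single element containing all the data
    "two root elements": "<book>The Da Vinci Code</book><book>Dune</book>",
    # attribute value without quotes
    "unquoted attribute": "<book ISBN=1400079179></book>",
}

for label, text in examples.items():
    print(f"{label}: {is_well_formed(text)}")
# prints True for the first example and False for the other three
```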
So, that’s basically all there is to XML. Not so difficult, eh?
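P.S. If you take a close look at HTML, the language used for building web pages, you’ll see … that it looks an awful lot like XML with its own pre-defined tags and attributes! That’s no coincidence: both have their roots in an older markup language called SGML. But HTML isn’t XML, and it was never required to be well-formed: browsers happily accept a <br> or a <p> that is never closed. That’s why XHTML was invented: it’s HTML rewritten as XML, so every XHTML document has to be well-formed.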
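You can see the difference between forgiving HTML and strict XML with two parsers from Python’s standard library: `html.parser.HTMLParser` reads sloppy HTML without complaint, while `xml.etree.ElementTree` rejects anything that isn’t well-formed. In this sketch, `TagCollector` and `parses_as_xml` are names made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every opening tag the HTML parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_as_xml(text):
    """Return True if the XML parser accepts text as well-formed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


html_snippet = "<p>First line<br>Second line"        # <br> and <p> never closed
xhtml_snippet = "<p>First line<br/>Second line</p>"  # everything closed

collector = TagCollector()
collector.feed(html_snippet)
collector.close()
print(collector.tags)                # ['p', 'br']: the HTML parser copes fine
print(parses_as_xml(html_snippet))   # False: not well-formed XML
print(parses_as_xml(xhtml_snippet))  # True: the XHTML-style version is well-formed
```

The `<br/>` form, with the slash before the closing bracket, is XML’s shorthand for an element that is opened and closed at once, which is exactly what XHTML requires for elements like line breaks.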