In this tutorial you will learn how parliamentary documents are structured in XML and how to query them with basic XPath expressions.
XML is a format for adding structure to text. To see how this works, open h-tk-20092010-41-4027.xml in Firefox. You will see tags, written like this: <docinfo>, mixed with normal text. You will also see small minus signs on the left. Clicking one of these closes that part of the file. An XML file is a nested structure that resembles a tree. By clicking on the plus and minus signs, you hide or bring back parts of the tree.
Click on the minus sign next to <root>. Now open the root again by clicking on the plus. Root has three "children": docinfo, meta, and proceedings. Close them all by clicking on them.
Now you can look at the three bits of information one by one. Docinfo just contains some technical information. Close it quickly. Meta contains metadata about the document itself, described with Dublin Core elements. Open and close some of the children of meta. Find the source XML from which this file was created by looking for the dc:source element. Take that URL and paste it into a new tab in your browser. Again, open and close some of the elements.
The proceedings element contains the interesting stuff. It contains the meeting notes of one day of parliament. The proceedings are divided into topics. Inspect the first one. Topics in turn are divided into scenes. The first topic has just one scene below it. The second topic has several scenes. Scenes themselves are subdivided into speeches and stage-directions. If you study a speech you can see who gave the speech, sometimes at what time, etc. You can also read the different paragraphs the speech is made of.
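The nesting described above can also be inspected outside the browser. The following is a minimal Python sketch, not part of the original tutorial files: the sample document is made up, and only the element names (root, docinfo, meta, proceedings, topic, scene, speech, stage-direction, p) and the speaker attribute are taken from the description above. The real file is much larger and may differ in detail.

```python
import xml.etree.ElementTree as ET

# Made-up miniature document; only the element names come from the tutorial.
SAMPLE = """
<root>
  <docinfo/>
  <meta/>
  <proceedings>
    <topic>
      <scene>
        <speech speaker="Speaker A">
          <p>First paragraph.</p>
          <p>Second paragraph.</p>
        </speech>
        <stage-direction>Applause</stage-direction>
      </scene>
    </topic>
  </proceedings>
</root>
"""


def outline(elem, depth=0):
    """Return one indented line per element, mirroring the browser's tree view."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one click on a plus sign in the browser view.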
Now we will see how you can query XML documents using XPath. Carefully follow the steps outlined here.
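<!-- Your XPath query: change the select attribute to try other expressions. -->
<xsl:variable name="myxpath" select="//scene[1]"/>
<!-- Runs on the document root and copies the query result into the output page. -->
<xsl:template match="/">
  <h3 name='search'>Result of expression </h3>
  <tt>
    <xsl:copy-of select="$myxpath"/>
  </tt>
</xsl:template>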
The most important thing for you to remember is the variable named myxpath. That is your XPath query. In the xsl:template below it, the query stored in myxpath is executed on the proceedings XML file, and the result is copied to the screen. Your browser does all that for you!
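The same "store a query, run it, copy the result" pattern can be sketched in Python for readers who want to experiment without a browser. This is an illustration under assumptions, not the tutorial's own method: the sample document is invented, and Python's xml.etree.ElementTree only understands a subset of XPath in which a query starts at an element, so the leading // is written as .// here.

```python
import xml.etree.ElementTree as ET

# Made-up sample; element and attribute names follow the tutorial's description.
SAMPLE = """
<proceedings>
  <topic>
    <scene><speech speaker="Speaker A"><p>Opening words.</p></speech></scene>
    <scene><speech speaker="Speaker B"><p>A reply.</p></speech></scene>
  </topic>
  <topic>
    <scene><speech speaker="Speaker C"><p>A new topic.</p></speech></scene>
  </topic>
</proceedings>
"""

# Counterpart of the myxpath variable in the stylesheet.
myxpath = ".//scene[1]"

document = ET.fromstring(SAMPLE)
result = document.findall(myxpath)

# Counterpart of xsl:copy-of: serialize every selected element.
for elem in result:
    print(ET.tostring(elem, encoding="unicode").strip())
```

With this sample the query selects two scenes, because the predicate [1] picks the first scene within each topic rather than the first scene of the whole document.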
If you try the examples in the table below yourself, put $document in front of the query, for example $document//scene.
| XPath expression | Result |
|---|---|
| //scene | Return all scenes |
| //scene[5] | Return every scene that is the fifth scene of its parent topic; use (//scene)[5] for the fifth scene in the whole document |
| //scene[@party='D66'] | Return all scenes whose party attribute is 'D66' |
| //scene[@party='D66']//speech | Return all speeches within those scenes |
| //speech[contains(p,'oorlog')] | Return all speeches with a paragraph containing the string 'oorlog' (Dutch for 'war'); in XPath 1.0 only the first p of each speech is tested, and //speech[p[contains(.,'oorlog')]] tests every paragraph |
| data($document//speech/@speaker) | Return the names of the speakers of all speeches |
| //speech[.//stage-direction] | Return all speeches that contain a stage-direction |
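The queries in the table can be tried on a small document in Python. This is a hedged sketch: the sample data is invented, the placement of the party attribute on scene simply follows the table, and xml.etree.ElementTree supports only part of XPath (no contains(), no attribute steps such as /@speaker, no nested paths inside predicates). Where the library falls short, the sketch uses plain Python instead, so it approximates the table rather than reproducing it literally.

```python
import xml.etree.ElementTree as ET

# Made-up sample data; 'oorlog' is Dutch for 'war'.
SAMPLE = """
<proceedings>
  <topic>
    <scene party="D66">
      <speech speaker="Speaker A"><p>Over de oorlog.</p></speech>
      <speech speaker="Speaker B"><p>Een antwoord.</p><stage-direction>Applaus</stage-direction></speech>
    </scene>
    <scene party="VVD">
      <speech speaker="Speaker C"><p>Inleiding.</p><p>Weer oorlog.</p></speech>
    </scene>
  </topic>
  <topic>
    <scene party="D66">
      <speech speaker="Speaker D"><p>Iets anders.</p></speech>
    </scene>
    <scene party="PvdA">
      <speech speaker="Speaker E"><p>Slotwoord.</p></speech>
    </scene>
  </topic>
</proceedings>
"""

doc = ET.fromstring(SAMPLE)

# //scene
all_scenes = doc.findall(".//scene")

# //scene[2]: the second scene within each topic (two results here),
# whereas all_scenes[1] is the second scene of the whole document.
second_in_topic = doc.findall(".//scene[2]")

# //scene[@party='D66'] and //scene[@party='D66']//speech
d66_scenes = doc.findall(".//scene[@party='D66']")
d66_speeches = doc.findall(".//scene[@party='D66']//speech")

# Speeches in which any paragraph contains 'oorlog' (done in Python,
# because ElementTree has no contains() function).
war_speeches = [
    s for s in doc.iter("speech")
    if any("oorlog" in "".join(p.itertext()) for p in s.findall("p"))
]

# Values of the speaker attribute of every speech.
speakers = [s.get("speaker") for s in doc.iter("speech")]

# //speech[.//stage-direction]
with_direction = [
    s for s in doc.iter("speech") if s.find(".//stage-direction") is not None
]

print(len(all_scenes), len(second_in_topic), len(d66_scenes), len(d66_speeches))
print([s.get("speaker") for s in war_speeches])
print(speakers)
print([s.get("speaker") for s in with_direction])
```

A full XPath engine would let you run the table's expressions unchanged; the standard library is used here only to keep the sketch free of extra dependencies.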
Above we asked for XML elements, such as speeches and scenes. We also want to ask for the values of attributes, for example to answer a question like "Who speaks during the second scene?". The XPath expression for this is simple: (//scene)[2]//speech/@speaker. The parentheses make [2] pick the second scene of the whole document; without them, //scene[2] would pick the second scene of every topic. The rule is that if you want the value of an attribute you put the @ sign in front of its name.
Caveat. However, if you put this expression in our stylesheet as before, you will see nothing. Try it. This is because the browser treats attributes differently from elements: xsl:copy-of copies an attribute node as an attribute of the enclosing tt element, not as visible text. To solve this, replace the template in QueryProceedingsByXPath.xsl with this one:
<xsl:template match="/">
  <h3 name="search">Result of expression </h3>
  <tt>
    <xsl:copy-of select="$myxpath"/>
  </tt>
  <h3 name="search">Result of attribute expression </h3>
  <tt>
    <!-- Print the string value of each selected node, one per line. -->
    <xsl:for-each select="$myxpath">
      <xsl:value-of select="."/><br/>
    </xsl:for-each>
  </tt>
</xsl:template>
As you can see, we use other XSLT instructions, xsl:for-each and xsl:value-of, to show the values of a list of attributes.
Now experiment with asking for attribute values.
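For experimenting with attribute values outside the browser, here is a minimal Python sketch of the same idea as the xsl:for-each loop: select elements, then print one attribute value per line. The sample document and the helper name attribute_values are hypothetical, and xml.etree.ElementTree cannot evaluate an attribute step such as /@speaker directly, so the attribute is read from each selected element instead.

```python
import xml.etree.ElementTree as ET

# Made-up sample; element and attribute names follow the tutorial's description.
SAMPLE = """
<proceedings>
  <topic>
    <scene>
      <speech speaker="Speaker A"><p>Opening.</p></speech>
    </scene>
    <scene>
      <speech speaker="Speaker B"><p>Question.</p></speech>
      <speech speaker="Speaker A"><p>Answer.</p></speech>
      <speech speaker="Speaker B"><p>Follow-up.</p></speech>
    </scene>
  </topic>
</proceedings>
"""


def attribute_values(root, element_path, attribute):
    """Return the value of `attribute` for every element matched by `element_path`."""
    return [
        elem.get(attribute)
        for elem in root.findall(element_path)
        if elem.get(attribute) is not None
    ]


doc = ET.fromstring(SAMPLE)

# Second scene of the whole document (Python lists count from 0).
second_scene = doc.findall(".//scene")[1]

# "Who speaks during the second scene?"
speakers = attribute_values(second_scene, ".//speech", "speaker")
for name in speakers:
    print(name)

# Each speaker once, in alphabetical order.
unique_speakers = sorted(set(speakers))
print(unique_speakers)
```

The loop prints one name per speech, so a speaker who takes the floor twice appears twice; the final line removes the duplicates.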