Welcome to the xlinkit tutorial. This is a technical tutorial that will give you an overview of how the various components of xlinkit work together. Using the example of a shop that sells bicycles, this tutorial will show you:
- how to manage integrity between distributed XML documents, and between XML documents and a database
- how to automatically create reports that give details on integrity problems
- how to create hyperlinks automatically and produce a portal using xlinkit’s link generation technology
WILBUR’S BIKE SHOP
Wilbur’s Bike Shop is a chain of bicycle retailers that sells and repairs bicycles and makes information available on the Internet and a corporate intranet. Wilbur’s use XML both for Web publication and data storage, and also hold some of their information in a database.
The information made available by Wilbur’s includes:
- a product catalogue [xml | html] – containing product name, product code, price and description
- advertisements [xml | html] – containing product name, price and description
- customer reports [xml | html] – listing the products purchased by particular customers
- service reports [xml | html] – giving problems with products reported by customers
The product catalogue is a single XML file held by Wilbur’s corporate staff. The individual shops produce localized advertisements for products that they post on the Internet. These advertisements are maintained by the shops and are stored in XML files, so that they can be styled and turned into HTML pages automatically. Customer reports are also separate XML files produced by the regional sales staff. Finally, the service reports are held in a database managed by the customer relations department.
Wilbur’s have tried to manage the integrity of their data manually. This has cost them a lot of effort and added significant complexity to their business processes. Some of the problems Wilbur’s have encountered include:
- products were advertised before they had been entered into the central catalogue, leading to supply problems
- products were advertised at prices different from those in the catalogue, sometimes because of local special offers but more commonly due to errors, leading to complaints and extra cost
- customers reported problems with products which were not actually purchased from Wilbur’s, leading to extra cost incurred by the customer relations department manually checking the customer reports
In addition, the intranet web page for each shop was constructed by hand. Wilbur’s shops link their advertisements to the catalogue so that customers can find technical details, but links entered by hand were fragile and often out of date.
xlinkit can address all of these problems. xlinkit allows Wilbur’s to specify consistency constraints that match their business rules. All of their data sources, that is the XML files and the database, can be plugged directly into xlinkit and can be checked against the constraints.
Let’s look at the first two problems listed above: advertisements for products that are not in the catalogue, and pricing errors. We can specify the following consistency constraints:
- All advertisements must reference products that exist in the product catalogue
- The advertised price must match that in the catalogue
xlinkit provides a very powerful constraint language. It significantly exceeds the expressive power provided by XML Schema and other traditional schema languages. xlinkit’s consistency constraints are written in XML. You may recognise that the language is essentially first-order logic. Our first example constraint can be expressed as follows:
<consistencyrule>
  <forall var="a" in="/Advert">
    <exists var="p" in="/Catalogue/Product">
      <equal op1="$a/ProductName/text()" op2="$p/Name/text()"/>
    </exists>
  </forall>
</consistencyrule>
and our second constraint as follows:
<consistencyrule>
  <forall var="a" in="/Advert">
    <forall var="p" in="/Catalogue/Product">
      <implies>
        <equal op1="$a/ProductName/text()" op2="$p/Name/text()"/>
        <equal op1="$a/Price/text()" op2="$p/Price/text()"/>
      </implies>
    </forall>
  </forall>
</consistencyrule>
These two constraints now introduce the required dependency between Wilbur’s advertisement files and the product catalogue. Since we do not refer to the location of the files or their storage formats, the constraints can remain the same when data is moved between different storage formats or internet addresses.
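To make the logic of the two rules concrete, here is a minimal Python sketch of how a forall/exists rule and a forall/implies rule could be evaluated over small in-memory documents. It is not xlinkit’s implementation: the product names, prices and helper functions are invented for illustration, and prices are compared as plain text, mirroring the text() comparison in the rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real Wilbur's files are not reproduced here.
CATALOGUE = ET.fromstring("""
<Catalogue>
  <Product><Name>Roadster</Name><Price>349.95</Price></Product>
  <Product><Name>Trailblazer</Name><Price>179.95</Price></Product>
</Catalogue>
""")

ADVERTS = [ET.fromstring(source) for source in (
    "<Advert><ProductName>Roadster</ProductName><Price>349.95</Price></Advert>",
    "<Advert><ProductName>Trailblazer</ProductName><Price>150.00</Price></Advert>",
    "<Advert><ProductName>Phantom</ProductName><Price>99.00</Price></Advert>",
)]


def text(node, tag):
    """Return the stripped text of a child element, or '' if it is missing."""
    return (node.findtext(tag) or "").strip()


def advert_in_catalogue(advert, catalogue):
    # Rule 1 body: exists p in /Catalogue/Product with $a/ProductName = $p/Name
    return any(
        text(advert, "ProductName") == text(product, "Name")
        for product in catalogue.findall("Product")
    )


def prices_match(advert, catalogue):
    # Rule 2 body: forall p, (names equal) implies (prices equal).
    # "A implies B" is written as "(not A) or B".
    return all(
        text(advert, "ProductName") != text(product, "Name")
        or text(advert, "Price") == text(product, "Price")
        for product in catalogue.findall("Product")
    )


# The outer "forall a in /Advert" becomes one result per advertisement.
rule1 = [advert_in_catalogue(a, CATALOGUE) for a in ADVERTS]
rule2 = [prices_match(a, CATALOGUE) for a in ADVERTS]
```

In this sketch the third advert fails the first rule because no catalogue product has its name, yet it passes the second rule: the implication holds vacuously when no product name matches. This is why both rules are needed.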
These rules, which are in XML, can also be rendered into HTML using a simple stylesheet that makes them look like standard first-order logic. Click here to see all of Wilbur’s rules rendered in HTML.
At this point we can introduce the key feature of xlinkit. The evaluation of the constraints produces links. These links identify those elements that are related and those which fail to adhere to a constraint.
Thus, evaluating our first example constraint produces a set of links between each advertised product and the corresponding product entry in the catalogue. If there is no corresponding product in the catalogue, we link to the advertisement and provide a reference to the constraint.
The links are in the form of an XLink linkbase. XLink is the W3C standard for representing extended hyperlinks. With this linking information Wilbur’s can navigate between related items of information and can readily identify data integrity problems. In the case of our second example, we get an XLink between an advert and a catalogue element whose prices do not match:
<xlinkit:ConsistencyLink ruleid="rule.xml#/consistencyruleset/consistencyrule">
  <xlinkit:State>inconsistent</xlinkit:State>
  <xlinkit:Locator xlink:href="advert3.xml#/Advert"/>
  <xlinkit:Locator xlink:href="catalogue.xml#/Catalogue/Product"/>
</xlinkit:ConsistencyLink>
Using these consistency links, xlinkit provides precise diagnostics that can link inconsistent items that have been arbitrarily distributed.
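The idea of turning rule evaluation into links can be sketched in Python as follows. This is a simplified illustration, not xlinkit’s actual link-generation semantics: the ConsistencyLink class, the check_prices function, the sample data, the advert1.xml file name and the [index] suffix on the catalogue locator are all assumptions made for the example. Only the rule identifier and the advert3.xml locator follow the linkbase fragment shown above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsistencyLink:
    """Hypothetical in-memory stand-in for an xlinkit:ConsistencyLink element."""
    rule_id: str
    state: str        # "consistent" or "inconsistent"
    locators: tuple   # one locator string per linked element


def check_prices(rule_id, adverts, catalogue_uri, catalogue):
    """Link each advert to the catalogue product with the same name.

    The link is 'consistent' when the prices agree and 'inconsistent'
    otherwise. Pairs whose names differ produce no link at all.
    """
    links = []
    for advert_uri, advert in adverts.items():
        products = catalogue.findall("Product")
        for index, product in enumerate(products, start=1):
            if advert.findtext("ProductName") != product.findtext("Name"):
                continue  # premise of the implication is false: nothing to link
            same = advert.findtext("Price") == product.findtext("Price")
            links.append(ConsistencyLink(
                rule_id,
                "consistent" if same else "inconsistent",
                (f"{advert_uri}#/Advert",
                 f"{catalogue_uri}#/Catalogue/Product[{index}]"),
            ))
    return links


# Hypothetical sample data, keyed by the file each advert came from.
catalogue = ET.fromstring(
    "<Catalogue>"
    "<Product><Name>Roadster</Name><Price>349.95</Price></Product>"
    "<Product><Name>Trailblazer</Name><Price>179.95</Price></Product>"
    "</Catalogue>"
)
adverts = {
    "advert1.xml": ET.fromstring(
        "<Advert><ProductName>Roadster</ProductName><Price>349.95</Price></Advert>"),
    "advert3.xml": ET.fromstring(
        "<Advert><ProductName>Trailblazer</ProductName><Price>150.00</Price></Advert>"),
}

links = check_prices(
    "rule.xml#/consistencyruleset/consistencyrule",
    adverts, "catalogue.xml", catalogue,
)
```

Because each link carries the rule identifier and a locator into every document involved, a report or a browser can take the reader directly to both sides of a mismatch, wherever the documents are stored.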
Our second example is much like the first; however, in this example we just want to identify the data integrity problems and do not want to link all the related items. xlinkit allows you to choose the linking behaviour you want by annotating the constraints. Thus:
<consistencyrule>
  <linkgeneration>
    <consistent status="off"/>
  </linkgeneration>
  ...
</consistencyrule>
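The effect of such an annotation can be pictured as a filter over the generated links. The Python sketch below reads the linkgeneration element and drops links whose state has been switched off. It rests on assumptions: that a state is on unless explicitly turned off, and that an inconsistent element could be used in the same way as the consistent one shown above. Neither is confirmed by this tutorial, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The annotation from the tutorial, with the elided rule body left out.
RULE = ET.fromstring("""
<consistencyrule>
  <linkgeneration>
    <consistent status="off"/>
  </linkgeneration>
</consistencyrule>
""")


def enabled_states(rule):
    """Return the set of link states the rule wants generated.

    Assumption: both states default to 'on' when not mentioned.
    """
    states = {"consistent": True, "inconsistent": True}
    generation = rule.find("linkgeneration")
    if generation is not None:
        for child in generation:
            if child.tag in states:
                states[child.tag] = child.get("status", "on") != "off"
    return {state for state, enabled in states.items() if enabled}


def filter_links(links, rule):
    """Keep only (state, locator) pairs whose state is enabled for the rule."""
    keep = enabled_states(rule)
    return [link for link in links if link[0] in keep]


# Hypothetical links as (state, locator) pairs.
links = [
    ("consistent", "advert1.xml#/Advert"),
    ("inconsistent", "advert3.xml#/Advert"),
]
kept = filter_links(links, RULE)
```

With consistent links switched off, only the advert with the pricing problem remains in the result, which is the behaviour wanted for an integrity report.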
You have seen that xlinkit takes a set of rules and a set of documents and produces XLinks. When you use a servlet deployment of xlinkit, like the one available through this web site, it will take the URL of a rule set (an XML document that references the rules) and a document set (an XML document that references the documents you want to check). You can look at Wilbur’s document set and rule set. (Your browser may not display these XML files correctly – you can use View Source in that case). After a check, xlinkit will return to you the URL of the XLink linkbase containing the results of your check.
You can now perform a live check of Wilbur’s Bike Shop. Alternatively, you can check your own documents online.
Using our report generator Pulitzer, Wilbur’s can create reports that give details on integrity problems. You can see the sample report produced for Wilbur’s here.
ALTERNATIVE DATA SOURCES
Our third example asks: did we sell the goods reported as problematic to the customer reporting the problem? Our constraint looks much the same as the examples above, but we need to use the service reports, which are held in a database (for instance MySQL).
To do this we use a “fetcher”. Fetchers are referenced in the document set and can be written for almost any proprietary data storage format. The rules are insulated from the way in which data is held, providing an elegant abstraction. Our default fetcher simply reads its data from an XML file. The service reports require the use of our JDBCFetcher, which uses JDBC and an SQL query to retrieve a table from a database and turn it into an XML DOM tree. All that is required to load a table is an additional attribute in the document set:
<Document fetcher="JDBCFetcher" href="jdbc:mysql://www.xlinkit.com/wilburs#select * from report"/>
When the links are generated, they point back into a virtual XML file that is extracted from the database.
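The fetcher idea, running a query and presenting the resulting table as an XML tree, can be sketched in Python. This is not the JDBCFetcher itself: an in-memory SQLite database stands in for the MySQL server, the Table and Row element names are invented (the tutorial does not show the layout the real fetcher produces), and the column names and sample row are hypothetical. Splitting the href at the # into a connection part and a query part follows the document set entry shown above.

```python
import sqlite3
import xml.etree.ElementTree as ET


def split_href(href):
    """Split a document-set href into (database location, SQL query) at '#'."""
    location, _, query = href.partition("#")
    return location, query


def table_to_xml(connection, query, root_tag="Table", row_tag="Row"):
    """Run a query and build an XML tree with one element per row.

    Each column becomes a child element named after the column.
    The root and row tag names are assumptions made for this sketch.
    """
    cursor = connection.execute(query)
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        row_element = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            cell = ET.SubElement(row_element, column)
            cell.text = "" if value is None else str(value)
    return root


# SQLite stands in for the MySQL database; the schema is hypothetical.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE report (customer TEXT, product TEXT, problem TEXT)")
connection.execute(
    "INSERT INTO report VALUES ('C042', 'Roadster', 'loose chain')")

location, query = split_href(
    "jdbc:mysql://www.xlinkit.com/wilburs#select * from report")
tree = table_to_xml(connection, query)
```

Once the table is available as a tree, a rule can address it with the same path expressions it uses for ordinary XML files, for example Row/product in this sketch, so the constraint itself does not need to know that the data came from a database.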
You have now seen the main features of xlinkit, but not the many ways in which it can be applied or the full power of the xlinkit language. To explore these, see our demonstration examples. You can also continue from here by looking at our educational portal case study.
There are many other features of xlinkit not included in this quick tutorial. With them you can:
- replace complex constraints with macros
- script your own ‘plug-in’ predicates
- check incrementally for small changes
- manage your checking sessions
- check VERY large sets of data
- deploy xlinkit as a component in other enterprise architectures
- automatically generate repairs for data integrity problems
There are extended technical descriptions in our White Papers.