ECommerce Lecture

Ecommerce, Web Services and XML

July, 2002

A Web Robot - service available since 1994.

A series of lectures presented in July 2002 for the subject - Comp3410 Information Technology in Electronic Commerce at the Department of Computer Science, Australian National University.

The lecture series provides an introduction to XML, DTDs, schemas and XSL transformations, presented in the context of web services. Two web services are featured: an online robot, and infrastructure for supporting Supervisory Control and Data Acquisition (SCADA) with thin clients that use the GSM phone network for communication.

Also discussed are SMS messaging on mobile phones and some aspects of human behaviour important to the commercial success of web services, in particular the significance of Zipf's distribution.

Links to other eCommerce Lectures pages


XML - eXtensible Markup Language

Ken Taylor - incorporating course notes by Ramesh Sankaranarayana July 2001

     A language for creating other markup languages...

XML defines content and not presentation.

  • Derived from SGML.
  • Create your own tags.
  • Well defined structure.
  • XML is strict about syntax.
  • Unlike in HTML, errors in XML syntax halt document processing, and users or applications receive error messages, not a best-guess interpretation of the document structure.
  • This removes ambiguity.
  • XML can be used to create markup languages like HTML.
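
The strictness described above can be seen with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the choice of Python and the helper name is_well_formed are assumptions made for illustration and are not part of the lecture material.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        # The parser stops at the first syntax error instead of guessing.
        return False, str(err)


# Properly nested: every start tag has a matching end tag.
good = "<contact><name>John Doe</name></contact>"

# HTML-style markup: <br> is never closed, so this is not well formed.
bad = "<P>2 Backroads Lane<br></P>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

An HTML browser would display the second string without complaint, whereas the XML parser reports an error and produces no document at all.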


Ken Taylor based on course notes by Ramesh Sankaranarayana July 2001

      HTML defines format...

Look at the following HTML example.

<H2>John Doe</H2>
<P>2 Backroads Lane<br>

  • Note: This is not valid XML as it is not well formed.
  • Well formed HTML is called XHTML.

This will display as:

John Doe

2 Backroads Lane

What we are specifying here is how the document is to be rendered, and not what information is contained in the document.

While a human could read this and gather information about the embedded data, a machine could not. Humans can assist machines to obtain information from such pages by a technique known as page scraping. An example is Betman which interacts with the NSW TAB site.

Decoupled content

Ken Taylor based on course notes by Ramesh Sankaranarayana July 2001

     XML defines content...

Now look at the following XML example.

<name>John Doe</name>
<address>2 Backroads Lane</address>

What is being marked here is the structure of the data, not the way in which it is presented. The content has been decoupled from the presentation, which can now be done in several ways based on the same data. This form is readable by both humans and machines.

  • The same information can now be presented differently.
  • Formatting for display on different devices.
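
As a rough illustration of presenting the same data in different ways, the sketch below parses the example once and renders it twice. It assumes Python's standard xml.etree.ElementTree module; the wrapping contact root element and the function names to_html and to_text are hypothetical, added only so the example is a complete program.

```python
import xml.etree.ElementTree as ET
from html import escape

# The two elements from the example, wrapped in a hypothetical <contact>
# root element so that the text is a complete XML document.
SOURCE = """<contact>
  <name>John Doe</name>
  <address>2 Backroads Lane</address>
</contact>"""


def to_html(contact):
    """Render the contact for a desktop browser."""
    name = escape(contact.findtext("name"))
    address = escape(contact.findtext("address"))
    return f"<H2>{name}</H2>\n<P>{address}</P>"


def to_text(contact):
    """Render the contact as a single line, e.g. for a small phone screen."""
    return f"{contact.findtext('name')}, {contact.findtext('address')}"


contact = ET.fromstring(SOURCE)
print(to_html(contact))
print(to_text(contact))
```

Neither rendering function needs to guess where the name ends and the address begins, because the markup states it.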

Interacting Applications

Ken Taylor based on course notes by Ramesh Sankaranarayana July 2001

     A programmable web...

Marking up data to define content allows easy exchange of information between programs. Of course, both programs must have the same understanding of the meaning and order of the tags. The XML syntax rules, and the fact that the grammar of such tags can be specified, ensure that this is possible.

A non XML solution. Telerobot operators could design their own interface.

  • Database backed applications.
  • Applications that provide a service. For example an electronic payments gateway Eway.
  • Applications that utilise data from the web. For example an Excel spreadsheet for interacting with Full Stops.

Web Services and Zipf's Distribution

Ken Taylor based on course notes by Ramesh Sankaranarayana July 2001

     Be the best...

Some observations on web services.

  • Consumers have captured most of the benefits.
  • Good for humanity but bad for your company.
  • Consider advertising as a revenue model. For the first eight months of 1998 the telerobot delivered an average of 32,439 operator and observer pages per month. This usage could generate an income of US$243 per month from advertising. Our single robot could only be used by one person at a time, so even running flat out it could have produced no more than US$385 per month. More trouble than it's worth.
  • I discover Zipf's distribution and give up on advertising.
  • Zipf's law also applies to referring sites.

As seen above, the Perth telerobot did show an approximate Zipf distribution of referrers, but the discrepancy between the sample and the Zipf distribution is not due to sampling error. This is shown by the chi-squared statistic of 5707 with 774 degrees of freedom, which indicates that the probability of the difference being due to random sampling error is almost zero. Apart from the problem of classifying referrers, it was also observed that referral numbers changed over time. The telerobot was used by some school groups and was mentioned in the press from time to time, which caused short-term changes in the number of referrals from a particular address. This effect was strongest when the telerobot featured in the radio shows Net Talk Live (1995) in the United States and Safari (Heldal 1998) in Norway. The largest number of requests to the ABB1400 telerobot recorded in a day (1782) occurred on the day the telerobot was featured on Net Talk Live. Referring sites also gain and fade in popularity over time, which affects the number of referrals they generate.

The uncertainty of categorising referrers and the change in referrer numbers over time suggest that Zipf's law does not provide an adequate explanation for referral data; however, it does seem to provide a useful approximation.
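
A comparison of referral counts with a Zipf distribution can be sketched as follows. This is only an illustration: the counts are hypothetical rather than the telerobot data, the classic form of Zipf's law (count proportional to 1/rank) is assumed, and the degrees of freedom are taken to be the number of categories minus one.

```python
def zipf_expected(total, n_ranks, exponent=1.0):
    """Expected count at each rank if counts are proportional to 1 / rank**exponent."""
    weights = [1.0 / rank ** exponent for rank in range(1, n_ranks + 1)]
    scale = total / sum(weights)
    return [w * scale for w in weights]


def chi_squared(observed, expected):
    """Pearson's chi-squared statistic: the sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical referral counts, sorted from the most to the least common referrer.
observed = [520, 230, 190, 110, 95, 90, 60, 55, 40, 38]

expected = zipf_expected(sum(observed), len(observed))
statistic = chi_squared(observed, expected)
degrees_of_freedom = len(observed) - 1

print(f"chi-squared = {statistic:.1f} with {degrees_of_freedom} degrees of freedom")
```

The statistic is then compared with the chi-squared distribution for that number of degrees of freedom; a value far above the degrees of freedom, as in the lecture's figures, means the differences are too large to be explained by sampling error alone.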

Web Services and XML

Ken Taylor based on course notes by Ramesh Sankaranarayana July 2001


Zipf's law tells us we have to be the best. One way to do this is to do very little but do it very well.

  • XML is an important enabling technology.
  • Application service provider. Again XML is an important enabling technology.
  • Doesn't need big organisations. Empowers individuals and enables atomisation.
  • Some examples of ultraspecialisation using SOAP, a standard built on XML

History of XML

A bit of history...

Ramesh Sankaranarayana July 2001

     Where did the idea come from...

In 1969, Charles Goldfarb was leading an IBM research project on integrated law office information systems. Together with Ed Mosher and Ray Lorie, he invented the Generalized Mark-up Language (GML). It was used as a means of allowing the text editing, formatting and information retrieval subsystems to share documents. IBM now produces over 90% of its documents with it.

Goldfarb carried on the work on GML and invented Standard Generalized Mark-up Language (SGML) in 1974.

This was later adapted for use as an all-purpose information standard. It was established as an ISO standard in 1986.

It is extremely powerful, but complex. Used quite a lot in the domain of electronic publishing.

Then came Hyper Text Mark-up Language (HTML). Tim Berners-Lee and Anders Berglund invented a tag-based language for marking up technical documents that a group of scientists in Europe shared over the Internet. This was later expanded to a simplified application of SGML and called HTML, the mark-up language that we know and love.

HTML is the language of the web for rendering of documents. It has a fixed set of tags and is used primarily for defining how content is to be displayed. It is a particular instance of a mark-up language and can't be used to define new mark-up languages.

The World Wide Web Consortium (W3C) combined the power of SGML with the simplicity of HTML and came up with XML. It is a subset of SGML. The latest specification standard is XML 1.0, released in February 1998. The specification for XML is less than a tenth of the size of the SGML specification. It has many of the features of SGML including:

  • Extensibility
  • Structure
  • Validity

It is meant to be interoperable with both SGML and HTML. 

XML Basics

Ken Taylor based on course notes by Ramesh Sankaranarayana July 2001

     What you need to know...

<?xml version="1.0"?>
<!-- An XML example -->
<contact>
<name>John Doe</name>
<address>2 Backroads Lane</address>
</contact>

What this means.
<?xml version="1.0"?>

is the XML declaration, which is written like a processing instruction. It conveys useful information to an application. The above says that this is an XML document and uses XML specification version 1.0.

<!-- An XML example -->

is a comment. You can't use a double hyphen inside a comment.

<contact> ... </contact>

is an element whose type name is contact. Every element has a start tag and end tag. You can have nested elements.

Keep the following rules in mind:

  • element type names are case sensitive.
  • each element must have a starting and ending tag.
  • tags must maintain their order in nested elements.
  • <contact/> denotes an empty element.

<movie type="mystery" rating="R" year="1968">
The Guns of Navarone
</movie>

Here, type, rating and year are attributes.

The combination of elements and attributes, as shown in the examples above, can be used to define the structure of various types of data. In effect, each such definition represents a mark-up language for a specific type of data.
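
To show how an application sees elements and attributes once a document has been parsed, here is a small sketch using Python's standard xml.etree.ElementTree module (the choice of library is an assumption for illustration; any XML parser exposes the same information).

```python
import xml.etree.ElementTree as ET

source = '<movie type="mystery" rating="R" year="1968">The Guns of Navarone</movie>'

movie = ET.fromstring(source)

print(movie.tag)          # the element type name
print(movie.attrib)       # all attributes, as a dictionary of strings
print(movie.get("year"))  # a single attribute value
print(movie.text)         # the character data between the tags
```

Note that attribute values always arrive as strings; it is up to the application, or a schema, to say that year should be a number.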

Look at namespaces.

Parsers and the DOM

Ken Taylor based on course notes by Ramesh Sankaranarayana July 2001

     Makes it easy to manipulate...

XML parsers, or processors, are used to check whether an XML document obeys the XML syntax rules, to check that it conforms to the data model where one is specified, and to provide access to the Document Object Model (DOM) interface. A document that obeys the syntax rules is called a well-formed document. Parsers typically build tree structures from XML documents. There are many different XML parsers available. Examples are:

  • The W3C XML Parser used in Amaya, which is the W3C's open source HTML and XML editor and browser.
  • msxml, Microsoft's XML parser that is now up to version 4. Incompatibilities between versions cause problems when applying XSL transformations or checking against a data model specification.

An alternative to manipulating an XML document with the DOM is SAX.
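
The difference between the two approaches can be sketched with Python's standard xml.dom.minidom and xml.sax modules. The catalogue content and the class name TitleHandler are hypothetical, chosen only for illustration: the DOM version builds a tree and then queries it, while the SAX version receives events as the document is read and builds no tree.

```python
import xml.dom.minidom
import xml.sax

SOURCE = """<catalogue>
  <book><title>First</title></book>
  <book><title>Second</title></book>
</catalogue>"""

# DOM: the whole document is loaded into a tree that can be navigated
# and modified after parsing.
dom = xml.dom.minidom.parseString(SOURCE)
dom_titles = [node.firstChild.data for node in dom.getElementsByTagName("title")]


class TitleHandler(xml.sax.ContentHandler):
    """SAX: the parser calls these methods as it reads; no tree is built."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self.titles.append("")

    def characters(self, content):
        # Character data may arrive in several pieces, so accumulate it.
        if self._in_title:
            self.titles[-1] += content

    def endElement(self, name):
        if name == "title":
            self._in_title = False


handler = TitleHandler()
xml.sax.parseString(SOURCE.encode("utf-8"), handler)

print(dom_titles)
print(handler.titles)
```

SAX suits large documents or streaming because memory use does not grow with document size; the DOM is more convenient when the application needs to move around or change the document.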

Many markup languages have been defined for different applications. Some examples are:

  • MathML. Mathematical Markup Language.
  • SpeechML. Speech Markup Language.
  • SMIL. Synchronized Multimedia Integration Language.
  • GML. Geography Markup Language.
  • SVG. Scalable Vector Graphics.

When using such a language it is important to check that the resulting XML document obeys the rules of the underlying data model. For example, every movie element has to have the attributes type, rating and year. The data model can be specified using Document Type Definitions (DTDs) or schemas. An XML parser can also check whether the given XML document adheres to the specified data model. Such a parser is called a validating XML parser, and an XML document that it successfully parses is called a valid document.

It is not required that every XML document have an accompanying data model specification.
  • Look at an SVG document.
  • Look at W3C. In particular schema definitions.
  • Also Microsoft schema reference.

Validating XML

Ken Taylor based on course notes by Ramesh Sankaranarayana July 2001

     Describing Information...

For applications to communicate it is necessary that they have a common understanding of the information communicated. Consider the following:

<title>John Doe</title>
<author>Michael Morrison, et al.</author>
<publisher>Sams Publishing</publisher>

It is a well formed document, since it obeys all the rules of XML. But there is a required element that is empty. The only way that an XML processor, or some other application, can figure out that this element is not allowed to be empty is if it is provided with a specification of the underlying data model.

There are two techniques for specifying document content: document type definitions (DTDs) and XML Schemas. DTDs are older and stable. Schemas are new and still changing (a bad thing). DTDs require learning another new language, whereas schemas are themselves defined with XML. DTDs are limited in what can be specified, but they define some important standards, e.g. SVG.

The DTD method of defining the data structure is something that is inherited from SGML. The rules that structure a document are specified using EBNF (Extended Backus-Naur Form). For example, the book document structure could be defined as follows:

<!ELEMENT catalogue (book+)>
<!ELEMENT book (title, author, isbn, publisher, year?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT year (#PCDATA)>

Every DTD must have a root element, which in this case is catalogue.

The line:

<!ELEMENT catalogue (book+)>

says that the root element catalogue consists of one or more book elements.

The line:

<!ELEMENT book (title, author, isbn, publisher, year?)>

says that the book element consists of the stated five sub-elements, which have to appear in the same order, of which year may or may not occur. The rest have to appear exactly once.

The line:

<!ELEMENT title (#PCDATA)>

indicates that the element title consists of parsed character data.
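
A content model such as the one for book is, in effect, a regular expression over the names of the child elements. The sketch below makes that concrete in Python. It is not a DTD validator (Python's standard library does not include one): the content model has been translated by hand into a regular expression, and the function name book_matches_model and the sample books are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of: <!ELEMENT book (title, author, isbn, publisher, year?)>
# Each child name is followed by a space so that names cannot run together.
BOOK_MODEL = re.compile(r"title author isbn publisher (year )?")


def book_matches_model(book):
    """Check the order and number of a book element's children."""
    children = "".join(child.tag + " " for child in book)
    return BOOK_MODEL.fullmatch(children) is not None


with_year = ET.fromstring(
    "<book><title>T</title><author>A</author><isbn>1</isbn>"
    "<publisher>P</publisher><year>2001</year></book>"
)
without_year = ET.fromstring(
    "<book><title>T</title><author>A</author><isbn>1</isbn>"
    "<publisher>P</publisher></book>"
)
wrong_order = ET.fromstring(
    "<book><author>A</author><title>T</title><isbn>1</isbn>"
    "<publisher>P</publisher></book>"
)

print(book_matches_model(with_year))     # year present
print(book_matches_model(without_year))  # year omitted, which is allowed
print(book_matches_model(wrong_order))   # author before title, which is not
```

A validating parser performs this kind of check for every element in the document, using the declarations in the DTD rather than hand-written patterns.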

We now have an XML document and the corresponding DTD. How do we specify the DTD in the XML document? There are two ways of doing this:

  • External DTDs and
  • Internal DTDs.

First consider external DTDs. Assume that the DTD is stored in a file called catalogue.dtd. We can then refer to it from within an XML document as follows:

<?xml version="1.0"?>
<!DOCTYPE catalogue SYSTEM "catalogue.dtd">

The above states that the root element of the XML document is catalogue and the DTD is contained in an external file catalogue.dtd.
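
A parser makes the information in the document type declaration available to applications. As a minimal sketch, Python's standard xml.dom.minidom module (a non-validating parser, so it records the declaration but does not fetch or apply catalogue.dtd) can be used to read it back. The empty catalogue element is an assumption added so that the text is a complete document.

```python
import xml.dom.minidom

SOURCE = """<?xml version="1.0"?>
<!DOCTYPE catalogue SYSTEM "catalogue.dtd">
<catalogue/>"""

document = xml.dom.minidom.parseString(SOURCE)
doctype = document.doctype

print(doctype.name)      # the declared root element type
print(doctype.systemId)  # the location of the external DTD

# In a valid document the root element matches the declared name.
root_matches = document.documentElement.tagName == doctype.name
print(root_matches)
```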

Some of the advantages of using an external DTD are:

  • The DTD is reusable.
  • The document is cleaner.
  • Almost a requirement for interacting applications.

In contrast an internal DTD is contained within the document itself. This is done as follows:

<?xml version="1.0"?>
<!DOCTYPE catalogue [
<!ELEMENT catalogue (book+)>
]>

You can use both external and internal DTDs in the same document. If there are elements with the same name, then the internal one overrides the external one. Internal DTDs are used when:

  • Only a single document is being created.
  • There is a need to minimize overhead.
Alternatively use XML Schemas. Also see how schemas are used for validation.

SMS Formatting

Ken Taylor July 2001

     How does SMS messaging work...

Alternatives for transmitting data over the GSM network are:

  • Dialling. DTMF codes can be used to transmit information to the dialled number after the call is answered.
  • Circuit switched data.
  • SMS messaging.
  • GPRS.

SMS messages are routed via a message centre. The format allows text and binary messages to be sent. The binary formats are used for over the air provisioning, address book (WAP), calendar information (WAP), icons, ringtones etc. Some of this is generic to other handsets that conform to the WAP standard. Also there is an IETF standard for SMS messaging with XML.

Transforming XML

Ken Taylor July 2001

     Turning XML into another form of XML...

XML needs to be transformed for:

  • Display. e.g. as HTML.
  • From one XML schema to another. e.g. GML to SVG and Icon to SMS base.

Methods of transforming XML are:

  • Manipulating programmatically e.g. via DOM.
  • Cascading Style Sheets (CSS). Works in the Mozilla browser.
  • XSLT.
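
The first of these methods, programmatic manipulation, can be sketched as below. The input vocabulary (points and point) is hypothetical and stands in for a real source format such as GML; the output is SVG. Python's standard xml.etree.ElementTree module is assumed, and the function name points_to_svg is made up for the example.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical input vocabulary: a list of named points.
SOURCE = """<points>
  <point name="robot" x="20" y="30"/>
  <point name="camera" x="80" y="60"/>
</points>"""


def points_to_svg(points):
    """Build an SVG tree with one circle per input point."""
    ET.register_namespace("", SVG_NS)  # serialise SVG as the default namespace
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": "100", "height": "100"})
    for point in points.findall("point"):
        circle = ET.SubElement(
            svg,
            f"{{{SVG_NS}}}circle",
            {"cx": point.get("x"), "cy": point.get("y"), "r": "3"},
        )
        title = ET.SubElement(circle, f"{{{SVG_NS}}}title")
        title.text = point.get("name")
    return svg


svg = points_to_svg(ET.fromstring(SOURCE))
print(ET.tostring(svg, encoding="unicode"))
```

Code like this is easy to write for one transformation, but each new output format needs new code. XSLT expresses the same mapping as a stylesheet, which is itself an XML document.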

XSLT can be processed with the Microsoft parser, and there are other XSLT processors; Xalan was recommended by a colleague. The W3C XSLT standard is stable, but the Microsoft XSLT reference is easier to interpret.

  • Look at how this page is transformed.
  • Look at phone book sample application.

SOAP in 1 minute

Mike Kearney July 2001

     Getting into a lather...

SOAP is yet another implementation of remote procedure calls, this time using XML over HTTP.

Essentially SOAP is a standardised way of interacting with a web server using XML encoded requests and XML encoded responses. These two diagrams for the client and the server illustrate this. For an overview of SOAP see this
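
To give a feel for what an XML encoded request looks like, the sketch below builds a SOAP 1.1 envelope with Python's standard xml.etree.ElementTree module. The envelope namespace is the one defined by SOAP 1.1; the service namespace, the method name MoveTo and its parameters are hypothetical and do not describe the telerobot's real interface. Nothing is sent over the network.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/telerobot"             # hypothetical service namespace


def build_request(method, params):
    """Return a SOAP 1.1 envelope that calls `method` with the given parameters."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return envelope


request = build_request("MoveTo", {"x": 100, "y": 250})
print(ET.tostring(request, encoding="unicode"))
```

In use, the serialised envelope would be sent as the body of an HTTP POST to the service, and the reply would be another envelope whose Body element carries the result.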

Class Exercise

Mike Kearney July 2001

     Best way to provide web services...

The lectures have discussed a variety of ways to interact with a web server. The following table enumerates most of the combinations. Examples discussed in the lectures are marked.

There are three IT applications:
  • A web page that contains a form. The form provides radio buttons that determine the type of information requested. The results are viewed in a web browser.
  • A web page that presents information such as flight arrival information. The web page is targeted at PC browsers, handheld browsers, WAP browsers, even SMS messages.
  • A set of VB macros for an Excel spreadsheet that accesses data from a remote experiment.

For each application discuss and be able to report on:

  1. Which combination is best, which is next best.
  2. Why a particular combination is best. You need to consider "best" from at least two perspectives: functionality, implementation, flexibility, etc.
  3. Prepare a sketch block diagram that provides an outline of the processing steps. (Imagine you are the technical lead and you need to explain how the application is going to work to a junior developer). Do this for both the "best" approach and the "next best" approach. Be prepared to stand up and present both explanations.

Organization and timing

  1. Break into groups of 4 to 6 people around where you are currently sitting.
  2. Decide who is going to be the spokesperson for your group.
  3. As a group consider discussion points 1 and 2 above for each application and be prepared to present the outcome to the rest of the class. YOU HAVE 10 MINUTES
  4. Randomly selected groups will explain to the class what they have decided.
  5. As a group consider point 3 above. YOU HAVE 10 MINUTES
  6. Randomly selected groups will explain to the class what they have decided.

Tutorial Exercises

Ken Taylor based on course notes by Ramesh Sankaranarayana July 2001

     Try these exercises...

  • Look at the XML source for this page.
  • Look at the XSL translation that formats the XML for display in your browser. The translation is applied on the server.
  • If your browser is IE5 or above this is the same XML source translated in your browser. View the browser source and note that you don't see HTML. How is this specified? This page will only be properly formatted if you have the correct MSXML version installed.
  • The exercises below may be difficult to do without access to IE5. Alternative tutorial work is available here.
  • Copy the XML and XSL source to your computer. Modify the XML file so that the XSL file is referenced from your computer. If it's not apparent how to do it, leave this and the next exercise and return to it after the later exercises.
  • Modify the XSL so that the dot points on this page become a numbered list.
  • XML tutorial provided by Microsoft. Do lessons 1 to 6.
  • Work through the Zvon examples which demonstrate the use of cascading stylesheets to render XML.
  • Work through the Zvon XSL samples in the non-frames version or the frames version which demonstrate XSL transformations.
  • Or try out the Microsoft XSLT tutorial.
  • Try out the Adobe SVG demos.