
This file contains the first tutorial on XML, entitled "list".

Its objectives are:
1) understanding a very simple XML file, without any DTD;
2) understanding the same XML file with different DTDs;
3) using the simple tool "xmllint" to check that the
  XML files are well-formed and, if a DTD is present,
  that they are valid;
4) examining different errors in XML files and
  debugging them.

The timings are purely indicative.
Total estimated time: 40 minutes.

X. Gonze 2003-09-27

=======================================================================

Step 1. The basic structure of an XML document.
About 3 minutes.

Examine the file "participant_list_1.xml" with your favorite
text editor. Note:
- the header
- the different elements (list, workshop, version, student, name, ...)
- the attributes of some elements (e.g. year for the workshop element)
- the content of each element (some contain only text data, others
 contain other elements, and one is empty).

Question 1.1. Which element is the "root" element?
Question 1.2. Which element is empty?
Question 1.3. Which elements have attributes?
Question 1.4. How many times does the "student" element occur?
Question 1.5. Do all the "student" elements contain the same child elements?

------------------------------------------------------------------------------

Step 2. Checking that a document is well-formed.
About 5 minutes.

Check that "participant_list_1.xml" is well-formed.
To do this, we will use xmllint, a very simple tool for analyzing XML files,
included in standard Linux distributions.
Issue:

xmllint --noout participant_list_1.xml

The --noout option prevents xmllint from printing the parsed document,
so only errors are reported. You should get the prompt back, without
any error message.

Now modify "participant_list_1.xml": change an opening tag
without changing the corresponding closing one. For example, change the
first occurrence of "list" to "list2".

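Now you should get an error like the following (the file name in the message is that of the modified copy, participant_list_1a.xml):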

participant_list_1a.xml:62: error: Opening and ending tag mismatch: list2 and list
</list>
      ^

The document is no longer "well-formed".
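If you want to reproduce the same well-formedness check outside xmllint, here is a minimal sketch that uses Python's standard-library parser, xml.etree.ElementTree, which is built on expat. It does not show how xmllint works internally. The helper name check_well_formed and the tiny document are hypothetical and are not taken from the tutorial files.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return None if xml_text parses, else the parser's error message.

    Like `xmllint --noout`, this only checks well-formedness;
    no DTD validation is performed.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None

# Hypothetical miniature document, not participant_list_1.xml.
good = "<list><version date='2003-08-29'/><student id='id1'><name>A</name></student></list>"
bad = good.replace("<list>", "<list2>", 1)  # rename the opening tag only

print(check_well_formed(good))  # None
print(check_well_formed(bad))   # a "mismatched tag" error with line and column
```

As with xmllint, the error appears at the closing </list> tag, because that is where the parser notices that it does not match the open <list2>.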

Now spend a few minutes experimenting with other modifications of the XML file
and the associated xmllint error messages.

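At the end of Step 2, issue

xmllint | more

without any argument, to get a description of the different options of xmllint.
(With recent versions, this help text may be printed on the standard error
stream. If "more" shows nothing, use "xmllint 2>&1 | more".)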

--------------------------------------------------------------------------------

Step 3. Checking that a document is valid.
About 7 minutes.

Examine the file participant_list_2.xml.
It is quite similar to the file participant_list_1.xml.
Issue the command
diff participant_list_1.xml participant_list_2.xml

The difference is small: only a reference to a DTD file.

Now, issue 
xmllint --noout participant_list_2.xml

You should get the prompt back, without any error message.

Now modify "participant_list_2.xml": change an opening tag
AS WELL AS the corresponding closing one. For example, change both
occurrences of "list" to "list2".

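Issue again
xmllint --noout participant_list_2.xml

Interestingly, you also get the prompt back, without any error message.
Why is that?

The modified document is still well-formed, because every opening tag still
has its matching closing tag. By default, xmllint only checks
well-formedness. You need the option --valid to check the validity
of an XML document against its DTD.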

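Now issue
xmllint --valid --noout participant_list_2.xml

You get something like the following (the file name in the messages is that of the modified copy, participant_list_2a.xml):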

participant_list_2a.xml:3: validity error: Not valid: root and DtD name do not match 'list2' and 'list'
<list2>
      ^
participant_list_2a.xml:62: validity error: No declaration for element list2
</list2>
       ^

Spend a few minutes examining the file participants_A.dtd.
In particular, go back over questions 1.1 to 1.5
and examine the DTD mechanisms related to these features.

Then spend a few more minutes experimenting with modifications of the
participant_list_2.xml file and the associated error messages.
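Python's standard library has no validating parser. The following is therefore only a sketch of two of the checks that xmllint --valid reported above: the root element must match the name given in the DOCTYPE, and every element must be declared. It relies on the expat handlers StartDoctypeDeclHandler and ElementDeclHandler from xml.parsers.expat. It only sees declarations in an internal DTD subset, not in an external file such as participants_A.dtd. The helper name mini_validate and the sample DTD are hypothetical.

```python
import xml.parsers.expat

def mini_validate(xml_text):
    """Report two kinds of DTD validity error (a small subset of xmllint --valid)."""
    doctype = None
    declared = set()
    root = None
    errors = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        nonlocal doctype
        doctype = name  # the name after <!DOCTYPE

    def on_element_decl(name, model):
        declared.add(name)  # one call per <!ELEMENT ...> in the internal subset

    def on_start(name, attrs):
        nonlocal root
        if root is None:
            root = name
            if doctype is not None and name != doctype:
                errors.append(f"root and DTD name do not match: {name!r} and {doctype!r}")
        if name not in declared:
            errors.append(f"no declaration for element {name}")

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.ElementDeclHandler = on_element_decl
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)  # expat itself only checks well-formedness
    return errors

# Hypothetical document with an internal DTD subset.
valid = """<!DOCTYPE list [
  <!ELEMENT list (student*)>
  <!ELEMENT student (#PCDATA)>
]>
<list><student>A. Student</student></list>"""
renamed = valid.replace("<list>", "<list2>").replace("</list>", "</list2>")

print(mini_validate(valid))    # []
print(mini_validate(renamed))
```

The renamed document still parses without error, just as with xmllint --noout. Only these extra checks report the two problems. A real validator also checks content models and attributes, which this sketch ignores.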

--------------------------------------------------------------------------------

Step 4. Parameter entities.
About 5 minutes.

Examine the file participants_B.dtd.
What are the differences from the file participants_A.dtd?
Modify the file participant_list_2.xml so that it uses
this other DTD.
Check the validity of the file.
Question 4.1. The use of an entity has shortened some expressions
 in the DTD (at some cost in readability, but that is acceptable).
 It is possible to define one or two other entities,
 each common to at least two places, that would
 also simplify some expressions a bit (this would
 be even more worthwhile if these expressions
 occurred more often).
 Find one common expression and modify the DTD accordingly.
 Then check that the xmllint validation still works.

--------------------------------------------------------------------------------

Step 5. General entities and visualisation in a "dummy" browser.
About 10 minutes.

Examine the file participant_list_3.xml.
What are the differences from the file participant_list_2.xml?
Examine the file workshop.xml.
Check the validity of the participant_list_3.xml file.

Copy the file to participant_list_4.xml
and, in that new file, insert at a few appropriate (trial) places
the following predefined general ("character") entities:
&apos; 
&gt;
&lt;
&quot;
&amp;
Check the validity of the different modifications.

Also insert the following Unicode character reference:
&#x03C0;
Check the validity of the different modifications.
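To see what a parser does with the five predefined entities, here is a small sketch using Python's standard-library xml.etree.ElementTree. The parser replaces each entity reference with the character it stands for, so the application only sees plain characters. The <note> element and its text are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical <note> element using the five predefined entities.
note = ET.fromstring(
    "<note>&lt;tag&gt; &amp; &quot;quotes&quot; &apos;apostrophes&apos;</note>"
)
print(note.text)  # <tag> & "quotes" 'apostrophes'

# A raw "&" (or "<") in text content is not well-formed, which is why
# these entities exist.
try:
    ET.fromstring("<note>A & B</note>")
except ET.ParseError as err:
    print("not accepted:", err)

# Going the other way: escape() replaces &, < and > with their entities.
print(escape("a < b & c"))  # a &lt; b &amp; c
```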

Now, we would like to see the effect of these general entities.
Standard browsers will simply list the plain text contained
in the XML document, without any formatting. For formatting,
one would need to reference a CSS (Cascading Style Sheet),
but this goes beyond this introductory lecture.
So, we will simply visualize the text.

Using your favorite browser, view the following documents:
participant_list_3.xml
participant_list_4.xml

Question 5.1. What is the symbol corresponding to the Unicode
character reference &#x03C0; that you introduced in
participant_list_4.xml?

Still using your favorite browser, go to the Unicode Web site:
http://www.unicode.org
Browse this Web site a little, then go to "Where is my character?"
Select the online "code charts".
Find the Greek character set.
Find the character corresponding to hexadecimal 03C0.

--------------------------------------------------------------------------------

Step 6. Find the error!
About 10 minutes?!

The following files should be examined:
participant_list_5.xml
participant_list_6.xml
participant_list_7.xml
participant_list_8.xml
participant_list_9.xml

They are all close to being well-formed and valid, but something
is still wrong in each of them. Use xmllint to debug them.

================================================================================

Answer 1.1. The element "list" is the root element:
 it appears just after the header and contains all the other elements.
Answer 1.2. The element "version" is empty :
 <version date="2003-08-29" />
Answer 1.3. The elements "workshop", "version", "student" and "instructor" have attributes:
  <workshop year="2003">
  <version date="2003-08-29" />
  <student id="id1">
  <instructor id="id21">
Answer 1.4. The element "student" appears three times, at lines 10, 21 and 34.
Answer 1.5. The first occurrence of the element "student" has the following
  child elements: name, status, email, address, phone, note.
  The second occurrence has the following child elements:
   name, status, email, address, phone.
  The third occurrence has the following child elements:
   name, email, address, phone.
  So their contents differ.
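The answers to questions 1.1 to 1.5 can also be checked with a short script. This sketch uses Python's xml.etree.ElementTree. The function name summarize is hypothetical. The demonstration runs on a small made-up document rather than the real file. For participant_list_1.xml itself, you could pass ET.parse("participant_list_1.xml").getroot() instead.

```python
import xml.etree.ElementTree as ET

def summarize(root):
    """Collect the facts asked for in questions 1.1 to 1.5."""
    elements = list(root.iter())  # root and all its descendants, in document order
    return {
        "root": root.tag,
        # empty element: no child elements and no text content at all
        "empty": sorted({e.tag for e in elements if len(e) == 0 and not e.text}),
        "with_attributes": sorted({e.tag for e in elements if e.attrib}),
        "student_count": len(root.findall(".//student")),
        "student_children": [[c.tag for c in s] for s in root.iter("student")],
    }

# Hypothetical miniature document, not participant_list_1.xml.
doc = """<list>
  <workshop year="2003">XML tutorial</workshop>
  <version date="2003-08-29" />
  <student id="id1"><name>A</name><email>a@example.org</email></student>
  <student id="id2"><name>B</name></student>
</list>"""

info = summarize(ET.fromstring(doc))
print(info)
```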


Answer 4.1. It is possible to define

<!ENTITY % id_triplet "id       ID      #REQUIRED" >
and
<!ENTITY % CD_required "CDATA  #REQUIRED" >

and to replace the occurrences of these strings by

 %id_triplet;
 and
 %CD_required;
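Parameter entities are replaced textually inside the DTD before the declarations are interpreted. The following Python sketch imitates that substitution with regular expressions, so you can see what the DTD looks like once %id_triplet; and %CD_required; are expanded. It is deliberately simplified: it handles only internal parameter entities with double-quoted values, and it omits the single space that real parsers attach on each side of the replacement text. The function name is hypothetical, and the two ATTLIST lines are made-up examples, not lines from participants_B.dtd.

```python
import re

# <!ENTITY % name "value" >  (double-quoted values only)
PE_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"\s*>')
PE_REF = re.compile(r'%([\w.-]+);')

def expand_parameter_entities(dtd_text, max_rounds=10):
    """Return dtd_text with parameter-entity declarations removed and references expanded."""
    entities = dict(PE_DECL.findall(dtd_text))
    body = PE_DECL.sub("", dtd_text)
    # Repeat so that an entity whose value uses another entity is expanded too;
    # the round limit guards against recursive definitions.
    for _ in range(max_rounds):
        expanded = PE_REF.sub(lambda m: entities.get(m.group(1), m.group(0)), body)
        if expanded == body:
            break
        body = expanded
    return body.strip()

dtd = """<!ENTITY % id_triplet "id       ID      #REQUIRED" >
<!ENTITY % CD_required "CDATA  #REQUIRED" >
<!ATTLIST student %id_triplet; >
<!ATTLIST version date %CD_required; >"""

print(expand_parameter_entities(dtd))
```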


Answer 5.1. It is the lowercase Greek letter pi (π).
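The same lookup can be done without a browser. In this minimal Python sketch, the hexadecimal number in &#x03C0; is the Unicode code point. chr() turns it into the character, and the standard unicodedata module gives the character's official name. An XML parser decodes the hexadecimal and decimal forms of the reference to the same character.

```python
import unicodedata
import xml.etree.ElementTree as ET

code_point = int("03C0", 16)  # the hex digits after "&#x" in &#x03C0;
char = chr(code_point)
print(char, unicodedata.name(char))  # π GREEK SMALL LETTER PI

# Hexadecimal (&#x03C0;) and decimal (&#960;) references denote the same character.
p = ET.fromstring("<p>&#x03C0; &#960;</p>")
print(p.text)  # π π
```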

Answer 6. Hint: if you do not succeed with xmllint,
try a diff with the file participant_list_3.xml.
Concerning the "id" attribute: remember that the value of
an attribute of type ID must start with a letter or
an underscore.
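Here is a small sketch of that ID rule. It also checks the related requirement that ID values be unique within a document. It uses a simplified, ASCII-only version of the XML Name production: XML also allows a colon as the first character, and many non-ASCII letters, which this sketch ignores. It assumes, as in these files, that the attribute declared as ID is the one called "id". The function name check_ids and the sample fragment are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# ASCII-only approximation of the XML 1.0 Name production.
XML_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:-]*")

def check_ids(xml_text, attr="id"):
    """Return a list of problems with the values of the (assumed) ID attribute."""
    problems = []
    seen = set()
    for elem in ET.fromstring(xml_text).iter():
        value = elem.get(attr)
        if value is None:
            continue
        if not XML_NAME.fullmatch(value):
            problems.append(f"{elem.tag}: {attr}={value!r} is not a valid XML name")
        if value in seen:
            problems.append(f"{elem.tag}: {attr}={value!r} is used more than once")
        seen.add(value)
    return problems

# Hypothetical fragment: "21" starts with a digit, and "id1" is repeated.
doc = """<list>
  <student id="id1"/>
  <student id="id1"/>
  <instructor id="21"/>
</list>"""
for problem in check_ids(doc):
    print(problem)
```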
