The Application Profile

What is an application profile?

The concept of the application profile is not new, and different communities have defined it in different ways. An article published in the Journal of Digital Information recounts several such definitions from communities such as Z39.50, IEEE standardization, and FGDC. [6]

Drawing on their experience in the European DESIRE project, Heery and Patel introduce the ‘application profile’ as a type of metadata schema.

Their definition:

An application profile is a type of metadata schema which consists of data elements drawn from one or more namespaces, combined together by implementors, and optimised for a particular local application. [7]

Essential Definitions

The three terms used most often in this document are defined below.

Element – An element is a unit of data or metadata. It lets us record a particular piece of information about the resource being described.
Element Refinement – An element refinement is a qualifier that makes the meaning of an element narrower or more specific. A refined element shares the meaning of the unqualified element, but with a more restricted scope. A client that does not understand the refinement can ignore it and treat the value as the content of the unqualified element.
Encoding Scheme – An encoding scheme aids in the interpretation of the value of an element. Encoding schemes are either controlled vocabularies or formal notations. A controlled vocabulary supplies the value from a controlled list of terms (e.g. a term from a classification such as Lawi Subject Categories or a term from a thesaurus such as Legal Thesaurus). A formal notation prescribes how the value of an element is formatted (e.g. a date expressed in the “YYYY-MM-DD” format). Even when a client does not understand the encoding scheme, the value can still be useful to human readers.
For example:

Element | Qualifier: Element Refinement(s) | Qualifier: Encoding Schemes/Controlled List
Subject | Subject Classification           | ASC, CABC, DDC, LCC, UDC
Subject | Subject Thesaurus                | AGROVOC, CABT, ASFAT, NALT, MeSH, LCSH
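
The fallback behaviour described for refinements and encoding schemes can be sketched in Python. This is only an illustration under stated assumptions: the `Statement` class, the `dumb_down` and `interpret_date` functions, and the sample value are hypothetical and are not part of any AGRIS specification.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Statement:
    """One metadata statement (hypothetical structure, for illustration only)."""
    element: str                      # e.g. "Subject"
    value: str                        # the content of the element
    refinement: Optional[str] = None  # e.g. "Subject Thesaurus"
    scheme: Optional[str] = None      # e.g. "AGROVOC"


def dumb_down(statement, known_refinements):
    """Ignore a refinement the client does not understand.

    The value is kept and becomes the content of the unqualified element.
    """
    if statement.refinement in known_refinements:
        return statement
    return Statement(statement.element, statement.value, None, statement.scheme)


def interpret_date(value):
    """Interpret a value under the YYYY-MM-DD notation.

    If the value does not follow the notation, the text is returned
    unchanged so that it remains readable by humans.
    """
    try:
        return date.fromisoformat(value)
    except ValueError:
        return value


# Illustrative value only; not taken from a real record.
statement = Statement("Subject", "Rice", "Subject Thesaurus", "AGROVOC")
simple = dumb_down(statement, known_refinements={"Subject Classification"})
print(simple.element, simple.refinement, simple.value)
print(interpret_date("2004-05-17"), interpret_date("May 2004"))
```

A client that knows only "Subject Classification" still receives the value as a plain Subject, and a date that does not follow the notation is passed through as text rather than rejected.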

Other definitions

XML

XML, the eXtensible Markup Language, is a widely adopted format for structured documents and data on the Web. It is designed to improve the functionality of the Web by providing more flexible and adaptable information identification. It is called extensible because it is not a fixed format like HTML (a single, predefined markup language). Instead, XML is a ‘metalanguage’, a language for describing other languages, which lets you design your own customized markup languages for many different types of documents. These features make it an attractive standard for exchanging data.

An XML document is a collection of data. In many ways, this makes it no different from any other file. As a “database” format, XML has some advantages. For example, it is self-describing (the mark-up describes the structure and type names of the data, although not the semantics), it is portable (Unicode), and it can describe data as tree or graph structures.

Except for unparsed entities, all data in an AGRIS AP XML document is PCDATA (for elements) or CDATA (for attributes) text, even if it represents another data type, such as a date or an integer. Generally, the data transfer software converts data from text (in the XML document) to other types (in the database) and vice versa.
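
The conversion from text to typed values can be sketched with Python's standard library. The element and attribute names below (`record`, `title`, `dateIssued`, `pages`) are hypothetical and chosen only for illustration; they are not the AGRIS AP element names.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical record: every value arrives as text, whatever it represents.
RECORD = """<record pages="12">
  <title>Soil survey report</title>
  <dateIssued>2004-05-17</dateIssued>
</record>"""


def load_record(xml_text):
    """Parse a record and convert its text values to typed values."""
    root = ET.fromstring(xml_text)
    return {
        "title": root.findtext("title"),                                  # stays text
        "date_issued": date.fromisoformat(root.findtext("dateIssued")),  # text -> date
        "pages": int(root.get("pages")),                                  # text -> integer
    }


print(load_record(RECORD))
```

The parser itself hands back only strings; the typing decisions live entirely in the transfer code, which is why that code has to know the intended type of each element.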

XML is a content mark-up meta-language designed to store and exchange documents on the World Wide Web. Because it separates content from presentation, XML lets us create information that can be more easily integrated with other Web resources.

The Document Type Definition (DTD)

The purpose of a DTD, or document type definition, is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements. A DTD has several advantages: each XML file can carry a description of its own format; independent groups of people can agree to use a common DTD for interchanging data; an application can use a standard DTD to verify that data received from the outside world is valid; and you can use a DTD to verify your own data.

It is essential that the structure of the XML output documents exactly match the structure expected by the DTD. Mapping the local database schema to the structure defined by the XML DTD is the most important exercise undertaken in this context.
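
To make the idea of "structure expected by the DTD" concrete, here is a deliberately partial Python sketch. It reads the element declarations from a small DTD embedded in the document and checks only that child elements appear in the declared order. The toy DTD and the function names are assumptions made for illustration; this is not the AGRIS AP DTD, and it is not a substitute for a validating parser, since quantifiers, choices and attributes are ignored.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

# Toy document with an internal DTD subset (hypothetical, for illustration).
DOC = """<!DOCTYPE record [
  <!ELEMENT record (title, date)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT date (#PCDATA)>
]>
<record><title>Rice yields</title><date>2004-05-17</date></record>"""


def declared_sequences(xml_text):
    """Collect the child order declared for each sequence-type element."""
    sequences = {}

    def on_element_decl(name, content_model):
        # expat reports a content model as (type, quantifier, name, children).
        ctype, _quantifier, _name, children = content_model
        if ctype == expat.model.XML_CTYPE_SEQ:
            sequences[name] = [child[2] for child in children]

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(xml_text, True)
    return sequences


def follows_declared_order(xml_text):
    """Return True if every element's children match the declared sequence."""
    sequences = declared_sequences(xml_text)
    for element in ET.fromstring(xml_text).iter():
        expected = sequences.get(element.tag)
        if expected is not None and [child.tag for child in element] != expected:
            return False
    return True


print(declared_sequences(DOC))
print(follows_declared_order(DOC))
```

Swapping the `title` and `date` elements in the record makes `follows_declared_order` return `False`, which is the kind of mismatch that a mapping from a local database schema has to avoid.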

Namespaces

The W3C XML community defines a mechanism called XML namespaces, which allows a single XML document to contain elements and attributes that are defined for and used by multiple software components. Sharing definitions across software in this way promotes reuse and discourages reinvention. Their definition [5]:

An XML namespace is a collection of names, identified by a URI reference which are used in XML documents as element types and attribute names. XML namespaces differ from the “namespaces” conventionally used in computing disciplines in that the XML version has an internal structure and is not, mathematically speaking, a set.

In this context, all the newly defined elements in a Metadata Element Set constitute a namespace. The metadata element set defines the elements needed to accurately describe various types of information resources in the domain of agriculture. This element set is maintained at a stable location, which serves as a reference point where elements are defined and maintained for use by different applications.
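
As a minimal sketch of how namespaces keep element sets apart, the following Python example parses a record that mixes Dublin Core elements with elements from a second namespace and groups the element names by namespace URI. The Dublin Core URI is the published one; the `http://example.org/local-terms/` namespace, the `shelfMark` element and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"    # Dublin Core element set
LOCAL = "http://example.org/local-terms/"  # hypothetical local namespace

RECORD = f"""<record xmlns:dc="{DC}" xmlns:loc="{LOCAL}">
  <dc:title>Soil survey report</dc:title>
  <dc:creator>Example Author</dc:creator>
  <loc:shelfMark>A-17</loc:shelfMark>
</record>"""


def names_by_namespace(xml_text):
    """Group element names by the namespace URI they belong to."""
    grouped = {}
    for element in ET.fromstring(xml_text).iter():
        # ElementTree expands a qualified name to "{uri}localname".
        if element.tag.startswith("{"):
            uri, local_name = element.tag[1:].split("}", 1)
        else:
            uri, local_name = "", element.tag
        grouped.setdefault(uri, []).append(local_name)
    return grouped


for uri, names in names_by_namespace(RECORD).items():
    print(uri or "(no namespace)", names)
```

The prefixes `dc` and `loc` are local shorthand only; it is the URI that identifies where each element is defined, which is why a stable location for the element set matters.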

XML and Databases

Today most bibliographic data is stored in relational databases such as Oracle and SQL Server 2000, or in other database systems, and these products support XML using different approaches. They allow easy publishing, managing, and sharing of content on corporate intranets and the Web. An important characteristic in this respect is that they are bidirectional. That is, they can be used to transfer data both from XML documents to the database and from the database to XML documents.
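
The bidirectional transfer can be sketched in a few lines of Python. SQLite (from the standard library) stands in here for the database products named above, and the `records` table with its `title` and `year` columns is a hypothetical schema used only for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

COLUMNS = ("title", "year")  # hypothetical schema, for illustration only


def new_database():
    """Create an in-memory database with the illustrative table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (title TEXT, year INTEGER)")
    return conn


def table_to_xml(conn):
    """Database -> XML: emit one <record> element per row."""
    root = ET.Element("records")
    for row in conn.execute("SELECT title, year FROM records ORDER BY rowid"):
        record = ET.SubElement(root, "record")
        for column, value in zip(COLUMNS, row):
            ET.SubElement(record, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_table(conn, xml_text):
    """XML -> database: insert one row per <record> element."""
    for record in ET.fromstring(xml_text).iter("record"):
        conn.execute(
            "INSERT INTO records (title, year) VALUES (?, ?)",
            (record.findtext("title"), int(record.findtext("year"))),
        )


source = new_database()
source.execute("INSERT INTO records VALUES (?, ?)", ("Soil survey report", 2004))
xml_text = table_to_xml(source)

target = new_database()
xml_to_table(target, xml_text)
print(xml_text)
print(target.execute("SELECT title, year FROM records").fetchall())
```

The round trip only preserves the data because both directions agree on the same mapping between columns and elements.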

Tools to validate XML documents

Validating parsers check that XML documents are well formed and verify that they conform to the specific rules of the AGRIS AP XML DTD. Validation can be carried out with the Microsoft XML Parser (MSXML), which is included in Microsoft Internet Explorer. As the next section shows, validating AGRIS AP XML is made easier because the AGRIS DTD is kept at a fixed (PURL) location.
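
The first of these two checks, well-formedness, can be illustrated with Python's standard library alone. The sketch below is not a validating parser: it does not check conformance to any DTD, and the function name `is_well_formed` is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise.

    This checks well-formedness only; it does not validate against a DTD.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<record><title>Soil survey</title></record>"))
print(is_well_formed("<record><title>Soil survey</record>"))  # unclosed <title>
```

A document that fails this check cannot be validated at all, so running it first gives a cheap early error before any DTD is consulted.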

Other XML parsers, many of them freeware, are available on the Internet (1). A widely used tool is XML Spy (2), a comprehensive package used to create, edit and validate XML, XSL and DTD/XML Schema documents.

References

1. http://www.xml.com/pub/rg/XML_Parsers
2. http://www.altova.com

General References

[2] Dublin Core Metadata Initiative
http://www.dublincore.org/

[3] Agricultural Metadata Element Set
http://www.fao.org/agris/agmes/

[4] The Australian Government Locator Service
http://www.naa.gov.au/recordkeeping/gov_online/agls/cim/cim_manual.html

[5] Namespaces in XML
http://www.w3.org/TR/REC-xml-names/

[6] Baker, Dekkers, Heery, Patel and Salokhe (2001) “What Terms Does Your Metadata Use? Application Profiles as Machine-Understandable Narratives”. JoDI, Vol. 2, Issue 2.
http://jodi.ecs.soton.ac.uk/Articles/v02/i02/Baker/

[7] Heery, Rachel and Manjula Patel (2000) “Application profiles: mixing and matching metadata schemas”. Ariadne, No. 25, September.
http://www.ariadne.ac.uk/issue25/app-profiles/intro.html

[8] Codes for the Representation of Names of Languages
http://www.loc.gov/standards/iso639-2/langcodes.html

[9] Codes for the Representation of Countries
http://www.iso.ch/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html
