What is SAX?
- Stands for “Simple API for XML”.
- An open-source API for accessing parsed XML documents.
- An event-based API for accessing XML documents.
- A de facto standard for event-based XML processing.
- A way to access XML document content without holding the entire document in memory.
Benefits:
Application Development
- Enables processing of arbitrarily large documents, especially in “pipeline” processes such as data extraction, XML import, or simple transformations.
- Enables use of a variety of SAX implementations in different programming languages.
- Simplifies new program development by reducing the number and types of data access interfaces.
- Eliminates the need to develop application-specific code for parsing and accessing XML documents.
Application Maintenance
- SAX programming knowledge and skills are easily transferable across applications and programming languages.
- The same application code can be quickly and easily bound to different SAX implementations.
How SAX Works:
XML documents are made of elements, each delimited by a start tag and an end tag. The SAX API is implemented by a SAX parser, which is in turn used by an application-specific SAX processor. As the SAX parser reads an XML document, it generates an event for each construct it encounters: start tag (delivered together with its attributes), end tag, text content, and so on. These events are passed to the SAX processor, which examines each one to extract the relevant data. For example, from a start tag event the SAX processor can get the tag name, which it might then use to trigger some specific processing. A SAX processor only needs to handle the events that are relevant to its task. This allows for very efficient processing, especially when the application only needs to look at specific parts of the document, and it makes SAX-based applications particularly well suited to data import and extraction tasks, where information from XML documents is loaded into other systems.
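The event flow described above can be sketched with Python's standard `xml.sax` module. This is only an illustration under stated assumptions: the `TitleCollector` class, the `collect_titles` helper, and the sample catalog document are hypothetical names made up for this example. The handler reacts only to `title` elements and ignores every other event.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical SAX processor: collects the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Start tag event: the tag name decides whether we care.
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text content event: may arrive in several chunks, so accumulate.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        # End tag event: the element is complete, so store its text.
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def collect_titles(xml_bytes):
    """Parse xml_bytes and return the text of all <title> elements, in order."""
    handler = TitleCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles


# Made-up sample document for illustration.
SAMPLE = (
    b"<catalog>"
    b"<book id='b1'><title>SAX Basics</title><price>10</price></book>"
    b"<book id='b2'><title>DOM Basics</title></book>"
    b"</catalog>"
)

if __name__ == "__main__":
    print(collect_titles(SAMPLE))
```

Events for `catalog`, `book`, and `price` still reach the handler, but it simply returns without doing anything for them, which is what keeps the processor small.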
Unlike DOM-based processors, SAX parsers do not remember anything about the document they are processing: once an event has been passed to the SAX processor, it is forgotten. If the SAX processor needs to remember part of the document while it works, it is up to the SAX processor to hold that information in memory. This tends to make SAX processors very fast and capable of processing arbitrarily large documents. However, for some tasks, especially complex transformations, it may be more efficient to use a DOM-based process, where the entire document is automatically made available.
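To illustrate that the processor, not the parser, is responsible for remembering context, here is a minimal Python sketch using `xml.sax`. The element and attribute names (`order`, `item`, `customer`, `amount`) and the `OrderTotals` class are assumptions invented for the example. The handler keeps only the current customer and a running total per customer, so memory use does not grow with the number of items in the document.

```python
import io
import xml.sax


class OrderTotals(xml.sax.ContentHandler):
    """Hypothetical processor: sums item amounts per customer while streaming."""

    def __init__(self):
        super().__init__()
        self.totals = {}
        # Context the parser will not remember for us.
        self._customer = None

    def startElement(self, name, attrs):
        if name == "order":
            # Remember which order we are inside until its end tag arrives.
            self._customer = attrs.get("customer")
        elif name == "item" and self._customer is not None:
            amount = float(attrs.get("amount", "0"))
            self.totals[self._customer] = (
                self.totals.get(self._customer, 0.0) + amount
            )

    def endElement(self, name):
        if name == "order":
            # Leaving the order: drop the remembered context.
            self._customer = None


def total_by_customer(stream):
    """Read XML from a file-like object and return {customer: total}."""
    handler = OrderTotals()
    xml.sax.parse(stream, handler)
    return handler.totals


# Made-up sample data for illustration.
SAMPLE_ORDERS = (
    b"<orders>"
    b"<order customer='acme'><item amount='2.5'/><item amount='4'/></order>"
    b"<order customer='zeta'><item amount='1'/></order>"
    b"<order customer='acme'><item amount='0.5'/></order>"
    b"</orders>"
)

if __name__ == "__main__":
    print(total_by_customer(io.BytesIO(SAMPLE_ORDERS)))
```

Because `xml.sax.parse` accepts a file name or a file-like object, the same function could be pointed at a large file on disk without loading it into memory first.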
Because building a SAX-based application involves implementing a number of event handlers, rather than just interacting with objects as in DOM-based applications, it often takes more work to implement a SAX-based processor than to write a DOM-based program. That is, it is usually easier to get a DOM-based program up and running initially. However, the extra effort needed to implement SAX-based applications is often warranted when the task at hand requires the fastest possible processing, such as in high-volume transaction systems or when processing very large documents or sets of documents.
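The difference in initial effort can be seen by writing the same small task both ways. The sketch below uses Python's standard `xml.dom.minidom` and `xml.sax` modules. The log document, the `entry` element, its `level` attribute, and the function and class names are all hypothetical. The DOM version builds the whole tree and queries it, while the SAX version needs a handler class but never holds the document in memory.

```python
import xml.sax
from xml.dom import minidom

# Made-up sample document for illustration.
DOC = b"<log><entry level='error'/><entry level='info'/><entry level='error'/></log>"


def count_errors_dom(xml_bytes):
    """DOM approach: load the whole document, then query the object tree."""
    document = minidom.parseString(xml_bytes)
    return sum(
        1
        for entry in document.getElementsByTagName("entry")
        if entry.getAttribute("level") == "error"
    )


class ErrorCounter(xml.sax.ContentHandler):
    """SAX approach: an event handler that counts as start tags stream past."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "entry" and attrs.get("level") == "error":
            self.count += 1


def count_errors_sax(xml_bytes):
    """Run the SAX handler over xml_bytes and return the error count."""
    handler = ErrorCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.count


if __name__ == "__main__":
    print(count_errors_dom(DOC), count_errors_sax(DOC))
```

For a document this small the DOM version is the simpler choice. The SAX version starts to pay off when the input is too large to load comfortably as a tree.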