Parsing an XML Document

Using NSXMLParser to parse an XML document is the focus of this second installment in a series of posts that began with using NSURLRequest to retrieve a document from the web and continues with building an object tree representation of the document.

Apple makes parsing the text of an XML document rather trivial via Foundation's NSXMLParser class and NSXMLParserDelegate protocol. The two work hand-in-hand: the parser notifies its delegate when it finds the opening or closing tag of an XML element, among several other events.

In the previous post, I introduced the XMLElement class and the method retrieveXmlDataForURL(_:completionHandler:). That method retrieves an NSData representation of an XML document and stores it in an xmlData property, which serves as the input for the parsing operation discussed here.

The first thing I do is extend XMLElement to set up the NSXMLParser by adding this simple parseDocument() method (note, this adds to the XMLElement class defined in the previous article):

class XMLElement { // excerpt

  /**
   Parse the XML document previously assigned to property `xmlData`. 
   If `xmlData` is `nil`, this method takes no action.
   */
  func parseDocument() {
    guard let xmlData = xmlData else { return }
    let xmlParser = NSXMLParser(data: xmlData)
    xmlParser.delegate = self
    xmlParser.parse()
  }
}

That will get the machine going, but stopping there would be a case of letting a tree fall in the forest with no one to hear it. So I add some properties and an NSXMLParserDelegate method to capture the values NSXMLParser has found:

class XMLElement { // excerpt

  /// The element name, such as "BoundingBox". Every element has a name.
  var name: String!

  /// The element attributes, such as ["maxx": "180", "maxy": "90"]. 
  /// If `nil`, the element has no attributes.
  var attributes: [String: String]?

  /**
   Records the element name and attributes found by the `NSXMLParser`.
   */
  func parser(parser: NSXMLParser, didStartElement elementName: String,
    namespaceURI: String?, qualifiedName qName: String?, 
    attributes attributeDict: [String : String]) {

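    name = elementName
    // The parser passes an empty dictionary, never nil, for an element
    // without attributes; store nil so the property matches its documentation.
    attributes = attributeDict.isEmpty ? nil : attributeDict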
  }
}

Of the parameters given in that method, the ones of interest to me are elementName and attributeDict. In the XML snippet:

<Layer queryable="0" opaque="0" cascaded="1">
  <BoundingBox SRS="EPSG:4326" minx="-180.0" miny="-90.0" maxx="180.0" maxy="90.0"></BoundingBox>
  <Abstract>Observed River Stages</Abstract>
</Layer>

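the elementName on line 2 is “BoundingBox” and the attributeDict contains the key-value pairs ["SRS": "EPSG:4326", "minx": "-180.0", "miny": "-90.0", ...]; the values arrive as strings, not numbers. For the element on line 3, the name is “Abstract”, and the attribute dictionary is empty (attributeDict is not optional, so it is never nil).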
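
To see those values for yourself, here is a minimal sketch that logs every start tag the parser reports for the snippet above. It is written against the current Foundation names, XMLParser and XMLParserDelegate, which are the Swift 3 renamings of NSXMLParser and NSXMLParserDelegate. The StartTagLogger class is a hypothetical helper for illustration only; it is not part of XMLElement.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

/// Hypothetical helper (not from the article): records each start tag.
final class StartTagLogger: NSObject, XMLParserDelegate {
  var seen: [(name: String, attributes: [String: String])] = []

  func parser(_ parser: XMLParser, didStartElement elementName: String,
              namespaceURI: String?, qualifiedName qName: String?,
              attributes attributeDict: [String: String]) {
    seen.append((elementName, attributeDict))
  }
}

let xml = """
<Layer queryable="0" opaque="0" cascaded="1">
  <BoundingBox SRS="EPSG:4326" minx="-180.0" miny="-90.0" maxx="180.0" maxy="90.0"></BoundingBox>
  <Abstract>Observed River Stages</Abstract>
</Layer>
"""

let logger = StartTagLogger()
let xmlParser = XMLParser(data: Data(xml.utf8))
xmlParser.delegate = logger
xmlParser.parse()

// Print each element name with its attribute dictionary.
// Dictionary ordering is not guaranteed, so the key order may vary.
for tag in logger.seen {
  print(tag.name, tag.attributes)
}
```

Every attribute value comes back as a String, so a caller that wants minx as a number would have to convert it, for example with Double("-180.0").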

There are two components of this particular XML example that are not represented in the parameter list of that method. The first is the content of an element; i.e., the text between the opening and closing tags, such as “Observed River Stages” in the Abstract element. The second is the hierarchy of or relationship between the elements: there’s nothing in the delegate method that points to the parent element; nothing tells us BoundingBox is a child of Layer or a sibling of Abstract. I’ll address the first now, and leave the second for the next post in this series.

To pass along the content of an XML element, NSXMLParser calls its delegate’s parser(_:foundCharacters:) method. The API documentation notes that this method may be called multiple times as the parser makes its way through the content of a single element; it does not provide a single aggregated string representing the entirety of the content. You have to do that yourself.

So next I add to the XMLElement class the delegate method and supporting property to build an aggregated content string followed by the didEndElement delegate method to clean up the aggregated content:

class XMLElement { // excerpt

  /// The string content within an element pair, or `nil` if there is no content.
  var content: String?

  /**
   Accumulates an element's content character strings.
   */
  func parser(parser: NSXMLParser, foundCharacters string: String) {
    // appendContentsOf(_:) mutates in place and returns nothing,
    // so build the new value by concatenation instead.
    content = (content ?? "") + string
  }

  /**
   Ensures the content for an element pair contains non-whitespace characters,
   or is `nil` otherwise.
   */
  func parser(parser: NSXMLParser, didEndElement elementName: String, 
    namespaceURI: String?, qualifiedName qName: String?) {

    // Store fully composed content string, trimming whitespace
    content = content?.stringByTrimmingCharactersInSet(
      NSCharacterSet.whitespaceAndNewlineCharacterSet())
    if content == "" {
      content = nil
    }
  }
}

I found that if the XML was formatted with newlines, even when there was no actual content, those newlines were passed along to the delegate method. Trimming leading and trailing whitespace ensures we’re extracting legitimate content from the element.
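
As a rough, self-contained sketch of that accumulate-then-trim behavior, the following uses the current Foundation names (XMLParser, XMLParserDelegate) in place of their NS-prefixed Swift 2 spellings. ContentCollector and collectContent(from:) are hypothetical names introduced for this example, and chunkCount exists only to show how many times the parser delivered characters; how the text is split into chunks is up to the parser.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

/// Hypothetical stand-in for XMLElement, reduced to content handling.
final class ContentCollector: NSObject, XMLParserDelegate {
  var content: String?
  var chunkCount = 0  // number of foundCharacters calls received

  func parser(_ parser: XMLParser, foundCharacters string: String) {
    chunkCount += 1
    content = (content ?? "") + string
  }

  func parser(_ parser: XMLParser, didEndElement elementName: String,
              namespaceURI: String?, qualifiedName qName: String?) {
    // Trim surrounding whitespace; treat an empty result as "no content".
    let trimmed = content?.trimmingCharacters(in: .whitespacesAndNewlines)
    content = (trimmed?.isEmpty ?? true) ? nil : trimmed
  }
}

/// Hypothetical convenience: parse a string and return the collector.
func collectContent(from xml: String) -> ContentCollector {
  let collector = ContentCollector()
  let xmlParser = XMLParser(data: Data(xml.utf8))
  xmlParser.delegate = collector
  xmlParser.parse()
  return collector
}

let withText = collectContent(from: "<Abstract>\n  Observed River Stages\n</Abstract>")
let whitespaceOnly = collectContent(from: "<Abstract>\n  \n</Abstract>")

print(withText.content ?? "nil", "chunks:", withText.chunkCount)
print(whitespaceOnly.content ?? "nil", "chunks:", whitespaceOnly.chunkCount)
```

The first document ends up with just the text and no surrounding newlines, while the whitespace-only element ends up with nil content.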

Not directly related to capturing the values of an element, but important nonetheless, is the delegate method for capturing errors:

class XMLElement { // excerpt

  /// If an error occurs during parsing, this value describes that error.
  var parseError: NSError?

  /**
   Captures the error state in response to a fatal parsing error.
   */
  func parser(parser: NSXMLParser, parseErrorOccurred parseError: NSError) {
    self.parseError = parseError
  }
}

Once NSXMLParser encounters an error, it immediately terminates parsing and calls this method. This will leave our parseError in the proper state for determining whether a parse error occurred.
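
Here is a small sketch of how a caller might check for that error, again using the current Foundation names (XMLParser, XMLParserDelegate) rather than the Swift 2 spellings. The ErrorRecorder class and the deliberately malformed document are made up for illustration. The sketch also relies on parse() returning false when parsing fails.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

/// Hypothetical helper (not from the article): remembers the parse error.
final class ErrorRecorder: NSObject, XMLParserDelegate {
  var parseError: Error?

  func parser(_ parser: XMLParser, parseErrorOccurred parseError: Error) {
    self.parseError = parseError
  }
}

// Malformed on purpose: <Abstract> is never closed before </Layer>.
let malformed = "<Layer><Abstract>Observed River Stages</Layer>"

let recorder = ErrorRecorder()
let xmlParser = XMLParser(data: Data(malformed.utf8))
xmlParser.delegate = recorder
let succeeded = xmlParser.parse()

if !succeeded, let error = recorder.parseError {
  print("Parsing failed:", error)
}
```

Checking both the return value and the recorded error lets the caller tell apart "parsing never ran" from "parsing ran and failed".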

And that does it for now.

Note that with this current implementation, the NSXMLParser will call on the single XMLElement delegate assigned in the parseDocument() method as it works its way through the entire XML document. This means the name, attributes, and content properties will be overwritten with each new element. This is something I will address in a forthcoming article, which will focus on how to capture the hierarchical relationships between the elements.
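
The overwriting is easy to observe with a sketch like the one below, which uses the current Foundation names (XMLParser, XMLParserDelegate). SingleElement is a hypothetical stand-in for XMLElement, and its history array is an addition of mine that exists only to show every value name held along the way.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

/// Hypothetical stand-in for XMLElement acting as the lone delegate.
final class SingleElement: NSObject, XMLParserDelegate {
  var name: String?
  var history: [String] = []  // illustration only: each value `name` took

  func parser(_ parser: XMLParser, didStartElement elementName: String,
              namespaceURI: String?, qualifiedName qName: String?,
              attributes attributeDict: [String: String]) {
    name = elementName  // overwrites whatever the previous element stored
    history.append(elementName)
  }
}

let xml = "<Layer><BoundingBox minx=\"-180.0\"></BoundingBox>"
  + "<Abstract>Observed River Stages</Abstract></Layer>"

let element = SingleElement()
let xmlParser = XMLParser(data: Data(xml.utf8))
xmlParser.delegate = element
xmlParser.parse()

print(element.name ?? "nil")  // only the last element started survives
print(element.history)
```

After parsing, name holds only “Abstract”; the Layer and BoundingBox values are gone unless something else kept them.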
