Tag: XML

Duplicate XML Elements and Associated Enum Values

This is the third in a series of articles on parsing an online XML document. In the previous post, Parsing an XML Document, I focused on how to use NSXMLParser and NSXMLParserDelegate to assist in parsing, but left a bit of unfinished business: capturing the hierarchy of the document. This is something NSXMLParser and its delegate methods do not do. I was planning to address the entirety of a solution here, but a bit of other (but related) business intervened.

For now, I look at a few approaches to representing a collection of child elements within the XMLElement I have been building so far, and will further build upon the chosen approach in the next article to discuss actual construction of the XML tree object.

Common to all approaches is for each XMLElement to have a collection of child XMLElements. This makes sense because every child of an XML element is itself an XML element.

Unfortunately, things are complicated by the fact that element names are not necessarily unique within a given collection of child elements. Consider this excerpt, which expands on the XML document example introduced in the first article (that example did not show any keywords):

<KeywordList>
  <Keyword>WCS</Keyword>
  <Keyword>GeoTIFF</Keyword>
  <Keyword>AerialOK_SP_LL84_HalfFoot_5-7-2015</Keyword>
</KeywordList>

This is relevant because if we want to store child elements in a dictionary (presumably keyed by name) so that we can easily reference them, duplicate names get in the way: a dictionary holds only one value per key. We could instead store all child elements in an array, but then we lose the ability to reference them by name that a dictionary provides.

The first solution to addressing the duplicate element names is to store XMLElement child elements in an array within a dictionary, like so (this adds to the XMLElement class defined so far in this series):

class XMLElement { // excerpt

  /// Stores an array of one or more child elements for a given element name.
  var elements: [String: [XMLElement]] = [:]

  /**
   Adds an `element` with `name` to the collection of child `elements`.
   */
  func addElement(element: XMLElement, withName name: String) {

    // First element with this name, create elements array
    if elements[name] == nil {
      elements[name] = [XMLElement]()
    }

    // Add element to dictionary array
    elements[name]!.append(element)
  }

  /**
   Returns the first child element from `elements` with `name`.
   */
  func firstElementWithName(name: String) -> XMLElement? {
    return elements[name]?[0]
  }
}

I provide firstElementWithName(_:) on line 23 for illustration. To return all the elements with a particular name is simply a matter of directly querying elements. Determining which element in the array you need after that point is really up to how the XML is being utilized, and I don’t address that here.
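As a rough illustration of querying the dictionary directly, here is a minimal, self-contained sketch. It is written in current Swift syntax rather than the Swift 2 of the excerpts, and the `Element` class and its `content` initializer are hypothetical stand-ins for XMLElement, not part of the series' code.

```swift
/// Hypothetical stand-in for the article's XMLElement, reduced to what
/// the dictionary-of-arrays approach needs.
final class Element {
  let content: String?

  /// Child elements grouped by name; siblings with the same name share one array.
  private(set) var elements: [String: [Element]] = [:]

  init(content: String? = nil) {
    self.content = content
  }

  /// Appends `element` to the array stored under `name`, creating it if needed.
  func addElement(_ element: Element, withName name: String) {
    elements[name, default: []].append(element)
  }

  /// Returns the first child stored under `name`, or `nil` if there is none.
  func firstElement(withName name: String) -> Element? {
    return elements[name]?.first
  }
}

// Build a small KeywordList with two children that share a name.
let keywordList = Element()
for keyword in ["WCS", "GeoTIFF"] {
  keywordList.addElement(Element(content: keyword), withName: "Keyword")
}

// Querying the dictionary directly yields every element with that name.
let allKeywords = (keywordList.elements["Keyword"] ?? []).compactMap { $0.content }
let firstKeyword = keywordList.firstElement(withName: "Keyword")?.content

print(allKeywords)
print(firstKeyword ?? "none")
```

The sketch uses `first` rather than a `[0]` subscript so that an empty array would yield `nil` instead of trapping.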

This approach does have the benefit of simplicity. However, I do not like that those elements appearing only once are stored in an array…an array with only one element. So I sought a native Swift solution that might let me, in effect, return either a single XMLElement or an array of XMLElements.

In the excerpt below (which replaces the prior one), a data structure like the one on line 4 eliminates the explicit array: the Any value can be either an XMLElement or an array of XMLElements.

class XMLElement { // excerpt

  /// Dictionary of `XMLElement` or `XMLElement` arrays, keyed on element name.
  private(set) var elements: [String: Any] = [:]

  /** 
   Adds an element to an array of child `XMLElement`s.
   */
  func addElement(element: XMLElement, withName name: String) {

    // Add to a single element
    if let existingElement = elements[name] as? XMLElement {
      elements[name] = [existingElement, element]
    }

    // Add to an array of elements
    else if var elementsWithName = elements[name] as? [XMLElement] {
      elementsWithName.append(element)
      elements[name] = elementsWithName
    }

    // Add the first single
    else {
      elements[name] = element
    }
  }

  /**
   Returns the first of the named `XMLElement` array, 
   or `nil` if no such named elements exist.
   */
  func firstElementWithName(name: String) -> Any? {

    // A single element
    if let element = elements[name] as? XMLElement {
      return element
    }

    // An array of elements
    else if let elements = elements[name] as? [XMLElement] {
      return elements[0]
    }
    }

    // No element with this name
    return nil
  } 
}

This is considerably more complex than the former approach, and while it has the benefit of eliminating the array for names that have only a single XMLElement, it introduces some type ambiguity. The private setter on line 4 guards against mis-assignments to the elements dictionary, but the return value of firstElementWithName(_:) is not explicitly an XMLElement or [XMLElement]. This may be a good solution, but I sought something better.

The last approach I discuss came from exploring how I could represent or return an either/or value in Swift…without opening it up to Any. What came to mind was my prior reading on enumeration associated values. I had yet to have the opportunity to use them, and this application seemed to have potential.

class XMLElement { // excerpt

  /**
   Restricts the valid forms a child `XMLElement` can take: 
   a single element or a collection of them.
   */
  enum XMLElements {
    case Single(XMLElement)
    case Collection([XMLElement])
  }

  /// Dictionary of `XMLElements`, keyed on element name.
  var elements: [String: XMLElements] = [:]

  /**
   Adds an element to a dictionary of child `XMLElements`.
   */
  func addElement(element: XMLElement, withName name: String) {

    // Sibling elements with same name
    if let elementsForName = elements[name] {
      switch elementsForName {

      // Create collection
      case .Single(let existingElement):
        elements[name] = .Collection([existingElement, element])

      // Add to collection
      case .Collection(var existingElements):
        existingElements.append(element)
        elements[name] = .Collection(existingElements)
      }
    }

    // First element with this name
    else {
      elements[name] = .Single(element)
    }
  }

  /**
   Returns the first named child `XMLElement`, or `nil` if no such element exists.
   */
  func firstElementWithName(name: String) -> XMLElement? {

    if let elements = elements[name] {
      switch elements {

      case .Single(let element):
        return element

      case .Collection(let elements):
        return elements[0]
      }
    }

    // No elements with this name
    else {
      return nil
    }
  }
}

This is very similar to the preceding Any approach, in that it avoids storing an array of XMLElements for every element, but it has the benefit of strong typing and being a bit more self-documenting. While it does have the greatest complexity of the three approaches, we can know with certainty the elements property of an XMLElement will contain nothing but XMLElements, if anything at all, and the return value from firstElementWithName(_:) is unambiguous.
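To show how a caller might consume the enum without caring which case is stored, here is a minimal sketch in current Swift syntax (lowercase case names, unlike the Swift 2 excerpt above). The `Element` class, the `Children` enum name, and the `allElements(withName:)` helper are hypothetical and only illustrate the idea; they are not part of the series' code.

```swift
/// Hypothetical stand-in for the article's XMLElement using the enum approach.
final class Element {

  /// A child entry is either one element or a collection of same-named siblings.
  enum Children {
    case single(Element)
    case collection([Element])
  }

  let content: String?
  private(set) var elements: [String: Children] = [:]

  init(content: String? = nil) {
    self.content = content
  }

  /// Stores `element` under `name`, promoting a single entry to a collection
  /// when a second sibling with the same name arrives.
  func addElement(_ element: Element, withName name: String) {
    switch elements[name] {
    case .none:
      elements[name] = .single(element)
    case .some(.single(let existing)):
      elements[name] = .collection([existing, element])
    case .some(.collection(let existing)):
      elements[name] = .collection(existing + [element])
    }
  }

  /// Hypothetical helper: flattens either case into an array, so callers
  /// that want every sibling do not need to switch on the enum themselves.
  func allElements(withName name: String) -> [Element] {
    switch elements[name] {
    case .none:
      return []
    case .some(.single(let element)):
      return [element]
    case .some(.collection(let elements)):
      return elements
    }
  }
}

let layer = Element()
layer.addElement(Element(content: "Observed"), withName: "Title")
layer.addElement(Element(content: "WCS"), withName: "Keyword")
layer.addElement(Element(content: "GeoTIFF"), withName: "Keyword")

let titles = layer.allElements(withName: "Title").compactMap { $0.content }
let keywords = layer.allElements(withName: "Keyword").compactMap { $0.content }
print(titles, keywords)
```

Because the switch must cover every case, adding a new case to the enum later would surface as a compile error at each call site rather than as a silent failed cast, which is the main practical gain over the Any version.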

I will press ahead with that implementation in the fourth installment in this series where I use it to build a document tree object.

Parsing an XML Document

Using NSXMLParser to parse an XML document is the focus of this second installment in a series of posts that began with using NSURLSession to retrieve a document from the web and continues with building an object tree representation of the document.

Apple makes parsing the text of an XML document rather trivial via Foundation’s NSXMLParser class and NSXMLParserDelegate protocol. The two very much work hand-in-hand, as it is the delegate the parser notifies when it finds the opening and closing tag of an XML element, among several other events.

In the previous post, I introduced the XMLElement class and method retrieveXmlDataForURL(_:completionHandler:). This method retrieves into an xmlData property an NSData representation of an XML document. That xmlData property is used as input for the parsing operation I am now discussing here.

The first thing I do is extend XMLElement to set up the NSXMLParser by adding this simple parseDocument() method (note that this adds to the XMLElement class defined in the previous article):

class XMLElement { // excerpt

  /**
   Parse the XML document previously assigned to property `xmlData`. 
   If `xmlData` is `nil`, this method takes no action.
   */
  func parseDocument() {
    guard let xmlData = xmlData else { return }
    let xmlParser = NSXMLParser(data: xmlData)
    // Assigning self requires XMLElement to subclass NSObject
    // and adopt NSXMLParserDelegate.
    xmlParser.delegate = self
    xmlParser.parse()
  }
}

That will get the machine going, but stopping there would be a case of letting a tree fall in the forest with no one to hear it. So I add some properties and an NSXMLParserDelegate method to capture the values NSXMLParser has found:

class XMLElement { // excerpt

  /// The element name, such as "BoundingBox". Every element has a name.
  var name: String!

  /// The element attributes, such as ["maxx": "180", "maxy": "90"]. 
  /// If `nil`, the element has no attributes.
  var attributes: [String: String]?

  /**
   Records the element name and attributes found by the `NSXMLParser`.
   */
  func parser(parser: NSXMLParser, didStartElement elementName: String,
    namespaceURI: String?, qualifiedName qName: String?, 
    attributes attributeDict: [String : String]) {

    name = elementName
    attributes = attributeDict
  }
}

Of the parameters given in that method, the ones of interest to me are elementName and attributeDict. In the XML snippet:

<Layer queryable="0" opaque="0" cascaded="1">
  <BoundingBox SRS="EPSG:4326" minx="-180.0" miny="-90.0" maxx="180.0" maxy="90.0"/>
  <Abstract>Observed River Stages</Abstract>
</Layer>

the elementName on line 2 is “BoundingBox” and the attributeDict contains the key-value pairs ["SRS": "EPSG:4326", "minx": "-180.0", "miny": "-90.0", ...]. Note that the values arrive as strings, not numbers. For the element on line 3, the name is “Abstract”, and the attribute dictionary is empty.

There are two components of this particular XML example that are not represented in the parameter list of that method. The first is the content of an element; i.e., the text between the opening and closing tags, such as “Observed River Stages” in the Abstract element. The second is the hierarchy of or relationship between the elements: there’s nothing in the delegate method that points to the parent element; nothing tells us BoundingBox is a child of Layer or a sibling of Abstract. I’ll address the first now, and leave the second for the next post in this series.

In order to pass the content of an XML element, NSXMLParser calls its delegate’s parser(_:foundCharacters:) method. What the API documentation tells us about this method is it may be called multiple times as the parser makes its way through the content of a single element; it does not provide a single aggregated string representing the entirety of the content. You have to do that yourself.

So next I add to the XMLElement class the delegate method and supporting property to build an aggregated content string followed by the didEndElement delegate method to clean up the aggregated content:

class XMLElement { // excerpt

  /// The string content within an element pair, or `nil` if there is no content.
  var content: String?

  /**
   Accumulates an element's content character strings.
   */
  func parser(parser: NSXMLParser, foundCharacters string: String) {
    content = (content ?? "") + string
  }

  /**
   Ensures the content for an element pair contains non-whitespace characters,
   or is `nil` otherwise.
   */
  func parser(parser: NSXMLParser, didEndElement elementName: String, 
    namespaceURI: String?, qualifiedName qName: String?) {

    // Store fully composed content string, trimming whitespace
    content = content?.stringByTrimmingCharactersInSet(
      NSCharacterSet.whitespaceAndNewlineCharacterSet())
    if content == "" {
      content = nil
    }
  }
}

I found that if the XML was formatted with newlines, even when there was no actual content, those newlines were passed along to the delegate method. Trimming leading and trailing whitespace ensures we’re extracting legitimate content from the element.
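The accumulate-then-trim pattern can be tried in isolation with a minimal sketch like the one below. It uses current Swift naming (XMLParser and XMLParserDelegate rather than the NS-prefixed Swift 2 names in the excerpts), and the `ContentCollector` class and its `contents` dictionary are hypothetical, introduced only for this illustration. Unlike the XMLElement excerpt, it resets its buffer on every opening tag so each element's text is collected separately.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

/// Hypothetical delegate that records the trimmed text content of each
/// element, keyed by element name.
final class ContentCollector: NSObject, XMLParserDelegate {

  /// Text accumulated for the element currently being read.
  private var buffer = ""

  /// Trimmed content per element name; elements with only whitespace are omitted.
  private(set) var contents: [String: String] = [:]

  func parser(_ parser: XMLParser, didStartElement elementName: String,
              namespaceURI: String?, qualifiedName qName: String?,
              attributes attributeDict: [String: String]) {
    buffer = ""  // start fresh for each opening tag
  }

  func parser(_ parser: XMLParser, foundCharacters string: String) {
    buffer += string  // content may arrive in several pieces
  }

  func parser(_ parser: XMLParser, didEndElement elementName: String,
              namespaceURI: String?, qualifiedName qName: String?) {
    let trimmed = buffer.trimmingCharacters(in: .whitespacesAndNewlines)
    if !trimmed.isEmpty {
      contents[elementName] = trimmed
    }
    buffer = ""
  }
}

// XML formatted with newlines and indentation around the actual content.
let xml = """
<Layer>
  <Abstract>
    Observed River Stages
  </Abstract>
</Layer>
"""

let collector = ContentCollector()
let contentParser = XMLParser(data: Data(xml.utf8))
contentParser.delegate = collector
let succeeded = contentParser.parse()

print(succeeded, collector.contents)
```

Here the Layer element contributes nothing to `contents`, because the only characters left in the buffer when it closes are formatting whitespace.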

Not directly related to capturing the values of an element, but important nonetheless, is the delegate method for capturing errors:

class XMLElement { // excerpt

  /// If an error occurs during parsing, this value describes that error.
  var parseError: NSError?

  /**
   Captures the error state in response to a fatal parsing error.
   */
  func parser(parser: NSXMLParser, parseErrorOccurred parseError: NSError) {
    self.parseError = parseError
  }
}

Once NSXMLParser encounters an error, it immediately terminates parsing and calls this method. This will leave our parseError in the proper state for determining whether a parse error occurred.
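For a quick check of that behavior, the following sketch feeds the parser a deliberately malformed document. It is written in current Swift syntax (XMLParser, with an `Error` parameter instead of NSError), and the `ErrorRecorder` class is a hypothetical delegate used only for this illustration.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

/// Hypothetical delegate that only remembers the parse error, if any.
final class ErrorRecorder: NSObject, XMLParserDelegate {
  private(set) var parseError: Error?

  func parser(_ parser: XMLParser, parseErrorOccurred parseError: Error) {
    self.parseError = parseError
  }
}

// Malformed on purpose: <Abstract> is never closed before </Layer>.
let malformed = "<Layer><Abstract>Observed River Stages</Layer>"

let recorder = ErrorRecorder()
let failingParser = XMLParser(data: Data(malformed.utf8))
failingParser.delegate = recorder
let parseSucceeded = failingParser.parse()

if let error = recorder.parseError {
  print("Parsing failed: \(error)")
}
```

Checking the Boolean returned by `parse()` alongside the recorded error gives the caller two independent signals that the document could not be read.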

And that does it for now.

Note that with this current implementation, the NSXMLParser will call on the single XMLElement delegate assigned in the parseDocument() method as it works its way through the entire XML document. This means the name, attributes, and content properties will be overwritten with each new element. This is something I will address in a forthcoming article, which will focus on how to capture the hierarchical relationships between the elements.

Retrieving an Online XML Document

Employing NSURLSession to retrieve an online XML document was the focus of some of my recent work that led into a larger task of parsing such a document into its various elements and extracting portions of interest. Thankfully, Apple’s Foundation framework took care of most of the overhead associated with both the retrieval and parsing. In this first of a short series of posts discussing my approach to this task, I begin by looking at document retrieval.

The XML document I’ll be retrieving for this example is an OpenGIS® Web Map Service Interface Standard (WMS) Capabilities Document, which represents a catalog of the weather products a geoserver provides. Here is a snippet from such a document, courtesy of the great state of Oklahoma:

<Layer queryable="0" opaque="0" cascaded="1">
  <Name>ogi:0</Name>
  <Title>Observed</Title>
  <Abstract>Observed River Stages</Abstract>
  <KeywordList/>
  <SRS>EPSG:4326</SRS>
  <LatLonBoundingBox minx="-180.0" miny="-90.0" maxx="180.0" maxy="90.0"/>
  <BoundingBox SRS="EPSG:4326" minx="-180.0" miny="-90.0" maxx="180.0" maxy="90.0"/>
  <ScaleHint min="0.0" max="250000.0"/>
</Layer>

Below I employ NSURLSession in a rather bare-bones implementation that doesn’t concern itself with response errors or the validity of the document itself, but only requesting the document.

For now, the XMLElement class name isn’t terribly relevant to its functionality, but it will become more so in subsequent posts as I expand it to represent a single XML document element, such as <Name> or <Title> from the above example, and, in fact, an entire tree of XMLElements. More on that later. For now it only retrieves the document.

import Foundation

/**
 Provides the mechanism for retrieving an online XML document.
 */
class XMLElement {

  /**
   Attempts to retrieve an XML data object from the given `url`, calling the `completionHandler` upon success or failure.
   */
  func retrieveXmlDataForURL(url: NSURL, completionHandler: 
    (data: NSData?, response: NSURLResponse?, error: NSError?) -> Void) {
    let session = NSURLSession(configuration: NSURLSessionConfiguration.defaultSessionConfiguration())
    let task = session.dataTaskWithURL(url, completionHandler: completionHandler)
    task.resume()
  }
}

As you can see, it’s fairly straightforward, with little to comment on. So let’s put it to work with the following script:

import Foundation

/// The data response.
var xmlData: NSData?

/// True once response to request is received.
var responseReceived = false

// Request the XML document
let xmlURLString = "http://ogi.state.ok.us/geoserver/wms?VERSION=1.1.1&REQUEST=GetCapabilities&SERVICE=WMS"
if let xmlURL = NSURL(string: xmlURLString) {
  let xmlElement = XMLElement()
  xmlElement.retrieveXmlDataForURL(xmlURL) {
    data, response, error in
    xmlData = data
    responseReceived = true
  }
}

// Block until response is returned
while !responseReceived {}

// Output the response
let stringData = NSString(data: xmlData!, encoding: NSUTF8StringEncoding) ?? "unable to decode"
print("Response: \(stringData)")

On line 13, I call upon the XMLElement to handle the data request, giving it a completion handler to store the response.

In the context of this test script, I use the loop on line 21 to prevent it from running to conclusion before the asynchronous request has a chance to do its thing.
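One alternative to an empty loop, which keeps a CPU core busy while it waits, is to block on a semaphore that the completion handler signals. The sketch below is in current Swift syntax and makes no network request: the `fetch(completion:)` function and the `ResponseBox` class are hypothetical stand-ins for the asynchronous request and the stored response.

```swift
import Dispatch

/// Hypothetical holder for the response. Marked unchecked because the
/// semaphore below orders the single write before the single read.
final class ResponseBox: @unchecked Sendable {
  var text: String?
}

/// Hypothetical stand-in for an asynchronous request: it calls `completion`
/// later, on a background queue, much like a session data task would.
func fetch(completion: @escaping @Sendable (String) -> Void) {
  DispatchQueue.global().async {
    completion("<Service/>")
  }
}

let box = ResponseBox()
let semaphore = DispatchSemaphore(value: 0)

fetch { text in
  box.text = text
  semaphore.signal()  // wake the waiting thread
}

// Sleeps until the completion handler signals, instead of spinning.
semaphore.wait()

print("Response: \(box.text ?? "none")")
```

This only suits scripts and command-line tools; blocking the main thread this way in an app would freeze its interface.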

If all goes well, the output should be the content of a document that begins something like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE WMT_MS_Capabilities SYSTEM "http://204.62.18.179:8080/geoserver/schemas/wms/1.1.1/WMS_MS_Capabilities.dtd">
<WMT_MS_Capabilities version="1.1.1" updateSequence="11284">
  <Service>
    <Name>OGC:WMS</Name>
    <Title>GeoServer Web Map Service</Title>

And that does it! In the next installment, I’ll look at an approach to parsing that returned document using another Foundation class, NSXMLParser.
