Monday 8 April 2013

Groovy Xml Series: Manipulating Xml

The Xml

In this entry I would like to review the different ways of adding, modifying and removing nodes using XmlSlurper or XmlParser. The XML we are going to handle is the following:

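      def xml = """
          <response>
              <value>
                  <books>
                      <book>
                          <title>Don Xijote</title>
                          <author>Manuel De Cervantes</author>
                      </book>
                  </books>
              </value>
          </response>
      """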
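Before changing anything, it can help to see how both parsers navigate this document. The following is a minimal sketch, assuming Groovy 3 or later (where XmlParser and XmlSlurper are imported from groovy.xml) and using a plain script with assert instead of a Spock specification. The xml string is the same document written on one line.

```groovy
import groovy.xml.XmlParser
import groovy.xml.XmlSlurper

// Same structure as the document above, written on one line
def xml = '<response><value><books><book><title>Don Xijote</title>' +
          '<author>Manuel De Cervantes</author></book></books></value></response>'

// XmlParser returns a Node representing the root element <response>
def parsed = new XmlParser().parseText(xml)
// XmlSlurper returns a GPathResult representing the root element <response>
def slurped = new XmlSlurper().parseText(xml)

// Both results already stand for <response>, so paths start at its children
assert parsed.name() == 'response'
assert slurped.name() == 'response'
assert parsed.value.books.book[0].title.text() == 'Don Xijote'
assert slurped.value.books.book[0].title.text() == 'Don Xijote'
```

This is why the tests in this entry write paths such as `response.value.books.book[0]`: the variable holding the parse result is already the `<response>` element.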


Adding nodes

The main difference between XmlSlurper and XmlParser is that when the former creates nodes, they won't be available until the document has been evaluated again, so you have to parse the transformed document again in order to see the new nodes. Keep that in mind when choosing between the two approaches.

If you need to see a node right after creating it, XmlParser should be your choice; but if you're planning to make many changes to the XML and send the result to another process, XmlSlurper may be more efficient.

You can't create a new node directly using the XmlSlurper instance, but you can with XmlParser, through its createNode(..) method:

      // QName must be imported: groovy.xml.QName (Groovy 2.x) or groovy.namespace.QName (Groovy 3+)
      def "Adding a new tag to a node"(){
          setup: "Building an instance of XmlParser"
              def parser = new XmlParser()
          and: "Parsing the xml"
              def response = parser.parseText(xml)
          when: "Adding a tag to response"
              def numberOfResults = parser.createNode(
                  response,
                  new QName("numberOfResults"),
                  [:]
              )   
          and: "Setting the node's value"
              numberOfResults.value = "1"
          then: "We should be able to find the new node"
              response.numberOfResults.text() == "1"
      }

The createNode() method receives the following parameters:
  • The parent node (can be null)
  • The qualified name of the tag (in this case we only use the local part, without any namespace)
  • A map with the tag's attributes (none in this particular case)
Anyway, you won't normally create a node from the parser instance but from the parsed XML instance, that is, from a Node or a GPathResult instance.
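To illustrate the parent parameter, here is a sketch (same assumptions: Groovy 3+, plain script) that calls createNode() with a null parent, so the node starts detached, and then attaches it with Node.append(). The `source` attribute is a hypothetical value chosen only to show a non-empty attribute map; it is not part of the original document.

```groovy
import groovy.xml.XmlParser

def xml = '<response><value><books><book><title>Don Xijote</title>' +
          '<author>Manuel De Cervantes</author></book></books></value></response>'

def parser = new XmlParser()
def response = parser.parseText(xml)

// parent == null: the node is created detached from any tree
// 'source' is a hypothetical attribute used only for illustration
def detached = parser.createNode(null, 'numberOfResults', [source: 'catalog'])
detached.value = '1'
assert detached.parent() == null
assert response.numberOfResults.size() == 0

// Attach it explicitly as the last child of <response>
response.append(detached)
assert detached.parent().is(response)
assert response.numberOfResults.text() == '1'
assert response.numberOfResults[0].@source == 'catalog'
```

Passing the parent directly, as in the test above, does both steps at once; a null parent is mainly useful when you want to build a node first and decide later where it goes.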

Take a look at the next example. We parse the XML with XmlParser and then create a new node from the parsed document's instance (notice that this method receives its parameters in a slightly different way):

      // QName must be imported: groovy.xml.QName (Groovy 2.x) or groovy.namespace.QName (Groovy 3+)
      def "Adding a new tag to a node with the node instance"(){
          setup: "Building an instance of XmlParser"
              def parser = new XmlParser()
          and: "Parsing the xml"
              def response = parser.parseText(xml)
          when: "Appending the tag to the current node"
              response.appendNode(
                  new QName("numberOfResults"),
                  [:],
                  "1"
              )
          then: "We should be able to find it"    
              response.numberOfResults.text() == "1"
      }

When using XmlSlurper, GPathResult instances don't have a createNode() method.
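GPathResult does offer appendNode(), which takes a closure written in the MarkupBuilder style. The following sketch, assuming Groovy 3+ and a plain script, shows that the appended node only becomes visible after the document is serialized with StreamingMarkupBuilder and parsed again:

```groovy
import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlSlurper

def xml = '<response><value><books><book><title>Don Xijote</title>' +
          '<author>Manuel De Cervantes</author></book></books></value></response>'

def response = new XmlSlurper().parseText(xml)

// The closure is only recorded here; it is evaluated when the document is built
response.appendNode {
    numberOfResults('1')
}
// Not visible yet on the original GPathResult
assert response.numberOfResults.size() == 0

// Serialize the document and parse it again to see the new node
def result = new StreamingMarkupBuilder().bind { mkp.yield response }.toString()
def changed = new XmlSlurper().parseText(result)
assert changed.numberOfResults.text() == '1'
```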

Modifying / Removing nodes

We know how to parse the document and add new nodes; now I want to change a given node's content. Let's start with XmlParser and Node. This example replaces the first book's information with a different book.

     
     def "Replacing a node"(){
          setup: "Building the parser and parsing the xml"
              def response = new XmlParser().parseText(xml)
          when: "Replacing the book 'Don Xijote' with 'To Kill a Mockingbird'"
           /* Use the same syntax as groovy.xml.MarkupBuilder */
              response.value.books.book[0].replaceNode{
                  book(id:"3"){
                      title("To Kill a Mockingbird")  
                      author(id:"3","Harper Lee")     
                  }
              }
          and: "Looking for the new node"
              def newNode = response.value.books.book[0]
          then: "Checking the result"
              newNode.name() == "book"        
              newNode.@id == "3"
              newNode.title.text() == "To Kill a Mockingbird"
              newNode.author.text() == "Harper Lee"
           /* newNode.author is a NodeList (all <author> children), so .@id returns
              a list with their id values; first() picks the only one */
              newNode.author.@id.first() == "3"
      }

When using replaceNode(), the closure we pass as a parameter should follow the same rules as if we were using groovy.xml.MarkupBuilder (see the Resources section for more information):
      tagName(attribute: attributeValue){
          nestedTag("stringContent")
          // etc.
      }
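To see what such a closure produces, the same body can be fed directly to groovy.xml.MarkupBuilder. A minimal sketch, assuming Groovy 3+ and writing to a StringWriter:

```groovy
import groovy.xml.MarkupBuilder

def writer = new StringWriter()
def builder = new MarkupBuilder(writer)

// Same closure body as in the replaceNode() examples
builder.book(id: '3') {
    title('To Kill a Mockingbird')     // nested tag with text content
    author(id: '3', 'Harper Lee')      // nested tag with an attribute and text content
}

def markup = writer.toString()
println markup
```

Each method call inside the closure becomes an element: a map argument becomes its attributes, a string argument its text, and a nested closure its children.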

Here is the same example with XmlSlurper:

      // StreamingMarkupBuilder must be imported: import groovy.xml.StreamingMarkupBuilder
      def "Replacing a node"(){
          setup: "Parsing the document"
              def response = new XmlSlurper().parseText(xml) 
          when: "Replacing the book 'Don Xijote' with 'To Kill a Mockingbird'"
           /* Use the same syntax as groovy.xml.MarkupBuilder */
              response.value.books.book[0].replaceNode{
                  book(id:"3"){
                      title("To Kill a Mockingbird")  
                      author(id:"3","Harper Lee")     
                  }          
              }              
          and: "Asserting the laziness"
              assert response.value.books.book[0].title.text() == "Don Xijote"
          and: "Rebuild the document"
           /* mkp is a special namespace used to escape from the normal building mode
              of the builder and get access to helper markup methods 
              'yield', 'pi', 'comment', 'out', 'namespaces', 'xmlDeclaration' and 
              'yieldUnescaped' */
              def result = new StreamingMarkupBuilder().bind{mkp.yield response}.toString()
              def changedResponse = new XmlSlurper().parseText(result)
          then: "Looking for the new node"
              assert changedResponse.value.books.book[0].title.text() == "To Kill a Mockingbird"
      }

Notice how, with XmlSlurper, we have to serialize and parse the transformed document again in order to find the created nodes. In this particular example that can be a little annoying, can't it?
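The section title also mentions removing nodes. As a sketch under the same assumptions (Groovy 3+, plain script): with XmlParser a node can be detached from its parent with Node.remove(), while with XmlSlurper a common idiom is replaceNode() with an empty closure, which, like any other XmlSlurper change, only shows up after re-parsing.

```groovy
import groovy.xml.StreamingMarkupBuilder
import groovy.xml.XmlParser
import groovy.xml.XmlSlurper

def xml = '<response><value><books><book><title>Don Xijote</title>' +
          '<author>Manuel De Cervantes</author></book></books></value></response>'

// XmlParser: remove the node from its parent's children; visible immediately
def parsed = new XmlParser().parseText(xml)
def book = parsed.value.books.book[0]
book.parent().remove(book)
assert parsed.value.books.book.size() == 0

// XmlSlurper: an empty replacement closure produces no markup for the node
def slurped = new XmlSlurper().parseText(xml)
slurped.value.books.book[0].replaceNode {}
def result = new StreamingMarkupBuilder().bind { mkp.yield slurped }.toString()
assert new XmlSlurper().parseText(result).value.books.book.size() == 0
```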

Finally, both parsers use the same syntax for adding a new attribute to a given node. You might expect the same difference as before regarding whether the change is available right away. First, XmlParser:

     def "Adding a new attribute to a node"(){
          setup: "Building an instance of XmlParser"
              def parser = new XmlParser()    
          and: "Parsing the xml"
              def response = parser.parseText(xml)
          when: "Adding an attribute to response"
              response.@numberOfResults = "1" 
          then: "We should be able to see the new attribute"
              response.@numberOfResults == "1"
      }
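Since XmlParser keeps a node's attributes in a Map, the same syntax also overwrites an existing attribute, and the map returned by attributes() can be used to remove one. A minimal sketch, assuming Groovy 3+ and a plain script:

```groovy
import groovy.xml.XmlParser

def xml = '<response><value><books><book><title>Don Xijote</title>' +
          '<author>Manuel De Cervantes</author></book></books></value></response>'

def response = new XmlParser().parseText(xml)

response.@numberOfResults = '1'
// Assigning again overwrites the existing value
response.@numberOfResults = '2'
assert response.@numberOfResults == '2'

// attributes() returns the node's own Map, so removing an entry removes the attribute
response.attributes().remove('numberOfResults')
assert response.@numberOfResults == null
```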

And XmlSlurper:

      def "Adding a new attribute to a node"(){
          setup: "Parsing the document"
              def response = new XmlSlurper().parseText(xml)
          when: "adding a new attribute to response"
              response.@numberOfResults = "2"
          then: "In attributes the node is accessible right away"
              response.@numberOfResults == "2"
      }

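But hold on a second! The XmlSlurper example didn't need to evaluate the transformation again, did it? You're right: unlike new nodes, an attribute set on a GPathResult is stored directly in the underlying node's attribute map, so with attributes no new evaluation is necessary with either parser.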
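A sketch (assuming Groovy 3+ and a plain script) confirming that the slurped attribute is readable immediately and is also written out when the document is serialized with groovy.xml.XmlUtil and parsed again:

```groovy
import groovy.xml.XmlSlurper
import groovy.xml.XmlUtil

def xml = '<response><value><books><book><title>Don Xijote</title>' +
          '<author>Manuel De Cervantes</author></book></books></value></response>'

def response = new XmlSlurper().parseText(xml)
response.@numberOfResults = '2'

// Readable straight away, no re-parse needed
assert response.@numberOfResults.text() == '2'

// The attribute is also part of the serialized document
def reparsed = new XmlSlurper().parseText(XmlUtil.serialize(response))
assert reparsed.@numberOfResults.text() == '2'
```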

Resources



