Question: Complete a program that is designed to read an XML file, follow an XPath to a selected node, and then list all of the text

Complete a program that reads an XML file, follows an XPath to a selected node, and then lists all of the text content (ignoring attributes) in the subtree rooted at that node, in the order in which the text appeared in the original XML file.

You will be provided with the bulk of the code for this program, including the input processing to read XML and convert it into a tree structure (declared in node.h).

Your task is to supply the functions declared in extraction.h:

  1. A function to walk the tree starting from its root, following an XPath to a desired node.

  2. A function to extract all of the text in the leaves of a tree, combining it into a single string with one or more blanks separating the text strings from different nodes.

Your bodies for these functions should be written in extraction.cpp.

To run the application program, supply two command line parameters. The first will designate an XML file and the second will be the XPath to the desired node.

Example 1

./xmlextract test0.html /html/body 

will print

Hello world! 

Example 2

./xmlextract test1.html /html/body/p[2]

will print

world! 

Example 3

./xmlextract books1.xml /rdf:RDF/pgterms:etext/dc:creator 

will print

Twain, Mark, 1835-1910 

Example 4

./xmlextract books1.xml /rdf:RDF/pgterms:etext[3] 

will print

&pg; A History of the Early Part of the Reign of James the Second Fox, Charles James, 1749-1806 Morley, Henry, 1822-1894 [Editor] A History of the Early Part of the Reign of James en Great Britain -- History -- James II, 1685-1688 DA 2003-07-01 17

// extraction.cpp

#include "extraction.h"
#include <string>

using namespace std;

/**
 * Examine an xpath step of the form "/tagName[k]" and pull out the tagname
 * and index. The index part may be omitted, in which case it is assumed to
 * be 1.
 *
 * @param xpathStep the string containing one step in an xpath.
 * @param tagName the tag name that must be matched in the step (output)
 * @param index the index of the desired child with that tagName (output)
 */
void interpretXPathStep(string xpathStep, string &tagName, unsigned &index)
{
    index = 1;
    tagName = xpathStep;
    if (tagName.size() > 0 && tagName[0] == '/')
        tagName = tagName.substr(1); // discard the '/'
    // Search tagName rather than xpathStep so that the offsets stay valid
    // after the leading '/' has been discarded.
    string::size_type indexStart = tagName.find('[');
    if (indexStart != string::npos)
    {
        string::size_type indexStop = tagName.find(']');
        index = stoi(tagName.substr(indexStart + 1, indexStop - indexStart - 1));
        tagName = tagName.substr(0, indexStart);
    }
}
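interpretXPathStep handles one step at a time, so a full path such as /html/body/p[2] has to be broken into steps before it can be used. The following is a minimal sketch of one way to do that. The helper name splitXPathSteps and the demo function are hypothetical and are not part of the provided code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of the provided code): split an xpath such as
// "/html/body/p[2]" into the steps "/html", "/body", "/p[2]". Each step keeps
// its leading '/', matching the "/tagName[k]" form described above.
std::vector<std::string> splitXPathSteps(const std::string& xpath) {
    std::vector<std::string> steps;
    std::string::size_type start = 0;
    while (start < xpath.size()) {
        // A step ends just before the next '/', or at the end of the string.
        std::string::size_type next = xpath.find('/', start + 1);
        if (next == std::string::npos)
            next = xpath.size();
        steps.push_back(xpath.substr(start, next - start));
        start = next;
    }
    return steps;
}

// Usage example: the path from Example 2 splits into three steps.
void demoSplit() {
    std::vector<std::string> steps = splitXPathSteps("/html/body/p[2]");
    assert(steps.size() == 3);
    assert(steps[0] == "/html");
    assert(steps[1] == "/body");
    assert(steps[2] == "/p[2]");
}
```

Each element of the returned vector can then be passed to a step parser such as interpretXPathStep.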

/**
 * Find a node in an XML tree using a subset of XPATH:
 *    /tag1[k1]/tag2[k2]/.../tagn[kn]
 * Each tagi is an XML tag name. The [ki] give an integer index indicating
 * which child with the given tag name should be selected. The "[ki]" portion
 * may be omitted when ki==1.
 *
 * @param root the root of the tree from which the selection should be made
 * @param xpath the path to follow in selecting the desired node.
 * @return the desired node from within the tree, or nullptr if no node matching
 *    the given path can be found.
 */
Node *selectByPath(Node *root, std::string xpath)
{
    //*** To be implemented
    return nullptr;
}
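The path-following logic described in the comment above can be sketched as follows. Because node.h is not shown in the question, this sketch uses a hypothetical stand-in type, SketchNode, with assumed fields isElement, label, and children. The real Node may look quite different. It also assumes that the first step of the path names the root itself, and that the index counts only children whose tag name matches. The names parseStep, selectByPathSketch, and demoSelect are likewise hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the Node type declared in node.h (not shown).
struct SketchNode {
    bool isElement;                     // true: element, label holds the tag name
    std::string label;                  // false: text leaf, label holds the text
    std::vector<SketchNode*> children;  // children in document order
};

// Parse one step "/tag[k]" into tag and index (index defaults to 1).
// A malformed index such as "[x]" makes std::stoul throw.
static void parseStep(const std::string& step, std::string& tag, unsigned& index) {
    tag = (!step.empty() && step[0] == '/') ? step.substr(1) : step;
    index = 1;
    std::string::size_type open = tag.find('[');
    if (open != std::string::npos) {
        // std::stoul stops converting at the closing ']'.
        index = static_cast<unsigned>(std::stoul(tag.substr(open + 1)));
        tag = tag.substr(0, open);
    }
}

// Follow xpath from root; return nullptr as soon as a step cannot be matched.
SketchNode* selectByPathSketch(SketchNode* root, const std::string& xpath) {
    if (root == nullptr || xpath.empty())
        return nullptr;
    SketchNode* current = nullptr;
    std::string::size_type start = 0;
    while (start < xpath.size()) {
        std::string::size_type next = xpath.find('/', start + 1);
        if (next == std::string::npos)
            next = xpath.size();
        std::string tag;
        unsigned index = 1;
        parseStep(xpath.substr(start, next - start), tag, index);
        if (current == nullptr) {
            // Assumption: the first step must name the root element itself.
            if (!root->isElement || root->label != tag || index != 1)
                return nullptr;
            current = root;
        } else {
            // Pick the index-th child (1-based) among those with this tag name.
            SketchNode* match = nullptr;
            unsigned seen = 0;
            for (SketchNode* child : current->children) {
                if (child->isElement && child->label == tag && ++seen == index) {
                    match = child;
                    break;
                }
            }
            if (match == nullptr)
                return nullptr;
            current = match;
        }
        start = next;
    }
    return current;
}

// Usage example on a small hand-built tree: html > body > two p elements.
void demoSelect() {
    SketchNode hello{false, "Hello", {}};
    SketchNode world{false, "world!", {}};
    SketchNode p1{true, "p", {&hello}};
    SketchNode p2{true, "p", {&world}};
    SketchNode body{true, "body", {&p1, &p2}};
    SketchNode html{true, "html", {&body}};
    assert(selectByPathSketch(&html, "/html/body") == &body);
    assert(selectByPathSketch(&html, "/html/body/p[2]") == &p2);
    assert(selectByPathSketch(&html, "/html/head") == nullptr);
}
```

An index of 0, or one larger than the number of matching children, never matches, so the sketch returns nullptr in those cases.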

/**
 * Given an XML (sub)tree, extract and concatenate the text leaves from
 * that tree in the order they would be encountered in an XML listing,
 * separating text from different leaf nodes by one or more blanks.
 *
 * @param tree the root of the tree from which the text is to be extracted.
 */
std::string extractText(const Node *tree)
{
    //*** To be implemented
    return "";
}
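Visiting the text leaves "in the order they would be encountered in an XML listing" amounts to a pre-order, left-to-right traversal. A minimal sketch of that traversal follows. It again uses a hypothetical SketchNode type with assumed fields, since the real Node from node.h is not shown, and the names collectText, extractTextSketch, and demoExtract are illustrative only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the Node type declared in node.h (not shown).
struct SketchNode {
    bool isElement;                     // true: element, label holds the tag name
    std::string label;                  // false: text leaf, label holds the text
    std::vector<SketchNode*> children;  // children in document order
};

// Append the text of every text leaf under node to out, visiting children
// left to right so the text comes out in document order.
static void collectText(const SketchNode* node, std::string& out) {
    if (node == nullptr)
        return;
    if (!node->isElement) {
        if (!out.empty())
            out += ' ';  // one blank between text from different leaves
        out += node->label;
        return;
    }
    for (const SketchNode* child : node->children)
        collectText(child, out);
}

// Return the concatenated text of the subtree rooted at tree.
std::string extractTextSketch(const SketchNode* tree) {
    std::string result;
    collectText(tree, result);
    return result;
}

// Usage example on a small hand-built tree: body > two p elements.
void demoExtract() {
    SketchNode hello{false, "Hello", {}};
    SketchNode world{false, "world!", {}};
    SketchNode p1{true, "p", {&hello}};
    SketchNode p2{true, "p", {&world}};
    SketchNode body{true, "body", {&p1, &p2}};
    assert(extractTextSketch(&body) == "Hello world!");
    assert(extractTextSketch(&p2) == "world!");
}
```

Passing the accumulator by reference avoids building and copying a temporary string at every level of the tree. This sketch adds no trailing blank, which the "one or more blanks separating" wording permits.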

Please submit only the new extraction.cpp file.

// extraction.h

#ifndef EXTRACTION_H
#define EXTRACTION_H

#include <string>
#include "node.h"

/**
 * Find a node in an XML tree using a subset of XPATH:
 *    /tag1[k1]/tag2[k2]/.../tagn[kn]
 * Each tagi is an XML tag name. The [ki] give an integer index indicating
 * which child with the given tag name should be selected. The "[ki]" portion
 * may be omitted when ki==1.
 *
 * @param root the root of the tree from which the selection should be made
 * @param xpath the path to follow in selecting the desired node.
 * @return the desired node from within the tree, or nullptr if no node matching
 *    the given path can be found.
 */
Node* selectByPath (Node* root, std::string xpath);

/**
 * Given an XML (sub)tree, extract and concatenate the text leaves from
 * that tree in the order they would be encountered in an XML listing,
 * separating text from different leaf nodes by one or more blanks.
 *
 * @param tree the root of the tree from which the text is to be extracted.
 */
std::string extractText(const Node* tree);

#endif
