Question: Complete a program that reads an XML file, follows an XPath to a selected node, and then lists all of the text content (ignoring attributes) in the subtree rooted at the selected node, in the order in which the text appeared in the original XML file.
You will be provided with the bulk of the code for this program, including the input processing to read XML and convert it into a tree structure (declared in node.h).
Your task is to supply the functions declared in extraction.h:
- A function to walk the tree starting from its root, following an XPath to a desired node.
- A function to extract all of the text (in the leaves of the tree) in the tree, combining it into a single string with one or more blanks separating the text strings from different nodes.
Your bodies for these functions should be written in extraction.cpp.
To run the application program, supply two command line parameters. The first will designate an XML file and the second will be the XPath to the desired node.
Example 1
./xmlextract test0.html /html/body
will print
Hello world!
Example 2
./xmlextract test1.html /html/body/p[2]
will print
world!
Example 3
./xmlextract books1.xml /rdf:RDF/pgterms:etext/dc:creator
will print
Twain, Mark, 1835-1910
Example 4
./xmlextract books1.xml /rdf:RDF/pgterms:etext[3]
will print
&pg; A History of the Early Part of the Reign of James the Second Fox, Charles James, 1749-1806 Morley, Henry, 1822-1894 [Editor] A History of the Early Part of the Reign of James en Great Britain -- History -- James II, 1685-1688 DA 2003-07-01 17
// extraction.cpp
#include "extraction.h"
#include <string>
using namespace std;
/**
 * Examine an xpath step of the form "/tagName[k]" and pull out the tagname
 * and index. The index part may be omitted, in which case it is assumed to
 * be 1.
 *
 * @param xpathStep the string containing one step in an xpath.
 * @param tagName the tag name that must be matched in the step (output)
 * @param index the index of the desired child with that tagName (output)
 */
void interpretXPathStep(string xpathStep, string &tagName, unsigned &index)
{
    index = 1;
    tagName = xpathStep;
    if (tagName.size() > 0 && tagName[0] == '/')
        tagName = tagName.substr(1); // discard the '/'
    // Search within tagName (not xpathStep) so that the positions stay
    // valid after the leading '/' has been discarded.
    string::size_type indexStart = tagName.find('[');
    if (indexStart != string::npos)
    {
        string::size_type indexStop = tagName.find(']');
        index = stoi(tagName.substr(indexStart + 1, indexStop - indexStart - 1));
        tagName = tagName.substr(0, indexStart);
    }
}
/**
 * Find a node in an XML tree using a subset of XPATH:
 *    /tag1[k1]/tag2[k2]/.../tagn[kn]
 * Each tagi is an XML tag name. The [ki] give an integer index indicating
 * which child with the given tag name should be selected. The "[ki]" portion
 * may be omitted when ki==1.
 *
 * @param root the root of the tree from which the selection should be made
 * @param xpath the path to follow in selecting the desired node.
 * @return the desired node from within the tree, or nullptr if no node matching
 *         the given path can be found.
 */
Node *selectByPath(Node *root, std::string xpath)
{
    //*** To be implemented
    return nullptr;
}
/**
 * Given an XML (sub)tree, extract and concatenate the text leaves from
 * that tree in the order they would be encountered in an XML listing,
 * separating text from different leaf nodes by one or more blanks.
 *
 * @param tree the root of the tree from which the text is to be extracted.
 */
std::string extractText(const Node *tree)
{
    //*** To be implemented
    return "";
}
Please submit only the new extraction.cpp file.
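One possible way to approach selectByPath is sketched below. It is only a sketch under stated assumptions: node.h is not shown in the question, so the Node struct here (with members isLeaf, label, and children) is a hypothetical stand-in, and the member names in the real node.h may differ. The sketch also assumes that the first step of the path names the root element itself, and that each later step selects the k-th child element with the matching tag. It includes a version of the interpretXPathStep helper that looks for the '[' in the string from which the leading '/' has already been removed.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the Node type declared in node.h (not shown in
// the question). The real member names may differ.
struct Node {
    bool isLeaf;                  // true for text leaves, false for elements
    std::string label;            // tag name (element) or text (leaf)
    std::vector<Node*> children;  // child nodes in document order
};

// Split one step such as "/p[2]" into tag name "p" and index 2.
// The index defaults to 1 when "[k]" is omitted.
void interpretXPathStep(std::string xpathStep, std::string& tagName,
                        unsigned& index) {
    index = 1;
    tagName = xpathStep;
    if (!tagName.empty() && tagName[0] == '/')
        tagName = tagName.substr(1);  // discard the '/'
    std::string::size_type indexStart = tagName.find('[');
    if (indexStart != std::string::npos) {
        std::string::size_type indexStop = tagName.find(']');
        index = std::stoi(
            tagName.substr(indexStart + 1, indexStop - indexStart - 1));
        tagName = tagName.substr(0, indexStart);
    }
}

Node* selectByPath(Node* root, std::string xpath) {
    if (root == nullptr || xpath.empty() || xpath[0] != '/')
        return nullptr;
    Node* current = nullptr;  // stays nullptr until the root step is matched
    std::string::size_type start = 0;
    while (start < xpath.size()) {
        // A step runs from one '/' up to (not including) the next '/'.
        std::string::size_type next = xpath.find('/', start + 1);
        std::string step = (next == std::string::npos)
                               ? xpath.substr(start)
                               : xpath.substr(start, next - start);
        std::string tag;
        unsigned index = 1;
        interpretXPathStep(step, tag, index);

        if (current == nullptr) {
            // Assumption: the first step names the root element itself.
            if (root->isLeaf || root->label != tag || index != 1)
                return nullptr;
            current = root;
        } else {
            // Count only element children whose tag matches this step.
            Node* match = nullptr;
            unsigned seen = 0;
            for (Node* child : current->children) {
                if (!child->isLeaf && child->label == tag && ++seen == index) {
                    match = child;
                    break;
                }
            }
            if (match == nullptr) return nullptr;
            current = match;
        }
        start = next;  // npos is larger than any size, so the loop ends
    }
    return current;
}
```

Under these assumptions, for a tree built from a body containing two p elements, the path /html/body/p[2] selects the second p, and a path such as /html/body/p[3] yields nullptr. A malformed index such as "[abc]" would make std::stoi throw; the sketch does not handle that case.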
// extraction.h
#ifndef EXTRACTION_H
#define EXTRACTION_H
#include <string>
/**
 * Find a node in an XML tree using a subset of XPATH:
 *    /tag1[k1]/tag2[k2]/.../tagn[kn]
 * Each tagi is an XML tag name. The [ki] give an integer index indicating
 * which child with the given tag name should be selected. The "[ki]" portion
 * may be omitted when ki==1.
 *
 * @param root the root of the tree from which the selection should be made
 * @param xpath the path to follow in selecting the desired node.
 * @return the desired node from within the tree, or nullptr if no node matching
 *         the given path can be found.
 */
Node* selectByPath (Node* root, std::string xpath);
/**
 * Given an XML (sub)tree, extract and concatenate the text leaves from
 * that tree in the order they would be encountered in an XML listing,
 * separating text from different leaf nodes by one or more blanks.
 *
 * @param tree the root of the tree from which the text is to be extracted.
 */
std::string extractText(const Node* tree);
#endif
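For extractText, a depth-first, left-to-right traversal visits the text leaves in the same order in which they appear in the XML listing. The sketch below illustrates that idea; it is not a definitive solution. Because node.h is not shown in the question, the Node struct (members isLeaf, label, children) is a hypothetical stand-in, and collectText is a helper name introduced only for this illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the Node type declared in node.h (not shown in
// the question). The real member names may differ.
struct Node {
    bool isLeaf;                  // true for text leaves, false for elements
    std::string label;            // tag name (element) or text (leaf)
    std::vector<Node*> children;  // child nodes in document order
};

// Hypothetical helper: append the text of every leaf under `node` to `out`,
// visiting children left to right so that document order is preserved.
static void collectText(const Node* node, std::string& out) {
    if (node == nullptr) return;
    if (node->isLeaf) {
        if (!node->label.empty()) {
            if (!out.empty()) out += ' ';  // one blank between leaf texts
            out += node->label;
        }
        return;
    }
    for (const Node* child : node->children)
        collectText(child, out);
}

std::string extractText(const Node* tree) {
    std::string result;
    collectText(tree, result);
    return result;
}
```

Passing the output string by reference avoids building and copying a temporary string at every level of the recursion. Element tag names are never appended, only the labels of leaves, which matches the requirement to list text content and ignore everything else.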
