Question:
In web searches and certain problems in natural language processing, it is often useful to filter out certain words before searching or processing text, which helps the performance of the algorithms. Words such as “and”, “the”, and “is” are commonly referred to as stop words for this purpose. Lists of stop words are almost always created manually, based on the constraints of a particular application, and many such lists are available on the internet. For our purposes here, we will use one such list included with the materials for this book.
Using the above list of stop words, or any other that you are interested in, first filter some sample “search phrases” and then remove all stop words from a larger piece of text. If your function were called FilterText, it might work like this:
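As a rough illustration only, here is a minimal Python sketch of one way such a function could behave. The name filter_text mirrors FilterText from the question. Its signature, the case-insensitive matching, and the tiny STOP_WORDS set are all assumptions made for this example; the set is a hypothetical stand-in for the book's list, which is not reproduced here.

```python
# Hypothetical stand-in for the book's stop-word list (an assumption);
# in practice this set would be loaded from the list file.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

# Punctuation stripped from a token before it is compared to the list.
PUNCTUATION = ".,;:!?\"'()"


def filter_text(text, stop_words=STOP_WORDS):
    """Return text with stop words removed, matching case-insensitively."""
    kept = []
    # Split on whitespace so punctuation stays attached to the word it follows.
    for token in text.split():
        # Normalize only for the comparison; the original token is what is kept.
        core = token.strip(PUNCTUATION).lower()
        if core not in stop_words:
            kept.append(token)
    return " ".join(kept)
```

Under these assumptions, filter_text("the cat and the hat") returns "cat hat", and filter_text("What is the capital of France?") returns "What capital France?". One consequence of this design is that a stop word with attached punctuation, such as "the," is dropped together with its comma, and runs of whitespace in the input collapse to single spaces.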