PYTHON PROGRAM RELATED TO INFORMATION RETRIEVAL AND WEB SEARCH
Problem 1 [30 points]. Write a (Python) program that preprocesses a
collection of documents using the recommendations given in the
Text Operations lecture. The input to the program will be a directory
containing a list of text files. Use the files from assignment #3 as
test data, as well as 10 documents (manually) collected from news.yahoo.com.
The Yahoo documents must be converted to text before using them.
Remove the following during the preprocessing:
– URLs and other HTML-like strings
– morphological variations
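The two removal steps listed above can be illustrated with a minimal sketch. It is not the attached program, only one possible reading of the requirements under stated assumptions: the input files are assumed to end in .txt, the regular expressions are rough approximations of "URLs and HTML-like strings", and the hypothetical helper crude_stem is a deliberately simple suffix stripper standing in for a real stemmer such as the Porter algorithm. All function and variable names are illustrative.

```python
import re
from pathlib import Path

# Assumption: tags and character entities are what "HTML-like strings" means here.
HTML_RE = re.compile(r"<[^>]+>|&[a-zA-Z]+;|&#\d+;")
# Assumption: a URL starts with http://, https:// or www. and runs to whitespace.
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
TOKEN_RE = re.compile(r"[a-z]+")

# Checked in this order, so "ing" is tried before the plain plural "s".
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common suffix; a stand-in for a real stemmer, not Porter."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess_text(text):
    """Remove HTML-like strings and URLs, lowercase, tokenize, and stem."""
    # Tags go first so a URL inside an attribute does not swallow nearby text.
    text = HTML_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    return [crude_stem(token) for token in TOKEN_RE.findall(text.lower())]


def preprocess_directory(directory):
    """Map each .txt file name in the directory to its list of stemmed tokens."""
    result = {}
    for path in sorted(Path(directory).glob("*.txt")):
        raw = path.read_text(encoding="utf-8", errors="ignore")
        result[path.name] = preprocess_text(raw)
    return result


if __name__ == "__main__":
    sample = "<p>Visit http://example.com for walking dogs</p>"
    print(preprocess_text(sample))  # ['visit', 'for', 'walk', 'dog']
```

A suffix stripper this simple will over-stem some words (for example, "news" becomes "new"), so a full stemming algorithm is the better choice for the actual submission.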
The assignment #3 file mentioned above is also attached. Running this code in Anaconda Spyder shows the output.