Programming Project 9

 

There is a real program developed by a computer company that reads a report (running text), issues warnings on style, and partially corrects bad style.  You are to write a simplified version of this program with the following features.

 

                    Statistics:  A statistical summary is prepared for each report processed, containing the following information:

• total number of words in the report;

• number of unique words;

• number of unique words of more than three letters;

• average word length;

• (EXTRA CREDIT: average sentence length); and

• a listing of the special words with the number of times each was used in the report.
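
The statistics above can be derived in a single pass over the word counts. The following is a minimal sketch, not a required design: `ReportStats` and `summarize` are hypothetical names, and `std::map` (itself a balanced tree) stands in for whatever tree you build yourself. It assumes every key has already been converted to lowercase.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical summary record; the field names are illustrative only.
struct ReportStats {
    std::size_t totalWords = 0;       // every word occurrence in the report
    std::size_t uniqueWords = 0;      // distinct words, ignoring case
    std::size_t uniqueLongWords = 0;  // distinct words of more than three letters
    double averageWordLength = 0.0;   // total letters divided by total words
};

// Assumes the keys of `counts` are lowercased words and the values are usage counts.
ReportStats summarize(const std::map<std::string, int>& counts) {
    ReportStats stats;
    std::size_t totalLetters = 0;
    for (const auto& entry : counts) {
        const std::string& word = entry.first;
        const std::size_t uses = static_cast<std::size_t>(entry.second);
        stats.totalWords += uses;
        totalLetters += word.size() * uses;
        ++stats.uniqueWords;
        if (word.size() > 3) {
            ++stats.uniqueLongWords;
        }
    }
    if (stats.totalWords > 0) {  // avoid dividing by zero for an empty report
        stats.averageWordLength =
            static_cast<double>(totalLetters) / static_cast<double>(stats.totalWords);
    }
    return stats;
}
```

For example, a report containing "the" twice plus "report" and "style" once each has 4 words, 3 unique words, 2 unique words of more than three letters, and an average word length of 17 / 4 = 4.25.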

 

                    Style Warnings:  Issue a warning in the following cases:

• Word used too often:  List each unique word of more than three letters if it accounts for more than 5% of the total number of words of more than three letters.

• (EXTRA CREDIT:  Sentences too long:  Write a warning message if the average sentence length is greater than 10.)

• Words too big:  Write a warning message if the average word length is greater than 5.

 

                    Run Summary:  At the end of the run, the special words are written to a file, each with the number of times it was used during the run of the program.

Input

                    From the keyboard:

                    1.     The name of the file containing the text to be analyzed.  (If you read the name into a string, you must convert it to a character array using c_str, as in inputfilename.c_str().)

                    2.      List of special words

                    From the file:  

                    The report to be analyzed, terminated by end of file (<eof>).

Output

1.                For each report being analyzed, write the following information to a file:

·           The name of the file

·           A listing of the file

·           The statistical summary of the report  (See Statistics above.)

·           The style warnings given  (See Style Warnings above.)

2.               An alphabetical listing of the special words, one per line, each with the number of times it was used throughout the run of the program.

Data Structures

                    You do not know how many words the text has, nor how long the text passage is.  You also do not know how many special words are listed.  Therefore, you cannot use an array implementation of any data structure. Keep efficiency in mind as you decide on the type of data structures that you will use.

                    1.      A list (tree) to contain the special words

                    2.      A list (tree) of unique words in the report, created as the file is read.  If a word is not in the list, put it there. If it is, increment a counter showing how many times the word has been used. 

Definitions

                    Word:  A sequence of letters ending in a blank, a period, an exclamation point, a question mark, a colon, a comma, or a semicolon (Extra Credit:  also a single quote or a double quote).  Numbers do not appear in the words; they may be ignored.

                    Unique word:  Words that are spelled the same, ignoring uppercase and lowercase distinctions, count as a single unique word.

                    Sentence:  The words between two consecutive end-of-sentence markers, or between the beginning of the report and the first end-of-sentence marker.