Text Mining

Text mining, also called text data mining, is roughly equivalent to text analytics. The objective is to convert text into data for analysis by applying natural language processing (NLP) and analytical methods. My approach is to scan or scrape documents written in a natural language and model the document set for predictive classification purposes. The process entails extracting high-quality information from text, which requires recognizing patterns and trends through statistical pattern learning. Text mining requires structuring the input text, usually by parsing it and inserting it into a database. The structured data is then processed by adding or removing some (derived) linguistic features, and finally patterns are derived, evaluated, and interpreted. ‘High quality’ in text mining usually refers to some combination of relevance, novelty, and interestingness. Text mining can also include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities). Text analytics involves information retrieval, lexical analysis to study word frequency distributions, pattern recognition, tagging/annotation, information extraction, data mining techniques including link and association analysis, visualization, and predictive analytics.
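
To make the steps above concrete, here is a minimal sketch of the parse, store, filter, and count sequence using only the Python standard library. The sample documents, the tiny stop-word list, and the function names (`tokenize`, `build_token_table`, `word_frequencies`) are assumptions made up for illustration; a real project would more likely rely on an NLP library and a much larger corpus.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical sample documents; not taken from any real corpus.
DOCS = {
    1: "Text mining turns text into data for analysis.",
    2: "Analysis of text data reveals patterns and trends.",
}

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"for", "of", "and", "into", "the", "a"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_token_table(docs):
    """Parse each document and insert its tokens into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tokens (doc_id INTEGER, position INTEGER, token TEXT)"
    )
    for doc_id, text in docs.items():
        rows = [(doc_id, i, tok) for i, tok in enumerate(tokenize(text))]
        conn.executemany("INSERT INTO tokens VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def word_frequencies(conn, stop_words):
    """Count tokens across all documents, skipping stop words."""
    counts = Counter()
    for (token,) in conn.execute("SELECT token FROM tokens"):
        if token not in stop_words:
            counts[token] += 1
    return counts


conn = build_token_table(DOCS)
freqs = word_frequencies(conn, STOP_WORDS)
print(freqs.most_common(3))
```

Dropping stop words is one simple case of removing linguistic features before deriving patterns. The resulting word frequency distribution is only a starting point; the counts could then feed a classifier or a clustering step.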