Coursera: Text Mining and Analytics Starts Today

As we mentioned in October of last year, the Data Mining specialization series from the University of Illinois at Urbana-Champaign is already under way. The first course in the specialization was “Pattern Discovery in Data Mining”, the second was “Text Retrieval and Search Engines”, and the third was “Cluster Analysis in Data Mining”.

Now the fourth course in the series begins: “Text Mining and Analytics”. The course lasts 4 weeks, is taught by Professor ChengXiang Zhai, and consists of the following 4 modules (one week each):

Modules: key concepts and recommended readings

Week 1
  Key concepts:
  • Overview of text mining and analytics
  • NLP and text representation
  • Paradigmatic and syntagmatic word relations
  • Mining word associations
  Recommended readings:
  • Manning, Chris, and Hinrich Schütze. Foundations of Statistical Natural Language Processing. Cambridge: MIT Press, 1999. (Chapter 5)
  • Zhai, ChengXiang. “Exploiting Context to Identify Lexical Atoms: A Statistical View of Linguistic Context.” Proceedings of the International and Interdisciplinary Conference on Modeling and Using Context (CONTEXT-97), Rio de Janeiro, Brazil, 4-6 Feb. 1997. (pages 119-129)
  • Jiang, Shan, and ChengXiang Zhai. “Random Walks on Adjacency Graphs for Mining Lexical Relations from Big Text Data.” Big Data Conference. 27-30 Oct. 2014, Washington DC. Washington DC: IEEE International Conference on Big Data, 2014. 549-554.

Week 2
  Key concepts:
  • Overview of topic mining and analysis
  • One topic per document: document clustering for topic mining
  • Multiple topics per document: statistical topic models for topic mining
  • Probabilistic Latent Semantic Analysis
  • EM algorithm
  Recommended readings:
  • Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge: Cambridge UP, 2007. (Chapters 16 and 17)
  • Aggarwal, Charu, and ChengXiang Zhai, eds. Mining Text Data. New York: Springer, 2012. (Chapter 4)
  • Mei, Qiaozhu, Xuehua Shen, and ChengXiang Zhai. “Automatic Labeling of Multinomial Topic Models.” Proceedings of ACM KDD, 2007. (pages 490-499)

Week 3
  Key concepts:
  • Incorporating prior into a topic model
  • Text categorization
  • Sentiment analysis
  Recommended readings:
  • Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge: Cambridge UP, 2007. (Chapters 13 and 14)
  • Aggarwal, Charu, and ChengXiang Zhai, eds. Mining Text Data. New York: Springer, 2012. (Chapter 6)
  • Liu, Bing. Sentiment Analysis and Opinion Mining. Morgan & Claypool Publishers, 2012.

Week 4
  Key concepts:
  • Joint analysis of text and non-textual data
  • Advanced topic models
  • Towards a general text analysis engine
  • Summary
  Recommended readings:
  • Blei, D. “Probabilistic Topic Models.” Communications of the ACM. 55.4 (2012): 77–84.
  • Zhai, ChengXiang. Statistical Language Models for Information Retrieval. Morgan & Claypool Publishers, 2008. (Chapter 7)
  • Kim, Hyun Duk, Malú Castellanos, Meichun Hsu, ChengXiang Zhai, Thomas A. Rietz, and Daniel Diermeier. “Mining Causal Topics in Text Data: Iterative Topic Modeling with Time Series Feedback.” Proceedings of CIKM, 2015. (pages 885-890)
  • Smith, Noah. Text-Driven Forecasting. 31 May 2015.

Finally, we leave you with the course's introductory video, for anyone interested in taking it. Good luck!
