COM5508 Media Data Analytics
A graduate course on computational methods for media research, covering data collection, text mining, network analysis, and machine learning applications in communication studies.
Course Overview
This graduate seminar introduces students to computational and quantitative methods for media and communication research. Topics span the full research pipeline: data collection (APIs, scraping), text analysis (sentiment, topic modeling, word embeddings), social network analysis, and machine learning classification.
Zhicong Chen served as Teaching Assistant for this course.
Learning Objectives
By the end of this course, students will be able to:
- Collect large-scale media and social data programmatically using APIs and web scraping
- Apply text mining techniques (TF-IDF, topic modeling, sentiment analysis) to communication research questions
- Construct and analyse social networks from media interaction data
- Train and evaluate machine learning classifiers on media content
- Critically assess the validity and ethics of computational media research
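As a rough illustration of the text-mining objective, the sketch below computes TF-IDF weights by hand. It is a minimal sketch and not course material. It uses only the Python standard library, with raw term counts and an unsmoothed `idf = ln(N / df)`, so its numbers will differ from scikit-learn's `TfidfVectorizer`, which smooths and normalises by default. The function name `tf_idf` and the toy headlines are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumes whitespace tokenisation, raw term frequency and
    idf = ln(N / df), with no smoothing or length normalisation.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical toy headlines, invented for illustration only.
headlines = [
    "election results spark protest",
    "protest spreads after election",
    "new phone launch draws crowds",
]
weights = tf_idf(headlines)

# Terms unique to one headline outweigh terms shared across headlines.
for term, weight in sorted(weights[0].items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {weight:.3f}")
```

In the first headline, "results" and "spark" appear in no other document and so receive a higher weight than "election" and "protest", which each appear in two of the three documents.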
Level
Graduate (PhD and Research Masters)
Institution
Department of Media and Communication, City University of Hong Kong
Key Software and Libraries
| Tool | Purpose | Link |
|---|---|---|
| Python 3 | Core language | python.org |
| NLTK | NLP preprocessing | nltk.org |
| Gensim | Topic models & word vectors | radimrehurek.com/gensim |
| scikit-learn | Machine learning | scikit-learn.org |
| NetworkX | Network analysis | networkx.org |
| Gephi | Network visualisation | gephi.org |
| Beautiful Soup | Web scraping | crummy.com/software/BeautifulSoup |
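NetworkX, listed above, provides centrality measures ready-made. To show what one of them calculates, the sketch below computes normalised degree centrality (a node's degree divided by n - 1) using only the standard library. It is a minimal sketch: the function name, the decision to ignore self-loops, and the toy mention network are assumptions made for illustration.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalised degree centrality for an undirected edge list.

    Each node's score is (number of distinct neighbours) / (n - 1),
    where n is the number of nodes. Self-loops are ignored here.
    """
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


# Hypothetical "who mentions whom" pairs, invented for illustration only.
mentions = [("ana", "ben"), ("ana", "cy"), ("ana", "dee"), ("ben", "cy")]
centrality = degree_centrality(mentions)

for node, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{node:4s} {score:.2f}")
```

Here "ana" is connected to every other account and scores 1.0, while "dee", with a single tie, scores one third.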
Recommended Texts
- Salganik, M. J. (2017). Bit by Bit: Social Research in the Digital Age (free online).
- Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as Data. Princeton UP.
Assessment
| Component | Weight |
|---|---|
| Weekly lab exercises | 30% |
| Literature review | 20% |
| Final research project | 45% |
| Participation | 5% |
Schedule
| Week | Date | Topic | Description | Materials |
|---|---|---|---|---|
| 1 | Week 1 | Introduction to Computational Communication Research | Overview of big data in media studies; the promise and limitations of computational methods; research workflow. | |
| 2 | Week 2 | Python for Media Research | Python environment setup, Jupyter notebooks, pandas basics, data types and structures. | |
| 3 | Week 3 | Collecting Data from APIs | REST APIs, authentication, rate limiting; collecting data from social media and news APIs. | |
| 4 | Week 4 | Web Scraping for Media Research | Scraping news sites, HTML parsing with BeautifulSoup, handling dynamic pages with Selenium. | |
| 5 | Week 5 | Text Preprocessing and Representation | Tokenization, stopwords, stemming, TF-IDF; building document-term matrices. | |
| 6 | Week 6 | Content Analysis and Dictionary Methods | Automated content analysis; sentiment dictionaries; LIWC; validation strategies. | |
| 7 | Week 7 | Topic Modeling with LDA | Latent Dirichlet Allocation; selecting K; interpreting and labelling topics; stability. | |
| 8 | Week 8 | Social Network Analysis I: Basics | Graph theory fundamentals; degree, betweenness and closeness centrality; NetworkX. | |
| 9 | Week 9 | Social Network Analysis II: Communities | Community detection algorithms; modularity; information diffusion; visualisation with Gephi. | |
| 10 | Week 10 | Machine Learning for Media Classification | Naive Bayes, SVM, random forests; train/test split; precision, recall, F1. | |
| 11 | Week 11 | Word Embeddings and Semantic Analysis | Word2Vec, GloVe; using pre-trained embeddings; semantic similarity in media text. | |
| 12 | Week 12 | Research Design and Reproducibility | Pre-registration, open data, replication; writing up computational studies; ethical considerations. | |
| 13 | Week 13 | Student Project Presentations I | Student teams present computational media research projects. | |
| 14 | Week 14 | Student Project Presentations II | Remaining presentations; course wrap-up and reflections. | |
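The evaluation metrics named for Week 10 (precision, recall, F1) can be sketched in a few lines. This is a minimal standard-library sketch under stated assumptions, not the course's lab code. The function name `precision_recall_f1`, the label names, and the toy predictions are hypothetical, and the metrics are computed for a single chosen positive class.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical gold labels and classifier output, invented for illustration.
gold = ["news", "opinion", "news", "news", "opinion", "opinion"]
pred = ["news", "news", "news", "opinion", "opinion", "opinion"]

precision, recall, f1 = precision_recall_f1(gold, pred, positive="news")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With two true positives, one false positive and one false negative for the "news" class, all three scores come out at two thirds.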