Skip to content

This is the final project of the Computational Content Analysis (Winter 2020) at the University of Chicago. The authors are Luxin Tian and Heather Chen.

Notifications You must be signed in to change notification settings

luxin-tian/UNGDC

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

35 Commits
 
 
 
 
 
 
 
 

Repository files navigation

State Preferences for Political Agendas and Policy Position Proximity: A Computational Content Analysis Project on the United Nations General Debate Corpus

Quantitative revelations of political agents’ preferences are of substantial interest in the field ofpolitical science. In this project, we employ computational methods to perform a content analysis of the United Nations General Debate Corpus (UNGDC). We infer state preferences for political agendas through features embedded in text data as well as word embeddings and topic modeling, based on which we examine policy position proximity between states based on our measurement. We embed the documented statements delivered by countries into a semantic space and infer state blocs based on their policy positions represented by document vectors. We also figure out the topics that the states are concerned about over the past half-century and investigate international relations in terms of common concerns. Finaly, we evaluate the validity and reliability of our methods and results.

Reproducible codes and notebooks

Documentation

Repository structure

  • /docs folder contains:
  • /project folder contains 4 jupyter notebooks:
    • Explanatory Data Analysis
    • Topic Modeling (LDA model) and Semantic Networks
    • Dynamic Topic Modeling
    • BERT Text Generation
  • /pic folder contains the graphs of network analysis
    • 49 networks constructed based on the topics revealed by the LDA model and normalized mutual information score.

Reference

  • Baturo, A., Dasandi, N., Mikhaylov, S.J., 2017. Understanding state preferences with text as data: Introducing the ungeneral debate corpus. Research & Politics 4, 2053168017712821.
  • Blei, D.M., Ng, A.Y., Jordan, M.I., 2003. Latent dirichlet allocation. Journal of machine Learning research 3, 993–1022.
  • Griffiths, T.L., Steyvers, M., Tenenbaum, J.B., 2007. Topics in semantic representation. Psychological review 114,211.
  • Gurciullo, S., Mikhaylov, S., 2017. Topology analysis of international networks based on debates in the united nations.arXiv preprint arXiv:1707.09491.
  • Jankin Mikhaylov, S., Baturo, A., Dasandi, N., 2017. United Nations General Debate Corpus. URL:https://doi.org/10.7910/DVN/0TJX8Y, doi:10.7910/DVN/0TJX8Y.
  • Maaten, L.v.d., Hinton, G., 2008. Visualizing data using t-sne. Journal of machine learning research 9, 2579–2605.
  • Mikolov, T., Chen, K., Corrado, G., Dean, J., 2013. Efficient estimation of word representations in vector space. arXivpreprint arXiv:1301.3781.
  • Plan, M., . Marshall plan. Public law 80, 472.27

website

About

This is the final project of the Computational Content Analysis (Winter 2020) at the University of Chicago. The authors are Luxin Tian and Heather Chen.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published