From Words to Networks - Information and Relation Extraction from Text Data and Network Analysis with ConText

Instructor: Jana Diesner
Duration: 6 hours

1. What is covered in the workshop? What will you learn?

The functioning and dynamics of real-world networks involve the continuous production, processing and flow of knowledge and information. Sources of this knowledge and information often take the form of unstructured, natural language text data. In this workshop, participants learn how to a) construct network data from text data and associated metadata, and b) jointly analyze text data and network data, which allows two types of behavioral information to be considered together: social interactions and language use.

Workshop participants will be introduced to fundamental theories, concepts and methods for these purposes. Using text analysis for network analysis has been useful in answering questions such as: Who is talking to whom, and about what? What perceptions or mental models do social agents have of certain themes? How do opinions evolve and diffuse in society and online? Throughout this workshop, we discuss practical applications for the introduced techniques from various domains.

The focus of this workshop is on teaching practical, hands-on skills for using text analysis methods in an informed, systematic and efficient fashion. We use the ConText software, which is available for free (context.lis.illinois.edu). Our goal is to equip participants with the skills and tools needed to apply the covered techniques to their own research questions and text data sets. Attendees will apply automated text mining and natural language processing techniques, including:

1. Collecting various types of text data, including social media data, news wire data, legal documents and corporate information.
2. Creating curated corpora of collected data, deduplicating documents, and managing text data and associated metadata in automatically populated databases that support search, retrieval and data mining.
3. Summarization techniques such as topic modeling and term weighting.
4. Sentiment Analysis, also known as opinion mining.
5. Visualization of text mining results.
6. Pre-processing techniques such as stemming and part-of-speech tagging.
7. Entity detection, i.e. detecting and categorizing terms and term sequences that represent instances of relevant node classes in one-mode and multi-mode network analysis, e.g. agents, organizations, locations and knowledge.
8. Relation Extraction, i.e. linking identified entities into edges based on various criteria, including proximity, syntax and semantics. The extracted networks can be imported into standard SNA tools, including Gephi, ORA, Pajek, UCINET and visone.
9. Extracting semantic networks and fusing them with social networks.
10. Analyzing the extracted networks, i.e. conducting (social) network analysis, including network visualization, computing metrics at the node, group and graph levels, and applying clustering techniques.

Going from texts to networks involves principles and strategies from computer science that apply not only to the task at hand, but also to a wide range of other problems. These principles and strategies are referred to as “Computational Thinking”, a basic skill like reading, writing and arithmetic that is crucial for solving problems and understanding human behavior across fields (Wing 2006). In this workshop, participants are introduced to Computational Thinking and practice applying this way of thinking.
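
To make items 7 and 8 of the list above more concrete, the following Python sketch shows one simple way to go from a text to a network: entities are detected by looking up tokens in a small dictionary, and two entities are linked when they appear within a fixed number of tokens of each other (proximity-based relation extraction). This is an illustration only and not how ConText implements these steps. The entity dictionary, the sample sentence, the function names and the window size are all hypothetical choices made for this example.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical entity dictionary (assumption for this sketch):
# surface form -> node class, as used in multi-mode networks.
ENTITIES = {
    "alice": "agent",
    "bob": "agent",
    "acme": "organization",
    "chicago": "location",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def detect_entities(tokens):
    """Return (position, entity) pairs for tokens found in the dictionary."""
    return [(i, tok) for i, tok in enumerate(tokens) if tok in ENTITIES]

def extract_edges(text, window=5):
    """Link two distinct entities if they occur within `window` tokens.

    Returns a Counter mapping alphabetically sorted entity pairs to the
    number of times the pair co-occurred (the edge weight).
    """
    mentions = detect_entities(tokenize(text))
    edges = Counter()
    for (i, a), (j, b) in combinations(mentions, 2):
        if a != b and abs(i - j) <= window:
            edges[tuple(sorted((a, b)))] += 1
    return edges

def to_edge_list_csv(edges):
    """Serialize the weighted edges as a simple source,target,weight table."""
    lines = ["source,target,weight"]
    for (a, b), weight in sorted(edges.items()):
        lines.append(f"{a},{b},{weight}")
    return "\n".join(lines)

# Hypothetical sample text.
sample = "Alice met Bob in Chicago. Bob later joined Acme, where Alice already worked."
edges = extract_edges(sample, window=4)
print(to_edge_list_csv(edges))
```

Note that this sketch ignores sentence boundaries, so entities in neighboring sentences can be linked if they are close enough. Syntax-based or semantics-based linking, as mentioned in item 8, would need additional processing that is not shown here. The resulting edge list is a plain table of the kind that network analysis tools can typically import, although the column names a given tool expects may differ.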

2. Who should attend?

This is an interdisciplinary and interactive workshop designed to benefit from the participation of attendees from different backgrounds. The material, exercises and mode of delivery are suitable for researchers and practitioners alike. No specific prior knowledge or computational skills are required. The delivery is geared towards building an understanding of fundamental concepts and gaining hands-on experience with text analysis and network analysis methods and tools.

3. What to bring to the workshop?

Software: We will use ConText (http://context.lis.illinois.edu/) and Gephi (https://gephi.org) for this workshop. Prior to the workshop, I will send an email to confirmed participants with links and installation instructions for these tools. You are invited to bring a laptop to the workshop. Attendees who cannot bring a laptop will still fully benefit from the workshop, as I screen-project all live walk-through exercises. At the workshop, I will provide a tutorial document and further learning resources.
Data: Attendees can work with the sample data that we provide and/or bring their own data.

4. Readings

Prior to the workshop, I recommend reading the following overviews on the concepts and methods covered in the workshop:
Diesner, J., Carley, K. M. (2011): Semantic Networks. In G. Barnett (Ed.), Encyclopedia of Social Networking, (pp. 595-598). Sage Publications.
http://people.lis.illinois.edu/~jdiesner/publications/Semantic_Networks_...
Diesner, J., Carley, K. M. (2011): Words and Networks. In G. Barnett (Ed.), Encyclopedia of Social Networking, (pp. 958-961). Sage Publications.
http://people.lis.illinois.edu/~jdiesner/publications/Word_Networks_Dies...

All further readings are optional:

Introduction to information extraction/text mining: McCallum, A. (2005). Information extraction: distilling structured data from unstructured text. ACM Queue, 3(9), 48-57.
http://people.cs.umass.edu/~mccallum/papers/acm-queue-ie.pdf
Introduction to social network analysis: Hanneman, R. A. & Riddle, M. (2005). Introduction to social network methods. Riverside, CA: University of California.
http://www.faculty.ucr.edu/~hanneman/nettext/
Introduction to Computational Thinking: Wing, J. M. (2006). Computational Thinking. Communications of the ACM, 49(3), 33-35, http://dl.acm.org/citation.cfm?id=1118215
