26.06. – 28.06. / 10:00 – 17:00

Workshop: "Digital Trace Data in Social Science Research"

Frauke Kreuter (LMU München, University of Maryland), Anna-Carolina Haensch (LMU München), Christoph Kern (LMU München, Mannheim Centre for European Social Research), Clara Strasser Ceballos (LMU München)


The interdisciplinary research area FAIR organizes a three-day workshop on Digital Trace Data in Social Science Research.

Participants will be introduced to digital trace data, how such data are collected, and how they are analyzed, with a particular focus on text data.

The workshop is structured into two main sections:

  1. Overview of Digital Trace Data: This section introduces digital trace data and explores sources such as e-learning systems, websites, smartphone apps, and sensors in wearables. Key aspects covered include the typical characteristics of these data, their quality, and their potential for social and cultural science research, along with the prerequisites for realizing that potential. Special attention is given to social media data from platforms like YouTube, Reddit, and TikTok. The session includes both theoretical discussion and practical data collection exercises using the statistical programming language R.
  2. Analysis of Digital Trace Data: The second section delves into the analysis of the data discussed earlier. It begins with an introduction to supervised and unsupervised machine learning, covering use cases and methods. It then focuses on specific applications: text classification models (an example of supervised learning) and topic modeling (an example of unsupervised learning). Participants will engage in practical R exercises to consolidate their learning. The workshop concludes with a forward-looking segment that explores the application of these methods to other data formats, such as analyzing open-ended responses in traditional survey data.

Invited speakers:

Frauke Kreuter brings together people, challenges, and organizations to enhance data quality, and not only for AI models.

Christoph Kern’s research focuses on the social impacts of algorithmic decision-making and on methodology to mitigate algorithmic unfairness and improve training data quality.

Anna-Carolina (Caro) Haensch explores new frontiers of social data science by combining her survey statistics background with a pioneering approach to synthetic data and large language models.

Clara Strasser Ceballos harnesses the power of data for social good by shaping socially aware and fair AI systems in her interdisciplinary research.