Cross‑Platform Analysis of Vegan Discourse with LLM‑Enhanced Clustering
Tunazzina Islam, Dan Goldwasser. Preprint 2025.
Abstract
Social media platforms amplify discussions of lifestyle choices such as veganism, shaping consumer behavior, ethical debates, and environmental policy. Scalable and efficient methods are needed to understand how these discussions evolve and how they differ across platforms. Yet most prior work analyzes a single, centralized platform and overlooks emerging decentralized networks where discourse dynamics may diverge. We propose a novel framework that integrates large language models (LLMs) with advanced clustering techniques to systematically analyze vegan discourse across two major platforms: X (formerly Twitter) and Bluesky. Using a corpus of 20k tweets and 13k Bluesky posts, we introduce the task of classifying vegan discourse into key themes. Our experiments demonstrate that combining LLMs with advanced clustering algorithms, specifically HDBSCAN, improves cluster quality. Furthermore, we show that LLMs can serve as effective unsupervised annotators, reducing the need for manual labeling. By leveraging cross‑platform datasets, our work not only demonstrates methodological advances in scalable text analysis but also provides new empirical insights into how platform design shapes public discourse.