Analyzing the Impact of Player Performance on YouTube Comment Sentiment and Topics
Research Context: This bachelor thesis examined whether NBA player performance statistics influence how fans discuss and react emotionally to highlight videos on YouTube. Combining 72,749 comments from 496 videos with player box score statistics from the 2024/2025 season, I used embedding-based topic clustering and transformer-based sentiment analysis to look for patterns linking on-court performance to online fan discourse.
Key Methodology: The research integrated three analytical dimensions:
- Performance metrics: Traditional box score statistics (points, rebounds, assists, steals, blocks)
- Thematic content: 21 topic clusters identified through BGE-M3 embeddings and semantic clustering
- Emotional tone: Sentiment analysis using Twitter-RoBERTa transformer models
Using UMAP (Uniform Manifold Approximation and Projection) visualization, I examined how these dimensions relate both independently and collectively, testing whether exceptional statistical performances generate distinct patterns in fan conversations.
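Examining the three dimensions collectively means placing feature blocks of very different widths (a handful of box score columns, topic features, sentiment scores) into one matrix before projection. The sketch below shows one possible way to keep any single block from dominating purely by column count. It is an assumption for illustration, not the thesis's actual weighting scheme; the function names and the toy values are hypothetical.

```python
import math
from statistics import mean, pstdev


def standardize_columns(block):
    """Z-score each column of a row-major block; constant columns become 0."""
    scaled_cols = []
    for col in zip(*block):
        mu, sigma = mean(col), pstdev(col)
        scaled_cols.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def balance_blocks(blocks):
    """Concatenate feature blocks so each contributes the same total variance.

    Every column is z-scored, then multiplied by 1/sqrt(columns in its block).
    A block with d non-constant columns therefore has total variance
    d * (1/d) = 1, regardless of how wide it is.
    """
    scaled = []
    for block in blocks:
        z = standardize_columns(block)
        weight = 1.0 / math.sqrt(len(z[0]))
        scaled.append([[v * weight for v in row] for row in z])
    # Join the per-block rows of each sample into one long feature row.
    return [sum(rows, []) for rows in zip(*scaled)]


# Toy values only (hypothetical): three comments, two stat columns, one sentiment column.
performance = [[10.0, 2.0], [20.0, 4.0], [30.0, 9.0]]
sentiment = [[0.1], [0.7], [0.4]]
features = balance_blocks([performance, sentiment])
```

The resulting `features` rows could then be handed to a UMAP implementation. Equal total variance per block is only one reasonable definition of "balanced"; other schemes (for example, reducing each block to the same number of dimensions first) would be equally defensible.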
Findings: Contrary to initial expectations, the analysis revealed that player performance statistics and fan discourse operate as independent dimensions:
- Topics discussed in comments remain consistent regardless of whether players scored 14 or 61 points in a game
- Sentiment distributions show negligible correlation with statistical outputs (|r| ≤ 0.12)
- The top 5 discussion topics account for 70% of all comments across all performance levels
Even in the integrated multidimensional analysis, sentiment dominated the spatial organization of the projection (r = 0.67 to 0.89), while performance metrics showed minimal structural influence (r = 0.13 to 0.28), despite balanced feature weighting.
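For readers who want to reproduce the two kinds of figures reported above, the following minimal sketch computes a Pearson correlation coefficient and the share of comments covered by the k most frequent topics. The function names are hypothetical, and the code makes no claim about how the thesis computed its own values.

```python
import math
from collections import Counter


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def top_k_share(topic_labels, k=5):
    """Fraction of comments that fall into the k most frequent topics."""
    if not topic_labels:
        raise ValueError("topic_labels must not be empty")
    counts = Counter(topic_labels)
    covered = sum(count for _, count in counts.most_common(k))
    return covered / len(topic_labels)
```

In this setting, `xs` might hold a per-comment performance statistic (such as points scored in the game shown) and `ys` a sentiment score, while `topic_labels` would hold one topic id per comment. Computing `top_k_share` separately for each performance bracket is one way to check whether topic concentration stays stable across performance levels.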
Strategic Implications: These findings challenge conventional assumptions about sports social media strategy. Rather than focusing content strategies on highlighting exceptional statistical performances, NBA social media teams can emphasize:
- Narrative continuity and storytelling elements
- Player identity and personality rather than just statistics
- Emotional connections that transcend individual game performances
Business Intelligence Value: This work provides a proof-of-concept for integrating structured performance data with unstructured social media content at scale. The methodology demonstrates feasibility for real-time dashboard implementations that could monitor fan sentiment and discourse patterns, enabling data-driven decisions about content production and audience engagement strategies.
Technical Achievement: The project combined several NLP techniques (BGE-M3 embeddings with 1024-dimensional vectors, Louvain community detection, transformer-based sentiment analysis) with dimensionality reduction visualization to analyze league-wide patterns across an entire NBA season, extending beyond previous studies limited to single games or playoff series.
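Louvain community detection operates on a graph rather than on raw vectors, so comment embeddings first have to be turned into a similarity graph. The sketch below illustrates one plausible construction (a cosine-similarity threshold) together with the modularity score that Louvain greedily maximizes. The threshold value, the function names, and the 2-dimensional toy vectors are assumptions for illustration; the thesis's actual graph construction is not described here.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_graph(embeddings, threshold=0.5):
    """Return {(i, j): weight} for pairs whose cosine similarity >= threshold."""
    edges = {}
    for i, j in combinations(range(len(embeddings)), 2):
        sim = cosine(embeddings[i], embeddings[j])
        if sim >= threshold:
            edges[(i, j)] = sim
    return edges


def modularity(edges, communities):
    """Weighted modularity Q = sum_c [ L_c / m - (d_c / (2m))^2 ].

    L_c is the edge weight inside community c, d_c the summed weighted
    degree of its nodes, and m the total edge weight of the graph.
    """
    m = sum(edges.values())
    if m == 0:
        return 0.0
    node_to_comm = {n: c for c, nodes in enumerate(communities) for n in nodes}
    intra = [0.0] * len(communities)
    degree = [0.0] * len(communities)
    for (i, j), w in edges.items():
        ci, cj = node_to_comm[i], node_to_comm[j]
        degree[ci] += w
        degree[cj] += w
        if ci == cj:
            intra[ci] += w
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2
               for c in range(len(communities)))


# Toy 2-D vectors (hypothetical); real BGE-M3 vectors have 1024 dimensions.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
graph = similarity_graph(toy, threshold=0.5)
```

On the toy vectors, splitting the nodes into `{0, 1}` and `{2, 3}` scores higher modularity than placing all four in one community, which is the kind of improvement Louvain searches for. The all-pairs loop is quadratic, so at the scale of tens of thousands of comments an approximate nearest-neighbour graph and a library routine such as NetworkX's `louvain_communities` would be the more practical choice.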
Research Contribution: Although the null finding (that performance does not drive discourse patterns) contradicts the initial hypothesis, it is informative in its own right. It suggests that fan engagement follows its own logic, shaped by broader narratives, community dynamics, and player identities rather than individual game statistics.