The New Yorker is Hiring!

The New Yorker, a prestigious magazine known for its witty and thought-provoking content, is on the hunt for a new editorialist. The HR director looks at you and says, "We’re looking for someone who truly understands humor, someone who can make our readers laugh, think, and come back for more." But here’s the catch: Humor is subjective. What makes one person laugh might leave another scratching their head. To prove you’re the right candidate, you’ll need to demonstrate your comedic instincts and align with the humor profile of our readers.

Fortunately, the amazing ADArticho team thoroughly analyzed the New Yorker Caption Contest dataset and the Oxford Humor in Context dataset to uncover what makes a joke funny, how humor evolves over time, and which themes resonate most with readers. Now, it's time to leverage these insights and test your comedic prowess.

Are you ready to prove you have what it takes to land the job? Let’s find out!

Our Big Research Questions

Before diving into the data and quizzes, let's get to know the New Yorker and Oxford datasets a little better. Feel free to explore their details on the Datasets Page! With these datasets in hand, our team started with three major questions about humor:

1. Thematic Structure

What aspects of their lives do people denounce or joke about most often? Which themes and types of humor recur most frequently in humorous captions?

2. Cultural Bubble

Do the successful captions in the New Yorker dataset exhibit cultural or thematic biases that differ from those in the Oxford dataset?

3. Temporal Change

How do the thematic structures of humor evolve over time, and can major sociopolitical or cultural events be linked to shifts in the popularity of humor themes?

These questions guided our exploration of humor across different contexts, cultures, and time periods. Now, let’s try answering these big questions through the following analyses.

The Laughter Lens

The New Yorker Caption Contest dataset (NYCC) and Oxford Humor in Context dataset (OHIC) provide a unique opportunity to explore humor through data. Both datasets compile images and captions rated by humans for their funniness, offering insights into humor across institutional and general contexts.

New Yorker Caption Contest (NYCC):

  • contests.json: Metadata such as the number of votes, number of captions, and image descriptions.
  • data/: CSV files for each image, listing captions and their statistics (e.g., number of votes, funny score, mean score).
  • images/: The actual cartoon images in JPG format.

This dataset focuses on cartoons and contains detailed metadata for each image/contest. Learn more on the New Yorker Caption Contest Website.

Oxford Humor in Context (OHIC):

  • oxford_hic_data.csv: A large CSV file containing all captions for all images along with their funny scores.
  • oxford_hic_image_info.csv: A CSV file containing links to all images.

This dataset focuses on internet memes and is less detailed than NYCC but still contains the most important information for our project. Learn more in the OHIC Research Paper.

While OHIC has fewer metadata fields, it complements NYCC by providing a broader perspective on humor in internet culture. Together, these datasets allow us to analyze humor across different formats and contexts.

We implemented helper functions to load captions in a nearly uniform format between the two datasets, ensuring consistency and removing lines with missing values. Now, let’s dive into the data and uncover what makes humor tick!
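As a rough illustration of what such a helper could look like, here is a minimal sketch using only the Python standard library. The column names ("mean", "funny_score", "image_id"), the function name load_captions, and the sample rows are assumptions made for this example, not the project's actual code or file headers.

```python
import csv
import io

# Hypothetical column names; the real files may use different headers.
NYCC_COLUMNS = {"caption": "caption", "score": "mean"}
OHIC_COLUMNS = {"caption": "caption", "score": "funny_score"}


def load_captions(csv_text, dataset, columns, image_id=None, image_column=None):
    """Return rows as dicts with the keys dataset, image_id, caption, score.

    Rows with a missing caption, score, or image id are dropped.
    """
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        caption = (raw.get(columns["caption"]) or "").strip()
        score = (raw.get(columns["score"]) or "").strip()
        # NYCC has one file per image, OHIC stores the image id in a column.
        if image_column is None:
            image = image_id
        else:
            image = (raw.get(image_column) or "").strip()
        if not caption or not score or not image:
            continue  # skip lines with missing values
        rows.append({
            "dataset": dataset,
            "image_id": str(image),
            "caption": caption,
            "score": float(score),
        })
    return rows


# Tiny made-up samples, each containing one incomplete line.
nycc_csv = "caption,mean,votes\nI told you it was a costume party.,1.42,310\n,1.10,95\n"
ohic_csv = "image_id,caption,funny_score\nabc123,when the wifi drops,2.0\nabc123,no score here,\n"

nycc_rows = load_captions(nycc_csv, "NYCC", NYCC_COLUMNS, image_id="510")
ohic_rows = load_captions(ohic_csv, "OHIC", OHIC_COLUMNS, image_column="image_id")
print(nycc_rows + ohic_rows)
```

Both loaders return the same four keys, so later analyses can treat the two datasets interchangeably.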

Quiz Time!

Question 1: Which dataset do you think focuses on internet memes?

New Yorker Caption Contest (NYCC)
Oxford Humor in Context (OHIC)

Exploratory Data Analysis

Our analysis revealed key insights into the datasets, focusing on the distribution of captions, voting patterns, and the relationship between caption rank and funniness scores. Below, we highlight three key visualizations, while also summarizing other analyses conducted during this study.

1. Distribution of Captions per Image:

Distribution of Captions per Image

In the NYCC dataset, 80% of images have between 4,000 and 8,000 captions, with outliers ranging from 1,066 to 15,329 captions. In contrast, the OHIC dataset shows a power-law distribution, where a few images have many captions (up to 70,000), but 80% of images have only around 20 captions.
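A statement such as "80% of images have between X and Y captions" can be read as the range between the 10th and 90th percentiles of the per-image caption counts. The sketch below shows one way to compute it, assuming that interpretation; the function names and the toy counts are hypothetical.

```python
from collections import Counter
from statistics import quantiles


def captions_per_image(image_ids):
    """Count how many captions each image received (one id per caption)."""
    return Counter(image_ids)


def central_80_range(counts):
    """Return the 10th and 90th percentiles of a list of counts."""
    deciles = quantiles(counts, n=10, method="inclusive")
    return deciles[0], deciles[-1]


# Toy data: one image id per caption.
per_image = captions_per_image(["img1", "img1", "img2", "img1", "img3"])
print(per_image["img1"])

# Toy counts for eleven images.
low, high = central_80_range(list(range(1, 12)))
print(low, high)
```

On a heavy-tailed distribution such as OHIC's, this range stays narrow even when the maximum is very large, which is why it is more informative than the minimum and maximum alone.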

2. Normalized Distribution of Votes by Humor Category:

Normalized Distribution of Votes by Humor Category

The distribution of votes across humor categories reveals that "Not Funny" dominates, especially in the mid-range (50–150 votes). "Somewhat Funny" peaks at lower vote counts (10–40), while "Funny" is heavily concentrated near 0–10 votes. This reflects the subjective nature of humor and the difficulty of creating universally appealing captions.

3. Caption Rank vs. Mean Funniness Score:

Caption Rank vs. Mean Funniness Score

Captions with the best ranks (1–10) have distinctly higher mean funniness scores, averaging around 1.9. Beyond rank 10, the scores drop sharply and stabilize near the minimum. Captions that stand out tend to receive higher funniness scores, but the majority remain at the lower end of the scale: most low-ranked captions are uniformly mediocre.
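To make the link between votes, mean score, and rank concrete, the sketch below computes a mean score as a weighted average of the three vote categories and ranks captions by it. The 1/2/3 coding ("Not Funny" = 1, "Somewhat Funny" = 2, "Funny" = 3) is an assumption consistent with the score range quoted above, and the vote counts are invented.

```python
def mean_score(not_funny, somewhat_funny, funny):
    """Weighted average of votes, assuming a 1/2/3 coding of the categories."""
    votes = not_funny + somewhat_funny + funny
    if votes == 0:
        return None  # no votes, no score
    return (1 * not_funny + 2 * somewhat_funny + 3 * funny) / votes


# Invented vote counts: (not_funny, somewhat_funny, funny).
captions = {
    "A": (10, 20, 30),
    "B": (50, 10, 0),
    "C": (30, 30, 0),
}

scores = {name: mean_score(*votes) for name, votes in captions.items()}

# Rank 1 is the caption with the highest mean score.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Under this coding the minimum possible score is 1, so a caption whose votes are almost all "Not Funny" sits just above 1, which matches the plateau seen for low-ranked captions.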

Additional Analyses Conducted:

  • Correlation Heatmap (NYCC Features): Explored the relationships between numeric features, revealing strong correlations between vote categories (e.g., "Funny," "Somewhat Funny," "Not Funny").
  • Evolution of Captions and Votes Over Time: Analyzed how the total number of captions and votes evolved throughout the contest, showing significant variability in engagement across images.
  • Caption Rank vs. Votes: Investigated the relationship between caption rank and the number of votes received, highlighting that higher-ranked captions tend to receive more votes.
  • Boxplot of Mean Scores for Top-Ranked Captions: Visualized the variability in mean scores for top-ranked captions, showing that winning captions can vary significantly in their funniness scores.
  • Monotonic Relationship Between Rank and Funniness: Found that top-ranked captions have distinctly higher funniness scores, while lower-ranked captions stabilize near the minimum. The difference between mid- and low-ranked captions becomes less perceptible.

NYCC Insights:

  • The NYCC dataset has a more uniform distribution of captions per image, with most images receiving a moderate number of captions. This suggests a relatively balanced engagement across contests.
  • Higher-ranked captions tend to receive more votes across all categories ("Funny," "Somewhat Funny," and "Not Funny"), indicating that visibility plays a significant role in voting dynamics.
  • The audience for NYCC appears to be highly critical, with the majority of votes falling into the "Not Funny" category, reflecting the subjective nature of humor.

OHIC Insights:

  • The OHIC dataset is highly unbalanced, with a few images receiving a disproportionately large number of captions, while most images receive very few. This highlights the uneven engagement in this dataset.
  • Humor perception in OHIC follows a power-law distribution, where a small subset of captions dominates the "Funny" category, while the majority are rated as "Not Funny."
  • The dataset's focus on internet memes may contribute to its variability, as memes often rely on niche cultural references that may not resonate universally.

Overall, we observe that humor is highly subjective, with most captions receiving low funniness scores. Captions with higher visibility tend to attract more votes across all categories, highlighting the role of exposure in voting dynamics. The NYCC dataset shows more balanced engagement due to its structured contests, while OHIC's meme-based content results in greater variability and imbalanced participation. These findings underscore the challenges of analyzing humor quantitatively and the importance of considering context and audience dynamics.

Quiz Time!

Question 2: What is the most common rating for captions in both datasets?

Not Funny
Somewhat Funny
Funny

Caption Length and Funniness

Our analysis shows that shorter captions tend to be funnier. The most frequent caption length is around 8 words, representing concise, one-sentence jokes. Longer captions, while occasionally effective, often lose their punch and are less likely to be rated as funny.

1. Distribution of Caption Lengths:

Distribution of Caption Lengths (in Words)

The distribution of caption lengths is right-skewed, with most captions being short. The most frequent length is around 8 words, while very few captions exceed 20 words. This suggests that brevity is a hallmark of humor, as punchy, economical language tends to dominate.

2. Caption Length vs. Mean Funniness Votes:

Caption Length vs. Mean Funniness Votes

Our scatterplot analysis, which focuses on captions below 50 words, revealed a weak negative trend: as captions get longer, their average funniness tends to decrease slightly. Shorter captions (under 20 words) are generally rated as funnier, as concise, punchy humor tends to perform better. This trend suggests that brevity might play a role in crafting more effective jokes, though the relationship is not very strong.
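One simple way to quantify such a trend is a Pearson correlation between caption length in words and funniness score. The sketch below is only an illustration: the captions and scores are invented, and the original analysis may have used a different statistic.

```python
from math import sqrt


def word_count(caption):
    """Caption length in whitespace-separated words."""
    return len(caption.split())


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented (caption, mean funniness score) pairs.
sample = [
    ("Taxi!", 1.9),
    ("I said no pickles.", 1.6),
    ("Well, this is not what the brochure promised us.", 1.5),
    ("I know we agreed to try new things this year, but this is honestly too much.", 1.2),
]

# Keep captions below 50 words, as in the scatterplot.
kept = [(word_count(c), s) for c, s in sample if word_count(c) < 50]
lengths = [length for length, _ in kept]
scores = [score for _, score in kept]
r = pearson(lengths, scores)
print(round(r, 3))
```

A value of r close to 0 would correspond to the weak relationship described above, while the sign indicates its direction.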

Quiz Time!

Question 3: What is the most frequent caption length in the New Yorker dataset?

Less than 5 words
8 words
20+ words

What Makes a Winning Caption?

Winning captions attract significantly more votes than the average caption. Despite their success, they can still polarize readers: they often receive a mix of positive and negative votes, showing that even the most popular captions do not appeal to everyone.

1. Total Votes vs. Winning Votes:

Total Votes vs. Winning Votes (Log Scale)

Winning captions dominate the voting landscape, gathering around 100 times more votes than the average caption. This dominance is evident in the log-scale plot, where winning captions consistently attract a significant share of total votes. On average, they concentrate around 1% of all votes, highlighting their ability to capture the audience's attention and engagement.
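The two quantities in this paragraph are tied together by simple arithmetic: the ratio between the winner's votes and the average caption's votes equals the winner's share of all votes multiplied by the number of captions. The sketch below checks this on invented vote counts; the function name is hypothetical.

```python
def winner_stats(votes, winner_index):
    """Return (share of all votes, ratio to the average caption's votes)."""
    total = sum(votes)
    average = total / len(votes)
    winner = votes[winner_index]
    return winner / total, winner / average


# Invented vote counts for five captions; the first one is the winner.
votes = [600, 100, 100, 100, 100]
share, ratio = winner_stats(votes, winner_index=0)
print(share, ratio)
```

Because ratio = share × number of captions, a contest with thousands of captions can have a winner that holds only a small share of all votes and still collects many times more votes than the average caption.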

2. Correlation Between Vote Categories:

Correlation Between Vote Categories

Our analysis revealed a clear correlation between the number of votes in each category ("Funny," "Somewhat Funny," "Not Funny") for winning captions. This indicates that winning captions tend to attract attention across all voting categories, further emphasizing their polarizing nature. Even the most popular captions evoke mixed reactions, reflecting the subjective nature of humor.

Interestingly, winning captions still vary in their funniness ratings even though they receive the highest scores, a further reminder that no caption appeals to everyone.

Quiz Time!

Question 4: What percentage of total votes do winning captions typically attract?

0.1%
1%
10%

Identifying Humor Types

Humor is deeply subjective, and its classification presents unique challenges. To address this, we categorized captions into six predefined humor types: Affiliative, Sexual, Offensive, Irony/Satire, Absurdist, and Dark. These categories were chosen to reduce ambiguity, ensure clear distinctions, and make the classification process manageable.

The Challenge of Subjectivity and Overlap: Humor styles often overlap, and jokes can fit into multiple categories simultaneously. For example, a caption might be both Dark and Ironic. To address this, we constrained the classification to six clear types, ensuring consistency and interpretability.

The Role of Context: Context plays a critical role in humor interpretation. Without visual or situational context, even advanced models struggle to classify certain captions accurately. For instance, the phrase "That's nice of you to arrest me" requires visual context to confirm its ironic intent.

1. Confusion Matrix

Confusion Matrix: Auto-labeler vs. Human Labels

The confusion matrix compares the auto-labeler's predictions with human-labeled data. The model achieved an impressive 98.6% accuracy, with high precision in categories like Sexual and Offensive humor. Minor misclassifications occurred in overlapping categories, such as Affiliative vs. Dark humor.
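For readers unfamiliar with the metric, accuracy is the share of captions on the diagonal of the confusion matrix, that is, captions for which the auto-labeler and the human agree. The sketch below illustrates this with invented labels; it is not the project's evaluation code.

```python
from collections import Counter


def confusion_counts(human_labels, predicted_labels):
    """Count (human label, predicted label) pairs."""
    if len(human_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    return Counter(zip(human_labels, predicted_labels))


def accuracy(counts):
    """Fraction of pairs where the prediction matches the human label."""
    total = sum(counts.values())
    correct = sum(n for (human, predicted), n in counts.items() if human == predicted)
    return correct / total if total else 0.0


# Invented labels for five captions.
human = ["Dark", "Affiliative", "Irony/Satire", "Absurdist", "Affiliative"]
auto = ["Dark", "Dark", "Irony/Satire", "Absurdist", "Affiliative"]

counts = confusion_counts(human, auto)
print(counts[("Affiliative", "Dark")])  # Affiliative captions labeled as Dark
print(accuracy(counts))
```

Off-diagonal cells such as ("Affiliative", "Dark") show which categories the labeler confuses, which a single accuracy figure cannot reveal.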

2. Humor Type Analysis

Humor Type Distributions (NYCC vs. OHIC)

Radar Chart: Humor Type Proportions (NYCC vs. OHIC)

The analysis of humor types reveals distinct patterns between the datasets, reflecting their unique contexts and audiences:

  • NYCC: Dominated by Irony/Satire (34.6%) and Absurdist humor (32.4%). These humor types reflect the New Yorker’s focus on subtle, layered humor that often relies on intellectual engagement and cultural critique. The prevalence of Irony/Satire suggests that NYCC captions frequently use wit and indirect commentary to address societal norms, while Absurdist humor highlights the use of surreal or illogical scenarios to provoke thought and amusement.
  • OHIC: Dominated by Affiliative humor (48%), which emphasizes relatability and shared emotions. This humor type often draws on everyday experiences and interpersonal dynamics, making it more accessible to a broader, internet-based audience. The focus on Affiliative humor in OHIC reflects its tendency to create connections through lighthearted and emotionally resonant content.

These differences highlight the contrasting humor strategies of the two datasets. NYCC humor leans toward intellectual subtext, often requiring readers to interpret subtle cues and layered meanings. In contrast, OHIC humor is more direct and emotionally engaging, relying on relatability and shared cultural experiences to connect with its audience. This divergence underscores the influence of context and audience on the thematic and stylistic choices in humor.

Overall, our analysis demonstrates the distinct humor profiles of the NYCC and OHIC datasets. NYCC humor relies on subtlety and intellectual engagement, while OHIC humor emphasizes relatability and directness. These findings underscore the importance of context, audience, and cultural factors in shaping humor styles.

Quiz Time!

Question 5: Which humor type is most prevalent in the Oxford dataset?

Irony/Satire
Affiliative
Absurdist

Identifying Topics in Captions

We classified captions into predefined topics (e.g., Love, Family, School, Work, Politics and Social Issues, Nature, Food, Emotions, Entertainment and Pop Culture, Caption Contest, Health, Law, Other) to explore the thematic dimensions of humor in the datasets.

We used a sentence embedding approach to classify captions into topics. For a detailed explanation of the classification process, see the Models & Methods: Topic Classification section.
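The general idea of embedding-based topic classification can be sketched as follows: embed the caption and a short description of each topic, then assign the topic whose vector is most similar to the caption's. In the sketch below a bag-of-words count vector stands in for a real sentence embedding model so that the example runs with the standard library alone; the topic descriptions, the similarity threshold, and the fallback to "Other" are assumptions, not the project's actual configuration.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical topic descriptions; the real ones are not given in this page.
TOPIC_DESCRIPTIONS = {
    "Food": "food restaurant pizza menu eat dinner",
    "Work": "work job boss office meeting",
    "Health": "health doctor hospital medicine",
}


def embed(text):
    """Stand-in embedding: bag-of-words counts instead of a sentence model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(caption, threshold=0.1):
    """Return the most similar topic, or "Other" below the threshold."""
    vector = embed(caption)
    scored = [(topic, cosine(vector, embed(desc))) for topic, desc in TOPIC_DESCRIPTIONS.items()]
    best_topic, best_score = max(scored, key=lambda pair: pair[1])
    return best_topic if best_score >= threshold else "Other"


print(classify("My boss moved the meeting to the office kitchen."))
print(classify("Nice weather today."))
```

With a real sentence embedding model, captions that share no words with a topic description can still be matched through meaning, which the word-count stand-in cannot do.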

1. Topic Distributions:

Topic Distributions (NYCC vs. OHIC)

We also created an interactive pie chart to visualize the topic distributions in both datasets. You can explore it below:

This figure compares the normalized topic distributions of captions from NYCC and OHIC. While both datasets cover a similar range of themes, clear differences emerge in their relative emphasis across topics.

NYCC captions show a stronger concentration in Food, Work, Nature, and Health. Food is the most prominent topic, reflecting shared cultural experiences around eating, while Work and Nature highlight professional life and environmental humor.

In contrast, OHIC captions are dominated by Caption Contest, reflecting its self-referential and meme-driven nature. OHIC also emphasizes School, Emotions, Entertainment and Pop Culture, Law, and Love, showcasing a more expressive and pop-culture-oriented humor style.

Politics and Social Issues are limited in both datasets but slightly more represented in OHIC. These differences highlight contrasting humor strategies: NYCC favors situational and observational humor, while OHIC leans toward self-referential and emotionally expressive humor. A limitation of this analysis is that the content of the images, which could heavily influence caption themes, is not considered.

2. Topic vs. Funniness Scores:

Topic vs. Funniness Scores

This figure shows the distribution of funny scores across topics for both NYCC and OHIC captions. The distributions overlap substantially, which suggests that humor appreciation is influenced more by caption quality, phrasing, and context than by topic alone.

For most topics, including Food, Entertainment and Pop Culture, Caption Contest, and Law, NYCC and OHIC captions achieve similar median funny scores, indicating comparable humorous effectiveness. However, Politics and Social Issues is the only topic where OHIC captions outperform NYCC, likely due to a more direct engagement with contemporary political content. In contrast, NYCC captions score higher for topics like Love, Family, School, Nature, and Health, suggesting more consistent humor in these themes.

The high variance in funny scores across all topics, as shown by wide interquartile ranges and long whiskers, highlights that humor effectiveness depends on execution rather than thematic category. The Other category, as expected, shows the widest and most uneven distribution due to its heterogeneous nature.

Overall, the topic classification highlights the diverse thematic dimensions of humor in the datasets. Relational topics like Love and Family resonate broadly with audiences, with NYCC captions achieving higher funniness scores in these themes. Institutional topics like Health and Law provide humor rooted in societal critique, with both datasets showing comparable effectiveness in Law. The prominence of topics like Nature in NYCC reflects its focus on observational humor, while the higher representation of Politics and Social Issues in OHIC highlights its engagement with contemporary political content. These findings underscore the role of context, stylistic choices, and audience preferences in shaping the thematic content and reception of humor.

Quiz Time!

Question 6: Which topic is the only one where OHIC captions outperform NYCC in funniness scores?

Love
Nature
Politics

Humor Types by Topic

This heatmap shows how the type of humor used in cartoon captions varies with topic. The analysis highlights the interplay between humor types and thematic topics, showcasing how context shapes humor style.

Proportion of Humor Types per Topic

Key insights from the heatmap include:

  • Irony/Satire: Dominates across all topics, especially in Politics and Social Issues and Work. Political humor is primarily critical or observational, while office-related humor often highlights the gap between corporate expectations and reality.
  • Absurdist: Prominent in Food and Nature, often stemming from personifying animals or objects (e.g., a talking leaf). It also plays a significant role in Caption Contest captions, aligning with the visually bizarre nature of the cartoons.
  • Affiliative: Most prominent in interpersonal topics like School, Family, and Love. This humor type fosters relatability, creating a "we've all been there" connection with the audience.
  • Dark: While generally low across the dataset, it peaks in Health and Emotions, reflecting the grim reality and fatalistic attitude associated with health issues and psychological themes like therapy and depression.
  • Offensive and Sexual: Extremely low across all topics, except for Love in the case of Sexual humor. This reflects NYCC’s editorial standards and its high-brow audience, which prioritizes indirect humor over visceral humor.

Overall, this analysis underscores how humor types adapt to the thematic context of captions. While some topics favor specific humor styles (e.g., Irony in Politics), others like Food and Nature lean toward Absurdist humor, reflecting the creative interplay between visuals and captions.

Word Clouds

Word clouds reveal the most frequent words associated with each humor type and topic, providing a quick overview of recurring themes in the datasets. These visualizations help identify the dominant vocabulary and recurring patterns in captions.

1. Word Clouds for Humor Type:

For humor types, the word clouds highlight the distinct vocabulary used in different styles of humor, reflecting the unique contexts and audiences of the NYCC and OHIC datasets:

Word Clouds for Humor Types (NYCC)
Word Clouds for Humor Types (OHIC)

  • Affiliative: In NYCC, words like "home," "feel," and "right" emphasize a sense of belonging and emotional connection. In OHIC, words like "meme," "day," and "mom" reflect humor rooted in everyday life and internet culture.
  • Irony/Satire: NYCC captions use words like "sure," "meant," and "guess," highlighting subtle and indirect commentary. OHIC captions, with words like "money," "right," and "upvote," focus on social critique and internet-specific humor.
  • Absurdist: NYCC captions feature words like "sorry," "little," and "way," showcasing surreal or illogical scenarios. OHIC captions, with phrases like "oh man," "eat," and "right," lean into exaggerated and nonsensical humor.
  • Dark: Both datasets share a focus on grim themes, with words like "death," "die," and "life" in NYCC and "kill," "dead," and "die" in OHIC, reflecting humor that makes light of taboo or morbid topics.
  • Sexual: NYCC captions use words like "big," "size," and "honey," suggesting subtle innuendos. OHIC captions, with words like "girl," "ass," and "sex," are more explicit and direct in their approach.
  • Offensive: NYCC captions include words like "Trump," "mother," and "white," reflecting political and societal critique. OHIC captions, with words like "gay," "stupid," and "karen," lean into internet-driven, provocative humor.

This analysis highlights the contrasting humor strategies of the two datasets. NYCC humor often relies on subtlety, cultural critique, and layered meanings, while OHIC humor is more direct, internet-centric, and emotionally charged. These differences underscore the influence of context and audience on the vocabulary and themes used in humor.

2. Word Clouds for Topics:

For topics, the word clouds illustrate the thematic focus of captions:

Word Clouds for Topics (NYCC)
Word Clouds for Topics (OHIC)

NYCC Caption Dataset:

  • Love: Dominated by words like kiss, heart, marriage, and sex, reflecting interpersonal relationships and modern dating life.
  • Family: Features terms like home, Mom, and kid, emphasizing domestic life and parental relationships.
  • School: Includes school, class, and college, highlighting educational settings and recurring routines.
  • Work: Words like job, boss, and office point to workplace stress and corporate culture.
  • Politics and Social Issues: Focuses on politics, debate, and climate change, emphasizing systemic critique.
  • Nature: Includes trees, planet, and global warming, blending environmental concerns with humor.
  • Food: Features restaurant, pizza, and menu, reflecting shared cultural experiences around eating.
  • Emotions: Marked by introspective terms like feel, cry, and therapy, emphasizing emotional vulnerability.
  • Entertainment and Pop Culture: Includes movie, TV, and star, reflecting humor grounded in media and celebrity culture.
  • Caption Contest: Self-referential terms like caption, joke, and funny dominate.
  • Health and Law: Specialized vocabularies focus on doctor, hospital, lawyer, and court.
  • Other: Generic terms like thing and people, serving as a residual category.

OHIC Caption Dataset:

  • Love: Words like relationship, crush, and friend emphasize modern dating culture and youth-oriented humor.
  • Family: Informal terms like mom, dad, and house focus on everyday domestic situations.
  • School: Includes teacher, homework, and exam, reflecting shared student struggles.
  • Work: Words like job, boss, and money highlight time pressure and compensation.
  • Politics and Social Issues: Features government, election, and freedom, with explicit references to contemporary figures like Trump.
  • Nature: Focuses on tree, sky, and rain, emphasizing situational contrasts over climate discourse.
  • Food: Includes chicken, Taco Bell, and pizza, reflecting humor tied to pop culture and consumption habits.
  • Emotions: Words like sad, happy, and anxiety emphasize explicit emotional expression.
  • Entertainment and Pop Culture: Features anime, Fortnite, and Disney, relying on internet culture and fandoms.
  • Caption Contest: Includes meme, post, and funny, highlighting meta-awareness of captioning as a social activity.
  • Health: Pandemic-related terms like covid and vaccine dominate.
  • Law: Includes lawyer, court, and crime, framed through dialogue and interpersonal tension.
  • Other: Generic conversational tokens like day and thing.

Quiz Time!

Question 7: Based on the NYCC word clouds, which topic is most associated with the phrase "global warming"?

Nature
Politics
Health

Stylistic Analysis

Our stylistic analysis focused on three key aspects of the captions: word length distributions, formality scores, and part-of-speech distributions. These metrics provide insights into the linguistic characteristics of captions in the NYCC and OHIC datasets, highlighting differences in tone, structure, and word usage.

1. Word Length Distributions:

Word Length Distribution: NYCC vs OHIC

The word length distributions reveal that NYCC captions are more concentrated in length, particularly for the most highly ranked captions, while OHIC captions show a broader distribution. Interestingly, longer captions are more represented among the funniest captions in the OHIC dataset, suggesting a preference for more elaborate humor.

2. Part-of-Speech Distributions:

The part-of-speech distributions highlight differences in word usage between the two datasets. NYCC captions feature a higher density of nouns, adjectives, and verbs, reflecting a more descriptive and action-oriented style. OHIC captions, on the other hand, show a higher density of interjections and unknown words, indicative of a more casual and expressive tone. The higher proportion of interjections in OHIC aligns with its internet-based context, where captions often rely on exaggerated expressions and slang.

3. Formality Score Distributions:

The formality score distributions reveal distinct patterns between the two datasets. Captions in the NYCC dataset tend to have higher formality scores, with a peak in the mid-range, reflecting a balance between conversational and formal tones. In contrast, captions in the OHIC dataset exhibit a broader distribution, with a higher density of lower formality scores, indicating a more casual and internet-driven tone. This difference aligns with the institutional context of NYCC and the internet-based, meme-driven nature of OHIC.

Notably, the mean formality of NYCC captions is slightly higher than that of a U.S. presidential speech, while the mean formality of OHIC captions is comparable to that of such a speech and sits just above that of a script from "The Simpsons". Additionally, the formality scores for OHIC captions show higher variability compared to NYCC, reflecting a broader range of tones in the OHIC dataset. However, both datasets exhibit much lower formality than a typical New Yorker article.

These stylistic differences underscore the influence of context and audience on the linguistic characteristics of captions. While NYCC captions balance formality and descriptive language to appeal to a broader audience, OHIC captions embrace a more casual and expressive style, resonating with internet culture.

Quiz Time!

Question 8: Which dataset shows a higher density of interjections?

OHIC
NYCC

How Humor Evolves Over Time

The evolution of humor topics in the New Yorker captions reveals significant trends over time, reflecting societal events and cultural shifts. By analyzing the proportions of topics across years, we can observe how external factors influence the thematic focus of captions.

1. Humor Types Proportions Over Time:

Proportion of Humor Types per Year (%)

This figure shows the yearly proportions of humor types in NYCC captions. The shift in humor styles reveals how the public adapted to the "shared challenges" of the last decade:

  • 2016–2019: The "Ironic" Era: This period is dominated by Irony/Satire (peaking at 55.3% in 2017) and Absurdist humor. Interpretation: Users relied on sophisticated irony and surrealism to critique a turbulent political landscape. The high levels of absurdism suggest a preference for "nonsense" as a way to handle a world that felt increasingly illogical, while maintaining emotional distance.
  • 2020–2021: The Reorientation: A massive structural shift occurred in 2021, with Affiliative humor jumping from 7.3% to 33.5%. Interpretation: This period marked a moment of reconnection. As the initial shock of the pandemic faded into a "new normal," humor shifted from "cleverness" to "shared humanity," prioritizing jokes that build bridges over jokes that mock or distort.
  • 2022–2024: The "Relatability" and "Provocation" Era: Affiliative humor reached its peak in 2023 (43.9%), suggesting a lasting shift toward "bottom-up" relatability. Interpretation: After prolonged isolation, the collective priority shifted to humor that validates shared social experiences. Additionally, 2023 saw a spike in Sexual (5.0%) and Offensive humor, reflecting a "social unmasking" as the public moved past restrictions and tested the limits of new norms.

2. Topic Proportions Over Time:

Normalized Topic Proportions per Year (NYCC)

This figure presents a line plot of normalized topic distributions in NYCC captions over time. It highlights the dominance of certain themes, such as Food and Caption Contest, which consistently maintain high proportions across the years. These topics reflect universal and timeless themes that resonate broadly with audiences. Other topics, such as Love, Family, and Nature, remain steady over time, with occasional spikes reflecting specific cultural or societal moments.

3. Topic Variation Heatmap:

This figure presents a heatmap of the standardized variation (z-scores) of topic proportions in NYCC captions over time. Deviations from the mean highlight periods during which certain themes became unusually prominent or marginal, often aligning with major social, political, and global events.
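A z-score expresses how far a topic's proportion in a given year lies from that topic's own mean across years, in units of its standard deviation: z = (p − mean) / std. The sketch below computes it for one topic, assuming the population standard deviation is used; the yearly proportions are invented and the function name is hypothetical.

```python
from statistics import mean, pstdev


def zscores_by_year(proportions):
    """Standardize one topic's yearly proportions: (p - mean) / std."""
    values = list(proportions.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return {year: 0.0 for year in proportions}  # flat series, no variation
    return {year: (p - mu) / sigma for year, p in proportions.items()}


# Invented yearly proportions for a single topic.
health = {2019: 0.10, 2020: 0.20, 2021: 0.30}
z = zscores_by_year(health)
print({year: round(value, 2) for year, value in z.items()})
```

Because each topic is standardized against its own history, a small topic and a large topic become comparable on the same color scale, but a high z-score does not mean the topic is large in absolute terms.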

The heatmap reveals several notable trends:

  • Love and Family: These themes remain steady over the years, with sharp rises in 2019 for Love and 2022 for Family, possibly reflecting cultural shifts or societal well-being.
  • School: Mentions of School increase substantially in 2021 and 2023, aligning with Covid’s long-term impact on education and the launch of ChatGPT, which affected university students.
  • Work: Overrepresented in 2016, reflecting pre-pandemic concerns about corporate culture, with renewed interest in 2023 linked to post-pandemic work reorganization.
  • Politics and Social Issues: Peaks in 2016 and 2019 align with the U.S. presidential election and Trump impeachment proceedings, followed by a decline after 2020 as political satire gave way to more personal themes.
  • Nature: Peaked in 2024, reflecting increased public concern over climate change and environmental issues.
  • Food: Consistently the most prominent topic, reflecting its universal appeal and cultural significance.
  • Emotions: Declined in 2017 but rose steadily afterward, capturing post-pandemic emotional reflection and a broader openness to discussing mental health.
  • Entertainment and Pop Culture: Steady over the years, with a slight increase from 2022, reflecting the rise of TikTok, Instagram reels, and global events like the Oscars and World Cup.
  • Caption Contest: Dominates NYCC captions alongside Food, with a sharp rise in 2024, possibly due to increased audience familiarity with the contest format.
  • Health: Peaked in 2020 and 2023 due to the Covid pandemic and its aftermath, with a decline in 2024 reflecting pandemic fatigue.
  • Law: Peaked in 2016 during election-related legal disputes, with a decline in 2022–2024 as media focus on high-profile court cases waned.

Overall, these analyses demonstrate how external events and societal changes shape the thematic focus of humor over time. By examining these trends, we gain insights into the dynamic interplay between cultural context and humor.

Quiz Time!

Question 9: What influenced the rise in captions about politics and law in 2016?

The U.S. presidential election
The global pandemic
Brexit

Conclusion: NYCC and OHIC Comparison

The New Yorker Caption Contest dataset (NYCC) and Oxford Humor in Context dataset (OHIC) share similar humor distributions, but they differ in context, audience preferences, and humor styles. Below is a comparison of the two datasets:

| Aspect | NYCC | OHIC |
| --- | --- | --- |
| Context | Institutional humor, focused on cartoons and structured contests. | Internet culture, focused on memes and pop culture. |
| Audience | Niche, sophisticated audience with a preference for subtle, intellectual humor. | Broader, younger, internet-savvy audience with a preference for relatable and expressive humor. |
| Humor Style | Subtle, clever wordplay, and observational humor. | Direct, absurd, and emotionally expressive humor. |
| Structure | Highly structured, with detailed metadata (e.g., votes, captions, image descriptions). | Less structured, with fewer metadata fields but broader coverage of internet humor. |
| Engagement | Balanced engagement across contests, with most images receiving a moderate number of captions. | Highly uneven engagement, with a few images dominating in captions and votes. |
| Caption Length | Shorter captions (~8 words) perform better, emphasizing brevity and punchiness. | Shorter captions also perform better, though longer captions can succeed with elaborate humor. |
| Stylistic Features | Higher formality, descriptive language, and balanced tone. | Casual, expressive tone with a higher density of interjections and slang. |
| Topic Focus | Dominated by universal themes like Food, Work, and Nature. | Dominated by internet-driven themes like Caption Contest, Entertainment, and Emotions. |
| Temporal Trends | Peaks in Irony/Satire (2016–2019) and Affiliative humor (2020–2024). | Increased focus on Politics and Pop Culture during societal shifts. |
| Humor Distribution | Follows a power-law distribution, with a small subset of captions dominating funniness scores. | Also follows a power-law distribution, with a few captions dominating the "Funny" category. |

This comparison highlights how context influences humor preferences. NYCC leans toward structured, institutional humor, appealing to a niche audience, while OHIC embraces the dynamic, fast-paced nature of internet culture, resonating with a broader audience.

Quiz Time!

Question 10: Which dataset shows highly uneven engagement, with a few images dominating in captions?

New Yorker Caption Contest (NYCC)
Oxford Humor in Context (OHIC)
