Datasets

New Yorker Caption Contest (NYCC):

Data sourced from the New Yorker Caption Contest archives, containing captions submitted to the weekly contest.

Oxford Humor in Context (OHIC):

Data sourced from the Oxford University Research Archive, containing approximately 2.9 million humorous image-text pairs.

Tools and Libraries

Python

Used for data preprocessing, analysis, and modeling.

Pandas, NumPy

Libraries for data manipulation and numerical computations.

Matplotlib, Seaborn

Libraries for data visualization.

SciPy

Used for statistical analysis, including correlation computations.

Plotly

Library for creating interactive visualizations.

Hugging Face Transformers

Used for generating embeddings and zero-shot classification.

Sentence Transformers

Used for generating sentence embeddings.

HTML5, CSS3, JavaScript

Technologies used to design and develop this interactive website.
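
To give a feel for the correlation computations mentioned under SciPy above, here is a minimal sketch using Pandas and SciPy. The column names (caption_length, mean_rating) and all values are made up for illustration and are not taken from the NYCC or OHIC data; the actual analysis may use different variables and tests.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data: column names and values are invented for illustration,
# not drawn from the project's datasets.
df = pd.DataFrame({
    "caption_length": [5, 8, 12, 7, 15, 10],
    "mean_rating": [1.9, 1.7, 1.4, 1.8, 1.1, 1.5],
})

# Pearson measures linear association; Spearman measures monotonic (rank) association.
pearson_r, pearson_p = stats.pearsonr(df["caption_length"], df["mean_rating"])
spearman_rho, spearman_p = stats.spearmanr(df["caption_length"], df["mean_rating"])

print(f"Pearson r = {pearson_r:.3f} (p = {pearson_p:.3g})")
print(f"Spearman rho = {spearman_rho:.3f} (p = {spearman_p:.3g})")
```

Reporting both coefficients is a common choice when the relationship may be monotonic without being strictly linear, as is often the case with rating data.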

Explore More

Dive deeper into our datasets, methods, and findings: