AmericasNLP 2026 Shared Task: Cultural Image Captioning for Indigenous Languages

The AmericasNLP 2026 Shared Task challenges participants to develop systems that generate accurate, culturally grounded captions for images depicting Indigenous cultures of the Americas, written in the Indigenous languages themselves.

GitHub: https://github.com/AmericasNLP/americasnlp2026

Motivation

Many Indigenous languages of the Americas are endangered and lack the resources needed to train NLP systems effectively. Language communities are actively pursuing revitalization, but creating culturally grounded teaching materials is expensive and time-consuming. Image captioning systems present an opportunity to generate such materials at scale, but doing so requires not only linguistic competence but also cultural knowledge — understanding the people, traditions, and contexts depicted in the images.

Task Description

Participants are given a dataset of culturally situated images, each paired with a caption in the associated Indigenous language. The goal is to generate captions for unseen images.

Example:

Image: A wooden structure
Target caption (Wixárika): Ik+ kareta m+ya kaxetuni wixárika wapait+ yu +kú puti utá, uti xainék+ metá tsiere manapait+ rá ye hupú.
English: The so-called carretón, built specifically to store food like corn, is also used as housing for people.

Rules

Evaluation

We adopt a two-stage evaluation protocol:
  1. Stage 1: All systems are ranked using ChrF++.
  2. Stage 2: The top-5 systems are evaluated by human judges according to a fixed set of criteria.
Participants may submit systems for as many languages as they like; each language is evaluated and ranked separately. We provide an evaluation script and a baseline system to help participants get started.
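To make the two-stage protocol concrete, here is a minimal sketch of Stage 1 ranking and the top-5 cut. It is not the official evaluation script. `simple_chrf_pp` is a hypothetical, simplified character and word n-gram F-score in the spirit of ChrF++ (character n-grams up to order 6, word n-grams up to order 2, beta = 2). Averaging sentence-level scores is also an assumption. Scores from the provided evaluation script may differ, so use that script for any reported numbers.

```python
from collections import Counter


def char_ngrams(text, n):
    # Character n-grams with whitespace removed (a simplifying assumption).
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def word_ngrams(text, n):
    # Word n-grams over whitespace-separated tokens.
    words = text.split()
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def simple_chrf_pp(hypothesis, reference, char_order=6, word_order=2, beta=2.0):
    """Simplified ChrF++-style score in [0, 100]; illustrative only."""
    extractors = [(char_ngrams, n) for n in range(1, char_order + 1)]
    extractors += [(word_ngrams, n) for n in range(1, word_order + 1)]

    precisions, recalls = [], []
    for extract, n in extractors:
        hyp, ref = extract(hypothesis, n), extract(reference, n)
        if not hyp or not ref:
            continue  # skip orders that one side is too short to produce
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))

    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)


def rank_systems(outputs, references, top_k=5):
    """Stage 1: rank systems by mean score; return ranking and Stage 2 finalists."""
    scores = {}
    for name, hypotheses in outputs.items():
        per_caption = [simple_chrf_pp(h, r) for h, r in zip(hypotheses, references)]
        scores[name] = sum(per_caption) / len(per_caption)
    ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    finalists = [name for name, _ in ranking[:top_k]]
    return ranking, finalists
```

Because each language is evaluated separately, a ranking like this would be computed once per language, with `outputs` mapping each (hypothetical) system name to its captions for that language's test images.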

Languages

Language        Region
Bribri          Costa Rica
Guaraní         Paraguay
Yucatec Maya    Mexico
Wixárika        Mexico

Data

Pilot

Pilot data is available under data/pilot/. Each dataset is provided as a JSONL file with corresponding images. See data/pilot/wixarika.jsonl for an example.
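A JSONL file stores one JSON object per line, so it can be read with the standard library alone. The sketch below is one possible loader. The helper name `load_jsonl` is illustrative, and the field names inside each record are not specified here, so the usage example only prints whatever keys the first record contains rather than assuming a schema.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSONL file into a list of dicts, one per non-empty line."""
    records = []
    # Explicit UTF-8 so characters in the Indigenous-language captions
    # are decoded the same way on every platform.
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as error:
                raise ValueError(f"{path}:{line_number}: invalid JSON ({error})") from error
    return records


# Path taken from the text above; the guard keeps the sketch runnable
# outside a checkout of the repository.
pilot_path = Path("data/pilot/wixarika.jsonl")
if pilot_path.exists():
    pilot_records = load_jsonl(pilot_path)
    print(len(pilot_records), "records")
    print(sorted(pilot_records[0].keys()))  # inspect the actual field names
```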

⚠ Note: The pilot data includes Spanish captions for reference only. Spanish captions will not be included in the development or test sets, so systems should not rely on them.

Development

Development data is available under data/dev/ for Bribri, Guaraní, Yucatec Maya, and Wixárika. Each language folder contains a JSONL file and corresponding images.

Important Dates

Date                 Milestone
February 20, 2026    Release of pilot data and baseline system
March 1, 2026        Release of development sets (50 examples)
April 1, 2026        Release of surprise languages
April 20, 2026       Release of test sets
May 1, 2026          Submission of results (shared task deadline)
May 8, 2026          Winner announcement
May 13, 2026         Submission of system description paper
May 15, 2026         Acceptance notification for system description papers
May 22, 2026         Camera-ready version due

All deadlines are 11:59 PM UTC-12 (Anywhere on Earth, AoE).
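Since AoE is 12 hours behind UTC, each deadline falls on the following calendar day in UTC. As a small sketch, the conversion can be checked with the standard library; the helper name `aoe_deadline_in_utc` is illustrative.

```python
from datetime import datetime, timedelta, timezone

# Anywhere on Earth (AoE) is a fixed offset of UTC-12.
AOE = timezone(timedelta(hours=-12), name="AoE")


def aoe_deadline_in_utc(year, month, day):
    """Return 11:59 PM AoE on the given date, expressed in UTC."""
    local_deadline = datetime(year, month, day, 23, 59, tzinfo=AOE)
    return local_deadline.astimezone(timezone.utc)


# Shared task submission deadline: May 1, 2026 (AoE).
print(aoe_deadline_in_utc(2026, 5, 1).isoformat())
# prints 2026-05-02T11:59:00+00:00
```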

Registration

If you are interested in participating, please register here: Google Form

Contact

americas.nlp.workshop@gmail.com
Design: Rebeca Guerrero and Manuel Mager