MLSP 2024 : Multilingual Lexical Simplification Pipeline (MLSP) Shared Task @ 19th Workshop on Innovative Use of NLP for Building Educational Applications

Mexico City

Event Date:	June 21, 2024 - June 21, 2024
Submission Deadline:	March 25, 2024

Call for Papers

The organisers are pleased to announce a new shared task, inviting participants to contribute novel systems for a Multilingual Lexical Simplification Pipeline. This task comprises lexical complexity prediction and lexical simplification, uniting these two core simplification tasks into a single pipeline. We invite participants to develop new lexical simplification systems for these two tasks in a variety of high- and low-resource languages (listed below).

This shared task will be hosted at the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024), which will be co-located with the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024) in Mexico City on June 21-22, 2024.

Lexical complexity prediction was previously explored in the LCP 2021 shared task, hosted as part of SemEval 2021 (Shardlow et al. 2021). This task requires participants to judge the difficulty of a given target word within its context on a continuous scale from 0 (easy to understand) to 1 (hard to understand).

Lexical simplification was recently explored in the TSAR 2022 shared task (Saggion et al. 2022), hosted as part of the Text Simplification, Accessibility and Readability Workshop at EMNLP 2022. In this task, systems must provide easier-to-understand alternatives for an identified complex word in its context.

The lexical simplification pipeline unites these two tasks. Given a sentence with a marked token, the system must first predict the complexity of that token and then provide potential simpler alternatives for it, or none if the token is judged not to require simplification. By co-developing systems to perform these tasks jointly, participants will create a working lexical simplification pipeline that can be applied in settings such as education to improve the readability of texts for learners.
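The two-step flow described above could be sketched roughly as follows. This is an illustration only: `simplify_token`, the two callables passed to it, and the `threshold` value of 0.5 are hypothetical names and choices, not part of the task definition, which does not prescribe how a system decides that a token needs no simplification.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures: each model receives the context plus the
# begin/end offsets of the marked token.
ComplexityModel = Callable[[str, int, int], float]
SubstituteModel = Callable[[str, int, int], List[str]]


def simplify_token(
    context: str,
    begin: int,
    end: int,
    predict_complexity: ComplexityModel,
    generate_substitutes: SubstituteModel,
    threshold: float = 0.5,  # assumed cut-off, not specified by the task
) -> Tuple[float, List[str]]:
    """Step 1: predict complexity. Step 2: propose simpler alternatives,
    or return an empty list when the token is judged simple enough."""
    complexity = predict_complexity(context, begin, end)
    if complexity < threshold:
        return complexity, []
    return complexity, generate_substitutes(context, begin, end)
```

Any complexity predictor and substitute generator with these signatures can be plugged in, so the two components can be developed and evaluated separately before being joined.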

**Languages**

We will provide evaluation data for the following languages:

- English (en)
- French (fr)
- Brazilian Portuguese (pt-br)
- Bengali (bn)
- Sinhala (si)
- Filipino (fil)
- Japanese (jp)
- Italian (it)

We also hope to announce at least three further languages for participation.

Participants are free to submit to one or multiple languages. We strongly encourage submissions from multilingual systems that are capable of handling the languages that we have released and further languages beyond the scope of the task. We will provide a separate ranking for multilingual systems that participate in all languages.

**Dataset Format**

There is now a large number of available resources for simplification tasks such as lexical complexity prediction and lexical simplification. As such, for each language we will provide an unlabelled **test set only**, comprising 570 instances. Labelled trial data comprising 30 instances per language will also be released for the purpose of calibrating systems for the evaluation phase. **We will not release new training data for this task.** Participants are encouraged to make use of the many existing resources for lexical complexity prediction and lexical simplification to train their systems. A list of available resources will be hosted on the shared task website.

Each data instance in the trial data will comprise the following fields: *language, token, begin, end, context, complexity, substitutions*. These are described below:

- Language: The language code for this instance
- Token: The identified (whole-word) token to be evaluated
- Begin: The begin-offset of the token in the context
- End: The end-offset of the token in the context
- Context: The context in which this token appeared, typically (but not limited to) the enclosing sentence
- Complexity: A complexity score bounded in the range 0-1 derived from asking 10 annotators to judge the token in its context on a scale of 1 (easy) to 5 (difficult).
- Substitutions: A list of no more than 10 substitutions ranked by frequency of suggestion by the annotators.
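The text above does not state how the 1 to 5 annotator judgements are mapped onto the 0 to 1 complexity score. One plausible reading, shown here purely as an assumption, is to rescale each rating linearly (1 maps to 0, 5 maps to 1) and average over annotators. The function name `likert_to_complexity` is hypothetical.

```python
from statistics import mean
from typing import Sequence


def likert_to_complexity(ratings: Sequence[int], low: int = 1, high: int = 5) -> float:
    """Average of linearly rescaled ratings (assumed mapping, see lead-in)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < low or r > high for r in ratings):
        raise ValueError(f"ratings must lie in [{low}, {high}]")
    # Rescale each rating to [0, 1] before averaging.
    return mean((r - low) / (high - low) for r in ratings)
```

Under this assumed mapping, ten annotators who all answer 3 would yield a complexity of 0.5.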

Each data instance in the test data will comprise the following fields: *language, token, begin, end, context*. Participant systems will provide the ‘complexity’ and ‘substitutions’ fields in the same format as the trial data.
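As a rough illustration of the fields listed above, an instance might be represented as below. The class name `Instance`, the example sentence, and the convention that offsets are zero-based character positions with an exclusive end are all assumptions; the released data files define the actual format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instance:
    language: str
    token: str
    begin: int
    end: int
    context: str
    # Absent in the test data; to be filled in by participant systems.
    complexity: Optional[float] = None
    substitutions: List[str] = field(default_factory=list)

    def offsets_match(self) -> bool:
        """Check the token against the context slice (assumes exclusive end)."""
        return self.context[self.begin:self.end] == self.token


# Invented example, not taken from the shared task data.
context = "The committee will deliberate tomorrow."
start = context.index("deliberate")
example = Instance("en", "deliberate", start, start + len("deliberate"), context)
```

A check such as `example.offsets_match()` is a cheap way to confirm which offset convention a data file uses before training or evaluating on it.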

**Evaluation**

For Lexical Complexity Prediction, we will evaluate using:

**Root Mean Squared Error** calculated between the system outputs for lexical complexity and the values returned by the annotators. See Shardlow et al. (2021) for details.
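For reference, root mean squared error over paired scores can be computed as in the minimal sketch below. The function name `rmse` and the input layout (two equal-length sequences of floats) are assumptions; the official scorer may differ in details.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], gold: Sequence[float]) -> float:
    """Square root of the mean squared difference between paired scores."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = ((p - g) ** 2 for p, g in zip(predicted, gold))
    return math.sqrt(sum(squared_errors) / len(gold))
```

Lower values are better, and a system that reproduces the annotators' scores exactly obtains 0.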

For Lexical Simplification, we will use two metrics defined in Saggion et al. (2022) as follows:

**MAP@k** compares a ranked list of system-generated substitutes against the set of gold-standard substitutes. MAP@k takes into account the position of the relevant substitutes among the first k generated candidates.

**Accuracy@k@top1** is the percentage of instances where at least one of the k top-ranked substitutes matches the most frequently suggested synonym in the gold data.
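The two ranking metrics could be sketched as follows. This is not the official scorer: the function names are hypothetical, Accuracy@k@top1 is returned as a fraction rather than a percentage, and the normaliser used for average precision (the smaller of k and the number of gold substitutes) is one common formulation chosen here as an assumption. Saggion et al. (2022) give the authoritative definitions.

```python
from typing import List, Sequence, Set


def accuracy_at_k_top1(system: Sequence[List[str]], gold_top1: Sequence[str], k: int) -> float:
    """Fraction of instances whose top-k candidates contain the most
    frequently suggested gold substitute."""
    hits = sum(1 for cands, top in zip(system, gold_top1) if top in cands[:k])
    return hits / len(gold_top1)


def average_precision_at_k(candidates: List[str], gold: Set[str], k: int) -> float:
    """Precision accumulated at each rank holding a gold substitute.
    Normaliser min(k, len(gold)) is an assumption (see lead-in)."""
    if not gold:
        return 0.0
    hits, score, seen = 0, 0.0, set()
    for rank, cand in enumerate(candidates[:k], start=1):
        if cand in gold and cand not in seen:  # ignore repeated candidates
            seen.add(cand)
            hits += 1
            score += hits / rank
    return score / min(k, len(gold))


def map_at_k(system: Sequence[List[str]], gold: Sequence[Set[str]], k: int) -> float:
    """Mean of average_precision_at_k over all instances."""
    return sum(average_precision_at_k(c, g, k) for c, g in zip(system, gold)) / len(gold)
```

Both metrics reward systems that place good substitutes early in the list, so the order of a system's candidates matters as much as their content.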

We will also provide **Human End-to-End Evaluation** for:

**Simplicity**,

**Fluency** and

**Meaning Preservation**.

Human evaluation will take place for the top 5 ranking systems according to the automated metrics. Availability of human evaluation will depend on the recruitment of evaluators from the task participants.

**Participant Registration**

Interested parties can register prior to the Trial Data Release via our [participant registration Google Form](https://sites.google.com/d/151nOTm4Lwla2MXolnTgNSk6hNQoCaruX/p/1BWd0x4Q2v8nBJZSslymvPUd2vkzpwCWO/edit).

Further information will be released through [the MLSP shared task website](https://sites.google.com/view/mlsp-sharedtask-2024/home).

**Timeline**

| Date | Milestone |
| --- | --- |
| Fri Feb 16, 2024 | Trial Data Release |
| Fri Mar 15, 2024 | Test Data Release |
| Mon Mar 25, 2024 | Final Submissions |
| Fri Apr 12, 2024 | System Papers Due |
| Fri Jun 21, 2024 | BEA Workshop |

**Organisers**

| Name | Affiliation |
| --- | --- |
| Matthew Shardlow | Manchester Metropolitan University |
| Marcos Zampieri | George Mason University |
| Kai North | George Mason University |
| Fernando Alva-Manchego | Cardiff University |
| Thomas François | UCLouvain |
| Remi Cardon | UCLouvain |
| Nishat Raihan | George Mason University |
| Tharindu Ranasinghe | Aston University |
| Joseph Imperial | University of Bath |
| Riza Batista-Navarro | University of Manchester |
| Adam Nohejl | NAIST |
| Yusuke Ide | NAIST |
| Akio Hayakawa | Universitat Pompeu Fabra |
| Laura Occhipinti | University of Bologna |
| Horacio Saggion | Universitat Pompeu Fabra |

**References**

Saggion, H., Štajner, S., Ferrés, D., Sheang, K.C., Shardlow, M., North, K. and Zampieri, M., 2022, December. Findings of the TSAR-2022 Shared Task on Multilingual Lexical Simplification. In *Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)* (pp. 271-283).

Shardlow, M., Evans, R., Paetzold, G. and Zampieri, M., 2021, August. SemEval-2021 Task 1: Lexical Complexity Prediction. In *Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)* (pp. 1-16).

Summary

MLSP 2024 : Multilingual Lexical Simplification Pipeline (MLSP) Shared Task @ 19th Workshop on Innovative Use of NLP for Building Educational Applications will take place in Mexico City. It is a one-day event held on Friday, Jun 21, 2024.

MLSP 2024 falls under the following areas: NATURAL LANGUAGE PROCESSING, ARTIFICIAL INTELLIGENCE, etc. Submissions for this Workshop can be made by Mar 25, 2024.

Please check the official event website for possible changes before you make any travelling arrangements. Generally, events are strict with their deadlines. It is advisable to check the official website for all the deadlines.

Other Details of the MLSP 2024

Short Name: MLSP 2024
Full Name: Multilingual Lexical Simplification Pipeline (MLSP) Shared Task @ 19th Workshop on Innovative Use of NLP for Building Educational Applications
Timing: 09:00 AM-06:00 PM (expected)
Fees: Check the official website of MLSP 2024
Event Type: Workshop
Website Link: https://sites.google.com/view/mlsp-sharedtask-2024/home
Location/Address: Mexico City

Credits and Sources

[1] MLSP 2024 : Multilingual Lexical Simplification Pipeline (MLSP) Shared Task @ 19th Workshop on Innovative Use of NLP for Building Educational Applications
