CONDA 2024 : First Workshop on Data Contamination

Bangkok, Thailand
Event Date: August 16, 2024
Submission Deadline: May 17, 2024
Notification of Acceptance: June 17, 2024
Camera Ready Version Due: July 01, 2024




Call for Papers


We invite you to participate and submit your work to the First Workshop on Data Contamination (CONDA) co-located with ACL 2024 in Bangkok, Thailand.

Data contamination, where evaluation data is inadvertently included in the pre-training corpora of large-scale models, and language models (LMs) in particular, has become a concern in recent times. The growing scale of both models and data, coupled with massive web crawling, has led to the inclusion of segments from evaluation benchmarks in the pre-training data of LMs. The scale of internet data makes it difficult to prevent this contamination from happening, or even to detect when it has happened. Crucially, when evaluation data becomes part of pre-training data, it introduces biases and can artificially inflate the performance of LMs on specific tasks or benchmarks. This poses a challenge for fair and unbiased evaluation of models, as their performance may not accurately reflect their generalization capabilities.

Although a growing number of papers and state-of-the-art models mention issues of data contamination, there is no agreed-upon definition or standard methodology to ensure that a model does not report results on contaminated benchmarks. Addressing data contamination is a shared responsibility among researchers, developers, and the broader community. By adopting best practices, increasing transparency, documenting vulnerabilities, and conducting thorough evaluations, we can work towards minimizing the impact of data contamination and ensuring fair and reliable evaluations.

We welcome paper submissions on all topics related to data contamination, including but not limited to:

Definitions, taxonomies, and gradings of contamination
Contamination detection (both manual and automatic)
Community efforts to discover, report, and organize contamination events
Documentation frameworks for datasets or models
Methods to avoid data contamination
Methods to forget contaminated data
Scaling laws and contamination
Memorization and contamination
Policies to avoid impact of contamination in publication venues and open source communities
Reproducing and attributing results from previous work to data contamination
Survey work on data contamination research
Data contamination in other modalities

Submission Instructions
We welcome two types of papers: regular workshop papers and non-archival submissions. Regular workshop papers will be included in the workshop proceedings. All submissions must be in PDF format and made through OpenReview.

Regular workshop papers: Authors can submit papers up to 8 pages, with unlimited pages for references. Authors may separately submit up to 100 MB of supplementary materials, as well as their code for reproducibility. All submissions undergo a double-blind, single-track review. Best Paper Award(s) will be given based on nomination by the reviewers. Accepted papers will be presented as posters with the possibility of oral presentations.
Non-archival submissions: Cross-submissions are welcome. Accepted papers will be presented at the workshop, but will not be included in the workshop proceedings. Papers must be in PDF format and will be reviewed in a double-blind fashion by workshop reviewers. We also welcome extended abstracts (up to 2 pages) of papers that are work in progress, under review or to be submitted to other venues. Papers in this category need to follow the ACL format.

In addition to papers submitted directly to the workshop, which will be reviewed by our Programme Committee, we also accept papers reviewed through ACL Rolling Review and committed to the workshop. Please check the relevant dates for each type of submission.

Important dates
Relevant deadlines to consider when submitting your paper are:

Paper submission deadline: May 17 (Friday), 2024
ARR pre-reviewed commitment deadline: TBD, 2024
Notification of acceptance: June 17 (Monday), 2024
Camera-ready paper due: July 1 (Monday), 2024
Workshop date: August 16, 2024

Contact

Website: https://conda-workshop.github.io/
Contact: [email protected]

Workshop organizers
Oscar Sainz, University of the Basque Country (UPV/EHU)
Iker García Ferrero, University of the Basque Country (UPV/EHU)
Eneko Agirre, University of the Basque Country (UPV/EHU)
Jon Ander Campos, Cohere
Alon Jacovi, Bar Ilan University
Yanai Elazar, Allen Institute for Artificial Intelligence and University of Washington
Yoav Goldberg, Bar Ilan University and Allen Institute for Artificial Intelligence



Summary

CONDA 2024 : First Workshop on Data Contamination will take place in Bangkok, Thailand. It is a one-day event held on Aug 16, 2024 (Friday).

CONDA 2024 falls under the following areas: NLP, COMPUTATIONAL LINGUISTICS, ARTIFICIAL INTELLIGENCE, etc. Submissions for this workshop can be made until May 17, 2024. Authors can expect a decision on their submission by Jun 17, 2024. Upon acceptance, authors should submit the final version of the manuscript on or before Jul 1, 2024 to the official website of the workshop.

Please check the official event website for possible changes before you make any travel arrangements. Events are generally strict about their deadlines, so it is advisable to confirm all of them on the official website.

Other Details of CONDA 2024

  • Short Name: CONDA 2024
  • Full Name: First Workshop on Data Contamination
  • Timing: 09:00 AM-06:00 PM (expected)
  • Fees: Check the official website of CONDA 2024
  • Event Type: Workshop
  • Website Link: https://conda-workshop.github.io/
  • Location/Address: Bangkok, Thailand


Check other Conferences, Workshops, Seminars, and Events


OTHER NLP EVENTS

SemDial 2024: The 28th Workshop on the Semantics and Pragmatics of Dialogue
Trento, Italy
Sep 11, 2024
GamesandNLP 2024: Games and NLP 2024 Workshop
Turin, Italy
May 21, 2024
GITT 2024: Second International Workshop on Gender-Inclusive Translation Technologies
Sheffield, UK
Jun 27, 2024
LoResMT 2024: The Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages
Bangkok, Thailand
Aug 15, 2024
SIGDIAL 2024: The 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue
Tokyo, Japan
Sep 18, 2024