FinSBD-2019 Shared Task 2019 : [IJCAI-2019] Call for participation: FinSBD-2019 Shared Task - Sentence Boundary Detection in PDF Noisy Text in the Financial Domain

Macao, China
Event Date: August 10, 2019 - August 12, 2019
Submission Deadline: May 27, 2019
Notification of Acceptance: June 17, 2019
Camera Ready Version Due: June 24, 2019




Call for Papers

Greetings,

We would like to invite you to participate in the shared task on Sentence Boundary Detection in PDF Noisy Text in the Financial Domain, held in conjunction with IJCAI-2019 (August 10-12, 2019, Macao, China)!

Call for Participation: http://finnlp.nlpfin.com

Register: https://sites.google.com/nlg.csie.ntu.edu.tw/finnlp/shared-task-finsbd

System output submission deadline: May 13, 2019
Workshop date: August 10-12, 2019, at IJCAI-19 (August 10-16, 2019), Macao, China
------
Motivation
------

Sentences are the basic units of written language, and detecting the beginning and end of sentences, or sentence boundary detection (SBD), is a foundational first step in many Natural Language Processing (NLP) applications, such as POS tagging; syntactic, semantic, and discourse parsing; information extraction; and machine translation.

Despite its important role in NLP, sentence boundary detection has so far received little attention. Previous research in the area has been confined to formal texts (news, European Parliament proceedings, etc.), where existing rule-based and machine learning approaches are extremely accurate as long as the data is perfectly clean. To date, no sentence boundary detection research has addressed the problem of noisy text extracted automatically from machine-readable files (generally in PDF format), such as financial documents.

In this shared task, we focus on extracting well-segmented sentences from financial prospectuses by detecting their beginning and ending boundaries. Prospectuses are official PDF documents in which investment funds precisely describe their characteristics and investment modalities. Extracting any information from these files requires parsing them into noisy, unstructured text, cleaning that text, formatting the information (by adding several tags), and finally transforming it into semi-structured text in which sentence boundaries are clearly marked.

------
Task Design
------

As part of the FinNLP workshop, we present a shared task on sentence boundary detection in noisy text extracted from financial prospectuses in two languages: English and French.

Systems participating in this shared task will be given a set of textual documents extracted from PDF files, which they must automatically segment into a set of well-delimited (clean) sentences.

Participants can choose to work on both languages, or submit systems for one language only.

In addition to the textual version of the documents, we will provide the original PDF files. The organizers will also recommend or provide additional language resources for some languages.

------
Data Format
------

In the provided dataset, participants will receive JSON data with three fields: "text", the text to be segmented, and "begin_sentence" and "end_sentence", which list the indexes of all tokens marking the beginning and the end of well-formed sentences in the text. Note that the provided text has already been word-tokenized using NLTK; participants should keep this tokenization as it is, since all token indexes are based on it. Token indexes start at 0, so the first token in the text has index 0.

[{
  "text": " UFF Sélection Alpha AINFORMATIONS CLÉS POUR L' INVESTISSEUR « Ce document fournit des informations essentielles aux investisseurs de cet OPCVM . Il ne s'agit pas d' un document promotionnel . Les informations qu ' il contient vous sont fournies conformément à une obligation l égale , afin de vous aider à comprendre en quoi consiste un investissement dans ce fonds et quels risques y sont associés . ...",
  "begin_sentence": [8, 21, 31, ...],
  "end_sentence": [20, 30, 66, ...]
}]
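As a minimal sketch, assuming the tokens in "text" are separated by whitespace (as they appear in the example) so that Python's str.split() reproduces the provided NLTK tokenization, the annotated sentences of a record can be rebuilt from the two index lists. End indexes are treated as inclusive, which matches the example, where index 20 is the full stop closing the first sentence. The function name and the toy record are hypothetical.

```python
import json

def sentences_from_indexes(record):
    """Rebuild the annotated sentences of one record.

    Assumes tokens in record["text"] are whitespace-separated, so that
    str.split() reproduces the provided tokenization (0-based indexes).
    """
    tokens = record["text"].split()  # the leading space in "text" is dropped
    pairs = zip(record["begin_sentence"], record["end_sentence"])
    # End indexes are inclusive: the end token belongs to the sentence.
    return [" ".join(tokens[b:e + 1]) for b, e in pairs]

# Hypothetical toy record: a noisy header line followed by two sentences.
record = json.loads(
    '{"text": " Header line Ce fonds est risqué . Il est liquide .",'
    ' "begin_sentence": [2, 7], "end_sentence": [6, 10]}'
)
print(sentences_from_indexes(record))
```

The header tokens at indexes 0 and 1 belong to no sentence, mirroring how the first gold sentence in the example starts at index 8 rather than 0.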

All input text will be preprocessed in the same way, so that all participants have access to the same features at no additional cost. Rule-based, machine learning, deep learning, or hybrid techniques are all allowed.
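Since rule-based systems are allowed, a deliberately naive baseline gives a feel for the task. The sketch below is an illustration, not an official baseline: it ends a sentence at every ".", "!" or "?" token and starts the next sentence on the following token. The rule set and the function name are assumptions. On real prospectuses it would fail exactly where the task is hard: a header such as "UFF Sélection Alpha ..." in the example would be glued onto the first sentence, whereas the gold annotation starts that sentence at index 8.

```python
SENTENCE_FINAL = {".", "!", "?"}  # assumption: a deliberately naive rule set

def predict_boundaries(tokens):
    """Naive rule-based boundary predictor (illustrative only).

    A sentence ends at a sentence-final punctuation token, and the next
    token starts a new sentence. Returns 0-based, inclusive index lists.
    """
    begins, ends = [], []
    start = 0
    for i, tok in enumerate(tokens):
        if tok in SENTENCE_FINAL:
            begins.append(start)
            ends.append(i)
            start = i + 1
    # Any trailing fragment without closing punctuation is not emitted.
    return begins, ends

tokens = "Ce fonds est risqué . Il est liquide . Annexe".split()
print(predict_boundaries(tokens))  # ([0, 5], [4, 8])
```

The trailing fragment "Annexe" has no closing punctuation, so it is left outside any predicted sentence.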

Participants will receive annotated training and development data, followed by blind test data in the same JSON format but containing only the text. They should then predict the begin_sentence and end_sentence lists and submit their results in the same JSON format as the training data.
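Whatever the system, its predictions have to be returned in the training-data JSON shape. The following is a minimal sketch, assuming the submission repeats the "text" field alongside the two predicted lists and that tokens are whitespace-separated. The helper name and the sanity checks (equal-length lists, ordered and non-overlapping spans inside the token range) are assumptions, not official submission rules.

```python
import json

def make_submission_record(text, begins, ends):
    """Package predicted boundaries in the same shape as the training data.

    Assumed sanity checks (not official rules): both lists have the same
    length, spans are ordered and non-overlapping, and every index falls
    inside the whitespace-split token range (0-based, inclusive ends).
    """
    n_tokens = len(text.split())
    if len(begins) != len(ends):
        raise ValueError("begin_sentence and end_sentence differ in length")
    prev_end = -1
    for b, e in zip(begins, ends):
        # b must come after the previous sentence; e must stay inside the text.
        if not (prev_end < b <= e < n_tokens):
            raise ValueError(f"invalid sentence span ({b}, {e})")
        prev_end = e
    return {"text": text, "begin_sentence": list(begins), "end_sentence": list(ends)}

# Hypothetical prediction for a single toy document.
records = [make_submission_record(" Ce fonds est risqué . Il est liquide .", [0, 5], [4, 8])]
payload = json.dumps(records, ensure_ascii=False)
print(payload)
```

Passing ensure_ascii=False keeps French characters such as "é" readable in the serialized output instead of escaping them.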

------
Important dates
------

February 28, 2019: First announcement of the shared task and beginning of registration

March 7, 2019: Release of training data and scoring script

April 29, 2019: Registration deadline

May 6, 2019: Test set made available

May 13, 2019: Systems' outputs collected

May 27, 2019: Shared task system paper submissions due

June 17, 2019: Notification of acceptance

June 24, 2019: Camera-ready version of shared task system papers due

August 10-12, 2019: FinNLP 2019 Workshop in Macao

Read more:

FinNLP: http://finnlp.nlpfin.com
FinSBD: https://sites.google.com/nlg.csie.ntu.edu.tw/finnlp/shared-task-finsbd
IJCAI-19: https://ijcai19.org/

Sincerely,

The FinSBD Organizers

IJCAI-19


