FinTOC 2023 : FNP-2023 Shared Task - FinTOC (Financial Document Structure Extraction)

Sorrento, Italy
Event Date: December 15, 2023 - December 18, 2023
Submission Deadline: October 18, 2023
Notification of Acceptance: November 01, 2023
Camera Ready Version Due: November 15, 2023




Call for Papers

Call for participation:

FNP-2023 Shared Task: FinTOC - Financial Document Structure Extraction

Practical Information:
To be held as part of the 5th Financial Narrative Processing Workshop (FNP 2023) during the 2023 IEEE International Conference on Big Data (IEEE BigData 2023), Sorrento, Italy, from 15th December to 18th December, 2023. The workshop is a one-day event; the exact date is to be announced.
===================

Shared Task URL: http://wp.lancs.ac.uk/cfie/fintoc2023/
Workshop URL: https://wp.lancs.ac.uk/cfie/fnp2023/
Participation Form: https://docs.google.com/forms/d/e/1FAIpQLSdqUKy3YGho0Cw2GF__VHilHZZbR75UDG3JRBC4k0Yxw4acWg/viewform?usp=pp_url
___________________________________________________________


Shared Task Description:

A vast and continuously growing volume of financial documents is being created and published in machine-readable formats, predominantly PDF. Unfortunately, these documents often lack comprehensive structural information, presenting a challenge for efficient analysis and interpretation. Nevertheless, these documents play a crucial role in enabling firms to report their activities, financial situation, and investment plans to shareholders, investors, and the financial markets. They serve as corporate annual reports, offering detailed financial and operational information.
In certain countries like the United States and France, regulators such as the SEC (Securities and Exchange Commission) and the AMF (Financial Markets Authority) have implemented requirements for firms to adhere to specific reporting templates. These regulations aim to promote standardization and consistency across firms' disclosures. However, in various European countries, management typically possesses more flexibility in determining what, where, and how to report financial information, resulting in a lack of standardization among financial documents published within the same market.
Although some research has addressed the recognition of tables of contents (TOCs) in books and documents, most of the existing work has focused on small-scale, application-dependent and domain-specific datasets. This limited scope poses challenges when dealing with a vast collection of heterogeneous documents and books, where TOCs from different domains exhibit significant variations in visual layout and style. Consequently, recognizing and extracting TOCs becomes an intricate problem. Indeed, in comparison to regular books that are typically provided in a full-text format with limited structural information such as pages and paragraphs, financial documents possess a more complex structure. They consist of various elements, including parts, sections, sub-sections, and even sub-sub-sections, incorporating both textual and non-textual content. Moreover, TOC pages are not always present to help readers navigate the document, and when they are, they often only provide access to the main sections.

In this shared task, our objective is to analyze various types of financial documents, encompassing KIID (Key Investor Information Document), Prospectus (official PDF documents where investment funds meticulously describe their characteristics and investment modalities), Réglement and Financial Annual Reports/Financial Statements (which provide a detailed overview of a company's financial performance and operations over the course of a fiscal year). These documents play a vital role in providing crucial information to investors, stakeholders, and regulatory bodies. While the content they must contain is often prescribed and regulated, their format lacks standardization, leading to a significant degree of variability. The presentation styles range from plain text format to more visually rich and data-driven graphical and tabular representations. Notably, the majority of these documents are published without a table of contents. A TOC is typically essential for readers as it enables easy navigation within the document by providing a clear outline of headers and corresponding page numbers. Additionally, TOCs serve as a valuable resource for legal teams, facilitating the verification that all required contents are included. Consequently, the automated analysis of these documents to extract their structure is becoming increasingly useful for numerous firms worldwide.

Our primary focus for this edition is to expand the extraction of table of contents to a wider variety of financial documents, and the task will involve developing highly efficient algorithms and methodologies to address the challenges associated with such a dataset. Our aim is to achieve a level of generalization ensuring that the developed system can be applied to different types of financial documents. This broader scope allows us to explore the applicability of our methodologies across a range of financial document categories, such as KIID, Prospectus, Réglement and Financial Annual Reports/Financial Statements. This way, we want to demonstrate the versatility and effectiveness of the ML algorithms used in TOC extraction, enabling a streamlined and consistent approach across various financial document types.

In addition, for this edition, we are excited to introduce a dataset that goes beyond textual annotations. Our proposed dataset will include visual (spatial) annotations that capture the coordinates of the titles and hierarchical structure of the documents. This comprehensive approach enables a more holistic analysis and understanding of financial documents.
By incorporating visual annotations, we can capture the visual cues and design elements that contribute to the overall structure and organization of the documents. This allows us to delve deeper into the visual representation of the table of contents and extract valuable insights from the visual hierarchy present in these financial documents. The combination of textual and visual annotations provides a richer and more nuanced dataset, making it possible to increase the accuracy and effectiveness of the machine learning algorithms and methodologies employed in TOC extraction.
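
As an illustration only: the annotation schema of the shared task dataset is not described in this call, so the following Python sketch uses a hypothetical record layout (the `TitleAnnotation` class and all of its field names are assumptions) to show how titles annotated with page coordinates and a hierarchy depth could be nested into a TOC tree.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TitleAnnotation:
    """One annotated title. Every field name here is hypothetical."""
    text: str
    page: int                                # 1-based page number
    bbox: Tuple[float, float, float, float]  # (x0, y0, x1, y1) on the page
    depth: int                               # 1 = part, 2 = section, 3 = sub-section, ...
    children: List["TitleAnnotation"] = field(default_factory=list)


def build_toc(titles: List[TitleAnnotation]) -> List[TitleAnnotation]:
    """Nest a flat, reading-order list of titles into a tree using their depths."""
    roots: List[TitleAnnotation] = []
    stack: List[TitleAnnotation] = []  # chain of currently open ancestors
    for title in titles:
        # Close every open title that cannot be an ancestor of this one.
        while stack and stack[-1].depth >= title.depth:
            stack.pop()
        if stack:
            stack[-1].children.append(title)
        else:
            roots.append(title)
        stack.append(title)
    return roots


def render(nodes: List[TitleAnnotation], indent: int = 0) -> List[str]:
    """Return the TOC as indented text lines with page numbers."""
    lines: List[str] = []
    for node in nodes:
        lines.append("  " * indent + f"{node.text} (p. {node.page})")
        lines.extend(render(node.children, indent + 1))
    return lines


if __name__ == "__main__":
    # Made-up example titles, not taken from the dataset.
    example = [
        TitleAnnotation("Fund overview", 1, (50, 700, 300, 720), 1),
        TitleAnnotation("Objectives", 1, (50, 650, 200, 665), 2),
        TitleAnnotation("Fees", 3, (50, 700, 120, 720), 1),
    ]
    print("\n".join(render(build_toc(example))))
```

The stack keeps the chain of open ancestors, so a title attaches to the nearest preceding title with a smaller depth. The coordinates are carried along unchanged so that a system could also use them as layout features.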


Thanks to the contribution of the Autonomous University of Madrid (UAM, Spain), the fifth edition of the FinTOC Shared Task welcomes a specific track for Spanish documents, continuing from the previous edition.
In this edition, systems will be scored based on their performance in both Title detection and TOC generation using more precise evaluation metrics based on visual annotations.
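
The official evaluation metrics are not specified in this call. As a rough sketch of how title detection could be scored against spatial annotations, the Python example below matches predicted and reference boxes by intersection over union (IoU) and reports precision, recall and F1. The box layout, the greedy matching and the 0.5 threshold are all assumptions made for illustration, not the organizers' scoring procedure.

```python
from typing import List, Tuple

# Hypothetical box layout: (page, x0, y0, x1, y1) with x0 < x1 and y0 < y1.
Box = Tuple[int, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 if they are on different pages."""
    if a[0] != b[0]:
        return 0.0
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter_h = max(0.0, min(a[4], b[4]) - max(a[2], b[2]))
    inter = inter_w * inter_h
    area_a = (a[3] - a[1]) * (a[4] - a[2])
    area_b = (b[3] - b[1]) * (b[4] - b[2])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def title_detection_scores(pred: List[Box], gold: List[Box],
                           threshold: float = 0.5) -> Tuple[float, float, float]:
    """Greedy one-to-one matching; each reference box is matched at most once."""
    unmatched = list(gold)
    true_positives = 0
    for p in pred:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

A predicted title counts as correct only if it overlaps a still-unmatched reference title on the same page by at least the threshold, so duplicate detections of the same title lower precision.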

Participants are required to register for the Shared Task. Once registered, all participating teams will receive a common training dataset consisting of PDF documents along with the associated TOC annotations.


To participate, please use the registration form to add details about your team: https://docs.google.com/forms/d/e/1FAIpQLSdqUKy3YGho0Cw2GF__VHilHZZbR75UDG3JRBC4k0Yxw4acWg/viewform?usp=pp_url (now open as of 06/01/2023)


_____________________________________________

Important Dates:
1st Call for papers & shared task participants: June 12, 2023
2nd Call for papers & shared task participants: July 17, 2023
Final Call for papers & shared task participants: August 17, 2023
Training set release: August 21, 2023
Blind test set release: September 21, 2023
Systems submission: October 03, 2023
Release of results: October 09, 2023
Paper submission deadline: October 18, 2023 (anywhere in the world)
Notification of paper acceptance to authors: November 01, 2023
Camera-ready of accepted papers: November 15, 2023
Workshop date (1-day event): December 15-18, 2023 (exact date to be announced)
_____________________________________________

Contact:
For any questions on the shared task please contact us on:
[email protected]
_____________________________________________

Shared Task Organizers:
- Abderrahim Ait Azzi, 3DS Outscale (ex Fortia), France
- Sandra Bellato, 3DS Outscale (ex Fortia), France
- Blanca Carbajo Coronado, Universidad Autónoma de Madrid
- Dr Ismail El Maarouf, Imprevicible
- Dr Juyeon Kang, 3DS Outscale (ex Fortia), France
- Prof. Ana Gisbert, Universidad Autónoma de Madrid
- Prof. Antonio Moreno Sandoval, Universidad Autónoma de Madrid




