Our experience of the world is multimodal: we see objects, hear sounds, and smell odors. In other words, we sense the environment through a complex, multimodal system. Building intelligent perceiving systems that analyze information acquired by sensors with different physical properties requires powerful tools from signal processing, machine learning, and data mining. The implications span a wide range of domains, from audio-visual speech recognition to multimodal human-computer interfaces and media description. Within the framework of the 2020 43rd International Conference on Telecommunications and Signal Processing (TSP), held during July 7-9, 2020 in Milan, Italy, this special session aims to foster the use of multimodal machine learning and data analysis tools within the signal processing community, to bridge the gap between academic researchers and industry professionals in AI-related fields, and to identify possible collaborations.