David Filip: Standardization and Research

David Filip is Chair (Convener) of OASIS XLIFF OMOS TC; Secretary, Lead Editor and Liaison Officer of OASIS XLIFF TC; a former Co-Chair and Editor for the W3C ITS 2.0 Recommendation; Steering Committee member of GALA TAPICC; Advisory Editorial Board member for the Multilingual magazine; and co-moderator of the Standards IG at JIAMCATT. David has also been appointed as NSAI expert to ISO TC 37/SC 3 and /SC 5, and to ISO/IEC JTC 1/WG 9, /SC 38, and /SC 42. His specialties include open standards and process metadata, workflow and meta-workflow automation. David works as a Research Fellow at the ADAPT Research Centre, Trinity College Dublin, Ireland. Before 2011, he oversaw key research and change projects for Moravia’s worldwide operations. David held research scholarships at universities in Vienna, Hamburg and Geneva, and graduated in 2004 from Brno University with a PhD in Analytic Philosophy. David also holds master’s degrees in Philosophy, Art History, Theory of Art and German Philology.

Standardization and Research

David will explain the multilingual content standardization ecosystem, from foundational standards such as XML and Unicode, through XML vocabularies for payload and metadata exchange, to API and reference architecture specifications. He will explain basic standardization principles with special regard for internet-based technologies, touching on different standardization cultures ranging from industry associations and ad hoc consortia, through IETF, OASIS, W3C and Unicode, to traditional SDOs such as ISO, ISO/IEC and ASTM. David will also touch on the relationship between standardization, research, and innovation, and on whether it is important for research groups and institutes to participate in standardization. He will explain the difference between anticipatory and post hoc standardization, and how royalty-free standards create and grow markets for technology and innovation.

His talk takes place on Thursday, March 22, 2018 at 13:00 in room E104.

Jan Kybic: Accelerating image registration

Jan Kybic was born in Prague, Czech Republic, in 1974. He received Mgr. (BSc.) and Ing. (MSc.) degrees with honors from the Czech Technical University, Prague, in 1996 and 1998, respectively. In 2001, he obtained a Ph.D. in biomedical image processing from Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland, for his thesis on elastic image registration using parametric deformation models. Between October 2002 and February 2003, he held a post-doc research position at INRIA, Sophia-Antipolis, France. Since 2003 he has been a Senior Research Fellow with the Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague. He passed his habilitation (Associate Professor) in 2010 and became a full professor in 2015. He was a Vice-Dean in 2011-2013 and a Department Head in 2013-2017. Jan Kybic has authored or co-authored 31 articles in peer-reviewed international scientific journals, one book, two book chapters, and over 80 conference publications. He has supervised nine PhD students, six of whom have already successfully graduated. He has also supervised over twenty master, bachelor and short-term student projects.

He is a member of IEEE and served as an Associate Editor for IEEE Transactions on Medical Imaging and as a reviewer for numerous international journals and conferences. He was a general chair of the ISBI 2016 conference.

His research interests include signal and image processing, medical imaging, image registration, splines and wavelets, inverse problems, elastography, computer vision, numerical methods, algorithm theory and control theory.

He teaches Digital Image Processing and Medical Imaging courses.

Accelerating image registration

Image registration is one of the key image analysis tasks, especially in biomedical imaging. However, accurate image registration methods are often slow, and this problem is exacerbated by the steadily increasing resolution of today’s acquisition methods. In my talk, I will present two of my relatively recent ideas for accelerating image registration.

First, we take advantage of the fact that image registration is mostly driven by image edges. We take this idea to the extreme. We approximate the similarity criterion by sampling only a small number of sparse keypoints and consider only normal displacements. Furthermore, we simplify images by segmenting them first. The segmentation can be performed jointly and alternated with the registration steps. Compared to classical image registration methods, our approach is at least one order of magnitude faster.

The second approach is based on matching generalized geometric graphs and is suitable for images containing linear structures with branches, such as road networks, rivers, blood vessels or neural fibers. Previously used methods could only match relatively small graphs, required a good initial guess of the transformation, or could not be used for non-linear deformations. We present two methods which do not have such limitations – one is based on active testing and the second on Monte Carlo tree search, formulating the problem as a single-player game. Our method can handle thousands of nodes and thus match very large images quickly. Besides several medical applications, we show, for example, how to solve the localization problem by matching a small aerial photo with a large map.

His talk takes place on Thursday, March 1, 2018 at 13:00 in room E104.


Ondřej Bojar: Neural Machine Translation: From Basics to Semiotics

Ondřej Bojar is an Assistant Professor at Charles University, Institute of Formal and Applied Linguistics (UFAL). Since his participation at the JHU summer engineering workshop in 2006 where the MT system Moses was released, Ondřej Bojar has been primarily active in the field of machine translation (MT), regularly taking part in and later also co-organizing the WMT evaluation campaigns and contributing to the best practices of MT evaluation. Ondřej Bojar is the main author of the hybrid system Chimera which outperformed all competing systems in 2013 through 2015 (including Google Translate) in English-to-Czech translation. A variant of that system has been used in several commercial contracts of the department. Ondřej Bojar is now catching up with neural MT (NMT) and his main interest (aside from reaching again the best translation performance) lies in the study of the representations learned by the deep learning models. Is NMT learning any representations of sentence *meaning*, or is it merely a much advanced and softer variant of the copy-paste translation as performed by the previous approaches? His talk takes place on Tuesday, January 16, 2018 at 13:00 in room E105.

Neural Machine Translation: From Basics to Semiotics

In my talk, I will highlight the benefit that neural machine translation (NMT) has over previous statistical approaches to MT. I will then present the current state of the art in neural machine translation, briefly describing the current best architectures and their performance and limitations. In the second part of the talk, I will outline my planned search for correspondence between sentence meaning as traditionally studied by linguistics (or even semantics and semiotics) and the continuous representations learned by neural networks.

Vlastimil Havran: Surface reflectance in rendering algorithms

Vlastimil Havran is an Associate Professor at the Czech Technical University in Prague. His research interests include data structures and algorithms for rendering images and videos, visibility calculations, geometric range searching for global illumination, software architectures for rendering, applied Monte Carlo methods, data compression etc. His talk takes place on Monday, December 4, 2017 at 12:00 in room E105.

Surface reflectance in rendering algorithms

The rendering of images by computers, i.e., computationally solving a rendering equation, consists of three components: computing visibility (for example by ray tracing), modelling the interaction of light with surfaces, and efficient Monte Carlo sampling algorithms. In this talk, we focus on various aspects of surface reflectance. This is a key issue for achieving high fidelity of objects’ visual appearance in rendered images, not only in the movie industry but also in real-time applications of virtual and augmented reality. First, we recall the initial concepts of surface reflectance and its use in the rendering equation. Then we will present our results on surface reflectance characterization and its possible use in rendering algorithms. Further, we will show why the standard surface reflectance model, usually represented as a bidirectional reflectance distribution function, needs to be extended spatially to achieve high fidelity of visual appearance. As this spatial extension leads to a big data problem, we will describe our algorithm for compression of spatially varying surface reflectance data. We will also describe an effective perceptually motivated method to compare two similar surface reflectance datasets, where one can be the reference data and the second one the result of its compression. As the last topic, we will describe the concepts and problems that arise when measuring such surface reflectance datasets for real-world applications.


Themos Stafylakis: Deep Word Embeddings for Audiovisual Speech Recognition

Themos Stafylakis is a Marie Curie Research Fellow on audiovisual automatic speech recognition at the Computer Vision Laboratory of the University of Nottingham (UK). He holds a PhD from Technical University of Athens (Greece) on Speaker Diarization for Broadcast News. He has a strong publication record on speaker recognition and diarization, as a result of his 5-year post-doc at CRIM (Montreal, Canada), under the supervision of Patrick Kenny. He is currently working on lip-reading and audiovisual speech recognition using deep learning methods. His talk takes place on November 22, 2017 at 13:00 in room A112.

Deep Word Embeddings for Audiovisual Speech Recognition

During the last few years, visual and audiovisual automatic speech recognition (ASR) have been witnessing a renaissance, which can largely be attributed to the advent of deep learning methods. Deep architectures and learning algorithms initially proposed for audio-based ASR are combined with powerful computer vision models and are finding their way to lipreading and audiovisual ASR. In my talk, I will go through some of the most recent advances in audiovisual ASR, with emphasis on those based on deep learning. I will then present a deep architecture for visual and audiovisual ASR which attains state-of-the-art results on the challenging lipreading-in-the-wild database. Finally, I will focus on how this architecture can generalize to words unseen during training and discuss its applicability in continuous speech audiovisual ASR.

Tunç Aydın: Extracting transparent image layers for high-quality compositing

Tunç Aydın is a Research Scientist at Disney Research located at the Zürich Lab. His current research primarily focuses on image and video processing problems that address various movie production challenges, such as natural matting, green-screen keying, color grading, edge-aware filtering, and temporal coherence, among others. He has also been interested in analyzing visual content in terms of visual quality and aesthetic plausibility by utilizing knowledge of the human visual system. In his work he tends to utilize High Dynamic Range, Stereoscopic 3D, and High Frame-rate content, in addition to standard 8-bit images and videos.

Prior to joining Disney Research, he worked as a Research Associate at the Max-Planck-Institut für Informatik from 2006 to 2011, where he obtained his PhD degree under the supervision of Karol Myszkowski and Hans-Peter Seidel. He received the Eurographics PhD award in 2012 for his dissertation. He holds a Master’s degree in Computer Science from the College of Computing at Georgia Institute of Technology, and a Bachelor’s degree in Civil Engineering from Istanbul Teknik Universitesi. His talk takes place on Wednesday, November 1, 2017 at 13:00 in room A112.

Extracting transparent image layers for high-quality compositing

Compositing is an essential task in visual content production. For instance, a contemporary feature film production that doesn’t involve any compositing work is a rare occasion. However, achieving production-level quality often requires a significant amount of manual labor by digital compositing artists, mainly due to the limits of existing tools available for various compositing tasks. In this presentation I will talk about our recent work that aims to improve upon existing compositing technologies, where we focus on natural matting, green-screen keying, and color editing. We tackle natural matting using a novel affinity-based approach, whereas for green-screen keying and color editing we introduce a “color unmixing” framework, which we specialize individually for the two problem domains. Using these new techniques we achieve state-of-the-art results while also significantly reducing the manual interaction time.


Jakub Mareček: Urban Traffic Management – Traffic State Estimation, Signalling Games, and Traffic Control

Jakub Mareček is a research staff member at IBM Research. Together with some fabulous colleagues, Jakub develops solvers for optimisation and control problems at IBM’s Smarter Cities Technology Centre. Jakub joined IBM Research from the School of Mathematics at the University of Edinburgh in August 2012. Prior to his brief post-doc in Edinburgh, Jakub had presented an approach to general-purpose integer programming in his dissertation at the University of Nottingham and worked in two start-up companies in Brno, the Czech Republic. His talk takes place on Monday, October 16, 2017 at 13:30 in room D0207.

Urban Traffic Management: Traffic State Estimation, Signalling Games, and Traffic Control

In many engineering applications, one needs to identify a model of a non-linear system, increasingly using large volumes of heterogeneous, streamed data, and apply some form of (optimal) control. First, we illustrate why much of the classical identification and control is not applicable to problems involving time-varying populations of agents, such as in smart grids and intelligent transportation systems. Second, we use tools from robust statistics and convex optimisation to present alternative approaches to closed-loop system identification, and tools from iterated function systems to identify controllers for such systems with certain probabilistic guarantees on the performance for the individual interacting with the controller.

Marc Delcroix and Keisuke Kinoshita: NTT far-field speech processing research

Marc Delcroix is a senior research scientist at NTT Communication Science Laboratories, Kyoto, Japan. He received the M.Eng. degree from the Free University of Brussels, Brussels, Belgium, and the Ecole Centrale Paris, Paris, France, in 2003 and the Ph.D. degree from the Graduate School of Information Science and Technology, Hokkaido University, Sapporo, Japan, in 2007. His research interests include robust multi-microphone speech recognition, acoustic model adaptation, integration of speech enhancement front-end and recognition back-end, speech enhancement and speech dereverberation. He took an active part in the development of NTT robust speech recognition systems for the REVERB and the CHiME 1 and 3 challenges, which all achieved the best performance on the tasks. He was one of the organizers of the REVERB challenge 2014 and of ASRU 2017. He is a visiting lecturer at the Faculty of Science and Engineering of Waseda University, Tokyo, Japan.

Keisuke Kinoshita is a senior research scientist at NTT Communication Science Laboratories, Kyoto, Japan. He received the M.Eng. degree and the Ph.D. degree from Sophia University in Tokyo, Japan in 2003 and 2010, respectively. He joined NTT in 2003 and since then has been working on speech and audio signal processing. His research interests include single- and multichannel speech enhancement and robust speech recognition. He was the Chief Coordinator of the REVERB challenge 2014 and an organizing committee member of ASRU-2017. He was honored to receive IEICE Paper Awards (2006), ASJ Technical Development Awards (2009), ASJ Awaya Young Researcher Award (2009), Japan Audio Society Award (2010), and Maeshima Hisoka Award (2017). He is a visiting lecturer at the Faculty of Science and Engineering of Doshisha University, Tokyo, Japan.

Their talk takes place on Monday, August 28, 2017 at 13:00 in room A112.

NTT far-field speech processing research

The success of voice search applications and voice-controlled devices such as the Amazon Echo confirms that speech is becoming a common modality to access information. Despite great recent progress in the field, it is still challenging to achieve high automatic speech recognition (ASR) performance when using microphones distant from the speakers (far-field), because of noise, reverberation and potential interfering speakers. It is even more challenging when the target speech consists of spontaneous conversations.

At NTT, we are pursuing research on far-field speech recognition focusing on speech enhancement front-end and robust ASR back-ends towards building next generation ASR systems able to understand natural conversations. Our research achievements have been combined into ASR systems we developed for the REVERB and CHiME 3 challenges, and for meeting recognition.

In this talk, after giving a brief overview of the research activity of our group, we will introduce in more detail two of our recent research achievements. First, we will present our work on speech dereverberation using the weighted prediction error (WPE) algorithm. We have recently proposed an extension to WPE that integrates deep neural network based speech modeling into the WPE framework, and demonstrated further potential performance gains for reverberant speech recognition.

Next, we will discuss our recent work on acoustic model adaptation to create ASR back-ends robust to speaker and environment variations. We have recently proposed a context adaptive neural network architecture, which is a powerful way to exploit speaker or environment information to perform rapid acoustic model adaptation.

S. Umesh: Acoustic Modelling of low-resource Indian languages

S. Umesh is a professor in the Department of Electrical Engineering at the Indian Institute of Technology – Madras. His research interests are mainly in automatic speech recognition, particularly in low-resource modelling and speaker normalization & adaptation. He has also been a visiting researcher at AT&T Laboratories, Cambridge University and RWTH-Aachen under the Humboldt Fellowship. He is currently leading a consortium of 12 Indian institutions to develop speech-based systems in the agricultural domain. His talk takes place on Tuesday, June 27, 2017 at 13:00 in room A112.

Acoustic Modelling of low-resource Indian languages

In this talk, I will present recent efforts in India to build speech-based systems in the agricultural domain to provide easy access to information to about 600 million farmers. This is being developed by a consortium of 12 Indian institutions initially in 12 languages, which will then be expanded to another 12 languages. Since the usage is in extremely noisy environments such as fields, the emphasis is on high accuracy by using directed queries which elicit short phrase-like responses. Within this framework, we explored cross-lingual and multilingual acoustic modelling techniques using subspace-GMMs and phone-CAT approaches. We also extended the use of phone-CAT for phone-mapping and articulatory feature extraction, whose outputs were then fed to a DNN-based acoustic model. Further, we explored the joint estimation of the acoustic model (DNN) and articulatory feature extractors. These approaches gave significant improvement in recognition performance, when compared to building systems using data from only one language. Finally, since the speech consisted of mostly short and noisy utterances, conventional adaptation and speaker-normalization approaches could not be easily used. We investigated the use of a neural network to map filter-bank features to fMLLR/VTLN features, so that the normalization can be done at frame level without a first-pass decode or the need for long utterances to estimate the transforms. Alternatively, we used a teacher-student framework where the teacher trained on normalized features is used to provide “soft targets” to the student network trained on un-normalized features. In both approaches, we obtained recognition performance that is better than ivector-based normalization schemes.
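The teacher-student idea mentioned above can be illustrated with a minimal sketch. It assumes the usual soft-target formulation, in which the student is trained to minimize the cross-entropy between its own posteriors and the teacher's posteriors. The function names and the toy logits are hypothetical and are not taken from the talk.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(teacher_logits, student_logits):
    """Cross-entropy of the student's posteriors against the teacher's soft targets."""
    targets = softmax(teacher_logits)   # teacher sees normalized features
    preds = softmax(student_logits)     # student sees un-normalized features
    return -sum(t * math.log(p) for t, p in zip(targets, preds))

# Hypothetical per-frame outputs over three acoustic classes.
teacher = [2.0, 0.5, -1.0]
matching_student = [2.0, 0.5, -1.0]
poor_student = [-1.0, 0.5, 2.0]

loss_match = soft_target_loss(teacher, matching_student)
loss_poor = soft_target_loss(teacher, poor_student)
```

The loss is smallest when the student reproduces the teacher's posteriors, which is what pushes the student towards the behaviour the teacher learned on normalized features.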

Kwang In Kim: Toward Intuitive Imagery: User Friendly Manipulation and Exploration of Images and Videos

Kwang In Kim is a senior lecturer of computer science at the University of Bath. He received a BSc in computer engineering from the Dongseo University in 1996, and MSc and PhD in computer engineering from the Kyungpook National University in 1998 and 2000, respectively. He was a post-doctoral researcher at KAIST, at the Max-Planck-Institute for Biological Cybernetics, at Saarland University, and at the Max-Planck-Institute for Informatics, from 2000 to 2013. Before joining Bath, he was a lecturer at the School of Computing and Communications, Lancaster University. His research interests include machine learning, vision, graphics, and human-computer interaction. His talk takes place on Wednesday, May 10th, 2017, at 3:30pm in room E105.

Toward Intuitive Imagery: User Friendly Manipulation and Exploration of Images and Videos

With the ubiquity of image and video capture devices, it is easy to form collections of images and video. Two important questions in this context are 1) how to retain the quality of individual images and videos and 2) how to explore the resulting large collections. Unlike professionally captured photographs and videos, the quality of imagery casually captured by regular users is usually low. In this talk, we will discuss manipulating and improving such images and videos in several aspects. The central theme of the talk is user-friendliness. Unlike existing sophisticated algorithms, our approaches focus on enabling non-expert users to freely manipulate and improve personal imagery collections. We present two specific examples in this context: image enhancement and video object removal. Existing interfaces to these video collections are often simply lists of text-ranked videos which do not exploit the visual content relationships between videos, or other implicit relationships such as spatial or geographical relationships. In the second part of the talk, we discuss data structures and interfaces that exploit content relationships present in images and videos.

Reinhold Häb-Umbach: Neural Network Supported Acoustic Beamforming

Reinhold Häb-Umbach is a professor of Communications Engineering at the University of Paderborn, Germany. His main research interests are in the fields of statistical signal processing and pattern recognition, with applications to speech enhancement, acoustic beamforming and source separation, as well as automatic speech recognition and unsupervised learning from speech and audio. He has more than 200 scientific publications, and recently co-authored the book Robust Automatic Speech Recognition – a Bridge to Practical Applications (Academic Press, 2015). He is a fellow of the International Speech Communication Association (ISCA). His talk takes place on Monday, April 24th, at 1pm in room D0207.

Neural Network Supported Acoustic Beamforming for Speech Enhancement and Recognition

Abstract: With multiple microphones, spatial information can be exploited to extract a target signal from a noisy environment. While the theory of statistically optimum beamforming is well established, the challenge lies in the estimation of the beamforming coefficients from the noisy input signal. Traditionally these coefficients are derived from an estimate of the direction-of-arrival of the target signal, while more elaborate methods estimate the power spectral density (PSD) matrices of the desired and the interfering signals, thus avoiding the assumption of an anechoic signal propagation. We have proposed to estimate these PSD matrices using spectral masks determined by a neural network. This combination of data-driven approaches with statistically optimum multi-channel filtering has delivered competitive results on the recent CHiME challenge. In this talk, we detail this approach and show that the concept is more general and can be, for example, also used for dereverberation. When used as a front-end for a speech recognition system, we further show how the neural network for spectral mask estimation can be optimized w.r.t. a word error rate related criterion in an end-to-end setup.
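As a rough illustration of the mask-based PSD estimate described in the abstract, the sketch below computes a mask-weighted spatial covariance matrix for a single frequency bin. It assumes the masks are already given (in the talk's approach they come from a neural network). The function name `masked_psd` and the toy two-microphone data are hypothetical, and the beamformer that would consume these matrices is left out.

```python
def masked_psd(frames, mask):
    """Mask-weighted spatial covariance (PSD matrix) for one frequency bin.

    frames: per-frame multichannel STFT vectors (lists of complex numbers).
    mask:   per-frame weights in [0, 1]; their sum must be non-zero.
    """
    channels = len(frames[0])
    norm = sum(mask)
    psd = [[0j] * channels for _ in range(channels)]
    for y, m in zip(frames, mask):
        for a in range(channels):
            for b in range(channels):
                # Accumulate m * y * y^H, element (a, b).
                psd[a][b] += m * y[a] * y[b].conjugate()
    return [[value / norm for value in row] for row in psd]

# Two frames from two microphones: frame 0 is mostly speech, frame 1 mostly noise.
frames = [[1 + 0j, 1j], [1 + 0j, -1 + 0j]]
speech_mask = [0.9, 0.1]
noise_mask = [0.1, 0.9]

speech_psd = masked_psd(frames, speech_mask)
noise_psd = masked_psd(frames, noise_mask)
```

The two matrices play the role of the desired-signal and interference PSD estimates from which statistically optimum beamforming coefficients can then be derived.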

Jiří Matas: Tracking with Discriminative Correlation Filters

Jiří Matas is a full professor at the Center for Machine Perception, Czech Technical University in Prague. He holds a PhD degree from the University of Surrey, UK (1995). He has published more than 200 papers in refereed journals and conferences. Google Scholar reports about 22 000 citations to his work and an h-index of 53.
He received the best paper prize at the International Conference on Document Analysis and Recognition in 2015, the Scandinavian Conference on Image Analysis 2013, Image and Vision Computing New Zealand Conference 2013, the Asian Conference on Computer Vision 2007, and at British Machine Vision Conferences in 2002 and 2005. His students received a number of awards, e.g. Best Student paper at ICDAR 2013, Google Fellowship 2013, and various “Best Thesis” prizes.
J. Matas is on the editorial board of IJCV and was the Associate Editor-in-Chief of IEEE T. PAMI. He is a member of the ERC Computer Science and Informatics panel. He has served in various roles at major international conferences, e.g. ICCV, CVPR, ICPR, NIPS, ECCV, co-chairing ECCV 2004 and CVPR 2007. He is a program co-chair for ECCV 2016.
His research interests include object recognition, text localization and recognition, image retrieval, tracking, sequential pattern recognition, invariant feature detection, and Hough Transform and RANSAC-type optimization. His talk takes place on Thursday, March 2nd, at 1pm in room E105.

Tracking with Discriminative Correlation Filters

Visual tracking is a core video processing problem with many applications, e.g. in surveillance, autonomous driving, sport analysis, augmented reality, film post-production and medical imaging.

In the talk, tracking methods based on Discriminative Correlation Filters (DCF) will be presented. DCF-based trackers are currently the top performers on most commonly used tracking benchmarks. Starting from the oldest and simplest versions of DCF trackers like MOSSE, we will progress to kernel-based and multi-channel variants including those exploiting CNN features. Finally, the Discriminative Correlation Filter with Channel and Spatial Reliability will be introduced.
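The simplest DCF variant named above, MOSSE, has a closed-form filter in the frequency domain. The following one-dimensional sketch assumes a single training patch, a circular Gaussian as the desired response, and a naive DFT so that it needs only the standard library. The function names and the toy signal are hypothetical.

```python
import cmath
import math

def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform (forward or inverse)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(signal[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def train_filter(patch, desired, eps=1e-3):
    """MOSSE-style closed form: H* = G conj(F) / (F conj(F) + eps)."""
    F, G = dft(patch), dft(desired)
    return [g * f.conjugate() / (abs(f) ** 2 + eps) for f, g in zip(F, G)]

def locate(filter_conj, patch):
    """Apply the filter in the frequency domain and return the response peak index."""
    Z = dft(patch)
    response = dft([z * h for z, h in zip(Z, filter_conj)], inverse=True)
    return max(range(len(patch)), key=lambda i: response[i].real)

n, center = 8, 2
template = [0.0, 1.0, 4.0, 2.0, 0.0, 0.0, 0.0, 0.0]
# Desired response: a circular Gaussian peak at the target position.
desired = [math.exp(-0.5 * min(abs(i - center), n - abs(i - center)) ** 2)
           for i in range(n)]
h_conj = train_filter(template, desired)

shift = 3
moved = template[-shift:] + template[:-shift]  # target moved 3 samples to the right
peak_before = locate(h_conj, template)
peak_after = locate(h_conj, moved)
```

The response peak follows the target from index 2 to index 5. Real trackers work on 2-D image patches, use an FFT, and update the filter over time, none of which is shown here.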

Time permitting, I will briefly introduce a problem that has been so far largely ignored by the computer vision community – tracking of blurred, fast moving objects.

Video recording of the talk is publicly available.

Piotr Didyk: Perception and Personalization in Digital Content Reproduction

Piotr Didyk is an Independent Research Group Leader at the Cluster of Excellence on “Multimodal Computing and Interaction” at the Saarland University (Germany), where he is heading a group on Perception, Display, and Fabrication. He is also appointed as a Senior Researcher at the Max Planck Institute for Informatics. Prior to this, he spent two years as a postdoctoral associate at Massachusetts Institute of Technology. In 2012, he obtained his PhD from the Max Planck Institute for Informatics and the Saarland University for his work on perceptual display. During his studies, he was also a visiting student at MIT. In 2008, he received his M.Sc. degree in Computer Science from the University of Wrocław (Poland). His research interests include human perception, new display technologies, image/video processing, and computational fabrication. His main focus is on techniques that account for properties of the human sensory system and human interaction to improve perceived quality of the final images, videos, and 3D prints. His talk takes place on Wednesday, February 15th, 1pm in room A113.

Perception and Personalization in Digital Content Reproduction

There has been a tremendous increase in the quality and number of new output devices, such as stereo and automultiscopic screens, portable and wearable displays, and 3D printers. Unfortunately, the capabilities of these emerging technologies outpace those of the methods and tools for creating content. Also, the current level of understanding of how these new technologies influence user experience is insufficient to fully exploit their advantages. In this talk, I will present our recent efforts in the context of perception-driven techniques for digital content reproduction. I will demonstrate that careful combinations of new hardware, computation, and models of human perception can lead to solutions that provide a significant increase in perceived quality. More precisely, I will discuss two techniques for overcoming limitations of 3D displays. They exploit information about gaze direction as well as the motion-parallax cue. I will also demonstrate a new design of automultiscopic screen for cinema and a prototype of a near-eye augmented reality display that supports focus cues. Next, I will show how careful rendering of frames enables continuous framerate manipulations, giving artists a new tool for video manipulation. The technique can, for example, reduce temporal artifacts without sacrificing the cinematic look of movie content. In the context of digital fabrication, I will present a perceptual model for compliance with its applications to 3D printing.

Manuel M. Oliveira: Efficient Deconvolution Techniques for Computational Photography

Manuel M. Oliveira is an Associate Professor of Computer Science at the Federal University of Rio Grande do Sul (UFRGS), in Brazil. He received his PhD from the University of North Carolina at Chapel Hill, in 2000. Before joining UFRGS in 2002, he was an Assistant Professor of Computer Science at the State University of New York at Stony Brook (2000 to 2002). In the 2009-2010 academic year, he was a Visiting Associate Professor at the MIT Media Lab. His research interests cover most aspects of computer graphics, but especially the frontiers among graphics, image processing, and vision (both human and machine). In these areas, he has contributed a variety of techniques including relief texture mapping, real-time filtering in high-dimensional spaces, efficient algorithms for Hough transform, new physiologically-based models for color perception and pupil-light reflex, and novel interactive techniques for measuring visual acuity. Dr. Oliveira was program co-chair of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games 2010 (I3D 2010), and general co-chair of ACM I3D 2009. He is an Associate Editor of IEEE TVCG and IEEE CG&A, and a member of the CIE Technical Committee TC1-89 “Enhancement of Images for Colour Defective Observers”. He received the ACM Recognition of Service Award in 2009 and in 2010. His talk will take place on Tuesday, January 31st, 1 pm in room E105.

Efficient Deconvolution Techniques for Computational Photography

Abstract: Deconvolution is a fundamental tool for many imaging applications ranging from microscopy to astronomy. In this talk, I will present efficient deconvolution techniques tailored for two important computational photography applications: estimating color and depth from a single photograph, and motion deblurring from camera shake. For the first, I will describe a coded-aperture method based on a family of masks obtained as the convolution of one “hole” with a structural component consisting of an arrangement of Dirac delta functions. We call this arrangement of delta functions the structural component of the mask, and use it to efficiently encode scene distance information. I will then show how one can design well-conditioned masks for which deconvolution can be efficiently performed by inverse filtering. I will demonstrate the effectiveness of this approach by constructing a mask for distance coding and using it to recover color and depth information from single photographs. This leads to significant speedup, extended range, and higher depth resolution compared to previous approaches. For the second application, I will present an efficient technique for high-quality non-blind deconvolution based on the use of sparse adaptive priors. Despite its ill-posed nature, I will show how to model the non-blind deconvolution problem as a linear system, which is solved in the frequency domain. This clean formulation lends itself to a simple and efficient implementation, which is faster and whose results tend to have higher peak signal-to-noise ratio than previous methods.
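To make "deconvolution by inverse filtering" concrete, here is a one-dimensional sketch of a regularized inverse filter applied in the frequency domain. It is a generic textbook formulation, not the coded-aperture or sparse-prior method of the talk. The kernel, the signal, and the function names are hypothetical, and the kernel is chosen so that its spectrum has no zeros (a well-conditioned case).

```python
import cmath
import math

def dft(signal, inverse=False):
    """Naive O(n^2) discrete Fourier transform (forward or inverse)."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(signal[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def blur(signal, kernel):
    """Circular convolution of a signal with a short kernel."""
    n = len(signal)
    return [sum(kernel[j] * signal[(i - j) % n] for j in range(len(kernel)))
            for i in range(n)]

def deconvolve(blurred, kernel, eps=1e-9):
    """Regularized inverse filter: X = Y conj(K) / (|K|^2 + eps)."""
    n = len(blurred)
    padded = list(kernel) + [0.0] * (n - len(kernel))
    Y, K = dft(blurred), dft(padded)
    X = [y * k.conjugate() / (abs(k) ** 2 + eps) for y, k in zip(Y, K)]
    return [x.real for x in dft(X, inverse=True)]

sharp = [0.0, 0.0, 1.0, 3.0, 2.0, 0.0, 0.0, 0.0]
kernel = [0.6, 0.3, 0.1]   # spectrum magnitude stays at or above 0.2
blurred = blur(sharp, kernel)
restored = deconvolve(blurred, kernel)
```

Because the kernel's spectrum stays away from zero, dividing by it recovers the sharp signal almost exactly. A kernel with near-zero frequencies would amplify noise, which is why the conditioning of the mask matters.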

Video recording of the talk is publicly available.

Tomáš Mikolov: Neural Networks for Natural Language Processing

Tomáš Mikolov has been a research scientist at Facebook AI Research since 2014. Previously he was a member of the Google Brain team, where he developed efficient algorithms for computing distributed representations of words (word2vec project). He obtained his PhD from Brno University of Technology for work on recurrent neural network based language models (RNNLM). His long-term research goal is to develop intelligent machines capable of communicating with people using natural language. His talk will take place on Tuesday, January 3rd, 2017, 5pm in room E112.

Neural Networks for Natural Language Processing

Abstract: Neural networks are currently very successful in various machine learning tasks that involve natural language. In this talk, I will describe how recurrent neural network language models have been developed, as well as their most frequent applications to speech recognition and machine translation. Next, I will talk about distributed word representations, their interesting properties, and efficient ways to compute them. Finally, I will describe our latest efforts to create a novel dataset that would allow researchers to develop new types of applications that include communication with human users in natural language.

Gernot Ziegler: Data Parallelism in Computer Vision

Gernot Ziegler (Dr.Ing.) is an Austrian engineer with an MSc degree in Computer Science and Engineering from Linköping University, Sweden, and a PhD from the University of Saarbrücken, Germany. He pursued his PhD studies at the Max-Planck-Institute for Informatics in Saarbrücken, Germany, specializing in GPU algorithms for computer vision and data-parallel algorithms for spatial data structures. He then joined NVIDIA’s DevTech team, where he consulted in high performance computing and automotive computer vision on graphics hardware. In 2016, Gernot founded his own consulting company to explore the applications of his computer vision expertise on graphics hardware in mobile consumer, industrial vision and heritage digitalization. His talk will take place on Wednesday, December 14th, 2016, 1pm in room E105.

Data Parallelism in Computer Vision

Abstract: In algorithmic design, serial data dependencies which accelerate CPU processing for computer vision are often counterproductive for the data-parallel GPU. The talk presents data structures and algorithms that enable data parallelism for connected components, line detection, feature detection, marching cubes or octree generation. We will point out the important aspects of data parallel design that will allow you to design new algorithms for GPGPU-based computer vision and image processing yourself. As food for thought, I will sketch algorithmic ideas that could lead to new collaborative results in real-time computer vision.

Video recording of the talk is publicly available.

Stefan Jeschke: Recent Advances in Vector Graphics Creation and Display

Stefan Jeschke is a scientist at IST Austria. He received an M.Sc. in 2001 and a Ph.D. in 2005, both in computer science from the University of Rostock, Germany. Afterwards, he spent several years as a postdoctoral researcher in several projects at Vienna University of Technology and Arizona State University. His research interests include modeling and display of vectorized image representations, applications and solvers for PDEs, as well as modeling and rendering complex natural phenomena, preferably in real time. His talk will take place on Tuesday, November 8th, 2016, 1pm in room G202.

Recent Advances in Vector Graphics Creation and Display

This talk gives an overview of my recent work on vector graphics representations as semantically meaningful image descriptions, in contrast to pixel-based raster images. I will cover the problem of how to efficiently create vector graphics either from scratch or from given raster images. The goal is to help designers produce complex, high-quality representations with only limited manual input. Furthermore, I will talk about various new developments that are mainly based on the so-called “diffusion curves”. Here the goal is to improve the expressiveness of such representations, for example, by adding textures so that natural images appear more realistic without adding excessive amounts of geometry beyond what can be handled by a designer. Rendering such representations at interactive frame rates on modern GPUs is another aspect I will cover in this talk.

Video recording of the talk is publicly available.

Tomáš Pajdla: 3D Reconstruction from Photographs and Algebraic Geometry

Tomáš Pajdla is a Distinguished Researcher at the CIIRC – Czech Institute of Informatics, Robotics and Cybernetics (ciirc.cvut.cz) and an Assistant Professor at the Faculty of Electrical Engineering (fel.cvut.cz) of the Czech Technical University in Prague. He works in geometry, algebra and optimization of computer vision and robotics, 3D reconstruction from images, and visual object recognition. He is known for his contributions to geometry of cameras, image matching, 3D reconstruction, visual localization, camera and hand-eye calibration, and algebraic methods in computer vision (Google Scholar citations). He coauthored works awarded the best paper prizes at OAGM 1998 and 2013, BMVC 2002 and ACCV 2014. His talk will take place on Wednesday, November 2nd, 2016, 1pm in room E105.

3D Reconstruction from Photographs and Algebraic Geometry

Abstract: We will show a connection between state-of-the-art 3D reconstruction from photographs and algebraic geometry. In particular, we will show how some modern tools from computational algebraic geometry can be used to solve some classical as well as recent problems in computing camera calibration and orientation in space. We will present applications in large scale reconstruction from photographs, robotics and camera calibration.

Video recording of the talk is publicly available.

Ralf Schlüter: On the Relation between Error Measures, Statistical Modeling, and Decision Rules

Ralf Schlüter studied physics at RWTH Aachen University, Germany, and Edinburgh University, Scotland. He received the Diplom degree with honors in physics in 1995 and the Dr.rer.nat. degree with honors in computer science in 2000, from RWTH Aachen University. From November 1995 to April 1996 Ralf Schlüter was with the Institute for Theoretical Physics B at RWTH Aachen, where he worked on statistical physics and stochastic simulation techniques. Since May 1996 Ralf Schlüter has been with the Faculty of Mathematics, Computer Science and Natural Sciences of RWTH Aachen University, where he is currently Academic Director. He leads the automatic speech recognition group at the Human Language Technology and Pattern Recognition lab. His research interests cover speech recognition in general, discriminative training, neural networks, information theory, stochastic modeling, signal analysis, and theoretical aspects of pattern classification. His talk will take place on Tuesday, August 23rd, 2016, 10am in room A112.

On the Relation between Error Measures, Statistical Modeling, and Decision Rules

Abstract: The aim of automatic speech recognition (ASR), or more generally, pattern classification, is to minimize the expected error rate. This requires a consistent interaction of the error measure with statistical modeling and the corresponding decision rule. Nevertheless, the error measure is often not considered consistently in ASR:

  • error measures usually are not easily tractable due to their discrete nature,
  • the quantitative relation between modeling and error measure at least analytically is unclear and usually is only exploited empirically,
  • the standard decision rule does not consider word error loss.

In this presentation, bounds on the classification error will be presented that can directly be related to acoustic and language modeling. A first analytic relation between language model perplexity and sentence error is established, and the quantitative effects of context reduction and feature omission on the error rate are derived. The corresponding error bounds were discovered and finally analytically proven within a simulation-induced framework, which will be outlined. Also, first attempts to design a training criterion that supports the use of the standard decision rule while retaining the target of minimum word error rate are discussed. Finally, conditions will be presented under which the standard decision rule does in fact implicitly optimize word/token error rate in spite of its sentence/segment-based target.
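To illustrate the mismatch between the standard decision rule and word error loss, here is a toy sketch that is not taken from the talk. It compares the maximum-posterior sentence decision with a decision that minimizes the expected word-level edit distance. The three hypotheses, their posterior probabilities, and all function names are hypothetical.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i]
        for j, wb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (wa != wb)))   # substitution or match
        prev = cur
    return prev[-1]


def map_decision(posteriors):
    """Standard rule: pick the sentence with the highest posterior."""
    return max(posteriors, key=posteriors.get)


def min_word_error_decision(posteriors):
    """Pick the sentence with the lowest expected word-level edit distance."""
    def expected_errors(hyp):
        return sum(p * edit_distance(hyp.split(), ref.split())
                   for ref, p in posteriors.items())
    return min(posteriors, key=expected_errors)


# Hypothetical sentence posteriors for one utterance.
posteriors = {
    "the cat sat": 0.40,
    "a dog ran": 0.35,
    "a dog run": 0.25,
}

map_choice = map_decision(posteriors)
mwe_choice = min_word_error_decision(posteriors)
```

In this toy case the two rules disagree: the single most probable sentence shares no words with the other two hypotheses, while those two differ from each other in only one word and together carry more probability mass.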

Elmar Eisemann: Everything Counts – Rendering Highly-detailed Environments in Real-time

Elmar Eisemann is a professor at TU Delft, heading the Computer Graphics and Visualization Group. Before that, he was an associate professor at Telecom ParisTech (until 2012) and a senior scientist heading a research group in the Cluster of Excellence (Saarland University / MPI Informatik) (until 2009). He studied at the École Normale Supérieure in Paris (2001-2005) and received his PhD from the University of Grenoble at INRIA Rhône-Alpes (2005-2008). He made several research visits abroad: at the Massachusetts Institute of Technology (2003), University of Illinois Urbana-Champaign (2006), and Adobe Systems Inc. (2007, 2008). His interests include real-time and perceptual rendering, alternative representations, shadow algorithms, global illumination, and GPU acceleration techniques. He coauthored the book “Real-time shadows” and participated in various committees and editorial boards. He was local organizer of EGSR 2010, 2012, HPG 2012, and is paper chair of HPG 2015. His work received several distinction awards and he was honored with the Eurographics Young Researcher Award 2011. His talk will take place on Friday, May 20th, 2016, 2pm in room E105.

Everything Counts – Rendering Highly-detailed Environments in Real-time

A traditional challenge in computer graphics is the simulation of natural scenes, including complex geometric models and a realistic reproduction of physical phenomena, requiring novel theoretical insights, appropriate algorithms, and well-designed data structures. In particular, there is a need for efficient image-synthesis solutions, which is fueled by the development of modern display devices, which support 3D stereo, have high resolution and refresh rates, and deep color palettes.

In this talk, we will present methods for efficient image synthesis to address recent rendering challenges. In particular, we will focus on large-scale data sets and present novel techniques to encode highly detailed geometric information in a compact representation. Further, we will give an outlook on rendering techniques for modern display devices, as these often require very different solutions. In particular, human perception starts to play an increasing role and has high potential to be a key factor in future rendering solutions.

Video recording of the talk is publicly available.