Branislav Mičušík is a senior scientist at the Austrian Institute of Technology. Prior to that, in ’07-’09, he was a visiting research scholar at Stanford University, USA. In ’04-’07 he was a postdoctoral researcher at the Vienna University of Technology, Austria. He received his Ph.D. in ’04 from the Czech Technical University in Prague, at the Center of Machine Perception. His research interests are driven by the wish to teach computers and machines to understand what they see in order to infer their own location. He holds the Microsoft Visual Computing Award 2011, given to the best young scientist in Visual Computing in Austria, and the Best Scientific Paper Prize at the British Machine Vision Conference in ’07. His talk takes place on Wednesday, May 27, 11am in room E104.
Calibrating Surveillance Camera Networks
Abstract: Camera systems have witnessed a huge increase in the number of installed cameras, generating a massive amount of video data. Current computer vision technologies cannot fully exploit the visual information available in such large camera networks, partly due to the lack of information about the cameras' exact locations. Manual calibration with special calibration targets, especially in ad hoc large camera networks, does not scale with the number of cameras and is too time consuming, hence impractical. Therefore, a fully or semi-automatic method with minimal user effort, relying solely on visual information, is an inevitable objective.
I present three approaches that tackle the calibration and localization problem of self-calibrating camera networks purely from the available visual data. First, I present an approach for camera calibration building on the latest achievements of the Structure from Motion community. This amounts to localizing a camera in an a priori built 3D model consisting of either points or line segments. Second and third, I review our approaches for calibrating a single camera and multiple surveillance cameras, respectively, from detections and tracks of people. I show how multiple view geometry between overlapping and non-overlapping camera views with static and dynamic point correspondences gives a strong cue towards calibrating the cameras, yielding practically appealing solutions.
Rafał Mantiuk is a senior lecturer (associate professor) at Bangor University (UK) and a member of the Research Institute of Visual Computing. Before coming to Bangor he received his PhD from the Max-Planck-Institut für Informatik (2006, Germany) and was a postdoctoral researcher at the University of British Columbia (Canada). He has published numerous journal and conference papers presented at ACM SIGGRAPH, Eurographics, CVPR and SPIE HVEI conferences, applied for several patents, and was recognized by the Heinz Billing Award (2006). Rafał Mantiuk investigates how knowledge of the human visual system and perception can be incorporated into computer graphics and imaging algorithms. His recent interests focus on designing imaging algorithms that adapt to human visual performance and viewing conditions in order to deliver the best images given limited resources, such as computation time or display contrast. His talk takes place on Friday, March 27 at 1pm, in room E104.
From high dynamic range to perceptual realism
Abstract: Today’s computer graphics techniques make it possible to create imagery that is hardly distinguishable from photographs. However, a photograph is clearly no match for an actual real-world scene. I argue that the next big challenge in graphics is to achieve perceptual realism by creating artificial imagery that would be hard to distinguish from reality. This requires profound changes in the entire imaging pipeline, from acquisition and rendering to display, with a strong focus on visual perception.
In this talk I will give a brief overview of several projects related to high dynamic range imaging and the applications of visual perception. Then I will discuss in more detail a project in which we explored the “dark side” of the dynamic range in order to model how people perceive images at low luminance. We use such a model to simulate the appearance of night scenes on regular displays, or to generate compensated images that reverse the changes in vision due to low luminance levels. The method can be used in games, driving simulators, or as a compensation for displays used under varying ambient light levels.
Jörn Anemüller studied Physics at the University of Oldenburg, Germany, and Information Processing and Neural Networks at King’s College, University of London, where he received the M.Sc. in 1996. He earned the Ph.D. in Physics at the University of Oldenburg in 2001 with a dissertation on “Across-frequency processing in convolutive blind source separation”. From 2001 to 2004 he worked on biomedical signal analysis as a post-doctoral fellow at the Salk Institute for Biological Studies and at the University of California, San Diego. Since 2004 he has been a member of the scientific staff at the Dept. of Physics, University of Oldenburg, currently leading the statistical signal models research group. His interests include statistical signal processing and machine learning techniques with application to acoustic, speech and biomedical signals. His talk takes place on Thursday, March 26 at 1pm, in room D0206.
Machine learning approaches for estimation of a neuron’s spectro-temporal filter from non-Gaussian stimulus ensembles
Abstract: Engineers may view an auditory neuron as an unknown but (hopefully) identifiable system that transforms the acoustic stimulus input into a binary “spike” or “no-spike” output. The linear part of the neuron’s spectro-temporal transfer function is commonly referred to as the spectro-temporal receptive field (STRF). From a machine learning perspective, this setting corresponds to the binary classification problem of discriminating spike-eliciting from non-spike-eliciting stimulus examples. The classification-based receptive field (CbRF) estimation method that we proposed recently adapts a linear large-margin classifier to optimally predict experimental stimulus-response data and subsequently interprets the learned classifier weights as the neuron’s receptive field filter.
Efficacy of the CbRF method is validated with simulations and for auditory spectro-temporal receptive field estimation from experimental recordings in the auditory midbrain of Mongolian gerbils. Acoustic stimulation is performed with frequency-modulated tone complexes that mimic properties of natural stimuli, specifically non-Gaussian amplitude distribution and higher-order correlations. Results demonstrate that the proposed approach successfully identifies correct underlying STRFs, even in cases where standard second-order methods based on the spike-triggered average (STA) do not.
Applied to small data samples, the method is shown to converge on smaller amounts of experimental recordings and with lower estimation variance than the generalized linear model and recent information theoretic methods. Analysis of temporal variability of receptive fields quantifies differences between processing at different stages along the auditory pathway.
Implications for speech recognition and acoustic event detection are briefly discussed.
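The core idea of the CbRF approach described above, training a linear classifier on spike versus no-spike stimuli and reading the weight vector as the receptive field estimate, can be illustrated with a toy simulation. The sketch below uses plain logistic regression via gradient descent as a stand-in for the large-margin learner of the actual method; the filter shape, stimulus statistics and all parameter values are illustrative assumptions, not taken from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth spectro-temporal filter (flattened, 64-dim).
D = 64
true_strf = np.zeros(D)
true_strf[20:28] = 1.0   # excitatory region
true_strf[36:44] = -1.0  # inhibitory region

# Non-Gaussian stimulus ensemble: heavy-tailed (Laplace) amplitudes,
# echoing the non-Gaussian statistics emphasised in the abstract.
N = 5000
X = rng.laplace(size=(N, D))

# Binary spike / no-spike responses from a noisy thresholded linear stage.
y = (X @ true_strf + 0.5 * rng.normal(size=N) > 1.0).astype(float)

# Linear classifier (logistic regression, L2-regularised) fitted by
# gradient descent; a stand-in for the max-margin learner of CbRF.
w = np.zeros(D)
b = 0.0
lr = 0.1
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= lr * (X.T @ (p - y) / N + 1e-3 * w)
    b -= lr * np.mean(p - y)

# The learned weight vector serves as the receptive field estimate.
corr = np.corrcoef(w, true_strf)[0, 1]
print(f"correlation with true STRF: {corr:.2f}")
```

In this simulation the learned weights correlate strongly with the true filter, which is the essence of interpreting classifier weights as an STRF.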
Filip Šroubek received the M.Sc. degree in computer science from the Czech Technical University, Prague, Czech Republic in 1998 and the Ph.D. degree in computer science from Charles University, Prague, Czech Republic in 2003. From 2004 to 2006, he held a postdoctoral position at the Instituto de Optica, CSIC, Madrid, Spain. In 2010/2011 he received a Fulbright Visiting Scholarship at the University of California, Santa Cruz. Currently, he is with the Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic. His talk takes place on Tuesday, February 24 at 11am in room E105.
Advances in Image Restoration: from Theory to Practice
Abstract: We rely on images more than ever. Our perception of the world is, however, limited by imperfect measuring conditions and the devices used to acquire images. By image restoration, we understand mathematical procedures that remove degradation from images. Two prominent topics of image restoration that have evolved considerably in the last 10 years are blind deconvolution and superresolution. Deconvolution by itself is an ill-posed inverse problem and one of the fundamental topics of image processing. The blind case, when the blur kernel is also unknown, is even more challenging and requires special optimization approaches to converge to the correct solution. Superresolution extends blind deconvolution by recovering lost spatial resolution of images. In this talk we will cover recent advances in both topics that pave the way from theory to practice. Various real acquisition scenarios will be discussed, together with proposed solutions for both blind deconvolution and superresolution and efficient numerical optimization methods that allow fast implementation. Examples with real data will illustrate the performance of the proposed solutions.
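To see why deconvolution is ill-posed and needs regularization, consider the classical non-blind baseline, the Wiener filter (the blind case discussed in the talk must additionally estimate the kernel). The 1-D sketch below is illustrative; signal, kernel and the regularization constant are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D "image": a box and an isolated spike.
n = 256
x = np.zeros(n)
x[60:70] = 1.0
x[150] = 2.0

# Known Gaussian blur kernel, zero-padded to signal length.
h = np.exp(-0.5 * (np.arange(-8, 9) / 2.0) ** 2)
h /= h.sum()
H = np.fft.fft(np.pad(h, (0, n - h.size)))

# Degraded observation: circular convolution plus noise.
y = np.real(np.fft.ifft(np.fft.fft(x) * H)) + 0.01 * rng.normal(size=n)

# Wiener deconvolution: X_hat = conj(H) Y / (|H|^2 + K). Naive inversion
# (K = 0) would divide by near-zero |H| at high frequencies and explode;
# the constant K acts as a noise-dependent regularizer.
K = 1e-2
x_hat = np.real(np.fft.ifft(np.conj(H) * np.fft.fft(y) / (np.abs(H) ** 2 + K)))

err_blurred = np.linalg.norm(y - x)
err_restored = np.linalg.norm(x_hat - x)
print(err_restored < err_blurred)
```

The restored signal is closer to the original than the blurry observation, but sharp features are only partially recovered: the information suppressed by the blur cannot be fully inverted, which is exactly the ill-posedness the talk refers to.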
Ondřej Chum received the MSc degree in computer science from Charles University, Prague, in 2001 and the PhD degree from the Czech Technical University in Prague, in 2005. From 2005 to 2006, he was a research fellow at the Centre for Machine Perception, Czech Technical University. From 2006 to 2007 he was a post-doc at the Visual Geometry Group, University of Oxford, UK. He is now an associate professor at the Centre for Machine Perception. His research interests include object recognition, large-scale image and particular-object retrieval, invariant feature detection, and RANSAC-type optimization. He has co-organized the “25 Years of RANSAC” workshop in conjunction with CVPR 2006, the Computer Vision Winter Workshop 2006, and the Vision and Sports Summer School (VS3) in Prague in 2012 and 2014. He was the recipient of the runner-up award for the “2012 Outstanding Young Researcher in Image & Vision Computing” by the Journal of Image and Vision Computing, given to researchers within seven years of their PhD, and the Best Paper Prize at the British Machine Vision Conference in 2002. In 2013, he was awarded an ERC-CZ grant. His talk takes place on Wednesday, January 28 at 3pm in room E104.
Visual Retrieval with Geometric Constraint
Abstract: In the talk, I will address the topic of image retrieval. In particular, I will focus on retrieval methods based on the bag-of-words image representation that exploit geometric constraints. Novel formulations of the image retrieval problem will be discussed, showing that the classical ranking of images based on similarity addresses only one of many possible user requirements. Retrieval methods efficiently solving the new formulations by exploiting geometric constraints will be used in different scenarios. These include online browsing of image collections, image analysis based on large collections of photographs, and model construction.
For online browsing, I will show queries that try to answer questions such as: “What is this?” (zoom in on a detail), “Where is that?” (zoom out to a larger visual context), or “What is to the left / right of this?”. For image analysis, two novel problems straddling the boundary between image retrieval and data mining are formulated: for every pixel in the query image, (i) find the database image with the maximum resolution depicting the pixel and (ii) find the frequency with which it is photographed in detail.
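The classical similarity ranking that the abstract takes as its starting point can be sketched in a few lines: images become sparse histograms of quantised local descriptors ("visual words"), scored by cosine similarity of tf-idf vectors. The vocabulary size and word assignments below are purely illustrative; a real system would add the geometric verification step that is the subject of the talk.

```python
import numpy as np
from collections import Counter

# Tiny database: each image is a list of visual-word ids (illustrative).
V = 8  # vocabulary size
database = {
    "img_a": [0, 0, 1, 3, 5],
    "img_b": [2, 2, 2, 4, 6],
    "img_c": [0, 1, 3, 3, 7],
}

# Inverse document frequency: rare words are more discriminative.
df = Counter(w for words in database.values() for w in set(words))
idf = np.array([np.log(len(database) / df[w]) if df[w] else 0.0
                for w in range(V)])

def tfidf(words):
    """L2-normalised tf-idf vector of a word list."""
    v = np.bincount(words, minlength=V).astype(float) * idf
    n = np.linalg.norm(v)
    return v / n if n else v

def rank(query_words):
    """Database images sorted by cosine similarity to the query."""
    q = tfidf(query_words)
    scores = {name: float(tfidf(w) @ q) for name, w in database.items()}
    return sorted(scores, key=scores.get, reverse=True)

print(rank([0, 3, 3]))  # img_c shares word 0 and word 3 (twice) with the query
```

A geometric-constraint stage would then re-rank the short list by checking that matched words agree on a consistent spatial transformation, which is precisely where the methods in the talk depart from this plain scoring.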
Jaroslav Křivánek is a researcher, developer, and associate professor of computer science at the Faculty of Mathematics and Physics of Charles University in Prague. Prior to this appointment, he was a Marie Curie research fellow at Cornell University, and a junior researcher and assistant professor at the Czech Technical University in Prague. Jaroslav received his Ph.D. from IRISA/INRIA Rennes and the Czech Technical University (joint degree) in 2005. His primary research interests are global illumination, radiative transport (including light transport), Monte Carlo methods, and visual perception, with the goal of developing novel practical ways of producing realistic, predictive renderings of virtual models. The technologies he has co-developed are used, among others, by Weta Digital, PIXAR Animation Studios, and Sony Pictures Imageworks. He is currently working on new software, Corona Renderer, with the goal of challenging the status quo in rendering technology used for visualizations in architecture and industrial design. In 2014, Jaroslav was selected for the New Europe 100 list, “a list of outstanding challengers who are leading world-class innovation from Central and Eastern Europe”, for taking computer graphics to the next level. His talk takes place on Monday, December 8, D0206 at 1pm.
Light Transport Simulation in the ArchViz and Visual Effects Industries
Abstract: Research and practice of computer graphics are witnessing a renewed interest in realistic rendering based on robust and efficient light transport simulation using Monte Carlo and other statistical methods. This research effort is propelled by the desire to accurately render general environments with complex geometry, materials and light sources, which is often difficult with the industry-standard ad hoc solutions. For this reason, the movie and archviz industries are shifting away from approximate rendering solutions towards physically-based rendering methods, which poses new challenges in terms of strict requirements on high image quality and algorithm robustness.
In this talk, I will summarize some of my contributions in the area of realistic rendering using physically-based light transport simulation. I will start by reviewing the path integral formulation of light transport, which is at the basis of the vast majority of recent advances in this area. I will then review our Vertex Connection and Merging algorithm, along with its recent extension to rendering participating media, which aims at robust handling of light transport in scenes with complex, specular materials. This algorithm has been very favourably received by the research community as well as the industry; within two years of its publication, it has been adopted by numerous major companies in the field, such as Weta, PIXAR or Chaos Group. In the next part of my talk, I will review our recent and ongoing work on light transport simulation in scenes with complex visibility, which remains an open challenge both for architectural visualizations and for the movie industry.
Dr. Daniel Ramos received his PhD in 2007 from Universidad Autonoma de Madrid (UAM), Spain. Since 2011, he has been an Associate Professor at UAM. He is a member of the ATVS – Biometric Recognition Group. During his career, he has visited several research laboratories and institutions around the world, including the Institute of Scientific Police at the University of Lausanne (Switzerland), the School of Mathematics at the University of Edinburgh (Scotland), the Electrical Engineering School at the University of Stellenbosch (South Africa), and more recently the Netherlands Forensic Institute, where he co-organized a workshop on the scientific validation of evidence evaluation methods. His research interests are focused on forensic evaluation of evidence using Bayesian techniques, validation of forensic evaluation methods, speaker and language recognition, biometric systems and, more generally, signal processing and pattern recognition.
Dr. Ramos is actively involved in several projects focused on different aspects of forensic science, such as yearly R&D contracts with the Spanish Guardia Civil and the Management Committee of the EU COST 1106 Action on Forensic Biometrics. He has also participated in several international competitive evaluations of speaker and language recognition technology, such as the NIST Speaker Recognition Evaluations since 2004, the Forensic Speaker Recognition Evaluation NFI/TNO 2003 and the NIST Language Recognition Evaluations since 2007. Dr. Ramos has received several distinctions and awards; he is regularly a member of scientific committees at international conferences, and he is often invited to give talks at conferences and institutions, for instance as a keynote speaker at the European Academy of Forensic Sciences Conference 2012 (EAFS 2012) in The Hague, The Netherlands. His talk takes place on Wednesday, December 3, E105 at 10am.
Meeting Forensic Science Requirements with Automatic Speaker Recognition Systems
Abstract: This talk describes the requirements of forensic science that are critical for the use of automatic speaker recognition systems in forensic applications. We first introduce the current context of forensic science in Europe at different levels. Then, we describe how to meet those requirements with score-based automatic speaker recognition systems, with a focus on two main forensic scenarios: investigative and evaluative. Investigative applications mainly include database search and distributed speech acquisition and sharing, and we give some examples of systems and projects in this context. Evaluative applications include interpretation and reporting of the output of the system to court, where a likelihood ratio approach is the preferred recommendation in Europe, and where effective communication to court is of paramount importance. We will address key concepts such as the evaluation of evidence; the expression of conclusions; the validation of forensic interpretation methods; the accreditation of laboratories by appropriate standards; and the current efforts towards converging to common procedures in Europe.
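The score-based likelihood ratio mentioned above has a simple computational core: model the distribution of system scores under the same-speaker and different-speaker hypotheses, then report their ratio at the observed score. The Gaussian score models and all parameter values in this sketch are illustrative assumptions; real forensic practice calibrates these distributions on validation data.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Density of a univariate Gaussian at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical score distributions from a speaker recognition system:
mu_same, sd_same = 2.0, 1.0    # scores of same-speaker trials
mu_diff, sd_diff = -1.0, 1.0   # scores of different-speaker trials

def likelihood_ratio(score):
    """LR = p(score | same speaker) / p(score | different speaker)."""
    return gauss_pdf(score, mu_same, sd_same) / gauss_pdf(score, mu_diff, sd_diff)

# A score near the same-speaker mean supports the prosecution hypothesis...
print(likelihood_ratio(2.0) > 1.0)
# ...while a score near the different-speaker mean supports the defence.
print(likelihood_ratio(-1.0) < 1.0)
```

The point of the LR framework is that the system reports this ratio to the court rather than a decision: combining it with prior odds is left to the trier of fact, which is what makes the approach suitable for evaluative reporting.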
Daniel Sýkora is an Assistant Professor at the Czech Technical University in Prague. His main research interest is strongly coupled with his long-standing passion for hand-drawn animation. He has developed numerous techniques that eliminate repetitive and time-consuming tasks while preserving the full creative freedom of manual work. To turn these research ideas into practical products, Daniel cooperates intensively with the Anifilm studio in Prague as well as renowned industrial partners such as Disney, Adobe, and TVPaint Development. His talk takes place on November 19, E104 at 2pm.
Adding Depth to Hand-drawn Images
Abstract: Recovering depth from a single image remains an open problem after decades of active research. In this talk we focus on a specific variant of the problem where the input image is a hand-crafted line drawing. As opposed to previous attempts to provide a complete 3D reconstruction, either by imposing various geometric constraints or by using sketch-based interfaces to produce a full 3D model incrementally, we seek a specific kind of bas-relief approximation that is less complex to create while still sufficient for many important tasks arising in 2D pipelines. It enables maintaining correct visibility and connectivity of individual parts during interactive shape manipulation, deformable image registration, and fragment composition. In the context of image enhancement, it helps to improve the perception of depth, generate 3D-like shading or even global illumination effects, and allows producing stereoscopic imagery as well as a source for 3D printing.
Josef Kittler is professor of Machine Intelligence at the Centre for Vision, Speech and Signal Processing, University of Surrey. He received his BA, PhD and DSc degrees from the University of Cambridge. He teaches and conducts research in the subject area of Signal Processing and Machine Intelligence, with a focus on Machine Learning, Biometrics, Video and Image Database retrieval, Automatic Inspection, Medical Data Analysis, and Cognitive Vision. He published a Prentice Hall textbook on Pattern Recognition: A Statistical Approach and several edited volumes, as well as more than 700 scientific papers, including in excess of 200 journal papers. He serves on the Editorial Board of several scientific journals in Pattern Recognition and Computer Vision. He became Series Editor of Springer Lecture Notes on Computer Science in 2004. He served as President of the International Association for Pattern Recognition 1994-1996. He was elected Fellow of the Royal Academy of Engineering in 2000. In 2006 he was awarded the KS Fu Prize from the International Association for Pattern Recognition, for outstanding contributions to pattern recognition. He received Honorary Doctorates from the University of Lappeenranta in 1999 and the Czech Technical University in Prague in 2007. In 2008 he was awarded the IET Faraday Medal and in 2009 he became EURASIP Fellow. His talk takes place on November 4, E104 at 2pm.
3D Assisted 2D Face Recognition
Abstract: 3D Morphable Face Models (3DMM) have been used in face recognition for some time now. They can be applied in their own right as a basis for 3D face recognition and analysis involving 3D face data. However, their prevalent use over the last decade has been as a versatile tool designed to assist 2D face recognition in many different ways. For instance, a 3DMM can be used for pose, illumination and expression normalisation of 2D face images. It has the generative capacity to augment the training and test databases for various 2D face processing tasks. It can expand the gallery set for pose-invariant face matching. For any 2D face image it can furnish complementary information in terms of its 3D shape and texture. It can also aid multiple-frame fusion by providing the means of registering a set of 2D images.
A key enabling technology for this versatility is the fitting of a 3D face model to a 2D face image. Recent developments in 3D model to 2D image fitting will be discussed. They include the use of symmetry to improve the accuracy of illumination estimation, multistage closed-form fitting to accelerate the fitting process, modifying the imaging model to cope with 2D images of low resolution, and building illumination-free 3DMMs. These various enhancements will be overviewed and their merit demonstrated on a number of face analysis problems in the context of 2D face recognition.
Karol Myszkowski is a senior researcher in the Computer Graphics Group of the Max-Planck-Institut für Informatik. In the past, he served as an Associate Professor at the University of Aizu, Japan. He also worked as a Research Associate and then Assistant Professor at Szczecin University of Technology. His research interests include perception issues in graphics, high-dynamic range imaging, global illumination, rendering, and animation. His talk takes place on Thursday, October 9, 10:00, E104.
Perceptual Display: Towards Reducing Gaps Between Real World and Displayed Scenes
Abstract: The human visual system (HVS) has its own limitations (e.g., the quality of eye optics, the luminance range that can be simultaneously perceived, and so on), which to a certain extent reduce the requirements imposed on display devices. Still, a significant deficit of reproducible contrast, brightness, spatial pixel resolution, and depth range can be observed, falling short of the HVS capabilities. Moreover, unfortunate interactions between technological and biological aspects create new problems that are unknown in real-world observation conditions.
In this talk, we aim to exploit perceptual effects to enhance apparent image quality. First, we show how perceived image contrast and brightness can be improved by exploiting the Cornsweet and glare illusions. Then, we present techniques for reducing hold-type blur, which is inherent to LCD displays. We also investigate apparent resolution enhancement, which enables showing image details beyond the physical pixel resolution of the display device. Finally, we discuss the problem of perceived depth enhancement in stereovision, as well as comfortable handling of specular effects, film grain, and video cuts.
A video recording of the talk is publicly available.
Sanjeev Khudanpur received the B.Tech. degree in Electrical Engineering from the Indian Institute of Technology, Bombay, in 1988, and the Ph.D. degree in Electrical and Computer Engineering from the University of Maryland, College Park, in 1997. His doctoral dissertation was supervised by Prof. Prakash Narayan and was titled Model Selection and Universal Data Compression. Since 1996, he has been on the faculty of the Johns Hopkins University. Until June 2001, he was an Associate Research Scientist in the Center for Language and Speech Processing and, from July 2001 to June 2008, an Assistant Professor in the Department of Electrical and Computer Engineering and the Department of Computer Science. He became an Associate Professor in July 2008. He is also affiliated with the Johns Hopkins University Human Language Technology Center of Excellence. In Fall 2000, he held a visiting appointment in the Institute for Mathematics and its Applications (IMA), University of Minnesota, Minneapolis, MN. He organized two IMA workshops on the role of mathematics in multimedia – “Mathematical Foundations of Speech Processing and Recognition,” and “Mathematical Foundations of Natural Language Modeling.” The talk of Sanjeev Khudanpur takes place on Friday, July 4, 1pm, at E104.
Statistical Language Modeling Turns Thirty-Something: Are We Ready To Settle Down?
Abstract: It has been 14 years since Roni Rosenfeld described “Two Decades of Statistical Language Modeling: Where Do We Go From Here?” in a special issue of the Proceedings of the IEEE (August 2000). Perhaps it is time to review what we have learnt in the years since? This lecture will begin with what was well known in 2000 — n-grams, decision tree language models, syntactic language models, maximum entropy (log-linear) models, latent semantic analysis and dynamic adaptation — and then move on to discuss new techniques that have emerged since, such as models with sparse priors, nonparametric Bayesian methods (including Dirichlet processes), and models based on neural networks, including feed-forward, recurrent and deep belief networks. Rather than just a survey, the main goal of the lecture will be to expose the core mathematical and statistical problems in language modeling, and to explain how various competing methods address these issues. It will be argued that the key to solving what appears at first blush to be a hopelessly high-dimensional, sparse-data estimation problem is to structure the model (family) and to guide the choice of parameter values using linguistic knowledge. It is hoped that viewing the core issues in this manner will enable the audience to gain a deeper understanding of the strengths and weaknesses of various approaches. And, no, we are not ready to settle down yet. But we now know what we are looking for: it varies from application to application. To each his own!
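The sparse-data estimation problem at the heart of the abstract is already visible in the simplest model on its list: an n-gram model with smoothing. The sketch below builds a bigram model with add-one (Laplace) smoothing on a toy corpus; corpus, smoothing choice and test sentences are illustrative assumptions, and modern systems use far more refined smoothing and the neural models the talk covers.

```python
from collections import Counter
import math

# Toy training corpus (illustrative).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
V = len(vocab)

# Bigram and history counts.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])  # histories only

def p(word, prev):
    # Add-one smoothing: every unseen bigram still gets nonzero mass,
    # the bluntest possible answer to the sparse-data problem.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + V)

def log_prob(sentence):
    """Log probability of a word sequence under the bigram model."""
    words = sentence.split()
    return sum(math.log(p(w, prev)) for prev, w in zip(words, words[1:]))

seen = log_prob("the cat sat on the mat")
unseen = log_prob("mat the rug dog cat on")
print(seen > unseen)  # word order observed in training scores higher
```

Even this crude smoothing makes every sentence over the vocabulary assignable a probability; the lecture's point is that better structure (model families, priors, linguistic knowledge) decides how that probability mass is shared.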
Hynek Hermansky is the Julian S. Smith Professor of Electrical Engineering and the Director of the Center for Language and Speech Processing at the Johns Hopkins University in Baltimore, Maryland. His main research interests are in bio-inspired speech processing. He has been working in speech research for over 30 years, previously as a Director of Research at the IDIAP Research Institute, Martigny, and a Titular Professor at the Swiss Federal Institute of Technology in Lausanne, Switzerland, a Professor and Director of the Center for Information Processing at OHSU Portland, Oregon, a Senior Member of Research Staff at U S WEST Advanced Technologies in Boulder, Colorado, a Research Engineer at Panasonic Technologies in Santa Barbara, California, a Research Fellow at the University of Tokyo, and an Assistant Professor at the Brno University of Technology, Czech Republic. He is a Fellow of the IEEE for “Invention and development of perceptually-based speech processing methods”, and a Fellow of the International Speech Communication Association for “Pioneering bio-inspired approaches to processing of speech”. He is the holder of the 2013 International Speech Communication Association Medal for Scientific Achievement, a Member of the Board of the International Speech Communication Association, and a Member of the Editorial Board of Speech Communication. He was the General Chair of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, a Member of the Organizing Committee of the 2011 ICASSP in Prague, Technical Chair of the 1998 ICASSP in Seattle, and an Associate Editor for the IEEE Transactions on Speech and Audio Processing. He holds 10 US patents and has authored or co-authored over 200 papers in reviewed journals and conference proceedings.
His speech processing techniques such as Perceptual Linear Prediction, RASTA spectral filtering, multi-stream speech information processing or data-driven discriminative Tandem technique are widely used in research laboratories worldwide as well as in industrial applications. Prof. Hermansky holds Dr.Eng. degree from the University of Tokyo, and Dipl. Ing. degree from Brno University of Technology, Czech Republic. His talk takes place on Thursday, July 3, 1pm, at E104.
My Adventures with Speech
Abstract: I intend to mention some techniques I got involved in during the past 40 years. I will not dwell too much on the details of the techniques; these are documented in various publications. Rather, I will try to talk about things which we, researchers, may say in private but seldom write about: about personal intuitions and beliefs, about excitements, frustrations, surprises, and interesting encounters on the road, while struggling to understand and emulate one of the most significant achievements of the human race, the ability to communicate by speech.
Brian A. Barsky is Professor of Computer Science and Affiliate Professor of Optometry and Vision Science at University of California, Berkeley. His research interests include computer aided geometric design and modeling, interactive 3D computer graphics, computer aided cornea modeling and visualization, medical imaging, and virtual environments for surgical simulation. His talk takes place on Thursday, June 12, 3pm, at E104.
The BLUR Project at Berkeley
Abstract: The multidisciplinary BLUR project at UC Berkeley combines computer graphics with optics, optometry, and photography. This research investigates mathematical models to describe the shape of the cornea and algorithms for corneal measurement, scientific and medical visualization for the display of corneal shape, mathematics and algorithms for the design and fabrication of contact lenses, simulation of vision using actual patient data measured by wavefront aberrometry, photo-realistic rendering algorithms for generating imagery with optically correct depth of field, and view camera simulation. This talk will present an overview of rendering algorithms for simulating the depth of field found in photographs and of vision-realistic rendering algorithms for simulating a subject’s vision. Recent work on vision-correcting displays will also be briefly introduced.
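The depth-of-field simulation mentioned above rests on a simple geometric quantity: the circle of confusion, the disc into which an out-of-focus point spreads on the sensor. The sketch below uses the standard thin-lens formula with illustrative parameter values; it is textbook optics, not a description of the BLUR project's own algorithms.

```python
def coc_diameter(f, aperture_n, s1, s2):
    """Thin-lens circle-of-confusion diameter on the sensor (same units as f).

    f          -- focal length
    aperture_n -- f-number (aperture = f / aperture_n)
    s1         -- distance the lens is focused at
    s2         -- actual distance of the scene point
    """
    a = f / aperture_n                       # aperture diameter
    return a * abs(s2 - s1) / s2 * f / (s1 - f)

# 50 mm lens at f/2, focused at 2 m (all distances in metres):
in_focus = coc_diameter(f=0.05, aperture_n=2.0, s1=2.0, s2=2.0)
behind = coc_diameter(f=0.05, aperture_n=2.0, s1=2.0, s2=4.0)

# A point on the focal plane is perfectly sharp; one behind it blurs.
print(in_focus, behind)
```

A depth-of-field renderer applies exactly this idea per pixel: points whose circle of confusion is smaller than a pixel look sharp, while larger circles are rendered as blur discs, which is what distinguishes optically correct depth of field from ad hoc post-blur.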
Alex Wilkie will kindly share his deep knowledge of predictive rendering. Alex is a well-known, rigorous researcher and a very good speaker. Do not miss the chance to learn about his recent advances in realistic rendering, presented at SIGGRAPH, EUROGRAPHICS, and EGSR. His talk takes place on Monday, May 19, 2pm, at E105.
Predictive Rendering – The Other Type of Realistic Computer Graphics
Abstract: This talk has two parts: in the first, we discuss the basic differences between mainstream computer graphics and genuinely predictive image synthesis. In the second part, we give a brief overview of the application domains where predictive rendering is useful, the technological state of the art in this field, and the main research directions that are currently being investigated. This includes the specific topics our group in Prague is working on now, and the directions that will probably become upcoming research areas in the near-term future.