If you're a regular reader of this column, you know that I'm enthusiastic about the potential of "embedded vision" – the widespread, practical use of computer vision in embedded systems, mobile devices, PCs, and the cloud. Processors and sensors with sufficient performance for sophisticated computer vision are now available at price, size, and power consumption levels appropriate for many markets, including cost-sensitive consumer products and energy-sipping portable devices. This is ushering in an era of machines that "see and understand".
But while hardware has advanced rapidly, developing robust algorithmic solutions remains a vexing challenge for many vision applications. In real-world situations, reliably extracting meaning from pixels is often difficult, due to the diversity of scenes and imaging conditions that may be presented to an image sensor. For example, a vision-based automotive driver assistance system may be tasked with distinguishing pedestrians from other objects with similar geometry and coloration, such as road signs, utility poles, and trees.
For humans, distinguishing between pedestrians and other objects is virtually effortless, so it's natural to assume that this is a straightforward task for machines as well. But when you begin to contemplate the variations in how people dress, how they move, where they may be situated relative to the vehicle, lighting conditions, and so on, creating an algorithm to reliably detect pedestrians can be daunting. Similar challenges are found in many types of vision applications. These challenges often result in the creation of very complex, multi-layered algorithms that examine images for features, group features into objects, and then classify objects based on complex rules. These algorithms may include several alternative ways to perform a task depending on conditions. (For example, is the vehicle stopped or moving? Is it day or night?) Developing and validating such algorithms can be extremely challenging.
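To make that structure concrete, here is a rough Python skeleton of such a hand-engineered pipeline. Every function name, parameter, and rule in it is a hypothetical placeholder for illustration, not a reference to any real library; the point is simply the layered feature-object-rule organization and the condition-dependent branches (night mode, vehicle motion) described above.

```python
# Structural sketch of a hand-engineered detection pipeline (illustrative only).
# Every function and rule below is a hypothetical placeholder, not a real
# library API; the point is the layered, condition-dependent structure.

def extract_features(image, night_mode):
    """Find low-level cues (edges, gradients, hot spots), tuned per condition."""
    return []  # placeholder: a real system returns feature descriptors

def group_into_objects(features):
    """Cluster nearby features into candidate objects with bounding boxes."""
    return []  # placeholder: a real system returns candidate regions

def classify_candidate(candidate, vehicle_moving):
    """Apply hand-written rules (aspect ratio, position, motion, texture, ...)."""
    return "other"  # placeholder decision

def detect_pedestrians(image, night_mode=False, vehicle_moving=True):
    features = extract_features(image, night_mode)
    candidates = group_into_objects(features)
    return [c for c in candidates
            if classify_candidate(c, vehicle_moving) == "pedestrian"]
```

Each of these stages, and each conditional variant of them, must be designed, tuned, and validated by hand, which is where the development and validation effort balloons.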
Is there a better way? Perhaps. Some of the most sophisticated image-understanding systems deployed today rely on machine learning. In some cases, machine learning systems dispense with procedural techniques for recognizing objects and situations, and instead provide a framework that enables a system to be trained through examples. So, rather than trying to describe in exhaustive detail how to tell the difference between a pedestrian and a tree under a wide range of conditions, a machine learning approach might endow a system with the ability to learn (and to generalize) from examples; the developer then trains the system by showing it numerous examples, allowing it to figure out for itself what visually distinguishes a pedestrian from other kinds of objects.
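The contrast with the hand-engineered pipeline above can be sketched in a few lines. The snippet below is a minimal, hedged illustration of the train-by-example idea using scikit-learn (one common tool, not one named in this column); the randomly generated feature vectors, dimensions, and labels are all stand-ins for real labeled image data.

```python
# Minimal sketch of learning-by-example classification (illustrative only).
# A real system would extract features from labeled image crops; here random
# vectors stand in for "pedestrian" and "other object" feature descriptors.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Placeholder training data: 200 feature vectors per class, 64 dimensions each.
pedestrian_features = rng.normal(loc=1.0, scale=1.0, size=(200, 64))
other_features = rng.normal(loc=-1.0, scale=1.0, size=(200, 64))

X_train = np.vstack([pedestrian_features, other_features])
y_train = np.array([1] * 200 + [0] * 200)  # 1 = pedestrian, 0 = other object

# The "training by examples" step: the classifier infers a decision rule
# from labeled data instead of being handed hand-written rules.
classifier = LinearSVC(C=1.0)
classifier.fit(X_train, y_train)

# At run time, a new feature vector is classified by the learned rule.
new_observation = rng.normal(loc=1.0, scale=1.0, size=(1, 64))
print("pedestrian" if classifier.predict(new_observation)[0] == 1 else "other")
```

Here the developer's effort shifts from writing and tuning rules to collecting and curating representative labeled examples.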
I believe that the potential of machine learning in vision applications is vast. Just as a skilled physician learns through long experience to quickly recognize certain illnesses from a brief examination of a patient, vision systems may soon be learning to recognize many kinds of things by being trained (rather than "told" how) to do so.
Machine learning isn't a new idea. But it's become a very hot field lately, and the pace of progress seems to be accelerating. I am particularly excited about the potential for machine learning to enable better solutions to challenging visual understanding problems. And that's why I'm thrilled that one of the giants of machine learning for computer vision, Yann LeCun, will be the morning keynote speaker at the Embedded Vision Summit West, a conference I'm organizing that will take place on May 29th in Santa Clara, California. Yann is a professor at New York University and also recently joined Facebook as its Director of AI Research. Yann's talk is titled "Convolutional Networks: Unleashing the Potential of Machine Learning for Robust Perception Systems," and it will be one of the highlights of a full day of high-quality, insightful educational presentations. The Summit will also feature over thirty demonstrations of leading-edge embedded vision technology, as well as opportunities to interact with experts in embedded vision applications, algorithms, tools, processors, and sensors.
If you're involved in, or interested in learning about, incorporating visual intelligence into your products, I invite you to join us at the Embedded Vision Summit West on May 29th in Santa Clara. Space is limited, so please register now.
Jeff Bier is president of BDTI and founder of the Embedded Vision Alliance. Please post a comment here or send him your feedback at http://www.BDTI.com/Contact.