Rapid progress in computer vision is opening the door to automating tasks such as video surveillance and airport security. Yet while machines have proved effective at identifying objects using cameras and other sensors (such as lidar), they struggle to operate smoothly in the real, disorderly 3D world of a street or household.

Doing so requires fine perceptual capabilities that come instinctively to humans. Frying an egg, for instance, involves interacting with a disorderly set of objects in three dimensions, and proves extremely challenging for a machine.

In an attempt to address this shortcoming, researchers at Duke University set about developing technology to help robots ‘make sense’ of 3D objects at a glimpse. Humans can glance at a new object and instinctively identify what it is, how it is oriented and where it sits in relation to other objects.

PhD student Ben Burchfiel worked with colleagues at the university to develop a robot perception algorithm that can understand objects without examining them from multiple angles, even when they are partially obscured.

The team trained their algorithm using a dataset of thousands of complete 3D scans of common objects a household robot may need to interact with, such as toilets, tables and chairs. The algorithm ‘learned’ to categorise objects by identifying their characteristics. As a result, when an object is glimpsed, the algorithm can narrow down the list of possible items by spotting shared properties, such as the legs of chairs and tables, and then find the best match. This reduces processing time to approximately a second.
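The details of the Duke algorithm are not given here, but the general idea of matching a partial glimpse against complete, previously learned shapes can be sketched in a few lines. The snippet below is an illustrative assumption only: it uses toy random voxel grids and a simple masked nearest-neighbour comparison, not the team's actual learned representation, and the grid size, categories and masking scheme are invented for the example.

```python
# Illustrative sketch only: a toy nearest-neighbour matcher over voxelised shapes.
# Everything here (grid size, random "training" shapes, masking scheme) is an
# assumption for demonstration, not the Duke team's method.
import numpy as np

rng = np.random.default_rng(0)
GRID = 16  # hypothetical voxel resolution

# Pretend "training set": complete voxel grids for a few object categories.
categories = ["chair", "table", "toilet"]
training = {name: (rng.random((GRID, GRID, GRID)) > 0.7).astype(float)
            for name in categories}

def classify_partial(observed, visible_mask):
    """Match a partially observed voxel grid against complete training shapes,
    comparing only the voxels the sensor could actually see."""
    best_name, best_err = None, np.inf
    for name, full in training.items():
        err = np.mean((observed[visible_mask] - full[visible_mask]) ** 2)
        if err < best_err:
            best_name, best_err = name, err
    return best_name, best_err

# Simulate a single overhead glimpse of a "chair": only the top half of the
# grid is visible to the sensor.
mask = np.zeros((GRID, GRID, GRID), dtype=bool)
mask[:, :, GRID // 2:] = True
glimpse = training["chair"].copy()

label, error = classify_partial(glimpse, mask)
print(f"best match: {label} (masked error {error:.3f})")
```

In this toy version the comparison is a brute-force scan over every stored shape; the appeal of a learned, category-level representation is that it avoids exactly that kind of exhaustive search, which is what keeps the real system's processing time down to around a second.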

To test the method, the researchers ran it on more than 900 examples of 10 different types of 3D object, each viewed from above only.

“Overall, we make a mistake a little less than 25 per cent of the time, and the best alternative makes a mistake almost half the time, so it is a big improvement,” said Burchfiel. “But it still isn’t ready to move into your house. You don’t want it putting a pillow in the dishwasher.”