Imagine if you could settle/rekindle domestic arguments by asking your smart speaker when the room last got cleaned or whether the bins already got taken out?
Or — for an altogether healthier use-case — what if you could ask your speaker to keep count of reps as you do squats and bench presses? Or switch into full-on ‘personal trainer’ mode — barking orders to peddle faster as you spin cycles on a dusty old exercise bike (who needs a Peloton!).
And what if the speaker was smart enough to just know you’re eating dinner and took care of slipping on a little mood music?
Now imagine if all those activity tracking smarts were on tap without any connected cameras being plugged inside your home.
Another bit of fascinating research from researchers at Carnegie Mellon University’s Future Interfaces Group opens up these sorts of possibilities — demonstrating a novel approach to activity tracking that does not rely on cameras as the sensing tool.
Installing connected cameras inside your home is of course a horrible privacy risk. Which is why the CMU researchers set about investigating the potential of using millimeter wave (mmWave) doppler radar as a medium for detecting different types of human activity.
The challenge they needed to overcome is that while mmWave offers a “signal richness approaching that of microphones and cameras”, as they put it, data to train AI models to recognize different human activities as RF noise is not plentiful.
Not to be deterred, they set about sythensizing doppler data to feed a human activity tracking model — devising a software pipeline for training privacy-preserving activity tracking AI models.
The results can be seen in this video — where the model is shown correctly identifying a number of different activities, including cycling, clapping, waving and squats. Purely from its ability to interpret the mmWave signal the movements generate — and purely having been trained on public video data.
“We show how this cross-domain translation can be successful through a series of experimental results,” they write. “Overall, we believe our approach is an important stepping stone towards significantly reducing the burden of training such as human sensing systems, and could help bootstrap uses in human-computer interaction.”
Researcher Chris Harrison confirms the mmWave doppler radar-based sensing doesn’t work for “very subtle stuff” (like spotting different facial expressions). But he says it’s sensitive enough to detect less vigorous activity — like eating or reading a book.
The motion detection ability of doppler radar is also limited by a need for line-of-sight between the subject and the sensing hardware. (Aka: “It can’t reach around corners yet.” Which, for those concerned about future robots’ powers of human detection, will surely sound slightly reassuring.)
Detection does require special sensing hardware, of course. But things are already moving on that front: Google has been dipping its toe in already, via project Soli — adding a radar sensor to the Pixel 4, for example.
Google’s Nest Hub also integrates the same radar sense to track sleep quality.
“One of the reasons we haven’t seen more adoption of radar sensors in phones is a lack of compelling use cases (sort of a chicken and egg problem),” Harris tells TechCrunch. “Our research into radar-based activity detection helps to open more applications (e.g., smarter Siris, who know when you are eating, or making dinner, or cleaning, or working out, etc.).”
Asked whether he sees greater potential in mobile or fixed applications, Harris reckons there are interesting use-cases for both.
“I see use cases in both mobile and non mobile,” he says. “Returning to the Nest Hub… the sensor is already in the room, so why not use that to bootstrap more advanced functionality in a Google smart speaker (like rep counting your exercises).
“There are a bunch of radar sensors already used in building to detect occupancy (but now they can detect the last time the room was cleaned, for example).”
“Overall, the cost of these sensors is going to drop to a few dollars very soon (some on eBay are already around $1), so you can include them in everything,” he adds. “And as Google is showing with a product that goes in your bedroom, the threat of a ‘surveillance society’ is much less worry-some than with camera sensors.”
Startups like VergeSense are already using sensor hardware and computer vision technology to power real-time analytics of indoor space and activity for the b2b market (such as measuring office occupancy).
But even with local processing of low-resolution image data, there could still be a perception of privacy risk around the use of vision sensors — certainly in consumer environments.
Radar offers an alternative to such visual surveillance that could be a better fit for privacy-risking consumer connected devices such as ‘smart mirrors‘.
“If it is processed locally, would you put a camera in your bedroom? Bathroom? Maybe I’m prudish but I wouldn’t personally,” says Harris.
He also points to earlier research which he says underlines the value of incorporating more types of sensing hardware: “The more sensors, the longer tail of interesting applications you can support. Cameras can’t capture everything, nor do they work in the dark.”
“Cameras are pretty cheap these days, so hard to compete there, even if radar is a bit cheaper. I do believe the strongest advantage is privacy preservation,” he adds.
Of course having any sensing hardware — visual or otherwise — raises potential privacy issues.
A sensor that tells you when a child’s bedroom is occupied may be good or bad depending on who has access to the data, for example. And all sorts of human activity can generate sensitive information, depending on what’s going on. (I mean, do you really want your smart speaker to know when you’re having sex?)
So while radar-based tracking may be less invasive than some other types of sensors it doesn’t mean there are no potential privacy concerns at all.
As ever, it depends on where and how the sensing hardware is being used. Albeit, it’s hard to argue that the data radar generates is likely to be less sensitive than equivalent visual data were it to be exposed via a breach.
“Any sensor should naturally raise the question of privacy — it is a spectrum rather than a yes/no question,” agrees Harris. “Radar sensors happen to be usually rich in detail, but highly anonymizing, unlike cameras. If your doppler radar data leaked online, it’d be hard to be embarrassed about it. No one would recognize you. If cameras from inside your house leaked online, well… ”
What about the compute costs of synthesizing the training data, given the lack of immediately available doppler signal data?
“It isn’t turnkey, but there are many large video corpuses to pull from (including things like Youtube-8M),” he says. “It is orders of magnitude faster to download video data and create synthetic radar data than having to recruit people to come into your lab to capture motion data.
“One is inherently 1 hour spent for 1 hour of quality data. Whereas you can download hundreds of hours of footage pretty easily from many excellently curated video databases these days. For every hour of video, it takes us about 2 hours to process, but that is just on one desktop machine we have here in the lab. The key is that you can parallelize this, using Amazon AWS or equivalent, and process 100 videos at once, so the throughput can be extremely high.”
And while RF signal does reflect, and do so to different degrees off of different surfaces (aka “multi-path interference”), Harris says the signal reflected by the user “is by far the dominant signal”. Which means they didn’t need to model other reflections in order to get their demo model working. (Though he notes that could be done to further hone capabilities “by extracting big surfaces like walls/ceiling/floor/furniture with computer vision and adding that into the synthesis stage”.)
“The [doppler] signal is actually very high level and abstract, and so it’s not particularly hard to process in real time (much less ‘pixels’ than a camera).” he adds. “Embedded processors in cars use radar data for things like collision breaking and blind spot monitoring, and those are low end CPUs (no deep learning or anything).”
The research is being presented at the ACM CHI conference, alongside another Group project — called Pose-on-the-Go — which uses smartphone sensors to approximate the user’s full-body pose without the need for wearable sensors.
CMU researchers from the Group have also previously demonstrated a method for indoor ‘smart home’ sensing on the cheap (also without the need for cameras), as well as — last year — showing how smartphone cameras could be used to give an on-device AI assistant more contextual savvy.
In recent years they’ve also investigated using laser vibrometry and electromagnetic noise to give smart devices better environmental awareness and contextual functionality. Other interesting research out of the Group includes using conductive spray paint to turn anything into a touchscreen. And various methods to extend the interactive potential of wearables — such as by using lasers to project virtual buttons onto the arm of a device user or incorporating another wearable (a ring) into the mix.
The future of human computer interaction looks certain to be a lot more contextually savvy — even if current-gen ‘smart’ devices can still stumble on the basics and seem more than a little dumb.