Kucza, N., Porrmann, F., Stollenwerk, C., & Hagemeyer, J. (2025). Efficient Edge AI for Next Generation Smart Mirror Applications. IEEE Access, 1-1. https://doi.org/10.1109/ACCESS.2025.3574492

Making living environments fully smart requires a natural user interaction interface, which implies features such as user-specific, individual interaction as well as awareness of context, situation, and environment. One established example of such an interface is the voice assistant, such as Amazon Alexa, which relies solely on web service-based speech recognition. A widely used visual solution is the smart mirror, consisting of a semi-transparent mirror backed by a display and a computing device. In most cases, it only shows information next to the mirror image and offers no interaction with the user. Hence, it does not exploit the potential of locally processed artificial intelligence. This publication presents the enhancement of a smart mirror with locally processed artificial intelligence, ensuring low-latency interaction as well as non-exposure of private data. For a pleasant user experience, image-based recognition of objects, gestures, and faces is combined with voice commands via speech recognition. With these features, the user can be identified, interact with the system, and receive personalized information or assistance. Thanks to the modular software architecture based on ROS2 nodes, different embedded hardware modules and accelerators can be utilized. Possible configurations for such hardware and accelerator systems, as well as optimizations to increase energy efficiency, are described and evaluated. Low power consumption, and therefore high energy efficiency, is necessary because the application is intended to be integrated into private homes. Optimizations using pruning are evaluated and show improvements in energy per processed frame, required calculations per frame (50% reduction), and accuracy (increased by 15% for some classes).
Currently, the best solution regarding efficiency and optimized DNN models achieves 30 FPS with the object, gesture, and face detection models running simultaneously while consuming around 38 W, resulting in about 1.3 J per frame.
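
The energy-per-frame figure follows from dividing average power draw by frame rate, since one watt is one joule per second. The following is a minimal sketch of that arithmetic using the values reported in the abstract (around 38 W at 30 FPS); the function name `energy_per_frame_joules` is hypothetical and not taken from the paper.

```python
def energy_per_frame_joules(power_watts: float, frames_per_second: float) -> float:
    """Return the average energy per processed frame in joules.

    Power in watts is joules per second, so dividing by frames per second
    yields joules per frame.
    """
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return power_watts / frames_per_second


# Values reported in the abstract: around 38 W at 30 FPS.
energy = energy_per_frame_joules(38.0, 30.0)
print(f"{energy:.2f} J per frame")  # prints "1.27 J per frame"
```

With these inputs the result is roughly 1.27 J per frame, which rounds to 1.3 J. The same function can be used to compare hardware configurations, provided power and frame rate are measured under the same workload.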
