The ability to identify pedestrians unobtrusively is essential for smart buildings to provide customized environments, energy saving, health monitoring, and security-enhanced services. In this paper, we present an unobtrusive pedestrian identification system that passively listens to people’s walking sounds. The proposed acoustic system can be readily integrated with widely deployed voice assistant devices, extending them with context awareness. This work addresses two major tasks. First, we tackle the challenge of recognizing footstep sounds in complex indoor scenarios by exploiting deep learning and the stereo recording capability available on most voice assistant devices. We develop a Convolutional Neural Network (CNN)-based algorithm, together with signal processing schemes tailored to footstep sounds, to identify users accurately from their footsteps. Second, we design a “live” footstep detection approach to defend against replay attacks. By deriving novel inter-footstep and intra-footstep characteristics, we distinguish live footstep sounds from sounds replayed through a machine speaker based on their spatial variances. The system is evaluated under normal scenarios, traditional replay attacks, and advanced replay attacks designed to forge footstep sounds both acoustically and spatially. Extensive experiments show that our system identifies people with up to 94.9% accuracy from a single footstep, and blocks 100% of traditional replay attacks and up to 99% of advanced replay attacks.
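The intuition behind spatial-variance liveness detection can be illustrated with a toy example. A person walking past a stereo microphone pair produces footsteps from changing positions, so the inter-channel time delay shifts from step to step; a replay speaker at a fixed location yields nearly the same delay every time. The sketch below is a hypothetical stand-in, not the inter-footstep and intra-footstep features derived in this work: it assumes each footstep has already been segmented into a (left, right) sample pair, estimates the inter-channel lag by brute-force cross-correlation, and flags a sequence as live when the variance of those lags exceeds a threshold. The function names, the `max_lag` search range, and the threshold value are all illustrative assumptions.

```python
import statistics
from typing import List, Sequence, Tuple

Footstep = Tuple[Sequence[float], Sequence[float]]  # (left, right) channel samples


def tdoa_lag(left: Sequence[float], right: Sequence[float], max_lag: int) -> int:
    """Hypothetical helper: lag (in samples) maximizing the cross-correlation.

    A positive result means the right channel is delayed relative to the left.
    """
    n = len(left)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[i] with right[i + lag], skipping out-of-range indices.
        score = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def inter_footstep_spatial_variance(footsteps: List[Footstep], max_lag: int) -> float:
    """Population variance of the per-footstep inter-channel lag."""
    lags = [tdoa_lag(left, right, max_lag) for left, right in footsteps]
    return statistics.pvariance(lags)


def looks_live(footsteps: List[Footstep], max_lag: int, threshold: float) -> bool:
    """Toy liveness rule (assumed): a moving walker shows varying spatial cues."""
    return inter_footstep_spatial_variance(footsteps, max_lag) > threshold


def make_footstep(delay: int, length: int = 64) -> Footstep:
    """Synthetic footstep: a decaying click, with the right channel delayed."""
    left = [0.0] * length
    for k in range(8):
        left[10 + k] = 1.0 / (k + 1)
    right = [left[i - delay] if 0 <= i - delay < length else 0.0 for i in range(length)]
    return left, right


if __name__ == "__main__":
    walker = [make_footstep(d) for d in (-4, -2, 0, 2, 4)]  # source moves past the mics
    replay = [make_footstep(3) for _ in range(5)]            # fixed loudspeaker position
    print(looks_live(walker, max_lag=6, threshold=1.0))
    print(looks_live(replay, max_lag=6, threshold=1.0))
```

With these synthetic inputs, the walker's lags are spread out and pass the check while the stationary replay's lags are constant and fail it. Because this toy rule ignores intra-footstep characteristics entirely, it likely would not by itself resist the advanced replays described above, which deliberately forge spatial cues.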