This work proposes a media sound-based authentication method that unobtrusively protects smartphone notification privacy, hiding or presenting sensitive content depending on who is holding the phone. We show that media sounds, such as the melodies of notification tones (e.g., the iPhone message tone and the Samsung whistle), can be used directly to sense and verify the user's gripping hand. Because sounds and vibrations co-exist, we capture two novel responses via the smartphone microphone and accelerometer that describe how the individual's contacting palm interferes with the signals in two different domains. Based on these two responses, we develop a convolutional neural network-based algorithm to verify the user. Moreover, because the smartphone sensors are all embedded on the same motherboard, we develop a cross-domain method to validate such hard-to-forge physical relationships among the microphone, speaker, and accelerometer, which prevents external sounds from spoofing the system. Additionally, we treat the notification vibration as a special type of media sound, which also produces two responses, and extend our method to work in silent mode. Extensive experiments with ten notification tones and four phone models show that our system verifies users with 95% accuracy and prevents replayed sounds with 100% accuracy.
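
The abstract describes a two-domain design: an acoustic response from the microphone and a vibration response from the accelerometer, fused by a CNN for user verification. The sketch below is a minimal, illustrative two-branch model of that idea, assuming a mono acoustic snippet and a 3-axis accelerometer trace; all layer sizes, input lengths, and names are assumptions for illustration and are not the paper's actual architecture.

```python
# Illustrative sketch only: a two-branch 1-D CNN that fuses a mic-captured
# acoustic response with an accelerometer-captured vibration response into a
# single user-verification score. Shapes and layer choices are assumptions.
import torch
import torch.nn as nn

class TwoBranchVerifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Branch for the acoustic response (mono signal from the mic)
        self.mic_branch = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, stride=2, padding=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, stride=2, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(16), nn.Flatten(),
        )
        # Branch for the vibration response (3-axis accelerometer trace)
        self.acc_branch = nn.Sequential(
            nn.Conv1d(3, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(16), nn.Flatten(),
        )
        # Fuse both domains and produce one accept/reject logit
        self.head = nn.Sequential(
            nn.Linear(32 * 16 * 2, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, mic_sig, acc_sig):
        feats = torch.cat([self.mic_branch(mic_sig),
                           self.acc_branch(acc_sig)], dim=1)
        return self.head(feats)  # sigmoid(logit) > threshold => legitimate user

# Example: one notification-tone recording and its co-occurring vibration trace
model = TwoBranchVerifier()
mic = torch.randn(1, 1, 4096)   # hypothetical mono acoustic snippet
acc = torch.randn(1, 3, 512)    # hypothetical x/y/z accelerometer snippet
score = torch.sigmoid(model(mic, acc))
print(score.item())
```

A real deployment, as the abstract notes, would additionally check the cross-domain physical consistency among the microphone, speaker, and accelerometer before accepting the CNN's decision, so that an externally replayed sound cannot spoof the verifier.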