Sound and Voice Solutions

Animo Wave Base
Acoustic and Vibration Analysis

AnimoWaveBase is a vibration and acoustic analysis solution for preventive maintenance of industrial and infrastructure facilities. It applies our signal processing expertise to detect signs of failure, helping to prevent operational downtime as part of an IoT/M2M solution.

Speech Synthesis

FineSpeech is Japan’s best-selling speech synthesis software, running on a wide variety of CPUs and operating systems. Ranging from a small-footprint version that still maintains a high level of clarity to a high-end version that sounds as natural as human speech, FineSpeech can be customized to meet your needs. FineSpeech is used in a variety of settings, such as automatic reading of information related to disaster prevention, safety, and traffic, as well as e-books, smartphones, car navigation systems, and dialog systems for interaction between humans and robots.

Speaker Authentication

VoicePassport is a voice-based biometric authentication technology that authenticates a person by his or her voice. We offer two authentication methods: the keyword method, in which a previously defined keyword is used, and the free-word method, in which authentication is based on free utterances. Applications include secure access to personal databases, login to e-learning systems, telephone banking, and other telephony services.

Speech signal processing

VoiceBase is a digital signal processing library for speech. It provides most of the basic speech signal processing functions needed to handle speech in information systems, including analysis, compression, synthesis, noise reduction, speed change, and equalization.


VoiceTagging is a speaker-dependent speech and speaker recognition technology that recognizes a keyword as it was spoken. The keyword can be mapped to execute a function directly, such as reading e-mails.
A key advantage of this function is that it is language-independent.

Sound Hopping

SoundHopping technology makes it possible to embed information in sound.
Broadcasting the information requires only a sound source and a loudspeaker, and detecting it requires only a smartphone. Information redundancy is used to achieve a high level of robustness against noise.
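
As a general illustration of how redundancy can provide robustness against noise, the sketch below uses a simple repetition code with majority voting. This is an assumption for illustration only: the coding scheme actually used by SoundHopping is not described here, and the function names encode and decode are hypothetical.

```python
def encode(bits, repeat=5):
    """Repeat every payload bit `repeat` times (illustrative redundancy only)."""
    return [b for b in bits for _ in range(repeat)]


def decode(received, repeat=5):
    """Recover each payload bit by majority vote over its group of repeats."""
    decoded = []
    for start in range(0, len(received), repeat):
        group = received[start:start + repeat]
        # A bit is read as 1 when more than half of its copies are 1.
        decoded.append(1 if sum(group) * 2 > len(group) else 0)
    return decoded


if __name__ == "__main__":
    payload = [1, 0, 1, 1]
    sent = encode(payload)
    # Simulate noise by flipping two copies inside the first group.
    noisy = list(sent)
    noisy[0] ^= 1
    noisy[3] ^= 1
    print(decode(noisy) == payload)
```

With five copies per bit, up to two corrupted copies in each group can be tolerated, at the cost of sending five times as much data.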


Call center

VoiceTracking Server
Voice logging

This voice logging solution enables fast, accurate sharing of the full content and nuance of your customers’ voices. It has been deployed at 450 companies, totaling 50,000 seats. VoiceTracking includes special functions such as “sound tags” on the logged voice, fast and slow playback using speaking rate change technology to facilitate listening, real-time monitoring, and chasing playback.

VoiceTracking KeywordFinder
Speech recognition/search

Using word-spotting speech recognition technology, relevant words such as prohibited terms and marketing-related words can be retrieved and visualized in large volumes of logged data or in live telephone calls. This shortens monitoring time, lowers cost, and improves the efficiency of compliance management. KeywordFinder is offered in two versions: the server version, suited to existing data files, and the client version, suited to streaming conversations.

Speech analysis

We perform automatic analysis of large volumes of logged telephone speech data, including dialog analysis, keyword search, speaking rate detection, voice stress detection, and nodding count analysis. Quantitative evaluation of speech makes it possible to evaluate call center agents objectively. It can also help improve business operations or support market-oriented analysis (VOC – voice of customer).

VOC Analyzer
Voice mining

We analyze and quantify personal characteristics from the customer voice features contained in call recording data. The personal analysis estimates gender, age, weight, and stature, and the character analysis estimates behavioral tendencies (positive or negative) and thinking tendencies (introverted or diplomatic). This makes it possible to automate customer classification and to implement measures effectively, such as reducing marketing costs, choosing channels according to customer attributes, and matching customers with compatible operators.

DTMF recognition
Push-tone input

This software accurately recognizes DTMF signals (also known as “tone” or “push” signals), allowing numbers to be entered over touch-tone telephone lines. It is used for user ID and password input in IVR and voice portal systems, for primary service routing of incoming calls at call centers and help desks, and for similar applications.
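
For readers unfamiliar with DTMF, each key is signaled by two simultaneous tones, one from a row group (697, 770, 852, 941 Hz) and one from a column group (1209, 1336, 1477, 1633 Hz). The sketch below shows one common textbook way to detect a key, the Goertzel algorithm. It is not a description of this product's implementation, and the function names, the 8 kHz sample rate, and the 50 ms tone length are assumptions chosen for illustration.

```python
import math

# Standard DTMF tone groups in Hz and the keypad layout they address.
ROW_FREQS = (697, 770, 852, 941)
COL_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, sample_rate):
    """Estimate signal power near `freq` with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def detect_key(samples, sample_rate=8000):
    """Pick the strongest row tone and column tone and map them to a key."""
    row_powers = [goertzel_power(samples, f, sample_rate) for f in ROW_FREQS]
    col_powers = [goertzel_power(samples, f, sample_rate) for f in COL_FREQS]
    row = row_powers.index(max(row_powers))
    col = col_powers.index(max(col_powers))
    return KEYPAD[row][col]


def make_tone(key, sample_rate=8000, duration=0.05):
    """Synthesize a clean test tone for `key` as the sum of two sinusoids."""
    for r, row_keys in enumerate(KEYPAD):
        if key in row_keys:
            f_row, f_col = ROW_FREQS[r], COL_FREQS[row_keys.index(key)]
            break
    else:
        raise ValueError("not a DTMF key: %r" % (key,))
    n = int(sample_rate * duration)
    return [
        math.sin(2.0 * math.pi * f_row * i / sample_rate)
        + math.sin(2.0 * math.pi * f_col * i / sample_rate)
        for i in range(n)
    ]


if __name__ == "__main__":
    digits = "1234#"
    print("".join(detect_key(make_tone(d)) for d in digits))
```

This sketch always returns a key, even for silence or speech. A practical detector would also apply energy thresholds, check the balance between the two tones, and require a minimum tone duration before accepting a digit.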



Snoring Check “ZooZii” – Patented technology –

A cloud-based service that records and analyzes snoring sounds at your bedside using a smartphone. This service is based on Animo’s proprietary sound analysis technology. It can also be used as a tool to record your physical condition and daily life.
Supervision: Prof. Kimitaka Kaga, Prof. Emeritus at the University of Tokyo.

Aphasia rehabilitation

Based on the global structuration theory, this software helps restore impaired linguistic abilities through the simultaneous use of multiple senses such as hearing, vision, and touch. It supports effective rehabilitation in hospitals or at home with clear narration, beautiful pictures, and pleasant exercises.
Supervision: Prof. K. Yonemoto, Prof. Emeritus at the Jikei University School of Medicine

SUGI SpeechAnalyzer for medical care
Acoustics/utterance and language acquisition test

This tool enables visualization of the speech waveform, pitch, spectrum, and other features. It is widely used in basic research on speech and language, otorhinolaryngology, acoustic analysis for dentistry, and analysis of utterances and language acquisition in the treatment of language and auditory problems.
Supervision: Miyoko Sugito, Director of the Institute for Speech Communication Research

Larynx Scan
Laryngeal disease screening

This system aims to achieve early detection of vocal fold nodules, vocal fold polyps, and laryngeal cancer through the analysis of speech data. Speech data recorded on a PC is analyzed automatically over the Internet (screening test). Doctors can thus focus their attention on the more suspicious cases, saving patients time and money.

Joint development: Prof. Hiroyuki Fukuda, Director of Tokyo Voice Center, International University of Health and Welfare