ECE434 Project - Mandroid
Team members: Dylan Turner
A humanoid talking robot head.
The head listens to you with a microphone, formulates a response, and then replies using speech synthesis while moving its mouth.
Currently, I have PWM and servo control working on the BeagleBone (with a circuit for a high-torque servo), along with a rudimentary speech synthesizer, speech recognition, the mouth, and a rudimentary chat bot!
An example of a much more complex version of what I'm aiming for can be found here. At a minimum, this robot will respond to some speech input, move its mouth a bit, and output some speech output. If I can't build a complex chat bot in time, that's okay with me. I plan to focus on the other three parts more than anything else.
Everything is placed inside the mask, which is held up by a support frame of wooden dowels joined with hot glue.
The base is connected to a thick wooden dowel which has a second platform on its top. On this secondary platform rests the BeagleBone Black, and on top of the BeagleBone is a tiny breadboard with a circuit to control the servo.
The servo is attached to the support structure within the mask whereas all other components are attached to the primary wooden dowel and simply *covered* by the mask. The mask itself (with servo) is completely removable to allow tinkering with the internals.
Obviously, there's a BeagleBone.
The dowels and wooden base make up a framework that holds up the mask as well as the BeagleBone. A T-shaped structure made from thin dowels attaches at the mask's ears and nose, and the motor rests under their joint, attached with hot glue.
The T-shaped dowel structure is attached to a servo at the angle. I used a 5V 20kg servo because of the weight. This central dowel is connected to the jaw of the head, which has been cut where it meets the chin and on the sides, so the mouth can open more easily.
There is a small circuit that supplies a large current to the servo without voltage drops. The servo's signal line can be driven with the 3.3V signal the BeagleBone provides, but its supply needs 4.8-6.8V with a decent current. This circuit is dead simple, and it's described in the Fritzing diagram below. The transistors used are NPN BC547.
Attached to the central pole is a downward-facing Rock Band microphone and an outward-facing mini-speaker. The microphone is a USB mic, and the speaker attaches to a 3.5mm cable which then goes into a USB adapter. Both USB devices are plugged into a hub which hangs down from the BeagleBone.
To see how it physically attaches, view the images in the above section.
First, make sure you have Python 3.7.
If you don't, then you can install it with:
sudo apt install libpython3.7-dev
If it gets deprecated, you can update the versions in the Makefile (all instances of 3.7 become 3.x or whatever), or build it from source using the instructions here.
Afterwards, you're ready to install the program.
Installing from Package
THIS IS THE BETA RELEASE AND IS NOT COMPLETE YET!
wget https://github.com/blueOkiris/man-droid/releases/download/beta-1.0/mandroid-arm.deb
sudo dpkg -i mandroid-arm.deb
Then, to make it easier to access (i.e. run it by calling "mandroid" in a terminal):
sudo ln -s /opt/mandroid/mandroid /usr/bin/mandroid
And finally, to start up on boot:
sudo systemctl enable mandroid
Building from Source
Here are the commands for installing the Mandroid software:
sudo apt install -y libsdl2-dev libsdl2-mixer-dev python3-pyaudio pybind11-dev flac
pip3 install PyAudio
pip3 install SpeechRecognition
git clone https://github.com/blueOkiris/python-duckduckgo
cd python-duckduckgo
sudo python3 setup.py install
cd ..
git clone https://github.com/blueOkiris/man-droid
cd man-droid
make
sudo make install
Here's the explanation:
- Install Dependencies:
- SDL2_mixer is required for speech synthesis: `libsdl2-dev libsdl2-mixer-dev`
- The Python pip libraries `PyAudio` and `SpeechRecognition` are required for speech recognition. PyAudio relies on: `python3-pyaudio`
- The Python speech recognition library is called from C++ using pybind: `pybind11-dev`
- FLAC for audio input: `flac`
- Download custom duckduckgo library (for search)
- Go into the directory
- Install it
- Leave the directory
- Download main project from git
- Go into the project folder
- Build it with make
- Install system service for running at start
When you do
sudo make install
after building, the program should autostart upon reboot.
If you opt out of that, you can start it with:
cd <Location of Repo>
./mandroid
Once running, you can talk to the robot and it will respond.
Currently, the only commands are "bye" or "goodbye" to end the program and "tell me about `x`" to get web info on `x`.
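To make the two commands concrete, here is a minimal Python sketch of how heard text could be sorted into "quit", "lookup", or "unknown". The real bot is written in C++; `parse_command`, `QUIT_WORDS`, and `LOOKUP_PREFIX` are hypothetical names used only for illustration, and the matching rules (case-insensitive, exact phrase) are assumptions.

```python
from typing import Optional, Tuple

QUIT_WORDS = {"bye", "goodbye"}      # phrases that end the program
LOOKUP_PREFIX = "tell me about "     # everything after this is the topic


def parse_command(heard: str) -> Tuple[str, Optional[str]]:
    """Classify recognized speech as ("quit" | "lookup" | "unknown", topic)."""
    text = heard.strip().lower()
    if text in QUIT_WORDS:
        return ("quit", None)
    if text.startswith(LOOKUP_PREFIX):
        topic = text[len(LOOKUP_PREFIX):].strip()
        if topic:  # ignore a bare "tell me about" with no topic
            return ("lookup", topic)
    return ("unknown", None)
```

Anything that falls through to "unknown" would simply be ignored or answered with a default reply.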
Theory of Operation
At the top level, there is the Mandroid object (Brain.hpp/.cpp). It is an abstract class in C++ and has instances of two other abstract classes: a SpeechRecognizer as its ears (Listen.hpp/.cpp) and a SpeechSynthesizer as its mouth (Speech.hpp/.cpp).
The created instance of a Mandroid is currently a child called IfElseBot (Brain.hpp/.cpp). This implementation is based on if and if-else statements, the most barebones way to program a chat bot. Another implementation could utilize a natural language processing library or machine learning to be more "real," but as it stands, the only implementation is the IfElseBot.
As an implementation of the Mandroid class, the IfElseBot utilizes a SpeechRecognizer and a SpeechSynthesizer. The specific children of these abstract classes used by the IfElseBot are a PythonSpeechRecognizer (Listen.hpp/.cpp), which calls a Python function from the C++ code to turn heard speech into a std::string, and a ClipBasedSpeechSynthesizer (Speech.hpp/.cpp), which loads audio clips and pieces them together to produce sound.
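The clip-based idea can be sketched roughly as follows: split an IPA string into known symbols and collect one audio clip per symbol. This is a Python illustration, not the project's C++ code. The `CLIPS` table, the file names, and `clips_for_ipa` are all hypothetical, and skipping unknown symbols is an assumption about how gaps might be handled.

```python
from typing import Dict, List

# Hypothetical symbol-to-clip table; the real project records its own clips.
CLIPS: Dict[str, str] = {
    "h": "h.wav",
    "ɛ": "eh.wav",
    "l": "l.wav",
    "oʊ": "oh.wav",  # multi-character symbols must match before single ones
}


def clips_for_ipa(ipa: str, clips: Dict[str, str] = CLIPS) -> List[str]:
    """Return the clip files to play, in order, for an IPA string."""
    symbols = sorted(clips, key=len, reverse=True)  # longest symbol first
    playlist: List[str] = []
    i = 0
    while i < len(ipa):
        for symbol in symbols:
            if ipa.startswith(symbol, i):
                playlist.append(clips[symbol])
                i += len(symbol)
                break
        else:
            i += 1  # no clip for this character: skip it and move on
    return playlist
```

Playing the returned files back to back gives the pieced-together speech described above.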
The ClipBasedSpeechSynthesizer also makes use of a Servo (Servo.hpp/.cpp) to physically move the mouth. The Servo makes a system call to launch a Python program that initializes the PWM pin (for some reason this was the only way to make it work). It then uses the sysfs interface to control the duty cycle driven into the physical servo.
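As a rough illustration of the sysfs step, the Python sketch below maps a jaw "openness" to a pulse width and writes it to a `duty_cycle` file, which the Linux PWM sysfs interface expects in nanoseconds. The 20 ms period and 1-2 ms pulse range are typical hobby-servo values assumed here, not measurements from this servo, and `pulse_width_ns` and `set_jaw` are hypothetical names. The PWM directory is passed in because its path varies by board and kernel; the sketch also assumes the period has already been configured by the initialization step.

```python
from pathlib import Path

PERIOD_NS = 20_000_000     # assumed 20 ms period (50 Hz), set during init
MIN_PULSE_NS = 1_000_000   # assumed pulse width with the jaw closed
MAX_PULSE_NS = 2_000_000   # assumed pulse width with the jaw fully open


def pulse_width_ns(openness: float) -> int:
    """Map openness in [0, 1] to a pulse width in ns, clamping out-of-range input."""
    openness = max(0.0, min(1.0, openness))
    return round(MIN_PULSE_NS + openness * (MAX_PULSE_NS - MIN_PULSE_NS))


def set_jaw(pwm_dir: Path, openness: float) -> int:
    """Write the duty cycle for the given openness and return the value written."""
    duty = pulse_width_ns(openness)
    (pwm_dir / "duty_cycle").write_text(str(duty))
    return duty
```

On real hardware `pwm_dir` would point at the exported PWM channel; in a test it can be any temporary directory.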
Back at the top level, now that it can speak and hear, the IfElseBot is able to process speech and produce a result. One of its operations also makes use of a Python library that grabs information about a topic from DuckDuckGo. The other operation exits the program.
Different implementations of speech are possible as long as they provide methods for producing sound from IPA and converting English text to IPA. Different implementations for listening are possible as long as they have a listen method that produces a string representing the heard speech. Different implementations of the Brain are possible as long as they have a respond function which returns a boolean indicating whether the program should quit.
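The three contracts above can be sketched with Python abstract base classes. The project itself uses C++ abstract classes; only `listen` and `respond` are named in this document, so `english_to_ipa`, `say`, and the stand-in classes (`ScriptedRecognizer`, `RecordingSynthesizer`, `ParrotBot`) are hypothetical and exist only so the sketch runs without hardware.

```python
from abc import ABC, abstractmethod
from typing import List


class SpeechRecognizer(ABC):
    """The 'ears': turns heard speech into text."""

    @abstractmethod
    def listen(self) -> str:
        ...


class SpeechSynthesizer(ABC):
    """The 'mouth': converts English to IPA and voices IPA."""

    @abstractmethod
    def english_to_ipa(self, text: str) -> str:  # hypothetical name
        ...

    @abstractmethod
    def say(self, ipa: str) -> None:  # hypothetical name
        ...


class Mandroid(ABC):
    """The 'brain': owns one recognizer and one synthesizer."""

    def __init__(self, ears: SpeechRecognizer, mouth: SpeechSynthesizer) -> None:
        self.ears = ears
        self.mouth = mouth

    @abstractmethod
    def respond(self) -> bool:
        """Handle one exchange; return True if the program should quit."""


class ScriptedRecognizer(SpeechRecognizer):
    """Stand-in ears that replay a fixed list of lines."""

    def __init__(self, lines: List[str]) -> None:
        self._lines = list(lines)

    def listen(self) -> str:
        return self._lines.pop(0) if self._lines else "bye"


class RecordingSynthesizer(SpeechSynthesizer):
    """Stand-in mouth that records what it was asked to say."""

    def __init__(self) -> None:
        self.spoken: List[str] = []

    def english_to_ipa(self, text: str) -> str:
        return text  # placeholder: no real conversion happens here

    def say(self, ipa: str) -> None:
        self.spoken.append(ipa)


class ParrotBot(Mandroid):
    """Toy brain: repeats what it hears and quits on a farewell."""

    def respond(self) -> bool:
        heard = self.ears.listen()
        if heard in ("bye", "goodbye"):
            return True
        self.mouth.say(self.mouth.english_to_ipa(heard))
        return False
```

A main loop would then just call `respond()` repeatedly until it returns True, regardless of which concrete ears, mouth, or brain are plugged in.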
There is also a set of test functions (Tests.hpp) which go through the various functionalities.
It should be noted that I've made an attempt to never let the program crash, and to keep retrying or simply move on if something fails.
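One way to express that retry-or-move-on policy is a small wrapper like the Python sketch below. It is illustrative only: the project handles this in C++, and `try_with_retries`, its parameters, and the printed message format are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def try_with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    delay_s: float = 0.0,
    fallback: Optional[T] = None,
) -> Optional[T]:
    """Run action, retrying on any exception; return fallback if every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as err:  # never let a failure escape to the caller
            print(f"attempt {attempt}/{attempts} failed: {err}")
            if attempt < attempts:
                time.sleep(delay_s)
    return fallback  # give up on this step and let the program move on
```

Wrapping a listen or lookup call this way means a flaky microphone or network costs one exchange instead of ending the program.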
In cases of failure or dysfunction, when installed as a service, the program outputs to /var/mandroid.log.
Speaking of the service, installing it places the program and necessary files in /opt/mandroid/ and creates a symlink at /usr/bin/mandroid.
As the only team member, I did all of the work.
The project can be broken up into four main sections with subsections.
- Servo/Head control
- PWM Control
- Building structure to support jaw (T-structure)
- Attaching motor to jaw and to support structure
- Speech Recognition
- Link to Python speech library
- Attach USB microphone
- Speech Synthesis
- Recording sound files
- IPA map to sound files
- Synthesis object with instance of Servo
- Attach speaker through USB
- Brain (Chat-bot)
- Tie it all together
- Process inputs and produce sentences in IPA as response
- Create main dowel "tower" to place brain on top and rest mask over parts
The main areas for improvement are the sound of the speech synthesis and the brain's complexity.
The biggest improvement would be to the chat bot: giving it natural language processing and more commands.
After that, the speech synthesis needs work. It sounds like a Speak & Spell, and the dictionary of known pronunciations is small, so it has to guess pronunciations a lot.
Both of these are based on abstract classes, so both additions could integrate well with the system.
Beyond that, the power circuitry could be improved so there are fewer power cords, a USB Wi-Fi adapter would make it simpler to initialize, the hardware could be better hidden, the physical structure could be more robust, and more motion could be added to the face, such as moving eyes and multiple "muscles" for better facial movement.
Also, it may be possible to power the BeagleBone from the same power source as the servo.