Digitizing Facial Movement During Singing
1 Digitizing Facial Movement During Singing : Original Concept
A machine that uses flex sensors to detect movement and change in facial muscles/mechanics during phonation, specifically singing. The sensor data can then be reverse engineered into output. Why? Because of a fascination with the physics of sound, the origin of the phonetic alphabet (a Phoenician/Greek war tool later adapted by the Romans), and the mechanics of the voice (much facial movement/recognition research at the moment leans toward expression rather than sound generation).
We found two exceptional pieces covering not just the muscles of the face but the muscular and structural mechanics of speech, and two solid journal articles about digitizing facial tracking. After reading the better part of The Mechanics of the Human Voice, and being inspired by its Physiology of Phonation chapter, we decided to develop a system of sensors that logs the muscle position and contraction of singers at different pitches, in an attempt to funnel the data into an audio output that translates the sound. For example, is there a particular and common muscle contraction/extension that occurs during a high C? We save that; then, when a different, non-singing user contracts in the same way, the computer recognizes it and plays the corresponding note.
2 Digitizing Facial Movement During Singing : Prototype
We decided to use a three-part apparatus for tracking facial movements during simple singing: a barometric pressure sensor to monitor airflow, electromyography sensors (EMGs) to monitor extrinsic facial muscles, and a camera to track facial movement. While this data feeds in, we will be manually recording fluctuations at different frequencies. Specifically, the aim is to record changes in pressure, muscle contraction, and position for each note in one octave, compared against the face's resting state. Ideally, data mining will be performed on at least five professional vocalists.
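To make the plan a little more concrete, here is a rough sketch of the kind of per-note record we have in mind, expressed as deltas from the singer's resting baseline. The class and field names are placeholders for illustration, not our final logging format.

class NoteSample {
  String note;             // e.g. "A4"
  float pressureDelta;     // change in air pressure vs. rest (Pa)
  float emgDelta;          // change in the EMG reading vs. rest (raw analog units)
  float mouthHeightDelta;  // change in mouth height vs. rest (from the camera)
  float mouthWidthDelta;   // change in mouth width vs. rest (from the camera)
  float jawDelta;          // change in jaw position vs. rest (from the camera)

  NoteSample(String note, float pressureDelta, float emgDelta,
             float mouthHeightDelta, float mouthWidthDelta, float jawDelta) {
    this.note = note;
    this.pressureDelta = pressureDelta;
    this.emgDelta = emgDelta;
    this.mouthHeightDelta = mouthHeightDelta;
    this.mouthWidthDelta = mouthWidthDelta;
    this.jawDelta = jawDelta;
  }
}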
The barometric pressure sensor will be placed in front of the test subject, discreetly mounted on a tiny stand. The EMGs will be attached (via medical adhesive) over the extrinsic muscles involved in articulation; relevant muscles include the orbicularis oris (lips), geniohyoid (tongue and lower jaw), mylohyoid (mandible), masseter (cheek and jaw) and the hyoglossus (tongue). And finally, facial movement and geometry will be monitored using an external camera.
Luckily we found an Instructables DIY EMG guide.
3 Digitizing Facial Movement During Singing : Sensor Assembly
Pressure Sensor Diagram:
DIY Electromyography Circuit Diagram & EMG Map:
FaceOSC/Syphon/Processing Source Code:
import codeanticode.syphon.*;
import oscP5.*;

PGraphics canvas;
SyphonClient client;
OscP5 oscP5;
Face face = new Face();

public void setup() {
  size(640, 480, P3D);
  println("Available Syphon servers:");
  println(SyphonClient.listServers());
  // Pull video frames from the FaceOSC Syphon server and listen for OSC on port 8338.
  client = new SyphonClient(this, "FaceOSC");
  oscP5 = new OscP5(this, 8338);
}

public void draw() {
  background(255);
  if (client.available()) {
    canvas = client.getGraphics(canvas);
    image(canvas, 0, 0, width, height);
  }
  print(face.toString());
}

void oscEvent(OscMessage m) {
  face.parseOSC(m);
}
+ LINK to the (ORIGINAL) Face class.
4 Digitizing Facial Movement During Singing : Geometry Troubleshooting
We ran tests on the FaceOSC/Syphon/Processing facial geometry detector and were able to mine the following data (seen in the above screen capture):
pose
scale: 3.8911576
position: [ 331.74725, 153.13004, 0.0 ]
orientation: [ 0.107623726, -0.06095604, 0.085640974 ]
gesture
mouth: 14.871553 4.777506
eye: 2.649438 2.6013117
eyebrow: 7.41446 7.520543
jaw: 24.912415
nostrils: 5.7812777
Since we are using the on-camera feed to track geometric displacement in the face (as it relates to muscle movement), only the data in the “gesture” category really applies; categories with two values are recording x and y fluctuation. Fortunately, scale, position, and orientation do not bias the gesture readings. Unfortunately, there is a fair bit of natural fluctuation in the readings. We’ve gathered that fluctuations of more than 0.5 represent actual movement, while those under 0.5 are natural oscillations in the camera’s reading.
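As a rough sketch of how that threshold might be applied inside the Processing sketch, the snippet below compares a live gesture value against a stored resting baseline and only counts changes larger than 0.5 as actual movement. The baseline value and function name are illustrative, not part of the original code.

// Resting baseline captured while the subject's face is relaxed (value is illustrative).
float restingMouthHeight = 4.78;
// Fluctuations smaller than this are treated as camera noise rather than movement.
float movementThreshold = 0.5;

boolean mouthMoved(float currentMouthHeight) {
  return abs(currentMouthHeight - restingMouthHeight) > movementThreshold;
}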
5 Digitizing Facial Movement During Singing : Sensor Build
We had a bit of trouble getting just the right stuff (one of our circuit chips was not available for order and the BMP085 barometric pressure sensor arrived without a breakout board), but we worked on building out the EMG-to-Arduino circuitry:
We also wrote preliminary code to read the two physical sensor systems (EMG & BMP085) and print muscle contractions along with changes in air pressure and temperature. Much of the pressure printout code is taken from Jim Lindblom’s BMP085 Barometric Pressure Sensor Quickstart post on Sparkfun, and much of the muscle contraction printout code is derived from Brian Kaminski’s USB Biofeedback Game Controller project guide on Instructables. We’ve refined the muscle sensors from five down to one, and with that, the muscle sensing will occupy analog pin 0 while the pressure/temperature sensor will occupy analog pins 4 and 5. Additionally, we struggled with finding the Reference, Mid, and End points of the facial muscle, as well as the ideal muscle to use, since we had never worked with EMGs before.
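On the computer side, the monitoring sketch will eventually need to pull those readings in. Below is a minimal Processing sketch of one way to do that, assuming the Arduino prints one comma-separated line per reading in the form "emg,pressure,temperature"; that serial format (and the 9600 baud rate) is an assumption for illustration, not our finished code.

import processing.serial.*;

Serial port;

void setup() {
  size(200, 200);
  // Assumes the Arduino shows up as the first serial device in the list.
  port = new Serial(this, Serial.list()[0], 9600);
  port.bufferUntil('\n');
}

void draw() {
  background(255);
}

void serialEvent(Serial p) {
  String line = p.readStringUntil('\n');
  if (line == null) return;
  String[] values = split(trim(line), ',');
  if (values.length == 3) {
    float emg = float(values[0]);          // raw analog reading from pin 0
    float pressure = float(values[1]);     // pressure in pascals from the BMP085
    float temperature = float(values[2]);  // temperature in degrees C from the BMP085
    println("EMG: " + emg + "  Pressure: " + pressure + " Pa  Temp: " + temperature + " C");
  }
}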
With those issues sorted, the build for the pressure sensor is super minimal and does not require any extensive setup before we receive the breakout board. Also, after our experience with the sensitivity of camera-based facial geometry tracking, and after research into the size, flexion/extension, and general activity of the sternocleidomastoid muscle (located in the neck), we decided to make that our single electromyographically tracked muscle. The sternocleidomastoid contributes quite a bit to posture and airflow during the singing process.
Sternocleidomastoid Muscle:
6 Digitizing Facial Movement During Singing : Interface Development
With all that established, it was time to build an interface that could help us track all elements of the sensing system (pressure, electromyography, and geometry), combine them with audio frequency/pitch, display them in an easy-to-understand manner, and then save everything to a data file for reference. We came up with a system of sliders: each input’s data is mapped to its tested highest and lowest points, so that sliders sitting at the top of the interface are reading zero fluctuation while sliders that have reached the bottom are extended to their full potential. The actual numeric values are printed to the console and saved to a local data file for reference. Geometry (mouth height/width, jaw protrusion) is measured in centimeters (cm), air pressure in pascals (Pa), and muscle fluctuation is measured as an unaltered analog reading. Additionally, the interface displays and logs the frequency/pitch (using the minim library’s Fast Fourier Transform class and a popular frequency-to-MIDI formula).
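For reference, the pitch side of that works roughly like the sketch below: Minim's FFT picks out the strongest frequency in the microphone signal, and the common formula MIDI = 69 + 12 * log2(f / 440) converts it to a note number. The buffer size and variable names here are illustrative assumptions rather than our exact interface code.

import ddf.minim.*;
import ddf.minim.analysis.*;

Minim minim;
AudioInput in;
FFT fft;

void setup() {
  size(200, 200);
  minim = new Minim(this);
  in = minim.getLineIn(Minim.MONO, 1024);
  fft = new FFT(in.bufferSize(), in.sampleRate());
}

void draw() {
  background(0);
  fft.forward(in.mix);

  // Find the loudest frequency bin as a crude pitch estimate.
  int loudestBand = 0;
  for (int i = 1; i < fft.specSize(); i++) {
    if (fft.getBand(i) > fft.getBand(loudestBand)) loudestBand = i;
  }
  float freq = fft.indexToFreq(loudestBand);

  if (freq > 0) {
    // Frequency-to-MIDI: 440 Hz (A4) maps to MIDI note 69.
    float midiNote = 69 + 12 * (log(freq / 440.0) / log(2));
    println(freq + " Hz  ->  MIDI note " + round(midiNote));
  }
}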
Current monitoring interface and datafile screencap:
7 Digitizing Facial Movement During Singing : Data Collection and Reverse Engineering
While building the interface we had to keep in mind that we would soon be reverse engineering our findings into a “singing machine” and that our code should begin to reflect that. Our solution solved two late-game issues: what are the form and application of this singing machine, and what does it look like? Our testing interface of sliders became the perfect jumping-off point for a visual interface resembling a soundboard. Once we have reviewed enough data to know which configurations of mouth height/width, jaw protrusion, sternocleidomastoid fluctuation, and air pressure produce each of the twelve tones and semitones in an octave, we can program Ableton to output those notes upon a non-singing user’s input. The user can then view their biomechanical data through a familiar sound-generating interface: a soundboard.
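One simple way to do that matching, sketched below, is to store an averaged profile for each of the twelve notes and pick the one nearest to the live reading; the feature ordering, normalization, and function name are assumptions for illustration, and the actual note triggering would happen via MIDI out to Ableton.

String[] noteNames = { "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B" };

// One profile per note: mouth height, mouth width, jaw, sternocleidomastoid, pressure,
// each normalized to 0..1 against that input's tested minimum and maximum.
float[][] noteProfiles = new float[12][5];

// Return the index (into noteNames) of the stored profile closest to the live reading.
int closestNote(float[] reading) {
  int best = 0;
  float bestDistance = Float.MAX_VALUE;
  for (int n = 0; n < noteProfiles.length; n++) {
    float distance = 0;
    for (int i = 0; i < reading.length; i++) {
      float diff = reading[i] - noteProfiles[n][i];
      distance += diff * diff;
    }
    if (distance < bestDistance) {
      bestDistance = distance;
      best = n;
    }
  }
  return best;  // from here, send the matching note to Ableton over MIDI
}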
While that is still a long-term goal (of personal fun and artistic exploration), what we ended up with is a singing evaluation interface: a virtual vocal coach. A user opens their computer, turns on their camera, attaches an EMG/pressure sensor, and sings while the interface visualizes a data stream of biomechanical information relevant to the performance: