Ticha-Voice Box Documentation


The ‘Voice Box’ is a musical instrument (of sorts) that takes audio input from a microphone and shifts its pitch in real time under the control of a custom glove controller. It can be used both as a personal listening device and as a means of communication: the user can either speak directly into the microphone and have their altered voice projected from the speaker, or plug in headphones and listen to the distorted noises of the world around them.

Inspiration / Critical Reflection
The project was inspired by several loosely related things. When I first set out to make simple piano gloves, the inspiration was my habit of tapping on tables and chairs, which I developed because I rarely had ready access to a piano. To give this habit an outlet, I decided to create a portable instrument that would let other people hear the sounds I hear in my head. I soon discovered that many people had already made instruments like this, so instead of being a personal project it was turning into a re-implementation of something that had been done countless times. I therefore refocused my search for inspiration in an effort to create something more novel. When I stumbled across Adafruit’s Wave Shield and Voice Changer project, I immediately had my heart set on making a device that distorted voices in some way. I initially aimed to create gloves that would let a person autotune their voice in real time and sound like Imogen Heap, but given the limited time I had and my limited understanding of how sound frequencies work, I had to keep things relatively simple. So instead of a real-time autotuner, I built a real-time pitch-shifter.

The Voice Box unexpectedly took on some personal value as well, as its concept revolves around my difficulty in understanding others and their difficulty in understanding me. As I was testing the final product, I became engrossed in puppeteering other people’s voices and speaking in voices that were barely decipherable, and it was then that I realized these gloves had created a wall between myself and society. Using the gloves turned into a very self-reflective experience: it brought out strange control-freak behavior in me and made me think about why I was able to extract so much enjoyment from exercising power over others.

Technical Details
Electrodes are placed around the joints of each of my fingers so that bending a finger brings its electrodes into contact, closing a switch that triggers the pitch-shifting effect. Essentially the electrodes behave like normal momentary switches, but they are designed to work without having to touch an external surface or object. This makes the gloves easy to use and lets the user make the natural gestures common in typing and in playing keyboard instruments.

Some technical hurdles I had to overcome: Although using electrodes seems conceptually simple, they were surprisingly difficult to implement properly. I initially had only a pull-up resistor for each finger (to prevent short circuits), but when I tested it I noticed that the Arduino was not interpreting the digital input correctly: when the electrodes made contact the input read as 1’s, but when they were separated the pin was left floating and the input was a jumbled mess of 0’s and 1’s. To fix this I added pull-down resistors, which hold each pin at 0 while its switch is open and so make the ‘open’ and ‘closed’ states distinct.

However annoying the resistor handling was, I think the greatest technical hurdle was getting the pitch shifting to actually work. Adafruit’s original voice changer project uses a potentiometer to set the pitch. Because the potentiometer is an analog input and the sketch keeps the Arduino’s single analog-to-digital converter busy sampling the microphone, the pitch cannot be changed while the voice effect is running. So I theorized that while it’s not possible to change pitch dynamically using an analog input, it should be possible with multiple digital inputs. Luckily my theory was correct, and making things work just required some simple modifications to Adafruit’s original code.
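The digital-input idea can be sketched roughly as follows. This is a minimal illustration in plain C++, not the code running on the device: the `FingerStates` struct, the `selectPitch` and `pitchChanged` functions, and every pitch value except the default of 700 (the value the sketch passes to `startPitchShift` at startup) are assumptions made for the example. Each pitch is a number in the 0 to 1023 range that the potentiometer would otherwise have supplied.

```cpp
// Hypothetical container for the five glove switches
// (true = electrodes touching, i.e. digitalRead() returned HIGH).
struct FingerStates {
  bool thumb, pointer, middle, ring, pinky;
};

// Default pitch, taken from the sketch's startPitchShift(700) call.
constexpr int kDefaultPitch = 700;

// Choose a pitch value (0..1023) from the switch states.
// The first closed switch wins. All values other than kDefaultPitch
// are made up for illustration and are not the author's settings.
int selectPitch(const FingerStates &f) {
  if (f.thumb)   return 100;
  if (f.pointer) return 300;
  if (f.middle)  return 500;
  if (f.ring)    return 850;
  if (f.pinky)   return 1000;
  return kDefaultPitch;  // No finger bent: fall back to the default
}

// True when the selection differs from the previous one, so the caller
// only has to reconfigure the pitch shifter when something changed.
bool pitchChanged(int previous, int current) {
  return previous != current;
}
```

On the Arduino, the fields would be filled from `digitalRead()` on pins 6 to 9 and 11, and the pitch shifter would presumably be stopped and restarted with the new value whenever `pitchChanged` reports a difference.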

(Sorry for not using Fritzing – there are too many parts to the device and I felt it would be much easier for me to show what’s going on with photos)

/* Code adapted from ADAVOICE, an Arduino-based voice pitch changer */

#include <WaveHC.h>
#include <WaveUtil.h>

SdReader  card;  // This object holds the information for the card
FatVolume vol;   // This holds the information for the partition on the card
FatReader root;  // This holds the information for the volume's root directory
FatReader file;  // This object represents the WAV file being played
WaveHC    wave;  // This is the only wave (audio) object, -- we only play one at a time
#define error(msg) error_P(PSTR(msg))  // Macro allows error messages in flash memory

#define ADC_CHANNEL 0 // Microphone on Analog pin 0

// Wave shield DAC: digital pins 2, 3, 4, 5
#define DAC_CS_PORT    PORTD
#define DAC_CS         PORTD2
#define DAC_CLK        PORTD3
#define DAC_DI_PORT    PORTD
#define DAC_DI         PORTD4
#define DAC_LATCH      PORTD5

uint16_t in = 0, out = 0, xf = 0, nSamples; // Audio sample counters
uint8_t  adc_save;                          // Default ADC mode

// WaveHC didn't declare its working buffers private or static,
// so we can be sneaky and borrow the same RAM for audio sampling!
extern uint8_t
buffer1[PLAYBUFFLEN],                   // Audio sample LSB
buffer2[PLAYBUFFLEN];                   // Audio sample MSB
#define XFADE     16                      // Number of samples for cross-fade
#define MAX_SAMPLES (PLAYBUFFLEN - XFADE) // Remaining available audio samples

// WAV information.  The only file is the boot sound, which must stay
// last because setup() plays the final element of this array:
const char *sound[] = {
  "startup" };                      // Boot sound

int button6State = 0; 
int button7State = 0; 
int button8State = 0; 
int button9State = 0; 
int button11State = 0; 

//////////////////////////////////// SETUP

void setup() {
  uint8_t i;


  // The WaveHC library normally initializes the DAC pins...but only after
  // an SD card is detected and a valid file is passed.  Need to init the
  // pins manually here so that voice FX works even without a card.
  pinMode(2, OUTPUT);    // Chip select
  pinMode(3, OUTPUT);    // Serial clock
  pinMode(4, OUTPUT);    // Serial data
  pinMode(5, OUTPUT);    // Latch
  digitalWrite(2, HIGH); // Set chip select high

  // Init SD library, show root directory.  Note that errors are displayed
  // but NOT regarded as fatal -- the program will continue with voice FX!
  if(!card.init())             SerialPrint_P("Card init. failed!");
  else if(!vol.init(card))     SerialPrint_P("No partition!");
  else if(!root.openRoot(vol)) SerialPrint_P("Couldn't open dir");
  else {
    PgmPrintln("Files found:");
    // Play startup sound (last file in array).
    playfile(sizeof(sound) / sizeof(sound[0]) - 1);
  }

  // Optional, but may make sampling and playback a little smoother:
  // Disable Timer0 interrupt.  This means delay(), millis() etc. won't
  // work.  Comment this out if you really, really need those functions.
  TIMSK0 = 0;

  // Set up Analog-to-Digital converter:
  analogReference(EXTERNAL); // 3.3V to AREF
  adc_save = ADCSRA;         // Save ADC setting for restore later

  for(int i = 6; i <= 9; i++) {
    pinMode(i, INPUT);  // Finger switches: thumb, pointer, middle, ring
  }
  pinMode(11, INPUT);   // Pinky switch

  while(wave.isplaying); // Wait for startup sound to finish...
  startPitchShift(700);  // and start the pitch-shift mode by default.
}

//////////////////////////////////// LOOP

// The loop function polls the five finger switches with simple
// digitalRead() calls and uses them to choose the pitch-shift setting.

void loop() {
  button6State = digitalRead(6);
  button7State = digitalRead(7);
  button8State = digitalRead(8);
  button9State = digitalRead(9);
  button11State = digitalRead(11);

  if (button6State == HIGH) { //thumb   

  else if (button7State == HIGH) { //pointer   
  else if (button8State == HIGH) { //middle    
  else if (button9State == HIGH) { //ring   

  if (button11State == HIGH) { //pinky


//////////////////////////////////// HELPERS

// Open and start playing a WAV file
void playfile(int idx) {
  char filename[13];

  (void)sprintf(filename,"%s.wav", sound[idx]);
  Serial.print("File: ");
  Serial.println(filename);

  if(!file.open(root, filename)) {
    PgmPrint("Couldn't open file ");
    Serial.print(filename);
    return;
  }
  if(!wave.create(file)) {
    PgmPrintln("Not a valid WAV");
    return;
  }
  wave.play();
}

//////////////////////////////////// PITCH-SHIFT CODE

void startPitchShift(int pitch) {
  // Right now the sketch just uses a fixed sound buffer length of
  // 128 samples.  It may be the case that the buffer length should
  // vary with pitch for better results...further experimentation
  // is required here.
  nSamples = 128;
  //nSamples = F_CPU / 3200 / OCR2A; // ???
  //if(nSamples > MAX_SAMPLES)      nSamples = MAX_SAMPLES;
  //else if(nSamples < (XFADE * 2)) nSamples = XFADE * 2;

  memset(buffer1, 0, nSamples + XFADE); // Clear sample buffers
  memset(buffer2, 2, nSamples + XFADE); // (set all samples to 512)

  // WaveHC library already defines a Timer1 interrupt handler.  Since we
  // want to use the stock library and not require a special fork, Timer2
  // is used for a sample-playing interrupt here.  As it's only an 8-bit
  // timer, a sizeable prescaler is used (32:1) to generate intervals
  // spanning the desired range (~4.8 KHz to ~19 KHz, or +/- 1 octave
  // from the sampling frequency).  This does limit the available number
  // of speed 'steps' in between (about 79 total), but seems enough.
  TCCR2A = _BV(WGM21) | _BV(WGM20); // Mode 7 (fast PWM), OC2 disconnected
  TCCR2B = _BV(WGM22) | _BV(CS21) | _BV(CS20);  // 32:1 prescale
  OCR2A  = map(pitch, 0, 1023,
    F_CPU / 32 / (9615 / 2),  // Lowest pitch  = -1 octave
    F_CPU / 32 / (9615 * 2)); // Highest pitch = +1 octave

  // Start up ADC in free-run mode for audio sampling:
  DIDR0 |= _BV(ADC0D);  // Disable digital input buffer on ADC0
  ADMUX  = ADC_CHANNEL; // Channel sel, right-adj, AREF to 3.3V regulator
  ADCSRB = 0;           // Free-run mode
  ADCSRA = _BV(ADEN)  | // Enable ADC
           _BV(ADSC)  | // Start conversions
           _BV(ADATE) | // Auto-trigger enable
           _BV(ADIE)  | // Interrupt enable
           _BV(ADPS2) | // 128:1 prescale...
           _BV(ADPS1) | //  ...yields 125 KHz ADC clock...
           _BV(ADPS0);  //  ...13 cycles/conversion = ~9615 Hz

  TIMSK2 |= _BV(TOIE2); // Enable Timer2 overflow interrupt
  sei();                // Enable interrupts
}

void stopPitchShift() {
  ADCSRA = adc_save; // Disable ADC interrupt and allow normal use
  TIMSK2 = 0;        // Disable Timer2 Interrupt
}

ISR(ADC_vect, ISR_BLOCK) { // ADC conversion complete

  // Save old sample from 'in' position to xfade buffer:
  buffer1[nSamples + xf] = buffer1[in];
  buffer2[nSamples + xf] = buffer2[in];
  if(++xf >= XFADE) xf = 0;

  // Store new value in sample buffers:
  buffer1[in] = ADCL; // MUST read ADCL first!
  buffer2[in] = ADCH;
  if(++in >= nSamples) in = 0;
}

ISR(TIMER2_OVF_vect) { // Playback interrupt
  uint16_t s;
  uint8_t  w, inv, hi, lo, bit;
  int      o2, i2, pos;

  // Cross fade around circular buffer 'seam'.
  if((o2 = (int)out) == (i2 = (int)in)) {
    // Sample positions coincide.  Use cross-fade buffer data directly.
    pos = nSamples + xf;
    hi = (buffer2[pos] << 2) | (buffer1[pos] >> 6); // Expand 10-bit data
    lo = (buffer1[pos] << 2) |  buffer2[pos];       // to 12 bits
  }
  if((o2 < i2) && (o2 > (i2 - XFADE))) {
    // Output sample is close to end of input samples.  Cross-fade to
    // avoid click.  The shift operations here assume that XFADE is 16;
    // will need adjustment if that changes.
    w   = in - out;  // Weight of sample (1-n)
    inv = XFADE - w; // Weight of xfade
    pos = nSamples + ((inv + xf) % XFADE);
    s   = ((buffer2[out] << 8) | buffer1[out]) * w +
      ((buffer2[pos] << 8) | buffer1[pos]) * inv;
    hi = s >> 10; // Shift 14 bit result
    lo = s >> 2;  // down to 12 bits
  } else if (o2 > (i2 + nSamples - XFADE)) {
    // More cross-fade condition
    w   = in + nSamples - out;
    inv = XFADE - w;
    pos = nSamples + ((inv + xf) % XFADE);
    s   = ((buffer2[out] << 8) | buffer1[out]) * w +
      ((buffer2[pos] << 8) | buffer1[pos]) * inv;
    hi = s >> 10; // Shift 14 bit result
    lo = s >> 2;  // down to 12 bits
  } else {
    // Input and output counters don't coincide -- just use sample directly.
    hi = (buffer2[out] << 2) | (buffer1[out] >> 6); // Expand 10-bit data
    lo = (buffer1[out] << 2) |  buffer2[out];       // to 12 bits
  }

  // Might be possible to tweak 'hi' and 'lo' at this point to achieve
  // different voice modulations -- robot effect, etc.?

  DAC_CS_PORT &= ~_BV(DAC_CS); // Select DAC
  // Clock out 4 bits DAC config (not in loop because it's constant)
  DAC_DI_PORT  &= ~_BV(DAC_DI); // 0 = Select DAC A, unbuffered
  DAC_DI_PORT  |=  _BV(DAC_DI); // 1X gain, enable = 1
  for(bit=0x08; bit; bit>>=1) { // Clock out first 4 bits of data
    if(hi & bit) DAC_DI_PORT |=  _BV(DAC_DI);
    else         DAC_DI_PORT &= ~_BV(DAC_DI);
  }
  for(bit=0x80; bit; bit>>=1) { // Clock out last 8 bits of data
    if(lo & bit) DAC_DI_PORT |=  _BV(DAC_DI);
    else         DAC_DI_PORT &= ~_BV(DAC_DI);
  }
  DAC_CS_PORT    |=  _BV(DAC_CS);    // Unselect DAC

  if(++out >= nSamples) out = 0;
}
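
To see what a pitch value such as 700 means in practice, the timer arithmetic in `startPitchShift` can be worked through on a desktop machine. The following is a rough sketch in plain C++ rather than part of the device code. It assumes a 16 MHz clock (which is what a 128:1 prescale yielding 125 KHz implies), re-implements the integer arithmetic of Arduino's `map()` under the hypothetical name `mapRange`, and assumes that the Timer2 period is `OCR2A + 1` ticks; `pitchToOcr2a` and `playbackHz` are likewise names invented for this example.

```cpp
constexpr long kCpuHz         = 16000000L; // Assumed 16 MHz system clock
constexpr long kSampleHz      = 9615L;     // ADC sampling rate from the sketch comments
constexpr long kTimerPrescale = 32L;       // Timer2 prescaler set in startPitchShift()

// Linear rescale with truncating integer division, as Arduino's map() does.
long mapRange(long x, long inMin, long inMax, long outMin, long outMax) {
  return (x - inMin) * (outMax - outMin) / (inMax - inMin) + outMin;
}

// Timer2 compare value for a pitch in 0..1023, mirroring the OCR2A expression.
long pitchToOcr2a(long pitch) {
  const long lowest  = kCpuHz / kTimerPrescale / (kSampleHz / 2); // -1 octave
  const long highest = kCpuHz / kTimerPrescale / (kSampleHz * 2); // +1 octave
  return mapRange(pitch, 0, 1023, lowest, highest);
}

// Approximate playback rate in Hz, assuming one interrupt every OCR2A + 1 ticks.
long playbackHz(long ocr2a) {
  return kCpuHz / kTimerPrescale / (ocr2a + 1);
}
```

Under these assumptions the compare value runs from 104 at pitch 0 down to 26 at pitch 1023, which matches the "about 79" steps mentioned in the comments. A pitch of 700 gives a compare value of 51 and a playback rate of roughly 9615 Hz, so the default setting leaves the voice close to unshifted.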
