So this will be an informal blog of sorts, detailing my research towards my PhD in Art and Computational Technology. Updates will appear as and when, but will hopefully give you an understanding of what I’m doing, and how I’m doing it!
EDIT: OK, so it turns out I’m not so great at updating this as I go; however, I will continue to post, the posts will just be sporadic!
First Term Report:
THE COMBINATION OF BRAIN-COMPUTER INTERFACE WITH AUGMENTED REALITY: FURTHERING THE KNOWLEDGE BASE FOR PERCEPTION, ACTION AND THE REALISED FORM IN RHYTHMIC MUSICAL EVENTS
By combining a brain-computer interface (BCI) with an augmented reality (AR) device, there is potential to further our understanding of the relationship between perception, action and the realised form of user-set musical rhythms, by contextualising rhythmic materials in relation to their perception and cognition. Working with AR affords a highly controllable sensorial environment, previously unattainable in brain-computer music interfacing (BCMI) experiments, which could be exploited to enhance selected accompanying visual stimuli, as well as to isolate and remove any that are deemed undesirable. To facilitate the classification of the user’s brain activity as acquired by the electroencephalogram (EEG), machine learning (specifically deep-learning methods) will be explored.
The use of EEG in music is nothing new: Alvin Lucier composed ‘Music for Solo Performer’ in 1965 (Lucier 1976)1, a piece that directly sonifies EEG data to create sound. In 1990, 25 years later, David Rosenboom hypothesised that it might be possible to detect aspects of musical experience, such as the performer’s selective attention, in EEG components (Rosenboom 1990)2. Over the last ten years, however, the falling cost of the required technological components, together with advances in cognitive science and computer science, has prompted a renewed interest in BCI technology. As barriers to entry have fallen slightly, commercially available headsets such as NeuroSky’s EEG Biosensor and Emotiv’s EPOC have appeared, giving more users from broader backgrounds the opportunity to experiment. Coupled with affordable emerging VR/AR technologies, human-computer interfaces are facing significant change, becoming more encompassing and more responsive to our needs and desires as users.
Methods that require the user to deliberately and voluntarily modulate their brain activity are considered ‘active’ or ‘hard’ BCMI (George 2014)3, whereas methods requiring no deliberate input on the part of the user are considered ‘passive’ BCMI. Traditionally the former is used, as it affords the greater resolution of control often desired for musical output. The two main approaches to controlling EEG are ‘conscious effort’ and ‘operant conditioning’ (Miranda et al 2011)4. ‘Conscious effort’ induces EEG change through specific cognitive tasks such as motor, auditory or visual imagery, whereas ‘operant conditioning involves the presentation of a task in conjunction with some form of feedback, which allows the user to develop unconscious control of the EEG.’(Miranda et al 2014)5
Common to both approaches is the paradigm of event-related potentials (ERPs), which are ‘small changes in the electrical activity of the brain that are recorded from the scalp… brought about by some external or internal event.’(Otten et al 1997)6 A common ERP in BCI studies is the P300, a positive potential (band-limited to around 8Hz) that appears roughly 300ms after a stimulus. However, as this potential is difficult to identify amongst ordinary ‘background’ EEG activity, it must be averaged across a number of trials (Palaniappan 2014).7 Single-trial classification of the P300 is therefore not possible, although applying a moving-average method allows continuous interaction in which a graded value, rather than a binary decision, is assigned (Grierson 2014)8. Consequently, the temporal limitations involved in detecting the potential render this approach unsuitable without additional context, such as that provided by dynamic attending theory (Drake, Jones, & Baruch, 2000; Jones & Boltz, 1989)9.
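To make the averaging argument concrete, here is a minimal Python sketch on synthetic data. It assumes a 250Hz sampling rate, a Gaussian-shaped 5µV bump at 300ms standing in for the P300, and 10µV of Gaussian background noise; none of these values come from the studies cited, and the moving average is a generic running mean rather than Grierson's exact method.

```python
import math
import random

FS = 250  # assumed sampling rate in Hz; one epoch = 1 s = FS samples


def synthetic_epoch(rng, amplitude=5.0, noise_sd=10.0):
    """One stimulus-locked epoch: a P300-like bump at 300 ms buried in noise (all values assumed)."""
    epoch = []
    for i in range(FS):
        t = i / FS
        erp = amplitude * math.exp(-((t - 0.3) ** 2) / (2 * 0.05 ** 2))
        epoch.append(erp + rng.gauss(0.0, noise_sd))
    return epoch


def average_epochs(epochs):
    """Sample-by-sample mean across trials; uncorrelated noise shrinks roughly by sqrt(n_trials)."""
    n = len(epochs)
    return [sum(epoch[i] for epoch in epochs) / n for i in range(len(epochs[0]))]


def moving_average(scores, window):
    """Running mean of per-trial scores, giving a graded value instead of a yes/no decision."""
    out = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


rng = random.Random(0)
single = synthetic_epoch(rng)
averaged = average_epochs([synthetic_epoch(rng) for _ in range(400)])
sample_300ms = round(0.3 * FS)  # sample index 75
print(f"single trial at 300 ms: {single[sample_300ms]:.1f}, 400-trial average: {averaged[sample_300ms]:.1f}")
print(moving_average([0, 1, 1, 0], window=2))
```

With 400 trials the residual noise at each sample drops to about 10/√400 = 0.5µV, which is why the bump becomes visible in the average while a single trial is dominated by noise.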
‘The main premise is that expectancy levels can be manipulated by presenting temporal patterns varying in regularity, thus making impending events more or less predictable. This is also reflected in computational models of rhythm processing, such as the coupled oscillator model presented by Large and colleagues (Large & Jones, 1999; Large & Kolen, 1995)10, in which the percept of a rhythm is built up out of multiple oscillators with different period lengths, related to different hierarchical levels of the rhythm.’ (Schaefer et al 2011)11
Combining this with a machine-learning paradigm (a logistic regression classifier) to distinguish accented from non-accented beats in the EEG, for both perceived and imagined (auditory-imagery) rhythms, yielded an average accuracy of 70% for perception data and 61% for imagery data (Vlek et al 2011).12
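As an illustration of what a logistic regression classifier does here, the following Python sketch trains one with plain batch gradient descent on toy two-dimensional feature vectors. The features (imagine, say, mean amplitudes in two post-beat time windows) and their values are invented for the example; this is not Vlek et al.'s feature extraction or training procedure, and it says nothing about the accuracies they report.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Probability that feature vector x belongs to the 'accented' class (label 1)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Minimise mean log-loss with batch gradient descent."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, y):
            error = predict_proba(weights, bias, x) - label
            for j in range(d):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical per-beat EEG features: 1 = accented beat, 0 = non-accented beat.
X = [[2.0, 1.5], [1.8, 2.2], [2.5, 1.9], [0.2, 0.1], [-0.3, 0.4], [0.5, -0.2]]
y = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic_regression(X, y)
predictions = [1 if predict_proba(weights, bias, x) >= 0.5 else 0 for x in X]
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(predictions, accuracy)
```

Real EEG features overlap far more than these toy points, which is why reported accuracies sit well below 100%.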
An alternative method is motor imagery, which relies on detecting event-related synchronisation (ERS), found in the motor cortex of the hemisphere on the same side of the body as the limb the user imagines moving, and event-related desynchronisation (ERD), which occurs in the opposite hemisphere. Both ‘generally occur in mu (~8-12Hz) and beta (~13-20Hz) frequency ranges.’(Palaniappan 2014)13 These changes can be monitored via EEG electrodes at the C3 and C4 locations above the motor cortex, and have been used (albeit with an accuracy of around 60%) to determine the user’s selective attention to auditory stimuli (Hill et al).14
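ERD and ERS are commonly quantified as the percentage change in band power between an activity period and a reference (baseline) period, ERD% = (A - R) / R × 100, so negative values indicate desynchronisation and positive values synchronisation. The Python sketch below applies this to a single channel (think C3 or C4), computing mu-band (8-12Hz) power with a naive DFT. The 128Hz sampling rate, one-second windows and pure 10Hz test signals are assumptions chosen so the arithmetic is easy to check; real EEG would need filtering, artefact rejection and averaging over trials.

```python
import math


def band_power(signal, fs, f_low, f_high):
    """Power in [f_low, f_high] Hz from a naive DFT (O(n^2), fine for short windows)."""
    n = len(signal)
    power = 0.0
    for k in range(n // 2 + 1):
        if f_low <= k * fs / n <= f_high:
            re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
            im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
            power += (re * re + im * im) / n
    return power


def erd_percent(activity_power, reference_power):
    """Relative band-power change; negative = ERD, positive = ERS."""
    return (activity_power - reference_power) / reference_power * 100.0


FS = 128  # assumed sampling rate (Hz)


def mu_rhythm(amplitude):
    """One second of a pure 10 Hz oscillation standing in for the mu rhythm."""
    return [amplitude * math.sin(2 * math.pi * 10 * i / FS) for i in range(FS)]


reference = mu_rhythm(2.0)  # resting baseline
activity = mu_rhythm(1.0)   # mu amplitude halves during imagined movement
change = erd_percent(band_power(activity, FS, 8, 12), band_power(reference, FS, 8, 12))
print(f"mu-band change: {change:.1f}%")  # halving the amplitude quarters the power: -75.0%
```

Comparing this value between C3 and C4 is what would reveal which side of the body the user is imagining moving.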
Additionally, the sensorimotor rhythm (13-15Hz) occurs within the premotor cortex, as does sensorimotor synchronisation (SMS), which is a form of ‘referential behavior (Pressing, 1999)15 in which an action is temporally coordinated with a predictable external event, the referent. Usually, the term ‘SMS’ refers to a situation in which both the action and the referent are periodic, so that the predictability of the referent arises from its regular recurrence. SMS thus can be said to involve the temporal coordination of a motor rhythm with an external rhythm.’(Repp 2005, p969)16 Properties of auditory short-latency gamma-band (20-60Hz) activity (GBA) have also shown evidence of a relationship with the perception of, and attention to, temporally structured auditory sequences. These findings support ‘active expectancy-based processing (Large 1999)17 and demonstrate the potential usefulness of short-latency auditory neuroelectric responses for studying processing of rapid temporal patterns such as ongoing speech and music.’(Snyder & Large 2005)18
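In tapping studies, SMS is usually measured through the asynchrony between each tap and its referent onset; Repp's review discusses, among other things, the tendency of taps to precede the referent (negative mean asynchrony). A minimal Python sketch of that measurement follows, pairing each tap with its nearest metronome onset. The tap times are invented for illustration, and nearest-onset pairing is a simplifying assumption that breaks down when taps drift by more than half an interval.

```python
from statistics import mean


def asynchronies(taps, onsets):
    """Signed tap-minus-referent times (ms), pairing each tap with its nearest onset."""
    return [tap - min(onsets, key=lambda onset: abs(tap - onset)) for tap in taps]


metronome = [0, 500, 1000, 1500]  # referent onsets, 500 ms inter-onset interval
taps = [-30, 480, 975, 1490]      # invented tap times (ms)

asyncs = asynchronies(taps, metronome)
print(asyncs, mean(asyncs))  # every tap anticipates the beat, so the mean is negative
```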
To provide greater homogeneity between the pre- and post-realised rhythmic idea, the BCI will be dual-layered, classifying and cross-referencing a perception-based evoked response and a conscious decision-based event. The latter would be either the auditory-imagery approach described above (Vlek et al 2011)19 or motor imagery used to determine the user’s selective auditory attention. Both options could be accompanied by a visual stimulus to elicit a more defined response; the choice would be decided by the results of a trial study with multiple participants. Each layer would be classified using a machine-learning paradigm, and the two results combined to improve single-trial accuracy.
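The text leaves open how the two layers would be combined. One simple, commonly used option, assumed here purely for illustration, is to treat the two classifiers as conditionally independent and add their log-odds (a naive-Bayes-style combination), so that two moderately confident, agreeing layers yield a more confident joint decision. The function names and probabilities below are hypothetical.

```python
import math


def logit(p, eps=1e-9):
    """Log-odds of a probability, clamped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fuse_layers(p_evoked, p_decision, prior=0.5):
    """Combine two posteriors that share a prior, assuming conditional independence."""
    z = logit(p_evoked) + logit(p_decision) - logit(prior)
    return 1.0 / (1.0 + math.exp(-z))


print(round(fuse_layers(0.8, 0.7), 3))  # agreeing layers reinforce each other
print(round(fuse_layers(0.8, 0.2), 3))  # conflicting layers cancel out
```

The independence assumption is strong for two readings from the same EEG; a learned combiner (for example a second classifier trained on both layers' outputs) would be the natural alternative if the trial study showed the layers to be correlated.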
The visual stimulus and graphical user interface (GUI) will be displayed in an Oculus Rift virtual reality (VR) headset. Rather than using the headset as a VR tool, digital cameras will be mounted on its front to create an augmented reality (AR) platform. Choosing AR rather than full VR should reduce the difficulty, and the computational effort, of providing sufficient sensory matching and self-representation for immersion when the user interacts with the interface.
“Immersion is a description of a technology, and describes the extent to which the computer displays are capable of delivering an inclusive, extensive, surrounding and vivid illusion of reality to the senses of a human participant.”(Slater & Wilbur 1997)20
Additionally, the visual cues required for the accompanying stimulus could come from manipulating the environment, rather than from explicit manifestations.
Given the success of deep belief networks built from convolutional restricted Boltzmann machines (CRBMs) in ‘unsupervised feature learning for audio classification using convolutional deep belief networks’ (Lee et al 2011)21, the modulated rhythmic sequences would be presented hierarchically, with those most relevant to the rhythm (based on training from previous user selections, cross-referenced with the thematic content of the rhythm) presented most prominently. If one is selected, it becomes the next target rhythm and the cycle begins again.
The user would be presented with an initial short rhythm, either pre-existing or created by the user by plotting points (creating intervals) on a horizontal timeline. The software would then play back the audio sequence whilst the BCI measures evoked responses. These responses will then be classified and used as anchor points for modulating the rhythm. The rhythmic content will then be developed thematically (relative to a set music theory or a generative music theory) in an iterative process, using the anchor points as a guide. Finally, the resulting rhythms are presented in order of relevance to the user, based on the classification data.
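The anchor-and-rank loop described above can be sketched in Python as follows. Everything here is a hypothetical stand-in: rhythms are onset times in beats within a 4-beat bar, the anchors are assumed to be the onsets the BCI classified as salient, "modulation" is simply shifting every non-anchor onset by a fixed amount (rather than any music-theoretic development), and the relevance score uses invented per-position weights in place of a model trained on previous user selections.

```python
BAR_LENGTH = 4.0  # beats per bar (assumed)


def modulate(rhythm, anchors, shift):
    """Shift every non-anchor onset by `shift` beats (wrapping within the bar); anchors stay put."""
    moved = [t if t in anchors else (t + shift) % BAR_LENGTH for t in rhythm]
    return sorted(set(moved))  # merge onsets that land on the same position


def relevance(variation, position_weights):
    """Hypothetical score: mean learned weight of the grid positions a variation uses."""
    return sum(position_weights.get(t, 0.0) for t in variation) / len(variation)


def rank_variations(rhythm, anchors, shifts, position_weights):
    """Generate one variation per shift and order them from most to least relevant."""
    variations = [modulate(rhythm, anchors, s) for s in shifts]
    return sorted(variations, key=lambda v: relevance(v, position_weights), reverse=True)


initial = [0.0, 1.0, 1.5, 2.0, 3.0]  # user-plotted onsets (beats)
anchors = {0.0, 2.0}                 # onsets flagged by the (hypothetical) classifier
weights = {0.0: 1.0, 2.0: 1.0, 1.5: 0.8, 3.5: 0.6}  # stand-in for learned preferences

ranked = rank_variations(initial, anchors, shifts=[0.25, 0.5], position_weights=weights)
for variation in ranked:
    print(variation, round(relevance(variation, weights), 2))
next_target = ranked[0]  # if the user selects it, this becomes the next target rhythm
```

Keeping the anchors fixed is the key design choice: whatever the modulation strategy, the moments the listener's brain responded to most strongly survive into every candidate.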
The resulting interface should help users develop exciting new rhythms and thematic events relative to their perception of their formative musical material. The biggest challenge will be maintaining accurate readings with commercially available EEG systems, as they offer limited electrode placements and work at a considerably lower resolution than the lab equipment used in cognitive-science research.
5. CONCLUSIONS AND FUTURE WORK
To conclude, the study could uncover further knowledge of how our brains process auditory events, and of how the corresponding data can be harnessed for effective human-computer interaction, notably in creative practices. Given the emerging trend of using biometric user data to tailor end-user experiences, the convergence of BCI and AR technologies in a single interface will demonstrate the likely future of commercially available computational interfaces and raise previously unconsidered questions for the design of such devices.
Can BCIs be used successfully to aid musicians in composition?
Can VR/AR headsets provide a sufficient sensory input to stimulate the user effectively?
Will the convergence of technologies successfully demonstrate the potential and usability of integrated BCI interfaces in the future?
1 Lucier A (1976) Statement on: Music for Solo Performer. In: Rosenboom D (ed) Biofeedback and the Arts: Results of Early Experiments. Vancouver, Canada
2 Rosenboom D (1990) Extended Musical Interface with the Human Nervous System. Berkeley, USA
3 George L, Lécuyer A (2014) Passive Brain-Computer Interfaces. In: Miranda E R, Castet J (eds) Guide to Brain-Computer Music Interfacing, p298
4 Miranda E, Magee W L, Wilson J J, Eaton J, Palaniappan R (2011) Brain-Computer Music Interfacing: From Basic Research to the Real World of Special Needs, p134-140
5 Miranda E, Castet J (2014) Brain-Computer Music Interfacing: Interdisciplinary Research at the Crossroads of Music, Science, and Biomedical Engineering. In: Guide to Brain-Computer Music Interfacing, p4
6 Otten L J, Rugg M D (1997) Interpreting Event-Related Brain Potentials, p3 (see also Cole & Rugg, 1995; Kutas & Dale, 1997)
7 Palaniappan R (2014) Electroencephalogram-based Brain-Computer Interface: An Introduction. In: Guide to Brain-Computer Music Interfacing, p35
8 Grierson M, Kiefer C (2014) Contemporary Approaches to Music BCI Using P300 Event Related Potentials. In: Guide to Brain-Computer Music Interfacing, p46
9 Drake C, Jones M R, Baruch C (2000) The development of rhythmic attending in auditory sequences: attunement, referent period, focal attending. Cognition 77, p251-288
Jones M R, Boltz M (1989) Dynamic attending and responses to time. Psychological Review 96(3), p459-491
10 Large E W, Jones M R (1999) The dynamics of attending: How people track time-varying events. Psychological Review 106(1), p119-159
11 Schaefer R S, Vlek R J, Desain P (2011) Decomposing rhythm processing: electroencephalography of perceived and self-imposed rhythmic patterns. Psychological Research 75, p95
12, 19 Vlek R J, Schaefer R S, Gielen C C A M, Farquhar J D, Desain P (2011) Shared mechanisms in perception and imagery of auditory accents. Clinical Neurophysiology 122, p1526
13 Palaniappan R (2014) Electroencephalogram-based Brain-Computer Interface: An Introduction. In: Miranda E R, Castet J (eds) Guide to Brain-Computer Music Interfacing, p34
14 Hill N J, Lal T N, Schröder M, Hinterberger T, Birbaumer N, Schölkopf B (n.d.) Selective Attention to Auditory Stimuli: A Brain-Computer Interface Paradigm
15 Pressing J (1999) The referential dynamics of cognition and action. Psychological Review 106, p714-747
16 Repp B (2005) Sensorimotor Synchronization: A review of the tapping literature. Psychonomic Bulletin & Review, p969
17 Large E W, Jones M R (1999) The dynamics of attending: How people track time-varying events. Psychological Review 106, p119-159
18 Snyder J S, Large E W (2005) Gamma-band activity reflects the metric structure of rhythmic tone sequences. Cognitive Brain Research 24, p126
20 Slater M, Wilbur S (1997) A framework for immersive virtual environments (FIVE): Speculations on the role of presence in virtual environments. Presence: Teleoperators and Virtual Environments 6, p603-616. Being There, p20
21 Lee H, Grosse R, Ranganath R, Ng A Y (2011) Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks. Communications of the ACM 54(10), p95-103
Combining sensorial data with real-time analysis: an exploration of consciousness through the manipulation of expectation in composition.
By studying behaviour, sensory experience and human-computer interaction, I aim to develop new methods of realising sonic events, replacing traditional static computational interfaces with devices that are immersive and responsive.
The act of composition is an extraordinarily complex process. It draws on both semantic and episodic memory, which together make up declarative memory, as well as on procedural memory to realise one’s goals. The interplay between declarative and procedural memory could be succinctly referred to as intuition: the ability to pick the most appropriate tool or device for the intended result. From a neurological standpoint, it has been documented that:
“Modulation in dorsal prefrontal cortex… activation may correspond to the conscious selection of isolated movements, without awareness of the reason for choosing that particular movement. In other words, a person may guess where the next target will appear and prepare to respond accordingly, but may not realize that their intuition is based on developing sequence knowledge. If so, such activation may represent the first step in the construction of a recognizable sequence. There is no doubt that the antecedents of declarative knowledge are acquired during procedural sequence learning: given enough repetitions, normal subjects become aware of the sequence.” 1
Musicologically, this is nowhere more prevalent or fundamental than in rhythm.
“To perceive a rhythm means to relate one sound with other sounds by means of duration ratios and thereby to learn and to recognise explicit rhythm categories.” 2, 3, 4
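Read computationally, relating sounds "by means of duration ratios" can be sketched by taking the inter-onset intervals (IOIs) between successive onsets and expressing each as a ratio to the previous one. In the Python sketch below, snapping these ratios to the nearest simple fraction (denominator at most 4, an arbitrary assumption) is a crude stand-in for categorisation; the categorical perception studied by Honing and by Papadelis and Papanikolaou is considerably more nuanced, and the onset times are invented.

```python
from fractions import Fraction


def inter_onset_intervals(onsets):
    """Durations between successive onsets."""
    return [b - a for a, b in zip(onsets, onsets[1:])]


def duration_ratios(onsets, max_denominator=4):
    """Ratio of each IOI to the previous one, snapped to the nearest simple fraction."""
    iois = inter_onset_intervals(onsets)
    return [Fraction(b, a).limit_denominator(max_denominator) for a, b in zip(iois, iois[1:])]


performed = [0, 505, 748, 1002, 1990]  # invented performed onsets (ms), slightly off the grid
print(inter_onset_intervals(performed))
print([str(r) for r in duration_ratios(performed)])
```

Despite the timing imprecision, the performed IOIs (505, 243, 254, 988ms) collapse onto the categories 1/2, 1 and 4, which is the sense in which a listener hears "the same" rhythm across expressive variation.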
The perception of rhythmic events and their salient spectromorphological attributes can be used to form an expectation of succeeding events. Little is known about how cognition relates to the interplay of behavioural and sensorial input, as referenced against working memory, specifically where it bears on the compositional outcome. This could be due to the mechanical nature of the devices used.
Although current sound-generating devices are often streamlined to be intuitive, this intuitiveness extends only to the manner in which they present the stored data of compositional value that they contain. On a piano, for example, each key is analogous to a piece of music data, and the chromatic layout of the keys is proportionate to the chromatic rise (or fall) in pitch: one key up is one semitone up in the 12-tone scale. Using such a device effectively typically requires both extensive procedural memory and prior knowledge, which in practice can act as a barrier to entry, limiting one’s ability to articulate oneself. By changing the fundamental basis of how devices operate and respond to input, with the aim of creating an interface that is intuitive for the intended cognitive use, there is potential to provide a superior tool.
Through the simultaneous study of behaviour and the sensorial environment, in conjunction with the statistical properties derived from the use of the device and its output, and through subsequent manipulation and re-use of the device, I aim to provide evidence that can improve the efficiency of sound-generating devices. This will also provide a basis for understanding our cognitive functions in relation to conscious decision-making, making it possible to correlate what we know in terms of semantic and working memory with how we realise musical concepts and expectations of succeeding sensorial events.
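The "one key up, one semitone up" mapping can be stated exactly for twelve-tone equal temperament: each semitone multiplies frequency by 2^(1/12). The Python sketch below uses MIDI note numbers as a stand-in for piano keys and assumes the common A4 = 440Hz (MIDI note 69) tuning reference.

```python
SEMITONE_RATIO = 2 ** (1 / 12)  # frequency ratio between adjacent keys in 12-TET


def midi_to_hz(note, a4_hz=440.0):
    """Frequency of a MIDI note number in twelve-tone equal temperament."""
    return a4_hz * 2 ** ((note - 69) / 12)


for note in (60, 61, 69, 81):  # middle C, C#, A4, A5
    print(note, round(midi_to_hz(note), 2))
print(round(SEMITONE_RATIO, 5))
```

The layout is linear in key position but exponential in frequency, which is one concrete example of the learned mapping a player must internalise before the instrument feels intuitive.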
“Expectations are neural circuits whose activation depends on the pattern of sensory inputs. In practice, these energized circuits facilitate certain motor behaviours and/or cause attention to be directed in particular ways. Motor behaviours may be facilitated directly or may be facilitated indirectly by evoking feeling states that act as motivators.”5
Once I have collected and analysed data on sensory and behavioural responses to a range of sonic materials, I will construct a computational framework that manipulates the real-time input to align it with the most expected event, as dictated by the response data. I believe this will result in a cleaner process of generating sound, with less signal loss or interference caused by inaccurate or insignificant articulation by the user. This will be demonstrated through interactive sonic art installations and performances focusing on different materials and their properties, providing a body of work that examines the correlation between the sonic aesthetic of a material and its cognitive function.
Extensive use of technology will be vital to accomplish this task. Physical computing and experimentation with human-computer interaction will provide techniques that capture previously unobserved behaviour, and should result in an enhanced dynamic environmental model for testing procedural actions and intuition. To process and manipulate such quantities of sensorial and behavioural data, generative programming and algorithmic synthesis will be used to structure input coherently, providing a dynamic framework in which generative structures form possible outcomes and resulting in a more adaptive device. This data will then be analysed retrospectively to assess the correlation between specific sonic materials and the manner in which they are used, to determine whether there are cognitive biases towards particular materials. The results can then be used to restructure and reorganise the device’s framework, making its processes more efficient. The framework will be built in Max/MSP and SuperCollider using a variety of hardware, such as Arduino microcontrollers, piezo pickups, Leap Motion and Kinect sensors, an iPhone (or similar mobile device), microphones, digital projectors and other playback devices.
The resulting art will intrinsically challenge existing ideas about the role of consciousness and the nature of composition by removing the traditional physical boundaries of devices, and could provide further resources and evidence for study in cognitive science. The devices and computational methods devised could provide practical solutions and tools applicable to any human-computer interaction, not just within the arts.
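The text does not specify how input would be aligned with the most expected event. One possible reading, sketched in Python under loudly stated assumptions, is a weighted snap: each expected event has a time and an expectation weight (here invented, standing in for the collected response data), an input onset is pulled to whichever nearby event scores highest under a triangular proximity window, and onsets outside the tolerance are left untouched so that deliberate deviations survive.

```python
def align_onset(onset, expected_events, tolerance):
    """Snap an input onset to the best-scoring expected event within `tolerance`.

    expected_events: list of (time, weight) pairs; weight is a hypothetical expectation strength.
    Score = weight * (1 - distance / tolerance), a triangular window (an assumption).
    """
    best_time, best_score = onset, 0.0
    for time, weight in expected_events:
        distance = abs(onset - time)
        if distance > tolerance:
            continue
        score = weight * (1.0 - distance / tolerance)
        if score > best_score:
            best_time, best_score = time, score
    return best_time


# Times in beats; a strongly expected event at 0.6 outweighs a weakly expected one at 0.5.
events = [(0.5, 0.3), (0.6, 1.0)]
print(align_onset(0.52, events, tolerance=0.2))  # pulled to 0.6 despite being nearer 0.5
print(align_onset(0.95, events, tolerance=0.2))  # outside both windows, left unchanged
```

The tolerance is the critical parameter: too wide and the device overrides the user's intentions, too narrow and it does nothing, so it would itself need to be tuned against the response data.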
Proposed Timeline of Study.
1. Research into cognitive influence and individual perception, underlining sensorial factors that affect cognition.
2. Research and testing of initial selected technologies and established physical computing methodology to capture sensory experience and behaviour.
3. Application of initial technology – beta patches. Application of sensory capturing devices, and methodology experimentation.
4. Research and experimentation of generative programming.
5. Beta test of devices by myself, followed by a variety of composers culminating in a performance.
6. Collate feedback from composers and listeners, and analyse input and performance data from the software/patch framework for the device(s). Evaluate output.
7. Review sensory components. Further research into sensory/cognitive processes. Re-evaluate technology. Alter patches/device for greater efficiency, further research and application of generative programming. Upon completing individual sensorial/behavioural research, smaller modular patches and devices will be demonstrated via small installations, each focusing specifically on an individual input.
8. Demonstration of complete device in an installation followed by critical analysis.
9. Conclude research.
1 Daniel B. Willingham, Joanna Salidis and John D. E. Gabrieli, Direct Comparison of Neural Systems Mediating Conscious and Unconscious Skill Learning. Department of Psychology, University of Virginia, Charlottesville, Virginia 22904; Department of Psychology, Stanford University, Stanford, California 94305
2 Georg Boenn, The Importance of Husserl’s Phenomenology of Internal Time-Consciousness for Music Analysis and Composition. University of Glamorgan, Cardiff School of Creative and Cultural Industries
3 H. Honing, Structure and Interpretation of Rhythm and Timing. Tijdschrift voor Muziektheorie, 7:3
4 G. Papadelis and G. Papanikolaou, The Perceptual Space Between and Within Musical Rhythm Categories. In: J. W. Davidson (ed), The Music Practitioner
5 David Huron, Sweet Anticipation: Music and the Psychology of Expectation. MIT Press