Technology Showcase

Real Time Synthesised Sound Effects Service (RTSFX)

    RPPtv is an SME providing original online content creation services. We have led a number of completed projects developing these technologies with funding from InnovateUK, the UK Government and the EU, have twice been invited to showcase at the IBC New Technology Village, and have been shortlisted for RTS Technology Innovation. Currently project lead for the SFX Synthesis Service Cross Platform and Robotics and Autonomous Systems ASSIGN InnovateUK projects, RPPtv is leading the field in integrating sound synthesis and autonomous system capabilities into the worldwide digital effects production chain. The company is developing a cloud-based service for the global professional TV, video, film, games and emerging immersive/webVR industries; this will subsequently be made available to the expanding prosumer/social media market of “enthusiast” or “home” makers of media content. The software components of the service will enable any user to add, automatically and in real time, quality synthesised sound effects to virtual reality and immersive productions. The technology reduces production time and cost, and brings real-time control of sounds within the consumer’s in-media experience, by using artificial intelligence (AI) and machine learning to assist sound placement and tracking synchronised to the visuals. It can work for all cross-platform media production. The technology is being developed by a strong partnership of academic audio engineering research excellence and industry end users, who validate the work and bring a user-focused design methodology. Agreements are in place to commercialise the service.

    Digital visual effects have reached a stunning level of realism and are almost indistinguishable from reality. Animation is frequently created by indicating the scene and the action to a rendering engine, without designers having to specify every little detail. Yet sound effects (SFX) lag behind, requiring complex manual integration and investment in large, difficult-to-navigate sound effects libraries. In immersive media, co-ordinating 360 and 3D video with realistic, believable sound effects is a problematic task with current technologies. Our sound synthesis approach enables the creation of customised sound effects in real time, using a small number of versatile, lightweight models delivered as a cloud service to eliminate the problems of storage and piracy. It makes SFX production intuitive regardless of skill and unleashes creativity. RPPtv has a proven track record in developing content creation tools delivered from the cloud. Our objective is to build and validate a scalable prototype embedded in industry by September 2018, consisting of three key components: generation of sounds; real-time manipulation and control of these sounds; and integration of sound effects into content production workflows utilising machine learning for scene/object detection. Critical mass of sounds is achieved by analysing and reverse engineering existing SFX libraries into the models, thus also enabling users to turn their own libraries into controllable sound synthesis models. The project involves R&D to implement the first autonomous sound synthesis system. Patent searches indicate freedom to operate, and a new patent application is being prepared.

    With support from our partners, RPPtv is creating a new venture to commercialise real-time generated SFX and disruptive tools for their creation. We are the first to integrate sound synthesis and autonomous system capabilities into the worldwide digital effects production chain. We have won £1.1M of research funding, and the directors’ additional input is valued at £450K. Upon completion of a beta prototype embedded in industry in September 2018, we will require a further round of investment and are therefore also interested in starting conversations with VCs looking to invest at the growth stage.

    Research with professional sound designers revealed many challenges we can address. Integration of quality SFX into the scene occupies the vast majority of their effort. Video-driven sounds (such as footsteps, which depend on surfaces and must be synchronised with visuals) are the most difficult and time-consuming. Game sound designers continually tweak play events, key frames and ambience generation to integrate sounds properly into the gameplay. Virtual and augmented reality also have a strong need for decision-making systems, because the entire world needs to be interactive and a sound designer cannot manually control every aspect. Throughout computer graphics, animation is driven by digital storyboard information. The rendering of visuals is a property of the object and scene data: if the video script calls for a character to drop a glass, this information is sent to the graphics engine and we see it falling in the virtual world. Sound effects should follow the same paradigm, enabling autonomous generation and synchronisation of SFX in immersive, game, film and augmented reality design.

    The Solution – Our company delivers a cross-platform synthetic SFX solution that meets the needs of immersive media and innovates the sound effect production process. RPPtv’s sound synthesis approach enables the creation of customised sound effects in real time, using a small number of versatile, lightweight models that require little storage space and make sound production easy, intuitive and fun. SFX are easily generated using intuitive controls, replacing laborious integration and repetitive re-use of sounds across media productions, including in the amateur and prosumer markets. The SFX models’ low data footprint meets the requirements of mobile technologies and is cost-efficient. Catering to the professional market, we are producing our technology as software that can transform the SFX production process. Our software algorithms are being developed to deliver reverse engineering technology to sound designers, enabling them to upload, recreate and shape existing SFX samples and sound effect archives as they wish. The use of our models, integrated with scene and object recognition, has the potential to reduce lengthy search times when looking for specific sounds and to offer considerable benefits to sound designers already invested in traditional sound libraries. Our Innovate UK funded autonomous systems project develops real-time, contextual sound generation. This is essential to the immersive market, where current technology and resource capabilities are challenged by the increasing complexity of 360 and 3D environments.

    The Business Model – An intelligent sound synthesis service, with its versatility and flexibility, would provide new ways to create and manipulate content yet cover the full range of existing services and products. It would address the main problems with the existing model: complexity, inflexibility, monotony and the need for manual intervention. This project represents a major step towards building an audio production studio using fully synthetic, procedural and generative techniques, just as Pixar was the first fully computer-animated film studio, or as Industrial Light & Magic creates solely digital visual effects. Our two revenue streams come from the licensing of tools aimed at enhancing the sound production process and the provision of SFX samples by subscription or pay-per-sound. Cloud delivery will protect against piracy and allow us to simultaneously exploit different, user-specific revenue models targeting user groups from amateur to professional, as well as supporting the industry’s required media formats. Our sound production technologies will be licensed as software to professional sound designers and studios. For instance, a falling ball in a visual scene can be detected and tracked. By combining our technologies (reverse engineering of sound effects, visual data capture to inform sound generation and placement) with existing graphics data, the sound of the ball can be generated and placed. Discussions with end users show this to be an effective route to the professional market, where extensive investments in SFX libraries mean that sound synthesis tools would be purchased for their unique functionality in enhancing the process of production and the use of existing resources. By introducing a professional product, we can validate and promote the service to prosumers and amateurs.
A freemium online SFX service will be used to attract amateur users and raise awareness, alongside subscription bundles for more frequent users, offering the option to invest more in our service as users’ skills increase.

    Appendix 5: Technology demo video

Our scene, object and emotion monitoring algorithms have been integrated into a health system for an ageing population (Innovate UK Community Channel SW) and the company intends to look at the sector as the AI capability develops.

Current activities 2017: Machine Learning and AI

    RPPtv is working on machine learning user interfaces that enable multiple parameters to be controlled by a single-touch interface. This uses machine learning and artificial intelligence, coupled with our technological innovations utilising neural networks, to map a number of parameter variations.
    This allows end users to develop heuristic behaviour patterns that give granular control over multiple or even single sound and visual elements.
    It also allows complex soundscapes to be created with a simplified controller and an interactive user interface. We incorporate object and scene recognition to help identify given elements within a scene, so this control mechanism has built-in “intelligence” for both visual and audio components.
    A second use of machine learning is to understand the parameters of two different sounds, say fire and wind, and populate all the points between them, so that a user can move seamlessly between fire, fiery wind and wind with one central control. The results can then be mixed at a granular level to produce totally different sound elements; for example, intelligently combining frying sounds with other elements could represent an electrical disturbance or a fire.
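The idea of populating the points between two sounds can be illustrated with a minimal sketch. It assumes each sound is described by the same set of numeric synthesis parameters and uses plain linear interpolation in place of a learned mapping; the parameter names and values (`crackle_rate`, `noise_brightness`, `gust_depth`) and the `morph` function are hypothetical and are not taken from the RTSFX service.

```python
from typing import Dict

# Hypothetical synthesis parameters for two sounds (illustrative values only).
FIRE: Dict[str, float] = {"crackle_rate": 0.9, "noise_brightness": 0.7, "gust_depth": 0.1}
WIND: Dict[str, float] = {"crackle_rate": 0.0, "noise_brightness": 0.3, "gust_depth": 0.8}


def morph(a: Dict[str, float], b: Dict[str, float], position: float) -> Dict[str, float]:
    """Blend two parameter sets; position 0.0 returns a, 1.0 returns b.

    This is straight linear interpolation, standing in for whatever
    learned mapping a real system would use.
    """
    if a.keys() != b.keys():
        raise ValueError("both sounds must expose the same parameters")
    t = min(1.0, max(0.0, position))  # clamp the control to the range 0..1
    return {name: (1.0 - t) * a[name] + t * b[name] for name in a}


# One control value moves from fire, through "fiery wind", to wind.
fiery_wind = morph(FIRE, WIND, 0.5)
```

A single slider or touch position supplies `position`, so one control drives every parameter at once.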
    A third use is the classification and navigation of sound archives by machine learning and audio grouping. Sound archives can be understood by accessing the characteristics of sounds and using this process to learn what a sound is by nature. This information can be intelligently grouped to pull up sounds that are close to the one being analysed, or to allow whole archives to be regrouped according to their closeness; i.e. the system will group similar sounds together, allowing simple single-touch navigation through complex unindexed or partially indexed sound archives.
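Grouping an archive by closeness can be sketched as follows, assuming each sound has already been reduced to a short vector of numeric descriptors. The archive contents, the three descriptors, the distance radius and the function names are all hypothetical. The sketch uses Euclidean distance and a simple greedy grouping, not the machine learning method used in the project.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical descriptors per sound: (brightness, noisiness, attack sharpness), scaled 0..1.
ARCHIVE: Dict[str, Tuple[float, float, float]] = {
    "campfire": (0.70, 0.90, 0.20),
    "frying_pan": (0.75, 0.85, 0.25),
    "gale": (0.30, 0.80, 0.05),
    "door_slam": (0.40, 0.30, 0.95),
    "gunshot": (0.60, 0.40, 0.98),
}


def closest(archive: Dict[str, Tuple[float, ...]], name: str, k: int = 2) -> List[str]:
    """Return the k sounds whose descriptors are nearest to the named sound."""
    query = archive[name]
    others = [n for n in archive if n != name]
    return sorted(others, key=lambda n: math.dist(archive[n], query))[:k]


def group(archive: Dict[str, Tuple[float, ...]], radius: float) -> List[List[str]]:
    """Greedy grouping: a sound joins the first group whose first member
    lies within `radius`; otherwise it starts a new group."""
    groups: List[List[str]] = []
    for name, features in archive.items():
        for members in groups:
            if math.dist(archive[members[0]], features) <= radius:
                members.append(name)
                break
        else:
            groups.append([name])
    return groups


neighbours = closest(ARCHIVE, "campfire")
clusters = group(ARCHIVE, radius=0.3)
```

With these illustrative values, the sounds nearest to the campfire are the frying pan and the gale, and the archive falls into three groups: crackling sounds, wind, and sharp impacts.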