Accepted Papers List

ARTS1142
TeSTNeRF: Text-Driven 3D Style Transfer via Cross-Modal Learning
Jiafu Chen, Boyan Ji, Zhanjie Zhang, Tianyi Chu, Zhiwen Zuo, Lei Zhao, Wei Xing, Dongming Lu
Text-driven 3D style transfer aims at stylizing a scene according to a text prompt and generating arbitrary novel views with consistency. Simply combining image/video style transfer methods with novel view synthesis methods results in flickering when the viewpoint changes, while existing 3D style transfer methods learn styles from images rather than texts. To address this problem, we design, for the first time, an efficient text-driven model for 3D style transfer, named TeSTNeRF, which stylizes the scene using texts via cross-modal learning: we leverage an advanced text encoder to embed the texts in order to control 3D style transfer and align the input text and output stylized images in latent space. Furthermore, to obtain better visual results, we introduce style supervision, learning feature statistics from style images and utilizing 2D stylization results to rectify abrupt color spill. Extensive experiments demonstrate that TeSTNeRF significantly outperforms existing methods and provides a new way to guide 3D style transfer.
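The style supervision described here, which matches feature statistics of style images, is in the spirit of classic AdaIN-style losses. Below is a minimal Python sketch of such a statistics-matching loss; it is an illustration of the general technique only, not the authors' implementation, and all names and shapes are assumptions.

    import torch
    import torch.nn.functional as F

    def feature_stats(feat):
        # feat: (batch, channels, height, width) feature map from any encoder
        return feat.mean(dim=(2, 3)), feat.std(dim=(2, 3))

    def style_stats_loss(rendered_feat, style_feat):
        # Match channel-wise mean/std of rendered vs. style features,
        # an AdaIN-style form of supervision (assumed, not the paper's exact loss).
        m_r, s_r = feature_stats(rendered_feat)
        m_s, s_s = feature_stats(style_feat)
        return F.mse_loss(m_r, m_s) + F.mse_loss(s_r, s_s)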
List of keywords
Application domains -> Images and visual arts
Application domains -> Other domains of art or creativity
ARTS1472
Collaborative Neural Rendering Using Anime Character Sheets
Zuzeng Lin, Ailin Huang, Zhewei Huang
Drawing images of characters in desired poses is an essential but laborious task in anime production, and assisting artists in this task has become a research hotspot in recent years. In this paper, we present the Collaborative Neural Rendering (CoNR) method, which creates new images for specified poses from a few reference images (AKA character sheets). In general, the diverse hairstyles and garments of anime characters defy the use of universal body models like SMPL, which fit most nude human shapes. To overcome this, CoNR uses a compact and easy-to-obtain landmark encoding to avoid creating a unified UV mapping in the pipeline. In addition, the performance of CoNR can be significantly improved when referring to multiple reference images, thanks to feature-space cross-view warping in a carefully designed neural network. Moreover, we have collected a character sheet dataset containing over 700,000 hand-drawn and synthesized images of diverse poses to facilitate research in this area. The code and dataset are available at https://github.com/megvii-research/IJCAI2023-CoNR.
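Feature-space cross-view warping is commonly implemented by resampling a reference feature map with a predicted flow field. A minimal PyTorch sketch of that generic operation follows; the shapes and the pixel-offset flow convention are assumptions, not details from the paper.

    import torch
    import torch.nn.functional as F

    def warp_features(ref_feat, flow):
        # ref_feat: (B, C, H, W) features from a reference view
        # flow: (B, 2, H, W) predicted offsets in pixels (assumed convention)
        B, _, H, W = ref_feat.shape
        # Build a normalized sampling grid in [-1, 1], as grid_sample expects.
        ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
        base = torch.stack((xs, ys), dim=0).float().to(ref_feat.device)  # (2, H, W)
        coords = base.unsqueeze(0) + flow
        grid_x = 2.0 * coords[:, 0] / (W - 1) - 1.0
        grid_y = 2.0 * coords[:, 1] / (H - 1) - 1.0
        grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)
        return F.grid_sample(ref_feat, grid, align_corners=True)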
List of keywords
Application domains -> Images and visual arts
Methods and resources -> Datasets, knowledge bases and ontologies
ARTS1743
NAS-FM: Neural Architecture Search for Tunable and Interpretable Sound Synthesis Based on Frequency Modulation
Zhen Ye, Wei Xue, Xu Tan, Qifeng Liu, Yike Guo
Developing digital sound synthesizers is crucial to the music industry, as it provides a low-cost way to produce high-quality sounds with rich timbres. Existing traditional synthesizers often require substantial expertise to determine the overall framework of a synthesizer and the parameters of its submodules. Since expert knowledge is hard to acquire, this hinders the flexibility to quickly design and tune digital synthesizers for diverse sounds. In this paper, we propose “NAS-FM”, which adopts neural architecture search (NAS) to build a differentiable frequency modulation (FM) synthesizer. Tunable synthesizers with interpretable controls can be developed automatically from sounds without any prior expert knowledge or manual operating costs. In detail, we train a supernet with a specifically designed search space, including predicting the envelopes of carriers and modulators with different frequency ratios. An evolutionary search algorithm with adaptive oscillator size is then developed to find the optimal relationship between oscillators and the frequency ratio of FM. Extensive experiments on recordings of different instrument sounds show that our algorithm can build a synthesizer fully automatically, achieving better results than handcrafted synthesizers. Audio samples are available at https://nas-fm.github.io/.
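For intuition: a two-operator FM voice is a carrier sinusoid whose phase is modulated by a second sinusoid at a fixed frequency ratio. Below is a minimal NumPy sketch of that textbook formulation; the envelope inputs stand in for what a NAS-FM-like system would predict, and all names are hypothetical.

    import numpy as np

    def fm_voice(f_carrier, ratio, index_env, amp_env, sr=16000):
        # Two-operator FM: a modulator at ratio * f_carrier drives the carrier phase.
        # index_env / amp_env: per-sample envelopes (their length sets the duration).
        n = len(amp_env)
        t = np.arange(n) / sr
        modulator = index_env * np.sin(2 * np.pi * ratio * f_carrier * t)
        return amp_env * np.sin(2 * np.pi * f_carrier * t + modulator)

    # Example: a 220 Hz tone whose brightness decays as the modulation index falls.
    n = 16000
    tone = fm_voice(220.0, 2.0, np.linspace(5.0, 0.0, n), np.ones(n))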
List of keywords
Application domains -> Music and sound
Methods and resources -> Machine learning, deep learning, neural models, reinforcement learning
ARTS2515
IberianVoxel: Automatic Completion of Iberian Ceramics for Cultural Heritage Studies
Pablo Navarro, Celia Cintas, Manuel Lucena, José Manuel Fuertes, Antonio Rueda, Rafael Segura, Carlos Ogayar-Anguita, Rolando González-José, Claudio Delrieux
Accurate completion of archaeological artifacts is a critical aspect of several archaeological studies, including documentation of variations in style, inference of chronological and ethnic groups, and trading route trends, among many others. However, most available pottery is fragmented, leading to missing textural and morphological cues. Currently, the reassembly and completion of fragmented ceramics is a daunting and time-consuming task, done almost exclusively by hand, which requires the physical manipulation of the fragments. To overcome the challenges of manual reconstruction, reduce the materials’ exposure and deterioration, and improve the quality of reconstructed samples, we present IberianVoxel, a novel 3D Autoencoder Generative Adversarial Network (3D AE-GAN) framework tested on an extensive database with complete and fragmented references. We generated a collection of 1001 3D voxelized samples and their fragmented references from Iberian wheel-made pottery profiles. The generated fragments are stratified into different size groups and across multiple pottery classes. Lastly, we provide quantitative and qualitative assessments to measure the quality of the voxelized samples reconstructed by our proposed method, together with an evaluation by archaeologists.
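As background for the 3D AE-GAN framing: the generator in such a setup is typically a 3D convolutional autoencoder mapping a fragmented occupancy grid to a completed one. A toy PyTorch sketch of that generator shape follows; layer sizes are guesses, the adversarial discriminator is omitted, and this is not the authors' architecture.

    import torch
    import torch.nn as nn

    class VoxelCompletionAE(nn.Module):
        # Toy 3D conv autoencoder for fragment-to-whole completion.
        def __init__(self):
            super().__init__()
            self.enc = nn.Sequential(
                nn.Conv3d(1, 16, 4, stride=2, padding=1), nn.ReLU(),
                nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
            )
            self.dec = nn.Sequential(
                nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
            )

        def forward(self, fragment):
            # fragment: (B, 1, D, H, W) occupancy grid of a sherd;
            # output: per-voxel occupancy probabilities of the completed vessel.
            return self.dec(self.enc(fragment))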
List of keywords
Application domains -> Other domains of art or creativity
Methods and resources -> Machine learning, deep learning, neural models, reinforcement learning
Theory and philosophy of arts and creativity in AI systems -> Cultural and social impacts of creative AI
Theory and philosophy of arts and creativity in AI systems -> Support of human creativity
ARTS5112
Learn and Sample Together: Collaborative Generation for Graphic Design Layout
Haohan Weng, Danqing Huang, Tong Zhang, Chin-Yew Lin
In the process of graphic layout generation, user specifications including element attributes and their relationships are commonly used to constrain the layouts (e.g., “put the image above the button”). It is natural to encode such spatial constraints between elements using a graph. This paper presents a two-stage generation framework: a spatial graph generator and a subsequent layout decoder conditioned on the generator's output graph. When the two highly dependent networks are trained separately, as in previous work, we observe that the graph generator frequently produces out-of-distribution graphs that are unseen to the layout decoder during training, which leads to a large performance drop at inference. To coordinate the two networks more effectively, we propose a novel collaborative generation strategy that performs round-way knowledge transfer between the networks in both training and inference. Experimental results on three public datasets show that our model greatly benefits from the collaborative generation and achieves state-of-the-art performance. Furthermore, we conduct an in-depth analysis to better understand the effectiveness of graph condition modeling.
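The coordination problem described here, a decoder that never sees the generator's own samples, suggests training the decoder on sampled graphs rather than only ground-truth ones. The runnable PyTorch toy below illustrates that coupling with stand-in linear networks; the paper's actual round-way knowledge transfer is richer, and every name here is a placeholder.

    import torch
    import torch.nn as nn

    # Stand-ins (not the paper's networks): a "graph generator" that predicts a
    # condition vector from the spec, and a "layout decoder" conditioned on it.
    graph_gen = nn.Linear(8, 8)
    layout_dec = nn.Linear(8 + 8, 4)
    opt = torch.optim.Adam(list(graph_gen.parameters()) + list(layout_dec.parameters()))

    spec, gt_graph, gt_layout = torch.randn(1, 8), torch.randn(1, 8), torch.randn(1, 4)

    # Collaborative step: the decoder is conditioned on the generator's *own*
    # prediction, so the graphs it sees at inference are no longer unseen.
    pred_graph = graph_gen(spec)
    g_loss = nn.functional.mse_loss(pred_graph, gt_graph)
    # Detached here so the decoder adapts to sampled graphs without backprop
    # into the generator -- one possible coupling; the paper's transfer is two-way.
    pred_layout = layout_dec(torch.cat([spec, pred_graph.detach()], dim=-1))
    l_loss = nn.functional.mse_loss(pred_layout, gt_layout)
    (g_loss + l_loss).backward()
    opt.step(); opt.zero_grad()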
List of keywords
Application domains -> Images and visual arts
Application domains -> Text, literature and creative language
Methods and resources -> Machine learning, deep learning, neural models, reinforcement learning
Theory and philosophy of arts and creativity in AI systems -> Autonomous creative or artistic AI
ARTS5448
Evaluating Human-AI Interaction via Usability, User Experience and Acceptance Measures for MMM-C: A Creative AI System for Music Composition
Renaud Bougueng Tchemeube, Jeffrey Ens, Cale Plut, Philippe Pasquier, Maryam Safi, Yvan Grabit, Jean-Baptiste Rolland
With the rise of artificial intelligence (AI), there has been increasing interest in human-AI co-creation in a variety of artistic domains, including music, as AI-driven systems are frequently able to generate human-competitive artifacts. Now, the implications of such systems for musical practice are being investigated. This paper reports on a thorough evaluation of the user adoption of the Multi-Track Music Machine (MMM) as a minimal co-creative AI tool for music composers. To do this, we integrate MMM into Cubase, a popular Digital Audio Workstation (DAW), by producing a “1-parameter” plugin interface named MMM-Cubase, which enables human-AI co-composition. We conduct a 3-part mixed-method study measuring usability, user experience and technology acceptance of the system across two groups of expert-level composers: hobbyists and professionals. Results show positive usability and acceptance scores. Users report experiences of novelty, surprise and ease of use with the system, as well as limitations on the controllability and predictability of the interface when generating music. Findings indicate no significant difference between the two user groups.
List of keywords
Theory and philosophy of arts and creativity in AI systems -> Evaluation of artistic or creative outputs produced by AI Systems
Application domains -> Music and sound
Methods and resources -> Machine learning, deep learning, neural models, reinforcement learning
Theory and philosophy of arts and creativity in AI systems -> Social (multi-agent) creativity and human-computer co-creation
Theory and philosophy of arts and creativity in AI systems -> Support of human creativity
ARTS5508
DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models
Sicheng Yang, Zhiyong Wu, Minglei Li, Zhensong Zhang, Lei Hao, Weihong Bao, Ming Cheng, Long Xiao
Beyond speech, gestures are an essential part of the art of communication, and automatic co-speech gesture generation has drawn much attention in computer animation. It is a challenging task due to the diversity of gestures and the difficulty of matching the rhythm and semantics of a gesture to the corresponding speech. To address these problems, we present DiffuseStyleGesture, a diffusion-model-based speech-driven gesture generation approach. It generates high-quality, speech-matched, stylized, and diverse co-speech gestures for given speech of arbitrary length. Specifically, we introduce cross-local attention and self-attention into the gesture diffusion pipeline to generate better-matched and more realistic gestures. We then train our model with classifier-free guidance to control the gesture style by interpolation or extrapolation. Additionally, we improve the diversity of generated gestures with different initial gestures and noise. Extensive experiments show that our method outperforms recent approaches to speech-driven gesture generation. Our code, pre-trained models, and demos are available at https://github.com/YoungSeng/DiffuseStyleGesture.
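Classifier-free guidance, mentioned above for style control, has a standard form: blend the conditional and unconditional denoiser outputs, where a guidance scale above 1 extrapolates toward the condition and a scale between 0 and 1 interpolates away from it. A generic Python sketch of that formula follows; the model signature is a hypothetical placeholder, not the paper's API.

    def guided_noise_prediction(model, x_t, t, style, scale):
        # Standard classifier-free guidance over a style condition.
        # `model(x_t, t, style=...)` is an assumed interface; passing
        # style=None stands for the unconditional branch.
        eps_uncond = model(x_t, t, style=None)
        eps_cond = model(x_t, t, style=style)
        return eps_uncond + scale * (eps_cond - eps_uncond)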
List of keywords
Application domains -> Performances, dance
Application domains -> Music and sound
Methods and resources -> Machine learning, deep learning, neural models, reinforcement learning
Theory and philosophy of arts and creativity in AI systems -> Social (multi-agent) creativity and human-computer co-creation
ARTS5558
Automating Rigid Origami Design
Jeremia Geiger, Karolis Martinkus, Oliver Richter, Roger Wattenhofer
Rigid origami has shown potential in a large diversity of practical applications. However, current rigid origami crease pattern design mostly relies on known tessellations, which strongly limits the diversity and novelty of the patterns that can be created. In this work, we build upon the recently developed principle-of-three-units method to formulate rigid origami design as a discrete optimization problem, the rigid origami game. Our implementation allows for a simple definition of diverse objectives and thereby further expands the potential of rigid origami toward optimized, application-specific crease patterns. We showcase the flexibility of our formulation through the use of a diverse set of search methods in several illustrative case studies. We are not only able to construct various patterns that approximate given target shapes, but also to specify abstract, function-based rewards which result in novel, foldable and functional designs for everyday objects.
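Formulating design as a discrete game means repeatedly choosing among legal crease-extension actions and scoring the finished pattern. Below is a generic random-search skeleton over such a game; all five callables are hypothetical placeholders for the paper's environment, and stronger search methods would slot into the same interface.

    import random

    def play_design_game(initial_state, legal_actions, apply_action, reward,
                         episodes=100):
        # Play many episodes of a discrete design "game", keeping the best
        # terminal state found. This is plain random search, not the paper's
        # search algorithms.
        best_state, best_score = None, float("-inf")
        for _ in range(episodes):
            state = initial_state()
            while True:
                actions = legal_actions(state)
                if not actions:
                    break  # terminal: no legal extension remains
                state = apply_action(state, random.choice(actions))
            score = reward(state)
            if score > best_score:
                best_state, best_score = state, score
        return best_state, best_score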
List of keywords
Methods and resources -> Applications and software frameworks
Application domains -> Ideation
Application domains -> Other domains of art or creativity
Methods and resources -> Evolutionary algorithms
Methods and resources -> Machine learning, deep learning, neural models, reinforcement learning
Theory and philosophy of arts and creativity in AI systems -> Support of human creativity
ARTS5568
Towards Symbiotic Creativity: A Methodological Approach to Compare Human and AI Robotic Dance Creations
Allegra De Filippo, Luca Giuliani, Eleonora Mancini, Andrea Borghesi, Paola Mello, Michela Milano
Artificial Intelligence (AI) has gradually attracted attention in the field of artistic creation, resulting in a debate on the evaluation of AI artistic outputs. However, there is a lack of common criteria for the objective artistic evaluation of both human and AI creations. This is a frequent issue in the field of dance, where different performance metrics focus on evaluating either human or computational skills separately. This work proposes a methodological approach for the artistic evaluation of both AI and human artistic creations in the field of robotic dance. First, we define a series of common initial constraints to create robotic dance choreographies in a balanced initial setting, in collaboration with a group of human dancers and a choreographer. Then, we compare both creation processes through a human audience evaluation. Finally, we investigate which choreography aspects (e.g., the music genre) have the largest impact on the evaluation, and we provide useful guidelines and future research directions for the analysis of interconnections between AI and human dance creation.
List of keywords
Theory and philosophy of arts and creativity in AI systems -> Evaluation of artistic or creative outputs produced by AI Systems
Application domains -> Performances, dance
ARTS5605
The ACCompanion: Combining Reactivity, Robustness, and Musical Expressivity in an Automatic Piano Accompanist
Carlos Cancino-Chacón, Silvan Peter, Patricia Hu, Emmanouil Karystinaios, Florian Henkel, Francesco Foscarin, Gerhard Widmer
This paper introduces the ACCompanion, an expressive accompaniment system. Like a musician who accompanies a soloist playing a given musical piece, our system can produce a human-like rendition of the accompaniment part that follows the soloist’s choices in terms of tempo, dynamics, and articulation. The ACCompanion works in the symbolic domain, i.e., it needs a musical instrument capable of producing and playing MIDI data, with explicitly encoded onset, offset, and pitch for each played note. We describe the components that go into such a system, from real-time score following and prediction to expressive performance generation and online adaptation to the expressive choices of the human player. Based on our experience with repeated live demonstrations in front of various audiences, we offer an analysis of the challenges of combining these components into a system that is highly reactive and precise while still being a reliable musical partner, robust to possible performance errors and responsive to expressive variations.
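At its simplest, real-time score following means tracking a position in the written score as live MIDI notes arrive. The toy Python follower below conveys the idea; it is far cruder than the probabilistic tracking a system like the ACCompanion requires, and every name here is hypothetical.

    def follow_score(score_pitches, played_pitch, pos, window=5):
        # Advance to the nearest upcoming score note matching the played
        # MIDI pitch, searching only a small look-ahead window.
        for i in range(pos, min(pos + window, len(score_pitches))):
            if score_pitches[i] == played_pitch:
                return i + 1  # next expected score position
        return pos  # no match: treat as an ornament or error, hold position

    # Example: after hearing pitch 62 (D4), the follower moves past it.
    assert follow_score([60, 62, 64, 65], 62, pos=0) == 2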
List of keywords
Application domains -> Music and sound
Methods and resources -> Applications and software frameworks
Theory and philosophy of arts and creativity in AI systems -> Social (multi-agent) creativity and human-computer co-creation
ARTS5607
Discrete Diffusion Probabilistic Models for Symbolic Music Generation
Matthias Plasser, Silvan Peter, Gerhard Widmer
Denoising Diffusion Probabilistic Models (DDPMs) have made great strides in generating high-quality samples in both discrete and continuous domains. However, Discrete DDPMs (D3PMs) have yet to be applied to the domain of symbolic music. This work presents the direct generation of polyphonic symbolic music using D3PMs. Our model exhibits state-of-the-art sample quality according to current quantitative evaluation metrics, and allows for flexible infilling at the note level. We further show that our models are amenable to post-hoc classifier guidance, widening the scope of possible applications. However, we also cast a critical eye on the quantitative evaluation of music sample quality via statistical metrics, and present a simple algorithm that can confound our metrics with completely spurious, non-musical samples.
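A D3PM's forward process corrupts discrete tokens step by step; with the common uniform transition kernel, each token either keeps its value or is resampled uniformly over the vocabulary. A minimal PyTorch sketch of one such corruption step follows, as a generic illustration rather than the paper's code; treating notes as a discrete token alphabet is an assumption.

    import torch

    def d3pm_uniform_corrupt(x0, beta_t, num_classes):
        # One forward step with a uniform transition kernel: each token in the
        # integer tensor x0 keeps its value with probability 1 - beta_t, and is
        # otherwise resampled uniformly over the num_classes-sized vocabulary.
        resample = torch.rand_like(x0, dtype=torch.float) < beta_t
        random_tokens = torch.randint_like(x0, num_classes)
        return torch.where(resample, random_tokens, x0)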
List of keywords
Methods and resources -> Machine learning, deep learning, neural models, reinforcement learning
Application domains -> Music and sound
Theory and philosophy of arts and creativity in AI systems -> Autonomous creative or artistic AI
Theory and philosophy of arts and creativity in AI systems -> Evaluation of artistic or creative outputs produced by AI Systems
ARTS5652
Q&A: Query-Based Representation Learning for Multi-Track Symbolic Music re-Arrangement
Jingwei Zhao, Gus Xia, Ye Wang
Music rearrangement is a common musical practice of reconstructing and reconceptualizing a piece using new composition or instrumentation styles, and it is also an important task in automatic music generation. Existing studies typically model the mapping from a source piece to a target piece via supervised learning. In this paper, we tackle rearrangement problems via self-supervised learning, in which the mapping styles can be regarded as conditions and controlled in a flexible way. Specifically, inspired by the idea of representation disentanglement, we propose Q&A, a query-based algorithm for multi-track music rearrangement under an encoder-decoder framework. Q&A learns both a content representation from the mixture and function (style) representations from each individual track, where the latter query the former in order to rearrange a new piece. Our current model focuses on popular music and provides a controllable pathway to four scenarios: 1) re-instrumentation, 2) piano cover generation, 3) orchestration, and 4) voice separation. Experiments show that our query system achieves high-quality rearrangement results with delicate multi-track structures, significantly outperforming the baselines.
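"The latter query the former" maps naturally onto standard cross-attention, with per-track function representations as queries over the mixture's content representation. A minimal runnable PyTorch sketch of that generic mechanism follows; the dimensions and the use of nn.MultiheadAttention are assumptions, not the paper's architecture.

    import torch
    import torch.nn as nn

    d_model = 256
    attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    content = torch.randn(1, 64, d_model)   # mixture (content) representation
    function = torch.randn(1, 8, d_model)   # per-track function/style queries
    rearranged, _ = attn(query=function, key=content, value=content)
    # `rearranged` would then feed a decoder that writes out the new tracks.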
List of keywords
Application domains -> Music and sound
Methods and resources -> Machine learning, deep learning, neural models, reinforcement learning
Theory and philosophy of arts and creativity in AI systems -> Autonomous creative or artistic AI
ARTS5672
Graph-based Polyphonic Multitrack Music Generation
Emanuele Cosenza, Andrea Valenti, Davide Bacciu
Graphs can be leveraged to model polyphonic multitrack symbolic music, where notes, chords and entire sections may be linked at different levels of the musical hierarchy by tonal and rhythmic relationships. Nonetheless, there is a lack of works that consider graph representations in the context of deep learning systems for music generation. This paper bridges this gap by introducing a novel graph representation for music and a deep Variational Autoencoder that generates the structure and the content of musical graphs separately, one after the other, with a hierarchical architecture that matches the structural priors of music. By separating the structure and content of musical graphs, it is possible to condition generation by specifying which instruments are played at certain times. This opens the door to a new form of human-computer interaction in the context of music co-creation. After training the model on existing MIDI datasets, we show experimentally that it is able to generate appealing short and long musical sequences and to realistically interpolate between them, producing music that is tonally and rhythmically consistent. Finally, the visualization of the embeddings shows that the model is able to organize its latent space in accordance with known musical concepts.
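"Structure first, content second" can be pictured as a decoder with two heads: one emits the graph's adjacency (which instruments play, and when), and a second fills in node content conditioned on the latent code and that structure. A toy PyTorch sketch of this ordering follows; the sizes are arbitrary and this is not the paper's hierarchical architecture.

    import torch
    import torch.nn as nn

    class TwoStageGraphDecoder(nn.Module):
        def __init__(self, z_dim=32, nodes=16, feat=8):
            super().__init__()
            self.nodes, self.feat = nodes, feat
            self.structure_head = nn.Linear(z_dim, nodes * nodes)
            self.content_head = nn.Linear(z_dim + nodes * nodes, nodes * feat)

        def forward(self, z):
            # Stage 1: decode the graph structure (edge probabilities).
            adj = torch.sigmoid(self.structure_head(z))
            # Stage 2: decode node content conditioned on latent + structure.
            content = self.content_head(torch.cat([z, adj], dim=-1))
            return (adj.view(-1, self.nodes, self.nodes),
                    content.view(-1, self.nodes, self.feat))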
List of keywords
Application domains -> Music and sound
Methods and resources -> Machine learning, deep learning, neural models, reinforcement learning
Theory and philosophy of arts and creativity in AI systems -> Autonomous creative or artistic AI
Theory and philosophy of arts and creativity in AI systems -> Social (multi-agent) creativity and human-computer co-creation
Theory and philosophy of arts and creativity in AI systems -> Support of human creativity