Choi Sen Ho¹*, Chap Issac Fung¹*, Huicheng Zhang¹*, Yulun Wu¹*, Yueqiao Zhang¹, Zhili Tan¹, Qiuqiang Kong², Lui Siu Hang¹, Yaolong Ju³#

¹ The Central Media Technology Institute, Huawei, China
² The Chinese University of Hong Kong, Hong Kong SAR, China
³ Great Bay University, Dongguan, China

(* Equal Contribution, # Corresponding Author)
In this project, we propose an enhanced pipeline addressing three key challenges:
Our framework combines the roles of improvisational accompanist, composer, and audio engineer, integrating:
This pipeline produces studio-quality music from users' singing while preserving their unique expressive characteristics, a significant step toward personalized music generation with no limit on input length.
We demonstrate the effectiveness of our system through qualitative and quantitative evaluations, showing superior alignment, expressiveness, and stylistic consistency compared to state-of-the-art baselines.
Our system delivers precise Music Information Retrieval (MIR) on studio-quality vocal recordings and generates professional-grade MIDI accompaniments, ensuring high fidelity and musical coherence.
Our system also supports a wide range of musical styles, from pop and rock to R&B and traditional Chinese music, showcasing its versatility across genres.
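| Title (Artist) | Air Traffic (Clara Berry And Wooldog) | PunchDruck (Grants) | Take a Step (Meaxic) | Fire (Night Panther) |
|---|---|---|---|---|
| Vocal Input | (audio) | (audio) | (audio) | (audio) |
| Ballad | (audio) | (audio) | (audio) | (audio) |
| R&B | (audio) | (audio) | (audio) | (audio) |
| Funk | (audio) | (audio) | (audio) | (audio) |
| Chinese Traditional | (audio) | (audio) | (audio) | (audio) |
| DJ | (audio) | (audio) | (audio) | (audio) |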
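| Title (Artist) | Vermont (The Districts) | Spacestation (Strand Of Oaks) | Bounty (Steven Clark) | Curfews (Snowmine) |
|---|---|---|---|---|
| Vocal Input | (audio) | (audio) | (audio) | (audio) |
| Ballad | (audio) | (audio) | (audio) | (audio) |
| R&B | (audio) | (audio) | (audio) | (audio) |
| Rock | (audio) | (audio) | (audio) | (audio) |
| Chinese Traditional | (audio) | (audio) | (audio) | (audio) |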
Our system effectively handles amateur singing inputs with performance variations, generating musically coherent accompaniments that enhance the overall listening experience.
Below are vocals recorded by amateur singers on smartphones in real-world conditions, demonstrating the system's robustness and versatility across different singing styles.
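| Title (Title in Chinese) | Around the Winter (大约在冬季) | Tales of the Red Cliff (醉赤壁) | Flower in a Mirror (镜中花) | Still in Love with You (依然爱你) |
|---|---|---|---|---|
| Vocal Input | (audio) | (audio) | (audio) | (audio) |
| Output (random style) | (audio) | (audio) | (audio) | (audio) |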
The system also accepts diverse input types. The amateur recordings below include not only sung vocals but also hip-hop rap and Chinese traditional instrumental input.
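| Title (Title in Chinese) | The Rain in Qingming (清明雨上) | Pyrus Reblossom (梨花又开放) | Babe* (--) |
|---|---|---|---|
| Vocal Input | (audio) | (audio) | (audio) |
| Ballad | (audio) | (audio) | (audio) |
| R&B | (audio) | (audio) | (audio) |
| Funk | (audio) | (audio) | (audio) |
| DJ | (audio) | (audio) | (audio) |
| Pure Piano | (audio) | (audio) | (audio) |

(* Rap)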
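| Title (Title in Chinese) | Sky (海阔天空) | Dizi Solo: Trip to Gusu* (笛子独奏:姑苏行) | Not a Hero (不谓侠) |
|---|---|---|---|
| Vocal Input | (audio) | (audio) | (audio) |
| Ballad | (audio) | (audio) | (audio) |
| R&B | (audio) | (audio) | (audio) |
| Funk | (audio) | (audio) | (audio) |
| DJ | (audio) | (audio) | (audio) |
| Pure Piano | (audio) | (audio) | (audio) |

(* Instrumental)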
Visit the GitHub repository for more details about this project.