MusIDE is a professional DAW alternative with AI at its core. Compose entire songs from lyrics — singing synthesis via DiffSinger, realistic instrument rendering via SoundFont, and full-process orchestration by an LLM Agent with 20+ tool calls.
From a single lyric to a complete song — MusIDE handles the entire creative process with AI.
Industry-grade DiffSinger (AAAI-2022) singing voice synthesis. Input lyrics plus MIDI notes and durations to generate sung vocals. Uses OpenCPOP pretrained models with the NSF-HiFiGAN vocoder, supports fast diffusion sampling with K_step=100, and performs pitch shifting via librosa.
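As a rough illustration of the "lyrics + MIDI notes/durations" input, the sketch below pairs each syllable with one pitch and one duration and shows the equal-tempered ratio behind a semitone pitch shift. The names `SungNote`, `align_lyrics`, `transpose`, and `pitch_ratio` are hypothetical and are not MusIDE or DiffSinger APIs; the real DiffSinger input format is more detailed than this.

```python
from dataclasses import dataclass


@dataclass
class SungNote:
    syllable: str      # one lyric syllable (or character)
    midi_pitch: int    # MIDI note number, 0-127
    duration_s: float  # note length in seconds


def align_lyrics(syllables, pitches, durations):
    """Pair each syllable with one pitch and one duration (hypothetical helper)."""
    if not (len(syllables) == len(pitches) == len(durations)):
        raise ValueError("syllables, pitches and durations must have equal length")
    return [SungNote(s, p, d) for s, p, d in zip(syllables, pitches, durations)]


def transpose(notes, semitones):
    """Return a copy of the phrase shifted by a number of semitones."""
    return [SungNote(n.syllable, n.midi_pitch + semitones, n.duration_s) for n in notes]


def pitch_ratio(semitones):
    """Frequency ratio for an equal-tempered shift of `semitones`."""
    return 2.0 ** (semitones / 12.0)


phrase = align_lyrics(["la", "la", "la"], [60, 62, 64], [0.5, 0.5, 1.0])
print(sum(n.duration_s for n in phrase))  # total phrase length in seconds
```

Validating that the three lists have equal length up front is a cheap way to catch lyric/melody mismatches before any synthesis time is spent.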
Professional instrument synthesis via FluidSynth + SF2 SoundFont. Five tone libraries are available: Salamander C5 (24MB), GigaPiano (17MB), FluidR3 GM+GS full set (144MB), GeneralUser GS (30MB), and TimGM6mb minimal (6MB). All 128 General MIDI programs and drum kits are supported.
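For readers unfamiliar with FluidSynth, the sketch below builds (without running) a command line that renders a MIDI file to WAV using the standard `fluidsynth` CLI options. The helper name and the file names are placeholders, and this README does not say whether MusIDE calls the CLI or a Python binding, so treat it only as an illustration of the rendering step.

```python
import shlex


def fluidsynth_render_command(soundfont, midi_file, wav_out, sample_rate=44100):
    """Build (but do not run) a FluidSynth command line for offline rendering."""
    return [
        "fluidsynth",
        "-n",                    # do not open a MIDI input driver
        "-i",                    # do not start the interactive shell
        "-F", wav_out,           # fast-render audio to this file
        "-r", str(sample_rate),  # output sample rate in Hz
        soundfont,
        midi_file,
    ]


# Placeholder file names, chosen only for the example.
cmd = fluidsynth_render_command("FluidR3_GM.sf2", "song.mid", "song.wav")
print(shlex.join(cmd))
```

Returning an argument list rather than a single string makes it safe to pass to `subprocess.run` without shell quoting issues.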
Full-context LLM dialogue with support for any OpenAI-compatible API. The AI Agent is driven by professional system prompts and has 20+ tool calls covering audio processing, editing, and synthesis, project control, AI composition, and more. Generate complete songs directly from conversation.
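A minimal sketch of how a tool call in the OpenAI chat-completions shape might be declared and dispatched. The `set_bpm` tool, the `TOOLS` registry, and `dispatch` are invented for illustration; MusIDE's actual tool names and parameters are not listed in this README.

```python
import json


# Hypothetical tool; not one of MusIDE's documented tools.
def set_bpm(bpm: int) -> dict:
    return {"ok": True, "bpm": bpm}


TOOLS = {"set_bpm": set_bpm}

# Schema that would be sent to the model in the "tools" field of a request.
TOOL_SCHEMAS = [{
    "type": "function",
    "function": {
        "name": "set_bpm",
        "description": "Set the project tempo in beats per minute.",
        "parameters": {
            "type": "object",
            "properties": {"bpm": {"type": "integer"}},
            "required": ["bpm"],
        },
    },
}]


def dispatch(tool_call: dict) -> str:
    """Run one tool call and return its result as a JSON string."""
    name = tool_call["function"]["name"]
    # The model returns arguments as a JSON-encoded string.
    args = json.loads(tool_call["function"]["arguments"])
    if name not in TOOLS:
        return json.dumps({"ok": False, "error": f"unknown tool: {name}"})
    return json.dumps(TOOLS[name](**args))


print(dispatch({"function": {"name": "set_bpm", "arguments": '{"bpm": 90}'}}))
```

The JSON string returned by `dispatch` would typically be sent back to the model as a `tool` role message so it can decide on the next step.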
Professional multi-track editor built with Web Audio API and Canvas, supporting up to 16 tracks. Each track has independent properties: name, color, icon, instrument, volume, pan. Standard operations: add/delete/drag-move/cut, double-click for detailed note editing.
Horizontal timeline with BPM-based bar/beat grid (default 120 BPM, 4/4 time). Real-time moving playhead, timeline zoom, loop sections. Complete transport controls: play/pause/stop/record/loop, time display, and real-time BPM adjustment.
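The bar/beat grid follows directly from the tempo: at 120 BPM one beat lasts 0.5 s, so a 4/4 bar lasts 2 s. A small sketch of the conversion, with hypothetical function names and 1-based bar and beat numbers:

```python
def seconds_to_position(seconds, bpm=120.0, beats_per_bar=4):
    """Return (bar, beat) as 1-based numbers; beat may be fractional."""
    total_beats = seconds * bpm / 60.0
    bar = int(total_beats // beats_per_bar) + 1
    beat = total_beats % beats_per_bar + 1
    return bar, beat


def position_to_seconds(bar, beat, bpm=120.0, beats_per_bar=4):
    """Inverse of seconds_to_position for 1-based bar and beat."""
    total_beats = (bar - 1) * beats_per_bar + (beat - 1)
    return total_beats * 60.0 / bpm


print(seconds_to_position(2.0))   # → (2, 1.0): start of bar 2 at 120 BPM
print(position_to_seconds(2, 1))  # → 2.0
```

Because positions are stored in beats rather than seconds, changing the BPM during playback only changes the scale factor, not the note data.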
Piano roll note editor with drag-select, batch move/delete, and snap-to-grid. Covers a three-octave range (C3-B5) with pitch adjustment and note add/delete. Real-time pitch and duration feedback during editing.
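Snap-to-grid and the C3-B5 range can be sketched as follows. The helpers are hypothetical, and the MIDI numbers assume the common C4 = 60 convention (so C3 = 48 and B5 = 83), which this README does not confirm.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(midi):
    """Name of a MIDI note, assuming the C4 = 60 convention."""
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def snap(beats, grid=0.25):
    """Round a position in beats to the nearest grid line.

    With a quarter-note beat, grid=0.25 is a sixteenth note.
    """
    return round(beats / grid) * grid


def in_range(midi, low=48, high=83):
    """True if the note lies inside the editable range (C3-B5 by default)."""
    return low <= midi <= high


print(note_name(48), note_name(83))  # → C3 B5
print(snap(1.1))                     # → 1.0
```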
Tab-style vertical mixing console with one strip per channel: independent volume and pan sliders, plus a VU meter whose peak values update in real time. Mute/solo buttons, channel grouping, batch operations. Real-time audio via AudioContext, with multiple tracks playing simultaneously.
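The math behind a pan slider and a peak meter can be sketched in a few lines. This assumes an equal-power pan law, which is an assumption rather than something this README states, and the function names are hypothetical. The mixer itself runs on the Web Audio API in the browser; Python is used here only to show the arithmetic.

```python
import math


def pan_gains(pan):
    """Equal-power pan: pan in [-1, 1] -> (left_gain, right_gain)."""
    angle = (pan + 1.0) * math.pi / 4.0  # maps -1..1 onto 0..pi/2
    return math.cos(angle), math.sin(angle)


def peak_dbfs(samples):
    """Peak level of a block of float samples in dBFS (-inf for silence)."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


left, right = pan_gains(0.0)
print(round(left, 4), round(right, 4))  # both channels equal at center
print(peak_dbfs([0.25, -0.5, 0.1]))     # peak of 0.5, roughly -6 dBFS
```

With an equal-power law the squared gains always sum to 1, so perceived loudness stays roughly constant as a track is panned across the stereo field.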
Built-in music theory module supporting 18 keys and scales (major, minor, harmonic minor, Japanese scale, etc.), 21 chord types, 20+ chord progressions, 7 cadence types, and 4 typical structure templates. During composition, the AI Agent automatically selects an appropriate key, mode, and chord progression based on style.
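To make the idea of keys, chords, and progressions concrete, here is a small sketch that builds diatonic triads in a major key by stacking thirds on scale degrees. The function names are hypothetical and this is not MusIDE's theory module, only an illustration of the kind of computation such a module performs.

```python
MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]  # semitone offsets from the tonic


def diatonic_triad(tonic_midi, degree, scale=MAJOR_SCALE):
    """Stack thirds on a 1-based scale degree and return MIDI note numbers."""
    notes = []
    for step in (0, 2, 4):  # root, third, fifth as scale steps
        octave, pos = divmod(degree - 1 + step, len(scale))
        notes.append(tonic_midi + 12 * octave + scale[pos])
    return notes


def progression(tonic_midi, degrees):
    """Triads for a list of scale degrees, e.g. [1, 5, 6, 4] for I-V-vi-IV."""
    return [diatonic_triad(tonic_midi, d) for d in degrees]


# I-V-vi-IV in C major (tonic C4 = MIDI 60).
print(progression(60, [1, 5, 6, 4]))
# → [[60, 64, 67], [67, 71, 74], [69, 72, 76], [65, 69, 72]]
```

Passing a different `scale` list (for example the natural minor offsets) yields the triads of that scale without changing the stacking logic.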
AI source separation via Demucs with customizable stems (vocals/drums/bass/other). Whisper-based speech recognition for audio-to-text transcription. Processing runs asynchronously on a background thread with real-time progress updates.
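The background-thread pattern with progress reporting can be sketched with the standard library alone. `JOBS`, `start_job`, and `job_state` are hypothetical names, and the `work` callback stands in for a long-running step such as separation or transcription; how MusIDE actually tracks jobs is not described in this README.

```python
import threading
import uuid

JOBS = {}  # job_id -> {"progress": float, "status": str}
_LOCK = threading.Lock()


def start_job(work, steps):
    """Run work(i) for each step on a background thread; return (job_id, thread)."""
    job_id = uuid.uuid4().hex
    with _LOCK:
        JOBS[job_id] = {"progress": 0.0, "status": "running"}

    def runner():
        try:
            for i in range(steps):
                work(i)
                with _LOCK:
                    JOBS[job_id]["progress"] = (i + 1) / steps
            status = "done"
        except Exception as exc:  # report failures instead of losing them
            status = f"error: {exc}"
        with _LOCK:
            JOBS[job_id]["status"] = status

    thread = threading.Thread(target=runner, daemon=True)
    thread.start()
    return job_id, thread


def job_state(job_id):
    """Snapshot of a job's state, safe to serialize for a progress endpoint."""
    with _LOCK:
        return dict(JOBS[job_id])


job_id, thread = start_job(lambda i: None, steps=4)
thread.join()
print(job_state(job_id))
```

A web route could return `job_state(job_id)` on each poll, which is one simple way to drive a progress bar in the UI.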
Full file manager: create/upload/download/delete with auto audio file detection. Complete Git integration (status/log/branch/stage/commit/push/pull/checkout/clone/diff/stash). Built-in terminal, project settings, search & replace, mobile-optimized responsive UI.
One-command install on any platform. Default address: http://localhost:12346
One codebase, runs everywhere Python runs.
A modular Flask service with rich route blueprints and a Catppuccin-themed web UI.