Acestep Songwriting

by dumoedss

ACE-Step Songwriting Guide

Professional music creation knowledge for writing captions, lyrics, and choosing music parameters for ACE-Step.

Output Format

After using this guide, produce three things for the acestep skill:

  1. Caption (-c): Style/genre/instruments/emotion description
  2. Lyrics (-l): Complete structured lyrics with tags
  3. Parameters: --duration, --bpm, --key, --time-signature, --language
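
As a rough illustration, the three outputs could be assembled into one argument list as sketched below. The flag names come from the list above; the command name `acestep` and the helper `build_args` are hypothetical placeholders, not a confirmed interface of the skill.

```python
import shlex

def build_args(caption, lyrics, **params):
    """Assemble an argument list from caption, lyrics, and optional parameters."""
    # "acestep" is a placeholder command name (assumption).
    args = ["acestep", "-c", caption, "-l", lyrics]
    # Flags listed in this guide, keyed by Python-friendly names.
    flags = {
        "duration": "--duration",
        "bpm": "--bpm",
        "key": "--key",
        "time_signature": "--time-signature",
        "language": "--language",
    }
    for name, flag in flags.items():
        # Omit anything left unset so it can be inferred automatically.
        if params.get(name) is not None:
            args += [flag, str(params[name])]
    return args

args = build_args(
    "sad piano ballad with female breathy vocal",
    "[Verse]\nWalking through the streets alone",
    bpm=72,
    duration=150,
)
print(shlex.join(args))  # shell-quoted form, handy for logging
```

Building a list rather than a single string keeps multi-line lyrics intact without manual quoting.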

Caption: The Most Important Input

Caption is the most important factor affecting generated music.

The caption supports multiple formats: simple style words, comma-separated tags, or complex natural-language descriptions.

Common Dimensions

| Dimension | Examples |
|-----------|----------|
| Style/Genre | pop, rock, jazz, electronic, hip-hop, R&B, folk, classical, lo-fi, synthwave |
| Emotion/Atmosphere | melancholic, uplifting, energetic, dreamy, dark, nostalgic, euphoric, intimate |
| Instruments | acoustic guitar, piano, synth pads, 808 drums, strings, brass, electric bass |
| Timbre Texture | warm, bright, crisp, muddy, airy, punchy, lush, raw, polished |
| Era Reference | 80s synth-pop, 90s grunge, 2010s EDM, vintage soul, modern trap |
| Production Style | lo-fi, high-fidelity, live recording, studio-polished, bedroom pop |
| Vocal Characteristics | female vocal, male vocal, breathy, powerful, falsetto, raspy, choir |
| Speed/Rhythm | slow tempo, mid-tempo, fast-paced, groovy, driving, laid-back |
| Structure Hints | building intro, catchy chorus, dramatic bridge, fade-out ending |

Caption Writing Principles

  1. Specific beats vague — "sad piano ballad with female breathy vocal" > "a sad song"
  2. Combine multiple dimensions — style+emotion+instruments+timbre anchors direction precisely
  3. Use references well — "in the style of 80s synthwave" conveys complex aesthetic quickly
  4. Texture words are useful — warm, crisp, airy, punchy influence mixing and timbre
  5. Don't pursue perfection — Caption is a starting point, iterate based on results
  6. Granularity determines freedom — Less detail = more model creativity; more detail = more control
  7. Avoid conflicting words — "classical strings" + "hardcore metal" degrades output
     - Fix: Repetition reinforcement — Repeat the elements you want more of
     - Fix: Conflict to evolution — "Start with soft strings, middle becomes metal rock, end turns to hip-hop"
  8. Don't put BPM/key/tempo in Caption — Use dedicated parameters instead
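
Two of these principles lend themselves to a small helper: combining several dimensions into one caption, and keeping numeric tempo or key values out of it. The sketch below is illustrative only. `build_caption` and `metadata_leaks` are hypothetical names, and the regular expressions are simple heuristics that only catch patterns such as "120 BPM" or "C Major"; descriptive words like "slow tempo" are left alone.

```python
import re

# Heuristic patterns for values that belong in dedicated parameters.
METADATA_PATTERNS = [
    re.compile(r"\b\d{2,3}\s*bpm\b", re.IGNORECASE),
    # Note letter is case-sensitive so "a major label" is not flagged.
    re.compile(r"\b[A-G][#b]?\s+(?i:major|minor)\b"),
]

def build_caption(**dimensions):
    """Join non-empty dimension values into a comma-separated caption."""
    return ", ".join(value for value in dimensions.values() if value)

def metadata_leaks(caption):
    """Return substrings that look like BPM or key values."""
    return [m.group(0) for p in METADATA_PATTERNS for m in p.finditer(caption)]

caption = build_caption(
    style="synthwave",
    emotion="nostalgic",
    instruments="synth pads, 808 drums",
)
print(caption)
print(metadata_leaks("pop, 120 BPM, C Major"))
```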

Lyrics: The Temporal Script

Lyrics controls how music unfolds over time. It carries:

  • Lyric text itself
  • Structure tags ([Verse], [Chorus], [Bridge]...)
  • Vocal style hints ([raspy vocal], [whispered]...)
  • Instrumental sections ([guitar solo], [drum break]...)
  • Energy changes ([building energy], [explosive drop]...)

Structure Tags

| Category | Tag | Description |
|----------|-----|-------------|
| Basic Structure | [Intro] | Opening, establish atmosphere |
| | [Verse] / [Verse 1] | Verse, narrative progression |
| | [Pre-Chorus] | Pre-chorus, build energy |
| | [Chorus] | Chorus, emotional climax |
| | [Bridge] | Bridge, transition or elevation |
| | [Outro] | Ending, conclusion |
| Dynamic Sections | [Build] | Energy gradually rising |
| | [Drop] | Electronic music energy release |
| | [Breakdown] | Reduced instrumentation, space |
| Instrumental | [Instrumental] | Pure instrumental, no vocals |
| | [Guitar Solo] | Guitar solo |
| | [Piano Interlude] | Piano interlude |
| Special | [Fade Out] | Fade out ending |
| | [Silence] | Silence |
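
To see how tagged lyrics break down into sections, a minimal parser can help when reviewing a draft. This is a sketch under the assumption that each tag sits alone on its own line; `split_sections` is a hypothetical helper, and the "untagged" label for text before the first tag is an illustrative choice.

```python
import re

# A structure tag is a line consisting only of [text].
TAG_LINE = re.compile(r"^\[([^\]]+)\]\s*$")

def split_sections(lyrics):
    """Split lyrics into (tag, lines) pairs, one per structure tag."""
    sections = []
    for raw in lyrics.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate sections
        match = TAG_LINE.match(line)
        if match:
            sections.append((match.group(1), []))
        elif sections:
            sections[-1][1].append(line)
        else:
            # Text before the first tag gets a placeholder label.
            sections.append(("untagged", [line]))
    return sections

lyrics = """[Intro]

[Verse 1]
Walking through the streets alone
Counting every light

[Chorus]
WE ARE THE CHAMPIONS!

[Guitar Solo]

[Outro]"""

sections = split_sections(lyrics)
for tag, lines in sections:
    print(tag, len(lines))
```

Sections with zero lines (here the intro, solo, and outro) are the instrumental passages.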

Combining Tags

Use a hyphen (-) inside a tag for finer control, but keep it concise:

✅ [Chorus - anthemic]
❌ [Chorus - anthemic - stacked harmonies - high energy - powerful - epic]

Put complex style descriptions in Caption, not in tags.
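
A quick check for overloaded tags might look like the sketch below. The limit of one modifier per tag is an assumption drawn from the two examples above, not a documented rule, and `overloaded_tags` is a hypothetical helper.

```python
import re

TAG_RE = re.compile(r"\[([^\]]+)\]")

def overloaded_tags(lyrics, max_modifiers=1):
    """Return tags carrying more than max_modifiers ' - ' separated modifiers."""
    flagged = []
    for body in TAG_RE.findall(lyrics):
        # Split on " - " so hyphenated names such as "Pre-Chorus" stay intact.
        parts = [part.strip() for part in body.split(" - ")]
        if len(parts) - 1 > max_modifiers:
            flagged.append(f"[{body}]")
    return flagged

draft = (
    "[Chorus - anthemic]\n"
    "We rise together\n"
    "[Chorus - anthemic - stacked harmonies - high energy]\n"
    "[Pre-Chorus]\n"
)
print(overloaded_tags(draft))
```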

Caption-Lyrics Consistency

Models are not good at resolving conflicts, so make sure these pairs agree:

  • Instruments in Caption ↔ Instrumental section tags in Lyrics
  • Emotion in Caption ↔ Energy tags in Lyrics
  • Vocal description in Caption ↔ Vocal control tags in Lyrics
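
The first item on this checklist can be partly automated. The sketch below flags instruments that appear in lyric tags but not in the caption. The instrument vocabulary is a small illustrative list and `instrument_mismatches` is a hypothetical helper; plain substring matching will miss synonyms.

```python
import re

# Small illustrative vocabulary (assumption); extend it for real use.
INSTRUMENTS = ("guitar", "piano", "drum", "strings", "brass", "synth", "bass")

def instrument_mismatches(caption, lyrics):
    """Return instruments named in lyric tags but absent from the caption."""
    caption_lower = caption.lower()
    tags = re.findall(r"\[([^\]]+)\]", lyrics.lower())
    missing = []
    for name in INSTRUMENTS:
        in_tags = any(name in tag for tag in tags)
        if in_tags and name not in caption_lower:
            missing.append(name)
    return missing

caption = "sad piano ballad with female breathy vocal"
lyrics = "[Intro]\n[Verse]\nA line of text\n[Guitar Solo]\n[Piano Interlude]"
print(instrument_mismatches(caption, lyrics))
```

A non-empty result suggests either adding the instrument to the caption or dropping the tag.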

Vocal Control Tags

| Tag | Effect |
|-----|--------|
| [raspy vocal] | Raspy, textured vocals |
| [whispered] | Whispered |
| [falsetto] | Falsetto |
| [powerful belting] | Powerful, high-pitched singing |
| [spoken word] | Rap/recitation |
| [harmonies] | Layered harmonies |
| [call and response] | Call and response |
| [ad-lib] | Improvised embellishments |

Energy and Emotion Tags

| Tag | Effect |
|-----|--------|
| [high energy] | High energy, passionate |
| [low energy] | Low energy, restrained |
| [building energy] | Increasing energy |
| [explosive] | Explosive energy |
| [melancholic] | Melancholic |
| [euphoric] | Euphoric |
| [dreamy] | Dreamy |
| [aggressive] | Aggressive |

Lyric Writing Tips

  1. Aim for 6-10 syllables per line: the model aligns syllables to beats, so keep counts similar (within 1-2) for lines in the same position
  2. Uppercase = stronger intensity: WE ARE THE CHAMPIONS! (shouting) vs. walking through the streets (normal)
  3. Parentheses = background vocals: We rise together (together)
  4. Extend vowels: Feeeling so aliiive (use cautiously, effects are unstable)
  5. Clear section separation: leave blank lines between sections
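
The syllable guideline can be checked roughly in code. The sketch below estimates syllables by counting vowel groups, which is a crude English-only heuristic and not how the model itself counts; both function names are hypothetical. Treat the output as a prompt to reread a line, not as a verdict.

```python
import re

def estimate_syllables(line):
    """Roughly estimate syllables by counting vowel groups per word."""
    total = 0
    for word in re.findall(r"[a-z']+", line.lower()):
        groups = len(re.findall(r"[aeiouy]+", word))
        # Treat a trailing silent "e" as non-syllabic ("alone" counts as 2).
        if word.endswith("e") and not word.endswith("le") and groups > 1:
            groups -= 1
        total += max(groups, 1)
    return total

def lines_out_of_range(lyrics, low=6, high=10):
    """Return (line, count) pairs whose estimate falls outside low..high."""
    flagged = []
    for line in lyrics.splitlines():
        text = line.strip()
        # Skip blank lines and tags such as [Verse].
        if not text or text.startswith("["):
            continue
        count = estimate_syllables(text)
        if not low <= count <= high:
            flagged.append((text, count))
    return flagged

draft = "[Chorus]\nWe rise\n\nWalking through the streets alone tonight"
print(lines_out_of_range(draft))
```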

Avoiding "AI-flavored" Lyrics

| Red Flag | Description |
|----------|-------------|
| Adjective stacking | "neon skies, electric hearts, endless dreams" — vague imagery filler |
| Rhyme chaos | Inconsistent patterns or forced rhymes breaking meaning |
| Blurred boundaries | Lyric content crosses structure tags |
| No breathing room | Lines too long to sing in one breath |
| Mixed metaphors | Water → fire → flying — listeners can't anchor |

Metaphor discipline: One core metaphor per song, explore its multiple aspects.

Music Metadata

Most of the time, let the LM infer these values automatically. Set them manually only when you have a clear requirement.

| Parameter | Range | Description |
|-----------|-------|-------------|
| bpm | 30–300 | Slow 60–80, mid 90–120, fast 130–180 |
| keyscale | Key | e.g. C Major, Am. Common keys (C, G, D, Am, Em) most stable |
| timesignature | Time sig | 4/4 (most common), 3/4 (waltz), 6/8 (swing) |
| vocal_language | Language | Usually auto-detected from lyrics |
| duration | Seconds | See duration calculation below |

When to Set Manually

| Scenario | Set |
|----------|-----|
| Daily generation | Let LM auto-infer |
| Clear tempo requirement | bpm |
| Specific style (waltz) | timesignature=3/4 |
| Match other material | bpm + duration |
| Specific key color | keyscale |
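
One way to follow this "set only what you need" approach is sketched below. It validates BPM against the 30-300 range and keeps only the fields that were set explicitly. The parameter names mirror the metadata table; `check_bpm` and `manual_metadata` are hypothetical helpers, and the "unlabeled" result simply marks values that fall between the guide's named tempo bands.

```python
def check_bpm(bpm):
    """Validate BPM against 30-300 and label it with the guide's tempo bands."""
    if not 30 <= bpm <= 300:
        raise ValueError(f"bpm {bpm} is outside 30-300")
    if 60 <= bpm <= 80:
        return "slow"
    if 90 <= bpm <= 120:
        return "mid"
    if 130 <= bpm <= 180:
        return "fast"
    return "unlabeled"  # between or beyond the named bands

def manual_metadata(bpm=None, keyscale=None, timesignature=None, duration=None):
    """Keep only explicitly set fields so the rest can be auto-inferred."""
    if bpm is not None:
        check_bpm(bpm)
    fields = {
        "bpm": bpm,
        "keyscale": keyscale,
        "timesignature": timesignature,
        "duration": duration,
    }
    return {name: value for name, value in fields.items() if value is not None}

print(manual_metadata(timesignature="3/4"))    # waltz: set only the meter
print(manual_metadata(bpm=100, duration=180))  # matching other material
```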

Duration Calculation

Estimation Method

  • Intro/Outro: 5-10 seconds each
  • Instrumental sections: 5-15 seconds each
  • Typical structures:
    - 2 verses + 2 choruses: 120-150s minimum
    - 2 verses + 2 choruses + bridge: 180-240s minimum
    - Full song with intro/outro: 210-270s (3.5-4.5 min)

BPM and Duration Relationship

  • Slower BPM (60-80): Need MORE duration for same lyrics
  • Medium BPM (100-130): Standard duration
  • Faster BPM (150-180): Can fit more lyrics, but still need breathing room

Rule of thumb: When in doubt, estimate longer. A song too short feels rushed.
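
The BPM relationship follows from simple arithmetic: a section lasts bars × beats per bar × 60 / BPM seconds. The sketch below applies that formula to a hypothetical layout; the bar counts are illustrative assumptions, not values from ACE-Step.

```python
def section_seconds(bars, bpm, beats_per_bar=4):
    """Length of a section in seconds: bars * beats per bar * (60 / bpm)."""
    return bars * beats_per_bar * 60.0 / bpm

def estimate_duration(sections, bpm, beats_per_bar=4):
    """Sum section lengths; sections is a list of (name, bars) pairs."""
    return sum(section_seconds(bars, bpm, beats_per_bar) for _, bars in sections)

# Hypothetical layout: bar counts are illustrative assumptions.
layout = [
    ("Intro", 4),
    ("Verse 1", 16),
    ("Chorus", 16),
    ("Verse 2", 16),
    ("Chorus", 16),
    ("Outro", 4),
]

print(round(estimate_duration(layout, bpm=120)))  # 144
print(round(estimate_duration(layout, bpm=72)))   # 240
```

The same 72 bars take 144 seconds at 120 BPM but 240 seconds at 72 BPM, which is why slower songs need a longer duration for the same lyrics.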

Note: Keep Lyrics tags consistent with the Caption. For example, tags such as piano, powerful, and whispered match a Caption of "piano ballad, building to powerful chorus, intimate".