better papagayo lipsync

Issue #507 resolved
Alessandro Padovani created an issue

blender 2.92, commit 59c302c

This is related to #504. Automatic lipsync is a fast way to get our figures to talk, but the result is often "robotic". This is because lipsync translates every single phoneme into a complete mouth action for that phoneme, while we as humans tend to "relax" a lot between mouth poses.

The following "rules" are what I extrapolated while editing a lipsync to get a more human behaviour. We may add a tool to "relax" a lipsync track by following these rules. They're simple but effective. Essentially they remove most "etc" poses so that the other phonemes blend into each other. The principle is that phonemes tend to maintain their shape in human speech. Later we may find better rules for relaxing, but I believe this is a good start.

  1. only keep the first initial "rest" pose, remove all the "rest" and "etc" poses until the first phoneme
  2. keep an "etc" pose only between two same open vowels "AI E O" and delete all other "etc". ex. "AI etc AI" is good, while "AI etc O" becomes "AI O" and "AI etc FV" becomes "AI FV"
  3. use 50% of next shape for "etc", ex. "AI etc AI" becomes "AI AI50% AI"
  4. set open vowels "AI E O" to 50% between consonants "FV MBP W", ex. "FV AI50% FV"
  5. only keep the first final "rest" pose, remove all the "rest" and "etc" poses after the last phoneme
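
The five rules can be sketched in a few lines of Python. This is only an illustration of the rules as written above, not the add-on's code. The function name `relax`, the `(frame, phoneme)` track layout and the strength value (1.0 = 100%) are all assumptions made for the example. Two more assumptions: a "rest" in the middle of the speech is kept as is, since the rules don't mention it, and "WQ" is accepted next to "W" because that is how papagayo spells it.

```python
OPEN_VOWELS = {"AI", "E", "O"}
CONSONANTS = {"FV", "MBP", "W", "WQ"}  # papagayo writes "W" as "WQ"
FILLERS = {"rest", "etc"}


def relax(track):
    """Apply rules 1-5 to a list of (frame, phoneme) pairs.

    Returns (frame, phoneme, strength) triples, where 1.0 means 100%.
    """
    spoken = [i for i, (_, ph) in enumerate(track) if ph not in FILLERS]
    if not spoken:
        # Nothing is spoken: keep at most one rest pose.
        return [(f, ph, 1.0) for f, ph in track if ph == "rest"][:1]
    first, last = spoken[0], spoken[-1]

    # Rules 1 and 5: keep only the first "rest" before and after the speech.
    head = [(f, ph, 1.0) for f, ph in track[:first] if ph == "rest"][:1]
    tail = [(f, ph, 1.0) for f, ph in track[last + 1:] if ph == "rest"][:1]

    # Rules 2 and 3: an "etc" survives only between two equal open vowels,
    # and then it becomes 50% of the next shape.
    body = track[first:last + 1]
    kept = []
    for i, (frame, ph) in enumerate(body):
        if ph != "etc":
            kept.append((frame, ph, 1.0))
            continue
        prev_ph, next_ph = body[i - 1][1], body[i + 1][1]
        if prev_ph == next_ph and next_ph in OPEN_VOWELS:
            kept.append((frame, next_ph, 0.5))

    # Rule 4: an open vowel between two consonants is set to 50%.
    relaxed = []
    for i, (frame, ph, strength) in enumerate(kept):
        between = (0 < i < len(kept) - 1
                   and kept[i - 1][1] in CONSONANTS
                   and kept[i + 1][1] in CONSONANTS)
        if ph in OPEN_VOWELS and between:
            strength = 0.5
        relaxed.append((frame, ph, strength))
    return head + relaxed + tail
```

For example, `relax([(1, "rest"), (4, "AI"), (6, "etc"), (9, "AI"), (10, "etc"), (20, "O")])` keeps the "etc" at frame 6 as a 50% "AI" and drops the one at frame 10.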

Also, an option to "emphasize" the open vowels "AI E O" may be effective to control the speech. For example we may set a 150% or 200% factor to emphasize the speech, or 75% or 50% to diminish it. Please note that "etc" must be affected too, being the middle pose: with rule 3, 150% emphasis turns "AI etc AI" into "AI150% AI75% AI150%".
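
As a rough sketch, emphasis is just a multiplication on the open vowels, applied after relaxing so that the 50% "etc" poses scale too. The name `emphasize` and the `(frame, phoneme, strength)` triples are assumptions made for this example, with 1.0 meaning 100%.

```python
OPEN_VOWELS = {"AI", "E", "O"}


def emphasize(entries, factor):
    """Scale the strength of open vowels by `factor`; leave the rest alone.

    `entries` is a list of (frame, phoneme, strength) triples.
    """
    return [
        (frame, ph, strength * factor if ph in OPEN_VOWELS else strength)
        for frame, ph, strength in entries
    ]
```

With `factor=1.5`, the relaxed "AI AI50% AI" becomes "AI150% AI75% AI150%", while a consonant such as "MBP" stays at 100%.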

Attached is an example with a dat file from lipsync-o-tizer, that is an automatic lipsync derived from papagayo. First the automatic lipsync papagayo-test.mp4, then the relaxed version with a 150% emphasis papagayo-fix.mp4. We see that the relaxed version is much more human-like, while in the automatic version there are "robotic glitches".

https://morevnaproject.org/papagayo-ng/
https://www.autolipsync-o-tizer.com/

Comments (13)

  1. Alessandro Padovani reporter

    As a side note, to allow “emphasize” we need to unset the slider limits in the global settings. Also it would be useful to implement #480 for general animation.

  2. Alessandro Padovani reporter

    Commit 66b1ada doesn’t work fine. It seems to remove all etc poses, which is not ideal, since this way two successive phonemes that are the same don’t get spoken separately, ex. “AI etc AI etc AI” in papagayo becomes “AA AA AA” in daz. Thomas, feel free to improve the rules if you find a better way, but just removing etc doesn’t work.

    Also it would be better to have “relax” as an option so the user can load the original papagayo animation if he wants to.

    I’m attaching papagayo-fix.blend for reference. That’s the daz animation for papagayo-fix.mp4, relaxed with 150% emphasis, how it should be by following the relax rules above.

  3. Alessandro Padovani reporter

    update. If possible it would be nice to improve rule 4. That is, the idea is that the mouth doesn’t have time to stretch to full emphasis when the vowel is between two consonants. But this is only true if the consonants are close enough. And more precisely it’s only the second consonant that matters, that’s the one closing the mouth after the vowel. So we may consider this factor.

    This adds complexity because we have to take into account the distance between phonemes too. So if it is too complex then the original rule 4 may be fine enough.

    4 bis. set open vowels "AI E O" to 50% if followed by a consonant "FV MBP W" in a 3 frames range, ex. "AI50% FV"
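
A possible sketch of rule 4 bis in Python, for discussion only. It assumes the track has already been cleaned of "etc" poses and is stored as `(frame, phoneme, strength)` triples with 1.0 = 100%. The name `apply_rule_4bis` and the `max_gap` parameter are made up for the example, and "3 frames range" is read as a distance of 3 frames or less.

```python
OPEN_VOWELS = {"AI", "E", "O"}
CONSONANTS = {"FV", "MBP", "W", "WQ"}  # papagayo writes "W" as "WQ"


def apply_rule_4bis(entries, max_gap=3):
    """Halve an open vowel when a consonant follows within `max_gap` frames."""
    result = []
    for i, (frame, ph, strength) in enumerate(entries):
        if ph in OPEN_VOWELS and i + 1 < len(entries):
            next_frame, next_ph, _ = entries[i + 1]
            if next_ph in CONSONANTS and next_frame - frame <= max_gap:
                strength *= 0.5
        result.append((frame, ph, strength))
    return result
```

Only the following phoneme is checked, so "FV AI O" leaves the "AI" at full strength, and so does "AI FV" when the two are 4 or more frames apart.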

  4. Alessandro Padovani reporter

    Since the test speech is quite short I’m posting it here together with the relaxed conversion. So it may be easier to check it for improvements or errors and discuss it in general.

    ORIGINAL PAPAGAYO
    01 rest
    02 rest
    02 etc
    04 AI
    06 etc
    09 AI
    10 etc
    14 AI
    16 etc
    20 E
    24 etc
    35 MBP
    37 E
    40 etc
    45 FV
    47 O
    48 etc
    49 MBP
    52 AI
    54 MBP
    57 AI
    58 etc
    61 O
    68 U
    69 etc
    72 U
    73 etc
    77 E
    81 AI
    83 FV
    86 AI
    88 WQ
    91 E
    93 etc
    100 FV
    103 AI
    108 etc
    111 AI
    113 etc
    115 AI
    119 FV
    123 WQ
    126 AI
    127 etc
    142 U
    149 rest
    151 rest
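
To experiment with the rules outside blender, a listing like the one above can be read into `(frame, phoneme)` pairs with a few lines of Python. This is a sketch with made-up names (`parse_track`, `LINE`). It skips any line that is not "frame phoneme", which should also cover the `MohoSwitch1` header line that, as far as I know, papagayo writes at the top of its dat files.

```python
import re

# One entry per line: a frame number, whitespace, then the phoneme name.
LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s*$")


def parse_track(text):
    """Return the (frame, phoneme) pairs found in `text`, in file order."""
    track = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            track.append((int(match.group(1)), match.group(2)))
    return track
```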
    

    BLENDER RELAXED WITH 150% EMPHASIS (RULE 4 BIS)
    01 rest
    04 AA150
    06 AA75
    09 AA150
    10 AA75
    14 AA150
    20 EH150
    35 M
    37 EH150
    45 F
    47 OW75
    49 M
    52 AA75
    54 M
    57 AA150
    61 OW150
    68 UW
    72 UW
    77 EH150
    81 AA75
    83 F
    86 AA75
    88 W
    91 EH150
    100 F
    103 AA150
    108 AA75
    111 AA150
    113 AA75
    115 AA150
    119 F
    123 W
    126 AA150
    142 UW
    149 rest
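
The notation used in the relaxed listing can be produced with a small helper. This is only a sketch: the name mapping is read off the two listings above (AI becomes AA, E becomes EH, and so on), the names `DAZ_NAMES` and `format_entry` are invented for the example, and a phoneme that doesn't appear in the listings is passed through unchanged.

```python
# Papagayo phoneme -> name used in the relaxed listing above.
DAZ_NAMES = {
    "AI": "AA", "E": "EH", "O": "OW", "U": "UW",
    "MBP": "M", "FV": "F", "WQ": "W",
}


def format_entry(frame, phoneme, strength=1.0):
    """Format one pose like the listing, e.g. "04 AA150" or "35 M"."""
    name = DAZ_NAMES.get(phoneme, phoneme)
    percent = round(strength * 100)
    # The percentage is written only when it differs from 100%.
    suffix = "" if percent == 100 else str(percent)
    return f"{frame:02d} {name}{suffix}"
```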
    

  5. Thomas Larsson repo owner

    OK, I think I got it now. The latest commit should work as desired. There is also an option to update the limits of open vowels, so you don’t have to do that manually.

    The strength of an open vowel followed by a silent vowel (vowel? F and M are not vowels) is reduced to half if the time distance is 3 frames or less. One could perhaps replace this with a sliding scale depending on the distance, so the factor would be 25% for 1 frame, 50% for 2 frames, and 75% for 3 frames.
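
The sliding scale could look like this in Python. It is a sketch of the idea only: `sliding_factor` and `full_gap` are hypothetical names, and treating a distance below 1 frame like 1 frame is an assumption.

```python
def sliding_factor(gap, full_gap=4):
    """Strength factor for an open vowel `gap` frames before a consonant.

    Gives 0.25, 0.5 and 0.75 for 1, 2 and 3 frames, and 1.0 from
    `full_gap` frames onwards.
    """
    if gap >= full_gap:
        return 1.0
    return max(gap, 1) / full_gap
```

The factor would then replace the fixed 50%, so with 150% emphasis a vowel 1 frame before a consonant would end at 37.5% instead of 75%.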

  6. Alessandro Padovani reporter

    Commit df754fe seems to work fine here. Though I can’t understand the difference with “update limits”, it seems the same with and without, at least with the provided test speech.

    As for “silent vowel” I didn’t know how to name a “silent” phoneme so I invented it. But “consonant” is probably better, though it refers to a letter rather than a phoneme. So I edited the comments above with “consonant”.

    edit. Thomas I agree on the “sliding scale” concept and there are a lot of things that can probably be improved. I just didn’t want to add too much complexity as a first step, but if you feel confident about improving the relax rules please do it.

    I believe it is important to load the relaxed version as an option, so the user can load the original papagayo track if he wants to. This is for various reasons. First, this is experimental: I’m not an expert at all and it’s not extensively tested, so relaxing may not work fine in some cases. Then, in papagayo you can edit the phonemes, so if the user already edited the track to his wish he may want to load the original, not relaxed, track in blender. Finally, some advanced users may simply want to edit the original papagayo track themselves in blender.

    So Thomas, may we please have “relax” as an option? Or if not, please let me know why.

  7. Thomas Larsson repo owner

    Relaxing is made optional in the last commit. The emphasis and update limits options are only displayed when relaxing is active, because they don’t affect the raw papagayo data.

    Update limits is useful if you load the file with emphasis > 1, because by default the visemes are limited between 0 and 1. The picture shows the difference at frame 4 when the test animation was loaded with emphasis = 1.5. Of course, once the limits have been changed you can load new animations and not run into limits.
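
The effect of the limits can be illustrated with a plain clamp. This is not the blender API, just a sketch of what a limited slider does to an emphasized value; `slider_value` and its `limits` parameter are names invented for the example.

```python
def slider_value(strength, limits=(0.0, 1.0)):
    """Return `strength` clamped to the slider limits (default 0 to 1)."""
    low, high = limits
    return min(max(strength, low), high)
```

With the default limits, an "AA" loaded at 150% ends up as `slider_value(1.5) == 1.0`, so the emphasis is lost. After widening the limits, `slider_value(1.5, (0.0, 1.5))` keeps the full 1.5.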

  8. Alessandro Padovani reporter

    Commit 8c9d7cd works fantastic. Thank you Thomas for the fast fix.

    As for the limits, that’s commented in my second post: I keep the slider limits off in the global settings to allow for emphasis. That’s why I didn’t see any difference. A better popup would be “Update slider limits to allow for emphasis.” I myself didn’t understand what it’s for with the current popup.

    Then it would be great if you could implement #480 as requested in the second post, that is, to allow a general “emphasis” or “plussing” for animation.

  9. bouich jules

    wow thank you so much guys for working on the lips sync!

    i have already noticed some HUGE changes on the last build AMAZING!

    thank you

  10. Alessandro Padovani reporter

    You’re welcome. Luckily Thomas seems interested in new features for animation so we get them implemented. I can’t really do anything good myself without him.
