RoPECraft – Supplementary Material

Phase

Flow matching optimization yields strong results but sometimes causes duplicated subjects during motion or orientation changes. To mitigate this, we analyze the signals in the frequency domain. By leveraging the fact that spatial displacements cause phase shifts in the Fourier domain, we introduce a phase constraint to improve the spatiotemporal consistency of motion transfer.
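The Fourier property mentioned above can be checked on a toy 1D signal. The following is a minimal sketch, not the RoPECraft implementation: the helper names (`dft`, `circular_shift`, `estimate_shift`) and the Gaussian test signal are illustrative assumptions. It shows that a circular displacement leaves the Fourier magnitudes unchanged and appears only as a phase shift, from which the displacement can be read back.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(N^2)), sufficient for a small demo."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_shift(x, d):
    """Shift the signal right by d samples with wrap-around."""
    n = len(x)
    return [x[(t - d) % n] for t in range(n)]

def estimate_shift(x, y, k=1):
    """Estimate the displacement of y relative to x from the phase at bin k.

    Shift theorem: Y[k] = X[k] * exp(-2j*pi*k*d/N), so the phase
    difference at bin k equals -2*pi*k*d/N (modulo 2*pi).
    """
    n = len(x)
    X, Y = dft(x), dft(y)
    phase_diff = cmath.phase(Y[k] * X[k].conjugate())
    return -phase_diff * n / (2 * math.pi * k)

# Hypothetical test signal: a Gaussian bump, displaced by 5 samples.
signal = [math.exp(-0.5 * ((t - 8) / 2.0) ** 2) for t in range(32)]
moved = circular_shift(signal, 5)

# Magnitudes agree up to floating-point error; only the phase changes.
mag_gap = max(abs(abs(a) - abs(b)) for a, b in zip(dft(signal), dft(moved)))
print(f"max magnitude difference: {mag_gap:.2e}")
print(f"shift recovered from phase: {estimate_shift(signal, moved):.3f}")
```

The estimate from a single bin k is unambiguous only while the displacement stays below N/(2k) samples, because the phase wraps around at ±π.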

Phase vs Magnitude

We ablate two regularizers, one on magnitude and one on phase. Using the phase constraint as the regularizer yields superior results.
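One possible intuition for this outcome is that a magnitude term cannot see where content is located, while a phase term can. The sketch below illustrates that on a toy 1D signal. The two distance functions are hypothetical stand-ins written for this illustration, not the regularizers used in RoPECraft, whose exact form is not given here.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(N^2)), sufficient for a small demo."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def magnitude_distance(x, y):
    """Mean absolute difference of Fourier magnitudes (hypothetical regularizer)."""
    X, Y = dft(x), dft(y)
    return sum(abs(abs(a) - abs(b)) for a, b in zip(X, Y)) / len(x)

def phase_distance(x, y, eps=1e-9):
    """Mean absolute wrapped phase difference (hypothetical regularizer).

    Bins whose magnitude is below eps are skipped because their phase
    is numerically meaningless.
    """
    X, Y = dft(x), dft(y)
    diffs = [abs(cmath.phase(b * a.conjugate()))
             for a, b in zip(X, Y) if abs(a) > eps and abs(b) > eps]
    return sum(diffs) / len(diffs)

# Hypothetical test signal: a Gaussian bump and a copy displaced by 4 samples.
n = 32
signal = [math.exp(-0.5 * ((t - 8) / 2.0) ** 2) for t in range(n)]
displaced = [signal[(t - 4) % n] for t in range(n)]

print(f"magnitude distance: {magnitude_distance(signal, displaced):.2e}")
print(f"phase distance:     {phase_distance(signal, displaced):.3f}")
```

In this toy setting the magnitude distance is zero up to rounding even though the content has moved, whereas the phase distance is clearly nonzero, so only the phase term would penalize a misplaced subject.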

Limitations

This section demonstrates current shortcomings and broader impacts. Hover over each clip to preview the effect. In the video, a boat in the background intermittently appears and disappears, likely due to limitations in the backbone network.

Comparisons

The following results compare RoPECraft with other methods across a wide range of reference motion videos. Hover over each clip to preview the generated motion transfer in action. In each video, the original video is in the top row, labeled "DAVIS", and the generated videos are in the other two rows, each labeled with the name of its method. The prompt used for each video is provided at the bottom.

CogVideoX Results

The following results showcase the generalization of RoPECraft to other models. The first video is from the DAVIS dataset, the second is from RoPECraft built on CogVideoX, the third is from Go With The Flow, and the last is from DiTFlow. The prompt used for each video is provided at the bottom. Hover over each clip to preview the generated motion transfer in action.

As the videos show, Go With The Flow struggles with object shapes and multiple subjects because it is tied to the original video from which the noise was extracted. DiTFlow, on the other hand, fails to transfer motion in some scenarios. Our method transfers motion in all scenarios and generalizes to other models.

More Results

Supplementary videos showcase ∼30 more generations across complex trajectories. Hover over each clip to preview the generated motion transfer in action.