Deepfake
A deepfake is synthetic media (image, video, or audio) generated or manipulated by a deep learning model, typically an autoencoder or a generative adversarial network (GAN), to depict a person doing or saying something they never did. Local AI operators encounter deepfakes when they use tools such as Stable Diffusion or voice conversion models (e.g., RVC) to swap faces or voices. The term matters because deepfakes raise ethical and legal concerns, so operators need to understand both the misuse risks and the methods used to detect manipulated media.
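One family of detection methods inspects an image's frequency spectrum, because generator upsampling can leave unusual high-frequency patterns. The function below is a minimal, hypothetical sketch of one such feature: the share of spectral power above a radial cutoff, computed with NumPy's FFT. The function name and the cutoff of 0.5 are arbitrary assumptions, and a single number like this is not a detector on its own; in practice it would be compared against values from known-real images or fed to a trained classifier.

```python
import numpy as np


def high_freq_energy_ratio(gray: np.ndarray, cutoff: float = 0.5) -> float:
    """Share of 2-D spectral power beyond `cutoff` of the maximum radial frequency."""
    img = np.asarray(gray, dtype=np.float64)
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    h, w = img.shape
    yy, xx = np.indices((h, w))
    # Normalized distance from the zero-frequency bin, which fftshift moves to the center.
    radius = np.hypot((yy - h // 2) / (h / 2), (xx - w // 2) / (w / 2))
    total = power.sum()
    if total == 0:
        return 0.0
    return float(power[radius > cutoff].sum() / total)


# Usage on two synthetic 64x64 "images": a smooth bump and white noise.
rng = np.random.default_rng(0)
bump = np.sin(np.linspace(0, np.pi, 64))
smooth_ratio = high_freq_energy_ratio(np.outer(bump, bump))
noisy_ratio = high_freq_energy_ratio(rng.normal(size=(64, 64)))
print(f"smooth: {smooth_ratio:.4f}, noise: {noisy_ratio:.4f}")
```

White noise spreads its power evenly, so most of it lands outside the cutoff circle, while the smooth bump keeps nearly all of its power near the center.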
Deeper dive
Deepfakes emerged from academic research on autoencoders and generative adversarial networks (GANs). The classic face-swap pipeline trains one encoder shared by both identities plus a separate decoder for each identity; to swap, a frame of the target person is encoded with the shared encoder and decoded with the source person's decoder, producing the source face with the target's pose and expression. Newer approaches use diffusion models (e.g., Stable Diffusion with inpainting) or neural radiance fields for higher quality. On local hardware, face-swapping tools such as Roop or FaceFusion run on consumer GPUs (e.g., an RTX 3060) and can reach real-time speeds on 720p video. Audio deepfakes use text-to-speech models with voice cloning (e.g., Tortoise TTS) or voice conversion models (e.g., RVC) to reproduce a speaker's timbre. Detection looks for artifacts such as inconsistent blinking, lighting mismatches, or anomalies revealed by frequency analysis. Operators should watermark generated media and avoid non-consensual use.
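The shared-encoder, per-identity-decoder layout of the classic face-swap autoencoder can be sketched as a toy. The NumPy code below is fully linear, uses random 16-dimensional vectors as stand-ins for face images, and trains with hand-written gradient descent; it is not how Roop, FaceFusion, or any real tool is implemented, and every name and size is a hypothetical choice. It only shows the data flow: both identities train the same encoder, each identity trains its own decoder, and the swap encodes an identity-B sample and decodes it with identity A's decoder.

```python
import numpy as np

# Toy, fully linear face-swap autoencoder: ONE shared encoder, ONE decoder per identity.
# The "faces" are random 16-dim vectors, not images (assumption for brevity).
rng = np.random.default_rng(0)
DIM, LATENT, N = 16, 4, 200

# Give each identity its own 4-dim subspace of the 16-dim "face" space.
q, _ = np.linalg.qr(rng.normal(size=(DIM, 2 * LATENT)))
basis_a, basis_b = q[:, :LATENT].T, q[:, LATENT:].T
faces_a = rng.normal(size=(N, LATENT)) @ basis_a
faces_b = rng.normal(size=(N, LATENT)) @ basis_b

enc = rng.normal(scale=0.1, size=(DIM, LATENT))    # shared encoder
dec_a = rng.normal(scale=0.1, size=(LATENT, DIM))  # decoder for identity A
dec_b = rng.normal(scale=0.1, size=(LATENT, DIM))  # decoder for identity B


def total_loss() -> float:
    """Sum over identities of the mean squared reconstruction error per sample."""
    loss = 0.0
    for faces, dec in ((faces_a, dec_a), (faces_b, dec_b)):
        recon = faces @ enc @ dec
        loss += np.mean(np.sum((recon - faces) ** 2, axis=1))
    return float(loss)


initial_loss = total_loss()
lr = 0.05
for _ in range(3000):
    grad_enc = np.zeros_like(enc)
    for faces, dec in ((faces_a, dec_a), (faces_b, dec_b)):
        z = faces @ enc                      # encode with the shared encoder
        err = (z @ dec - faces) * (2.0 / N)  # gradient of the loss w.r.t. recon
        grad_enc += faces.T @ (err @ dec.T)  # encoder gets gradients from both identities
        dec -= lr * (z.T @ err)              # in-place: updates dec_a or dec_b
    enc -= lr * grad_enc
final_loss = total_loss()


def swap_b_to_a(face_b: np.ndarray) -> np.ndarray:
    """Encode an identity-B sample, then render it with identity A's decoder."""
    return face_b @ enc @ dec_a


swapped = swap_b_to_a(faces_b[:5])
print(f"loss {initial_loss:.3f} -> {final_loss:.3f}, swapped shape {swapped.shape}")
```

Because the toy identities occupy unrelated subspaces, the swapped output only illustrates the wiring. With real face data, both identities share structure such as pose and expression, which pushes the shared encoder toward a common latent space that makes the swap meaningful.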
Practical example
An operator runs FaceFusion on an RTX 4070 to swap a celebrity's face into a video clip. The tool loads a pre-trained InsightFace model (~200 MB) and processes 1080p frames at ~30 FPS, with VRAM usage peaking at ~4 GB. The operator must first obtain consent from the person whose face is used, because non-consensual deepfakes can be illegal in many jurisdictions.
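The deeper dive recommends watermarking generated media. As a minimal sketch of the idea, assuming each swapped frame is available as a NumPy uint8 array of shape (height, width, 3), the hypothetical helpers below hide a short ASCII tag in the least-significant bits of one color channel. They are not part of FaceFusion, and the tag text "SYNTHETIC" is an arbitrary example.

```python
import numpy as np

TAG = "SYNTHETIC"  # hypothetical marker text


def embed_tag(frame: np.ndarray, tag: str = TAG) -> np.ndarray:
    """Return a copy of `frame` with `tag` hidden in the lowest bit of channel 2."""
    bits = np.unpackbits(np.frombuffer(tag.encode("ascii"), dtype=np.uint8))
    out = np.ascontiguousarray(frame, dtype=np.uint8).copy()
    pixels = out.reshape(-1, out.shape[-1])  # view into `out`: one row per pixel
    if bits.size > pixels.shape[0]:
        raise ValueError("frame too small for tag")
    # Clear the lowest bit of channel 2 (blue in RGB, red in BGR), then write one tag bit per pixel.
    pixels[: bits.size, 2] = (pixels[: bits.size, 2] & 0xFE) | bits
    return out


def read_tag(frame: np.ndarray, length: int = len(TAG)) -> str:
    """Recover a `length`-character ASCII tag written by embed_tag."""
    pixels = frame.reshape(-1, frame.shape[-1])
    bits = pixels[: length * 8, 2] & 1
    return np.packbits(bits).tobytes().decode("ascii")


# Usage on a random stand-in for one 720p frame.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(720, 1280, 3), dtype=np.uint8)
marked = embed_tag(frame)
print(read_tag(marked))  # SYNTHETIC
```

Least-significant-bit marks are fragile: lossy video encoding such as H.264 will usually destroy them, so a real workflow would need a watermark designed to survive compression.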
Workflow example
In the RVC WebUI (Retrieval-based Voice Conversion), an operator loads a voice conversion model. They train or obtain a model of the target voice from recorded samples (e.g., a 30-second clip), then run inference to convert their own recorded speech into the target's voice. The workflow involves selecting the model file (e.g., rvc_v2.pth), adjusting pitch and other conversion settings, and exporting the result as a WAV file. The operator should check the output for audible artifacts and must not use it to impersonate anyone without permission.
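Assuming, as in common RVC front ends, that the pitch setting is a transpose measured in semitones, shifting by n semitones multiplies the fundamental frequency (F0) by 2^(n/12). The hypothetical helpers below show that arithmetic on an F0 contour and a 16-bit PCM WAV export using only NumPy and Python's standard `wave` module; they do not call RVC, and the 16 kHz sample rate and 220 Hz contour are arbitrary illustration values.

```python
import os
import tempfile
import wave

import numpy as np


def semitones_to_ratio(semitones: float) -> float:
    """Equal-tempered pitch ratio: +12 semitones doubles F0, -12 halves it."""
    return 2.0 ** (semitones / 12.0)


def shift_f0(f0_hz: np.ndarray, semitones: float) -> np.ndarray:
    """Scale a per-frame F0 contour; unvoiced frames (F0 == 0) stay at 0."""
    f0_hz = np.asarray(f0_hz, dtype=np.float64)
    return np.where(f0_hz > 0, f0_hz * semitones_to_ratio(semitones), 0.0)


def write_wav_mono16(path: str, samples: np.ndarray, sample_rate: int) -> None:
    """Write float samples in [-1, 1] as a 16-bit mono PCM WAV file."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype("<i2")  # little-endian int16
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())


# Usage: raise a hypothetical 220 Hz contour by one octave (expected: 0, 440, 440, 0 Hz).
shifted = shift_f0(np.array([0.0, 220.0, 220.0, 0.0]), 12)

# Export one second of a test tone at the shifted pitch.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = 0.3 * np.sin(2 * np.pi * shifted[1] * t)
path = os.path.join(tempfile.gettempdir(), "shifted_tone.wav")
write_wav_mono16(path, tone, sample_rate)
```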
Reviewed by Fredoline Eruo.