Updated for 2026

MakeThisVid generates video from a text prompt — describe a scene in natural language, and the AI renders a 6-second 720p or 8-second 1080p video with audio in 45–90 seconds. No image upload required; text is the only input. Plans from $19.99/mo; Starter Pack $2.99 for one 720p video.

AI Video Generator from Text: Type a Prompt, Get a Video

By MakeThisVid Editorial. Updated May 6, 2026.

A text prompt is the most expressive input for AI video. No product photo required. No reference image. Just words: describe the scene, the subject, the movement, the light, and the mood, and the AI renders it. MakeThisVid's text-to-video generation turns a sentence into a 6-second 720p or 8-second 1080p video clip with audio. You're not choosing from a template library or assembling existing stock footage. You're generating original video of a scene that doesn't exist yet, from nothing but a description.

The output is a clean 720p or 1080p MP4, watermark-free, with commercial use included, ready for ads, social content, or client delivery.

Text input works best when you're precise. The AI interprets your description literally, so the more specific the scene details, the more intentional the output. This page walks through how to write prompts that produce usable results.

How to Use MakeThisVid

From prompt to downloadable MP4, ready to deploy.

Open the Create page in Text → Video mode

Go to makethisvid.com/create. Text → Video is the default mode. A prompt field accepts your scene description in natural language; no special syntax is required.

Write a specific scene description

Effective text prompts describe: (1) the subject: what or who is in the frame; (2) the environment: surface, background, setting; (3) the lighting: natural, studio, dramatic, soft; (4) the camera movement: push-in, orbit, static, dolly; (5) the mood or aesthetic: cinematic, vibrant, minimal, gritty. Including all five in one prompt produces the most controllable results.

Iterate on the prompt

Text-to-video is iterative. Your first generation reveals how the AI interpreted your description. Adjust the specifics: add a camera movement if the shot was static, specify lighting if it was flat, add texture words if the scene felt generic. Each generation costs 1 credit at 720p or 2 credits at 1080p, so iterating is inexpensive.

Select resolution and generate

Choose 720p (1 credit) for early-stage tests or 1080p (2 credits) for production-quality output. Click Generate. The AI renders the video in 45–90 seconds. Credits are refunded automatically if a render fails.

Download and use the video

Download the MP4 — audio is included, watermark-free, commercial use licensed. Post it, run it as an ad, send it to a client, or use it as a visual asset in any medium.

Who Uses MakeThisVid for This

Ad creative without product assets

Don't have product photos or video footage? Generate the ad entirely from text. Describe the product, its setting, and the visual treatment — the AI renders the scene from scratch.

Concept validation before production

Generate a text-to-video rough of a campaign concept before committing to a shoot. Show clients or stakeholders what the visual direction looks like — at the cost of a few credits.

Abstract or atmospheric content

Some of the strongest text-to-video outputs are abstract — light effects, color gradients, nature scenes, mood sequences. These don't require product photography and are particularly effective as brand-feel or ambient social content.

Frequently Asked Questions

How do I write a text prompt for AI video?

Describe the subject, environment, lighting, camera movement, and mood in one sentence or paragraph. E.g., 'A glass perfume bottle sits on a rain-wet marble surface, studio rim lighting catches the facets, slow 180-degree orbit, luxury cosmetics aesthetic, instrumental audio.' Specific language produces more intentional output than vague words like 'cool' or 'nice.'
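
If you write prompts in bulk, it can help to keep the five elements as separate fields and join them at the end, so none gets forgotten. The sketch below is one way to do that in Python. The `ScenePrompt` class and its methods are hypothetical helpers for your own workflow, not part of MakeThisVid, and the output is simply text you would paste into the prompt field.

```python
from dataclasses import dataclass


@dataclass
class ScenePrompt:
    """Hypothetical container for the five elements of a text-to-video prompt."""
    subject: str
    environment: str
    lighting: str
    camera: str
    mood: str

    def missing(self) -> list[str]:
        """Return the names of any elements that were left blank."""
        return [name for name, value in vars(self).items() if not value.strip()]

    def render(self) -> str:
        """Join the five elements into a single prompt string."""
        gaps = self.missing()
        if gaps:
            raise ValueError(f"prompt is missing: {', '.join(gaps)}")
        return (
            f"{self.subject.strip()} {self.environment.strip()}, "
            f"{self.lighting.strip()}, {self.camera.strip()}, {self.mood.strip()}."
        )


prompt = ScenePrompt(
    subject="A glass perfume bottle",
    environment="on a rain-wet marble surface",
    lighting="studio rim lighting catching the facets",
    camera="slow 180-degree orbit",
    mood="luxury cosmetics aesthetic",
)
print(prompt.render())
```

Keeping the fields separate also makes iteration easier: change only `camera` or `lighting` between generations and you can see which edit changed the result.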

Do I need to upload any images?

No. Text → Video generates entirely from your written description — no image input required. If you want to animate an existing product photo, use Photo → Video mode instead.

What kinds of scenes can I generate from text?

Product reveals, nature scenes, abstract atmospheric clips, lifestyle scenarios, concept illustrations, architectural renderings, event teasers. The AI handles a wide range of scene types. Extreme close-ups with strong textures and clear subjects tend to produce the most compelling outputs.

How many words should my text prompt be?

1–5 sentences is the practical range. Too short (3–5 words) leaves too much for the AI to decide. Too long (500+ words) dilutes the signal. A 2–3 sentence prompt covering subject, environment, and tone is a reliable starting point.
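
As a rough pre-flight check, the length guidance above can be turned into a small Python function. This is only a sketch: `check_prompt_length` is a hypothetical helper, the thresholds are taken from the figures in this answer (5 words or fewer is too short, 500 words or more is too long, 1 to 5 sentences is the practical range), and sentences are counted naively by splitting on terminal punctuation, so decimals or abbreviations will inflate the count.

```python
import re


def check_prompt_length(prompt: str) -> str:
    """Classify a prompt against the length guidance quoted on this page."""
    words = prompt.split()
    # Naive sentence count: split on runs of ., ! or ? and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", prompt) if s.strip()]
    if len(words) <= 5:
        return "too short"
    if len(words) >= 500:
        return "too long"
    if len(sentences) > 5:
        return "over five sentences"
    return "ok"


print(check_prompt_length("A red car"))
print(check_prompt_length(
    "A glass perfume bottle sits on a rain-wet marble surface. "
    "Slow orbit, studio rim lighting, luxury cosmetics aesthetic."
))
```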

What resolution and duration are the generated videos?

6 seconds at 720p (1 credit) or 8 seconds at 1080p (2 credits). Both include audio and are watermark-free. 16:9 landscape by default; 9:16 vertical available for Instagram Reels and TikTok.
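
To see how the credit pricing adds up across a project, here is a short Python sketch. It assumes only the per-render costs quoted on this page (1 credit at 720p, 2 credits at 1080p). The function names are hypothetical, and automatic refunds for failed renders are not modeled.

```python
# Credit cost per render, as quoted on this page.
CREDITS_PER_RENDER = {"720p": 1, "1080p": 2}


def credits_needed(drafts_720p: int, finals_1080p: int) -> int:
    """Total credits for a mix of 720p drafts and 1080p final renders."""
    return (
        drafts_720p * CREDITS_PER_RENDER["720p"]
        + finals_1080p * CREDITS_PER_RENDER["1080p"]
    )


def drafts_affordable(balance: int, finals_1080p: int) -> int:
    """Number of 720p drafts that fit after reserving credits for 1080p finals."""
    reserved = finals_1080p * CREDITS_PER_RENDER["1080p"]
    if reserved > balance:
        raise ValueError("balance does not cover the reserved 1080p renders")
    return (balance - reserved) // CREDITS_PER_RENDER["720p"]


# Four 720p test generations plus one 1080p final render.
print(credits_needed(drafts_720p=4, finals_1080p=1))
# With 10 credits and two 1080p finals reserved, the rest can go to drafts.
print(drafts_affordable(balance=10, finals_1080p=2))
```

Drafting at 720p and reserving 1080p for the final render halves the cost of each iteration.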

Is commercial use included for text-generated videos?

Yes. Every MakeThisVid plan and credit pack — including the $2.99 starter — includes a commercial use license for any generated video, regardless of the input type.

How does text-to-video compare to photo-to-video?

Text-to-video gives you more creative freedom — the AI generates the scene from scratch, so the output isn't constrained by an existing image. Photo-to-video gives you more visual control — your product or subject appears as the first frame and the AI animates from there. Both modes generate 6-second 720p or 8-second 1080p clips at the same cost.

What if the text-generated video doesn't match my prompt?

Refine the prompt with more specific language and regenerate. The most common causes of mismatch: too vague a description, conflicting visual cues, or an unusual scene the AI hasn't been well-trained on. If the render fails entirely, the credit is refunded automatically.

Type a scene. Get a video.

Describe the product, setting, and mood — AI renders a 6-second 720p or 8-second 1080p clip with audio in under 2 minutes.

Try MakeThisVid