Build a Text-to-Image Generator (from Scratch): With Transformers and Diffusions [Hardcover]

  • Format: Hardback, 360 pages, height x width x thickness: 235x190x20 mm, weight: 639 g
  • Publication date: 23-Jan-2026
  • Publisher: Manning Publications
  • ISBN-10: 1633435423
  • ISBN-13: 9781633435421
Build your own vision transformer and diffusion models for text-to-image generation, from scratch!

Build a Text-to-Image Generator (from Scratch) takes you step-by-step through creating your own AI models that can generate images from text. You’ll explore two methods of image generation—vision transformers and diffusion models—and learn vital AI development techniques as you go.

Build a Text-to-Image Generator (from Scratch) teaches you how to:

 • Build and train models to generate high resolution images based on text descriptions
 • Edit an existing image based on text prompts
 • Build and train a model to add captions to images
 • Build and train a vision transformer to classify images
 • Fine-tune LLMs for downstream tasks such as classification, text or image generation
 • Better differentiate real images from deepfakes

Build a Text-to-Image Generator (from Scratch) dives into the powerful models behind AI image generators like DALL-E and Stable Diffusion. We believe that the best way to learn is to build something from scratch, so in this book you’ll build your very own diffusion model and vision transformer. As you work through each stage of development, you’ll develop an understanding of how these models can be customized, applied, and integrated for impressive multimodal AI.

About the book

Build a Text-to-Image Generator (from Scratch) guides you through creating AI models that can generate amazing images from simple text prompts. You’ll explore two distinct methods, learning how transformers turn images into sequences of patches, and how diffusion models refine noise into coherent images. Author Mark Liu explains each stage with clear text, diagrams, and examples. You’ll develop models that can classify images, automatically add image captions, reconstruct images, and deliver high-resolution content. By the time you’re done, you’ll have a deep understanding of how image generation AI works, and the satisfaction of building your own text-to-image models!
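
To make the idea of turning an image into a sequence of patches concrete, here is a minimal sketch in plain Python. It is not taken from the book: the function name `image_to_patches`, the toy 4x4 image, and the patch size are all assumptions chosen for illustration. A real vision transformer would work on multi-channel tensors and follow this step with a learned linear projection and position embeddings.

```python
def image_to_patches(image, patch_size):
    """Split a 2D grid (a list of rows) into a row-major list of flattened patches.

    Hypothetical helper for illustration only; not the book's implementation.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")

    patches = []
    # Walk the image in patch-sized steps, top to bottom and left to right.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single list.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# A toy 4x4 single-channel "image" with pixel values 0..15.
toy_image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = image_to_patches(toy_image, patch_size=2)

for index, patch in enumerate(patches):
    print(index, patch)
```

With a 4x4 image and 2x2 patches, the result is a sequence of four patches with four values each, which is the kind of token sequence a transformer can then process.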

About the reader

For machine learning enthusiasts and data scientists with intermediate Python skills.

About the author

Mark Liu is the founding director of the Master of Science in Finance program at the University of Kentucky. He is also the author of Learn Generative AI with PyTorch.

Get a free eBook (PDF or ePub) from Manning as well as access to the online liveBook format (and its AI assistant that will answer your questions in any language) when you purchase the print book.

Reviews

This book stands out for its hands-on, no-fluff approach to text-to-image generation, perfect for practitioners who want to build rather than just theorize. The clear PyTorch implementations, Colab-friendly examples, and practical exercises make even advanced concepts like diffusion models feel achievable. Simeon Leyzerzon, President, Excelsior Software Ltd.

This book is a great hands-on intro to how text-to-image models like Stable Diffusion actually work under the hood. It explains the roles of transformers, VAEs, and denoising U-Nets in a super approachable way, with lots of code you can run yourself. If you’re curious about generative AI and want to build or tweak your own models, this is a solid place to start. Ravikumar Sanapala, Product Manager, Reality Labs, Meta

PART 1: UNDERSTANDING ATTENTION AND TRANSFORMERS 

1 A TALE OF TWO MODELS: TRANSFORMERS AND DIFFUSIONS 

2 BUILD A TRANSFORMER 

3 CLASSIFY IMAGES WITH A VISION TRANSFORMER (VIT)

4 ADD CAPTIONS TO IMAGES 

PART 2: INTRODUCTION TO DIFFUSION MODELS 

5 GENERATE IMAGES WITH DIFFUSION MODELS 

6 CONTROL WHAT IMAGES TO GENERATE IN DIFFUSION MODELS 

7 GENERATE HIGH-RESOLUTION IMAGES WITH DIFFUSION MODELS 

PART 3: TEXT-TO-IMAGE GENERATION WITH DIFFUSION MODELS 

8 CLIP: A MODEL TO MEASURE THE SIMILARITY BETWEEN IMAGE AND TEXT 

9 TEXT-TO-IMAGE GENERATION WITH LATENT DIFFUSION 

10 A DEEP DIVE INTO STABLE DIFFUSION 

PART 4: TEXT-TO-IMAGE GENERATION WITH TRANSFORMERS 

11 VQGAN: CONVERT IMAGES INTO SEQUENCES OF INTEGERS 

12 A MINIMAL IMPLEMENTATION OF DALL-E 

PART 5: NEW DEVELOPMENTS AND CHALLENGES 

13 NEW DEVELOPMENTS AND CHALLENGES IN TEXT-TO-IMAGE GENERATION 

APPENDIX 

INSTALL PYTORCH AND ENABLE GPU TRAINING LOCALLY AND IN COLAB 

Mark Liu is a professor and program director known for translating cutting-edge AI into practical curricula. With years of experience mentoring graduate students and professionals, Mark brings clarity, rigor, and enthusiasm to every page. He distills deep generative-model expertise into step-by-step guidance that empowers readers to build powerful visual AI systems.