This book is a concise, illustrated guide for anyone who wants to understand the inner workings of large language models, whether for interviews, for projects, or simply out of curiosity.
It is divided into five parts:

- Foundations: primer on neural networks and important deep learning concepts for training and evaluation
- Embeddings: tokenization algorithms, word embeddings (word2vec) and sentence embeddings (RNN, LSTM, GRU)
- Transformers: motivation behind the self-attention mechanism, detailed overview of the encoder-decoder architecture and related variations such as BERT, GPT and T5, along with tips and tricks on how to speed up computations
- Large language models: main techniques to tune Transformer-based models, such as prompt engineering, (parameter-efficient) finetuning and preference tuning
- Applications: most common problems, including sentiment extraction, machine translation, retrieval-augmented generation and many more
From the Publisher
Book overview
This 250-page book contains roughly 600 intuitive, full-color illustrations and practical examples that build a deep understanding of concepts related to Transformers and large language models.
The parts below show a glimpse of what this book has to offer.
Target audience: students, researchers, job seekers, machine learning practitioners, industry leaders
Deep Learning fundamentals & Embeddings
Foundations
- Neural networks: input, hidden and output layers, softmax
- Training: parameter learning, optimizers (including Adam and AdamW), loss functions (cross-entropy, KL divergence and others), regularization (dropout, early stopping, weight regularization)
- Evaluation: data splits, metrics (confusion matrix and common metrics), bias-variance trade-off

Embeddings
- Tokenization: tokenizer, vocabulary, BPE, Unigram, encoding, decoding
- Token embeddings: word2vec (CBOW, skip-gram), GloVe
- Document embeddings: bag of words, recurrent neural networks, GRU, LSTM, ELMo
- Embedding operations: cosine similarity (see the sketch after this list), t-SNE, locality-sensitive hashing
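To make the cosine similarity item above concrete, here is a minimal NumPy sketch; the three toy vectors are made-up values for illustration, not outputs of word2vec or any real embedding model.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between u and v, in [-1, 1]."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 3-dimensional "embeddings" (illustrative values only)
king = np.array([0.8, 0.3, 0.1])
queen = np.array([0.7, 0.4, 0.2])
apple = np.array([0.1, 0.9, 0.5])

print(cosine_similarity(king, queen))  # high: related words point in similar directions
print(cosine_similarity(king, apple))  # lower: unrelated words
```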
Transformers
Transformer architecture & extensions
- Self-attention mechanism: query, key, value, multi-head attention (a minimal sketch follows this list)
- Transformer model walkthrough and detailed example
- Encoder-only models (BERT), decoder-only models (GPT) and encoder-decoder models (T5)

Tricks to optimize complexity & interpretability
- Sparse attention: Reformer
- Low-rank attention: Linformer, Performer, Longformer
- Hardware optimization: flash attention
- Interpretability: attention maps, TCAV, integrated gradients, LIME, TracIn
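As a companion to the self-attention item above, here is a minimal NumPy sketch of single-head scaled dot-product attention. The shapes, random inputs, and absence of masking, learned projections and multiple heads are simplifying assumptions, not the book's implementation.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_len, seq_len) similarity scores
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted average of the value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 tokens, d_k = 8 (toy dimensions)
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(attention(Q, K, V).shape)  # (4, 8)
```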
Large Language Models & Applications
Large language models
- Pretraining: data mixtures, training objective
- Prompt engineering: context window, token sampling, in-context learning, chain of thought, ReAct, prompt injection, model hallucination
- Finetuning: SFT, PEFT methods such as LoRA (sketched below), prefix tuning and adapters
- Preference tuning: RLHF (reward modeling, reinforcement learning), DPO (supervised approach and variants such as IPO)

Applications
- Retrieval augmentation: RAG, including retriever, generator and relevant hyperparameters
- Machine translation, summarization, sentiment extraction and others
- Metrics: BLEU, WER, ROUGE, METEOR
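To illustrate the LoRA item above, here is a minimal NumPy sketch of its low-rank update idea, y = Wx + B(Ax), where the pretrained weight W stays frozen and only the small matrices A and B are trained. The dimensions, rank and initialization are illustrative assumptions rather than the book's own code.

```python
import numpy as np

d_in, d_out, r = 16, 16, 2  # r << d: the low-rank bottleneck (toy sizes)
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, starts at zero

def lora_forward(x: np.ndarray) -> np.ndarray:
    # The update B @ (A @ x) adds only r * (d_in + d_out) trainable parameters
    return W @ x + B @ (A @ x)

x = rng.normal(size=d_in)
print(lora_forward(x).shape)  # (16,); equal to W @ x at init since B = 0
```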
ASIN : B0DC4NYLTN
Publisher : Independently published (August 3, 2024)
Language : English
Paperback : 247 pages
ISBN-13 : 979-8836693312
Item Weight : 15.4 ounces
Dimensions : 6 x 0.58 x 9 inches