Learning about LLMs
Here I will write about my journey learning about LLMs.
Running LLMs locally
The first thing I was interested in was running pretrained LLMs locally and learning about the tools and libraries available to do so. I had heard a lot about `llama.cpp` and `ollama` being used to run LLMs on Apple M-series chips, so I wanted to try this out first.
`llama.cpp` allows for LLM inference in C++ on a variety of hardware and supports many models out of the box. `ollama` is built around `llama.cpp`; it is more user-friendly and aims to further optimize the performance and efficiency of `llama.cpp`. It is the one I chose to use.
Running `ollama serve` starts up a server, and `ollama run <model>` runs the specified model.
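For instance, with the `llama3` model used later in this post:

```bash
ollama serve         # start the server (terminal 1)
ollama run llama3    # chat interactively with the llama3 model (terminal 2)
```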
Different models can be downloaded, imported from GGUF files, or customized with a system prompt.
GGUF is a binary format used for fast loading and saving of models.
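As a sketch of the GGUF import path (the file name below is just a placeholder):

```bash
# Modelfile referencing a local GGUF file
echo 'FROM ./my-model.gguf' > Modelfile

# Register and run it under a custom name
ollama create my-model -f Modelfile
ollama run my-model
```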
The tool also offers a REST API to interact with and manage the models.
Example usage
Here we start `ollama` and query the `llama3` model (downloaded beforehand) with "Why is the sky blue?" as a prompt. The response is returned as JSON, with the `response` field containing the generated text:
```bash
ollama serve &
curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```
If we are only interested in the response, we can use `jq` to extract it:
```bash
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}' | jq -r '.response'
```
In addition, the response also includes a `context` field, which can be used to keep a short conversational memory between requests by encoding/decoding the history of the conversation.
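As a sketch, the `context` array returned by one request can be passed back in the next one so the model remembers the previous exchange:

```bash
# First request: save the returned context array
context=$(curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "My name is Alice. Why is the sky blue?",
  "stream": false
}' | jq -c '.context')

# Follow-up request: pass the context back so the model recalls the name
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "What is my name?",
  "stream": false,
  "context": '"$context"'
}' | jq -r '.response'
```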
Now, here is a simple shell script (vulnerable to JSON injection 🙂) showing how `ollama` can be used to generate text with a prompt:
```bash
#!/bin/bash

if [ "$#" -ne 1 ]; then
  echo "Usage: $0 <prompt>"
  exit 1
fi

prompt="$1"

# NOTE: $prompt is spliced directly into the JSON body, hence the JSON injection
response=$(curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3", "prompt": "'"$prompt"'", "stream": false}')

echo "$response" | jq -r '.response'
```
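Assuming the script is saved as `ask.sh` (a name picked here for illustration), it can be used like this:

```bash
chmod +x ask.sh
./ask.sh "Why is the sky blue?"
```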
Example project
Let's use `ollama` to generate git commit messages. We can use the following `Modelfile` to create a `llama3`-based model with a system prompt specifying the goal of the model:
```
FROM llama3
SYSTEM """
Your only goal is to output git commit messages. You will be given git diff outputs and should exclusively return a git commit message which can be piped directly into the git commit command.
Never include notes or remarks on the commit message you generated.
"""
```
The model is then created using `ollama create git -f Modelfile`.
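Piping text into `ollama run` uses it as the prompt, which gives a quick way to smoke-test the new model before scripting around the REST API:

```bash
git diff --staged | ollama run git
```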
Now we can use the following script to generate git commit messages from staged changes:
```bash
#!/bin/bash

# Grab the staged changes
diff_output=$(git diff --staged)

# printf expands the \n escapes into real newlines
prompt=$(printf "Write a git commit message given this git diff.\n\nGit Diff:\n%s" "$diff_output")

# Escape the prompt into a valid JSON string (quotes, newlines, etc.)
escaped_prompt=$(printf '%s' "$prompt" | jq -Rs .)

response=$(curl -s http://localhost:11434/api/generate \
  -d '{"model": "git", "prompt": '"$escaped_prompt"', "stream": false}')

commit_message=$(echo "$response" | jq -r '.response')

echo "$commit_message"
```
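Assuming the script is saved as `commit-msg.sh`, staged changes can then be committed with a generated message:

```bash
git add .
git commit -m "$(./commit-msg.sh)"
```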
Theory behind LLMs
Attention mechanism for RNNs
Transformer architecture
- An Introduction to Transformers
- Stanford NLP Notes on Self Attention and Transformers
- Attention is All You Need
- The Annotated Transformer
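For quick reference, the central operation from these readings is scaled dot-product attention, where $Q$, $K$, $V$ are the query, key, and value matrices and $d_k$ is the key dimension:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$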
Models
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Encoder-only transformer)
- Generating Wikipedia by Summarizing Long Sequences (Decoder-only transformer)
- Improving Language Understanding by Generative Pretraining (GPT, Decoder-only transformer)
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (Encoder-decoder transformer)
- Let's build GPT: from scratch, in code, spelled out.