Building a Simple RAG System with DSPy

井元剛

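In the previous article, I gave a basic introduction to DSPy.
This time, to deepen our understanding of DSPy, I walk through a slightly more concrete use case: having DSPy retrieve data and having GPT-3.5 (gpt-3.5-turbo) answer questions about it.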

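The sample code below assumes cell-by-cell execution in a Jupyter Notebook.
If you instead save all of the code as a single file and run it from start to finish, loading the ColBERTv2 data can take a while, and the script will fail with an error unless you add something like synchronization around the loading step, so a little extra work is needed.
The code should also run on the free tier of Google Colab, so if you don't have an environment at hand, please try it there.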
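
One possible way to handle the slow-loading case in a single script is to retry the first call until it succeeds. The sketch below is a generic retry helper, not something DSPy provides: `call_with_retry`, its parameters, and the `flaky_retrieve` stand-in are hypothetical names used only for illustration.

```python
import time


def call_with_retry(fn, attempts=5, delay=0.0, backoff=2.0):
    """Call fn() until it succeeds or attempts run out (hypothetical helper, not a DSPy API)."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as error:  # in real code, catch only the errors you expect
            last_error = error
            if attempt < attempts - 1:
                time.sleep(delay)
                delay *= backoff
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


# Stand-in for a retrieval call that fails while the data is still loading.
calls = {"count": 0}


def flaky_retrieve():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("index not ready")
    return ["passage A", "passage B"]


passages = call_with_retry(flaky_retrieve, attempts=5, delay=0.0)
print(passages, calls["count"])
```

In a real script you would pass the actual retrieval call as `fn` and use a non-zero `delay`.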

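What is RAG?

Retrieval-augmented generation (RAG) is an approach in which a large language model (LLM) is given access to a large knowledge corpus (internal documents, source code, and so on), searches that corpus for relevant passages or content, and uses them to generate answers about knowledge it could not have known in advance.

RAG lets an LLM draw on up-to-date knowledge dynamically and give accurate answers even on topics it was never trained on. However, a refined RAG system needs a pipeline for looking up and searching knowledge the LLM does not have, and building such a system brings complexity with it. To reduce that complexity, we turn to DSPy, which provides a seamless way to set up this kind of pipeline.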
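
To make the retrieve-then-generate idea concrete, here is a minimal sketch in plain Python that does not use DSPy at all. It ranks passages by simple word overlap (a deliberately naive stand-in for a real retriever such as ColBERTv2) and places the best passage in front of the question. The function names and the tiny corpus are hypothetical.

```python
def tokenize(text):
    """Lowercase and split on whitespace, stripping simple punctuation."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(question, corpus, k=2):
    """Rank passages by word overlap with the question and return the top k."""
    question_words = tokenize(question)
    ranked = sorted(
        corpus,
        key=lambda passage: len(question_words & tokenize(passage)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages):
    """Place the retrieved passages in front of the question for the LM."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


corpus = [
    "The office VPN requires the client version 4 or later.",
    "Expense reports are due on the fifth business day of each month.",
    "The cafeteria is closed on public holidays.",
]
question = "When are expense reports due?"
top = retrieve(question, corpus, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The resulting prompt would then be sent to the LM, which answers from the supplied context rather than from memory alone.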

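Configuring the language model (LM) and retrieval model (RM)

First, we set up the language model (LM) and the retrieval model (RM). DSPy supports multiple LM and RM APIs as well as hosting local models.

The example below uses GPT-3.5 (gpt-3.5-turbo) and a ColBERTv2 retriever (a server that hosts, free of charge, a search index over the "abstracts" of Wikipedia as of 2017). We configure the LM and RM within DSPy so that DSPy can call each module internally whenever it needs generation or retrieval.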

import dspy

turbo = dspy.OpenAI(model='gpt-3.5-turbo')
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')

dspy.settings.configure(lm=turbo, rm=colbertv2_wiki17_abstracts)
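
The snippet above does not pass an API key, so the key presumably has to come from the environment. The sketch below assumes the key lives in an environment variable named `OPENAI_API_KEY` (an assumption about your setup) and fails early with a clear message if it is missing. `require_env` is a hypothetical helper, not part of DSPy.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable or raise a clear error (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before configuring the LM")
    return value


# Demonstration with an explicit mapping so the example runs without a real key.
demo_env = {"OPENAI_API_KEY": "sk-example-not-a-real-key"}
key = require_env("OPENAI_API_KEY", environ=demo_env)
print(key[:3] + "...")
```

Calling `require_env("OPENAI_API_KEY")` without the `environ` argument checks the real process environment.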

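Loading the dataset

This example uses the HotPotQA dataset that ships with DSPy. It is a collection of complex question-answer pairs that are typically answered in a multi-hop fashion, that is, by combining facts from more than one passage. The dataset provided by DSPy can be loaded through the HotPotQA class.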

from dspy.datasets import HotPotQA

# Load the dataset.
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

# Tell DSPy that the 'question' field is the input. The other fields are labels or metadata.
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

len(trainset), len(devset)

Running the code above produces the following result.

(20, 50)
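
To illustrate what "marking a field as input" means, the sketch below uses a small stand-in class. `ToyExample` is hypothetical and is not `dspy.Example`; it only mimics the idea that fields marked as inputs are passed to the program while the remaining fields act as labels.

```python
class ToyExample:
    """Minimal stand-in for a dataset example (hypothetical; not dspy.Example)."""

    def __init__(self, **fields):
        self.fields = dict(fields)
        self.input_keys = set()

    def with_inputs(self, *keys):
        """Return a copy with the given fields marked as inputs."""
        copy = ToyExample(**self.fields)
        copy.input_keys = set(keys)
        return copy

    def inputs(self):
        """Fields that are fed to the program."""
        return {k: v for k, v in self.fields.items() if k in self.input_keys}

    def labels(self):
        """Fields kept aside as labels or metadata."""
        return {k: v for k, v in self.fields.items() if k not in self.input_keys}


raw = ToyExample(
    question="Which is taller, the Empire State Building or the Bank of America Tower?",
    answer="The Empire State Building",
)
example = raw.with_inputs("question")
print(example.inputs())
print(example.labels())
```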

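Building the signature

Now that the data is loaded, we define the signature for the pipeline's sub-task.
The obvious interface is a simple question as input and an answer as output, but because we are building a RAG pipeline, we also pass in contextual information retrieved from the ColBERTv2 corpus.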

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

We include short descriptions on the context and answer fields to give the model firmer guidelines about what it will receive and what it should generate.
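
The field descriptions reappear in the format section of the prompt printed later in this article. The sketch below imitates that visible format in plain Python. It is not how DSPy builds prompts internally; `render_format` is a hypothetical function written only to show where the descriptions end up.

```python
def render_format(instructions, fields):
    """Render instructions plus one 'Name: description' line per field.

    fields is a list of (name, description) pairs; a missing description
    falls back to a ${name} placeholder.
    """
    lines = [instructions, "", "---", "", "Follow the following format.", ""]
    for name, description in fields:
        placeholder = description if description else "${" + name + "}"
        lines.append(f"{name.capitalize()}: {placeholder}")
        lines.append("")
    return "\n".join(lines).rstrip()


template = render_format(
    "Answer questions with short factoid answers.",
    [
        ("context", "may contain relevant facts"),
        ("question", None),
        ("answer", "often between 1 and 5 words"),
    ],
)
print(template)
```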

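Building the pipeline

We build the RAG pipeline as a DSPy module. This requires two methods.

  1. The __init__ method simply declares the sub-modules it needs: dspy.Retrieve and dspy.ChainOfThought. The latter is defined to implement the GenerateAnswer signature.
  2. The forward method describes the control flow for answering a question with those modules. Given a question, it retrieves the top 3 relevant passages and feeds them as context to answer generation.
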
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
    
    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

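Optimizing the pipeline

Now that the RAG pipeline is defined, we compile it. Compiling a program updates the parameters stored in each module. In this setup, that mainly means collecting and selecting good demonstrations to include in the prompt.

Compilation depends on three things.

  1. A training set. We use the 20 question-answer examples from the trainset above.
  2. A metric for validation. We define a simple validate_context_and_answer that checks that the predicted answer is correct and that the retrieved context actually contains that answer.
  3. A specific teleprompter. The DSPy compiler includes a number of teleprompters that can optimize a program.
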
from dspy.teleprompt import BootstrapFewShot

# Validation logic: check that the predicted answer is correct.
# Also check that the retrieved context actually contains that answer.
def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM

# Set up a basic teleprompter, which will compile our RAG program.
teleprompter = BootstrapFewShot(metric=validate_context_and_answer)

# Compile.
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)
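
To show what the two checks in the metric amount to, here is a plain-Python approximation. It is a sketch under the assumption that answers are compared after simple normalization (lowercasing, dropping punctuation and English articles); it is not the actual implementation of `answer_exact_match` or `answer_passage_match`, and all function names below are hypothetical.

```python
import re
import string


def normalize(text):
    """Lowercase, drop punctuation and English articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(gold, predicted):
    """True if the normalized prediction equals the normalized gold answer."""
    return normalize(gold) == normalize(predicted)


def passage_match(gold, passages):
    """True if any retrieved passage contains the normalized gold answer."""
    target = normalize(gold)
    return any(target in normalize(passage) for passage in passages)


def validate(gold, predicted, passages):
    """Accept only if the answer is right and the context supports it."""
    return exact_match(gold, predicted) and passage_match(gold, passages)


passages = ["David Gregory (physician) | He inherited Kinnairdy Castle in 1664."]
print(validate("Kinnairdy Castle", "kinnairdy castle.", passages))
print(validate("Kinnairdy Castle", "Edinburgh Castle", passages))
```

Requiring both checks means an example where the model answers correctly from memory, without supporting context, is not accepted as a demonstration.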

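As a supplementary explanation of teleprompters, here is a passage quoted from the official DSPy documentation.

Teleprompters are powerful optimizers that can take any program and learn to bootstrap and select effective prompts for its modules. Hence the name, which means "prompting at a distance".

Different teleprompters offer various trade-offs in terms of how much they optimize cost versus quality, and so on. In the example above, we used a simple default, BootstrapFewShot.

If you like analogies, you can think of the three ingredients of compilation as the training data, the loss function, and the optimizer in a standard DNN supervised-learning setup. Just as SGD is a basic optimizer, there are more sophisticated (and more expensive) ones such as Adam and RMSProp.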
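
The idea of bootstrapping demonstrations can be sketched in a few lines of plain Python: run the program on training examples and keep only the runs that the metric accepts. This is a loose illustration of the concept, not the implementation of BootstrapFewShot; `bootstrap_demos`, `max_demos`, and the toy program and data are all hypothetical.

```python
def bootstrap_demos(program, trainset, metric, max_demos=4):
    """Run the program on training examples and keep the runs the metric accepts."""
    demos = []
    for example in trainset:
        prediction = program(example["question"])
        if metric(example, prediction):
            demos.append({"question": example["question"], **prediction})
        if len(demos) >= max_demos:
            break
    return demos


# Toy "program": looks answers up in a tiny table and records a rationale.
KNOWN = {"capital of France?": "Paris", "2 + 2?": "4"}


def toy_program(question):
    answer = KNOWN.get(question, "unknown")
    return {"rationale": f"looked up '{question}'", "answer": answer}


def toy_metric(example, prediction):
    return example["answer"] == prediction["answer"]


trainset = [
    {"question": "capital of France?", "answer": "Paris"},
    {"question": "capital of Peru?", "answer": "Lima"},
    {"question": "2 + 2?", "answer": "4"},
]
demos = bootstrap_demos(toy_program, trainset, toy_metric)
print([d["question"] for d in demos])
```

The accepted runs, including their intermediate rationale, are what would later be placed in the prompt as worked demonstrations.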

Running the pipeline

Now that the RAG program is compiled, let's try it out.

# Ask any question you like to this simple RAG program.
my_question = "What castle did David Gregory inherit?"

# Get the prediction. This contains `pred.context` and `pred.answer`.
pred = compiled_rag(my_question)

# Print the contexts and the answer.
print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Retrieved Contexts (truncated): {[c[:200] + '...' for c in pred.context]}")

The output is shown below.

Question: What castle did David Gregory inherit?

Predicted Answer: Kinnairdy Castle

Retrieved Contexts (truncated): ['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinn...', 'Gregory Tarchaneiotes | Gregory Tarchaneiotes (Greek: Γρηγόριος Ταρχανειώτης , Italian: "Gregorio Tracanioto" or "Tracamoto" ) was a "protospatharius" and the long-reigning catepan of Italy from 998 t...', 'David Gregory (mathematician) | David Gregory (originally spelt Gregorie) FRS (? 1659 – 10 October 1708) was a Scottish mathematician and astronomer. He was professor of mathematics at the University ...']

Now let's look at what prompt was actually generated and executed.

turbo.inspect_history(n=1)

The output is shown below. You can see that a fairly long prompt was generated.


Answer questions with short factoid answers.

---

Question: At My Window was released by which American singer-songwriter?
Answer: John Townes Van Zandt

Question: "Everything Has Changed" is a song from an album released under which record label ?
Answer: Big Machine Records

Question: The Victorians - Their Story In Pictures is a documentary series written by an author born in what year?
Answer: 1950

Question: Which Pakistani cricket umpire who won 3 consecutive ICC umpire of the year awards in 2009, 2010, and 2011 will be in the ICC World Twenty20?
Answer: Aleem Sarwar Dar

Question: Having the combination of excellent foot speed and bat speed helped Eric Davis, create what kind of outfield for the Los Angeles Dodgers?
Answer: "Outfield of Dreams"

Question: Who is older, Aleksandr Danilovich Aleksandrov or Anatoly Fomenko?
Answer: Aleksandr Danilovich Aleksandrov

Question: The Organisation that allows a community to influence their operation or use and to enjoy the benefits arisingwas founded in what year?
Answer: 2010

Question: Tombstone stared an actor born May 17, 1955 known as who?
Answer: Bill Paxton

Question: In what year was the club founded that played Manchester City in the 1972 FA Charity Shield
Answer: 1874

Question: which American actor was Candace Kita guest starred with
Answer: Bill Murray

Question: Which is taller, the Empire State Building or the Bank of America Tower?
Answer: The Empire State Building

Question: Which company distributed this 1977 American animated film produced by Walt Disney Productions for which Sherman Brothers wrote songs?
Answer: Buena Vista Distribution

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the answer}. We ...

Answer: often between 1 and 5 words

---

Context:
[1] «Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to the martial art of taekwondo, and is published in the United States of America. While the title suggests that it focuses on taekwondo exclusively, the magazine also covers other Korean martial arts. "Tae Kwon Do Times" has published articles by a wide range of authors, including He-Young Kimm, Thomas Kurz, Scott Shaw, and Mark Van Schuyver.»
[2] «Kwon Tae-man | Kwon Tae-man (born 1941) was an early Korean hapkido practitioner and a pioneer of the art, first in Korea and then in the United States. He formed one of the earliest dojang's for hapkido in the United States in Torrance, California, and has been featured in many magazine articles promoting the art.»
[3] «Hee Il Cho | Cho Hee Il (born October 13, 1940) is a prominent Korean-American master of taekwondo, holding the rank of 9th "dan" in the martial art. He has written 11 martial art books, produced 70 martial art training videos, and has appeared on more than 70 martial arts magazine covers. Cho won several national and international competitions as a taekwondo competitor, and has appeared in several films, including "Fight to Win", "Best of the Best", "Bloodsport II", and "Bloodsport III". He founded the Action International Martial Arts Association (AIMAA) in 1980, and is its President. Cho is a member of both "Black Belt" magazine's Hall of Fame and "Tae Kwon Do Times" magazine's Hall of Fame.»

Question: Which magazine has published articles by Scott Shaw, Tae Kwon Do Times or Southwest Art?

Reasoning: Let's think step by step in order to produce the answer. We know that "Tae Kwon Do Times" has published articles by Scott Shaw, as mentioned in the context.

Answer: Tae Kwon Do Times

---

Context:
[1] «Rosario Dawson | Rosario Isabel Dawson (born May 9, 1979) is an American actress, producer, singer, comic book writer, and political activist. She made her film debut in the 1995 teen drama "Kids". Her subsequent film roles include "He Got Game", "Men in Black II", "25th Hour", "Rent", "Sin City", "Death Proof", "Seven Pounds", "", and "Top Five". Dawson has also provided voice-over work for Disney and DC.»
[2] «Sarai Gonzalez | Sarai Isaura Gonzalez (born 2005) is an American Latina child actress who made her professional debut at the age of 11 on the Spanish-language ""Soy Yo"" ("That's Me") music video by Bomba Estéreo. Cast as a "nerdy" tween with a "sassy" and "confident" attitude, her performance turned her into a "Latina icon" for "female empowerment, identity and self-worth". She subsequently appeared in two get out the vote videos for Latinos in advance of the 2016 United States elections.»
[3] «Gabriela (2001 film) | Gabriela is a 2001 American romance film, starring Seidy Lopez in the title role alongside Jaime Gomez as her admirer Mike. The film has been cited as an inspiration behind the Premiere Weekend Club, which supports Latino film-making.»

Question: Which American actress who made their film debut in the 1995 teen drama "Kids" was the co-founder of Voto Latino?

Reasoning: Let's think step by step in order to produce the answer. We know that Rosario Dawson made her film debut in the 1995 teen drama "Kids" and is also known for her political activism.

Answer: Rosario Dawson

---

Context:
[1] «Battle of Kursk | The Battle of Kursk was a Second World War engagement between German and Soviet forces on the Eastern Front near Kursk (450 km south-west of Moscow) in the Soviet Union during July and August 1943. The battle began with the launch of the German offensive, Operation Citadel (German: "Unternehmen Zitadelle" ), on 5 July, which had the objective of pinching off the Kursk salient with attacks on the base of the salient from north and south simultaneously. After the German offensive stalled on the northern side of the salient, on 12 July the Soviets commenced their Kursk Strategic Offensive Operation with the launch of Operation Kutuzov (Russian: Кутузов ) against the rear of the German forces in the northern side. On the southern side, the Soviets also launched powerful counterattacks the same day, one of which led to a large armoured clash, the Battle of Prokhorovka. On 3 August, the Soviets began the second phase of the Kursk Strategic Offensive Operation with the launch of Operation Polkovodets Rumyantsev (Russian: Полководец Румянцев ) against the German forces in the southern side of the Kursk salient.»
[2] «Operation Mars | Operation Mars, also known as the Second Rzhev-Sychevka Offensive Operation (Russian: Вторая Ржевско-Сычёвская наступательная операция), was the codename for an offensive launched by Soviet forces against German forces during World War II. It took place between 25 November and 20 December 1942 around the Rzhev salient in the vicinity of Moscow.»
[3] «Kholm Pocket | The Kholm Pocket (German: "Kessel von Cholm" ; Russian: Холмский котёл ) was the name given for the encirclement of German troops by the Red Army around Kholm south of Leningrad, during World War II on the Eastern Front, from 23 January 1942 until 5 May 1942. A much larger pocket was simultaneously surrounded in Demyansk, about 100 km to the northeast. These were the results of German retreat following their defeat during the Battle of Moscow.»

Question: What is the code name for the German offensive that started this Second World War engagement on the Eastern Front (a few hundred kilometers from Moscow) between Soviet and German forces, which included 102nd Infantry Division?

Reasoning: Let's think step by step in order to produce the answer. We know that the German offensive that started the Battle of Kursk was called Operation Citadel (German: "Unternehmen Zitadelle").

Answer: Operation Citadel

---

Context:
[1] «Kerry Condon | Kerry Condon (born 4 January 1983) is an Irish television and film actress, best known for her role as Octavia of the Julii in the HBO/BBC series "Rome," as Stacey Ehrmantraut in AMC's "Better Call Saul" and as the voice of F.R.I.D.A.Y. in various films in the Marvel Cinematic Universe. She is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of "Hamlet."»
[2] «Corona Riccardo | Corona Riccardo (c. 1878October 15, 1917) was an Italian born American actress who had a brief Broadway stage career before leaving to become a wife and mother. Born in Naples she came to acting in 1894 playing a Mexican girl in a play at the Empire Theatre. Wilson Barrett engaged her for a role in his play "The Sign of the Cross" which he took on tour of the United States. Riccardo played the role of Ancaria and later played Berenice in the same play. Robert B. Mantell in 1898 who struck by her beauty also cast her in two Shakespeare plays, "Romeo and Juliet" and "Othello". Author Lewis Strang writing in 1899 said Riccardo was the most promising actress in America at the time. Towards the end of 1898 Mantell chose her for another Shakespeare part, Ophelia im Hamlet. Afterwards she was due to join Augustin Daly's Theatre Company but Daly died in 1899. In 1899 she gained her biggest fame by playing Iras in the first stage production of Ben-Hur.»
[3] «Judi Dench | Dame Judith Olivia "Judi" Dench, {'1': ", '2': ", '3': ", '4': "} (born 9 December 1934) is an English actress and author. Dench made her professional debut in 1957 with the Old Vic Company. Over the following few years, she performed in several of Shakespeare's plays in such roles as Ophelia in "Hamlet", Juliet in "Romeo and Juliet", and Lady Macbeth in "Macbeth". Although most of her work during this period was in theatre, she also branched into film work and won a BAFTA Award as Most Promising Newcomer. She drew strong reviews for her leading role in the musical "Cabaret" in 1968.»

Question: Who acted in the shot film The Shore and is also the youngest actress ever to play Ophelia in a Royal Shakespeare Company production of "Hamlet." ?

Reasoning: Let's think step by step in order to produce the answer. We know that the actress in question played Ophelia in a Royal Shakespeare Company production of "Hamlet" and appeared in the short film "The Shore."

Answer: Kerry Condon

---

Context:
[1] «David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory's use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.»
[2] «Gregory Tarchaneiotes | Gregory Tarchaneiotes (Greek: Γρηγόριος Ταρχανειώτης , Italian: "Gregorio Tracanioto" or "Tracamoto" ) was a "protospatharius" and the long-reigning catepan of Italy from 998 to 1006. In December 999, and again on February 2, 1002, he reinstituted and confirmed the possessions of the abbey and monks of Monte Cassino in Ascoli. In 1004, he fortified and expanded the castle of Dragonara on the Fortore. He gave it three circular towers and one square one. He also strengthened Lucera.»
[3] «David Gregory (mathematician) | David Gregory (originally spelt Gregorie) FRS (? 1659 – 10 October 1708) was a Scottish mathematician and astronomer. He was professor of mathematics at the University of Edinburgh, Savilian Professor of Astronomy at the University of Oxford, and a commentator on Isaac Newton's "Principia".»

Question: What castle did David Gregory inherit?

Reasoning: Let's think step by step in order to produce the answer. We know that David Gregory inherited Kinnairdy Castle.

Answer: Kinnairdy Castle

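Even though we never wrote any of these detailed demonstrations ourselves, DSPy was able to bootstrap this roughly 3,000-token prompt for 3-shot retrieval-augmented generation with Chain-of-Thought reasoning, all from a very simply written program.

This illustrates the power of composition and learning. That said, this prompt was generated by one particular teleprompter and is not necessarily perfect for every setting. DSPy offers a large but systematic space of options for optimizing and validating a program with respect to quality and cost.

The learned objects themselves are also easy to inspect.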

for name, parameter in compiled_rag.named_predictors():
    print(name)
    print(parameter.demos[0])

The output is as follows.

generate_answer
Example({'augmented': True, 'context': ['Tae Kwon Do Times | Tae Kwon Do Times is a magazine devoted to the martial art of taekwondo, and is published in the United States of America. While the title suggests that it focuses on taekwondo exclusively, the magazine also covers other Korean martial arts. "Tae Kwon Do Times" has published articles by a wide range of authors, including He-Young Kimm, Thomas Kurz, Scott Shaw, and Mark Van Schuyver.', "Kwon Tae-man | Kwon Tae-man (born 1941) was an early Korean hapkido practitioner and a pioneer of the art, first in Korea and then in the United States. He formed one of the earliest dojang's for hapkido in the United States in Torrance, California, and has been featured in many magazine articles promoting the art.", 'Hee Il Cho | Cho Hee Il (born October 13, 1940) is a prominent Korean-American master of taekwondo, holding the rank of 9th "dan" in the martial art. He has written 11 martial art books, produced 70 martial art training videos, and has appeared on more than 70 martial arts magazine covers. Cho won several national and international competitions as a taekwondo competitor, and has appeared in several films, including "Fight to Win", "Best of the Best", "Bloodsport II", and "Bloodsport III". He founded the Action International Martial Arts Association (AIMAA) in 1980, and is its President. Cho is a member of both "Black Belt" magazine\'s Hall of Fame and "Tae Kwon Do Times" magazine\'s Hall of Fame.'], 'question': 'Which magazine has published articles by Scott Shaw, Tae Kwon Do Times or Southwest Art?', 'rationale': 'produce the answer. We know that "Tae Kwon Do Times" has published articles by Scott Shaw, as mentioned in the context.', 'answer': 'Tae Kwon Do Times'}) (input_keys=None)

Evaluating the pipeline

We can now evaluate the compiled RAG program. Of course, this tiny dataset is not meant to be a reliable benchmark, but it is useful for illustration.

Let's evaluate the accuracy (exact match) of the predicted answers.

from dspy.evaluate.evaluate import Evaluate

# Set up the `evaluate_on_hotpotqa` function. We'll use this many times below.
evaluate_on_hotpotqa = Evaluate(devset=devset, num_threads=1, display_progress=False, display_table=5)

# Evaluate the `compiled_rag` program with the `answer_exact_match` metric.
metric = dspy.evaluate.answer_exact_match
evaluate_on_hotpotqa(compiled_rag, metric=metric)

The result is shown below.

Average Metric: 27 / 50  (54.0%)

Considering that the program was compiled from only 20 training examples, this is a reasonably good result.
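
The reported figure is simply the fraction of dev examples for which the metric returns true. The sketch below reproduces that arithmetic in plain Python; the `evaluate` function, the toy program, and the four-example dev set are hypothetical and exist only to exercise the calculation.

```python
def evaluate(program, devset, metric):
    """Return (number correct, total, percentage) for a program on a dev set."""
    correct = sum(1 for example in devset if metric(example, program(example["question"])))
    total = len(devset)
    percentage = round(100.0 * correct / total, 1) if total else 0.0
    return correct, total, percentage


def exact_match(example, prediction):
    """Case-insensitive comparison of the gold answer and the prediction."""
    return example["answer"].strip().lower() == prediction.strip().lower()


# Toy program and dev set (hypothetical data, only to exercise the function).
answers = {"q1": "Paris", "q2": "Lima", "q3": "Oslo", "q4": "Rome"}
devset = [
    {"question": "q1", "answer": "paris"},
    {"question": "q2", "answer": "Quito"},
    {"question": "q3", "answer": "Oslo"},
    {"question": "q4", "answer": "Milan"},
]
correct, total, percentage = evaluate(lambda q: answers[q], devset, exact_match)
print(f"Average Metric: {correct} / {total}  ({percentage}%)")
```

By the same arithmetic, 27 correct answers out of 50 gives 54.0%.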

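Evaluating retrieval

We can also look at the accuracy of retrieval. There are several ways to do this, but a simple one is to check whether the retrieved passages are the ones that contain the answer.

The dev set includes the gold titles that should be retrieved, so we can make use of them.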

def gold_passages_retrieved(example, pred, trace=None):
    gold_titles = set(map(dspy.evaluate.normalize_text, example['gold_titles']))
    found_titles = set(map(dspy.evaluate.normalize_text, [c.split(' | ')[0] for c in pred.context]))

    return gold_titles.issubset(found_titles)

compiled_rag_retrieval_score = evaluate_on_hotpotqa(compiled_rag, metric=gold_passages_retrieved)

The result is shown below.

Average Metric: 13 / 50  (26.0%)

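This retrieval accuracy, by contrast, is not very good.
The simple compiled_rag program answers a fair share of the questions correctly (more than 50% on this small set), but its retrieval accuracy is much lower.

This suggests that, in many cases, the LM is relying on knowledge it memorized during training to answer the questions.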
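
The two aggregate scores already imply something: with 27 correct answers but only 13 examples where all gold passages were retrieved, at least 27 - 13 = 14 questions were answered correctly without the full set of gold passages. To look at this per example, one could cross-tabulate the two metrics, as in the sketch below. The per-example outcomes here are hypothetical placeholders, and `cross_tabulate` is not a DSPy function. Note that a correct answer without all gold titles does not prove memorization, since a single retrieved passage may still contain the answer.

```python
from collections import Counter


def cross_tabulate(results):
    """Count examples by (answer correct, gold passages retrieved)."""
    return Counter((r["answer_correct"], r["gold_retrieved"]) for r in results)


# Hypothetical per-example outcomes; replace with real metric results per dev example.
results = [
    {"answer_correct": True, "gold_retrieved": True},
    {"answer_correct": True, "gold_retrieved": False},
    {"answer_correct": True, "gold_retrieved": False},
    {"answer_correct": False, "gold_retrieved": False},
    {"answer_correct": False, "gold_retrieved": True},
]
table = cross_tabulate(results)
correct_without_retrieval = table[(True, False)]
print(f"correct without gold passages: {correct_without_retrieval} of {len(results)}")
```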

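In this way, DSPy automates prompt tuning that developers would otherwise have to do by trial and error, and it also lets you evaluate the result.
This time we used the public Wikipedia index served by ColBERTv2. If this article is well received, I plan to follow up with a more practical approach: configuring a vector DB such as Chroma or Qdrant as the retriever and loading the documents you actually want RAG to search.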
