Agents and Tools in LangChain

Introduction

This article continues the discussion of LangChain begun in “Overview of ChatGPT and LangChain and its use“. The previous article covered ChatGPT, which is based on the GPT models described in “Overview of GPT and examples of algorithms and implementations“, and LangChain, a framework for building applications around such language models. This time, we describe the Agent, which can autonomously interact with the outside world and thereby transcend the limits of a language model.

Agents and Tools in LangChain

One of the goals of LangChain is to handle tasks that language models like ChatGPT cannot handle on their own, such as answering questions about information outside the model’s learned knowledge, or tasks that are logically complex or computationally demanding. The Agent module is a powerful means of achieving these goals.

To realize such functions, the Agent module has two sub-modules, “Tool” and “Agent”. Each is described below.

<Tool>

“Tool” is a module that enables the language model to do what it cannot do on its own, and various tools are provided for different purposes. A tool is an interface that an agent can use to interact with the world, and is in fact a combination of the following information (see the sketch after this list):

(1) The name of the tool
(2) A description of what the tool is
(3) JSON schema of what the input to the tool is
(4) The function to call
(5) Whether the results of the tool should be returned directly to the user.
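As a rough illustration, these five pieces of information map onto the arguments of StructuredTool.from_function. The following is a minimal sketch, assuming a recent version of langchain with pydantic installed; the weather function and its names are stubs invented for this example.

from pydantic import BaseModel, Field
from langchain.tools import StructuredTool

class WeatherInput(BaseModel):  #← (3) JSON schema of the tool’s input
    city: str = Field(description="Name of the city to look up")

def get_weather(city: str) -> str:  #← (4) the function to call
    return f"It is sunny in {city}."  # Stub implementation for illustration only

weather_tool = StructuredTool.from_function(
    func=get_weather,
    name="get_weather",  #← (1) the name of the tool
    description="Returns the current weather for a given city.",  #← (2) what the tool is
    args_schema=WeatherInput,
    return_direct=False  #← (5) return the result to the agent, not directly to the user
)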

The individual Tools are described below. Currently, 61 tools are listed on the official page.

1. Alpha Vantage: Provides real-time and historical financial market data through a set of powerful, developer-friendly data APIs and spreadsheets.

2. Apify: A cloud platform for web scraping and data extraction that provides an ecosystem of over 1,000 ready-made apps called Actors for various web scraping, crawling, and data extraction use cases. For example, it can be used to extract Google search results, Instagram and Facebook profiles, Amazon and Shopify products, Google Maps reviews, etc.

3. ArXiv: The ArxivAPIWrapper can be used to retrieve scientific articles and information about them (see the combined sketch after this list).

4. AWS Lambda: Including AWS Lambda in the list of tools provided to an agent gives the agent the ability to invoke code running in the AWS cloud for whatever purpose it needs.

5. Bash: Gives agents access to the shell.

6. Bearly Code Interpreter: Allows remote execution of code. This makes it ideal for sandboxing agent code and for secure implementations such as code interpreters.

7. Bing Search: Access Bing, Microsoft’s web search engine.

8. Brave Search: Search the web using the Brave search engine.

9. ChatGPT Plugins: Use ChatGPT plugins within LangChain abstractions.

10. Connery Action: Connery is an open-source plugin infrastructure for AI. It makes it easy to create custom plugins containing a set of actions and to seamlessly integrate them into LangChain agents, while handling runtime, authorization, confidentiality, access control, audit logs, and other critical aspects.

11. Dall-E Image Generator: DALL-E is a text-to-image model developed by OpenAI that uses deep learning to generate digital images from natural language descriptions called “prompts”. This tool generates images from prompts synthesized using an OpenAI LLM.

12. DataForSeo: DataForSeo provides comprehensive SEO and digital marketing data solutions via API.

13. DuckDuckGo Search: Search the web with the DuckDuckGo search engine.

14. E2B Data Analysis: With the E2B Data Analysis Sandbox you can run Python code, generate charts via matplotlib, dynamically install Python and system packages at runtime, run shell commands, and upload and download files.

15. EdenAI: Including the EdenAI tool in the list of tools provided to agents gives them the ability to perform multiple tasks, including (1) speech-to-text, (2) text-to-speech, (3) explicit content detection in text, (4) explicit content detection in images, (5) object detection, (6) OCR invoice analysis, and (7) OCR ID analysis.

16. Eleven Labs Text2Speech: Interact with the ElevenLabs API to achieve text-to-speech functionality.

17. Exa Search: Exa (formerly Metaphor Search) is a search engine designed entirely for use by LLMs. It searches documents on the Internet using natural language queries and retrieves clean HTML content from the target documents. Unlike keyword-based search (Google), Exa’s neural search understands queries semantically and returns relevant documents.

18. File System Tools: Access files on the local machine and read/write files at a specified path. Used for temporary storage and retrieval of processing information.

19. Golden Query: Uses the Golden Knowledge Graph to provide a set of natural language APIs for querying and enrichment. For example, queries such as “products from OpenAI”, “generative AI companies with Series A funding”, and “rappers who invest” can be used to retrieve structured data about the related entities.

20. Google Cloud Text-to-Speech: Google Cloud Text-to-Speech allows developers to synthesize natural-sounding speech with over 100 voices, available in multiple languages and variants.

21. Google Drive: Connect LangChain to the Google Drive API.

22. Google Finance: Use the Google Finance tool to retrieve information from the Google Finance page.

23. Google Jobs: Use the Google Jobs tool to retrieve current job postings.

24. Google Lens: Use the Google Lens tool to obtain information about an image.

25. Google Places: Connect with the Google Places API.

26.Google Scholar: Connect with Google Scholar.

27.Google Search: Connect to the Google search component.

28.Google Serper API: Search the web using the Google Serper component.

29. Google Trends: Use the Google Trends tool to obtain trending information.

30. Gradio: gradio-tools is a Python library for converting Gradio apps into tools that Large Language Model (LLM)-based agents can use to complete tasks. For example, an LLM could use a Gradio tool to transcribe an audio recording found online and then summarize it, or use another Gradio tool to apply OCR to a Google Drive document and answer questions about it.

31. GraphQL: Including BaseGraphQLTool in the list of tools provided to agents gives them the ability to query data from a GraphQL API for whatever purpose they need.

32. HuggingFace Hub Tools: Hugging Face tools that support text I/O can be loaded directly using the load_huggingface_tool function.

33. Human as a tool: Since humans are AGI, they can be used as a tool to help AI agents when they are confused.

34. IFTTT WebHooks: IFTTT stands for “If This Then That”: by assigning the desired web services to “This” and “That”, new services can be created without programming, for example storing data measured by IoT devices in a Google Spreadsheet or sending emails based on the measured data.

35. Ionic: A plug-and-play e-commerce marketplace for AI assistants. By including the Ionic tool in your agent, you give it the ability to shop and transact directly within the agent, making it easy for users to complete purchases.

36. Lemon Agent: Lemon Agent helps you build powerful AI assistants and automate workflows in minutes by enabling accurate and reliable read and write operations in tools such as Airtable, HubSpot, Discord, Notion, Slack, and GitHub. Most connectors available today focus on read-only operations, which limits what an LLM can do; Lemon AI gives agents access to a well-defined API for reliable read and write operations.

37. LLMath: A tool that performs the calculations that language models are not good at.

38. Memorize: Fine-tunes the LLM itself to memorize information using unsupervised learning. This tool requires an LLM that supports fine-tuning; currently only GradientLLM (from langchain.llms import GradientLLM) is supported.

39. Nuclia: The Nuclia Understanding API supports processing of unstructured data such as text, web pages, documents, and audio/video content. It extracts all text wherever it is located (using speech-to-text or OCR when necessary), identifies entities, extracts metadata, embedded files (e.g., images in a PDF), and web links, and also provides content summaries.

40. OpenWeatherMap: Use the OpenWeatherMap component to obtain weather information.

41. Polygon Stock Market API: The Polygon.io Stocks API provides REST endpoints for querying the latest market data from all US stock exchanges.

42. PubMed: PubMed consists of more than 35 million citations of biomedical literature from MEDLINE, life science journals, and online books. Citations include links to full-text content from PubMed Central and publisher websites.

43. Python REPL: For complex calculations, it is sometimes better for the LLM to generate code that computes the answer and then execute that code, rather than generating the answer directly. To make this easy, a simple Python REPL is provided for executing commands. This interface returns only what is printed, so if you use it to compute an answer, be sure to print the answer (see the combined sketch after this list).

44. Reddit Search: Connect to the Reddit search tool.

45. Requests: Sends a request to a specified URL, mainly to retrieve information from a website or an API. The Web contains a lot of information that LLMs do not have access to; to make it easier for LLMs to interact with that information, this tool provides a wrapper around the Python requests module that takes a URL and retrieves the data from it (see the combined sketch after this list).

46. SceneXplain: SceneXplain is an image captioning service accessible through the SceneXplain tool.

47. Search Tools: Use of various search tools.

48. SearchApi: Search the web using SearchApi.

49. SearxNG Search API: Search the web using a self-hosted SearxNG search API.

50. SerpAPI: Works with a web service called SerpApi, which performs Google and Yahoo searches via its API, to search the web.

51. Semantic Scholar API: Use the Semantic Scholar tool with agents.

52. SQL Database: Utilities for accessing SQLite databases (see the combined sketch after this list).

53. StackExchange: Use the StackExchange component.

54. Tavily Search: Tavily’s Search API is a search engine built specifically for AI agents (LLMs) to quickly deliver real-time, accurate, fact-based results.

55. Twilio: Twilio Messaging Channels facilitates integration with third-party messaging apps and allows users to send messages through the WhatsApp Business Platform (GA), Facebook Messenger (public beta), and Google Business Messages (private beta).

56. Wikidata: Wikidata is a free and open knowledge base that can be read and edited by both humans and machines, and is one of the largest open knowledge bases in the world.

57. Wikipedia: Wikipedia is a free, multilingual online encyclopedia written and maintained by a community of volunteers, and is one of the largest and most widely used reference works in the world.

58. Wolfram Alpha: Use of the Wolfram Alpha components.

59. Yahoo Finance News: Use the yahoo_finance_news tool with your agent.

60. YouTube: The YouTube search package searches YouTube videos while working around the heavily rate-limited official API.

61. Zapier Natural Language Actions API: With Zapier Natural Language Actions, you can access over 5k apps and 20k actions on the Zapier platform via a natural language API interface. NLA supports Gmail, Salesforce, Trello, Slack, Asana, HubSpot, Google Sheets, Microsoft Teams, and thousands of other apps.
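Many of these tools are thin wrappers around utility classes that can also be called directly, outside an agent, which is convenient for testing. Below is a minimal combined sketch of the ArXiv, Python REPL, Requests, and SQL database utilities mentioned above. The import paths assume the langchain_community / langchain_experimental package split of early 2024 and may differ in your version; prerequisites are pip install arxiv langchain-community langchain-experimental, and example.db is a hypothetical SQLite file.

from langchain_community.utilities import ArxivAPIWrapper, TextRequestsWrapper, SQLDatabase
from langchain_experimental.utilities import PythonREPL

arxiv = ArxivAPIWrapper()  #← 3. ArXiv: look up a paper by ID or free-text query
print(arxiv.run("1706.03762")[:300])  # Title, authors, and abstract

python_repl = PythonREPL()  #← 43. Python REPL: only what is printed is returned
print(python_repl.run("print(355 / 113)"))

requests_wrapper = TextRequestsWrapper()  #← 45. Requests: fetch the raw text of a URL
print(requests_wrapper.get("https://www.example.com")[:200])

db = SQLDatabase.from_uri("sqlite:///example.db")  #← 52. SQL database (hypothetical file)
print(db.get_usable_table_names())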

<Agent>

The Agent functions as a robot that uses different tools depending on the prompt and automatically generates a solution. The basic steps are as follows:

1. Receive a task from the user.
2. Decide which of the available tools to use and what information to input to it.
3. Obtain results using the tool.
4. Verify from the results obtained in step 3 whether the task has been accomplished.
Steps 2 to 4 are repeated until the Agent judges that the task has been accomplished (a schematic sketch follows this list).
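Schematically, this loop can be written as follows. This is a conceptual toy, not LangChain’s actual implementation: the decide function stands in for the LLM’s reasoning step, and the calculator tool is invented for the example.

def run_agent(task, tools, decide):  #← Conceptual ReAct-style loop, not LangChain internals
    observations = []
    while True:
        action, payload = decide(task, observations)  #← Step 2: choose a tool and its input
        if action == "final":  #← Step 4: the task is judged to be accomplished
            return payload
        observations.append(tools[action](payload))  #← Step 3: run the tool and keep the result

# Toy usage with a fake "LLM" that calls a calculator once and then finishes
tools = {"calc": lambda expr: str(eval(expr))}
def decide(task, observations):
    if not observations:
        return ("calc", "2 + 3")
    return ("final", f"The answer is {observations[-1]}")

print(run_agent("What is 2 + 3?", tools, decide))  # -> The answer is 5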

At present, there are four types of Agents.

1. zero-shot-react-description: An Agent that decides which tool to use based on the tools’ descriptions. To use this Agent, the description of each tool must be well written.

2. react-docstore: An Agent specialized in handling documents. It uses two tools: the Search tool, which searches for documents themselves, and the Lookup tool, which looks up terms within a document.

3. self-ask-with-search: An Agent that factually looks up answers to questions, using tools such as Google’s search API and building on intermediate answers, such as search results, that serve as evidence for the final answer.

4. conversational-react-description: An Agent specialized in handling conversations that generates optimal answers using past exchanges in the chat.

Here, “react” means “Reasoning + Acting” as described in “Overview of ReAct (Reasoning and Acting) and Examples of its Implementation“.
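In code, these string names correspond to constants on the AgentType enum passed to initialize_agent. A minimal sketch, assuming tools and chat are defined as in the examples below:

from langchain.agents import AgentType, initialize_agent

agent = initialize_agent(
    tools,
    chat,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,  #← or SELF_ASK_WITH_SEARCH,
    verbose=True                                  #   CONVERSATIONAL_REACT_DESCRIPTION, etc.
)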

Example of Tool and Agent implementation

The following code implements an Agent with a Tool to access URLs. Please refer to the previous article, “Overview of ChatGPT and LangChain and its use“, for details on setting up the environment and the necessary libraries.

from langchain.agents import AgentType, initialize_agent, load_tools
from langchain_community.chat_models import ChatOpenAI

chat = ChatOpenAI(
    temperature=0,  #← Set temperature to 0 to reduce output diversity
    model="gpt-3.5-turbo"
)

tools = load_tools(  #← Load the Tool provided in LangChain.
    [
        "requests",  #← Load requests, a Tool that allows you to retrieve results for a specific URL.
    ]
)

agent = initialize_agent(  #← Initialize Agent
    tools=tools,  #← Sets the array of Tools that can be used by the Agent
    llm=chat,  #← Specify the language model used by the Agent
    agent=AgentType.CHAT_ZERO_SHOT_REACT_DESCRIPTION,  #←Set to work with the ReAct method
    verbose=True  #← Displaying the log during execution
)

result = agent.run("""Please access the following URL to find out the weather in Tokyo and answer in Japanese.
https://www.jma.go.jp/bosai/forecast/data/overview_forecast/130000.json
""") print(f"Execution Result: {result}")

Running this code produces output like the following.

> Entering new AgentExecutor chain...
Question: Please access the following URL and find out the weather in Tokyo in Japanese.
Thought: I should use the requests_get tool to access the provided URL and retrieve the weather information for Tokyo.
Action:
```
{
  "action": "requests_get",
  "action_input": "https://www.jma.go.jp/bosai/forecast/data/overview_forecast/130000.json"
}
```

Observation: {"publishingOffice":"気象庁","reportDatetime":"2024-02-29T10:45:00+09:00",
"targetArea":"東京都","headlineText":"","text":" 東日本は高気圧に覆われています。
一方、東シナ海には低気圧があって、北東へ進んでいます。nn 東京地方は、曇りや晴れとなっています。
nn 29日は、はじめ高気圧に覆われますが、低気圧が西日本の南岸を東北東へ進み、湿った空気の影響を
受ける見込みです。このため、晴れのち曇りで夜は雨となるでしょう。伊豆諸島では、雨で雷を伴う所がある見込み
です。nn 3月1日は、低気圧が東日本の南岸から日本の東へ進み、次第に西高東低の気圧配置となる見込みです。
このため、曇りで明け方まで雨となるでしょう。伊豆諸島では、雨で、雷を伴って激しく降る所がある見込みです。nn
【関東甲信地方】n 関東甲信地方は、晴れや曇りとなっています。nn 29日は、はじめ高気圧に覆われますが、
低気圧が西日本の南岸を東北東へ進み、湿った空気の影響を受ける見込みです。このため、曇りや晴れで、夕方から
次第に雨や雪となり、雷を伴う所があるでしょう。nn 3月1日は、低気圧が東日本の南岸から日本の東へ進み、
次第に西高東低の気圧配置となる見込みです。このため、はじめ雨や雪となり、雷を伴い激しく降る所があるでしょう。
その後は、曇りや晴れで、関東地方北部や長野県では雪の降る所がある見込みです。nn 関東地方と伊豆諸島の海上では、
29日から3月1日にかけて、うねりを伴いしけとなるでしょう。船舶は高波に注意してください。"}
Thought:Final Answer: 東京地方は、曇りや晴れとなっています。29日は、はじめ高気圧に覆われますが、
低気圧が西日本の南岸を東北東へ進み、湿った空気の影響を受ける見込みです。このため、晴れのち曇りで夜は
雨となるでしょう。伊豆諸島では、雨で雷を伴う所がある見込みです。3月1日は、低気圧が東日本の南岸から
日本の東へ進み、次第に西高東低の気圧配置となる見込みです。このため、曇りで明け方まで雨となるでしょう。
伊豆諸島では、雨で、雷を伴って激しく降る所がある見込みです。

> Finished chain.
Execution Result: 東京地方は、曇りや晴れとなっています。29日は、はじめ高気圧に覆われますが、
低気圧が西日本の南岸を東北東へ進み、湿った空気の影響を受ける見込みです。このため、
晴れのち曇りで夜は雨となるでしょう。伊豆諸島では、雨で雷を伴う所がある見込みです。
3月1日は、低気圧が東日本の南岸から日本の東へ進み、次第に西高東低の気圧配置となる見込みです。
このため、曇りで明け方まで雨となるでしょう。伊豆諸島では、雨で、雷を伴って激しく降る所がある見込みです。

Here, LangChain’s “requests” tool is specified to obtain information from a specific URL. Also, since verbose is set to True in the code, logs are output during processing.

Next, a sample that combines multiple tools is shown. In this sample, a service called “SerpApi” is used to obtain search results from Google and other search engines via API (you must subscribe at https://serpapi.com/ to obtain an API key and set it in an environment variable in advance, and install the library for handling Google search results with pip install google-search-results), and the results are written out using a file tool.

from langchain.agents import AgentType, initialize_agent, load_tools  #←Add import load_tools
from langchain.chat_models import ChatOpenAI
from langchain.tools.file_management import WriteFileTool  #←Importing Tools that can write files

chat = ChatOpenAI(
    temperature=0,
    model="gpt-3.5-turbo"
)

tools = load_tools(
    [
        "requests_get",
        "serpapi" #←Add serpapi
    ],
    llm=chat
)

tools.append(WriteFileTool( #←Added a Tool that can write files.
    root_dir="./"
))

agent = initialize_agent(
    tools,
    chat,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,  #←Change Agent type
    verbose=True
)

result = agent.run("Please look up the specialty products of Hokkaido and save them in Japanese in a file named result.txt.") #←Instructs the user to save the execution results to a file

print(f"実行結果: {result}")

When this code is run, the following results are obtained.

> Entering new AgentExecutor chain...
Thought: I will use the Search tool to find information about Hokkaido's 
famous products in Japanese and then save the results in a file named "result.txt".

Action:
```
{
  "action": "Search",
  "action_input": "北海道の名産品"
}
```
Observation: ['北海道には、鮭やホタテ、昆布などの海産物、牛肉や乳製品、じゃがいもなどの農産物ともに、
日本国内トップクラスの生産量を誇る特産品がたくさんあります ...', '北海道の名産品・グルメ人気お取り寄せ
ランキング · 北海道産 カニ · 北海道産 鮭 · 北海道産 山わさび · 北海道産 米 · 北海道産 メロン · 
北海道産 じゃがいも.', '北海道はオホーツク海・日本海・太平洋という3種類の海に囲まれているため、
各地で獲れる海産物の種類が豊富です。 さらに、北海道は土地が広いので、牧場も多く、新鮮な乳製品も有名な
特産品です。 北海道はカニの漁獲量が日本一です。 タラバガニ、ズワイガニ、毛ガニ、花咲ガニなど、
さまざまな種類の ...', '北海道の名物グルメ5選【ご当地グルメ】 · ジンギスカン · ザンギ · スープカレー
 · エスカロップ · 豚丼. 今や全国区で知られる豚丼です ...', '豊かな食材と各地の風土が生んだ「ラーメン」
u200bu200b 北海道三大ラーメンといえば札幌の「味噌ラーメン」、函館の「塩ラーメン」、旭川の
「醤油ラーメン」。 このほかにも、釧路や富良野、稚内、帯広など、道内の各地に地域の風土にマッチした麺や
スープを楽しめる、さまざまなご当地ラーメンが存在します。', '北海道のおすすめのご当地グルメや名物・名産品、
お土産をご紹介します。ちゃんちゃん焼き、ルイベ、阿寒のアイヌ料理など情報満載。', '北海道グルメの代表格と
言えばウニやイクラをはじめとする新鮮な海産物。 それらを使った丼や寿司、郷土料理などを楽しみたい。 
回転寿司店も侮ることなかれ、北海道はレベルが高いのだ。 本場のジンギスカンや出来立ての乳製品、ラーメンも
外せない。', '北海道グルメ④【ご当地グルメ】 · 郷土料理「ザンギ」 · ソウルフード「ジンギスカン」 · 
冬の定番鍋「石狩鍋」 · 帯広名物「豚丼」 · 釧路名物「 ...', '函館真昆布 春採り昆布(食べる真昆布) 
ガゴメ昆布 スルメイカ 活〆戸井マグロ 一本釣り活〆戸井ブリ 一本釣り活〆えさん真鱈 海峡根ボッケ「バキバキ」
 鮭 ...']
Thought:I have found information about Hokkaido's famous products in Japanese. Now,
 I will save this information in a file named "result.txt".

Action:
```
{
  "action": "write_file",
  "action_input": {
    "file_path": "result.txt",
    "text": "北海道には、鮭やホタテ、昆布などの海産物、牛肉や乳製品、じゃがいもなどの農産物ともに、
日本国内トップクラスの生産量を誇る特産品がたくさんあります ... 北海道の名産品・グルメ人気お取り寄せ
ランキング · 北海道産 カニ · 北海道産 鮭 · 北海道産 山わさび · 北海道産 米 · 北海道産 メロン · 
北海道産 じゃがいも. 北海道はオホーツク海・日本海・太平洋という3種類の海に囲まれているため、
各地で獲れる海産物の種類が豊富です。 さらに、北海道は土地が広いので、牧場も多く、新鮮な乳製品も
有名な特産品です。 北海道はカニの漁獲量が日本一です。 タラバガニ、ズワイガニ、毛ガニ、花咲ガニなど、
さまざまな種類の ... 北海道の名物グルメ5選【ご当地グルメ】 · ジンギスカン · ザンギ · スープカレー · 
エスカロップ · 豚丼. 今や全国区で知られる豚丼です ... 豊かな食材と各地の風土が生んだ「ラーメン」
u200bu200b 北海道三大ラーメンといえば札幌の「味噌ラーメン」、函館の「塩ラーメン」、旭川の
「醤油ラーメン」。 このほかにも、釧路や富良野、稚内、帯広など、道内の各地に地域の風土にマッチした麺や
スープを楽しめる、さまざまなご当地ラーメンが存在します。 北海道のおすすめのご当地グルメや名物・名産品、
お土産をご紹介します。ちゃんちゃん焼き、ルイベ、阿寒のアイヌ料理など情報満載。 北海道グルメの代表格と
言えばウニやイクラをはじめとする新鮮な海産物。 それらを使った丼や寿司、郷土料理などを楽しみたい。 
回転寿司店も侮ることなかれ、北海道はレベルが高いのだ。 本場のジンギスカンや出来立ての乳製品、ラーメンも
外せない。 北海道グルメ④【ご当地グルメ】 · 郷土料理「ザンギ」 · ソウルフード「ジンギスカン」 · 
冬の定番鍋「石狩鍋」 · 帯広名物「豚丼」 · 釧路名物「 ... 函館真昆布 春採り昆布(食べる真昆布) 
ガゴメ昆布 スルメイカ 活〆戸井マグロ 一本釣り活〆戸井ブリ 一本釣り活〆えさん真鱈 海峡根ボッケ「バキバキ」
 鮭 ...",
    "append": false
  }
}
```

Observation: File written successfully to result.txt.
Thought:{
  "action": "Final Answer",
  "action_input": "I have successfully saved information about Hokkaido's famous products in Japanese to a file named 'result.txt'."
}

> Finished chain.
Execution Result: {
  "action": "Final Answer",
  "action_input": "I have successfully saved information about Hokkaido's famous products in Japanese to a file named 'result.txt'."
}

If it works without problems, the result is output to the result.txt file.

Finally, we describe sample code for adding a tool of your own creation. You can use your own tool by defining its name, a description of its function, and the function to execute, as follows:

Tool(
    name="Name",
    description="Function Description",
    func=executable_function
)

This method is similar to WoT, an IoT technology that uses Semantic Web technology, and to Semantic Web services, as described in “Semantic Web Technology“.

The sample code is shown below. Here, we define a homebrew tool that generates random numbers.

import random  #←Import modules needed to generate random numbers
from langchain.agents import AgentType, Tool, initialize_agent  #←Import Tool
from langchain.chat_models import ChatOpenAI
from langchain.tools import WriteFileTool

chat = ChatOpenAI(
    temperature=0,
    model="gpt-3.5-turbo"
)

tools = [] #← Start from an empty tool list; the other tools are not needed here.

tools.append(WriteFileTool( 
    root_dir="./"
))

def min_limit_random_number(min_number): #←Function to generate a random number for which a minimum value can be specified
    return random.randint(int(min_number), 100000)


tools.append(  #←Add tool
    Tool(
        name="Random",  #←Tool Name
        description="It can generate random numbers above a certain minimum.",  #←Tool Description
        func=min_limit_random_number  #←Function called when the tool is executed
    )
)

agent = initialize_agent(
    tools,
    chat,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,  
    verbose=True
)

result = agent.run("Generate 10 or more random numbers and save them in a file named random.txt.")

print(f"Execution Result: {result}")

These can be used to easily construct very powerful solutions. However, these chains are directed acyclic graphs (DAGs) and do not have loops. To construct more complex and controlled flows, it is necessary to construct loops using state machines such as those described in “Overview and Implementation of Automata Theory, Reference Book” and “Overview and Implementation of Finite State Machines (FSM), Reference Book“.
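As a conceptual sketch of what such a loop adds over a DAG (this is not LangChain code; the step function and state keys are invented for the example), a bounded state-machine loop keeps applying a transition until an accept state is reached:

def fsm_loop(step, state, max_iters=10):  #← Bounded loop instead of a pure DAG
    for _ in range(max_iters):
        state = step(state)  #← One transition of the state machine
        if state.get("done"):  #← Accept state: the task is judged complete
            break
    return state

# Toy usage: count up to 3, then stop
result = fsm_loop(lambda s: {"n": s["n"] + 1, "done": s["n"] + 1 >= 3}, {"n": 0})
print(result)  # -> {'n': 3, 'done': True}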

The multi-agent approach using these state machines will be discussed in the next section.

