Generative machine learning tools: text-generation-webui and AUTOMATIC1111
There are open-source tools such as text-generation-webui and AUTOMATIC1111 that allow code-free use of generative models such as ChatGPT, described in “Overview of ChatGPT and LangChain and their use”, and Stable Diffusion, described in “Stable Diffusion and LoRA Applications“. This article describes how to use these text-generation and image-generation tools.
First, let’s look at text-generation-webui.
text-generation-webui
The “Text generation web UI” is a tool that makes it easy to use language models such as GPT and LLaMA through a web-app-style UI. With this tool you can easily download new models and switch between multiple models.
First, set up the Homebrew environment; see “Getting started with Clojure (1) Setting up the environment (spacemacs and leiningen)” for details. Python 3.10 is required, and pyenv is used to manage Python versions. After installing pyenv with Homebrew (brew install pyenv), make sure it is reachable through your PATH via your shell init file (.zshrc, .bash_profile, etc.) and that the version is displayed (pyenv --version). Then install Python with pyenv install 3.10.xx and switch to it with pyenv global 3.10.xx to finish setting up the environment.
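The pyenv steps above can be sketched as follows (3.10.13 is one example patch release — use the latest 3.10.x; the zsh init line is an assumption for a default macOS shell):

```shell
# install pyenv via Homebrew
brew install pyenv

# enable pyenv shims in the shell init file (here: zsh)
echo 'eval "$(pyenv init -)"' >> ~/.zshrc

# confirm pyenv is found through the PATH
pyenv --version

# install a Python 3.10 release and make it the default
pyenv install 3.10.13
pyenv global 3.10.13
python --version   # should report 3.10.x
```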
Then clone the repository from git and move to the top of the cloned folder.
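On macOS, the clone-and-launch steps might look like the following sketch (the repository URL is the oobabooga project's; the file names `requirements.txt` and `server.py` reflect the repository layout at the time of writing):

```shell
# clone text-generation-webui and move to the top of the folder
git clone https://github.com/oobabooga/text-generation-webui.git
cd text-generation-webui

# install the Python dependencies (assumes Python 3.10 is active via pyenv)
pip install -r requirements.txt

# start the web UI; by default it serves on http://127.0.0.1:7860/
python server.py
```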
On Windows, download and unzip the zip archive and double-click “start”; this installs the web UI and all dependencies into the same folder. After that, it can be used just as on the Mac by running “start_windows.bat” in the downloaded oobabooga-windows folder.
GPT: GPT (Generative Pre-trained Transformer) is a language model based on the Transformer architecture. GPT is pre-trained on large datasets and predicts the next word or sentence. GPT-4 is the best-known version and has demonstrated high performance on many tasks. See “Overview of GPT and examples of algorithms and implementations“ for details.
DialoGPT: DialoGPT is a dialogue-oriented language generation model developed by Microsoft, based on the GPT (Generative Pre-trained Transformer) architecture. Trained on a large conversational dataset, it understands dialogue context and generates responses: the model takes previous utterances and context into account and produces a sequence of tokens as a reply. It can also output multiple candidate responses for a single dialogue turn.
BERT: BERT (Bidirectional Encoder Representations from Transformers) is a language model that uses bidirectional transformer encoders. BERT is widely used, especially for natural language processing tasks. For more information on BERT, see “BERT Overview, Algorithms, and Example Implementations“.
XLNet: XLNet is based on the Transformer architecture and can be trained using both forward and bidirectional context. This allows the language model to handle context more flexibly and make more accurate predictions.
T5: T5 (Text-to-Text Transfer Transformer) is a model that can be applied to a variety of natural language processing tasks, such as machine translation, summarization, question answering, and document classification.
PaLM: PaLM (Pathways Language Model) is a model released by Google in 2022 with 540 billion parameters. (OpenAI’s GPT-3, released before PaLM, has 175 billion parameters.)
LLaMA: LLaMA (Large Language Model Meta AI) is a large-scale language model released by Meta in February 2023. Because LLaMA achieves high accuracy while keeping the number of parameters low, researchers around the world can explore the possibilities of various large-scale language models based on LLaMA.
OpenFlamingo: An open-source reimplementation of “Flamingo”, the model developed by DeepMind, released by the German non-profit organization LAION.
Vicuna 13B: An open-source chatbot based on LLaMA and fine-tuned on user-shared ChatGPT conversations, with performance close to ChatGPT (about 90%) despite a training cost of around $300.
Alpaca 7B: A model fine-tuned from LLaMA on instruction-following data that was generated automatically (self-instruct).
NeMo LLM: A large-scale language model service developed by NVIDIA that, like GPT-4, supports document generation, image generation, translation, coding, and so on.
Claude: A model developed by Anthropic, a company founded by engineers who were involved in the development of GPT-2/3 at OpenAI.
AUTOMATIC1111
The AUTOMATIC1111 version of the Stable Diffusion Web UI is the most feature-rich of the open-source “Stable Diffusion” front ends. In addition to easy operation through the Web UI, it includes almost all features, such as loading additional trained models, additional training methods such as LoRA, face restoration using GFPGAN, and high-quality image upscaling.
The installation procedure is described below.
<Starting up on a Mac>
First, set up the Homebrew environment as for the Text generation web UI. Next, install the necessary tools and clone AUTOMATIC1111 from git.
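A sketch of these steps on macOS (the package list follows the project's Apple Silicon setup notes; treat the exact packages as an assumption for your environment):

```shell
# tools commonly required for building the dependencies on macOS
brew install cmake protobuf rust python@3.10 git wget

# clone AUTOMATIC1111's web UI into the home folder
cd ~
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
```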
This will create a stable-diffusion-webui folder in your home folder. Next, download the training model (e.g. stable-diffusion-v-1-4-original) and move it to the stable-diffusion-webui/models/Stable-diffusion folder.
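Moving the checkpoint into place might look like this (the stable-diffusion-v-1-4-original distribution contains `sd-v1-4.ckpt`; the `~/Downloads` source path is an assumption — adjust it to wherever the file was saved):

```shell
# place the downloaded checkpoint where the web UI looks for models
mv ~/Downloads/sd-v1-4.ckpt ~/stable-diffusion-webui/models/Stable-diffusion/
```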
Now move to the stable-diffusion-webui folder and run the script “webui.sh”. Once it has started, open http://127.0.0.1:7860/ in a browser.
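The launch step, sketched as commands:

```shell
# launch the web UI; on first run the script sets up a virtual
# environment and installs dependencies, which takes a while
cd ~/stable-diffusion-webui
./webui.sh
# then open http://127.0.0.1:7860/ in a browser
```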
All that remains is to enter a prompt, set the options, and press “Generate”; after a few moments the generated image will be output.
<Starting up Windows>
Windows can be set up in much the same way as the procedure for Macs.