Clojure and Python integration and machine learning


About Clojure

Clojure has mechanisms that connect it to various other languages. For example, ClojureScript (cljs) compiles cljs code into plain JavaScript using the Google Closure compiler (similar to current alt-JavaScript languages such as TypeScript). There is also a mechanism for generating CSS, so the front end can be written using only Clojure.

In addition to this mechanism, the many libraries of Java, the base platform of Clojure (50-60% of existing systems are built on Java), can be used natively in Clojure, and Clojure itself has excellent frameworks (Datomic, Pedestal, Duct, etc.) that can be used for microservices, which will be discussed later. For these reasons it is widely used worldwide, especially in the field of building web systems (examples in Japan are limited to some advanced companies).

Machine learning using Clojure in conjunction with Python and R

In the area of machine learning, by contrast, environments with rich libraries such as Python and R are used and have become almost the de facto standard. Until recently, Clojure was not at a level where it could freely use the libraries of those environments, and there were hurdles in making full use of the latest algorithms.

In recent years (since 2018), however, frameworks that can interoperate with the Python environment, such as libpython-clj, have appeared, along with fastmath, a mathematical framework that utilizes Java and C libraries, and deep learning frameworks such as Cortex and Deep Diamond. These developments have led to active discussion of approaches to machine learning in places such as scicloj.ml, a well-known machine learning community for Clojure.

Furthermore, fourth-generation AI technology, which is attracting attention as the next generation of AI, has been proposed as a fusion of the third-generation technology of deep learning with the first- and second-generation technologies of knowledge and symbolic reasoning. Clojure descends from Lisp, which was used for those first- and second-generation technologies, and by combining it with machine learning technology through Python and other environments, it is expected that fourth-generation AI technology can be built on a single platform. This possibility is being actively studied, mainly in Europe and the United States.

Implementation with libpython-clj and use of Python library in Clojure

In this article, we describe the setup and use of libpython-clj.

libpython-clj was developed to integrate Python into Clojure at a deep level. The developers want to be able to load and use Python modules as if they were Clojure namespaces, and also to be able to extend Python objects using Clojure. Their detailed vision can be found in their talk at Clojure Conj 2019.

More information can be found on the project's GitHub page. The examples page covers GPT2 text generation from Hugging Face, MXNet MNIST classification using the Module API, PyTorch MNIST, Matplotlib PyPlot, NLTK, SpaCy, Sci SpaCy, Seaborn, UMAP, TRIMAP, Igraph, Leiden, Sklearn, Facebook Prophet, Pygal, Bokeh, OpenCV, psutil, diffprivlib, and transformers.

The specific implementation is described below. First of all, please refer to the following page for information on setting up a development environment for Clojure. To install libpython-clj on an M1 Mac, you need to use JDK 17. The default installation is JDK 18, but it seems to cause some errors, and the author of libpython-clj says that support will come with the next version (19). If you use brew, for example, you can install JDK 17 with “brew install openjdk@17” and add it to your PATH, or you can install it and switch to it using jenv.

Next, create a new project with Leiningen.

lein new libpython-clj-test01

The project file should look like this:

;; project.clj
(defproject libpython-clj-test01 "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "EPL-2.0 OR GPL-2.0-or-later WITH Classpath-exception-2.0"
            :url "https://www.eclipse.org/legal/epl-2.0/"}
  :dependencies [[org.clojure/clojure "1.11.1"]
                 [clj-python/libpython-clj "2.018"]]
  :repl-options {:init-ns libpython-clj-test01.core}
  :jvm-opts ["--add-modules" "jdk.incubator.foreign"
             "--enable-native-access=ALL-UNNAMED"])

At the time of writing, the latest version of the libpython-clj library is "2.018". Note that the JVM options :jvm-opts ["--add-modules" "jdk.incubator.foreign" "--enable-native-access=ALL-UNNAMED"] must be added, with two plain hyphens in front of each option name.

For REPL-based development in Clojure, see “Development Environment of Clojure in SublimeText 4” and related articles.

The namespace settings and configurations are as follows.

(ns libpython-clj-test01.core
  (:require 
      [libpython-clj2.python :as py :refer [py. py.. py.-]]
      [libpython-clj2.require :refer [require-python]])) 

(py/initialize! :python-executable "/Users/XXX/python3"
                :library-path "/Users/XXX/lib/python3.9")

The Python environment is set up in the call to py/initialize!. For :python-executable, run “which python3” in a terminal and copy the path of the Python you want to run (shown here as "/Users/XXX/python3"). For :library-path (shown here as "/Users/XXX/lib/python3.9"), start Python once, check the library path with “import sys” followed by “print(sys.path)”, and copy it.
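The two values passed to py/initialize! can also be read from inside Python itself. The following is a minimal sketch, assuming it is run with the same python3 that should be embedded. The helper name python_environment_info is hypothetical, and the LIBDIR lookup is an extra hint that is not part of the original procedure; which value libpython-clj accepts for :library-path should be checked against its documentation.

```python
import sys
import sysconfig


def python_environment_info():
    """Collect the interpreter path and library locations of the running Python."""
    return {
        # Path of the interpreter that is executing this script
        "executable": sys.executable,
        # Module search path, the same list that print(sys.path) shows
        "sys_path": list(sys.path),
        # Directory holding the Python library files (may be None on some platforms)
        "libdir": sysconfig.get_config_var("LIBDIR"),
    }


if __name__ == "__main__":
    info = python_environment_info()
    print("python executable:", info["executable"])
    print("LIBDIR:", info["libdir"])
    for entry in info["sys_path"]:
        print("sys.path entry:", entry)
```

Running this with the intended interpreter prints candidate paths that can be compared with the ones copied by hand.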

First, start with Hello World.

(py/run-simple-string "print ('Hello World!')")  ;;Hello World!

When this is evaluated in the REPL, the Python code “print('Hello World!')” is executed and “Hello World!” is printed.

Next, import and run numpy.

(require-python '[numpy :as np])   ;ok

(py/from-import numpy average)     ;;#'libpython-clj-test01.core/average

(average [1, 8, 4, 10])            ;;5.75
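As a sanity check on the value that comes back through the bridge, the same arithmetic mean can be computed in plain Python without numpy. This is only a sketch for comparison; the variable names are illustrative.

```python
# The same four numbers that are passed to numpy's average from Clojure
values = [1, 8, 4, 10]

# Arithmetic mean: (1 + 8 + 4 + 10) / 4
expected = sum(values) / len(values)

print(expected)  # 5.75
```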

As a more advanced example, we run zero-shot classification with transformers, a library that provides thousands of pre-trained models built on general-purpose architectures for natural language understanding and generation (BERT, GPT-2, etc.), described in “Overview of Transformer Models, Algorithms, and Examples of Implementations”.

(require-python '[transformers :bind-ns])

(def classifier (py. transformers "pipeline" "zero-shot-classification"))

(def text "French Toast with egg and bacon in the center with with maple syrup on top. 
           Sprinkle with powdered sugar if desired.")

(def labels ["breakfast" "lunch" "dinner"])

(classifier text labels) ;;{'sequence': 'French Toast with egg and bacon in the center with with maple syrup on top. Sprinkle with powdered sugar if desired.',
                         ;; 'labels': ['breakfast', 'lunch', 'dinner'], 
                         ;;'scores': [0.9893278479576111, 0.00738490978255868, 0.0032872725278139114]}

If you enter the text “French Toast with egg and bacon in the center with maple syrup on top. Sprinkle with powdered sugar if desired.” and the categories [“breakfast” “lunch” “dinner”], the classification is computed with the default facebook/bart-large-mnli model and the text is judged to be “breakfast”.
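The result is a dictionary whose "labels" and "scores" entries are parallel lists, so the predicted category is the label paired with the highest score. The sketch below works on a dictionary copied from the output shown above rather than calling the model; the helper name top_label is hypothetical.

```python
# Result dictionary copied from the classifier output shown in the article
result = {
    "sequence": "French Toast with egg and bacon in the center with with maple "
                "syrup on top. Sprinkle with powdered sugar if desired.",
    "labels": ["breakfast", "lunch", "dinner"],
    "scores": [0.9893278479576111, 0.00738490978255868, 0.0032872725278139114],
}


def top_label(result):
    """Return the (label, score) pair with the highest score."""
    pairs = zip(result["labels"], result["scores"])
    return max(pairs, key=lambda pair: pair[1])


label, score = top_label(result)
print(label, score)  # breakfast 0.9893278479576111
```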

Next, we apply LIME, introduced as a tool for explainable machine learning, to explain the prediction of the transformers classifier above, as follows.

(require-python '[lime.lime_text :as lime])

(require-python 'numpy)

(def explainer (lime/LimeTextExplainer :class_names labels))

(defn predict-probs
  [text]
  (let [result (classifier text labels)
        result-scores (get result "scores")
        result-labels (get result "labels")
        result-map (zipmap result-labels result-scores)]
    (mapv (fn [cn]
            (get result-map cn))
          labels)))

(defn predict-texts
  [texts]
  (println "lime texts are " texts)
  (numpy/array (mapv predict-probs texts)))

(predict-texts [text])

(def exp-result
  (py. explainer "explain_instance" text predict-texts
       :num_features 6
       :num_samples 100))

(py. exp-result "save_to_file" "explanation.html")

The following result is obtained.

For the text “French Toast with egg and bacon in the center with maple syrup on top. Sprinkle with powdered sugar if desired.” and the categories [“breakfast”, “lunch”, “dinner”], the classification result was “breakfast”. When LIME was used to explain this classification, the most influential word was “Toast”.
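The predict-probs function in the listing above exists because the explainer is given class_names in a fixed order, while the classifier reports its labels together with their scores and is understood to list them from highest to lowest score. The following sketch shows the same reordering step in plain Python. The function name scores_in_label_order and the shuffled result are hypothetical and used only for illustration.

```python
# Fixed class order, matching the labels handed to the explainer
LABELS = ["breakfast", "lunch", "dinner"]


def scores_in_label_order(result, labels=LABELS):
    """Reorder classifier scores so that position i belongs to labels[i]."""
    by_label = dict(zip(result["labels"], result["scores"]))
    return [by_label[name] for name in labels]


# Hypothetical result in which the classifier ranks "dinner" first
shuffled = {
    "labels": ["dinner", "breakfast", "lunch"],
    "scores": [0.7, 0.2, 0.1],
}

print(scores_in_label_order(shuffled))  # [0.2, 0.1, 0.7]
```

Without this step, a probability could be attributed to the wrong class whenever the classifier's ranking differs from the fixed label order.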

 
