Statistical learning by linking Clojure and R

Clojure has frameworks for interoperating with a variety of other languages. In the previous article, we discussed frameworks for linking Python and Clojure from a machine learning perspective, and described how to run the latest Python machine learning libraries from Clojure using them.

There are several tools for accessing R libraries from Clojure, which are summarized below.

The tools differ in (a) the API and syntax they provide, (b) the type of R backend they use (JRI+REngine / Rserve+REngine / Opencpu / running R from a shell, etc.), and (c) whether a Clojure concept is used as the equivalent of R “data frames” or “matrices” and, if so, which one.

  • Rincanter, introduced by Joel Boehland in early 2010, uses JRI (R from Java using JNI) for the R backend and Incanter, a then-popular R-inspired Clojure library, for data abstraction. (2012)
  • This Rincanter fork by Vladimir Kadychevski changes the R backend to Rserve via a REngine layer on the Java side that supports both Rserve and JRI as backends. This change allows R to run in a separate process, making R-interop more robust and production-friendly. (2015)
  • A later fork by the skm-ice group continued work on the API (not currently available).
  • Rojure, presented by Carsten Behring, changed the data abstraction from Incanter to core.matrix. At the time, core.matrix was becoming the standard data abstraction layer used by several libraries, and eventually by Incanter itself. This allowed Rojure to be completely independent of Incanter while still being usable from Incanter. (2017)
  • clj-jri by SAWADA Takahiro / Gugen Koubou LLC is a simple R wrapper using JRI. It does not support data-frame-like structures on the Clojure side. (2014)
  • rashinban by Takahiro Noda is another simple Clojure library for calling R through Rserve, with a clean and simple API based on wrapping R functions in Clojure functions. It does not support concepts like data frames. (2015)
  • Opencpu-clj is another early project by Carsten Behring, using Opencpu for the R backend. (Some experiments to generalize it to ClojureScript were made, but they are now broken.) (2016)
  • gg4clj by Jony Hudson is the immediate follow-up to Opencpu-clj. Its main purpose is to wrap ggplot2, R’s famous “grammar of graphics” library (but it actually introduces some ideas that can be applied more generally). It works by running every computation in a completely new R process. An interesting innovation of this library is its EDN-like syntax, which is translated into R. (2014)
  • huri by Simon Belak is a general library for data science that does much more than just call R. One of its components is a collection of functions for data visualization, building on the methods provided by gg4clj and adding a simple and compact way to create ggplot2 plots in Clojure. (2017)
  • graalvm-interop is a project by David Pham for interoperating with FastR on GraalVM. (2019)
  • clojisr from scicloj is a relatively new tool that connects Clojure to R through R’s server, Rserve. (2022)

Among these, we will discuss Clojisr and Rojure, which are relatively stable and available.

First, let’s talk about Clojisr. It rhymes with “kisser”. Since it is still in its infancy, even its Git page says that it is still evolving and not recommended for production use. However, it is already being used in some data science work. The final goal of the development is to make it a beginner-friendly tool in the Clojure ecosystem, while the technical goal is to combine the best of the tools that have been released so far.

The first step is to install Rserve, the server for R. R itself must already be installed and runnable from the command line. Start R on the command line and install the “Rserve” package.

>install.packages("Rserve", repos="http://cran.rstudio.com/")

Next, start Rserve on the command line

>R CMD Rserve
R version 4.2.1 (2022-06-23) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: aarch64-apple-darwin21.5.0 (64-bit)

R is free software and completely free of warranty. 
You are free to redistribute it under certain conditions. 
For more information on distribution terms, type 'license()' or 'licence()'. 
R is a collaborative project with many contributors. 
Type 'contributors()' for more information. 
Also enter 'citation()' for the format in which R and R packages are cited in publications. 
Type 'demo()' to see a demo. 
Type 'help()' to get online help. 
You can get help in an HTML browser by typing 'help.start()'. 
Type 'q()' to exit R.

Rserv started in daemon mode.
>

If Rserve is running, the following command will show its process ID (the first column of the output)

ps ax | grep Rserve 
 3021   ??  Ss     0:00.00 /opt/homebrew/Cellar/r/4.2.1_2/lib/R/bin/Rserve
 3059 s001  R+     0:00.00 grep Rserve

To stop Rserve, enter the following command

> kill 3021
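
As a complement to the ps check above, the following is a minimal sketch of how the same check could be done from the Clojure side by trying to open a TCP connection. The helper name port-open? is hypothetical, and the sketch assumes that Rserve listens on localhost at its default port 6311; a custom Rserve configuration may use a different port.

```clojure
(import '(java.net Socket InetSocketAddress))

(defn port-open?
  "Hypothetical helper: returns true if a TCP connection to host:port
   succeeds within timeout-ms milliseconds, false otherwise."
  [host port timeout-ms]
  (try
    (with-open [sock (Socket.)]
      (.connect sock (InetSocketAddress. ^String host (int port)) (int timeout-ms))
      true)
    ;; connection refused and timeouts are both IOExceptions
    (catch java.io.IOException _ false)))

;; Assumption: 6311 is the default Rserve port.
(println "Rserve reachable?" (port-open? "localhost" 6311 500))
```

This only shows that something is accepting connections on that port; it does not verify that the listener is actually Rserve.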

Next, create a project template by typing “lein new cljisr-test01” in a terminal in any directory, and add [scicloj/clojisr “1.0.0-BETA20”] and the machine learning data library [techascent/tech.ml.dataset “6.094”] to the dependencies in project.clj.

(defproject cljisr-test01 "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "EPL-2.0 OR GPL-2.0-or-later WITH Classpath-exception-2.0"
            :url "https://www.eclipse.org/legal/epl-2.0/"}
  :dependencies [[org.clojure/clojure "1.11.1"]
                 [scicloj/clojisr "1.0.0-BETA20"]
                 [techascent/tech.ml.dataset "6.094"]]
  :repl-options {:init-ns cljisr-test01.core}
  :jvm-opts ["-Dclojure.tools.logging.factory=clojure.tools.logging.impl/jul-factory"])
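
For readers who use the Clojure CLI instead of Leiningen, the same dependencies could be declared roughly as follows. This is only a sketch: the deps.edn file and the :dev alias name are assumptions and not part of the original setup, while the library coordinates and the JVM option are copied from the project.clj above.

```clojure
;; deps.edn (hypothetical equivalent of the project.clj above)
{:deps {org.clojure/clojure        {:mvn/version "1.11.1"}
        scicloj/clojisr            {:mvn/version "1.0.0-BETA20"}
        techascent/tech.ml.dataset {:mvn/version "6.094"}}
 :aliases
 ;; :dev is an arbitrary alias name chosen for this sketch
 {:dev {:jvm-opts ["-Dclojure.tools.logging.factory=clojure.tools.logging.impl/jul-factory"]}}}
```

With this file in place, a REPL with the JVM option applied would be started with clj -A:dev.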

Let’s actually try to work with R. First, let’s run R code by passing it directly as a string.

(ns cljisr-test01.core)

(require '[clojisr.v1.r :as r :refer
            [r eval-r->java r->java java->r java->clj java->native-clj
             clj->java r->clj clj->r ->code r+ colon require-r]]
          '[clojisr.v1.robject :as robject]
          '[clojisr.v1.session :as session]
          '[tech.v3.dataset :as dataset])


(r "mean(rnorm(100000,mean=1.0,sd=3.0))")  ;; e.g. 0.9851342 (random, varies per run)

(r
   "abc <- runif(1000);
          f <- function(x) {mean(log(x))};
          f(abc)")  ;; e.g. -1.001936 (random, varies per run)

Next, try combining Clojure and R operations.

(def x (r "1+2"))

(->> x
      r->clj)  ;; [3.0]

(def f (r "function(x) x*10"))

(->> 5
      f
      r->clj)    ;; [50.0]

(->> "5*5"
      r
      f
      r->clj)    ;; [250.0]

The first expression, (r “1+2”), evaluates 1+2 in R, and the answer is 3.0. The next one defines “function(x) x*10” in R as f and applies it to the Clojure value 5, so the answer is 50.0. The last one evaluates “5*5” in R and passes the result to f, so the answer is 250.0.

Finally, let’s draw plots. Both the base R plot and ggplot generate an SVG file, which is written to the out folder. The necessary R packages are loaded with require-r (the equivalent of library() in R) and must be installed beforehand using install.packages.

(require-r '[graphics :refer [plot hist]])
(require-r '[ggplot2 :refer [ggplot aes geom_point xlab ylab labs]])
(require '[clojisr.v1.applications.plotting :refer
           [plot->svg plot->file plot->buffered-image]])

(spit "out/out-test01.svg" (plot->svg (fn []
              (->> rand
                   (repeatedly 30)
                   (reductions +)
                   (plot :xlab "t" :ylab "y" :type "l")))))
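
To see what data the plot call above receives, the data pipeline can be run on its own, without R. The sketch below wraps it in a function; the name random-walk is hypothetical and not part of clojisr.

```clojure
(defn random-walk
  "Hypothetical helper: returns a lazy seq of n running sums of
   uniform random steps in [0, 1), as in the plotting example above."
  [n]
  (->> rand
       (repeatedly n)     ; n random steps
       (reductions +)))   ; running (cumulative) sums

(def walk (random-walk 30))

(println (count walk))      ; 30 points, one per step
(println (apply <= walk))   ; true: the steps are never negative
```

Because every step is non-negative, the resulting line plot rises monotonically from left to right.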


(spit "out/pout-test02.svg" (plot->svg
   (let [x (repeatedly 99 rand)
         y (map + x (repeatedly 99 rand))]
     (-> {:x x, :y y}
         dataset/->dataset
         (ggplot (aes :x x :y y :color '(+ x y) :size '(/ x y)))
         (r+ (geom_point) (xlab "x") (ylab "y"))))))
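
Note that spit does not create missing directories, so the calls above assume that the out folder already exists. A minimal sketch of a wrapper that creates it first, using clojure.java.io/make-parents, is shown below; the function name spit-with-parents! is hypothetical, and the SVG string is a placeholder standing in for the result of plot->svg.

```clojure
(require '[clojure.java.io :as io])

(defn spit-with-parents!
  "Hypothetical helper: writes content to path, creating any missing
   parent directories first. Returns the path."
  [path content]
  (io/make-parents path)   ; creates out/ (and any ancestors) if missing
  (spit path content)
  path)

;; Placeholder content; in the article this would be (plot->svg ...).
(spit-with-parents! "out/example.svg" "<svg xmlns=\"http://www.w3.org/2000/svg\"/>")
```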

The output results are as follows

Output result by R graphics

Output result by ggplot

Rojure is also described below for reference, but its plotting support is unstable, so we generally recommend using Clojisr.

Rojure uses Rserve for the backend and changes the data abstraction from the R-inspired Incanter used by Rincanter and others to core.matrix, a Clojure array abstraction. Rserve must be installed and started in the same way as for Clojisr.

For a concrete implementation, add [rojure “0.2.0”] to the project.clj file.

(defproject roj-test01 "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "EPL-2.0 OR GPL-2.0-or-later WITH Classpath-exception-2.0"
            :url "https://www.eclipse.org/legal/epl-2.0/"}
  :dependencies [[org.clojure/clojure "1.11.1"]
                 [rojure "0.2.0"]]      ;;<-add
  :repl-options {:init-ns roj-test01.core})

Next, make the R code work in rojure.

(ns roj-test01.core)

(use '(rojure core))

(def r (get-r))

(r-eval r "data(iris)")

(r-eval r "iris")

(r-eval r "x11()")

(r-eval r "plot(Sepal.Length ~ Sepal.Width, data = iris)")

(r-eval r "dev.off()")

The iris data is plotted in the same way as it would be in R itself.
