Thoughts On Native, GPU, Groovy, Java and Clojure

One interest of mine is math and scientific computing, although I admit right now my interest is far greater than my knowledge.

It seems like for a lot of math and scientific computing, you either need to do stuff in C, C++ or even Fortran, or use a library in a higher-level language that lets you interface with libraries in C/C++/Fortran. Another new trend is to use code that runs on the GPU, like OpenCL.

Lately I have been looking at a few Java libraries that interface with lower level code. I think it is a good idea to keep learning new things. I think AI and machine learning will become more important over time.

I contributed briefly to the SciRuby project a few years ago. One of their main subprojects is a Ruby wrapper around BLAS, ATLAS and LAPACK. I think some of the instructions for compiling BLAS, ATLAS and LAPACK might still have a few sentences written by yours truly. (It’s a very involved process.) Another one of their projects is a Ruby wrapper/interface around the GNU Scientific Library, or GSL.

One huge language in AI and data science these days is Python, with the NumPy and SciPy libraries. Again, some of the functionality is provided by linking to ATLAS and LAPACK. If it worked for them, maybe it will work for the JVM family of languages.

There is at least one project in Clojure that does this: Uncomplicate. It has sub-projects for OpenCL and BLAS and ATLAS. Along with Java and Groovy, Clojure is a language I am interested in. But for the time being I cannot use these projects. I think my graphics card is a bit too old and the drivers are too old for many of these libraries.

Uncomplicate is part of the reason I am looking at this area again. Somewhere on the site, the author points at the for development, it is okay to use pre-baked packages for ATLAS and LAPACK. One of these projects that I mention says for the best performance, you may want to hire someone to install ATLAS and LAPACK and other native libraries optimized for your production hardware. I did not know people made a living doing just that.

Another reason I am looking at this is GPU computing was mentioned in one of the side talks at the most recent ClojureConj. This is second- or third-hand, but apparently Goldman Sachs runs a nightly report analyzing the market (or some part of it, I am not too clear). It used to take nine hours to run. It was refactored to run on the GPU, and now it takes 20 minutes. I think all developers should become familiar with some of these technologies.

Will any of this gain traction in the Groovy space? I know things have been rough for Groovy for the past couple of years, especially after getting dropped by Pivotal. It seems like a lot of Groovy developers are happy making web apps with Grails. I like Grails, and it has worked out well for a lot of people. I will go through the Groovy Podcast backlog; perhaps Groovy developers are looking into this and I am just not aware of it.

Groovy developers can use the Java wrappers for native and GPU computing. No need to re-invent the wheel. Maybe going from Groovy to Java to native is not quite as fast as going from Python or Ruby to native, but I don’t think so. Even Ruby developers joke that Ruby is pretty slow. They say that sometimes the speed of the programmer is more important than the speed of the program. (But for some reason the Ruby devs who are moving to Elixir are now saying performance and concurrency are important after years of saying they were not. Whatevs.)

I do not have much experience in native or GPU programming. I am not too clear how NumPy and SciPy are used. I don’t want to swallow the Python ecosystem to get the answers to just a few questions, but: Are NumPy and SciPy programs (or bare-metal math/science apps in general) rewritten frequently? Or do they stay the same for years at a time? It seems to me if a GPU program does not change, you might be better off biting the bullet and doing things in C or C++.

But given the success of NumPy and SciPy, perhaps Groovy and Java can be used in this space. Maybe it’s not the end of the world if a bank’s nightly report takes 22 minutes. (How successful is SciPy? Not only are there conventions for general Python, there are also conferences for just SciPy). I doubt any technology will dethrone Python anytime soon, but it is an area to look at.

Having looked a small bit at some of these libraries, I do wonder how much can the unique features of Groovy be used. A lot of the tests deal with arrays of primitive types.  Someone on the Groovy mailing list wrote: I’ve found that you can really shrink down the lines of code just by using the basics that Groovy provides around collections, operator overloading, Groovy truthiness, etc.. Maybe that is enough reason to use Groovy for high performance computing (or at least higher performance). Can we make code calling native and GPU libraries better with features from Groovy, or will we just get Java without semicolons? Some of these functions do not want an ArrayList of objects, they want a fixed-length array of integers. Granted, you can make primitive arrays in Groovy:

but I think that is more keystrokes than in Java.

Just looking at a small part of TensorFlow, here is a line from the original Java test:

Here are two Groovy versions that both work (meaning the tests still pass):

Granted, it’s one line, but I’d say that’s idiomatic Groovy. Perhaps this could be a (small) market opportunity for the Groovy community. Or maybe I’m just crazy.

If you google “groovy jni” the first hit is this page from Object Partners. It mentions JNA and BridJ which I think are used by a lot of the libraries mentioned below. Frankly, it sounds like a lot of work making this sort of thing happen.

Regardless, even if I have to stick with Java and Clojure, I will still keep an eye on these libraries and this space. I don’t think I can become an expert in all of these, but as I said, I think it is important for developers to keep an eye on this spae. I might start some pages about the different libraries, and perhaps share my thoughts on them. Below are a few notes on what I have found so far.

I started by googling “Java BLAS” or “Java GPU”, and at first I only got a few libraries. But as I kept looking, I kept finding more.

The first two I found where jblas and Aparapi. From the jblas website, jblas “is essentially a light-wight wrapper around the BLAS and LAPACK routines.” (And from what I can gather, the correct capitalization is “jblas”.)

Aparapi is a wrapper around GPU and OpenCL code. If you cannot get your GPU drivers installed properly, it will do everything on the CPU. Sometimes I get core dumps running the tests on my Ubuntu laptop. It is about six years old, and has 6 GB of memory, so perhaps that is the issue. But on my 8 GB Windows laptop two of the tests fail. I plan on getting a new laptop soon, so I will try this again when I do. I bet getting all this set up on an Apple laptop is really easy; for the first time in my life I am thinking about buying an Apple product.

TensorFlow has some Java APIs, although they warn that there may be API changes and breakages for languages other than Python and C. This can use either the CPU or the GPU, but for GPUs it can only work with NVidia cards.

The Lightweight Java Game Library works with OpenCL, OpenGL, Vulkan, EGL and a lot of other stuff I have never heard of. As the name suggests, it is primarily for game development, and the wrappers around the underlying native libraries and standards require more knowledge of those underlying native libraries and standards than most libraries.

There is ND4J (N-Dimensional Arrays for Java) and Deeplearning4j. I am not sure if they are related somehow, but they were both started by developers at a company called Skymind.

And then there is the granddaddy: ByteDeco. There is a lot of stuff here. Some of the contributors work on ND4J and Deeplearning4j, and also work at Skymind. This project provides Java interfaces to 21 different C/C++ libraries: video, math, science, AI, robotics, facial recognition, lots of good stuff here. And after looking at their list, yes, I think “Skymind” is a bit too close to “Skynet”. But at least they’re giving you a hint.

You’re welcome.