coding beacon

[programming & visualization]

Tag Archives: HPC

How to start using OpenCL ASAP

The following assumes the computer has an Intel CPU:

Supported Targets

3rd Generation Intel Core Processors
Intel “Bay Trail” platforms with Intel HD Graphics
4th Generation Intel Core Processors (these currently need a kernel patch; see the “Known Issues” section)
5th Generation Intel Core Processors (“Broadwell”)

To start programming right away, do the following:

1. Get Beignet.

Beignet is an open-source implementation of the OpenCL specification, a generic compute-oriented API. The code base contains what is needed to run OpenCL programs on Intel GPUs: it defines and implements the OpenCL host functions required to initialize the device, create the command queues, kernels, and programs, and run them on the GPU.

At the time of writing (28/03/2015), Beignet's coverage of the OpenCL 1.2 spec is quite complete.

2. Get OpenCL Studio


The OpenCL Programming Book

Eclipse: prepare for OpenCL programming

High Performance Computing Libraries (to be updated)

1. IRC channel #opencl on the freenode network


3. Reference on installing pre-requisites (hardware drivers)

CUDA vs OpenCL

ArrayFire (open source)

“ArrayFire supports both CUDA-capable NVIDIA GPUs and most OpenCL devices, including AMD GPUs/APUs and Intel Xeon Phi co-processors. It also supports mobile OpenCL devices from ARM, Qualcomm, and others. We want your code to run as fast as possible, regardless of the hardware.”

“ArrayFire is a blazing fast software library for GPU computing. Its easy-to-use API and array-based function set make GPU programming simple. A few lines of code in ArrayFire can replace dozens of lines of raw GPU code, saving you valuable time and lowering development costs.”

Getting Started

(written before ArrayFire became open source):

Overview by NVIDIA:

* Download here and install, then append “%AF_PATH%/lib;” to your PATH environment variable

* Sources

* Files required to use ArrayFire from R (prerequisites: source files above)

HPC Hardware & Software


The Parallella project will make parallel computing accessible to everyone.


The Producer:

Starting at a mere $99…



CoreMark is a widely available, generic benchmark specifically targeted at the processor core. Developed by EEMBC, it is a simple yet sophisticated benchmark designed to test the functionality of a processor core. Running CoreMark produces a single-number score, allowing users to make quick comparisons between processors.
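
To make the idea of a single-number score concrete, here is a minimal sketch. It is not CoreMark and does not use CoreMark's workloads or run rules: `toy_workload`, `score_iterations_per_second`, and `run_benchmark` are hypothetical names, and expressing the score as iterations per second is an assumption made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <iostream>

// Hypothetical workload: a small integer checksum loop standing in for
// a real benchmark kernel. It is NOT one of CoreMark's workloads.
std::uint32_t toy_workload(std::uint32_t seed) {
    std::uint32_t x = seed;
    for (int i = 0; i < 1000; ++i) {
        x = x * 1664525u + 1013904223u;  // linear congruential step
        x ^= x >> 13;                    // mix the high bits back in
    }
    return x;
}

// Collapse an iteration count and an elapsed time into one number.
// Returns 0 if no time was measured, to avoid dividing by zero.
double score_iterations_per_second(std::uint64_t iterations, double seconds) {
    return seconds > 0.0 ? static_cast<double>(iterations) / seconds : 0.0;
}

// Run the workload `iterations` times and return the resulting score.
double run_benchmark(std::uint64_t iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;
    std::uint32_t sink = 1;
    const auto start = clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        sink = toy_workload(sink);  // chain results so the work cannot be skipped
    }
    const auto stop = clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    // Print the checksum so the compiler has to keep the computation.
    std::cout << "checksum: " << sink << "\n";
    return score_iterations_per_second(iterations, elapsed.count());
}
```

Calling `run_benchmark(100000)` from `main` and printing the result gives one number per machine. Comparing those numbers is only meaningful when the compiler, flags, and workload are identical, which is the kind of thing a standardized benchmark pins down.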

HPC tools and services

Distributed Computing in C++
-Combine the above two answers: use MPI to communicate across clusters and OpenMP to parallelise across the cores on them. If you have graphics cards, throw CUDA etc. into the mix too. That’s what our distributed clusters do at work.
-Check out zNet, a C++ framework intended for multi-core and distributed programming. It supports streaming of built-in and custom types without any inheritance, transparently supports auto-discovery and load balancing, and is specifically oriented towards making applications scalable on any hardware.
-CloudIQ Engine from Appistry. It allows you to distribute your C++ algorithms across any number of servers for processing. It also provides for process flow management for tasks. As part of the framework, failover is included, so if a task dies midstream (say someone pulls the plug on a machine), that task is automatically restarted on another node. And if that happens as part of a process flow, the whole flow does not have to be restarted, only the latest task. The framework automatically checkpoints your work at each step.
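
The checkpoint-and-restart behaviour described for process flows can be sketched in plain C++. This is a toy, single-process model and not the CloudIQ Engine API: `Task`, `FlowResult`, and `run_flow` are hypothetical names, and a thrown `std::runtime_error` stands in for a machine losing power. The point it illustrates is that a failed task is retried from the last checkpoint, so earlier tasks in the flow are not re-run.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <vector>

// A task maps the last checkpointed state to a new state.
// It may throw std::runtime_error to simulate a node dying mid-task.
using Task = std::function<int(int)>;

struct FlowResult {
    int final_state;  // state after the last task completed
    int attempts;     // total task executions, including retries
};

// Run the tasks in order. After each success the new state becomes the
// checkpoint (here simply a local variable). On failure only the current
// task is retried, starting again from the last checkpoint.
FlowResult run_flow(const std::vector<Task>& tasks, int initial_state,
                    int max_retries_per_task) {
    assert(max_retries_per_task >= 0);
    int checkpoint = initial_state;
    int attempts = 0;
    for (const Task& task : tasks) {
        int retries = 0;
        for (;;) {
            ++attempts;
            try {
                checkpoint = task(checkpoint);  // committed only on success
                break;
            } catch (const std::runtime_error&) {
                if (++retries > max_retries_per_task) throw;  // give up
            }
        }
    }
    return {checkpoint, attempts};
}
```

A real framework would persist the checkpoint outside the process and restart the task on a different node; the control flow, however, has the same shape.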

-OpenMPI and/or OpenMP combinations work best. We use OpenMPI on our supercomputing cluster to process large scientific jobs that require weeks of computing time. As an additional note, Boost.MPI provides C++ bindings for MPI and supports lovely stuff like serialization of STL types (valarray, vectors, strings, etc.), which makes message passing easier.
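
The two-level split these answers describe (MPI between nodes, OpenMP between the cores of each node) can be modelled with nothing but the C++ standard library. The sketch below uses neither MPI nor OpenMP: `split`, `node_sum`, and `cluster_sum` are hypothetical names, the "nodes" run one after another in a single process, and `std::thread` stands in for the per-core parallelism. It only shows how the work is divided and recombined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <thread>
#include <vector>

// Half-open index range [begin, end).
struct Range { std::size_t begin, end; };

// Split [0, n) into `parts` contiguous ranges whose sizes differ by at most 1.
std::vector<Range> split(std::size_t n, std::size_t parts) {
    assert(parts > 0);
    std::vector<Range> out;
    const std::size_t base = n / parts, extra = n % parts;
    std::size_t pos = 0;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t len = base + (p < extra ? 1 : 0);
        out.push_back({pos, pos + len});
        pos += len;
    }
    return out;
}

// Inner level (the role OpenMP would play): one "node" sums its slice
// with several threads, each writing to its own partial-sum slot.
std::int64_t node_sum(const std::vector<int>& data, Range slice,
                      std::size_t threads) {
    const std::vector<Range> chunks = split(slice.end - slice.begin, threads);
    std::vector<std::int64_t> partial(threads, 0);
    std::vector<std::thread> pool;
    const int* base = data.data() + slice.begin;
    for (std::size_t t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            partial[t] = std::accumulate(base + chunks[t].begin,
                                         base + chunks[t].end,
                                         std::int64_t{0});
        });
    }
    for (std::thread& th : pool) th.join();
    return std::accumulate(partial.begin(), partial.end(), std::int64_t{0});
}

// Outer level (the role MPI would play): divide the data across "nodes"
// and combine their results. In a real cluster each node would be a
// separate process on a separate machine, and the final addition would
// be a message-passing reduction.
std::int64_t cluster_sum(const std::vector<int>& data, std::size_t nodes,
                         std::size_t threads) {
    std::int64_t total = 0;
    for (const Range& slice : split(data.size(), nodes)) {
        total += node_sum(data, slice, threads);
    }
    return total;
}
```

In a real hybrid program the outer loop would disappear: each MPI rank would compute only its own slice and the totals would be combined with a reduction such as `MPI_Reduce`, while the inner threads would typically be replaced by an OpenMP `parallel for` with a `reduction` clause.
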
Buying Cluster/Grid/Cloud Time?
-You might want to check out Amazon’s EC2 service. Some people have already done some work on clustering with EC2.
Additionally, Microsoft offers Windows Azure, which has native hooks for .NET but lets you run anything, really (Java, PHP), provided you can load a runtime and code from storage (or deploy them with your app, though that has its own set of pros and cons).
-Amazon’s Elastic Compute Cloud is very interesting. You pay for what you use (memory, CPU, persistent storage), and there are many OS options.

-There is a new service called Amazon Elastic MapReduce, which runs on top of an EC2 cluster. It has APIs in many programming languages, including Ruby and PHP. Also, if you need a more established service, check out GreenPlum.