Posts

R Programming - Training Ensemble Models with Full CPU Cores

R supports parallel computation through its core parallel package. The doParallel package provides a parallel backend built on top of it. The caret package is used for developing and testing machine learning models in R. Both caret and other packages such as plyr can use multiple CPU cores, provided a parallel backend is registered before the supported functions are called.
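As a rough illustration of the registration order described above, here is a minimal sketch. It assumes the doParallel and caret packages are installed. The built-in iris data set, the "knn" method, the 5-fold cross-validation and the choice to leave one core free are all illustrative assumptions, not details from the post.

```r
library(parallel)    # core package: detectCores(), makeCluster(), stopCluster()
library(doParallel)  # backend: registerDoParallel(); also attaches foreach
library(caret)       # train(), trainControl()

# Assumption: leave one core free for the OS, but never drop below one worker.
# na.rm = TRUE covers platforms where detectCores() returns NA.
n_workers <- max(1L, detectCores() - 1L, na.rm = TRUE)

cl <- makeCluster(n_workers)
registerDoParallel(cl)  # register the backend BEFORE calling train()

# allowParallel = TRUE (the default) lets caret run resampling on the backend.
ctrl <- trainControl(method = "cv", number = 5, allowParallel = TRUE)

set.seed(42)
# "knn" is used only because it needs no extra modelling package; an ensemble
# method such as "rf" could be substituted if randomForest is installed.
fit <- train(Species ~ ., data = iris, method = "knn", trControl = ctrl)

stopCluster(cl)   # release the worker processes
registerDoSEQ()   # fall back to sequential execution for later foreach calls
```

Stopping the cluster and calling registerDoSEQ() afterwards keeps later code from trying to use workers that no longer exist.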

Build a distributed system with multiple nodes of Spark, Kafka, Hadoop and YARN on Linux

PC hardware is now so cheap that buying a couple of extra machines and wiring them into the same computing pool could make a very cost-effective expansion. This is what we are going to build, and we're going to use Ubuntu Linux to do it. Linux can take cluster computing tasks like these in its stride, and you don't need to fork out for a licence for every machine.

Easy Apache Airflow Installation

Apache Airflow is a platform created by the community to programmatically author, schedule and monitor workflows.