Play with Spark: Building Spark MLLib in a Play Spark Application

Originally posted on Knoldus:

In the last post of our Play with Spark! series, we saw how to integrate Spark SQL into a Play Scala application. In this post we will see how to add Spark MLlib to a Play Scala application.

Spark MLlib is a new component under active development. It was first released with Spark 0.8.0. It contains common machine learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction, as well as some optimization primitives. For a detailed list of available algorithms, see the Spark MLlib documentation.

To add Spark MLlib to a Play Scala application, follow these steps:

1). Add the following dependencies to the build.sbt file:

The dependency "org.apache.spark" %% "spark-mllib" % "1.0.1" is specific to Spark MLlib.
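The original dependency listing did not survive in this excerpt, so here is a minimal sketch of what the build.sbt fragment might look like. Only the spark-mllib line comes from the post itself. The spark-core line is an assumption (spark-mllib already pulls it in transitively, and the original post may list other dependencies), and the Play-specific settings of the project are left out entirely.

```scala
// build.sbt (fragment) - a sketch, not the original post's full listing

libraryDependencies ++= Seq(
  // Assumption: spark-core declared explicitly at the same version as MLlib.
  "org.apache.spark" %% "spark-core"  % "1.0.1",
  // From the post: the dependency that is specific to Spark MLlib.
  "org.apache.spark" %% "spark-mllib" % "1.0.1"
)
```

The %% operator appends the project's Scala binary version to the artifact name, so the project's scalaVersion has to be one that Spark 1.0.1 was published for (Scala 2.10).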

As you…

View original (406 more words)

Posted in Diary | Leave a comment

Download the New Impala e-Book from O’Reilly Media

See on Scoop.it: pdg-technologies.com

Cloudera offers enterprises a powerful new data platform built on the popular Apache Hadoop open-source software package.

See on blog.cloudera.com

rebound

See on Scoop.it: PDG Mobile Tech

Rebound is a Java library that models spring dynamics. Rebound spring models can be used to create animations that feel natural by introducing real-world physics to your application.

Rebound is not a general purpose physics library; however, spring dynamics can be used to drive a wide variety of animations. The simplicity of Rebound makes it easy to integrate and use as a building block for creating more complex components like pagers, toggles, and scrollers.

See on facebook.github.io

In-Stream Big Data Processing

Originally posted on Highly Scalable Blog:

The shortcomings and drawbacks of batch-oriented data processing were widely recognized by the Big Data community quite a long time ago. It became clear that real-time query processing and in-stream processing are an immediate need in many practical applications. In recent years, this idea has gained a lot of traction, and a whole bunch of solutions like Twitter’s Storm, Yahoo’s S4, Cloudera’s Impala, Apache Spark, and Apache Tez have appeared and joined the army of Big Data and NoSQL systems. This article is an effort to explore techniques used by developers of in-stream data processing systems, trace the connections of these techniques to massive batch processing and OLTP/OLAP databases, and discuss how one unified query engine can support in-stream, batch, and OLAP processing at the same time.

At Grid Dynamics, we recently needed to build an in-stream data processing system that aimed to crunch about 8 billion events daily providing…

View original (5,219 more words)

Log Analysis System Using Hadoop and MongoDB | CUBRID Blog

See on Scoop.it: pdg-technologies.com

See on www.cubrid.org

Thoughts on NoSQL & Big Data Architecture

Originally posted on Adam's Big Data Discoveries:

I recently had a webpage forwarded to me by someone at work. It’s a very complex diagram of a ‘typical’ Big Data architecture. It also contains a couple of NoSQL databases. I decided to critique it from a pure NoSQL standpoint. If we in the computer industry are doing our jobs right, the diagram should be simpler once we use the correct products and approaches. I’ll detail my thoughts in this article…

View original (4,147 more words)

FalconProposal – Incubator Wiki

See on Scoop.it: pdg-technologies.com

Kun Le's insight:

Falcon is a data processing and management solution for Hadoop designed for data motion, coordination of data pipelines, lifecycle management, and data discovery. Falcon enables end consumers to quickly onboard their data and its associated processing and management tasks on Hadoop.

See on wiki.apache.org
