Apache Spark: a fast engine for large-scale data processing


Post by KBleivik » Wed May 28, 2014 3:46 pm

1. Project site.

http://spark.apache.org/

Apache Spark™ is a fast and general engine for large-scale data processing. The project site states that it runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.


http://databricks.com/

2. What is Spark?

Apache Spark is a powerful open source processing engine for Hadoop data built around speed, ease of use, and sophisticated analytics. It was originally developed in 2009 in UC Berkeley’s AMPLab, and open sourced in 2010.

In subsequent years it has seen rapid adoption, used by enterprises small and large across a wide range of industries. It has quickly become one of the largest and most active developer communities in big data, with over 100 contributors from 30+ organizations.

Source: http://databricks.com/spark

RDDs (Resilient Distributed Datasets) shard the data over a cluster, like a virtualized, distributed collection (analogous to HDFS). They support intelligent caching, which means no naive flushes of massive datasets to disk. This feature alone allows Spark jobs to run 10-100x faster than comparable MapReduce jobs! The “resilient” part means they will reconstitute shards lost due to process/server crashes.

Source: http://polyglotprogramming.com/
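To make the sharding, caching and "resilient" ideas above a bit more concrete, here is a toy model in plain Python. It is only a sketch and does not use Spark: the class ToyRDD and its methods (from_list, map, cache, lose_partition, collect) are hypothetical names made up for this illustration, and everything runs in one process instead of on a cluster. The point is that each dataset remembers how to compute its partitions, so a cached partition that is lost can be rebuilt on its own without recomputing the rest.

```python
from typing import Any, Callable, Dict, List


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not the Spark API).

    It holds no data itself, only a recipe for computing each partition.
    """

    def __init__(self, num_partitions: int, compute: Callable[[int], List[Any]]):
        self.num_partitions = num_partitions
        self._compute = compute                  # partition index -> records
        self._cache: Dict[int, List[Any]] = {}   # materialized partitions
        self._cache_enabled = False
        self.compute_calls = 0                   # counts real (non-cached) computations

    @classmethod
    def from_list(cls, data: List[Any], num_partitions: int) -> "ToyRDD":
        # Shard the input into contiguous chunks, one per partition.
        chunk = (len(data) + num_partitions - 1) // num_partitions
        shards = [data[i * chunk:(i + 1) * chunk] for i in range(num_partitions)]
        return cls(num_partitions, lambda i: list(shards[i]))

    def map(self, fn: Callable[[Any], Any]) -> "ToyRDD":
        # The child only records how to derive its partition from the parent's.
        parent = self
        return ToyRDD(self.num_partitions,
                      lambda i: [fn(x) for x in parent.partition(i)])

    def cache(self) -> "ToyRDD":
        self._cache_enabled = True
        return self

    def partition(self, i: int) -> List[Any]:
        if self._cache_enabled and i in self._cache:
            return self._cache[i]                # served from memory
        self.compute_calls += 1
        records = self._compute(i)               # (re)build from the recipe
        if self._cache_enabled:
            self._cache[i] = records
        return records

    def lose_partition(self, i: int) -> None:
        # Simulate a crash of the worker that held cached partition i.
        self._cache.pop(i, None)

    def collect(self) -> List[Any]:
        out: List[Any] = []
        for i in range(self.num_partitions):
            out.extend(self.partition(i))
        return out


squares = ToyRDD.from_list(list(range(8)), 4).map(lambda x: x * x).cache()

first = squares.collect()            # computes all 4 partitions
calls_after_first = squares.compute_calls

second = squares.collect()           # all partitions come from the cache
calls_after_second = squares.compute_calls

squares.lose_partition(2)            # "crash": one cached shard disappears
third = squares.collect()            # only partition 2 is recomputed
calls_after_third = squares.compute_calls

print(first == second == third)
print(calls_after_first, calls_after_second, calls_after_third)
```

In this toy the second collect() does no recomputation at all, and after the simulated crash only the lost partition is rebuilt. Real Spark does this across machines and tracks the chain of transformations much more carefully; see the RDD paper and API docs linked below for the real thing.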

See also:

https://gigaom.com/2014/02/27/as-mapred ... l-project/

http://hadoop.apache.org/docs/r2.3.0/ha ... /YARN.html

http://mesos.apache.org/

https://www.usenix.org/system/files/con ... nal138.pdf

http://spark.apache.org/docs/0.8.1/api/ ... d/RDD.html

3. Spark, node.js and backbone.js

Literature: http://www.amazon.com/Anatomy-applicati ... 00HRME7NA/

http://nodejs.org/

http://backbonejs.org/

http://backbonetutorials.com/

https://nodejsmodules.org/tags/spark

http://telruptive.com/tag/node-js/

http://alexdong.com/fanout-architecture-design/

http://nodejsdb.org/

http://strata.oreilly.com/2014/01/learn ... mesos.html

http://howtonode.org/deploying-node-with-spark

http://addyosmani.github.io/backbone-fundamentals/

4. Links

http://www.zdnet.com/faster-more-capabl ... 000026149/

http://java.dzone.com/articles/apache-s ... t-big-data

http://blog.mikiobraun.de/2014/01/apache-spark.html

http://blog.polyglotprogramming.com/

https://github.com/apache/spark

http://planetcassandra.org/blog/post/fa ... -datasets/

5. Literature

http://www.amazon.com/Big-Data-Analytic ... 133837947/

http://www.amazon.com/Learning-Spark-Li ... 449358624/

http://polyglotprogramming.com/papers/S ... eModel.pdf

http://www.amazon.com/Node-js-Action-Mi ... 617290572/

http://www.amazon.com/Node-js-Right-Way ... 937785734/

http://www.amazon.com/Fast-Processing-S ... 782167064/
