What is big data?

Post by KBleivik

1. Introduction.

Help, I have 1 TB of sales data in a database. Is that big data? If you have the computing power to handle, mine, and analyze that data, it is not big data according to the definition in point 2 below. What is big data to me and my company need not be big data to another company. So what we call big data here depends on the context and environment.

2. Big data defined and explained.
Big data defines a situation in which data sets have grown to such enormous sizes that conventional information technologies can no longer effectively handle either the size of the data set or the scale and the growth of the data set. In other words, the data set has grown so large that it is difficult to manage and even harder to garner value out of it. The primary difficulties are the acquisition, storage, searching, sharing, analytics, and visualization of the data.
Source: Frank J. Ohlhorst (January 2013) "Big Data Analytics: Turning Big Data into Big Money" ISBN: 978-1-118-14759-7, page 1.

Even though that is not a definition in the strict sense, it is good enough for us. In short, big data is a data set so big that it is difficult or impossible for you or your company to store and handle. For you, the data is so big that it is more or less worthless. To another person or company, the same data set may have value.
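To make the point about context concrete, here is a minimal Python sketch. The function name `is_big_data` and the two capacity figures are made up for illustration; they do not come from Ohlhorst's book. The only idea taken from the definition is that the same data set is compared with what its owner can actually handle.

```python
TB = 10**12  # one terabyte in bytes (decimal convention)


def is_big_data(dataset_bytes: int, capacity_bytes: int) -> bool:
    """Return True when the data set is larger than what the owner can handle.

    capacity_bytes is a hypothetical single number standing in for the
    storage and processing resources a person or company has available.
    """
    return dataset_bytes > capacity_bytes


sales_data = 1 * TB                   # the 1 TB of sales data from point 1
small_shop_capacity = 200 * 10**9     # assumed: a company that can handle 200 GB
large_firm_capacity = 50 * TB         # assumed: a company that can handle 50 TB

# The same data set gets a different answer depending on who owns it.
small_shop_answer = is_big_data(sales_data, small_shop_capacity)
large_firm_answer = is_big_data(sales_data, large_firm_capacity)
print(small_shop_answer, large_firm_answer)
```

In practice capacity is not a single number (storage, memory, and analysis tools all matter), so treat this only as a picture of the relative definition.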

3. Storage.

I am from Norway, and some years ago a Norwegian company, Opticom ASA (search for the name if you need English sources), told us that it was working, in cooperation with Intel, on a storage technology so compact that all the information on the internet around 2005 could be stored on a medium the size of a credit card. The technology did not succeed, and in 2006 Opticom was bought by its more successful subsidiary Fast Search & Transfer ASA, which was in turn bought by Microsoft in 2008. From 1999, the research was carried out through its subsidiary Thin Film Electronics (TFE). The commercial breakthrough has so far been long in coming, so it is unclear what remains of the research and the technology.

Links:

http://www.thinfilm.no/

Opticom ASA (OPC) One of the best kept secret or just another "to much talk" company?

http://www.crmz.com/Report/ReportPrevie ... Id=5488396

4. In memory databases and computing technologies.

http://www.forbes.com/sites/ciocentral/ ... verything/

http://hortonworks.com/blog/a-modern-da ... in-memory/

http://www.informationweek.com/software ... id/1114088

http://www.information-age.com/technolo ... mainstream

https://www.sqlite.org/inmemorydb.html

http://www.memsql.com/

http://www.sas.com/high-performance-ana ... emory.html

http://www.informationweek.com/big-data ... id/1113609

http://www.ibmbigdatahub.com/blog/speed ... processing

http://blog.cloudera.com/blog/2013/11/p ... lications/

http://www.sap.com/pc/tech/in-memory-co ... -hana.html

http://www.oracle.com/us/corporate/feat ... index.html

http://www.webopedia.com/TERM/I/in-memory_database.html
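As a small taste of what the links above are about, here is a minimal sketch of an in-memory database using Python's standard `sqlite3` module. Connecting to the special name `:memory:` keeps the whole database in RAM, as described on the SQLite page linked above. The `sales` table and its rows are invented for illustration.

```python
import sqlite3

# ":memory:" tells SQLite to keep the database in RAM instead of in a file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")

# Hypothetical sample rows, only here to have something to query.
rows = [("north", 120.0), ("south", 80.5), ("north", 45.0)]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Aggregate entirely in memory; each result row is a (region, total) pair.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
print(totals)

# Closing the connection discards the in-memory database.
conn.close()
```

The trade-off is the usual one for in-memory systems: no disk access on the query path, but the data must fit in RAM and is gone when the connection closes unless you save it elsewhere.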

5. Node.js solutions

http://nodejs.org/

https://www.youtube.com/watch?v=SAc0vQCC6UQ

http://java.dzone.com/articles/using-no ... ms-massage

http://codedependant.net/blog/posts/20- ... tic-searc/

http://www.joyent.com/developers/videos ... nd-node-js

http://www.alolo.co/blog/2013/10/13/bui ... th-node-js

https://github.com/louischatriot/nedb

http://ejdb.org/

http://genomu.com/

https://github.com/petersirka/nosql

http://globalsdb.org/

http://www.macwright.org/presentations/nodedc

https://blog.codecentric.de/en/2014/01/ ... hiecharts/

http://www.linkedin.com/groups/How-woul ... .185579081

http://blog.treasure-data.com/post/3667 ... plications

6. Online articles and reports.

How Cloudera plans to stand out from the Hadoop herd

FICO Buys Hadoop Firm to ‘Democratize Analytics’

http://events.pentaho.com/EMA-big-data-buzz-report.html

7. Literature.

Frank J. Ohlhorst (January 2013) "Big Data Analytics: Turning Big Data into Big Money" ISBN: 978-1-118-14759-7

8. Where can big datasets be found?

http://www.quora.com/Data/Where-can-I-f ... the-public

http://www.kdnuggets.com/datasets/index.html

http://stackoverflow.com/questions/3818 ... c-datasets

http://hadoopilluminated.com/hadoop_ill ... _Sets.html

http://aws.amazon.com/publicdatasets/

http://blog.revolutionanalytics.com/201 ... for-r.html

http://www.freebase.com/

http://www.infochimps.com/ more specifically here http://www.infochimps.com/tags/bigdata

http://www.datasciencecentral.com/profi ... e-for-free

http://www.bigdata-startups.com/

https://www.kaggle.com/

http://www.crunchbase.com/

https://timetric.com/

https://www.google.com/publicdata/directory

http://lemire.me/blog/archives/2012/03/ ... -research/

9. Other resources.

http://www.ibm.com/smarterplanet/us/en/ibmwatson/

http://nosql-database.org/

http://www.datacenterknowledge.com/arch ... /big-data/

http://www-01.ibm.com/software/ebusines ... bigsheets/

http://80legs.com/

http://www.mapr.com/

http://www.cloudera.com/

http://linfo.org/grep.html

https://www.mturk.com/

http://events.pentaho.com/analyst-forrester.html

http://hadoop.apache.org/

http://cassandra.apache.org/

http://hibernate.org/orm/

http://research.google.com/archive/bigtable.html

https://github.com/OpenRefine

http://www.sas.com/en_us/software/sas-hadoop.html

https://pig.apache.org/

https://hbase.apache.org/

http://www.hazelcast.com/

http://extractiv.com/

https://www.mozenda.com/

http://www.tableausoftware.com/

http://www.openheatmap.com/

https://gephi.org/

http://commoncrawl.org/

https://books.google.com/ngrams

10. Related forum threads.

http://www.oopschool.com/phpBB3/viewtop ... p=355#p355

http://www.oopschool.com/phpBB3/viewforum.php?f=55
