Blog

Big Data Correlations (May 07, 2014)

Big Data is about «what» instead of «why».

We don’t have to worry about causality anymore. Instead of exploring cause and effect, we work on finding correlations between factors.

The word "correlation" is made of "co" (meaning "together") and "relation". Correlation is a statistical measure that shows whether, and how strongly, pairs of variables are related. When two sets of data are strongly linked together, we say they have a high correlation.
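To make "how strongly pairs of variables are related" concrete, here is a minimal sketch of the Pearson correlation coefficient, one common way to measure linear correlation. The function name `pearson` and the sample figures are made up for illustration; they are not taken from any real dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalised because the
    # 1/n factors cancel in the final ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample: daily temperature vs. ice cream sales.
temperature = [14, 16, 20, 23, 27, 30]
ice_cream_sales = [215, 325, 410, 520, 610, 700]

r = pearson(temperature, ice_cream_sales)
print(f"r = {r:.3f}")
```

A value close to +1 or -1 means a strong linear relationship, and a value near 0 means a weak one. Note that this coefficient only captures linear relations, so other measures are needed for non-linear patterns.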

In Big Data, we use the term «correlation» to identify «what» instead of «why». A correlation does not state «certainty» but «high probability».

Correlation analysis with Big Data takes the place of causal research and empowers non-causal research by revealing non-linear relationships and hidden patterns in the data that have never been seen before.

Analog vs. Digital Data Trends (April 16, 2014)

In 2000, only 25% of all the data in the world was digital. The other 75% was on film, paper, magnetic tape and other analog media. Over the following years the situation quickly turned upside down. By 2007, only 7% of all data was analog; the rest was digital. According to this trend, the amount of analog data is not increasing, while digital data doubles every 3 years. As of 2014, the total amount of data stored in the world is expected to be 1,200 exabytes, of which only 2% will be analog.
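The arithmetic behind this trend can be sketched in a few lines. The snippet below takes the 2014 figures from the paragraph above (1,200 exabytes in total, 2% of it analog) and assumes, purely for illustration, that the analog volume stays flat while the digital volume keeps doubling every 3 years. The function names are hypothetical and the result is an extrapolation of the stated trend, not a forecast.

```python
TOTAL_2014_EB = 1200.0      # total stored data in 2014, in exabytes (from the post)
ANALOG_SHARE_2014 = 0.02    # analog share in 2014 (from the post)
DOUBLING_PERIOD_YEARS = 3.0  # digital data doubles every 3 years (from the post)

analog_eb = TOTAL_2014_EB * ANALOG_SHARE_2014   # about 24 EB
digital_eb = TOTAL_2014_EB - analog_eb          # about 1,176 EB


def project_digital(start_eb, years, doubling_period=DOUBLING_PERIOD_YEARS):
    """Exponential growth: the volume doubles once per doubling period."""
    return start_eb * 2 ** (years / doubling_period)


def analog_share(years_after_2014):
    """Analog share of all data, assuming the analog volume stays constant."""
    digital = project_digital(digital_eb, years_after_2014)
    return analog_eb / (analog_eb + digital)


for years in (0, 3, 6):
    print(f"2014 + {years} years: analog share = {analog_share(years):.2%}")
```

Under these assumptions the analog share roughly halves with every doubling of the digital volume.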

Apache Cassandra (March 10, 2014)

Apache Cassandra is a high-performance, extremely scalable, fault-tolerant (i.e. no single point of failure), distributed post-relational database solution. Cassandra combines ideas from Google Bigtable and Amazon Dynamo to handle the types of database management needs that traditional RDBMS vendors cannot support.

Cassandra also places a high value on performance. University of Toronto researchers studying NoSQL systems concluded that "In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments."

Cassandra is essentially a hybrid between a key-value and a column-oriented (or tabular) database. Each key in Cassandra corresponds to a value which is an object. Each key has values as columns; columns are grouped together into sets called column families.
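The key, column and column family structure described above can be pictured with a small in-memory sketch. This is only a conceptual model built from Python dictionaries; the `ColumnFamily` class and its methods are hypothetical names for illustration and are neither a Cassandra driver API nor a description of how Cassandra stores data on disk.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy model: each row key maps to its own set of named columns."""

    def __init__(self, name):
        self.name = name
        self._rows = defaultdict(dict)  # row key -> {column name: value}

    def insert(self, row_key, columns):
        # Rows do not have to share the same set of columns.
        self._rows[row_key].update(columns)

    def get(self, row_key, column=None):
        # Use dict.get so that reading a missing key does not create a row.
        row = self._rows.get(row_key, {})
        if column is None:
            return dict(row)
        return row.get(column)


users = ColumnFamily("users")
users.insert("alice", {"email": "alice@example.com", "city": "Istanbul"})
users.insert("bob", {"email": "bob@example.com"})  # no "city" column

print(users.get("alice"))
print(users.get("bob", "city"))
```

The key-value side shows up in the lookup by row key, and the column-oriented side shows up in the fact that every row carries its own named columns.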

http://cassandra.apache.org/

Cassandra Query Language (CQL) (March 07, 2014)

Cassandra Query Language (CQL) is an SQL-like language for querying Cassandra (SQL stands for Structured Query Language). CQL is the default and primary interface into the Cassandra DBMS. CQL provides a new API to Cassandra that is simpler than the Thrift API for new applications.

The Thrift API, the Cassandra Command Line Interface (CLI), and legacy versions of CQL expose the internal storage structure of Cassandra. CQL adds an abstraction layer that hides implementation details and provides native syntaxes for CQL collections and other common encodings.

MongoDB (March 03, 2014)

MongoDB is a cross-platform document-oriented database system. The name MongoDB comes from the word "humongous", which means extremely large, enormous. Written in C++, MongoDB is an open-source document database and one of the most popular NoSQL databases.

Classified as a NoSQL database, MongoDB uses JSON-like documents with dynamic schemas (MongoDB calls the format BSON) instead of the traditional table-based relational database structure, making the integration of data in certain types of applications easier and faster.
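As a rough illustration of what "JSON-like documents with dynamic schemas" means, the sketch below models a collection as a plain Python list of dictionaries. It does not use a MongoDB driver; the `find` helper is a hypothetical stand-in that only mimics simple equality filters, and the standard `json` module is used for display only (BSON itself is a binary format).

```python
import json

# Two documents in the same collection with different fields:
# no table schema has to be declared or altered first.
collection = [
    {"_id": 1, "name": "Alice", "languages": ["Python", "C++"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Izmir", "zip": "35000"}},
]


def find(docs, query):
    """Return the documents whose top-level fields equal every query value."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in query.items())
    ]


matches = find(collection, {"name": "Alice"})
print(json.dumps(matches, indent=2))
```

Nested values such as the `address` sub-document or the `languages` array live inside the document itself, where a relational design would typically split them into separate tables.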

https://www.mongodb.org/

Cassandra vs. HBase vs. MongoDB (March 01, 2014)

A good comparison of the most popular NoSQL database systems with respect to database model, partitioning, replication, consistency, API and programming language support, scripting and other technical concepts:
http://db-engines.com/en/system/Cassandra%3BHBase%3BMongoDB