The V’s of Big Data
July 2, 2018
Most businesses today generate large amounts of data that can be processed and analyzed to spot trends and improve how they operate. Small volumes of transactions can be handled in Excel sheets or similar software, but once the data starts compounding, it is time to think about big data and its analysis.
Big data is all around us and will not fade away anytime soon. It therefore helps to break it down into the five V’s that describe it: velocity, volume, value, variety and veracity.
Velocity refers to the speed at which this huge data set is generated, collected and analyzed. By one estimate, almost 900 million photos are uploaded to Facebook every day, 400,000 hours of video are uploaded to YouTube, 500 million tweets are posted on Twitter and a whopping 3.5 billion searches are performed on Google. Imagine where all this data goes, and how these platforms still manage to perform their tasks so efficiently. The amount of data grows every second, so big data methods have to be used. These methods help companies absorb the inflow and process it quickly enough that there are no glitches.
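To get a feel for what these daily totals mean as a rate, the short sketch below converts them into average events per second. The figures are the ones quoted above and are not independently verified; the dictionary keys and the `per_second` helper are names made up for this illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in a day

# Daily figures as quoted in the article (illustrative, not verified).
daily_counts = {
    "facebook_photos": 900_000_000,
    "youtube_video_hours": 400_000,
    "tweets": 500_000_000,
    "google_searches": 3_500_000_000,
}


def per_second(daily_total):
    """Return the average number of events per second for a daily total."""
    return daily_total / SECONDS_PER_DAY


# Average rate for each platform, assuming traffic is spread evenly over the day.
rates = {name: per_second(total) for name, total in daily_counts.items()}

for name, rate in rates.items():
    print(f"{name}: about {rate:,.1f} per second")
```

Under the simplifying assumption that uploads are spread evenly across the day, 900 million photos a day works out to more than 10,000 photos every second. Real traffic is bursty, so peak rates would be higher than these averages.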
Volume refers to the incredible amount of data generated each second. To an ordinary person, it may seem like a nuclear explosion of data. It makes little sense to think in terms of small storage units, because the amount of data is growing exponentially every year. Data comes from iPhones, laptops, social media platforms, credit cards, photographs and videos. Facebook has 2 billion users, YouTube has 1 billion users, Instagram has 700 million users and Twitter has 350 million users. These users continuously add to the amount of data through uploads. Collecting and analyzing this data presents a huge challenge for engineers and data scientists.
Value refers to the worth of all this data to the business. Having huge amounts of data is great, but unless you can turn it into profit, it is useless. Huge data sets do not always yield helpful insights, so the data needs to be evaluated against the key functions of your organization. You have to work out whether it can help launch a new product line, whether it presents a cross-sell opportunity or whether it supports a cost-cutting measure. The costs and benefits of big data need to be kept in mind.
Variety in big data refers to the mix of structured and unstructured data generated by machines and humans. Data today is very different from data in the past. An often-quoted figure is that 80% of today’s data is unstructured and cannot be fitted into a table. It includes photos, videos, emails, voicemails, handwritten text, social media posts and so on. Unstructured data follows no fixed rules, while structured data has to fit into a predefined model. Variety is all about being able to classify all this data into categories that can be stored, analyzed and used together.
Veracity refers to the quality or trustworthiness of data. It tells us whether the data is accurate enough for our purposes, and so how meaningful and reliable it is. This is one of the weak points of big data, because some discrepancies are almost always present. As the other properties described above increase, the veracity of big data tends to decrease. Think of social media posts with hashtags, typing errors, abbreviations and so on: they add bulk but lower the quality of the data. Knowing the veracity of your data helps you understand the risks attached to future business plans and avoid repeating past mistakes.
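As a rough illustration of how noise in social media text might be measured, the sketch below computes the share of tokens in a post that are hashtags or informal abbreviations. The abbreviation list, the `noise_ratio` function and the sample posts are all invented for this example; a real pipeline would use far richer checks, such as spell checking and source reliability.

```python
# Hypothetical list of informal abbreviations, for illustration only.
ABBREVIATIONS = {"u", "gr8", "pls", "thx", "btw", "idk"}


def noise_ratio(post):
    """Return the share of tokens that are hashtags or known abbreviations."""
    tokens = post.lower().split()
    if not tokens:
        return 0.0
    noisy = 0
    for token in tokens:
        word = token.strip(".,!?")  # drop surrounding punctuation
        if word.startswith("#") or word in ABBREVIATIONS:
            noisy += 1
    return noisy / len(tokens)


# Made-up sample posts: one clean sentence, one noisy message.
posts = [
    "Quarterly sales rose in the northern region.",
    "gr8 deal pls buy #sale #discount",
]
ratios = [noise_ratio(post) for post in posts]

for post, ratio in zip(posts, ratios):
    print(f"{ratio:.2f}  {post}")
```

A higher ratio suggests a post contributes more bulk than reliable content. A score like this could be used to filter or down-weight records before analysis, although the threshold would depend on the use case.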