There is a lot of buzz around Big Data, around the fact that there is an ever-increasing explosion of the amount of data in the world.
Just think about your own growing collection of family photos. When we were young (or at least when I was young) we did not have more photos than what we could assemble in a photo album, to take out several times a year to look over.
Now, with modern digital cameras, we easily shoot so many pictures that we will reach a point where we could spend the rest of our lives just looking at pictures from the past. Scary, right?
Big Data is the same thing.
The amount of data on the Internet and inside corporations is growing so fast that at some point we will be able to do little else than manage the data of the past, unless we find a better way to deal with it.
Most people who write about Big Data focus on the explosion in the volume of data being added to the world; for example, the petabytes of data created every day on social sites like Facebook and Twitter. These are bigger numbers than most of us can even comprehend.
Some, like Informatica CTO James Markarian, take it a bit further. In the interview “Informatica Gears Up for Big Data in Business,” James describes Big Data as being “about volume, velocity and variety.”
But Big Data is equally about something else — the fourth parameter…
The explosion of the spread of data
Not only is the amount of data exploding, so is the number of places where data resides. We are experiencing an explosion of data silos, so to speak, most notably on the Internet, where thousands of new domains go live every day (according to DomainTools) and where existing websites keep adding more breadth and depth.
It is one thing to handle Big Data volumes with specialized databases, such as those behind the big search engines Google and Bing, or with analytical databases such as Apache Hive, Greenplum, Infobright, ParAccel, SAND, and VoltDB.
It’s equally challenging (maybe even more so) to handle the explosion of data sources. This is something most people who write about Big Data conveniently forget to address.
The explosion in the spread of data also means an explosion in data access points, so we need a much more effective way to access data on the Internet. Not to mention that a majority of these sources won’t have documented APIs.
Today’s data-access methods of coding interfaces, using pre-built adapters or connectors, or copying and pasting the data manually will never be able to handle the data connecting and data harvesting demands of tomorrow.
We need to start a discussion on this topic, because if we don’t deal with it, we will end up with the biggest data-silo problem we can imagine, and it’s not going to be pretty.
I’ll start the discussion:
More and more data are accessed over the web: internally in companies, within business networks, in the cloud, and on the public Internet. There is only ONE common denominator for all of these data silos: they are accessed from a web browser, whether Internet Explorer, Firefox, Safari, or Chrome.
I believe the answer to this Big Data “access” problem lies in that common access method: we need to leverage browser technology to solve it!
Unfortunately, current browser technologies are not built for this, so we need to develop new scalable browser-based data-access technologies. We need to develop the “ETL” of tomorrow.
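To make the idea concrete, consider the extraction half of such a browser-based “ETL”: once a page has been rendered, structured records must be pulled out of its HTML. The sketch below is a toy illustration only, using Python’s standard-library `html.parser` on a hypothetical, made-up listings page; a real solution would also fetch, render (including AJAX), and navigate pages, which is exactly what current browsers do not expose in a scalable, automatable way.

```python
from html.parser import HTMLParser

# Hypothetical listings page as a browser might render it.
# In practice this HTML would come from a rendered, live web page.
PAGE = """
<table id="listings">
  <tr><td class="item">Sedan</td><td class="price">12500</td></tr>
  <tr><td class="item">Coupe</td><td class="price">18900</td></tr>
</table>
"""

class ListingExtractor(HTMLParser):
    """Collects (item, price) pairs from <td> cells tagged with CSS classes."""

    def __init__(self):
        super().__init__()
        self.records = []   # extracted (item, price) tuples
        self._field = None  # class of the <td> we are currently inside
        self._row = {}      # fields gathered for the current <tr>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._field = dict(attrs).get("class")

    def handle_data(self, data):
        if self._field in ("item", "price"):
            self._row[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "td":
            self._field = None
        elif tag == "tr" and self._row:
            self.records.append((self._row.get("item"),
                                 self._row.get("price")))
            self._row = {}

parser = ListingExtractor()
parser.feed(PAGE)
print(parser.records)  # [('Sedan', '12500'), ('Coupe', '18900')]
```

Even this toy shows why hand-coded extraction does not scale: every new site needs new parsing logic, which is precisely the gap a general browser-based data-access technology would have to close.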
Do you agree?
Stefan Andreasen, Founder and CTO
Stefan Andreasen is a true entrepreneur, innovator and networker with more than 25 years of experience in software. He spent five years in Boston with Advanced Visual Systems, working on cutting-edge Java and visual programming projects. In 1998 he started Kapow as the largest European marketplace for cars, real estate and boats for sale. In 2001 he sold the marketplace to the largest bank in Denmark and turned Kapow into a software company, Kapow Software. Its Web ETL software is based on a scalable automation browser with full AJAX support, using flow-chart-based visual navigation and programming to allow hyper-agile integration and automation between applications on the web, both inside and outside the firewall. The software has since grown into a full-powered, enterprise-class cloud and application integration platform now serving more than 500 companies worldwide. In 1995 Stefan Andreasen relocated himself and the company headquarters to Palo Alto, California, where he is now the Founder & CTO of Kapow Software.