There is a lot of buzz around Big Data, around the ever-accelerating explosion of data in the world.
Just think about your own growing collection of family photos. When we were young (or at least when I was young) we did not have more photos than what we could assemble in a photo album, to take out several times a year to look over.
Now, with modern digital cameras, we easily shoot so many pictures that we will reach a point where we could spend the rest of our lives just looking at pictures from the past. Scary, right?
Big Data is the same thing.
The amount of data on the Internet and inside corporations is growing so fast that, unless we find a way to deal with it, at some point we will be able to do little else than manage the data of the past.
Most people who write about Big Data focus on the explosion in the volume of data being added to the world; for example, the data, counted in petabytes per day, created on social sites like Facebook and Twitter. These are bigger numbers than most of us can even comprehend.
Some, like the CTO of Informatica, James Markarian, take it a bit further in this interview, “Informatica Gears Up for Big Data in Business.” James describes Big Data as “about volume, velocity and variety.”
But Big Data is equally about something else — the fourth parameter…
The explosion of the spread of data
Not only is the amount of data exploding, so is the number of places where data reside. We are experiencing an explosion of data silos, so to speak, most notably on the Internet, where thousands of new domains go live every day (according to DomainTools), and where existing websites keep adding more breadth and depth.
It is one thing to handle Big Data volumes with specialized databases, like those behind the big search engines such as Google and Bing, or with analytic databases such as Apache Hive, Greenplum, Infobright, ParAccel, SAND, and VoltDB.
It’s equally challenging (maybe even more so) to handle the explosion of data sources. This is something most people who write about Big Data conveniently forget to address.
The explosion in the spread of data also means an explosion in data access points, so we need a much more effective way to access data on the Internet. Not to mention that a majority of these sources won't have documented APIs.
Today’s data-access methods of coding interfaces by hand, using pre-built adapters or connectors, or copying and pasting the data manually will never be able to handle the data-connecting and data-harvesting demands of tomorrow.
We need to start a discussion on this topic, because if we don’t deal with it, we will end up with the biggest data-silo problem we can imagine, and it’s not going to be pretty.
I’ll start the discussion:
More and more data are accessed over the Internet: internally in companies, within business networks, and in the cloud. There is only ONE common denominator for all of these data silos: they are accessed from a web browser, whether Internet Explorer, Firefox, Safari, or Chrome.
I believe the answer to this Big Data “access” problem lies in that common access method: we need to leverage browser technology to solve it!
Unfortunately, current browser technologies are not built for this, so we need to develop new scalable browser-based data-access technologies. We need to develop the “ETL” of tomorrow.
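To make the idea concrete, here is a minimal sketch (not Kapow's actual technology) of what "screen-level" data harvesting means: when a site exposes no documented API, the only universal interface is the HTML the browser renders, so an extractor has to pull structure out of the page itself. This toy example uses only Python's standard library; the `LinkHarvester` class and the sample page are illustrative assumptions.

```python
from html.parser import HTMLParser

class LinkHarvester(HTMLParser):
    """Collects (text, href) pairs for every anchor tag in a page."""
    def __init__(self):
        super().__init__()
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor
        self.links = []     # harvested (text, href) results

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None

# In practice the HTML would come from a live page fetched over HTTP;
# a static snippet stands in for it here.
page = ('<ul><li><a href="/q1.csv">Q1 results</a></li>'
        '<li><a href="/q2.csv">Q2 results</a></li></ul>')
harvester = LinkHarvester()
harvester.feed(page)
print(harvester.links)  # [('Q1 results', '/q1.csv'), ('Q2 results', '/q2.csv')]
```

A real browser-based ETL platform would go much further than this, executing JavaScript and navigating AJAX-driven pages the way a human does, but the core problem is the same: turning rendered pages into structured, harvestable data.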
Do you agree?
Stefan Andreasen, Founder and CTO
Stefan Andreasen is a true entrepreneur, innovator, and networker with more than 25 years' experience in software. He spent five years in Boston with Advanced Visual Systems working on cutting-edge Java and visual programming projects. In 1998 he started Kapow as the largest European marketplace for cars, real estate, and boats for sale. Its Web ETL software is based on a scalable automation browser with full AJAX support, using flow-chart-based visual navigation and programming to allow for hyper-agile integration and automation between applications on the web (inside and outside the firewall). In 2001 Stefan Andreasen sold the marketplace to the largest bank in Denmark and changed Kapow into a software company, Kapow Software. The software was developed into a full-powered, enterprise-class cloud and application integration platform now serving more than 500 companies worldwide. In 1995 Stefan Andreasen relocated himself and the company headquarters to Palo Alto, California, where he is now the Founder & CTO of Kapow Software.