For those unfamiliar with the ‘R’ technology, it’s a programming language and software environment that is used extensively for statistical programming and graphics. It is a favorite of what we are now calling “Data Scientists.” For those interested, as a GNU project, it is freely available under the GNU General Public License.
This brief background is prompted by an announcement from Big Data solutions provider 1010data, which touts its Big Data platform as currently the fastest available. The company has introduced R1010, a new solution package that turbo-charges R. For the emerging and increasingly important group of enterprise data scientists, this is significant news.
The R1010 solution integrates directly with 1010data’s Big Data Discovery platform. It provides an interface for using the data and advanced analytics within 1010data directly from the R console, unifying the power of both technologies.
The power of R and Big Data
This new package takes direct aim at a long-standing challenge in using R with large data sets: R runs as a single thread on a computer. According to 1010data, the R1010 package combines the platform’s ability to analyze unlimited volumes of data with the broad set of statistical functions familiar to the R community, enabling data scientists to build analytical models on large-scale data at an unprecedented rate.
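To make the single-thread point concrete, here is a minimal sketch in Python of the general idea behind parallel platforms: split the data into chunks, summarize each chunk independently, then combine the partial results. It is only an illustration and not 1010data’s implementation or the R1010 API. The function names `partial_sum_count` and `chunked_mean` are hypothetical, and the threads merely stand in for the separate workers a real platform would use.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_count(chunk):
    """Summarize one chunk as (sum, count) so chunks can be combined later."""
    return sum(chunk), len(chunk)


def chunked_mean(values, n_chunks=4):
    """Compute a mean from per-chunk partial results (hypothetical helper)."""
    if not values:
        raise ValueError("values must not be empty")
    size = -(-len(values) // n_chunks)  # ceiling division: rows per chunk
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    # Threads stand in for the independent workers of a parallel platform;
    # each worker only ever sees its own chunk.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(partial_sum_count, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    rows = list(range(1, 1_000_001))
    print(chunked_mean(rows))
```

The design point is that a sum and a count can be merged across chunks without any worker holding the full data set, which is why this kind of aggregate scales out where a single-threaded pass does not.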
The package includes functions to easily establish and manage 1010data sessions, as well as to browse 1010data folders from within the R interactive console. R1010 allows users to crunch through massive volumes of raw data in 1010data’s Big Data Discovery platform and integrate their favorite CRAN packages using the full R feature set to perform complex statistical analysis. In addition, the R1010 package has R Query Interface (RQI) functions, which provide a native R experience for query development.
As 1010data points out, many data scientists got their start with R and still rely on its open source functions. The company believes the combination of R with 1010data’s native ability to handle Big Data satisfies two pressing data scientist needs: a compendium of statistical functions and a massively parallel Big Data Discovery platform. This means that modelers comfortable with the R environment can easily apply their models to Big Data by storing queries as R strings and executing them against 1010data-connected data frames.
“We are excited to bring our Big Data Discovery platform to R users,” said Sandy Steier, co-founder and CEO of 1010data. “Combining 1010data’s ability to analyze unlimited volumes of data with the broad set of statistical functions familiar to the R community enables data scientists to build analytical models on large scale data at an unprecedented rate.”
This development comes at an exciting time for R. Recent developments include:
For those not familiar with 1010data, the New York, NY-based company has deep experience in the maturing Big Data arena. Because obtaining actionable insights from Big Data requires access to all relevant data and the best tools to analyze it, the company early on architected its solutions to serve a growing list of data insight needs, ranging from Big Data Discovery to enterprise reporting to the sharing of Big Data among organizations. Its client list includes over 700 of the world's largest retail, manufacturing, telecom, and financial services companies, and they manage and analyze over 19 trillion rows of data.
This announcement highlights two important industry trends. The first is the critical need enterprises now see to employ what really is a new breed of data scientist. The profile is someone who is not only computationally and statistically adept but also has substantial business acumen. Such people are rare and highly prized. The second is that once an enterprise has such people, they must be equipped with the best tools available so they can unlock the value in the data. This means not just big computing but Big Data discovery and analytics.
“R U Ready?” is going to be an increasingly hot topic of conversation. No list of big trends for 2015 and beyond is missing the need for Big Data and sophisticated analytics that require massive number crunching by data scientists to extract usable business intelligence. The role of R in moving ahead is clearly something to keep an eye on.