As businesses begin adopting Big Data and predictive analytics in their current workflows, there are tools available that make the process easier for analysts without much experience in data science.
Apache Spark is a Big Data development platform and processing engine that is growing ever more popular. It provides both higher performance and more flexibility for applications than existing Big Data platforms such as Hadoop.
Pairing the performance of Spark with XLMiner makes its benefits far more accessible to business analysts who aren't expert data scientists. Instead of writing code in a programming language such as Scala, Python or R, your analysts can pull sampled or summarized data from a Spark Big Data cluster directly into the Excel spreadsheets they use every day.
Apache Spark advantages
Apache Spark was conceived at the University of California, Berkeley's AMPLab in 2009, and open-sourced in 2010; Spark 1.5 appeared in September 2015. Since its introduction to the market, many companies - including Cloudera, Hortonworks, Amazon, eBay, Yahoo! and recently IBM - have incorporated the Spark open source cluster computing framework into their own products or internal systems.
InfoQ, a professional software information source, compared Spark to other open source platforms like Hadoop and Storm. Not only can Spark run certain applications 100 times faster in memory than Hadoop, but it also boasts other benefits for ease of use and sophisticated analytics.
Thanks to the more than 80 high-level operators Spark provides, users can quickly write code in Scala, Python and, more recently, R. The data processing framework integrates with common sources like HDFS, Amazon S3, Hive, HBase, Cassandra and Hadoop.
Spark's core functionality greatly extends the "MapReduce" paradigm originally offered by Hadoop, making it easy to implement more complex processing workflows, and to utilize main memory as well as disk. Developers can often write code without worrying about the details of scheduling operations and optimizing access to data across the cluster – Spark takes care of that automatically. Spark also offers additional components so you can query your datasets, apply machine learning algorithms, process streaming data in real time, and graph results.
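The "MapReduce" pattern Spark generalizes can be sketched in plain Python with the classic word-count example (no Spark required; the data here is made up for illustration). In Spark, the same map, shuffle and reduce steps would run in parallel across the cluster, largely in memory:

```python
from functools import reduce
from itertools import groupby

lines = ["spark extends mapreduce", "mapreduce came from hadoop"]

# Map phase: emit a (word, 1) pair for every word in every line.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle phase: group the pairs by key (the word).
mapped.sort(key=lambda kv: kv[0])
grouped = {key: [v for _, v in pairs]
           for key, pairs in groupby(mapped, key=lambda kv: kv[0])}

# Reduce phase: sum the counts for each word.
counts = {word: reduce(lambda a, b: a + b, ones)
          for word, ones in grouped.items()}
print(counts)  # {'came': 1, 'extends': 1, 'from': 1, 'hadoop': 1, 'mapreduce': 2, 'spark': 1}
```

Spark's contribution is that a chain like this can be written as a few high-level operator calls, with scheduling, data locality and memory management handled automatically.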
When it comes to practical application, tech startup Toptal offered the example of businesses using Spark to mine information pouring in from social media feeds like Twitter. Users can trigger marketing actions based on specific mentions of a company's products or events, and separate data by positive or negative comments. In a broader example, a developer was able to use Spark to predict who would survive a Titanic-like disaster or when an earthquake might occur.
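The positive/negative split above can be illustrated with a minimal keyword-based sketch; a real pipeline would use Spark Streaming and a trained sentiment model, and the product name and word lists here are hypothetical:

```python
# Hypothetical keyword lists; a production system would use a trained classifier.
POSITIVE = {"love", "great", "awesome"}
NEGATIVE = {"hate", "broken", "awful"}

def classify(tweet: str) -> str:
    """Bucket a tweet as positive, negative or neutral by keyword match."""
    words = set(tweet.lower().split())
    if words & NEGATIVE:
        return "negative"
    if words & POSITIVE:
        return "positive"
    return "neutral"

tweets = [
    "I love the new AcmeWidget",     # AcmeWidget is a made-up product
    "my AcmeWidget arrived broken",
    "AcmeWidget shipped today",
]
buckets = {"positive": [], "negative": [], "neutral": []}
for tweet in tweets:
    buckets[classify(tweet)].append(tweet)
```

Distributed across a Spark cluster, the same per-tweet classification scales to a full firehose of mentions.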
The XLMiner difference
Once you've used Spark and one of its distributed file system options (which include Hadoop) to store your data, you need a way to make that information easily available to your business analysts. XLMiner provides the tools to access your data, and to take advantage of innovative data mining, in the Excel spreadsheet environment they feel most comfortable using.
XLMiner makes Spark Big Data simple by bringing results into Microsoft Excel – and it does more than just provide an SQL query or ODBC connection to Spark. A Frontline Systems server actually runs in the Apache Spark cluster, and is able to run code on each of the computers in the cluster, to efficiently perform operations such as drawing a statistically representative sample across billions of records. As XLMiner communicates with Spark, users just have to point and click to get the data they want.
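The article doesn't say which sampling algorithm Frontline uses, but one standard way to draw a uniform sample from a stream of records too large to hold in memory is reservoir sampling (Algorithm R), sketched here in plain Python:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Return k items drawn uniformly at random from an iterable of
    unknown (possibly huge) length, holding only k items in memory."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)   # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)    # item i survives with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

sample = reservoir_sample(range(1_000_000), 5)
```

Each record needs only one pass, which is why the same idea parallelizes well: each node can sample its partition locally and the partial reservoirs can then be merged.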
Your employees can pull sampled or summarized data from Spark clusters to drive data mining, text mining or forecasting models, or drive simulation or optimization models on their desktops in Microsoft Excel. Once they've finished their analysis in Excel, they can simply push the results to Tableau or Microsoft BI with the click of a button.