Like a few other folks, I think there has been some misunderstanding in mixing Big Data and SAP HANA into one bag. We touched on this topic in the recent podcast “Debating the Value of SAP HANA”, but I would like to spend a few more minutes here explaining my thoughts.
SAP HANA has been created with traditional SAP Business Suite and Business Warehouse (BW) customers in mind. How big is the biggest single SAP software installation in the world in terms of single-store data size? I do not know exactly. The times of the proud “Terabyte Club” are in the past. Four years ago there was a lot of noise about a 60TB BW test SAP ran. The biggest customer I worked with had a 72TB BW database. So, I would assume that the biggest SAP instance is somewhere close to 120 TB. That’s still a lot of data not just to process, but also to manage (think back-ups, system upgrades, copies, disaster recovery, etc.)… Despite current technical limitations – the biggest certified hardware configuration is 8TB, and a single table partition is limited to 2 billion records – SAP HANA is on the way to helping SAP ERP and BW customers with those challenges. But those are not what the industry calls “Big Data”.
Here are the main differences as I see them:
- The data sizes we are discussing with SAP HANA are in the ballpark of a few terabytes, while Big Data currently means single-digit petabytes. For example, according to Monash Research, HP Vertica has 7 customers with a petabyte or more of user data each.
- The current focus of SAP HANA is structured data, while Big Data challenges are generated mostly by unstructured data: web, scientific, and machine-generated. It is fair to mention, though, that SAP is working on Enterprise Search powered by HANA, as Stefan Sigg, VP of the In-Memory Platform at SAP, told me during this TechEd Live interview.
- Currently, Big Data processing is almost synonymous with the MapReduce software framework, where huge data sets are processed by a large cluster of rather cheap computers (see the sketch after this list). SAP in-memory technology, on the other hand, requires “a small number of more powerful high-end [servers]”, according to Hasso Plattner’s book “In-Memory Data Management: An Inflection Point for Enterprise Applications”.
- Related to the point above: SAP HANA’s promise is real-time, where a fact is available for analysis subseconds after it occurs, while Big Data processing is mostly batch-based. My previous blog post became available in Google Search results and in Google Alerts only 4 days after being posted – not quite real-time, huh?
- SAP HANA data analysis is most often paired with SAP BusinessObjects Explorer – a modeless visual data search and exploration tool. Using MapReduce libraries on top of Big Data requires advanced programming skills.
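To illustrate what that programming model looks like, here is a minimal sketch of MapReduce-style word counting, simulated in plain Python rather than run on a real Hadoop cluster. The function names and sample data are my own illustration, not any specific framework’s API; in production, the framework would run the map step on many cheap nodes in parallel and shuffle the intermediate pairs to the reducers.

```python
# Minimal sketch of the MapReduce programming model (word count),
# simulated in plain Python. On a real cluster, map_phase() would run
# in parallel across machines, and the framework would shuffle the
# intermediate (word, 1) pairs to reducers grouped by key.
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in the document."""
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    """Reduce step: sum all partial counts emitted for one word."""
    return (word, sum(counts))

documents = [
    "in memory data management",
    "big data is not in memory data",
]

# Shuffle step: group intermediate pairs by key (word).
grouped = defaultdict(list)
for doc in documents:
    for word, count in map_phase(doc):
        grouped[word].append(count)

# Reduce step: one call per distinct key.
results = dict(reduce_phase(w, c) for w, c in grouped.items())
print(results)  # e.g. {'in': 2, 'memory': 2, 'data': 3, ...}
```

Even this toy version shows why the approach demands programming skills: the analyst has to express every question as map and reduce functions, rather than exploring the data visually.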
During his SAPPHIRE’11 USA keynote speech, Hasso Plattner mentioned MapReduce as a roadmap feature for SAP HANA, but since then I haven’t heard any specifics about what that means. Meanwhile, the quietly announced Release 15.4 of Sybase IQ has introduced some features focused on the analysis of Big Data in its original meaning. Is a silent revolution going on at SAP on the Sybase side, while all eyes are on the HANA product?