Tag Archives: Big Data

Calculating number π by throwing darts: digitally in SAP HANA

In my previous blog I promised an exploration of Big Data on SAP HANA, express edition. But remember, Big Data is not only about Volume, but about Variety (of data types) as well. That is the route I chose first: a look at the fun stuff you can do with spatial data processing in SAP HANA.

Ever since I enjoyed the “Calculating Pi with Darts” video from Physics Girl and Veritasium [which you should watch too!] I have thought about repeating it. The world is going digital, so obviously I meant using SAP HANA for that. I know I should have done it on Pi Day (3/14/16, or the 14th of March 2016), but better late than never!

Calculating π with darts is one of the Monte Carlo methods of getting its approximate value. According to Wikipedia, “[…] method for computing π is to draw a circle inscribed in a square, and randomly place dots in the square. The ratio of dots inside the circle to the total number of dots will approximately equal π/4”.
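Before moving to SAP HANA, the idea from the quote can be sketched in a few lines of plain Python. This is only an illustration of the method, not the HANA implementation used later in this post; the function name `estimate_pi` and the `seed` parameter are my own assumptions.

```python
import random

def estimate_pi(attempts, seed=None):
    """Estimate pi by throwing `attempts` random darts at the unit square."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(attempts):
        x, y = rng.random(), rng.random()
        # Circle inscribed in the unit square: center (0.5, 0.5), radius 0.5.
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:
            hits += 1
    # The hit ratio approximates pi/4, so multiply by 4.
    return 4 * hits / attempts

print(estimate_pi(2000, seed=42))
```

Running it with different seeds gives slightly different values each time, which is exactly the behaviour to expect from a Monte Carlo estimate.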

It looked like SAP HANA’s spatial capabilities would fit perfectly for that. If you are not familiar with spatial processing, I prepared four introductory tutorials that should not take more than 20 minutes for you to complete and that cover all the basic concepts needed to follow the rest of the blog. And if you do not have SAP HANA Express yet, then it is 10 minutes to get it. Alternatively, you can use an SAP HANA MDC instance in your HCP Trial account, as we are still not talking about huge volumes of data here.

  1. Points: http://www.sap.com/developer/tutorials/hana-spatial-intro1-point.html
  2. Lines and strings: http://www.sap.com/developer/tutorials/hana-spatial-intro2-string.html
  3. Areas and polygons: http://www.sap.com/developer/tutorials/hana-spatial-intro3-polygon.html
  4. Spatial columns in tables: http://www.sap.com/developer/tutorials/hana-spatial-intro4-columns.html

Virtual dart hits are points with random X and Y coordinates (ST_Point objects). The dartboard is a disk (ST_Buffer() around the disk’s center point). The approximation of π is then 4 times the average share of hits that fall within the disk (checked with the ST_Within() method).

First I need a table with a spatial column, which will store the coordinates of my digital hits, plus a procedure to populate this table with the required number of attempts.

CREATE SCHEMA "TESTSGEO";
SET SCHEMA "TESTSGEO";

--DROP TABLE "TESTSGEO"."SPATIAL_CALCPI";
CREATE COLUMN TABLE SPATIAL_CALCPI(
	POINT ST_POINT
);

--DROP PROCEDURE "TESTSGEO"."COLLECT_HITS";
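CREATE PROCEDURE collect_hits (IN attempts INT)
 LANGUAGE SQLSCRIPT AS
 iter INTEGER;
 BEGIN
    iter := 1;
    -- Each iteration throws one dart: a point with random X and Y coordinates
    WHILE iter<=attempts DO
        INSERT INTO "TESTSGEO"."SPATIAL_CALCPI" VALUES (new st_point(RAND(), RAND()));
        iter := iter+1;
    END WHILE;
    -- Merge the newly inserted rows from delta storage into main storage
    MERGE DELTA OF "TESTSGEO"."SPATIAL_CALCPI";
 END;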

Now let’s throw 2000 virtual darts and check what the approximation of π will be!

--TRUNCATE TABLE "TESTSGEO"."SPATIAL_CALCPI"; 
CALL "TESTSGEO"."COLLECT_HITS"(ATTEMPTS => 2000);

--Check the results of throwing: coordinates and if hit dartboard
SELECT
  POINT.ST_asWKT(), 
  POINT.ST_Within(NEW ST_Point(0.5,0.5).ST_Buffer(0.5)) as IN_CIRCLE 
FROM "TESTSGEO"."SPATIAL_CALCPI";

--Calculating PI using Monte Carlo formula
SELECT
  4*AVG(POINT.ST_Within(NEW ST_Point(0.5,0.5).ST_Buffer(0.5))) as PI 
FROM "TESTSGEO"."SPATIAL_CALCPI";

Results I got in my system were between 3.11 and 3.21. Well, a very rough approximation of the number π 🙂

Let’s visualize the results by generating SVG with a dartboard and all generated hits.

SELECT
  ST_UnionAggr(POINT).ST_Union(NEW ST_CircularString('CIRCULARSTRING(0 0.5, 1 0.5, 0 0.5)')).ST_asSVG() AS DARTBOARD 
FROM "TESTSGEO"."SPATIAL_CALCPI";

I did a minor modification of the SVG to have a circle in red.
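For readers without a HANA instance at hand, a similar picture can be sketched in plain Python. This is a hypothetical stand-in for the ST_asSVG() output above, not a reproduction of it: the function name `darts_svg`, the sizes, and the colors are my own assumptions.

```python
import random

def darts_svg(points, size=400):
    """Return an SVG string with a red dartboard circle and one dot per hit."""
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" '
        f'height="{size}" viewBox="0 0 1 1">',
        # Dartboard: circle inscribed in the unit square, drawn in red.
        '<circle cx="0.5" cy="0.5" r="0.5" fill="none" '
        'stroke="red" stroke-width="0.005"/>',
    ]
    for x, y in points:
        # The SVG y axis points down, so flip y to keep the usual orientation.
        parts.append(
            f'<circle cx="{x:.4f}" cy="{1 - y:.4f}" r="0.004" fill="black"/>'
        )
    parts.append("</svg>")
    return "\n".join(parts)

rng = random.Random(7)
hits = [(rng.random(), rng.random()) for _ in range(2000)]
svg = darts_svg(hits)
```

Saving the `svg` string to a file with an `.svg` extension and opening it in a browser should show the dartboard with all 2000 hits.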


Then I tried 50000 attempts, but the result was 3.1168. So, not much improvement over the previous attempts.
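The slow improvement is typical for Monte Carlo methods. Each dart is a hit-or-miss trial with hit probability π/4, so the spread of the estimate shrinks only with the square root of the number of darts. A small sketch of that arithmetic, using the standard binomial formula (the function name is my own):

```python
import math

def pi_estimate_std_error(attempts):
    """Standard error of the estimator 4 * hits / attempts.

    Each dart is a Bernoulli trial with success probability p = pi / 4,
    so the hit ratio has variance p * (1 - p) / attempts.
    """
    p = math.pi / 4
    return 4 * math.sqrt(p * (1 - p) / attempts)

for n in (2000, 50000, 5_000_000):
    print(n, pi_estimate_std_error(n))
```

With 2000 darts the standard error is roughly 0.037, and with 50000 darts it is still roughly 0.007: 25 times more darts buy only a 5 times smaller error.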

PS. Obviously, using the SAP HANA spatial method below to calculate a circle’s circumference when the diameter is 1 would be a much faster and more precise way to get π. But – hey! – it would take away all the fun of throwing digital darts 😉

SELECT 
  NEW ST_CircularString ('CircularString (0 0.5, 0 1.5, 0 0.5)').ST_Length() as PI 
FROM DUMMY;

--Result is PI 3.141592653589793

Please let me know what pi numbers you got by throwing digital darts in your SAP HANA instances.

PS. Republished from my blog https://blogs.sap.com/2016/12/14/calculating-number-%CF%80-by-throwing-darts-digitally-in-sap-hana/


Filed under HANA

Days 3 and 4 of ASUG SBOUC’12: More Education, Predictive Analysis and … see you next year

I am back home in Wrocław after the ASUG SAP BusinessObjects User Conference (SBOUC) in Orlando, USA. Three and a half intensive days, yet I still feel like there was not enough time to discuss everything with everyone. But the first SAP InnoJam is just around the corner, so the time to pack and see folks again will come soon.

For the moment, let me go back to SBOUC to share highlights of the last two days.

More about Networking

Spammers attack at #SBOUC

In my previous post, I mentioned the importance of face-to-face interactions. Social networking tools – no matter if we like them or not – play an important role too. When Don Tapscott asked during his Thursday keynote how many in the audience were using Twitter, about one third raised their hands.

I am not an addicted fan of Twitter, but it has become a handy tool for me. I don’t like one player’s dominance of the market, so I wish App.net, Path and the like good luck. Just remember: “A fool with a tool is still a fool”.

And remember there are different kinds of “smarts” too. There was a moment when spammers found that #SBOUC was trending on Twitter and started their attack. It was the first time I saw something like this (see picture). New tweets and handles were coming faster than you could grasp what was going on.

My session on SAP HANA

My session was obviously on the topic of SAP HANA. It was an updated version of the last year’s session, but now with the focus on where and how you can learn more about in-memory data management and where you can practice to gain hands-on experience.


It is surprising that so many people want to learn SAP HANA and are asking for access to the software, yet so few are familiar with the two SAP HANA Developer Center offers:

  1. 30-day Test&Evaluation access to a pre-configured virtual desktop hosted by CloudShare. The benefit of this option is that the desktop also includes popular and new SAP BusinessObjects BI tools: Explorer, Analysis for MS Office, Visual Intelligence.
  2. A free developer edition of the SAP HANA database hosted in the cloud. For the moment the only choice is Amazon Web Services.

Our Developer Experience team at SAP is responsible for the Developer Center. One of our new projects is to add BusinessObjects to the family of DevCenters. If you have any suggestions or comments on BObj, HANA, Sybase or any other SAP technology on the Developer Center – please let me know in the comments, via Twitter @Sygyzmundovych, or just by sending a good old e-mail to my SAP address.

SAP Analytics and Hadoop

It has become usual to hear (mostly from the same people) that SAP’s innovations are focused only on SAP HANA. Well, there were very good sessions during SBOUC showing integration of SAP Analytics products with Hadoop too. Below are some captures with session numbers, so you can download the full sessions from ASUG Online yourself.

Reporting on top of a universe sourcing from Hadoop (session 1210 “SAP BusinessObjects BI 4.0 FP3 on Apache Hadoop Hive”)

SAP DataServices and Hadoop (session 202 “Another Buzz Word – Hadoop! Or is That Something a Regular Person Can Use?”)

Text Processing by SAP HANA and Hadoop (session 211 “Unstructured Data: Taming the Textual Tsunami with SAP HANA & Hadoop”)

New SAP BusinessObjects BI tools

In my previous post I mentioned a new SAP BusinessObjects BI product called Visual Intelligence. Sessions on two other new products attracted no less interest:

SAP Predictive Analysis:

SAP Predictive Analysis positioned in the context (session 805 “Demonstration of SAP BusinessObjects Predictive Analysis 1.0 and Its Consumption from SAP BI Clients”)

SAP BusinessObjects Design Studio (aka SAP Zen):

Building BI app with SAP BO Design Studio (session 109 “SAP ZEN – BI Applications and Dashboard Designer”)

Custom Development with SAP BusinessObjects

Another topic of interest to me was custom development with SAP BusinessObjects. Unfortunately, not many people are aware of the SAP BO SDKs and APIs, and even at SBOUC there were only two sessions on this topic. I hope to promote it and to see more next year.

Use of SAP BusinessObjects SDK in BP (Session 501 “Free SDK Utilities to Help Manage Your Business Objects”)

Best Practices (Session 1307 “Introduction to the SAP BusinessObjects BI 4.0 RESTful and Crystal Reports JavaScript Viewer SDKs”)

SAP Analytics Forum

Unfortunately, because of my flight schedule I attended only one session during the SAP Analytics Forum on day 4 of the conference. The session covered Deloitte’s internal implementation of reporting using Crystal Reports and SAP HANA. Nothing speaks better than an example.

Deloitte presentation of using Crystal Reports with SAP HANA

See you next year…

… at ASUG SAP BusinessObjects User Conference 2013 in Anaheim, California.

In the meantime…

… if you feel that my posts haven’t covered the conference enough, please have a look at other blogs:

Still not enough? Then you need to experience and describe it yourself 🙂


Filed under ASUG, BusinessObjects, Hadoop, HANA, SAP

Big Data and SAP HANA? Or Sybase IQ?

Like a few other folks, I think there was some kind of misunderstanding in putting Big Data and SAP HANA into one bag. We touched on this topic in the recent podcast “Debating the Value of SAP HANA”, but I would like to spend a few more minutes here to explain my thoughts.

SAP HANA has been created with traditional SAP Business Suite and Business Warehouse (BW) customers in mind. How big is the biggest single SAP software installation in the world in terms of single-store data size? I do not know exactly. The times of the proud “Terabyte Club” are in the past. Four years ago there was a lot of noise about the 60TB BW test SAP ran. The biggest customer I worked with had a 72TB database of BW data. So, I would assume that the biggest SAP instance is somewhere close to 120TB. That’s still a lot of data not just to process, but also to manage (think backups, system upgrades, copies, disaster recovery etc.)… Current technical limitations aside – 8TB for the biggest certified hardware configuration and a 2 billion record limit in a single table partition – SAP HANA is on the way to help SAP ERP and BW customers with those challenges. But those are not what the industry calls “Big Data”.

Here are the main differences as I see them:

  • Data sizes we are discussing with SAP HANA are in the ballpark of a few terabytes, while Big Data currently means something in the single-digit petabytes. E.g. HP Vertica has 7 customers with a petabyte or more of user data each, according to Monash Research.
  • The current focus of SAP HANA is structured data, while Big Data issues are generated mostly by unstructured data: web, scientific, machine-generated. It is fair to mention, though, that SAP is working on Enterprise Search powered by HANA, as Stefan Sigg, VP In-Memory Platform in SAP, told me during this TechEd Live interview.
  • Currently Big Data processing is almost synonymous with the MapReduce software framework, where huge data sets are processed by a big cluster of rather cheap computers. On the other hand, SAP in-memory technology requires “a small number of more powerful high-end [servers]” according to Hasso Plattner’s book “In-Memory Data Management: An Inflection Point for Enterprise Applications”.
  • Related to the point above, the promise of SAP HANA is real time, where a fact is available for analysis within a fraction of a second after it occurs. In Big Data algorithms, processing is mostly batch based. My previous blog post became available in Google Search results and in Google Alerts only 4 days after being posted – not quite real-time, huh?
  • SAP HANA data analyses are most often paired with SAP BusinessObjects Explorer – modeless visual data search and exploration. Use of MapReduce libraries on top of Big Data requires advanced programming skills.
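To give a feel for the MapReduce programming model mentioned in the list above, here is a toy word count in plain Python. It is a single-process sketch of the map, shuffle and reduce steps only; it does not use Hadoop or any real MapReduce library, and all function names are my own.

```python
from collections import defaultdict

def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: collapse all values of one key into a single result."""
    return key, sum(values)

documents = ["big data is big", "hana is fast"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
print(counts)
```

In a real cluster the map and reduce calls run on many machines in parallel and the framework handles the shuffle, which is where the batch character and the programming effort come from.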

During his SAPPHIRE’11 USA keynote speech Hasso Plattner mentioned MapReduce as a road map feature for SAP HANA, but since then I haven’t heard any specifics about what it means. Instead, the quietly announced Release 15.4 of Sybase IQ has introduced some features focused on analyses of Big Data in its original meaning. Is there a silent revolution going on in SAP on the Sybase side, while all eyes are on the HANA product?


Filed under HANA, SAP