Caveman's Blog

My commitment to learning.

Panasonic BLT – Better Living Tomorrow



Last week Panasonic unveiled a new invisible TV [1] at the CEATEC electronics expo in Japan. Check out the amazing prototype in the YouTube video below. The glass screen is transparent, allowing you to see the wall and shelving behind the TV. Space optimization is the first thing that comes to mind when thinking of a cool-looking (or not) TV such as this one. The interactive glass door on the wine cooler is fantastic and could potentially be a very powerful IoT (Internet of Things) [2] marketing tool for user engagement at brick-and-mortar businesses. Towards the end of the demo there is a showcase of an interactive tabletop dining table that allows for localized cooking/warming of your food. Enjoy!

References:

  1. #IoT & the Future of Living | #Panasonic LIVE@CEATEC 2016
  2. Intel IoT — What Does The Internet of Things Mean?

Written by cavemansblog

October 9, 2016 at 1:19 pm

Posted in Technology

Ignite 2016 – Adobe and Microsoft partner on Azure



Adobe and Microsoft Corp. announced plans for a strategic partnership to help enterprise companies embrace digital transformation and deliver compelling, personalized experiences through every phase of their customer relationships. Together, the two companies will enable businesses to dramatically strengthen their brands through solutions with Microsoft Azure, Adobe Marketing Cloud and Microsoft Dynamics 365. [1]

Having successfully completed a few digital transformation initiatives, past and present, for a few brands, and having also worked with a Salesforce partner to integrate SFMC with Azure cloud-based systems, I am curious to learn more about the Adobe Marketing Cloud platform and to understand how it differs from a Salesforce Marketing Cloud (SFMC) integration. Very exciting indeed!

My curiosity with this announcement is specifically around the Adobe Marketing Cloud and Azure as its preferred cloud platform. Cloud marketing is an organisation's effort to market its goods and services online through integrated digital experiences that are specialized for every single end user. [2]

There are several popular marketing cloud platforms available at this time from companies such as Salesforce, Adobe, Oracle and IBM, for businesses that intend to implement and make progress with their digital transformation/disruption efforts. Joining customers on a one-to-one journey to personalize, interact and communicate with them at the appropriate time using predictive intelligence seems to be the next big step in employing a marketing cloud.

References:

  1. Adobe and Microsoft partner on Azure – Youtube
  2. Cloud Marketing – Wikipedia

Written by cavemansblog

September 27, 2016 at 9:49 pm

Posted in Azure

BCP: export data to an Azure database



BCP (bulk copy program) is a great way to transfer huge volumes of data (millions of rows) to an Azure database. I found it to be both easy to use and fast at transferring data. The bcp utility is a command-line utility that ships with Microsoft SQL Server. It bulk copies data between SQL Azure (or SQL Server) and a data file in a user-specified format. The bcp utility that ships with SQL Server 2008 R2 is fully supported by SQL Azure.

Following are the steps to use this utility to export the data to a flat file and then import that data into the Azure database (a scripted version of the same commands is sketched after the list):

  • Copy data from the employee table into a temp table
    • select * into [temp_employee] from employee
  • Copy data to a local file from the temp table in the local database.
    • BCP [database_name].[schema_name].[temp_employee] OUT c:\temp\employee.txt -S [server_name] -T -c
  • Create the destination table in the Azure database in the cloud
  • Copy data into the Azure database.
    • BCP [azure_database_name].[schema_name].[employee] IN c:\temp\employee.txt -S servername.database.windows.net -U <username>@<servername> -P <password> -c
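If you want to script the export and import steps instead of typing them at the command prompt, a minimal Python sketch along the lines below should work. It assumes the bcp utility is on the PATH and simply shells out to the same commands; the server, database, user and password values are placeholders, not real settings.

    import subprocess

    def run_bcp(args):
        # bcp returns a non-zero exit code on failure; check=True raises an exception in that case
        subprocess.run(["bcp", *args], check=True)

    # Export the staging table from the local server to a flat file
    # (-T = trusted connection, -c = character format)
    run_bcp(["database_name.schema_name.temp_employee", "out", r"c:\temp\employee.txt",
             "-S", "local_server_name", "-T", "-c"])

    # Import the flat file into the Azure database
    run_bcp(["azure_database_name.schema_name.employee", "in", r"c:\temp\employee.txt",
             "-S", "servername.database.windows.net",
             "-U", "username@servername", "-P", "password", "-c"])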

Following is a sample of the bulk copy output from the command window for reference. The most interesting aspect to observe is that the data transfer rate stood at 5206 rows per second for this particular attempt. I believe that higher transfer rates could be possible depending on the available network bandwidth.

Exporting Data out of SQL Azure

A large data set can be retrieved from SQL Azure by running the following statement at the Windows command prompt:

bcp [database_name].[schema_name].[table_name] out C:\temp\[file_name].txt -c -U username@servername -S tcp:servername.database.windows.net -P password

References:

  1. BCP and SQL Azure
  2. BCP Utility

Written by cavemansblog

September 27, 2014 at 8:07 pm

Predictive Analysis with Monte Carlo Simulation



Introduction

I got interested in Monte Carlo simulation during one of my weekend readings; the more I read, the more I appreciated the computational algorithm. This method is used to obtain a result (with a degree of probability) from random inputs with known boundaries. The modern version of the Monte Carlo method was invented in the late 1940s by Stanislaw Ulam, while he was working on nuclear weapons projects at the Los Alamos National Laboratory. It was named by Nicholas Metropolis, after the Monte Carlo Casino, where Ulam’s uncle often gambled. [1] In this post I will briefly cover the basics of Monte Carlo simulation and then detail a scenario where I was able to apply the basic method using MS-Excel for business capacity planning for a fictitious company.

Monte Carlo Simulation

Monte Carlo methods (or Monte Carlo experiments) are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results; typically one runs simulations many times over in order to obtain the distribution of an unknown probabilistic entity. [1] This is a stochastic simulation, as there is a degree of randomness involved: the outcome of the simulation most likely will not be the same every time. Since the outcome varies with every simulation, the question that begs an answer is: how many times does the simulation need to be run to gain confidence in the outcome that is generated? The assumption behind this method is that a randomly selected sample will exhibit the same behavior as the entire population from which it is selected. Running the simulation more times makes the experiment more reproducible, and inferential statistics obtained by running the simulation several thousand times will produce a generalized result. The range of the results gets narrower as the number of random samples increases. Using all the results of the simulations, we can perform statistical analysis to generate the various outcomes and their corresponding probabilities of occurrence. Following are the four steps involved in applying this model to the analysis of a given situation:

  1. Identify a mathematical model of the activity or process you want to explore.
  2. Define the parameters (like mean and standard deviation) for each factor in your model.
  3. Create random data according to those parameters.
  4. Simulate and analyze the output of your process.

Capacity Planning

In this example, we have an online business that has been running for the last few years. This company has been seeing an increase in the number of orders it receives year-over-year. The business would like to perform capacity planning so that it can prepare its operations expansion to handle the transaction load for the next five years. The key information it needs at this time is the total number of orders that the business could potentially generate in the fifth year from now. Let us look at how we can use Monte Carlo as a computation model in Excel to provide this key information.

The following table shows the number of orders that the business received every year for the last 10 years. The table also shows the rate of increase in the orders for the same period. We will use this input information to determine the mean rate of increase in the orders and also the standard deviation in the year-over-year order count. The mean, computed as the average of the growth rate over the last 10 years, works out to 17.95%. The standard deviation, which is the amount of variation or dispersion from the average, was calculated to be 53.98%.

MonteCarlo - Input Data

Let us assume that the rate of increase follows a normal distribution. I have used NORM.INV(RAND(), Mean, SD) to introduce randomness into the distribution. Now we can apply the same distribution for the next five years to obtain our first possible outcome using the mean and the standard deviation. The order count for 2014 has been computed using the formula p (1 + r), where “p” is the previous year’s order count and “r” is the computed rate of growth for the year. When we apply the same formula for the next five years we get the following results, and these results vary every time we run the simulation with new random draws.

MonteCarlo - Iteration 1
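For readers who prefer code to spreadsheet formulas, here is a minimal Python sketch of a single five-year projection. It assumes the 17.95% mean and 53.98% standard deviation computed above; the starting order count is a placeholder, not the figure from the table.

    import random

    MEAN_GROWTH = 0.1795   # mean year-over-year growth rate from the table above
    SD_GROWTH = 0.5398     # standard deviation of the growth rate
    orders = 100_000       # placeholder for the most recent yearly order count

    for year in range(2014, 2019):
        r = random.gauss(MEAN_GROWTH, SD_GROWTH)  # Python equivalent of NORM.INV(RAND(), Mean, SD)
        orders = orders * (1 + r)                 # p * (1 + r)
        print(year, round(orders))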

Now we can simulate running the same experiment several hundred or thousand times using the Data Table feature of “What-if” analysis in MS-Excel. We can then compute the mean of all the outcomes from the simulations to arrive at a possible order count which could be very close to the actual value, if the same rate of growth is sustained for the next five years. In the following two pictures, some results from the simulation of 1000 iterations are displayed with actual values, along with a graphical representation of the iterations plotted against the 1000 outcomes.

MonteCarlo - Simulation

MonteCarlo - Simulation Graph

The above graph shows the results of the 1000 iterations plotted against the computed order count for the year 2018, which is the output of each iteration. Analyzing the output of the simulation, we can determine that the mean of all 1000 outcomes falls between 1.0 and 1.1 million orders. I later ran the same simulation with 10,000 iterations and noticed that the spread of the mean outcome tightened further. Depending on the application, the number of iterations can be very high, limited only by the computational power at hand. This information could be used by the business to plan for handling the future workload. We have seen how Monte Carlo simulation is one of the tools that can be used to predict future values from a known set of uncertainties in the input with boundaries. This computation model is used in many fields: Mathematics, the Physical Sciences, Engineering, Finance, Project Management and Computational Biology, to name a few.
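As a rough stand-in for Excel's Data Table, the following Python sketch repeats the five-year projection many times and summarizes the outcomes. The growth figures are the ones from the post; the starting order count and iteration count are placeholders you would replace with your own values.

    import random
    import statistics

    MEAN_GROWTH = 0.1795
    SD_GROWTH = 0.5398
    START_ORDERS = 100_000   # placeholder starting order count
    ITERATIONS = 1000        # try 10,000 for a tighter spread around the mean

    def project_five_years(start):
        # One simulated path: apply p * (1 + r) for five years with random growth r
        orders = start
        for _ in range(5):
            orders *= 1 + random.gauss(MEAN_GROWTH, SD_GROWTH)
        return orders

    outcomes = [project_five_years(START_ORDERS) for _ in range(ITERATIONS)]
    print("mean 2018 order count:", round(statistics.mean(outcomes)))
    print("standard deviation   :", round(statistics.stdev(outcomes)))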

References:

1. Monte Carlo Methods – Wikipedia
2. Monte Carlo Is Not as Difficult as You Think – Eston Martz

Written by cavemansblog

August 21, 2014 at 1:45 pm

SQL Server Storage: Files, Filegroups and RAID



Introduction

In this blog post I will briefly write about the files and filegroups of a SQL Server database, followed by a description of the most popular data storage technology, RAID (Redundant Array of Independent Disks), covering some internals of this technology and its applicability. Having an understanding of the various database files, RAID and RAID levels will be crucial in building a SQL Server that can scale bigger, restore faster and query quicker.

SQL Server Files and Filegroups

A SQL Server database stores data and transactions in separate files. Data associated with tables, indexes, and large object data is stored in data files. The transactions are stored sequentially in a log file. The data and transactions of a database can be stored in three types of files.

  • Primary data files – Every database has one primary data file. Recommended file name extension is .mdf. This is merely a best practice and your company can have a policy for a unique extension.
  • Secondary data files – Secondary data files make up all the data files, other than the primary data file. This file will exist only when it has been defined in the database setup.  Recommended file name extension is .ndf.
  • Log files – Log files contain the information that is required to recover all transactions in the database. Log files are written sequentially. There must be at least one log file for each database, although there can be more than one. Recommended file name extension is “.ldf”. The log files are truncated automatically when using the SIMPLE recovery model, but not when using the BULK LOGGED or FULL recovery models.

RAID

The hard drive is the most likely part to fail in a computer or a storage device, because it probably is the only moving part. A hard drive typically spins anywhere from 5,400 to 15,000 rotations per minute (RPM) and stands a fair chance of failure. RAID is the technology that addresses this problem. RAID, which stands for Redundant Array of Independent Disks, is a redundant system of multiple hard drives used to store your data. Basically, it allows you to turn multiple physical hard drives into one logical hard drive. Depending on the type of configuration employed in designing the RAID, failure of one or more drives will not impact your data. This makes it a critical system that can save your data from loss and should be considered for data storage. There are many RAID levels; however, three basic RAID levels are widely used by companies: RAID 0, RAID 1 and RAID 5. RAID 0 is called disk striping, RAID 1 is called disk mirroring and RAID 5 is called disk striping with parity.

RAID 0 – Disk Striping

Disk striping in RAID 0 combines physical drives into one large logical drive with the maximum capacity being the sum of the individual hard disk sizes. For example, if you have 2 disks with an individual capacity of 1 TB in your RAID, you would effectively have 2 TB of usable disk space to store data. Data is divided into stripes that are distributed evenly across all the physical disks. The size of the data stripe can vary with the design determined by your DBA or the one demanded by your application. Let us assume that you have a stripe size of 16 bits: when a data block of 64 bits is written on a RAID 0 array with 4 disks, bits 1–16 are written on disk 1, bits 17–32 on disk 2, bits 33–48 on disk 3 and bits 49–64 on disk 4. Basically, this RAID level stripes your data across all the disks evenly, making writing and reading your data faster. However, one big concern with this design is that if a hard disk were to fail, you will not be able to recover the data because you do not have any redundancy to reconstruct it.
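The round-robin distribution is easy to see in a toy sketch. The Python code below is purely illustrative (a real RAID controller works at the block level in hardware or in the operating system); the stripe size and disk count mirror the example above.

    # Toy illustration of RAID 0 striping: data is cut into fixed-size stripes
    # and dealt out to the disks in round-robin order.
    def stripe(data, num_disks, stripe_size):
        disks = [[] for _ in range(num_disks)]
        chunks = [data[i:i + stripe_size] for i in range(0, len(data), stripe_size)]
        for index, chunk in enumerate(chunks):
            disks[index % num_disks].append(chunk)  # stripe i lands on disk i mod N
        return disks

    # 8 bytes (64 bits) striped in 2-byte (16-bit) chunks across 4 disks:
    print(stripe(b"ABCDEFGH", num_disks=4, stripe_size=2))
    # -> [[b'AB'], [b'CD'], [b'EF'], [b'GH']]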

RAID 1 – Disk Mirroring

Disk mirroring creates identical copies of your data. There are only two drives involved in this type of RAID. Disk mirroring makes an identical image of the hard drive; in other words, the data on one drive is automatically written on to the other hard drive. The primary advantage of this type of configuration is that when one of the hard drives fails, your server is still up and running because you have a complete copy of all the data on the other hard disk in the RAID. At this point, while the server is still running, you can swap the failed disk with another hard disk and let the RAID rebuild the mirrored disk. RAID 1 is mainly about reliability, introducing redundancy of data. Keep in mind that only 50% of the total disk space can be used for data storage. For example, if you have 2 disks with an individual capacity of 1 TB in your RAID, you would effectively have 1 TB of usable disk space to store data.

RAID 5 – Disk Striping with Parity

Disk striping with parity requires at least 3 hard drives for its simplest configuration. In this case, stripe 1 is written to drive 1, stripe 2 to drive 2, and the parity for those two stripes to drive 3; for the next set of stripes the parity rotates to a different drive, and so on across the array. The layout is hard to visualize, but the key point is that it allows for the failure of one hard drive. The parity introduced in this type of RAID is used to rebuild the failed drive, because the remaining drives together contain enough data and parity to reconstruct it. Another advantage it offers is more usable disk space than RAID 1, though not as much as RAID 0. For example, if you have 3 disks with an individual capacity of 1 TB in your RAID, you would effectively have 2 TB of usable disk space to store data, and if you were to have 4 disks in your RAID with 1 TB capacity on each of them, the total usable disk space would be 3 TB. Basically, the equivalent of one hard disk is used for redundancy.
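Parity in RAID 5 is typically a bitwise XOR of the data stripes in a row, which is why a single failed drive can be rebuilt from the survivors. The sketch below is a toy illustration of that idea in Python, not a RAID implementation.

    # XOR parity: the parity stripe is the XOR of the data stripes, so any one
    # missing stripe equals the XOR of everything that survived.
    def xor_parity(*stripes):
        parity = bytes(len(stripes[0]))
        for s in stripes:
            parity = bytes(a ^ b for a, b in zip(parity, s))
        return parity

    d1, d2 = b"\x12\x34", b"\xab\xcd"
    p = xor_parity(d1, d2)        # parity stored on the third drive
    rebuilt = xor_parity(d2, p)   # drive 1 fails: XOR the survivors to rebuild it
    assert rebuilt == d1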

RAID Level 2, 3, 4, 6, 7, 8, 9, 10, ……

All the other RAID levels are variations or combinations of the basic levels 0, 1 and 5. For example, RAID 10 is a combination of RAID 1 and RAID 0, where your data is mirrored in addition to being striped across the disks. RAID 6 extends RAID 5 with a second, independent parity block per stripe, so the array can survive the failure of two drives.

Summary

A redundant array of independent disks is a system that allows hard disks to be configured to store data efficiently and also provide fault tolerance. RAID level 0 can ideally be used for fast read and write capability, with the ability to combine several smaller disks into one large logical drive. Caution should be exercised in maintaining a backup scheme for your data, as there is no redundancy provided by this level. RAID level 1 is purely a play on providing redundancy for your data. RAID level 5 provides improved performance when writing and reading data, in addition to providing some redundancy. RAID 1 and 5 allow for hot-swappable disks, meaning the failed drives can be replaced and rebuilt while the server is still running. RAID is the right tool for business because it delivers flexibility and scalability, so that you can create and manage enterprise data storage systems in the shortest possible time.

References:
1. Files and Filegroups Architecture
2. RAID Levels and SQL Server

Written by cavemansblog

June 12, 2014 at 9:44 am

Posted in Uncategorized