Wednesday, October 13, 2010

Divide and Conquer Reporting By Scaling Out SQL Server with Replication

This is my first blog post and surely won’t be my last. I have talked about creating a SQL BI blog for a few years and over the last few weekends I finally gathered my thoughts and documents I wrote over the years to share, so here goes the RTM version.  If there are any errors, they will be fixed in SP1 :-)

A common situation I encounter at various clients is that they perform heavy reporting out of the same transactional databases that power revenue driving sites.  This usually hinders the performance and decreases revenue generation, not a good situation.  The cost of additional hardware or additional SQL Server licenses is sometimes hard to justify, but if its possible to show that slow transactional throughput hinders revenue generation, or if the lack of timely reporting is limiting visibility for additional revenue opportunities, it might be a good place to start.

So now that you have wowed the folks with budgets that can buy servers and SQL Server licenses with this amazing information, how do you take the data and have it duplicated to your new SQL Server in real time so reporting can be performed there? Real time was the requirement for these particular clients The answer is simple, transactional replication.  I am using transactional replication in this example because the last two times I provided this architectural solution to my clients, one was still using SQL 2000 and the other only needed a few tables of a very large SQL 2005 database. You can use snapshot replication, database mirroring for SQL 2005 or later, or even log shipping. It all depends on the requirements but that won’t be talked about in this example and you should research each option for your situation.

To implement SQL Server Replication you need to setup the database server that you would like to replicate as a publisher, you need a distributor this manages the transactions in terms of reading, storing and sending, and you also need a subscriber this contains your duplicated database(s) you will use for future reporting. 

Clients that I have provided consulting services for struggled with the next step of the replication architecture on more than one occasion; that is where to configure the distributor and subscriber. This is where it gets interesting, you may or may not need a separate server for each. As a general rule of thumb for smaller publisher database implementations small in size, small number of concurrent users, small number of transactions, I usually place the distributor and subscriber on one server separate from the publisher. For larger database implementations large in size, large number of concurrent users, large number of transactions, I usually place the distributor on its own server and the subscriber on its own server both separate from the publisher.  I’ll say it again, always separate the distributor and subscriber from the publisher as that usually avoids additional overhead on the publisher. Your original goal was to take load off the publisher with this architectural solution and there is a reason you are dividing and conquering, not adding then dividing and conquering in my opinion. You should test your scenario thoroughly before deciding on the final replication architecture and see what works best for you.

Another transactional replication recommendation I have is that after your publisher, distributor, and subscriber are all setup and you begin to add the tables as articles you would like to replicate I would ensure that you DO NOT use separate distribution agents per publication if you set up your transactional replication as per table publications The reason is parent child relationships. For example, a parent table is heavily transactional and the child table is lightly transactional, the child records can be delivered to the subscriber database before the parent records are delivered and reports that need to join the two for a complete data set will not be able to make the join and lead to missing results.

I feel like I could write a lot more on this topic with code examples etc. but am trying to keep it architectural in nature.  Also, there are always exceptions to the above but generally speaking in my experience this is what worked for my clients.  I hope this helps you divide and conquer your transactional database reporting.

No comments:

Post a Comment