
Multi Server Administration

Multiserver administration is the process of automating administration across multiple instances of Microsoft® SQL Server™.

With multiserver administration, you must have at least one master server and at least one target server. A master server distributes jobs to and receives events from target servers. A master server stores the central copy of job definitions for jobs run on target servers. Target servers connect periodically to their master server to update their list of jobs to perform. If a new job exists, the target server downloads the job and disconnects from the master server. After the target server completes the job, it reconnects to the master server and reports the status of the job. This topic describes how to set up Multi Server Administration with SQL Server 2005.

Steps to set up Multi Server Administration:

(1)  Check that both the master and target servers have SQL Server 2005 SP2 installed.

(2)  The SQL Server and SQL Server Agent services should not be running under a local account.

(3)  Decide whether you want to use the default encryption (no certificate) or encryption with a certificate.

Open SQL Server Configuration Manager, navigate to SQL Server 2005 Network Configuration, and select the instance for which you want to set encryption.

Open the Properties window.

To enable the default encryption, set the Force Encryption option to Yes and click OK. If you want to enable encryption with a certificate, first install the certificate on the master and target servers, then go to the Certificate tab of the instance's Protocol Properties and select the certificate.

Note: Encryption needs to be enabled on both the Master and the Target Server.

Once encryption is enabled on the master and target servers, restart the SQL Server services on both.


Configure Multiserver Administration

  • Register the Master Server and Target Servers in SQL Server 2005 Management Studio (Registered Servers).

In this example:

Master Server: snmain-db (172.20.0.12)

Target Server: REHDB (172.20.1.131)

Add these servers to Registered Servers using SSMS: on the toolbar, click View -> Registered Servers.

Right-click Database Engine, click New -> Server Registration, and type the server name or IP address along with the authentication details.


  • Expand the Master Server and navigate to SQL Server Agent -> right-click -> Multi Server Administration -> Make this a Master.

The Master Server Wizard will pop up.

Click Next

On the Master Server Operator page of the wizard, provide an email address for notifications. A master server operator will be created on the master server and on each of the target servers.

Click Next

Select the target server from the registered servers window and move it to the target server list box. If needed, you can provide a description of the target server. Connect to the target server using the Connect box under the target server list box. Click Next to check the version compatibility of the master and target servers.

Click Next

Click Close after the server version compatibility check has finished.

  • You will be taken to the Master Server Login Credentials window.

I have unchecked the option to create a new login for the master server, as I am logged in with my domain ID, which has full permissions.

Click Next for the summary page.

Click Finish to run the Multi Server Administration setup tasks. The Enlist task will pop up an error message: (MSX enlist failed for Job Server 'xyz').

Restart the Agent services on the master and target servers.

If you get this error, configure the registry settings as described below.

If you read the error message, it states: “The Target Server cannot establish an encrypted connection to the master server.”

By default, the SQL Server registry value is set to encryption level 2, which enables full SSL encryption and certificate validation between the target server and the master server. In our case we are not using SSL encryption and certificate validation, so change the value from 2 to 0.

To configure the appropriate level of security required for a specific master server/target server communication channel, set the SQL Server Agent registry subkey

\HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<instance_name>\SQLServerAgent\MsxEncryptChannelOptions (REG_DWORD)

on the target server. If you are not using a certificate for Secure Sockets Layer (SSL) encrypted communications between the master server and the target server, set this value to 0.

Check Default Value for SQL Server Agent registry subkey

Change the default value of the SQL Server Agent registry key (MsxEncryptChannelOptions) to 0.
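
If you prefer to make the change from T-SQL rather than the Registry Editor, here is a minimal sketch using the undocumented xp_regwrite extended procedure; the instance key MSSQL.1 is an assumption for a default SQL Server 2005 instance, so substitute your own instance ID.

-- Hedged sketch: set MsxEncryptChannelOptions to 0 on the target server.
-- xp_regwrite is undocumented; 'MSSQL.1' is an assumed instance ID - adjust it for your server.
EXEC master.dbo.xp_regwrite
    @rootkey    = 'HKEY_LOCAL_MACHINE',
    @key        = 'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\SQLServerAgent',
    @value_name = 'MsxEncryptChannelOptions',
    @type       = 'REG_DWORD',
    @value      = 0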

After a successful setup, the master server's SQL Server Agent will show (MSX) and the target server's SQL Server Agent will show (TSX).

Create Multi Server Scheduled Jobs

Let's expand the master server's SQL Server Agent:

SQL Server Agent (MSX) -> Jobs

You will see two new folders created, named Local Jobs and Multi-Server Jobs. Local Jobs is for the master server only, whereas Multi-Server Jobs is responsible for all target servers. We can create a job at the master and it will be deployed to the targeted servers.

Right-click Multi-Server Jobs and select New Job.

Select the target server(s) to which the job should be deployed.
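
If you would rather script the job than use the GUI, the sketch below shows the general shape using msdb stored procedures; the job name, category, and command are illustrative, and REHDB is the target server from this example.

-- Hedged sketch: create a multiserver job from T-SQL (names and command are illustrative)
USE msdb
EXEC dbo.sp_add_job
    @job_name = N'MSX - Nightly integrity check',
    @category_name = N'[Uncategorized (Multi-Server)]'   -- category the GUI uses for multiserver jobs (assumption)

EXEC dbo.sp_add_jobstep
    @job_name  = N'MSX - Nightly integrity check',
    @step_name = N'Run CHECKDB',
    @subsystem = N'TSQL',
    @command   = N'DBCC CHECKDB (msdb)'

-- Enlisting a remote server as the job server is what makes this a multiserver (MSX) job
EXEC dbo.sp_add_jobserver
    @job_name    = N'MSX - Nightly integrity check',
    @server_name = N'REHDB'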

 

Manage Target Servers

The master will take a few seconds to deploy or modify the job at the target server.

Let's check the target server.

The job can be executed from the master server or the target server. The last execution history can also be viewed from the master server.

The step detail can only be seen from the target server.

 

After download, the job will run on the target server.

 

Remove Target Server from Master Server

sp_msx_defect (Transact-SQL)

Removes the current server from multiserver operations.

Caution:
sp_msx_defect edits the registry. Manual editing of the registry is not recommended because inappropriate or incorrect changes can cause serious configuration problems for your system. Therefore, only experienced users should use the Registry Editor program to edit the registry. For more information, see the documentation for Microsoft Windows.

Syntax

sp_msx_defect [@forced_defection =] forced_defection

Arguments

[ @forced_defection =] forced_defection

Specifies whether or not to force the defection to occur if the master SQL Server Agent has been permanently lost due to an irreversibly corrupt msdb database, or no msdb database backup. forced_defection is bit, with a default of 0, which indicates that no forced defection should occur. A value of 1 forces defection.

After forcing a defection by executing sp_msx_defect, a member of the sysadmin fixed server role at the master server must manually remove the defected target server by executing the following:

EXECUTE msdb.dbo.sp_delete_targetserver @server_name = 'tsx-server', @post_defection = 0
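
A minimal usage sketch, run at the target server you want to remove:

-- Remove the current (target) server from multiserver operations
EXEC msdb.dbo.sp_msx_defect

-- Forced defection, for when the master's msdb is irreversibly lost
EXEC msdb.dbo.sp_msx_defect @forced_defection = 1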

 


When we talk about statistics in SQL Server, we’re not discussing a database to store your baseball card collection. No, statistics in SQL Server refers specifically to information that the server collects about the distribution of data in columns and indexes. This data in turn is used by the query optimizer to determine the plan of attack for returning results when you run a query. While in the majority of cases SQL Server takes care of this statistical recordkeeping for you automatically, it’s useful to have some understanding of what’s going on, and to know when you might need to intervene.

Automatic Statistics

By default, SQL Server databases automatically create and update statistics. Although you can disable this (by setting a property of the database), you should almost certainly leave it on. SQL Server needs these statistics to do a good job with query processing, and it’s likely to be smarter than you about what and when to update. The information that gets stored includes:

  • The number of rows and pages occupied by a table’s data
  • The time that statistics were last updated
  • The average length of keys in a column
  • Histograms showing the distribution of data in a column
  • String summaries that are used when performing LIKE queries on character data

SQL Server maintains statistics on index and key columns of all of your tables – in other words, the columns that the query processor is able to use in various ways in queries depending on the data that they contain.

Viewing Statistics

You can get an idea of the statistics that SQL Server is maintaining on a table from either T-SQL or the SQL Server Management Studio user interface. Let’s look at the T-SQL solution first. The place to begin is with the sp_helpstats system stored procedure, which will list all of the statistics being maintained on a table. For example, in the AdventureWorks sample database, you can find the statistics on the Sales.SalesOrderDetail table this way:

sp_helpstats 'Sales.SalesOrderDetail', 'ALL'
 
statistics_name                                      statistics_keys
---------------------------------------------------- ---------------------------------
_WA_Sys_00000006_245D67DE                            SpecialOfferID
_WA_Sys_0000000B_245D67DE                            ModifiedDate
AK_SalesOrderDetail_rowguid                          rowguid
IX_SalesOrderDetail_ProductID                        ProductID
PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID  SalesOrderID, SalesOrderDetailID

Given an individual statistic, you can use DBCC SHOW_STATISTICS to retrieve detailed information:

 
DBCC SHOW_STATISTICS('Sales.SalesOrderDetail', IX_SalesOrderDetail_ProductID)
 
Name                           Updated              Rows                 
------------------------------ -------------------- -------------------- 
IX_SalesOrderDetail_ProductID  Oct 14 2005  1:59AM  121317               
 
Rows Sampled         Steps  Density       Average key length String Index
-------------------- ------ ------------- ------------------ ------------
121317               197    0.007550886   12                 NO 
 
(1 row(s) affected)
 
All density   Average Length Columns
------------- -------------- -------------------------------------------
0.003759399   4              ProductID
8.242868E-06  8              ProductID, SalesOrderID
8.242868E-06  12             ProductID, SalesOrderID, SalesOrderDetailID
 
(3 row(s) affected)
 
RANGE_HI_KEY RANGE_ROWS    EQ_ROWS       DISTINCT_RANGE_ROWS  AVG_RANGE_ROWS
------------ ------------- ------------- -------------------- --------------
707          0             3083          0                    1
708          0             3007          0                    1
709          0             188           0                    1
710          0             44            0                    1
711          0             3090          0                    1
(additional rows removed)

The first section of the DBCC SHOW_STATISTICS output gives you general information on the statistic, including how many rows went into building the statistic (SQL Server may use only a sample of the data if the statistic is on a non-indexed column). The second section shows the sets of columns that this statistic can be used for, and statistical measures of how well this statistic will help a query locate data in those columns. The final section is the histogram showing the distribution of data. The columns in this section show the key value that demarcates the range, the number of rows that are in the range but not equal to the upper bound, the number of rows in the range that are equal to the upper bound, the number of distinct values in the range (not counting the upper bound), and the average number of duplicate values in the table for this range (which helps measure the selectivity of the statistic). Even without understanding the math behind query optimization in detail, it should be obvious that this is useful information.

Of course, these days there are many developers and administrators who prefer to work with the GUI administration tools rather than raw SQL. Fortunately, all of this information on statistics is also available in SQL Server Management Studio. Connect to a database and expand the Object Explorer tree until you find the table of interest, then drill into the Statistics folder underneath the table. There you’ll find a graphical representation of all of the statistics on the table, as shown in Figure 1.

If you right-click on one of these statistics and select Properties, you’ll see the general information on the statistic, as shown in Figure 2.

Selecting the Details page shows detailed information on the statistic, as shown in Figure 3. You may recognize this format – the SQL Server Management Studio developers seem to have simply piped the DBCC SHOW_STATISTICS output to the user interface.

Updating Statistics

Just as the data in your database isn’t static, statistics aren’t either. As you add, delete, and update rows, SQL Server needs to adjust the statistics so that they still provide good guidance for the query optimizer. When you run a query, SQL Server checks statistics used in that query’s plan to see whether they’re outdated. It uses these heuristics to determine if a statistic is outdated:

  • If the rowcount in the table has gone from zero to any nonzero value, statistics are outdated.
  • If the number of rows in the table was under 500, and at least 500 rows have been modified (added/deleted/edited), statistics are outdated.
  • If the number of rows in the table was over 500, and at least 500 plus 20% of the rowcount rows have been modified (added/deleted/edited), statistics are outdated.

Normally statistics are updated as soon as it’s discovered that they should be, which can hold up the execution of a query. You can set the AUTO_UPDATE_STATISTICS_ASYNC database property to have the update performed asynchronously instead, which may result in less-accurate query plans but faster start of execution.
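
As a hedged example, the asynchronous behavior can be switched on for the AdventureWorks sample database used elsewhere in this article like this:

ALTER DATABASE AdventureWorks
SET AUTO_UPDATE_STATISTICS_ASYNC ON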

Sometimes, you may find that the heuristics aren’t good enough for telling when statistics should be updated. For example, if you’re adding a lot of rows at the end of the range of a column (think about adding autonumbers, or timestamps), the histogram for that column will become outdated rapidly. You can force an immediate update from T-SQL by executing the UPDATE STATISTICS statement:

 

UPDATE STATISTICS table | view
    [
        {
            { index | statistics_name }
          | ( { index | statistics_name } [ ,...n ] )
        }
    ]
    [ WITH
        [
            [ FULLSCAN ]
            | SAMPLE number { PERCENT | ROWS }
            | RESAMPLE
            | <update_stats_stream_option> [ ,...n ]
        ]
        [ [ , ] [ ALL | COLUMNS | INDEX ] ]
        [ [ , ] NORECOMPUTE ]
    ] ;

<update_stats_stream_option> ::=
    [ STATS_STREAM = stats_stream ]
    [ ROWCOUNT = numeric_constant ]
    [ PAGECOUNT = numeric_constant ]

As you can see, you can update all of the statistics for a table or view, or pick a particular statistic to update by index name or statistic name. You can also specify some update options:

  • FULLSCAN forces a complete read of every row of data.
  • SAMPLE lets you specify how much of the data SQL Server should sample in building statistics. By default, it will figure out its own sampling coverage.
  • RESAMPLE uses the same sampling rate as the last time the statistic was gathered.
  • ALL, COLUMNS, or INDEX specify whether the update should be to all statistics, column statistics, or index statistics. The default is all statistics.
  • NORECOMPUTE turns off automatic statistics gathering for this table or statistic.
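
For example, the following statement (a sketch; substitute your own table and statistic names) rebuilds the index statistic examined earlier using a full scan of the table:

UPDATE STATISTICS Sales.SalesOrderDetail IX_SalesOrderDetail_ProductID
WITH FULLSCAN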

The Undocumented Options

If you check Books Online, you’ll find the update_stats_stream_option documented as “This syntax is for internal use only and is not supported. Microsoft reserves the right to change this syntax at any time.”

The STATS_STREAM option, so far as I know, remains undocumented. But we now know what the other two options do, thanks to the new blogging spirit of openness and a blog posting from the SQL Server Query Optimization Team. It turns out that ROWCOUNT and PAGECOUNT can be used to trick the query optimizer by letting you fake the internal metadata about a particular table – specifically the count of rows and count of pages that the table occupies.

If you’re going to experiment with this, please do not do so in a production database!

For example, you can vastly increase the apparent rowcount and pagecount of the Sales.SalesOrderDetail table by issuing this command:

 

UPDATE STATISTICS Sales.SalesOrderDetail
WITH ROWCOUNT = 5000000, PAGECOUNT = 5000000

Why would you want to do this? SQL Server’s query optimizer does things differently on small tables and on large tables. During development, you likely don’t have a whole lot of data in your tables. Rather than spend time generating huge amounts of sample data, you can use the undocumented options of UPDATE STATISTICS to see how the query optimizer will treat your tables after they’ve been in production for a while. This can be a quick and dirty method to uncover potential performance issues before they arise, and perhaps point out the need for additional indexes or other schema work. I’d recommend confirming any such changes by retesting with realistic amounts of sample data, though.

Using Statistics Wisely

In most cases, SQL Server 2005 is so well tuned in this area that you only need to remember a single best practice: leave the statistics set to automatically update and forget about them. If you’re having a problem with particular queries, there are a few things you can do to make statistics more accurate. These include running an update with FULLSCAN, creating multicolumn statistics if you have multiple-column WHERE clauses that aren’t covered by existing statistics, and updating statistics more frequently if you’re filling a table with ascending data. Armed with the ability to evaluate the quality of statistics, you should be able to decide whether you need to undertake any of these steps in your own databases.
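
As a hedged illustration of the multicolumn-statistics advice, the statement below creates a statistic over two columns of the sample table used earlier; the statistic name is invented for the example:

CREATE STATISTICS ST_ProductID_SpecialOfferID
ON Sales.SalesOrderDetail (ProductID, SpecialOfferID)
WITH FULLSCAN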

 

 


Contents

  • Introduction
  • Replication topologies
    • Central Publisher
    • Central Subscriber
    • Publishing Subscriber
  • Replication Methods
    • How Replication Works
    • Snapshot Replication
    • Transactional Replication
    • Merge Replication
  • Replication Agents
    • SQL Server Agent
    • The Snapshot agent
    • The Log Reader Agent
    • Distribution Agent
    • Merge Agent
    • Queue Reader Agent
  • Monitoring Replication
  • Checking throughput
  • Validating
  • Changing the settings
  • Articles
  • Updating articles
  • Programming Replication Topologies
  • Further Reading

Introduction

Replication is intended to be a way of distributing data automatically from a source database to one or more recipient databases. As such, it can have obvious uses in a distributed system. It has also been used to implement high-availability systems. It is not useful for one-off synchronization, or for simply copying data. It is intended as a long-term data relationship between databases. Typical uses are in:

  • Data warehousing and reporting
  • Integrating data from several, possibly only partially connected, sites
  • Improving scalability and availability
  • Integrating heterogeneous data from other databases via OLE DB, integrating data from mobile users, getting data to and from Point-Of-Sale systems
  • Offloading/delegating batch processing tasks.

Examples of the use of replication within an application could be

  • Distributing data ‘owned’ by a particular application to other applications that are ‘consumers’ of that data. (For example, sales records to reporting services, manufacturing stock levels to purchasing systems.)
  • Creating several instances of a database to distribute load on it
  • Updating a central database with information from a number of remote Laptops that might be only partially connected, and to resynchronise the laptops.

There are three methods of replication: Snapshot, Transactional, and Merge. These are provided to try to meet the wide range of business requirements for replication.

Replication uses a ‘Magazine Publishing’ vocabulary. Anything from an ‘Article’ to an entire database can be replicated. An Article is the smallest component of distribution, and can be a table, procedure or function. If it is a table, then a filter can be applied to it so that only certain rows or columns are replicated. This ‘Magazine Publishing’ analogy can be confusing because, in replication, a Subscriber can sometimes make updates, and a Publisher usually sends out incremental changes to the articles in a publication.

A ‘Publisher’ maintains the original copy of the data. It holds the definition of the ‘Publication’, which defines the ‘articles’ that are to be ‘published’. (The database with the original location of the data determines what is to be distributed).

A ‘Subscriber’ receives the articles from a publisher. It can subscribe to one or more publications. Any database can take on either role or even both roles at once.

A Distributor is a specialist database that runs the ‘Replication agents’.

Replication is not part of the SQL Server engine, but an external application. This makes it much easier to involve other database systems in replication. Any SQL Server database, or other database system with an OLE DB provider, can be a publisher or subscriber in snapshot or transactional replication.

It is essential to plan out the replication in detail as a first stage, and to be very certain of the type of replication you wish to implement. A common mistake is to use replication in cases where a much less complex solution is possible.

A problem with production systems using replication is the difficulty of restoring the topology after a disaster. This requires a fully documented recovery strategy, which has to be periodically tested and practiced. This means that the whole replication topology and configuration must be scripted so it can be re-created in an emergency, even if the system was originally built using the GUI tools. The Object Browser (or Enterprise Manager) makes the initial deployment relatively simple to do, and there are plenty of step-by-step guides, but it is not the best option when trying to restore an existing topology, and settings, in order to achieve a recovery from major failures.

Problems often follow from developers adding or dropping articles, changing publication properties, and changing schema on published databases. It is therefore best to create the replication once the design of the publisher is relatively stable.

Replication topologies

In most topologies, it makes sense for publishers, distributors, and subscribers to be on separate physical hardware.

Central Publisher

The commonest form of replication is to have a single publisher with the source data, with one or more subscribers. The Distributor database can be on the same, or preferably different, server.

Central Subscriber

Often, where the data from several databases need to be ‘warehoused’ centrally in an OLAP or reporting database, one will find a single ‘reporting’ database subscribing to several publications.

One can come across other topologies such as ‘bi-directional’ and ‘peer to peer’, which are really special cases of Central Publisher or Central Subscriber and use transactional replication.

Publishing Subscriber

The distribution of data can be relayed to other subscribers via a publishing subscriber. This allows replication to be implemented over low-bandwidth WANs to a subscriber that, in turn, distributes it to other servers within its high-bandwidth LAN.

Replication Methods

How Replication Works

Replication begins with the initial synchronization of the published objects between the Publisher and Subscribers, using a snapshot. A snapshot is a copy of all of the objects and data specified by a publication. After the snapshot is created on the publisher, it is delivered to the Subscribers via the distributor.

For Snapshot replication, this is sufficient. For other types of replication, all subsequent data changes to the publication flow to the Subscriber as they happen, in a queue, or on request.

Snapshot Replication

The snapshot replication process provides the initial synchronization for transactional and merge publications. However, in several cases, this initial synchronization is all that is necessary. This would include circumstances where data hardly changes, or if the latest version of the data is not essential to the subscriber, where the amount of data is small, or if a large number of changes takes place rapidly.

Snapshot replication involves copying the articles that make up the publication. Normally, if they exist already on the subscriber, they are over-written, though this behavior can be changed. Snapshot replication is more expensive in terms of overhead and network traffic and only takes place at intervals. Because locks are held during snapshot replication, this can impact other users of the subscriber database. It is therefore more suitable for static data and enumerations. In SQL Server 2005, several articles can be processed in parallel, and interrupted snapshots can be recommenced from the point of interruption. Snapshots can be queued or immediate.

Data changes are not tracked for snapshot replication; each time a snapshot is applied, it completely overwrites the existing data.

Transactional Replication

Transactional replication is used if:

  • Changes to the data must be propagated immediately
  • The database application taking out a subscription needs to react to every change
  • The Publisher has a very high volume of insert, update, and delete activity
  • The Publisher or Subscriber is a different database application reached via OLE DB.

Essentially, transactional replication distributes data in one direction, but it does offer options that allow updates at the Subscriber. Once a snapshot replication has synchronized the subscribers with the publisher, all committed transactions on the publisher are then propagated to the subscribers in sequence, via distributed transactions. One can select a queued or immediate update, depending on requirements.

Peer-to-peer Replication

This is a special type of transactional replication in which every participant is both a publisher and subscriber (2005 Enterprise only) and is most useful for up to ten databases in a load-balancing or high-availability group.

Bidirectional Replication

This is where two databases replicate the same articles to each other via a distributor. There must be loopback detection. Data conflicts aren’t handled and the replication must be implemented in code, since the GUI doesn’t support it.

Transactional replication tracks changes through the SQL Server transaction log.

Merge Replication

Merge replication allows various sites to work autonomously and later merge updates into a single, uniform result.

Merge Replication is complex, but provides the means to implement part of a high-availability system, as well as its original purpose of serving mobile and disconnected users. It is designed for cases where the publishers are not in constant communication with the subscribers. After the initial snapshot synchronization, subsequent changes are tracked locally with triggers, and the databases are merged when in contact, using a series of rules to resolve all possible conflicts.

Merge replication is used when several Subscribers might need to update the same data at various times and propagate those changes back to the Publisher and thence to other Subscribers. It is also required in applications that involve Subscribers receiving data, making changes offline, and finally reconnecting with the publisher to synchronize changes with the Publisher and other Subscribers.

To make this possible, each Subscriber requires a different partition of data and there has to be a set of rules to determine how every conflict that takes place in the update of the data is detected and resolved. These conflicts occur when the data is merged because there can be no ‘locking’ and so the same data may have been updated by the Publisher and by more than one Subscriber.

Merge Replication does not use transactions. Merge replication uses a set of conflict-resolution rules to deal with all the problems that occur when two databases alter the same data in different ways, before updating the subscribers with a ‘consensus’ version. It normally works on a row-by-row basis but can group rows of related information into a logical record. One can specify the order in which ‘articles’ are processed during synchronisation.

Merge replication tracks changes through triggers and metadata tables.

Replication Agents

Replication is done by several different agents, which are separate applications each responsible for part of the process. The replication agents should not be run under the SQL Server Agent account in a production system. Instead, they need the minimal permissions necessary to perform their function.

SQL Server Agent

This manages the overall replication process via SQL Server Agent jobs.

The Snapshot agent

Snapshot.exe executes on the Distributor. It extracts the schema and data defined by the publication, which is then sent to the subscriber via a ‘snapshot folder’. It also updates status information in the distribution database. It is used in all forms of replication.

The Log Reader Agent

LogRead.exe is used in transactional replication to extract relevant committed transactions from the publisher’s log, repackage them and send them to the distributor in the correct sequence.

Distribution Agent

Distrib.exe takes the snapshots, and log entries from the agents we’ve described, and dispatches them to the subscribers.

Merge Agent

ReplMer.exe is used only in Merge Replication to send a snapshot when the subscriber is initialized, and also to exchange transactions between publisher and subscriber.

Queue Reader Agent

QrDrSvc.exe is used to queue the updates in transactional or snapshot replication when queuing has been specified.

Monitoring Replication

Many problems associated with replication can be avoided by regular checks. The most obvious check is to make sure that the data has been transferred as expected. Periodic checks with SQL Compare and SQL Data Compare can be very useful in addition to the tools that come with Replication. Additionally, the replication processes and jobs need to be checked to make sure they are working.

Checking throughput

The performance of replication must be regularly monitored, and performance-tuned as necessary.

The Replication Monitor is used to check on the operational state of publications, and inspect the history, and errors. Right-clicking the replication node in Object Explorer will gain access to it.

One of the most important concerns is the time delay, or latency, of transactions from the publications appearing in the subscriber database. At times of high transaction throughput on the publisher database, bottlenecks can occur. Whereas the stored procedure sp_browsereplcmds on the distribution database can tell you how far behind the synchronisation is at any particular time, one cannot determine where the problem lies just from that data. Tracer tokens are now used to measure the actual throughput of the replication architecture at any particular time, to help diagnose such bottlenecks.
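
A minimal sketch of posting and reading a tracer token from T-SQL, run in the publication database at the publisher; the publication name is illustrative:

-- Post a tracer token into the replication stream and remember its id
DECLARE @token_id int
EXEC sp_posttracertoken
    @publication = N'SalesOrderPub',
    @tracer_token_id = @token_id OUTPUT

-- Inspect how long the token took to reach the distributor and each subscriber
EXEC sp_helptracertokenhistory
    @publication = N'SalesOrderPub',
    @tracer_id = @token_id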

Validating

There is always an element of doubt as to whether the replication has entirely worked. There are stored procedures provided to compare the ‘articles’ on the publisher and subscribers to make sure they are the same.

The sp_publication_validation stored procedure validates the data associated with each article by calling sp_article_validation (after the articles associated with a publication have been activated). The sp_article_validation stored procedure in turn invokes the sp_table_validation stored procedure, which calculates the row count and, optionally, the checksum of the published table. It is considered good practice to perform a daily row-count and a weekly checksum. SQL Data Compare is ideal for mending a broken replication.
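
A hedged example of kicking off a validation from T-SQL, run at the publisher in the publication database; the publication name is illustrative, and @rowcount_only = 2 requests a row count plus binary checksum:

EXEC sp_publication_validation
    @publication = N'SalesOrderPub',
    @rowcount_only = 2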

The Distribution Agent raises the ‘20574’ system message if validation fails, or the ‘20575’ system message if it passes. The Distribution Agent will replicate changes to a subscriber even if the validation shows that the subscriber is out of synchronization. It is a good policy to configure the Replication Alert on the ‘20574’ message so as to send E-Mail, Pager, or Network notification.

This validation approach will only work within certain restrictions. For example, it will not work if certain filters have been applied. They should be used with caution.

Changing the settings

It is best to use the default replication settings unless there are clear performance gains to be made, or if the application design forces the issue. However, one cannot assume that the changes will be generally beneficial to the entire topology without comprehensive testing.

Articles

Articles are the smallest unit of a publication. An article can be a table, view, stored procedure or function. Where an article is based on a table or view, it can contain all the data or just part of it. These filters are of two types: more common are the static ‘WHERE’ clauses, but filters can also be used dynamically in Merge Replication to publish different ‘content’ (rows) to different ‘subscribers’ (databases receiving data). These latter filters are called ‘Dynamic’ and can be simple Row Filters, or Join Filters, where the selection of rows to publish is based on a join with other tables, rather than a simple WHERE clause.

Normally, any alteration to an article that is a table is propagated to all the subscribers. You can also opt to propagate schema objects associated with the article such as indexes, constraints, triggers, collation and extended properties.

Updating articles

In Merge replication, the subscriber can update the article. This is, of course, a recipe for conflict, and these conflicts have to be resolved automatically. When the Merge Agent comes across a row that might have changed recently, it examines the history or ‘lineage’ of each site’s version of the row to see if there is a conflict. If so, then the update that is finally used has to be based on one of the following:

  • A “first wins” resolution,
  • a user-specified priority scheme to determine the update to select,
  • a customised resolution, using COM and stored procedures.

The ‘lineage’ is a history of changes in a table row in MSmerge_contents, which is maintained automatically when a user updates a row. Each column contains one entry for each site that has updated the row.

Conflicts to the data in the base table can occur within a column or a row. Most usual are column-tracked articles; this means that, within any row, updates are only recognized as conflicts if the same column is updated by more than one subscriber. Occasionally, however, the business rules of the application may treat simultaneous changes to any column within the row as a conflict, in which case row-level tracking is used.

Programming Replication Topologies

Replication agents and replication topologies can be administered and monitored remotely via SQL scripts or RMO scripts. The task of building and maintaining replication topologies is made much easier if all parts of the deployment are scripted, even if this is done after other methods such as wizards or RMO are used for the initial operation.

There are other uses for a replication script. It will be required in an end-user application where, for example, a pull subscription is synchronized when the user clicks a button, or where a routine administration task such as monitoring replication throughput is performed from a custom console. It is also generally used for writing customized business rules that are executed when a merge subscription is synchronized.

One can use the Object Explorer in SSMS, or the Enterprise Manager in earlier versions of SQL Server, to set up replication. BOL provides worked examples. Alternatively, RMO can be used with VB or C# to script the replication. Whatever system you use, it is a good idea to use the Transact-SQL script as the reference for the replication topology you create.

Ultimately, the functionality in a replication topology is provided by system stored procedures. The easiest approach is to use Transact-SQL script files to perform a logical sequence of replication tasks, because it provides a permanent, repeatable, copy of the steps used to deploy the replication topology that can, for example, be used to configure more than one subscriber. It also provides a measure of documentation and disaster-recovery. A script can be stored as a query object in a SQL Server Management Studio project.
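
The sketch below gives a feel for what such a script looks like for a simple transactional publication; every name in it is illustrative, and a real script would also configure the distributor and the snapshot and log reader agent jobs:

-- Hedged sketch: scripting a transactional publication with system stored procedures
USE AdventureWorks
EXEC sp_replicationdboption
    @dbname = N'AdventureWorks', @optname = N'publish', @value = N'true'

EXEC sp_addpublication
    @publication = N'SalesOrderPub', @status = N'active', @repl_freq = N'continuous'

EXEC sp_addarticle
    @publication = N'SalesOrderPub', @article = N'SalesOrderDetail',
    @source_owner = N'Sales', @source_object = N'SalesOrderDetail'

EXEC sp_addsubscription
    @publication = N'SalesOrderPub', @subscriber = N'REHDB',
    @destination_db = N'SalesWarehouse', @subscription_type = N'push'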

Replication scripts can be created by hand, by the script generation of the replication wizards in SQL Server Management Studio, or by using Replication Management Objects (RMOs) to programmatically generate the script to create an RMO object.

When creating scripts to configure replication, it is best to use Windows Authentication so as to avoid storing security credentials in the script file. Otherwise, you must secure the script file.

Further reading:

SQL Server Replication
For the details of implementing replication, all the steps are documented in Implementing Replication.
Details on configuring and maintaining a replication are in Configuring and Maintaining Replication.
For details on peer-to-peer replication, read Peer-to-Peer Transactional Replication.
Some of the wider issues and dangers of replication are discussed in Data Replication as an Enterprise SOA Antipattern, in the excellent Microsoft Architecture Journal.


Glossary


By : Brad McGehee
Jul 19, 2006

Allocation Page: A SQL Server page used to hold metadata about other SQL Server pages. Some examples of allocation pages are the Page Free Space page, the Global Allocation Map page, and the Secondary Global Allocation Map page.

Bottleneck: Something that prevents SQL Server or a SQL Server application from running optimally. Often, this is a hardware limitation, such as insufficient CPU resources, a lack of RAM, not enough network bandwidth, or slow I/O. If a bottleneck exists in a server, it can prevent the other components of the server from running at their full capacity. As a DBA, one of your goals is to identify and remove as many server bottlenecks as you can in order to boost performance.

Branch Node: Another name for an intermediate node in an index. Branch, or intermediate nodes, are located between the root node and leaf nodes in an index. They contain index pages.

Clustered Index: An index that includes both table data and index data. In a clustered index, the leaf level of the index consists of the actual data pages. Data is physically stored in a clustered index in ascending order. A table can only have one clustered index.

Clustered Index Scan: Similar to a table scan, this is when the Query Optimizer reads each record in a clustered index to locate one or more records.

Clustered Index Seek: This is when the Query Optimizer is able to use a clustered index to directly select one or more specific records based on the WHERE clause of a query. This generally produces the fastest type of search.

Data Definition Language (DDL): Refers to Transact-SQL code used to create, alter, or drop database objects.

Data Page: A page designated to store rows of user data in a SQL Server file.

DDL: See Data Definition Language

Decision Support System (DSS): Refers to any software used to analyze business performance. Generally speaking, this software runs intensive queries against databases in order to analyze data for reporting purposes. These queries are best supported by extensive indexes for best performance. DSS queries run against OLTP databases may hurt the performance of production transactions, and should be avoided.

Density: Refers to the average percentage of duplicate rows in an index. If an indexed column, such as last_name, has much duplicate data, then the index is said to have high density. But if an indexed column, such as employee_number, has unique data, then the index is said to have low density. Density is also related to selectivity. Low selectivity is related to high density, and high selectivity is related to low density.

DSS: See Decision Support System

Execution Plan: A sequence of steps determined by the query optimizer (during compilation) used to execute a query tree or a sequence tree.

Extent: A group of eight contiguous pages. Since each page in SQL Server holds 8K, an extent is 64K in size. There are mixed and uniform extents found in SQL Server. A mixed extent contains pages from two or more objects. Uniform extents have all of their 8 pages allocated to the same object.

GAM: See Global Allocation Page.

Ghost Record: When rows are deleted from the leaf level of an index, they are not removed immediately. Instead, they are marked as invalid and are called ghost records. Periodically, SQL Server starts a special housekeeping thread that locates and removes the ghost records.

Global Allocation Page (GAM): An allocation page containing information about allocated extents in a SQL Server file. The second page of every SQL Server file is a GAM page. Each GAM page can track nearly 64K of extents, which is about 4GB of data.

Heap: A collection of data pages containing rows for a table. Another way of saying this is that a heap is a table without a clustered index.

Index Allocation Map (IAM): An allocation page containing information about the extents that a table or index uses. It contains the location of the eight initial pages and a bitmap of extents indicating which extents are in use for that object. Each IAM page can track up to 512,000 data pages. IAM pages are used by SQL Server to help it navigate through a heap in order to locate available space for new rows to be inserted. IAM pages are the only logical connection between data pages in a heap.

Intermediate Level: Refers to the intermediate, or branch nodes, of an index.

Intermediate Node: Intermediate, or branch nodes, are located between the root node and leaf nodes in an index. They contain index pages.

IAM: See Index Allocation Map.

Index Pages: A page designated to store rows of index data in a SQL Server file.

I/O: Specifically, this is an acronym for Input/Output. When referred to in the context of SQL Server performance tuning, it refers to all the reads and writes performed on a server’s disk subsystem.

Leaf Level: Refers to the leaf pages in a clustered or non-clustered index.

Leaf Page: In a clustered index, it refers to the actual data pages of a table. In a non-clustered index, it refers to the lowest set of pages in an index, which includes pointers to the actual data pages.

Logical Read: Refers to when SQL Server reads a single 8K page from either memory or from disk.

Mixed Extent: See extent.

Non-Clustered Index: This type of index points to data pages, either in a heap or clustered index. This is a separate storage structure from a table and only includes index pages, no data pages. There can be a maximum of 249 non-clustered indexes per table.

OLAP: See Online Analytical Processing

OLTP: See Online Transaction Processing

Online Analytical Processing (OLAP): Refers to the analysis of data in a database, generally against a data warehouse or data mart. This generally involves resource-intensive queries being run against the database.

Online Transaction Processing (OLTP): Refers to transactional processing run against a database, such as INSERTS, UPDATES, and DELETES.

Optimizer: See Query Optimizer.

Page: In SQL Server, a page is 8K (8,192 bytes) in size, of which 8,096 bytes are available to store data. There are many different kinds of pages in SQL Server, including data and index pages. Pages are always grouped in groups of 8 pages, called an extent. SQL Server stores, reads, and writes data in pages.

Page Free Space Page (PFS): An allocation page containing information on the amount of free space available on the pages in a file. The first page of each file in SQL Server is a PFS page. Every PFS page tracks up to 8,000 contiguous pages, which is about 64MB of data pages. For each file page, the PFS page contains a byte that tracks whether a page has been allocated, whether the page is on a mixed or uniform extent, and an estimate of how much room is available on the page.

Performance: Refers to the metrics relating to how a particular request is handled. For example, if a particular query takes 5 seconds to run, and after performance tuning, it now takes 3 seconds to run, we have boosted the performance of this query. On the other hand, if a particular query now takes 15 seconds, and it used to take only 10 seconds to run, then performance has degraded.

PFS Page: See Page Free Space Page.

Physical Read: Refers to when SQL Server reads a single 8K page from disk.

Query Optimizer: A software component of SQL Server that is used to analyze queries submitted to SQL Server for execution, and then determines the optimal way to execute the query. The query optimizer cannot be accessed directly by users. Instead, once queries are submitted to SQL Server, and parsed by the parser, the parser passes the queries to the query optimizer where optimization occurs.

Query Plan: The steps used to execute a query. A query plan is generated by the Query Optimizer.

Query Tree: If a stored procedure, trigger, or Transact-SQL batch is parsed, an internal structure is created that describes the series of steps needed to perform the requested actions. If any of these contain a query, then this internal structure is called a query tree.

RAID Level 0: Also referred to as disk striping, RAID Level 0 combines two or more physical disks into a single array, and data is striped evenly among all of the drives. RAID Level 0 offers the fastest speed of all RAID levels, but it is not fault tolerant. Because of this, it is not suitable for storing SQL Server OLTP database files, although it can be used for the Tempdb database because the Tempdb database is recreated every time SQL Server is restarted. RAID Level 0 may also be considered for read-only databases where speed is of high importance.

RAID Level 1: Also referred to as disk mirroring or duplexing (mirroring uses one channel, and duplexing uses two channels), RAID Level 1 combines two physical disks into a single array, and data is written to or read from both drives at the same time. RAID Level 1 offers fault tolerance and reasonably good read and write speed. The biggest downside is that it wastes 50% of your disk drives. It is very common to use RAID Level 1 arrays for a SQL Server’s operating system drive.

RAID Level 5: Also referred to as data striping with parity, RAID Level 5 combines three or more physical drives into a single array. Data, along with parity data used to recreate the data should one of the drives in the array fail, is striped among all the disks, making it fault tolerant. RAID Level 5 has good read performance, but mediocre write performance (because of the parity data that must be created and written for fault tolerance). For optimum performance, there should be as many physical drives in the array as possible (the number of drives in the array depends on what the controller can handle). RAID Level 5 is commonly used to store SQL Server production databases.

RAID Level 10: Also sometimes referred to as RAID Level 1+0 or 0+1, RAID Level 10 is a combination of RAID 1 and RAID 0. Essentially, it mirrors two RAID Level 0 arrays, which provides both very high I/O performance and fault tolerance. While it is the most expensive RAID option, it is the preferred RAID option for high performance SQL Server databases.

RID: See row identifier.

Root Node: The top level of a clustered or non-clustered index. It consists of a single index page.

Row Identifier: Identifies the physical location of a row on a page.

Sargable: The ability of the search arguments in a WHERE clause to use an index or not. For example, if the search arguments in a WHERE clause have the ability to use an index, then the search arguments are said to be sargable. If the search arguments in a WHERE clause cannot use an index, then they are considered to be non-sargable.
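
A brief hedged illustration, assuming an Orders table with an index on OrderDate:

-- Sargable: the optimizer can seek on an index over OrderDate
SELECT OrderID FROM Orders WHERE OrderDate >= '20110201'

-- Non-sargable: wrapping the column in a function prevents an index seek
SELECT OrderID FROM Orders WHERE YEAR(OrderDate) = 2011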

Scalable: The ability for a resource to increase in capacity in order to meet increasing workloads over a period of time. Scalability specifies how a metric varies with load. For example, if a SQL Server application is now running 1000 transactions per minute, but later runs 2000 transactions per minute with only a proportional increase in hardware utilization, then the application could be considered scalable.

Search Argument: The criteria used in a WHERE clause to restrict a query to an exact match or a range of values. The more exact the search argument, generally the faster the query will perform. Try to avoid using wildcard characters, as they make the search argument less exact and often slow the query.

Secondary Global Allocation Map Page (SGAM): An allocation page containing information about mixed extents. The third page of each SQL Server file is a SGAM page. SGAM pages track all mixed extents that have at least one unused page. One SGAM page can track up to almost 64K of mixed extents, or about 4GB of data.

Selectivity: Refers to the percentage of rows in a table that are returned by a query. A query is considered highly selective if it returns a very limited number of rows. A query is considered to have low selectivity if it returns a high percentage of rows. Generally speaking, if a query returns more than 5% of the number of rows in a table, it is considered to have low selectivity.

Sequence Tree: If a stored procedure, trigger, or Transact-SQL batch is parsed, an internal structure is created that describes the series of steps needed to perform the requested actions. If any of these contain a procedure, then this internal structure is called a sequence tree.

SGAM Page: See Secondary Global Allocation Map Page

Table Scan: When the SQL Server Query Optimizer determines that there are no useful indexes on a table to assist in producing the results of a query, SQL Server performs a table scan on the table. This means that SQL Server has to read every row in the entire table in order to locate the rows requested in the query. In most cases, table scans are slow and should be avoided. In some cases, a table scan is faster when locating records than using an index. This is especially true for small tables.

Uniform Extent: See extent

 


Policy-Based Management is a new feature in SQL Server 2008 that allows you to define and implement policies across your SQL Server infrastructure. Policy-Based Management works in a manner similar to Active Directory’s Group Policies, a feature of Microsoft Windows NT-based operating systems. Group Policy offers centralized management and configuration of systems, applications, and users via administrator- or system-controlled policies, which can then be applied at various levels of the managed directory structure.

Policy-Based Management adheres to those same principles as Group Policy, in that you can apply a policy against a target (such as a database, table, or stored procedure) and evaluate whether the target complies with your policy. If your target does not adhere to your policy, you can either enforce compliance with that policy or trigger an alert to let an administrator know about the policy violation. You can set up your policy to actively deny any nonconforming actions, or choose to simply log such actions, so that an administrator can address them later.

Policy-Based Management is a system for managing one or more instances of SQL Server 2008. Through the creation, management, and deployment of policies, you are able to apply your own custom-defined standards across an entire SQL Server enterprise.


Useful SQL Server DBCC Commands

DBCC CACHESTATS displays information about the objects currently in the buffer cache, such as hit rates, compiled objects and plans, etc.

Example:

DBCC CACHESTATS

Sample Results (abbreviated):

Object Name       Hit Ratio
------------      -------------

Proc              0.86420054765378507
Prepared          0.99988494930394334
Adhoc             0.93237136647793051
ReplProc          0.0
Trigger           0.99843452831887947
Cursor            0.42319205924058612
Exec Cxt          0.65279111666076906
View              0.95740334726893905
Default           0.60895011346896522
UsrTab            0.94985969576133511
SysTab            0.0
Check             0.67021276595744683
Rule              0.0
Summary           0.80056155581812771

Here’s what some of the key statistics from this command mean:

  • Hit Ratio: Displays the percentage of time that this particular object was found in SQL Server’s cache. The bigger this number, the better.
  • Object Count: Displays the total number of objects of the specified type that are cached.
  • Avg. Cost: A value used by SQL Server that measures how long it takes to compile a plan, along with the amount of memory needed by the plan. This value is used by SQL Server to determine if the plan should be cached or not.
  • Avg. Pages: Measures the total number of 8K pages used, on average, for cached objects.
  • LW Object Count, LW Avg Cost, LW Avg Stay, LW Avg Use: All these columns indicate how many of the specified objects have been removed from the cache by the Lazy Writer. The lower the figure, the better.

[7.0, 2000] Updated 9-1-2005

*****

DBCC DROPCLEANBUFFERS: Use this command to remove all the data from SQL Server’s data cache (buffer) between performance tests to ensure fair testing. Keep in mind that this command only removes clean buffers, not dirty buffers. Because of this, you may want to run the CHECKPOINT command first, before running DBCC DROPCLEANBUFFERS. Running CHECKPOINT will write all dirty buffers to disk, so that when you then run DBCC DROPCLEANBUFFERS, you can be assured that all data buffers are cleaned out, not just the ones that were already clean.

Example:

DBCC DROPCLEANBUFFERS
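
To follow the advice above in one go, a minimal sketch:

CHECKPOINT                -- write dirty buffers to disk first
DBCC DROPCLEANBUFFERS     -- then drop the now-clean data cache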

[7.0, 2000, 2005] Updated 9-1-2005

*****

DBCC ERRORLOG: If you rarely restart the mssqlserver service, you may find that your server log gets very large and takes a long time to load and view. You can truncate (essentially create a new log) the Current Server log by running DBCC ERRORLOG. You might want to consider scheduling a regular job that runs this command once a week to automatically truncate the server log. As a rule, I do this for all of my SQL Servers on a weekly basis. Also, you can accomplish the same thing using this stored procedure: sp_cycle_errorlog.

Example:

DBCC ERRORLOG

[7.0, 2000, 2005] Updated 9-1-2005

*****

DBCC FLUSHPROCINDB: Used to clear out the stored procedure cache for a specific database on a SQL Server, not the entire SQL Server. The database ID number to be affected must be entered as part of the command.

You may want to use this command before testing to ensure that previous stored procedure plans won’t negatively affect testing results.

Example:

DECLARE @intDBID INTEGER
SET @intDBID = (SELECT dbid FROM master.dbo.sysdatabases WHERE name = 'database_name')
DBCC FLUSHPROCINDB (@intDBID)

[7.0, 2000, 2005] Updated 9-1-2005

*****

DBCC INDEXDEFRAG: In SQL Server 2000, Microsoft introduced DBCC INDEXDEFRAG to help reduce logical disk fragmentation. When this command runs, it reduces fragmentation and does not lock tables, allowing users to access the table when the defragmentation process is running. Unfortunately, this command doesn’t do a great job of logical defragmentation.

The only way to truly reduce logical fragmentation is to rebuild your table’s indexes. While this will remove all fragmentation, unfortunately it will lock the table, preventing users from accessing it during this process. This means that you will need to find a time when this will not present a problem to your users.

Of course, if you are unable to find a time to reindex your indexes, then running DBCC INDEXDEFRAG is better than doing nothing.

Example:

DBCC INDEXDEFRAG (Database_Name, Table_Name, Index_Name)

DBCC FREEPROCCACHE: Used to clear out the stored procedure cache for all SQL Server databases. You may want to use this command before testing to ensure that previous stored procedure plans won’t negatively affect testing results.

Example:

DBCC FREEPROCCACHE

[7.0, 2000, 2005] Updated 10-16-2005

*****

DBCC MEMORYSTATUS: Lists a breakdown of how the SQL Server buffer cache is divided up, including buffer activity. This is an undocumented command, and one that may be dropped in future versions of SQL Server.

Example:

DBCC MEMORYSTATUS

[7.0, 2000] Updated 10-16-2005

*****

DBCC OPENTRAN: An open transaction can leave locks open, preventing others from accessing the data they need in a database. This command is used to identify the oldest open transaction in a specific database.

Example:

DBCC OPENTRAN('database_name')

[7.0, 2000] Updated 10-16-2005

*****

DBCC PAGE: Use this command to look at contents of a data page stored in SQL Server.

Example:

DBCC PAGE ({dbid|dbname}, pagenum [,print option] [,cache] [,logical])

where:

Dbid or dbname: Enter either the dbid or the name of the database in question.

Pagenum: Enter the page number of the SQL Server page that is to be examined.

Print option: (Optional) Print option can be either 0, 1, or 2. 0 – (Default) This option causes DBCC PAGE to print out only the page header information. 1 – This option causes DBCC PAGE to print out the page header information, each row of information from the page, and the page’s offset table. Each of the rows printed out will be separated from each other. 2 – This option is the same as option 1, except it prints the page rows as a single block of information rather than separating the individual rows. The offset and header will also be displayed.

Cache: (Optional) This parameter allows either a 1 or a 0 to be entered. 0 – This option causes DBCC PAGE to retrieve the page number from disk rather than checking to see if it is in cache. 1 – (Default) This option takes the page from cache if it is in cache rather than getting it from disk only.

Logical: (Optional) This parameter is for use if the page number that is to be retrieved is a virtual page rather than a logical page. It can be either 0 or 1. 0 – If the page is a virtual page number. 1 – (Default) If the page is the logical page number.

[6.5, 7.0, 2000] Updated 10-16-2005

*****

DBCC PINTABLE & DBCC UNPINTABLE: By default, SQL Server automatically brings into its data cache the pages it needs to work with. These data pages will stay in the data cache until there is no room for them, and assuming they are not needed, these pages will be flushed out of the data cache onto disk. At some point in the future when SQL Server needs these data pages again, it will have to go to disk in order to read them again into the data cache for use. If SQL Server somehow had the ability to keep the data pages in the data cache all the time, then SQL Server’s performance would be increased because I/O could be reduced on the server.

The process of “pinning a table” is a way to tell SQL Server that we don’t want it to flush data pages for specific named tables out of the cache once they have been read in. This in effect keeps these database pages in the data cache all the time, sparing SQL Server from having to read the data pages, flush them out, and reread them later. As you can imagine, this can reduce I/O for these pinned tables, boosting SQL Server’s performance.

To pin a table, the command DBCC PINTABLE is used. For example, the script below can be run to pin a table in SQL Server:

DECLARE @db_id int, @tbl_id int
USE Northwind
SET @db_id = DB_ID('Northwind')
SET @tbl_id = OBJECT_ID('Northwind..categories')
DBCC PINTABLE (@db_id, @tbl_id)

While you can use DBCC PINTABLE directly, without the rest of the above script, you will find the script handy because DBCC PINTABLE’s parameters are the database ID and table ID you want to pin, not the database and table names. This script makes it a little easier to pin a table. You must run this command for every table you want to pin.

Once a table is pinned in the data cache, this does not mean that the entire table is automatically loaded into the data cache. It only means that as data pages from that table are needed by SQL Server, they are loaded into the data cache and then stay there, never being flushed out to disk until you give the command to unpin the table using DBCC UNPINTABLE. It is therefore possible that only part of a table, not all of it, ends up pinned.

When you are done with a table and you no longer want it pinned, you will want to unpin your table. To do so, run this example code:

DECLARE @db_id int, @tbl_id int
USE Northwind
SET @db_id = DB_ID('Northwind')
SET @tbl_id = OBJECT_ID('Northwind..categories')
DBCC UNPINTABLE (@db_id, @tbl_id)

*****

DBCC PROCCACHE: Displays information about how the stored procedure cache is being used.

Example:

DBCC PROCCACHE

[6.5, 7.0, 2000] Updated 10-16-2005

*****

DBCC DBREINDEX: Periodically (weekly or monthly) perform a database reorganization on all the indexes on all the tables in your database. This will rebuild the indexes so that the data is no longer fragmented. Fragmented data can cause SQL Server to perform unnecessary data reads, slowing down SQL Server’s performance.

If you perform a reorganization on a table with a clustered index, any non-clustered indexes on that same table will automatically be rebuilt.

Database reorganizations can be done by scheduling SQLMAINT.EXE to run using the SQL Server Agent, or by running your own custom script via the SQL Server Agent (see below).

Unfortunately, the DBCC DBREINDEX command will not automatically rebuild all of the indexes on all the tables in a database; it can only work on one table at a time. But if you run the following script, you can reindex all the tables in a database with ease.

Example:

DBCC DBREINDEX ('table_name', 'index_name', fillfactor)

or

--Script to automatically reindex all tables in a database

USE DatabaseName --Enter the name of the database you want to reindex

DECLARE @TableName varchar(255)

DECLARE TableCursor CURSOR FOR
SELECT table_name FROM information_schema.tables
WHERE table_type = 'BASE TABLE'

OPEN TableCursor

FETCH NEXT FROM TableCursor INTO @TableName
WHILE @@FETCH_STATUS = 0
BEGIN
PRINT 'Reindexing ' + @TableName
DBCC DBREINDEX(@TableName, ' ', 90)
FETCH NEXT FROM TableCursor INTO @TableName
END

CLOSE TableCursor

DEALLOCATE TableCursor

The script will automatically reindex every index in every table of any database you select, and provide a fillfactor of 90%. You can substitute any number you want for the 90 in the above script.

When DBCC DBREINDEX is used to rebuild indexes, keep in mind that as the indexes on a table are being rebuilt, the table becomes unavailable for use by your users. For example, when a non-clustered index is rebuilt, a shared table lock is put on the table, preventing all but SELECT operations from being performed on it. When a clustered index is rebuilt, an exclusive table lock is put on the table, preventing any table access by your users. Because of this, you should only run this command when users don’t need access to the tables being reorganized. [7.0, 2000] Updated 10-16-2005

*****

DBCC SHOWCONTIG: Used to show how fragmented data and indexes are in a specified table. If the data pages storing data or index information become fragmented, it takes more disk I/O to find and move the data to the SQL Server cache buffer, hurting performance. This command tells you how fragmented these data pages are. If you find that fragmentation is a problem, you can reindex the tables to eliminate the fragmentation. Note: this fragmentation is fragmentation of data pages within the SQL Server MDF file, not of the physical file itself.

Since this command requires you to know the ID of both the table and the index being analyzed, you may want to run the following script so you don’t have to manually look up the table ID and index ID numbers.

Example:

DBCC SHOWCONTIG (Table_id, IndexID)

Or:

--Script to identify table fragmentation

--Declare variables
DECLARE
@ID int,
@IndexID int,
@IndexName varchar(128)

--Set the table and index to be examined
SELECT @IndexName = 'index_name'           --enter name of index
SET @ID = OBJECT_ID('table_name')          --enter name of table

--Get the Index Values
SELECT @IndexID = IndID
FROM sysindexes
WHERE id = @ID AND name = @IndexName

--Display the fragmentation
DBCC SHOWCONTIG (@ID, @IndexID)

While the DBCC SHOWCONTIG command provides several measurements, the key one is Scan Density. This figure should be as close to 100% as possible. If the scan density is less than 75%, then you may want to reindex the tables in your database. [6.5, 7.0, 2000] Updated 3-20-2006

*****

DBCC SHOW_STATISTICS: Used to find out the selectivity of an index. Generally speaking, the higher the selectivity of an index, the greater the likelihood it will be used by the query optimizer. You have to specify both the table name and the index name you want to find the statistics on.

Example:

DBCC SHOW_STATISTICS (table_name, index_name)

*****

DBCC SQLMGRSTATS: Used to produce three different values that can sometimes be useful when you want to find out how well caching is being performed on ad-hoc and prepared Transact-SQL statements.

Example:

DBCC SQLMGRSTATS

Sample Results:

Item                      Status
————————- ———–
Memory Used (8k Pages)    5446
Number CSql Objects       29098
Number False Hits         425490

Here’s what the above means:

  • Memory Used (8k Pages): If the number of memory pages used is very large, this may be an indication that some user connection is preparing many Transact-SQL statements but is not un-preparing them.
  • Number CSql Objects: Measures the total number of cached Transact-SQL statements.
  • Number False Hits: Sometimes, false hits occur when SQL Server goes to match pre-existing cached Transact-SQL statements. Ideally, this figure should be as low as possible.

[2000] Added 4-17-2003

*****

DBCC SQLPERF(): This command includes both documented and undocumented options. Let’s take a look at all of them and see what they do.

DBCC SQLPERF (LOGSPACE)

This option (documented) returns data about the transaction log for all of the databases on the SQL Server, including Database Name, Log Size (MB), Log Space Used (%), and Status.

DBCC SQLPERF (UMSSTATS)

This option (undocumented) returns data about SQL Server thread management.

DBCC SQLPERF (WAITSTATS)

This option (undocumented) returns data about wait types for SQL Server resources.

DBCC SQLPERF (IOSTATS)

This option (undocumented) returns data about outstanding SQL Server reads and writes.

DBCC SQLPERF (RASTATS)

This option (undocumented) returns data about SQL Server read-ahead activity.

DBCC SQLPERF (THREADS)

This option (undocumented) returns data about I/O, CPU, and memory usage per SQL Server thread. [7.0, 2000] Updated 3-20-2006

*****

DBCC SQLPERF (UMSSTATS): When you run this command, you get output like this. (Note, this example was run on a 4 CPU server. There is 1 Scheduler ID per available CPU.)

Statistic                        Value
——————————– ————————
Scheduler ID                     0.0
num users                        18.0
num runnable                     0.0
num workers                      13.0
idle workers                     11.0
work queued                      0.0
cntxt switches                   2.2994396E+7
cntxt switches(idle)             1.7793976E+7
Scheduler ID                     1.0
num users                        15.0
num runnable                     0.0
num workers                      13.0
idle workers                     10.0
work queued                      0.0
cntxt switches                   2.4836728E+7
cntxt switches(idle)             1.6275707E+7
Scheduler ID                     2.0
num users                        17.0
num runnable                     0.0
num workers                      12.0
idle workers                     11.0
work queued                      0.0
cntxt switches                   1.1331447E+7
cntxt switches(idle)             1.6273097E+7
Scheduler ID                     3.0
num users                        16.0
num runnable                     0.0
num workers                      12.0
idle workers                     11.0
work queued                      0.0
cntxt switches                   1.1110251E+7
cntxt switches(idle)             1.624729E+7
Scheduler Switches               0.0
Total Work                       3.1632352E+7

Below is an explanation of some of the key statistics above:

  • num users: This is the number of SQL Server threads currently in the scheduler.
  • num runnable: This is the number of actual SQL Server threads that are runnable.
  • num workers: This is the actual number of workers there are to process threads. This is the size of the worker thread pool.
  • idle workers: The number of workers that are currently idle.
  • cntxt switches: The number of context switches between runnable threads.
  • cntxt switches (idle): The number of context switches to the idle thread.

*****

DBCC TRACEON & DBCC TRACEOFF: Used to turn trace flags on and off. Trace flags are often used to turn specific server behaviors or characteristics on or off temporarily. On rare occasions, they can be useful for troubleshooting SQL Server performance problems.

Example:

To use the DBCC TRACEON command to turn on a specified trace flag, use this syntax:

DBCC TRACEON (trace# [,...n])

To use the DBCC TRACEOFF command to turn off a specified trace flag, use this syntax:

DBCC TRACEOFF (trace# [,...n])

You can also use the DBCC TRACESTATUS command to find out which trace flags are currently turned on in your server using this syntax:

DBCC TRACESTATUS (trace# [,...n])

For specific information on the different kinds of trace flags available, search this website or look them up in Books Online. [6.5, 7.0, 2000] Updated 3-20-2006

*****

DBCC UPDATEUSAGE: The official use for this command is to report and correct inaccuracies in the sysindexes table, which may result in incorrect space usage reports. Apparently, it can also fix the problem of unreclaimed data pages in SQL Server. You may want to consider running this command periodically to clean up potential problems. This command can take some time to run, and you should run it during off-peak hours because it will negatively affect SQL Server’s performance while it is running. When you run this command, you must specify the name of the database that you want affected.

Example:

DBCC UPDATEUSAGE ('databasename')

Read Full Post »

12 Essential Steps After Installing SQL Server

Rolling out SQL Servers is a key task for DBAs. While for some DBAs this can be a frequent exercise, others will have to face it at one time or another throughout their career in an organisation.

Successfully planning an installation requires a fair bit of knowledge about SQL Server and a basic understanding of the operating system. Once the installation is complete, the system usually needs some changes to default configuration before production databases can be rolled out. In this article I list some common steps that I take after installing a SQL Server. This can be used as a “checklist” to ensure consistency for all the SQL Servers that you roll out.

Step 1: Install the Service Pack, Hotfixes and Cumulative Updates

This is the first step you should take after a successful installation. You should always install the latest service pack available unless there is a clear business reason not to do so. At the time of this writing, SQL Server 2005 is at Service Pack 3 and SQL Server 2008 is at Service Pack 1. Service packs are freely downloadable from the Microsoft web site. If there are hotfixes or cumulative updates available after the last service pack was released, you should apply them as well. This ensures that your server is running the latest version of the database software. Once all the patches are installed, make a backup of your system databases and reboot the server.
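
A quick way to confirm the patch level once everything is applied is to query the server properties from a query window; a minimal check:

-- ProductLevel returns values such as 'RTM', 'SP2' or 'SP3'
SELECT  SERVERPROPERTY('ProductVersion') AS ProductVersion,
        SERVERPROPERTY('ProductLevel')   AS ProductLevel,
        SERVERPROPERTY('Edition')        AS Edition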

Step 2: Configure SQL Services

Although your planning phase should determine what SQL services you are going to install, there may be occasions when a service is rolled out but not needed immediately. Perhaps your business requires you to install a SQL Server and although reporting is not going to be part of it, management insists you install Reporting Services – “just in case”. If a service is not going to be needed immediately (or needed at all), disable it.

One example of a service that you can disable is the Browser service. This service is installed by default. If you are not running named instances or planning to install multiple instances on one machine, you can disable this service. You can disable the Browser service from the SQL Server Configuration Manager.

Another service that you can disable is the SQL Server VSS Writer. Unless you are using applications that make use of the Windows Volume Shadow Copy infrastructure to backup SQL databases, you can disable this service safely. Note that SQL VSS Writer is not available from the Configuration Manager. You can access it from the Control Panel > Services applet (see below).

While non-essential services should be disabled, other services like the Agent are supposed to be running continuously. This is because scheduled backups, maintenance plans, replication and jobs are dependent on the Agent service. Similarly, if your server is going to execute Integration Service packages, ensure the SSIS service starts automatically every time the server boots up. You can configure the start-up property of a service from the SQL Server Configuration Manager.

One thing that is not available in the Configuration Manager is the recovery behaviour of SQL Services – i.e. what should the services do if they fail unexpectedly. To access this behaviour, you can bring up the service’s property from the Control Panel’s “Services” applet. The “Recovery” tab of the service property allows you to define what the computer should do in case the service fails once, twice or subsequent times. From here, you can specify that the service should start automatically if it fails.

The same type of behaviour can be set in the SQL Server Agent’s property page. Here, you can specify that SQL or Agent service should restart automatically if they stop unexpectedly.

Step 3: Configure Default Directories

Everything has its place in this world and when it comes to files and folders, SQL Server is no exception. A number of directories are created during installation – you have one directory for storing data and log files, another directory for replication, another for full text indexes, and yet another for backup files.

Where SQL places its files is determined mostly during the installation process. This happens because either you provide this information explicitly or SQL uses a default set of locations. The directory path information you provide during the setup should really be coming from your planning phase. Once the installation is complete, it is still possible to fine-tune some of these component locations.

If you have followed the best practice, your data files should be residing in a separate physical drive and directory.  The transaction log files of user databases should also be in a separate drive and folder. You can configure both the locations from the server’s properties:

Similar to the data and log directories, you can specify the default locations for:

a)    Backup files

b)    Replication files

c)    Full text index files

d)    Job output files

SQL Server creates the folders for these components in the same location where data files are placed. To ensure data files are given the maximum possible dedicated space as far as disk activity is concerned, I try to move these folders to a separate drive.

Unfortunately, you cannot use a GUI front-end to configure these alternate locations. The path information is saved as registry values and that means you either have to manually change registry entries or use an extended stored procedure like xp_regwrite.  Also, the location of the registry keys will be different between named instances. They will also vary depending on your version of SQL Server (2005 or 2008). Generally, you will be looking for a registry key called “Working Directory”.

The script below shows how I changed the default locations for a default instance of SQL Server 2005 installed in a Windows Server 2003 system:

Configuring_Default_Directories.sql
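
The attached script is not reproduced here, but a minimal sketch of the idea, assuming a default SQL Server 2005 instance (registry hive MSSQL.1) and an example backup path that you would replace with your own, might look like this. Note that xp_regwrite is undocumented, and the registry key and parameter names shown follow common usage rather than any official reference; other component locations (replication, full-text, job output) live under different keys that vary by instance and version:

-- Point the default backup directory at a dedicated drive (example path)
EXEC master..xp_regwrite
    @rootkey    = 'HKEY_LOCAL_MACHINE',
    @key        = 'SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer',
    @value_name = 'BackupDirectory',
    @type       = 'REG_SZ',
    @value      = 'E:\SQLBackups'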

And of course, the paths you specify as registry key values must exist – that means you need to create (or move) the folders.

To ensure the system is aware of your changes, restart SQL services after making the changes.

Step 4: Configure Default Database Properties

We all know it – the model database works as the “template” for new databases. If you want to set some common properties across all your user databases, you can set them in the model database. A number of properties can be set here to take advantage of this approach:

a)    Recovery Model: perhaps you are installing SQL Server for a data warehouse that will host a number of data marts that would not require full recovery model. You can set the recovery model of the model database to “simple” mode in such cases.

b)    File Properties: by default, the data files are configured to grow by 1 MB and the log files by 10% of their size. This means that a large transaction log file will grow by a large amount when it needs more space. For example, every time a 6 GB log file needs to grow, it will take up 600 MB of disk space in one gallop. For a write-intensive database, the data file’s incremental growth of 1 MB can result in fragmentation. As a general rule of thumb, the log files should be a quarter or half the size of the data files. If you have an idea about how big your data and log files are going to be and how they should grow, you can set up these properties in the model database (see the sketch after this list). I generally set the growth increments in megabytes rather than leaving them as a percentage.

Another area where you can make use of the model database is filegroups. If you want your user tables to be created on secondary data files and leave the primary filegroups with system objects only, you can create secondary data files in the model database and assign them to a user defined filegroup and make that filegroup default.

c)    Database Options: depending on your requirements you can set up other database options in the model database. Examples of such options could be torn page detection, ANSI defaults etc.

d)    Custom Objects: if you have custom objects that need to be present in each user database, you can put them in the model database.
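
A minimal sketch of items (a) and (b), assuming the default logical file names modeldev and modellog and example growth values that you would adjust to your own environment:

-- Set simple recovery so new databases inherit it from model
ALTER DATABASE model SET RECOVERY SIMPLE

-- Grow in fixed megabyte increments instead of 1 MB / 10%
ALTER DATABASE model MODIFY FILE (NAME = modeldev, FILEGROWTH = 100MB)
ALTER DATABASE model MODIFY FILE (NAME = modellog, FILEGROWTH = 50MB)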

Step 5: Configure tempdb Database

The temporary database is used heavily in SQL 2005 and later versions. Best practice suggests that for optimal performance, SQL Server’s tempdb database files should be placed in a separate dedicated disk drive. This should be followed wherever possible, but even if your server’s current disk configuration does not allow this, you can still make some changes to ensure tempdb is optimally tuned. Part of this tuning means you should allocate a healthy amount of space to this database.

The number of data files you should allocate in the tempdb database will depend on the number of CPUs (logical or physical) present in the machine. When SQL Server accesses a database, it starts a scheduler thread for each data file present. So if you have eight CPUs in your physical server, the tempdb database should have eight data files to “load balance” the threads.

The easiest way you can check the number of CPUs present in the machine is by looking at the Windows Task Manager’s Performance tab.

Also, data files in a database are used in a “proportional fill” algorithm. This means that if you have two data files in a database – one 4 GB and another 8 GB, SQL will try to stripe the data across the two files in a 1:2 ratio. For each stripe of data written to the 4 GB file, SQL will write two stripes in the 8GB file – hence the term “proportional fill”.

Now if you configure tempdb to have the same number of data files as the number of CPUs present and specify each data file to have the same size and growth, you are effectively ensuring the CPU load is evenly distributed across tempdb.

The following script shows one such configuration done from the query window.

Configuring_tempdb_Data_Files.sql
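
That script is not reproduced here, but a minimal sketch of what it might contain, assuming a four-CPU server, the default primary file name tempdev, and example file names, paths and sizes you would tune for your own workload:

-- Resize the existing primary tempdb data file
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 512MB, FILEGROWTH = 256MB)

-- Add three more data files with identical size and growth (one per remaining CPU)
ALTER DATABASE tempdb ADD FILE (NAME = tempdev2, FILENAME = 'E:\SQLData\tempdev2.ndf', SIZE = 512MB, FILEGROWTH = 256MB)
ALTER DATABASE tempdb ADD FILE (NAME = tempdev3, FILENAME = 'E:\SQLData\tempdev3.ndf', SIZE = 512MB, FILEGROWTH = 256MB)
ALTER DATABASE tempdb ADD FILE (NAME = tempdev4, FILENAME = 'E:\SQLData\tempdev4.ndf', SIZE = 512MB, FILEGROWTH = 256MB)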

As you can see from the code, we are adding three new data files to tempdb (since there are four CPUs present and there is already one data file) and ensuring the four files have the same size and growth characteristics.

Once you have completed the tempdb configuration, you will need to restart the SQL instance for the changes to take effect.

Step 6: Configure Server Properties and Features

Of all the SQL Server configurations, this one is probably the most important and has a direct effect on how the server behaves. What properties you configure and what values you set here will depend on your organisation’s business needs, SOEs etc. There are a number of properties you can set for the database server engine and there is more than one way you can set some of them. I like to divide these properties into three categories:

Configuring Features:

If it is SQL 2005, use the Surface Area Configuration tool for enabling some of the SQL Server features. For SQL 2008, use the Surface Area Configuration Facet from the Server properties. Both these versions list the same features and by default all of these features are disabled.

What you enable here will depend on what you want your server to do. You may want to enable Database Mail (discussed later). If your stored procedures are going to use the xp_cmdshell utility, you will need to switch it on here. If you think there will be managed code in your database application, you need to enable CLR.
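
If you prefer scripts over the GUI, the same options can be turned on from a query window with sp_configure; a small sketch for the xp_cmdshell and CLR examples above (enable only what you actually need):

-- xp_cmdshell is an advanced option, so expose advanced options first
EXEC sp_configure 'show advanced options', 1
RECONFIGURE

EXEC sp_configure 'clr enabled', 1
EXEC sp_configure 'xp_cmdshell', 1
RECONFIGURE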

Configuring Parameter Values:

a)    Memory: Set the minimum and maximum server memory parameters. You can do so from the server properties dialogue box or using the sp_configure system stored procedure (a sample script follows this list). If AWE is required, enable that too.

b)    Fill factor: Set a default value for index fill factor. This can save page splitting in indexes in future.

c)    Processor: Configure max. degree of parallelism, cost threshold for parallelism as necessary.

d)    Network connections: Configure remote query timeouts if you want to override the default behaviour of 600 seconds.

e)    Security: Although you can change the authentication mode (Trusted or Mixed) at this stage – this is a choice you probably have made during the installation. You can also enable C2 audit mode as part of security configuration.

f)     Trace Flags: If you are installing SQL Server 2005, you may wish to enable trace flags as start-up parameters for detecting and reporting deadlocks. The trace flag 1204 can be used for such purposes.
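
A minimal sketch of items (a) to (d) using sp_configure, with placeholder values you would replace with figures from your own capacity planning:

EXEC sp_configure 'show advanced options', 1
RECONFIGURE

-- a) Memory (values in MB, placeholders only)
EXEC sp_configure 'min server memory (MB)', 1024
EXEC sp_configure 'max server memory (MB)', 6144

-- b) Default index fill factor (takes effect after a service restart)
EXEC sp_configure 'fill factor (%)', 90

-- c) Parallelism
EXEC sp_configure 'max degree of parallelism', 4
EXEC sp_configure 'cost threshold for parallelism', 25

-- d) Remote query timeout (seconds)
EXEC sp_configure 'remote query timeout (s)', 600

RECONFIGURE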

Configuring SQL Server Network:

This is where you define how SQL will use the network to talk to other computers. The first thing you need to do here is enabling or disabling the network protocols SQL will use to listen for incoming traffic. You set this from the Configuration Manager. The picture below shows how I have disabled Named Pipes.

Next, you need to see if your applications require server aliases. Some of the older apps may require you to define a server alias for a named instance.

Finally, you need to decide if you will assign non-default TCP port number for your instance. The default instance uses a default port number of 1433 and the named instances are usually assigned dynamic port numbers at start up. If you want to assign a non-default port number to SQL Server for TCP connections, you can set it here.

Most of the server property changes would require restarting the service.

Step 7: Configure Security

When it comes down to SQL Server security, companies usually take one of two approaches – some follow a rigorous data security policy, others just leave it all to defaults (and to potential attacks). If you are working for a company like the former, you are in luck – you will know what to do. If your company does not have any specific procedures for implementing SQL Server security (written or unwritten), it is up to you how you define it. Once you have completed this part though, you will have something to refer back to in future.

First and foremost, ensure there are as few members in the sysadmin server role as possible. The Local Administrators Windows group is a member of the sysadmin role by default. Unless you want to have your Windows Administrators to have access to the SQL Server, you should get rid of this group from the sysadmin fixed server role. The next step would be to revoke the login for this group.

If you have a pre-defined account that you use for database administration, add that to the sysadmin server role.
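
A small sketch of those actions, assuming the default BUILTIN\Administrators login still exists and using a hypothetical DOMAIN\DBA_Admins group as the dedicated DBA account:

-- Remove the local administrators group from the sysadmin role
EXEC sp_dropsrvrolemember 'BUILTIN\Administrators', 'sysadmin'

-- Then remove its login altogether
DROP LOGIN [BUILTIN\Administrators]

-- Grant sysadmin to the dedicated DBA group (hypothetical name)
EXEC sp_addsrvrolemember 'DOMAIN\DBA_Admins', 'sysadmin'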

You can also create Credentials and Proxy accounts at this stage if you want your jobs to run SSIS packages or operating system commands under those accounts.

Step 8: Configure Error Logs

SQL Server error log is the first port of call for DBAs when they troubleshoot server related issues. By default, the server keeps six logs as archive and one log as current. Every time the SQL service starts, the system begins a new log file. The currently active log file is named ERRORLOG. When the service restarts, the previous ERRORLOG is archived as ERRORLOG.1, what was ERRORLOG.1 becomes ERRORLOG.2, and so on.

One of the annoying things with default error log behaviour is that any one file can contain a number of days of entries. When you try to view a file, it can take some time to load. If you are interested in a particular day’s events, you will need to sift through the entries or use a filter.

Also, log files are overwritten in a rolling fashion after six files are created. So that means you won’t have any history before the oldest log file (ERRORLOG.6).

What you can do here is to configure SQL Server to keep a predefined number of log files. You can also create a scheduled job to “recycle” the error logs regularly. Recycling means that SQL Server will start a new log file without restarting the service.

To keep a pre-defined number of archived logs, you will need to configure the SQL error log properties.

As you can see, I would recommend keeping at least the last thirty log files.

Once the number of error log files is configured, you can create a scheduled job to run every day to reinitialise the log. The job will have only one step that will call the sp_cycle_errorlog system stored procedure. This procedure starts a new error log file without restarting the SQL service. I usually schedule it to run at 12:00 AM every morning.

The combined effect of these two changes is that you will have one new log file created every day at midnight. If your server does not restart in between, you will have the last month’s log entries on disk, with each day’s log in a separate file.

The same kind of job can be defined for SQL Agent error log files. The latest Agent error log is named SQLAGENT.OUT. Executing the msdb database’s sp_cycle_agent_errorlog procedure from a job will create a new Agent log file and archive the previous one. Unlike SQL Server error logs though, you cannot use a dialogue box to specify the number of archived Agent error logs to keep.
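
A small sketch of the two recycling job steps described above (the number of archived SQL Server error logs is configured separately through the error log properties dialogue):

-- Daily job step: start a new SQL Server error log without restarting the service
EXEC master.dbo.sp_cycle_errorlog

-- Daily job step: start a new SQL Server Agent error log
EXEC msdb.dbo.sp_cycle_agent_errorlog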

Step 9: Configure Database Mail

This step is optional. If your databases are going to send e-mails, you will first need to enable database mail from the Surface Area Configuration tool (SQL 2005) or the Server Properties’ Surface Area Configuration Facet (for SQL 2008).
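
Alternatively, the same feature can be switched on with sp_configure; a one-off sketch:

-- Enable the Database Mail extended stored procedures
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'Database Mail XPs', 1
RECONFIGURE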

Once enabled, you can easily configure it using a wizard in the Management Studio.

You can also enable the mail profile for SQL Agent service.

If your apps are not going to use the mail feature – don’t enable or configure it.

Step 10: Configure SQL Agent & Operator

We are almost at the end of our configuration process. It is now worth having a look at the SQL Agent properties.  The three most important things that can be done here are:

a)    Configuring SQL and Agent services to restart when they stop unexpectedly (discussed before)

b)    Enabling the mail profile for SQL Agent. You can do it from the SQL Agent’s Alert System properties

c)    Increasing the size of the job history log in the msdb database

Changing the size of the job history log in the msdb database is usually a good idea if your server is going to host a large number of scheduled jobs.

Define DBA Operators:

Every database server should have one standard DBA operator defined. This operator will be notified via e-mail when jobs fail or alerts fire. Do not set yourself up as this operator – if you leave the company, your e-mail account is likely to be deleted, and those who come after you will need to spend time reconfiguring everything when the notifications fail. That’s why you should ask your system administrators to create a standard DBA account and assign a mailbox to it.
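
A minimal sketch of creating such an operator, assuming a hypothetical shared mailbox named dba.team@yourcompany.com:

-- Create a standard DBA operator tied to a shared mailbox, not an individual
EXEC msdb.dbo.sp_add_operator
    @name          = N'DBA Team',
    @enabled       = 1,
    @email_address = N'dba.team@yourcompany.com'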

Step 11: Configure Operating System

You can also make some OS level changes to give some extra power to SQL. This works if you have administrative privilege on the Local Windows server.

If the database application is going to participate in distributed transactions through linked servers, ensure that Network DTC is enabled.

To reach this screen in a Windows Server 2003 system,

1.    Start the Component Service applet (Start > Administrative Tools > Component Services)

2.    Expand Component Services node and then Computers node in the left side pane

3.    Right Click on My Computer and choose Properties

4.    Choose the MS DTC tab

5.    Click on Security Configuration screen

The steps are similar for Windows Server 2008

You should also give your SQL Service account the right to “Lock pages in memory”. This ensures that Windows will not swap SQL pages out of memory onto the paging file on disk. In fact, from version 2005, SQL Server can detect memory pressure from OS signals and dynamically allocate or de-allocate pages itself.

To configure this property, assign the SQL service account to this privilege from the Local Security Policy.


Step 12: Set up Maintenance Plans for System Databases

Your server is now ready for production database rollouts. To wrap it all up, take a moment to create a maintenance plan for the system databases. This maintenance plan should reorganise the indexes, run DBCC checks and update the statistics. It should also back up the master, model and msdb databases. As a best practice, the plan should back up the system databases once every day. When you have created the maintenance plan, run each of the tasks manually to see if they succeed.
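
As a sketch of the backup portion of that plan, assuming an example backup path that you would replace with your configured default backup directory:

-- Daily full backups of the system databases (example path)
BACKUP DATABASE master TO DISK = 'E:\SQLBackups\master.bak' WITH INIT
BACKUP DATABASE model  TO DISK = 'E:\SQLBackups\model.bak'  WITH INIT
BACKUP DATABASE msdb   TO DISK = 'E:\SQLBackups\msdb.bak'   WITH INIT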

Conclusion

Although it is not possible to describe every possible installation scenario, the steps discussed here should be common in every installation. You should customise this list to incorporate any extra configuration necessary. For example, you may need to configure Reporting Services after SQL configuration is complete. Perhaps there are custom jobs or DBA stored procedures that you roll out in every SQL Server in the company. In each case, these extra steps will form parts of your post-installation configuration.


Read Full Post »

Older Posts »