Tuesday, September 04, 2007

Randomly selecting records from the Table - SQL Server

If you would like to retrieve the 10 records randomly, you have to use either RAND() or NEWID() function.

SELECT TOP 10 EmployeeID, RAND() AS RNumber FROM dbo.Employee ORDER BY 2

SELECT TOP 10 EmployeeID FROM dbo.Employee ORDER BY NEWID()

The above 2 methods of retrieving the records from the table have own mertis and demerits.

RAND() methods will give same sample of records in every run. so you have to come with own logic to produce different random number for every records.

NEWID() method give easy solution for the above problem. The idea behind this method is having of unique identifier for each rows. SQL Server maintain this for every rows.

There will be performance overhead problem in this approach when you are using this for TABLE having more records. Because its scans through whole table. you can avoid the full scan by limiting the records using WHERE condition.

SQL Server 2005.

SQL Server 2005 provides new option to get the records randonly. That new option is
TABLESAMPLE. This keyword is used with FROM clause. This approach could not read
through entire table. its just take the sample records instead of scanning entire




SELECT TOP 10 EmployeeID FROM dbo.Employee TABLESAMPLE (50 ROWS)

The above first query will not return 50 rows exactly.
First it will convert the ROWS into percent. And the select the records randomly.

Selection of random records based on DATA pages(8K) for that table instead of rows identifier. so it will not produce the expected result.

To get the expected result, we have to use the TOP clause in the select query. The TOP #(number) should be less than selection of records specified in the TABLESAMPLE.

Sometimes,Even this appraoch will not produce the expected result.so you have to use this approach carefully with your logic.

No comments: