May 2, 2013 at 7:36 AM
Edited May 2, 2013 at 8:12 AM
This is the issue when one batch of source data (in SCD Merge Wizard terms, one SELECT from the source table) contains duplicate rows, considering the combination of Business Key columns.
This is certainly an issue for the SSIS SCD transformation. If two "New" duplicate rows arrive in the same buffer, they will both be sent to the "New" output and the output table will end up with duplicate Business Keys.
According to my testing, this is also an issue for the T-SQL MERGE command as generated by the SCD Merge Wizard, also resulting in duplicate Business Keys in the target table.
My solution is to add a ROW_NUMBER to the source SELECT and filter for just the first row, e.g.
SELECT * FROM (
<source table column list>
, ROW_NUMBER () OVER (
PARTITION BY BusinessKey1 , BusinessKey2
ORDER BY BusinessKey1 , BusinessKey2
) AS Business_Key_Row_Number
WHERE Business_Key_Row_Number = 1
Could this solution be added automatically by the wizard - perhaps controlled by a Configuration option? It would be nice to be able to specify extra columns to be added to the ORDER BY clause to control which row is chosen (e.g by LastUpdateDate DESC to get
the most recent row).