SSIS: Transform All String Columns in a Data Flow Stream

Written by Kopis 04/02/12


How many times has a Quality Assurance person asked, “Can we trim the extra spaces off all of the text columns?” Or even “Can we convert all of the blanks to nulls?” In the past, I would just roll my eyes (mostly because I forgot to check this before sending to QA in the first place) and go through the monotonous process of updating each of my string columns with a trim and conversion to null. However, I recently was asked to do just this for a file/table containing over 200 columns. I figured there must be an easier way. After a few minutes of searching through the capabilities of an SSIS Script Task, I created the following solution using a few lines of C#:

  1. Place a Script Component in the Data Flow where the data needs to be trimmed and converted from blank to null. Choose “Transformation” as the type and be sure to connect the source to the Script Task.
  2. Double click the new Script Task to open its editor
  3. Select Input Columns from the left pane, select the string columns and set the “Usage Type” of each to “ReadWrite” (I just set all of my columns to ReadWrite to make sure)
  4. Select Script from the left pane
  5. Choose Microsoft Visual C# 2008 for the ScriptLanguage property
  6. Click “Edit Script…” button
  7. Remove everything inside of the ScriptMain class (Remove all 3 methods: PreExecute(), PostExecute() and Input0_ProcessInputRow(Input0Buffer Row)
  8. Insert the following code inside of the curly braces for the ScriptMain class:

 

The function loops through all columns in the stream, searching for columns of data type DT_STR or DT_WSTR, trims the data, and converts empty strings to nulls.  However, this function could easily be tweaked to perform any operation that would apply to all columns of a single data type.  Simply update the line that contains “type=DataType.DT_STR” to contain the data types to be operated on and edit the block of code inside to perform the transformations that are needed for those data types.  To set the output data, use any of the Buffer.Set* methods, passing in the column index (i in the example) and new data (colData in the example) after applying the necessary transformation logic.  To set a column null, use the Buffer.SetNull method, passing only the index to the column that needs to be null.