Skipping rows when importing Excel into SQL using SSIS 2008

Time: 2023-01-02 16:35:00

I need to import sheets which look like the following:

    March Orders
    ***Empty Row
    Week  Order #  Date    Cust #
    3.1   271356   3/3/10  010572
    3.1   280353   3/5/10  022114
    3.1   290822   3/5/10  010275
    3.1   291436   3/2/10  010155
    3.1   291627   3/5/10  011840

The column headers are actually in row 3. I can use an Excel Source to import them, but I don't know how to specify that the information starts at row 3.

I Googled the problem, but came up empty.

4 Solutions

#1


10  

Have a look:

The links have more details, but I've included some text from the pages (just in case the links go dead).

http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/97144bb2-9bb9-4cb8-b069-45c29690dfeb

Q:

While loading a text file into SQL Server via SSIS, we have the provision to skip any number of leading rows from the source and load the rest of the data into SQL Server. Is there any provision to do the same for an Excel file?

My source Excel file has some description in the leading 5 rows. I want to skip it and start the data load from row 6. Please provide your thoughts on this.

A:

The easiest way would be to give each row a number (a bit like an identity in SQL Server) and then use a conditional split to filter out everything where the number is <= 5.

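The numbering-and-filtering idea can be sketched outside SSIS. The Python below is only an illustration of the logic, not SSIS code: the function name `skip_leading_rows` and the sample rows are made up for this example, and the threshold of 5 comes from the question above.

```python
from typing import Iterable, Iterator, List


def skip_leading_rows(rows: Iterable[List[str]], skip: int = 5) -> Iterator[List[str]]:
    """Number each row starting at 1 and keep only rows whose number is > skip."""
    for number, row in enumerate(rows, start=1):
        if number <= skip:
            continue  # this is the branch the conditional split would discard
        yield row


# Hypothetical sheet: 5 description rows, then a header row and one data row.
sheet = [["description"]] * 5 + [["Week", "Order #"], ["3.1", "271356"]]
kept = list(skip_leading_rows(sheet, skip=5))
print(kept)
```

In a real package the row number would have to be produced inside the data flow before the conditional split sees the rows; how that is done is not covered by the quoted answer.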

http://social.msdn.microsoft.com/Forums/en/sqlintegrationservices/thread/947fa27e-e31f-4108-a889-18acebce9217

Q:

  1. Is it possible, when importing data from Excel into a DB table, to skip the first 6 rows, for example?

  2. The Excel data is also divided into sections with headers. Is it possible, for example, to skip every 12th row?

A:

  1. YES YOU CAN. Actually, you can do this very easily if you know the number of columns that will be imported from your Excel file. In your Data Flow task, you will need to set the "OpenRowset" custom property of your Excel Source (right-click your Excel Source > Properties; in the Properties window, look for OpenRowset under Custom Properties). To ignore the first 5 rows in Sheet1 and import columns A-M, you would enter the following value for OpenRowset: Sheet1$A6:M (notice that I did not specify a row number for column M. You can enter a row number if you like, but in my case the number of rows can vary from one iteration to the next).

  2. AGAIN, YES YOU CAN. You can import the data using a conditional split. You'd configure the conditional split to look for something in each row that uniquely identifies it as a header row, and skip the rows that match this 'header logic'. Another option would be to import all the rows and then remove the header rows using a SQL script in the database, such as a cursor that deletes every 12th row. Or you could add an identity field with a seed/increment of 1/1 and then delete all rows whose row numbers are evenly divisible by 12. Something like that...

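The last suggestion in item 2 (number the rows with an identity column, then delete those divisible by 12) might look roughly like the sketch below. It uses Python's built-in `sqlite3` as a stand-in for SQL Server, so `AUTOINCREMENT` plays the role of an identity with seed/increment 1/1; the table name `staging`, the column names, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# row_num stands in for a SQL Server IDENTITY(1, 1) column.
conn.execute(
    "CREATE TABLE staging (row_num INTEGER PRIMARY KEY AUTOINCREMENT, line TEXT)"
)

# Hypothetical import of 24 rows in which every 12th row is a repeated header.
rows = [("HEADER" if n % 12 == 0 else f"data {n}",) for n in range(1, 25)]
conn.executemany("INSERT INTO staging (line) VALUES (?)", rows)

# Remove the rows whose number is evenly divisible by 12.
conn.execute("DELETE FROM staging WHERE row_num % 12 = 0")

remaining = [line for (line,) in conn.execute("SELECT line FROM staging ORDER BY row_num")]
print(len(remaining))
```

This only works if the rows reach the table in spreadsheet order, so that the generated numbers line up with the original row positions.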

http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/847c4b9e-b2d7-4cdf-a193-e4ce14986ee2

Q:

I have an SSIS package that imports from an Excel file with data beginning in the 7th row.

Unlike the same operation with a CSV file ('Header Rows to Skip' in the Connection Manager Editor), I can't seem to find a way to ignore the first 6 rows of an Excel file connection.

I'm guessing the answer might be in one of the Data Flow Transformation objects, but I'm not very familiar with them.

A:

rbhro, actually there were 2 fields in the upper 5 rows that had some data that I think prevented the importer from ignoring those rows completely.

Anyway, I did find a solution to my problem.

In my Excel Source object, I used 'SQL Command' as the 'Data Access Mode' (it's a drop-down that appears when you double-click the Excel Source object). From there I was able to build a query ('Build Query' button) that only grabbed the records I needed. Something like this: SELECT F4, F5, F6 FROM [Spreadsheet$] WHERE (F4 IS NOT NULL) AND (F4 <> 'TheHeaderFieldName')

Note: I initially tried ISNUMERIC instead of 'IS NOT NULL', but that wasn't supported for some reason.

In my particular case, I was only interested in rows where F4 wasn't NULL (and fortunately F4 didn't contain any junk in the first 5 rows). I could skip the whole header row (row 6) with the second condition in the WHERE clause.

So that cleaned up my data source perfectly. All I needed to do now was add a Data Conversion object between the source and destination (everything needed to be converted from Unicode in the spreadsheet), and it worked.

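To see what that WHERE clause keeps and drops, here is a small sketch that runs the same query text against an in-memory SQLite table through Python's `sqlite3`. SQLite is only a stand-in for the Excel provider, and the sample rows (a title row, an empty row, the header row, two data rows) are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite accepts bracket-quoted identifiers, so the sheet-style name can be reused.
conn.execute("CREATE TABLE [Spreadsheet$] (F4 TEXT, F5 TEXT, F6 TEXT)")
conn.executemany(
    "INSERT INTO [Spreadsheet$] VALUES (?, ?, ?)",
    [
        (None, "Report title", None),                           # junk row, F4 empty
        (None, None, None),                                     # blank row
        ("TheHeaderFieldName", "OtherHeader", "ThirdHeader"),   # header row
        ("271356", "3/3/10", "010572"),                         # data
        ("280353", "3/5/10", "022114"),                         # data
    ],
)

query = (
    "SELECT F4, F5, F6 FROM [Spreadsheet$] "
    "WHERE (F4 IS NOT NULL) AND (F4 <> 'TheHeaderFieldName')"
)
result = conn.execute(query).fetchall()
print(result)
```

As the answer notes, this relies on F4 being empty in every leading row; a junk value in F4 would pass the filter.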

#2


1  

My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time.

We provide guidance to our customers and vendors about how files must be formatted before we can process them, and it is up to them to meet the guidelines as much as possible. People often aren't aware that files like that create a problem in processing (next month it might have six lines before the data starts), and they need to be educated that Excel files must start with the column headers, have no blank lines in the middle of the data, and not repeat the headers multiple times. Most important of all, they must have the same columns with the same column titles in the same order every time. If they can't provide that, then you probably don't have something that will work for automated import, as you will get the file in a different format every time depending on the mood of the person who maintains the Excel spreadsheet. Incidentally, we push really hard to never receive any data from Excel (this only works some of the time, but if they have the data in a database, they can usually accommodate). They also must know that any changes they make to the spreadsheet format will result in a change to the import package and that they will be charged for those development changes (assuming that these are outside clients and not internal ones). These changes must be communicated in advance and developer time scheduled; if not, a file with the wrong format will fail and be returned to them to fix.

If that doesn't work, may I suggest that you open the file, delete the first two rows, and save it as a text file in one data flow. Then write a second data flow that processes the text file. SSIS did a lousy job of supporting Excel, and anything you can do to get the file into a different format will make life easier in the long run.

#3


1  

You can just use the OpenRowset property you can find in the Excel Source properties. Take a look here for details:

SSIS: Read and Export Excel data from nth Row

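For reference, the OpenRowset value follows the `Sheet$FirstCell:LastColumn` pattern quoted in answer #1 (Sheet1$A6:M). A tiny helper for building such a value might look like this; the function `open_rowset_range` is hypothetical, and the sheet name `Sheet1` and column range A-D for the question's four-column sample are assumptions.

```python
def open_rowset_range(sheet: str, first_col: str, first_row: int, last_col: str) -> str:
    """Build an OpenRowset value such as 'Sheet1$A6:M'.

    No end row is given, so the number of rows read can vary between runs.
    """
    if first_row < 1:
        raise ValueError("first_row must be 1 or greater")
    return f"{sheet}${first_col}{first_row}:{last_col}"


# The value quoted in the forum answer: skip 5 rows, read columns A-M.
print(open_rowset_range("Sheet1", "A", 6, "M"))
# The question's layout: headers on row 3, four columns (assumed to be A-D).
print(open_rowset_range("Sheet1", "A", 3, "D"))
```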

Regards.

#4


0  

My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time.

Not entirely correct.

SSIS forces you to use that format, and quite often it does not work correctly with Excel.

If you can't change the format, consider using our Advanced ETL Processor.

You can skip rows or fields and you can validate the data the way you want.

http://www.dbsoftlab.com/etl-tools/advanced-etl-processor/overview.html

Sky is the limit
