VBA "Out of Memory" error in Excel

Time: 2022-11-19 23:31:00

I'm on Excel 2010, on an admittedly very large sheet (400k rows X 20 columns).

My code aims to:

  • load the entire sheet into an array
  • examine every row against a certain criterion
  • copy the rows which qualify to another array
  • finally write the second array back to another sheet
  • the second array will end up being roughly 90% of the original

I declared two arrays as Variants and tried to initialize both by copying the sheet's content twice.

The first copy works, but on the second one I hit an "Out of memory" error.

Any ideas whether there's a workaround? Or is this just a limitation of VBA / Excel?

Is there a way to not pre-define / initialize the destination array, and instead, let it "grow" with every successful qualification of the criteria? (On a scale of this magnitude).

Sub CopyPending()
Dim LastRow As Long
Dim LastCol As Integer
Dim AllRange() As Variant
Dim CopyRange() As Variant
Dim i As Long
Dim x As Long
Dim z As Long

LastCol = 21
LastRow = ActiveSheet.UsedRange.Rows.Count

AllRange = Range(Cells(2, 1), Cells(LastRow, LastCol)).Value
CopyRange = Range(Cells(2, 1), Cells(LastRow, LastCol)).Value ''' ERROR TRIGGER

i = 1
x = 1
z = 1

For i = LBound(AllRange) To UBound(AllRange) - 1
  If AllRange(i, 7) = "TestCriteria" Then
    For z = 1 To LastCol
      CopyRange(x, z) = AllRange(i, z)
    Next z
    x = x + 1
  End If
Next i

With Sheets(2)
  .Range(.Cells(2, 1), .Cells(x, LastCol)).Value = CopyRange
End With

End Sub

2 Solutions

#1


As the comments on your post indicate, this error comes from a shortage of working memory.

Each Variant variable consumes 16 bytes (24 bytes in 64-bit Office), which is why your code requires a vast amount of memory. So one way to solve this problem is to increase the physical memory on your computer.
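
To put a number on that, here is a rough sketch of the arithmetic. The function name and the bytesPerVariant parameter are made up for illustration; the row and column counts come from the code in the question. It gives a lower bound only, since a Variant holding text also points to separately allocated string data.

```vba
Option Explicit

'Rough lower bound on the memory held by a 2-D Variant array.
'bytesPerVariant is an assumption: 16 on 32-bit Office, 24 on 64-bit Office.
Function EstimateVariantArrayMB(ByVal rowCount As Long, ByVal colCount As Long, _
                                Optional ByVal bytesPerVariant As Long = 16) As Double
    'CDbl avoids a Long overflow when the element count is large
    EstimateVariantArrayMB = CDbl(rowCount) * colCount * bytesPerVariant / 1024# / 1024#
End Function

Sub ShowEstimate()
    '400,000 rows x 21 columns, as read by the code in the question
    Debug.Print Format$(EstimateVariantArrayMB(400000, 21), "0.0") & " MB per array"
End Sub
```

For the sheet in the question this comes to roughly 128 MB per array before any string data, and the question's code holds two such arrays at once.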

Another solution is to process the data in batches of a fixed number of rows.

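Sub ProcessRows()
    Dim originalData As Variant
    Dim maxRow As Long, currentRow As Long, lastRowInBatch As Long
    Const incrementRow As Long = 5000

    With ActiveSheet
        'assumes the used range starts at row 1
        maxRow = .UsedRange.Rows.Count
        currentRow = 1

        Do While currentRow <= maxRow
            'clamp the final batch so it does not run past the data
            lastRowInBatch = currentRow + incrementRow - 1
            If lastRowInBatch > maxRow Then lastRowInBatch = maxRow

            'read one batch of rows (20 columns) into a Variant array
            originalData = .Range(.Cells(currentRow, 1), .Cells(lastRowInBatch, 20)).Value

            'your process to filter data goes here

            currentRow = currentRow + incrementRow
        Loop
    End With
End Sub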

Of course you could go with a row-by-row approach, but I assume you are using arrays to speed up your code, so I do not recommend it.

#2


Working row by row is extremely slow, so it is not a viable solution for such a large dataset.

Arrays are definitely the way to go, so the choice is between:

  1. Loading the data in batches, then running your processing on a contiguous data set (viable until the data gets large, perhaps around 8M elements depending on your system)
  2. Loading the data in batches, then running your processing on the batch only (viable for an arbitrary amount of data)

Edit: I see you have 400k * 20 elements, which is pushing the boundaries of Option 1. You may have no choice but to refactor your code to load and process by batch (vs. load by batch, then process together).

Note:

  • Loading in batches into one array (Option 1) should be fine until the dataset becomes very large, because the Out of Memory error at first comes not from the size of the array itself but from reading the worksheet in a single operation.
  • If you get an Out of Memory error from the size of the array itself, then:
    • you will have no choice but to either use 64-bit Excel;
    • or (better) refactor your procedure to process the data in chunks (Option 2 above).
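
For Option 2, a minimal sketch of loading and processing by batch might look like the following. It is not taken from the question's code: the procedure name, the batch size, the 20-column width, the header in row 1 and the use of column 1 to find the last row are all assumptions, while the criterion column (7), the "TestCriteria" value and the second sheet as destination follow the question.

```vba
Option Explicit

'Sketch of load-and-process-by-batch (Option 2).
'Assumptions: headers in row 1, 20 data columns, column 1 is filled on every
'data row, and the destination is the second worksheet of the same workbook
'(which must not be the active sheet).
Sub CopyPendingInBatches()
    Const BATCH_ROWS As Long = 10000
    Const COL_COUNT As Long = 20
    Const CRITERIA_COL As Long = 7

    Dim src As Worksheet, dst As Worksheet
    Dim batch As Variant, kept() As Variant
    Dim lastRow As Long, firstRow As Long, endRow As Long
    Dim i As Long, c As Long, keptCount As Long, nextDstRow As Long

    Set src = ActiveSheet
    Set dst = src.Parent.Worksheets(2)

    lastRow = src.Cells(src.Rows.Count, 1).End(xlUp).Row
    firstRow = 2
    nextDstRow = 2

    Do While firstRow <= lastRow
        endRow = firstRow + BATCH_ROWS - 1
        If endRow > lastRow Then endRow = lastRow

        'read one batch; with 20 columns this is always a 2-D array
        batch = src.Range(src.Cells(firstRow, 1), src.Cells(endRow, COL_COUNT)).Value

        'the kept rows can never outnumber the rows in the batch
        ReDim kept(1 To UBound(batch, 1), 1 To COL_COUNT)
        keptCount = 0

        For i = 1 To UBound(batch, 1)
            If batch(i, CRITERIA_COL) = "TestCriteria" Then
                keptCount = keptCount + 1
                For c = 1 To COL_COUNT
                    kept(keptCount, c) = batch(i, c)
                Next c
            End If
        Next i

        'write only the filled part of kept to the destination sheet
        If keptCount > 0 Then
            dst.Cells(nextDstRow, 1).Resize(keptCount, COL_COUNT).Value = kept
            nextDstRow = nextDstRow + keptCount
        End If

        firstRow = endRow + 1
    Loop
End Sub
```

Only one batch and its filtered copy are in memory at any time, so peak usage depends on BATCH_ROWS rather than on the size of the sheet. Note that the comparison raises a type mismatch if the criterion column contains an error value such as #N/A, so such cells would need an IsError check first.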

The code below loads the data in batches into a single array, calling itself recursively for each successive batch. Try it: still having one array at the end means you don't have to restructure the rest of your code.

Example of Option 1:

Option Explicit

Sub example()

    Dim myCompletedataArr
    Dim myTestDataRange As Range

    Set myTestDataRange = ActiveSheet.UsedRange

    loadDataInBatches myTestDataRange, myCompletedataArr

    Debug.Assert False

End Sub


Sub loadDataInBatches(dataRange As Range, dataArr, Optional startRow As Long = 1, Optional rows As Long = 10000)
    Dim endRow As Long, i As Long, j As Long
    Dim dataArrLb1 As Long, dataArrLb2 As Long, batchArrLb1 As Long, batchArrLb2 As Long
    Dim batchArr, batchRange As Range

    If Not IsArray(dataArr) Then
        ReDim dataArr(0 To dataRange.rows.Count - 1, 0 To dataRange.Columns.Count - 1)
    End If 'otherwise assume dataArr is correctly dimensioned (for simplicity)

    endRow = WorksheetFunction.Min(startRow + rows - 1, dataRange.rows.Count)

    If endRow < startRow Then Exit Sub 'stop once every row is loaded; a final batch of one row is still processed

    Set batchRange = dataRange.rows(startRow & ":" & endRow)

    batchArr = batchRange.Value

    'cache lower bounds as we use them a lot
    dataArrLb1 = LBound(dataArr, 1): dataArrLb2 = LBound(dataArr, 2)
    batchArrLb1 = LBound(batchArr, 1): batchArrLb2 = LBound(batchArr, 2)

    For i = batchArrLb1 To UBound(batchArr, 1)
        For j = batchArrLb2 To UBound(batchArr, 2)
            dataArr(startRow - 1 + i + dataArrLb1 - batchArrLb1, j + dataArrLb2 - batchArrLb2) = batchArr(i, j)
        Next j
    Next i
    Erase batchArr 'free up some memory before the recursive call

    loadDataInBatches dataRange, dataArr, endRow + 1, rows

End Sub
