使用Excel VBA在多维数组中查找(不删除)重复值(行)

时间:2022-07-18 13:02:28

Building off of one of my past questions
What I'm looking to accomplish:

建立我过去的一个问题我想要实现的目标:

I'm looking to find and highlight duplicate Upcharges using VBA code based on multiple criteria:

我正在寻找并根据多个标准使用VBA代码突出显示重复的Upcharges:

  1. Product's XID (Column A)
  2. 产品的XID(A列)
  3. Upcharge Criteria 1 (Column CT)
  4. 上行标准1(CT列)
  5. Upcharge Criteria 2 (Column CU)
  6. 上行标准2(列CU)
  7. Upcharge Type (Column CV) and
  8. 上电类型(列CV)和
  9. Upcharge Level (Column CW)
  10. 上升电平(列CW)

If there is more than one instance/row in a spreadsheet that share/match ALL of these criteria then that means the Upcharge is a duplicate. As seen in my previous post linked above:

如果电子表格中有多个实例/行共享/匹配所有这些条件,那么这意味着Upcharge是重复的。正如我上面链接的帖子所示:

What I've tried:

我尝试过的:

  1. Created a general formula (see below) that is inserted into a Helper column and copied all the way down the spreadsheet which points out which Upcharges are duplicate. This method was too resource heavy and took too long (8-10 minutes for all the formulas to calculate, but doesn't lag when filtering). Then I tried
  2. 创建了一个通用公式(见下文),该公式插入到Helper列中并一直复制到电子表格中,指出哪些Upcharges是重复的。这种方法资源太重,耗时太长(所有公式计算时间为8-10分钟,但过滤时不会滞后)。然后我试了一下
  3. Evolved the general formula into a Conditional Formatting Formula and applied it to the Upcharge Name column via VBA code.(Takes same amount of time AND lags when filtering)
  4. 将通用公式演变为条件格式公式,并通过VBA代码将其应用于Upcharge Name列。(过滤时花费相同的时间和滞后)
  5. I've also looked into possibly using a scripting.dictionary, but I'm not sure how (or if) that would work with a multi-dimensional array.
  6. 我也研究过可能使用scripting.dictionary,但我不确定如何(或者如果)可以使用多维数组。

Now I've finally found the method I think will be much faster,

现在我终于找到了我认为会更快的方法,

The faster method I'm looking to use: Dumping the aforementioned columns into a multi-dimensional array, finding the duplicate "rows" in the array, then highlighting the corresponding spreadsheet rows.

我想要使​​用的更快的方法:将上述列转储到多维数组中,在数组中找到重复的“行”,然后突出显示相应的电子表格行。

My attempt at the faster method: Here's how I populate the multi-dimensional array

我尝试更快的方法:这是我如何填充多维数组

Sub populateArray()
    Dim arrXID() As Variant, arrUpchargeOne() As Variant, arrUpchargeTwo() As Variant, arrUpchargeType() As Variant, arrUpchargeLevel() As Variant
    Dim arrAllData() As Variant
    Dim i As Long, lrow As Long
    lrow = ActiveSheet.Cells(Rows.Count, 1).End(xlUp).Row

    arrXID = Range("A2:A" & lrow) 'amend column number
    arrUpchargeOne = Range("CT2:CT" & lrow)
    arrUpchargeTwo = Range("CU2:CU" & lrow)
    arrUpchargeType = Range("CV2:CV" & lrow)
    arrUpchargeLevel = Range("CW2:CW" & lrow)

    ReDim arrAllData(1 To UBound(arrXID, 1), 4) As Variant
        For i = 1 To UBound(arrXID, 1)
            arrAllData(i, 0) = arrXID(i, 1)
            arrAllData(i, 1) = arrUpchargeOne(i, 1)
            arrAllData(i, 2) = arrUpchargeTwo(i, 1)
            arrAllData(i, 3) = arrUpchargeType(i, 1)
            arrAllData(i, 4) = arrUpchargeLevel(i, 1)
        Next i
End Sub

I can get the columns into the array, but I get stuck from there. I'm not sure how to go about checking for the duplicate "rows" in the array.

我可以将列放入数组中,但是我从那里被卡住了。我不确定如何检查数组中的重复“行”。

My questions:

我的问题:

  1. Is there a way I can apply my formula (see below) from my first attempt in my previous post and apply it inside the array?:
  2. 有没有办法可以在我上一篇文章的第一次尝试中应用我的公式(见下文)并将其应用到数组中?:
  3. Or, even better, is there a faster way I can find the duplicate "rows" inside the array?
  4. 或者,更好的是,有更快的方法可以在数组中找到重复的“行”吗?
  5. Then how could I go about highlighting the Upcharge Name (CS) cell in the spreadsheet rows that correspond with the "rows" in the array that were flagged as duplicates?
  6. 那么我怎样才能突出显示电子表格行中的上传名称(CS)单元格,该单元格与数组中标记为重复项的“行”相对应?

Formula from my previous post for reference:

我以前的帖子中的公式供参考:

=AND(SUMPRODUCT(($A$2:$A$" & lastRow & "=$A2)*($CT$2:$CT$" & lastRow & "=$CT2)*($CU$2:$CU$" & lastRow & "=$CU2)*($CV$2:$CV$" & lastRow & "=$CV2)*($CW$2:$CW$" & lastRow & "=$CW2))>1,$CT2 <> """")"
Returns TRUE if Upcharge is a duplicate 

5 个解决方案

#1


4  

You say identify duplicates; I hear Scripting.Dictionary object.

你说识别重复;我听说Scripting.Dictionary对象。

Public Sub lminyDupes()
    Dim d As Long, str As String, vAs As Variant, vCTCWs As Variant
    Dim dDUPEs As Object                      '<~~ Late Binding
    'Dim dDUPEs As New Scripting.Dictionary   '<~~ Early Binding

    Debug.Print Timer
    Application.ScreenUpdating = False '<~~ uncomment this once you are no longer debugging

    'Remove the next line with Early Binding¹
    Set dDUPEs = CreateObject("Scripting.Dictionary")
    dDUPEs.comparemode = vbTextCompare

    With Worksheets("Upcharge") '<~~ you know what worksheet you are supposed to be on
        With .Cells(1, 1).CurrentRegion
            With .Resize(.Rows.Count - 1, .Columns.Count).Offset(1, 0)
                .Columns(97).Interior.Pattern = xlNone  '<~~ reset column CS

                'the following is intended to mimic a CF rule using this formula
                '=AND(COUNTIFS(A:A, A2, CT:CT, CT2, CU:CU, CU2, CV:CV, CV2, CW:CW, CW2)>1, SIGN(LEN(CT2)))

                vAs = .Columns(1).Value2
                vCTCWs = Union(.Columns(98), .Columns(99), .Columns(100), .Columns(101)).Value2

                For d = LBound(vAs, 1) To UBound(vAs, 1)
                    If CBool(Len(vCTCWs(d, 1))) Then
                        'make a key of the criteria values
                        str = Join(Array(vAs(d, 1), vCTCWs(d, 1), vCTCWs(d, 2), vCTCWs(d, 3), vCTCWs(d, 4)), ChrW(8203))
                        If dDUPEs.exists(str) Then
                            'the comboned key exists in the dictionary; append the current row
                            dDUPEs.Item(str) = dDUPEs.Item(str) & Chr(44) & "CS" & d
                        Else
                            'the combined key does not exist in the dictionary; store the current row
                            dDUPEs.Add Key:=str, Item:="CS" & d
                        End If
                    End If
                Next d

                'reuse a variant var to provide row highlighting
                Erase vAs
                For Each vAs In dDUPEs.keys
                    'if there is more than a single cell address, highlight all
                    If CBool(InStr(1, dDUPEs.Item(vAs), Chr(44))) Then _
                        .Range(dDUPEs.Item(vAs)).Interior.Color = vbRed
                Next vAs
            End With
        End With

    End With

    dDUPEs.RemoveAll: Set dDUPEs = Nothing
    Erase vCTCWs

    Application.ScreenUpdating = True
    Debug.Print Timer

End Sub

This seems faster than the formula approach.

这似乎比公式方法更快。


¹ If you plan to convert the late binding of the Scripting.Dictionary object to early binding, you must add Microsoft Scripting Runtime to the VBE's Tools ► References.

¹如果您计划将Scripting.Dictionary对象的后期绑定转换为早期绑定,则必须将Microsoft Scripting Runtime添加到VBE的工具►参考。

#2


1  

Conditional Formatting and Filtering

SUMPRODUCT vs COUNTIFS

SUMPRODUCT vs COUNTIFS

First off, your choice of functions was inappropriate for such a large number of rows coupled with several conditions. A COUNTIFS function can perform many of the same multiple criteria operations that a SUMPRODUCT function can but in typically 25-35% of the calculation load and time. Additionally, full column references can be used without detriment in COUNTIFS as the column references are internally truncated at the limits of the Worksheet.UsedRange property.

首先,您选择的功能不适用于如此大量的行以及多个条件。 COUNTIFS函数可以执行SUMPRODUCT函数可以执行的许多相同的多标准操作,但通常占计算负载和时间的25-35%。此外,可以在COUNTIFS中使用完整列引用而不会产生任何损害,因为列引用在Worksheet.UsedRange属性的限制内部被截断。

Your standard formula can be written with COUNTIFS as,

您的标准公式可以用COUNTIFS编写,

=AND(COUNTIFS(A:A, A2, CT:CT, CT2, CU:CU, CU2, CV:CV, CV2, CW:CW, CW2)>1, CT2<>"")
'... or,
=COUNTIFS(A:A, A2, CT:CT, CT2, CT:CT, "<>", CU:CU, CU2, CV:CV, CV2, CW:CW, CW2)>1

Bringing the non-blank column CT condition directly into the COUNTIFS function actually improved calculation time slightly.

将非空白列CT条件直接引入COUNTIFS函数实际上略微改善了计算时间。

Only Calculate When You Have To

只在必要时计算

The original formula can be broken down to two main conditions.

原始公式可以分解为两个主要条件。

  1. Is the cell in column CT non-blank?
  2. 列CT中的单元格是否为空白?
  3. Do the values in five columns match the same five columns any other row?
  4. 五列中的值是否与任何其他行的相同五列匹配?

A rudimentary IF function halts processing if the condition is not true. If the test for a non-blank cell in column CT is moved into a wrapping IF then the COUNTIFS (the bulk of the calculation) will only be processed if there is a value in the current row's CT column.

如果条件不为真,则基本IF函数将停止处理。如果将CT列中的非空单元格的测试移动到包装IF中,则只有当前行的CT列中存在值时才会处理COUNTIFS(计算的大部分)。

The improved standard formula becomes,

改进的标准公式变为,

=IF(CT2<>"", COUNTIFS(A:A, A2, CT:CT, CT2, CU:CU, CU2, CV:CV, CV2, CW:CW, CW2)>1)

The benefits for this modification depend upon the number of blank cells in column CT. If only 1% of the 15,000 cells are blank, very little improvement will be noticed. However, if 50% of the cells in column CT are typically blank there will be a substantial improvement as you are literally knocking your calculation cycles in half.

这种修改的好处取决于CT列中的空白细胞数量。如果15,000个细胞中只有1%是空白的,那么注意到很少的改善。但是,如果CT列中50%的细胞通常是空白的,那么将会有很大的改进,因为您实际上将计算周期缩减了一半。

Sorting the Data to Limit the Ranges

对数据进行排序以限制范围

By far, the biggest calculation parasite is with the COUNTIFS looking through 15,000 rows of data in five separate columns. If the data was sorted on one or more of the criteria columns then it becomes unnecessary to look through all 15,000 rows for matches to all five columns of criteria.

到目前为止,最大的计算寄生虫是COUNTIFS在五个单独的列中查看15,000行数据。如果数据在一个或多个条件列上排序,则无需查看所有15,000行以匹配所有五列标准。

For the purpose of this exercise, it will be assumed that column A is sorted in an ascending manner. If you want to test the hypothesis discussed here, sort the data now.

出于本练习的目的,将假设列A以升序方式排序。如果要测试此处讨论的假设,请立即对数据进行排序。

The INDEX function does more than return a value; it actually returns a valid cell address. When used in its most common lookup capacity, you see the value returned but in reality, unlike a similar VLOOKUP operation which only return the cell's value, INDEX is returning the actual cell; e.g. =A1, not the 99 that A1 contains. This hyper-functionality can be used to create valid ranges that can be used in other functions. e.g. A2:A9 can also be written as INDEX(A:A, 2):INDEX(A:A, 9).

INDEX函数不只是返回一个值;它实际返回一个有效的单元格地址。当在最常见的查找容量中使用时,您会看到返回的值,但实际上,与仅返回单元格值的类似VLOOKUP操作不同,INDEX返回实际单元格;例如= A1,而不是A1包含的99。此超级功能可用于创建可用于其他功能的有效范围。例如A2:A9也可以写成INDEX(A:A,2):INDEX(A:A,9)。

This functionality cannot be used directly within a Conditional Formatting rule. However, it can be used in a Named Range and a Named Range can be used in a Conditional Formatting rule.

无法在条件格式规则中直接使用此功能。但是,它可以在命名范围中使用,命名范围可以在条件格式规则中使用。

tl;dr

Sub lminyCFrule()

    Debug.Print Timer
    'Application.ScreenUpdating = False '<~~ uncomment this once you are no longer debugging
    On Error Resume Next    '<~~ needed for deleting objects without checking to see if they exist

    With Worksheets("Upcharge") '<~~ you know what worksheet you are supposed to be on
        If .AutoFilterMode Then .AutoFilterMode = False

        'delete any existing defined name called 'localXID' or 'local200'
        With .Parent
            .Names("localXID").Delete
            .Names("local200").Delete
        End With

        'create a new defined name called 'localXID' for CF rule method 1
        .Names.Add Name:="localXID", RefersToR1C1:= _
            "=INDEX('" & .Name & "'!C1:C104, MATCH('" & .Name & "'!RC1, '" & .Name & "'!C1, 0), 0):" & _
             "INDEX('" & .Name & "'!C1:C104, MATCH('" & .Name & "'!RC1, '" & .Name & "'!C1 ), 0)"
        'create a new defined name called 'local200' for CF rule method 2
        .Names.Add Name:="local200", RefersToR1C1:= _
            "=INDEX(Upcharge!C1:C104, MAX(2, ROW()-100), 0):INDEX(Upcharge!C1:C101, ROW()+100, 0)"

        With .Cells(1, 1).CurrentRegion
            'sort on column A in ascending order
             .Cells.Sort Key1:=.Columns(1), Order1:=xlAscending, _
                        Orientation:=xlTopToBottom, Header:=xlYes

            'create a CF rule on column CS
            With .Resize(.Rows.Count - 1, 1).Offset(1, 96)
                With .FormatConditions
                    .Delete
                    ' method 1 and method 2. Only use ONE of these!
                    ' method 1 - definitively start and end of XIDs in column A (slower, no mistakes)
                    '.Add Type:=xlExpression, Formula1:= _
                        "=IF(CT2<>"""", COUNTIFS(INDEX(localXID, 0, 1), A2, INDEX(localXID, 0, 98), CT2," & _
                                                "INDEX(localXID, 0, 99), CU2, INDEX(localXID, 0, 100), CV2," & _
                                                "INDEX(localXID, 0, 101), CW2)-1)"
                    ' method 2 - best guess at start and end of XIDs in column A (faster, guesswork at true scope)
                    .Add Type:=xlExpression, Formula1:= _
                        "=IF(CT2<>"""", COUNTIFS(INDEX(local200, 0, 1), A2, INDEX(local200, 0, 98), CT2," & _
                                                "INDEX(local200, 0, 99), CU2, INDEX(local200, 0, 100), CV2," & _
                                                "INDEX(local200, 0, 101), CW2)-1)"
                End With
                .FormatConditions(.FormatConditions.Count).Interior.ColorIndex = 3
            End With

            'Filter based on column CS is red
            .Columns(97).AutoFilter Field:=1, Criteria1:=vbRed, Operator:=xlFilterCellColor
        End With
    End With

    Application.ScreenUpdating = True
    Debug.Print Timer

End Sub

While not screaming fast, this does the job handily. The 'best guess' is faster than the 'definitive start and finish' but you run the risk of not completely covering the scope of the duplicates in column A. Of course, the offsets (e.g. 100 up and down) that control the scope could be adjusted.

虽然没有快速尖叫,但这很方便。 “最佳猜测”比“确定的开始和结束”更快,但是您冒着不完全覆盖A列中重复范围的风险。当然,控制范围的偏移(例如100上下)可能调整。

#3


0  

Why don't you remove the Indirect() and replace the Countif() function with some stable Row reference. Since Indirect() part is a volatile and instead of using Indirect() you can straight away use some stable row reference like $A$2:$A$50000 which may show some significant change in performance.

为什么不删除Indirect()并用一些稳定的Row引用替换Countif()函数。由于Indirect()部分是易失性的,而不是使用Indirect(),您可以立即使用一些稳定的行引用,如$ A $ 2:$ A $ 50000,这可能会显示性能的一些重大变化。

Or

要么

Use Create Table for your data. Use Table reference in your formula which will work faster than Indirect() reference.

使用Create Table作为数据。在公式中使用表引用,它将比Indirect()引用更快。

Edit

编辑

Your actual formula

你的实际配方

=AND(SUMPRODUCT(($A$2:$A$500=$A2)*($CU$2:$CU$500=$CU2)*($CV$2:$CV$500=$CV2)*($CW$2:$CW$500=$CW2)*($CX$2:$CX$500=$CX2))>1,$CU2 <> "")

Why don't you convert it to Counti(S) with stable reference like the below?

为什么不将它转换为具有稳定参考的Counti(S),如下所示?

=AND(COUNTIFS($A$2:$A$500,$A2,$CU$2:$CU$500,$CU2,$CV$2:$CV$500,$CV2,$CW$2:$CW**$500,$CW2,$CX$2:$CX$500,$CX2)>1,$CU12<>"")

#4


0  

Consider an SQL solution as this is a typical aggregate group by query where you filter for counts greater than 1. To go about your route requires many conditional logic within the loop across all elements of array.

考虑一个SQL解决方案,因为这是一个典型的查询聚合组,您可以在其中筛选大于1的计数。要了解您的路由,需要在循环内跨越所有数组元素的条件逻辑。

While I recommend you simply import your data into a database like Excel's sibling MS Access, Excel can run SQL statements on its own workbook using an ADO connection (not to get into particulars but both Excel and Access uses the same Jet/ACE engine). And one good thing is you seem to be set up to run such a query with the table like structure of named columns.

虽然我建议您只是将数据导入数据库,如Excel的兄弟MS Access,但Excel可以使用ADO连接在自己的工作簿上运行SQL语句(不是为了详细说明,但Excel和Access都使用相同的Jet / ACE引擎)。还有一件好事是你似乎被设置为使用像命名列的结构这样的表来运行这样的查询。

The below example references your fields in a worksheet called Data (Data$) and query outputs to a worksheet called Results (with headers). Change names as needed. Two connection strings are included (one of which is commented out). Hopefully it runs on your end!

下面的示例在名为Data(Data $)的工作表中引用您的字段,并将查询输出引用到名为Results(带标题)的工作表。根据需要更改名称。包括两个连接字符串(其中一个被注释掉)。希望它在你的最后运行!

Sub RunSQL()

    Dim conn As Object, rst As Object
    Dim i As Integer, fld As Object
    Dim strConnection As String, strSQL As String

    Set conn = CreateObject("ADODB.Connection")
    Set rst = CreateObject("ADODB.Recordset")

    ' Connection and SQL Strings
'    strConnection = "DRIVER={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _
'                      & "DBQ=C:\Path\To\Workbook.xlsm;"
    strConnection = "Provider=Microsoft.ACE.OLEDB.12.0;" _
                       & "Data Source='C:\Path\To\Workbook.xlsm';" _
                       & "Extended Properties=""Excel 8.0;HDR=YES;"";"

    strSQL = " SELECT [Data$].[Product's XID], [Data$].[Upcharge Criteria 1]," _
                & " [Data$].[Upcharge Criteria 2], [Data$].[Upcharge Type]," _
                & " [Data$].[Upcharge Type], [Data$].[Upcharge Level]" _ 
                & " FROM [Data$]" _
                & " GROUP BY [Data$].[Product's XID], [Data$].[Upcharge Criteria 1]," _
                & " [Data$].[Upcharge Criteria 2], [Data$].[Upcharge Type]," _
                & " [Data$].[Upcharge Type], [Data$].[Upcharge Level]," _
                & " [Data$].[Product's XID]" _
                & " HAVING COUNT(*) > 1;"

    ' Open the db connection
    conn.Open strConnection
    rst.Open strSQL, conn

    ' Column headers
    i = 0
    Worksheets("Results").Range("A1").Activate
    For Each fld In rst.Fields
        ActiveCell.Offset(0, i) = fld.Name
        i = i + 1
    Next fld

    ' Data rows        
    Worksheets("Results").Range("A2").CopyFromRecordset rst

    rst.Close
    conn.Close

End Sub

#5


0  

This might work like a magic trick, but not sure if it would work.

这可能就像一个魔术,但不确定它是否会起作用。

Could you just create another supportive (temporary) column, concatenating all four criteria?

你能创建另一个支持性(临时)列,连接所有四个标准吗?

ZZ_Temp = concatenate (CS; CV; CZ; etc)

ZZ_Temp =连接(CS; CV; CZ;等)

This way, I suppose, you could show/highlight duplicates a lot faster.

这样,我想,你可以更快地显示/突出重复。

#1


4  

You say identify duplicates; I hear Scripting.Dictionary object.

你说识别重复;我听说Scripting.Dictionary对象。

Public Sub lminyDupes()
    Dim d As Long, str As String, vAs As Variant, vCTCWs As Variant
    Dim dDUPEs As Object                      '<~~ Late Binding
    'Dim dDUPEs As New Scripting.Dictionary   '<~~ Early Binding

    Debug.Print Timer
    Application.ScreenUpdating = False '<~~ uncomment this once you are no longer debugging

    'Remove the next line with Early Binding¹
    Set dDUPEs = CreateObject("Scripting.Dictionary")
    dDUPEs.comparemode = vbTextCompare

    With Worksheets("Upcharge") '<~~ you know what worksheet you are supposed to be on
        With .Cells(1, 1).CurrentRegion
            With .Resize(.Rows.Count - 1, .Columns.Count).Offset(1, 0)
                .Columns(97).Interior.Pattern = xlNone  '<~~ reset column CS

                'the following is intended to mimic a CF rule using this formula
                '=AND(COUNTIFS(A:A, A2, CT:CT, CT2, CU:CU, CU2, CV:CV, CV2, CW:CW, CW2)>1, SIGN(LEN(CT2)))

                vAs = .Columns(1).Value2
                vCTCWs = Union(.Columns(98), .Columns(99), .Columns(100), .Columns(101)).Value2

                For d = LBound(vAs, 1) To UBound(vAs, 1)
                    If CBool(Len(vCTCWs(d, 1))) Then
                        'make a key of the criteria values
                        str = Join(Array(vAs(d, 1), vCTCWs(d, 1), vCTCWs(d, 2), vCTCWs(d, 3), vCTCWs(d, 4)), ChrW(8203))
                        If dDUPEs.exists(str) Then
                            'the comboned key exists in the dictionary; append the current row
                            dDUPEs.Item(str) = dDUPEs.Item(str) & Chr(44) & "CS" & d
                        Else
                            'the combined key does not exist in the dictionary; store the current row
                            dDUPEs.Add Key:=str, Item:="CS" & d
                        End If
                    End If
                Next d

                'reuse a variant var to provide row highlighting
                Erase vAs
                For Each vAs In dDUPEs.keys
                    'if there is more than a single cell address, highlight all
                    If CBool(InStr(1, dDUPEs.Item(vAs), Chr(44))) Then _
                        .Range(dDUPEs.Item(vAs)).Interior.Color = vbRed
                Next vAs
            End With
        End With

    End With

    dDUPEs.RemoveAll: Set dDUPEs = Nothing
    Erase vCTCWs

    Application.ScreenUpdating = True
    Debug.Print Timer

End Sub

This seems faster than the formula approach.

这似乎比公式方法更快。


¹ If you plan to convert the late binding of the Scripting.Dictionary object to early binding, you must add Microsoft Scripting Runtime to the VBE's Tools ► References.

¹如果您计划将Scripting.Dictionary对象的后期绑定转换为早期绑定,则必须将Microsoft Scripting Runtime添加到VBE的工具►参考。

#2


1  

Conditional Formatting and Filtering

SUMPRODUCT vs COUNTIFS

SUMPRODUCT vs COUNTIFS

First off, your choice of functions was inappropriate for such a large number of rows coupled with several conditions. A COUNTIFS function can perform many of the same multiple criteria operations that a SUMPRODUCT function can but in typically 25-35% of the calculation load and time. Additionally, full column references can be used without detriment in COUNTIFS as the column references are internally truncated at the limits of the Worksheet.UsedRange property.

首先,您选择的功能不适用于如此大量的行以及多个条件。 COUNTIFS函数可以执行SUMPRODUCT函数可以执行的许多相同的多标准操作,但通常占计算负载和时间的25-35%。此外,可以在COUNTIFS中使用完整列引用而不会产生任何损害,因为列引用在Worksheet.UsedRange属性的限制内部被截断。

Your standard formula can be written with COUNTIFS as,

您的标准公式可以用COUNTIFS编写,

=AND(COUNTIFS(A:A, A2, CT:CT, CT2, CU:CU, CU2, CV:CV, CV2, CW:CW, CW2)>1, CT2<>"")
'... or,
=COUNTIFS(A:A, A2, CT:CT, CT2, CT:CT, "<>", CU:CU, CU2, CV:CV, CV2, CW:CW, CW2)>1

Bringing the non-blank column CT condition directly into the COUNTIFS function actually improved calculation time slightly.

将非空白列CT条件直接引入COUNTIFS函数实际上略微改善了计算时间。

Only Calculate When You Have To

只在必要时计算

The original formula can be broken down to two main conditions.

原始公式可以分解为两个主要条件。

  1. Is the cell in column CT non-blank?
  2. 列CT中的单元格是否为空白?
  3. Do the values in five columns match the same five columns any other row?
  4. 五列中的值是否与任何其他行的相同五列匹配?

A rudimentary IF function halts processing if the condition is not true. If the test for a non-blank cell in column CT is moved into a wrapping IF then the COUNTIFS (the bulk of the calculation) will only be processed if there is a value in the current row's CT column.

如果条件不为真,则基本IF函数将停止处理。如果将CT列中的非空单元格的测试移动到包装IF中,则只有当前行的CT列中存在值时才会处理COUNTIFS(计算的大部分)。

The improved standard formula becomes,

改进的标准公式变为,

=IF(CT2<>"", COUNTIFS(A:A, A2, CT:CT, CT2, CU:CU, CU2, CV:CV, CV2, CW:CW, CW2)>1)

The benefits for this modification depend upon the number of blank cells in column CT. If only 1% of the 15,000 cells are blank, very little improvement will be noticed. However, if 50% of the cells in column CT are typically blank there will be a substantial improvement as you are literally knocking your calculation cycles in half.

这种修改的好处取决于CT列中的空白细胞数量。如果15,000个细胞中只有1%是空白的,那么注意到很少的改善。但是,如果CT列中50%的细胞通常是空白的,那么将会有很大的改进,因为您实际上将计算周期缩减了一半。

Sorting the Data to Limit the Ranges

对数据进行排序以限制范围

By far, the biggest calculation parasite is with the COUNTIFS looking through 15,000 rows of data in five separate columns. If the data was sorted on one or more of the criteria columns then it becomes unnecessary to look through all 15,000 rows for matches to all five columns of criteria.

到目前为止,最大的计算寄生虫是COUNTIFS在五个单独的列中查看15,000行数据。如果数据在一个或多个条件列上排序,则无需查看所有15,000行以匹配所有五列标准。

For the purpose of this exercise, it will be assumed that column A is sorted in an ascending manner. If you want to test the hypothesis discussed here, sort the data now.

出于本练习的目的,将假设列A以升序方式排序。如果要测试此处讨论的假设,请立即对数据进行排序。

The INDEX function does more than return a value; it actually returns a valid cell address. When used in its most common lookup capacity, you see the value returned but in reality, unlike a similar VLOOKUP operation which only return the cell's value, INDEX is returning the actual cell; e.g. =A1, not the 99 that A1 contains. This hyper-functionality can be used to create valid ranges that can be used in other functions. e.g. A2:A9 can also be written as INDEX(A:A, 2):INDEX(A:A, 9).

INDEX函数不只是返回一个值;它实际返回一个有效的单元格地址。当在最常见的查找容量中使用时,您会看到返回的值,但实际上,与仅返回单元格值的类似VLOOKUP操作不同,INDEX返回实际单元格;例如= A1,而不是A1包含的99。此超级功能可用于创建可用于其他功能的有效范围。例如A2:A9也可以写成INDEX(A:A,2):INDEX(A:A,9)。

This functionality cannot be used directly within a Conditional Formatting rule. However, it can be used in a Named Range and a Named Range can be used in a Conditional Formatting rule.

无法在条件格式规则中直接使用此功能。但是,它可以在命名范围中使用,命名范围可以在条件格式规则中使用。

tl;dr

Sub lminyCFrule()

    Debug.Print Timer
    'Application.ScreenUpdating = False '<~~ uncomment this once you are no longer debugging
    On Error Resume Next    '<~~ needed for deleting objects without checking to see if they exist

    With Worksheets("Upcharge") '<~~ you know what worksheet you are supposed to be on
        If .AutoFilterMode Then .AutoFilterMode = False

        'delete any existing defined name called 'localXID' or 'local200'
        With .Parent
            .Names("localXID").Delete
            .Names("local200").Delete
        End With

        'create a new defined name called 'localXID' for CF rule method 1
        .Names.Add Name:="localXID", RefersToR1C1:= _
            "=INDEX('" & .Name & "'!C1:C104, MATCH('" & .Name & "'!RC1, '" & .Name & "'!C1, 0), 0):" & _
             "INDEX('" & .Name & "'!C1:C104, MATCH('" & .Name & "'!RC1, '" & .Name & "'!C1 ), 0)"
        'create a new defined name called 'local200' for CF rule method 2
        .Names.Add Name:="local200", RefersToR1C1:= _
            "=INDEX(Upcharge!C1:C104, MAX(2, ROW()-100), 0):INDEX(Upcharge!C1:C101, ROW()+100, 0)"

        With .Cells(1, 1).CurrentRegion
            'sort on column A in ascending order
             .Cells.Sort Key1:=.Columns(1), Order1:=xlAscending, _
                        Orientation:=xlTopToBottom, Header:=xlYes

            'create a CF rule on column CS
            With .Resize(.Rows.Count - 1, 1).Offset(1, 96)
                With .FormatConditions
                    .Delete
                    ' method 1 and method 2. Only use ONE of these!
                    ' method 1 - definitively start and end of XIDs in column A (slower, no mistakes)
                    '.Add Type:=xlExpression, Formula1:= _
                        "=IF(CT2<>"""", COUNTIFS(INDEX(localXID, 0, 1), A2, INDEX(localXID, 0, 98), CT2," & _
                                                "INDEX(localXID, 0, 99), CU2, INDEX(localXID, 0, 100), CV2," & _
                                                "INDEX(localXID, 0, 101), CW2)-1)"
                    ' method 2 - best guess at start and end of XIDs in column A (faster, guesswork at true scope)
                    .Add Type:=xlExpression, Formula1:= _
                        "=IF(CT2<>"""", COUNTIFS(INDEX(local200, 0, 1), A2, INDEX(local200, 0, 98), CT2," & _
                                                "INDEX(local200, 0, 99), CU2, INDEX(local200, 0, 100), CV2," & _
                                                "INDEX(local200, 0, 101), CW2)-1)"
                End With
                .FormatConditions(.FormatConditions.Count).Interior.ColorIndex = 3
            End With

            'Filter based on column CS is red
            .Columns(97).AutoFilter Field:=1, Criteria1:=vbRed, Operator:=xlFilterCellColor
        End With
    End With

    Application.ScreenUpdating = True
    Debug.Print Timer

End Sub

While not screaming fast, this does the job handily. The 'best guess' is faster than the 'definitive start and finish' but you run the risk of not completely covering the scope of the duplicates in column A. Of course, the offsets (e.g. 100 up and down) that control the scope could be adjusted.

虽然没有快速尖叫,但这很方便。 “最佳猜测”比“确定的开始和结束”更快,但是您冒着不完全覆盖A列中重复范围的风险。当然,控制范围的偏移(例如100上下)可能调整。

#3


0  

Why don't you remove the Indirect() and replace the Countif() function with some stable Row reference. Since Indirect() part is a volatile and instead of using Indirect() you can straight away use some stable row reference like $A$2:$A$50000 which may show some significant change in performance.

为什么不删除Indirect()并用一些稳定的Row引用替换Countif()函数。由于Indirect()部分是易失性的,而不是使用Indirect(),您可以立即使用一些稳定的行引用,如$ A $ 2:$ A $ 50000,这可能会显示性能的一些重大变化。

Or

要么

Use Create Table for your data. Use Table reference in your formula which will work faster than Indirect() reference.

使用Create Table作为数据。在公式中使用表引用,它将比Indirect()引用更快。

Edit

编辑

Your actual formula

你的实际配方

=AND(SUMPRODUCT(($A$2:$A$500=$A2)*($CU$2:$CU$500=$CU2)*($CV$2:$CV$500=$CV2)*($CW$2:$CW$500=$CW2)*($CX$2:$CX$500=$CX2))>1,$CU2 <> "")

Why don't you convert it to Counti(S) with stable reference like the below?

为什么不将它转换为具有稳定参考的Counti(S),如下所示?

=AND(COUNTIFS($A$2:$A$500,$A2,$CU$2:$CU$500,$CU2,$CV$2:$CV$500,$CV2,$CW$2:$CW**$500,$CW2,$CX$2:$CX$500,$CX2)>1,$CU12<>"")

#4


0  

Consider an SQL solution as this is a typical aggregate group by query where you filter for counts greater than 1. To go about your route requires many conditional logic within the loop across all elements of array.

考虑一个SQL解决方案,因为这是一个典型的查询聚合组,您可以在其中筛选大于1的计数。要了解您的路由,需要在循环内跨越所有数组元素的条件逻辑。

While I recommend you simply import your data into a database like Excel's sibling MS Access, Excel can run SQL statements on its own workbook using an ADO connection (not to get into particulars but both Excel and Access uses the same Jet/ACE engine). And one good thing is you seem to be set up to run such a query with the table like structure of named columns.

虽然我建议您只是将数据导入数据库,如Excel的兄弟MS Access,但Excel可以使用ADO连接在自己的工作簿上运行SQL语句(不是为了详细说明,但Excel和Access都使用相同的Jet / ACE引擎)。还有一件好事是你似乎被设置为使用像命名列的结构这样的表来运行这样的查询。

The below example references your fields in a worksheet called Data (Data$) and query outputs to a worksheet called Results (with headers). Change names as needed. Two connection strings are included (one of which is commented out). Hopefully it runs on your end!

下面的示例在名为Data(Data $)的工作表中引用您的字段,并将查询输出引用到名为Results(带标题)的工作表。根据需要更改名称。包括两个连接字符串(其中一个被注释掉)。希望它在你的最后运行!

Sub RunSQL()

    Dim conn As Object, rst As Object
    Dim i As Integer, fld As Object
    Dim strConnection As String, strSQL As String

    Set conn = CreateObject("ADODB.Connection")
    Set rst = CreateObject("ADODB.Recordset")

    ' Connection and SQL Strings
'    strConnection = "DRIVER={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _
'                      & "DBQ=C:\Path\To\Workbook.xlsm;"
    strConnection = "Provider=Microsoft.ACE.OLEDB.12.0;" _
                       & "Data Source='C:\Path\To\Workbook.xlsm';" _
                       & "Extended Properties=""Excel 8.0;HDR=YES;"";"

    strSQL = " SELECT [Data$].[Product's XID], [Data$].[Upcharge Criteria 1]," _
                & " [Data$].[Upcharge Criteria 2], [Data$].[Upcharge Type]," _
                & " [Data$].[Upcharge Type], [Data$].[Upcharge Level]" _ 
                & " FROM [Data$]" _
                & " GROUP BY [Data$].[Product's XID], [Data$].[Upcharge Criteria 1]," _
                & " [Data$].[Upcharge Criteria 2], [Data$].[Upcharge Type]," _
                & " [Data$].[Upcharge Type], [Data$].[Upcharge Level]," _
                & " [Data$].[Product's XID]" _
                & " HAVING COUNT(*) > 1;"

    ' Open the db connection
    conn.Open strConnection
    rst.Open strSQL, conn

    ' Column headers
    i = 0
    Worksheets("Results").Range("A1").Activate
    For Each fld In rst.Fields
        ActiveCell.Offset(0, i) = fld.Name
        i = i + 1
    Next fld

    ' Data rows        
    Worksheets("Results").Range("A2").CopyFromRecordset rst

    rst.Close
    conn.Close

End Sub

#5


0  

This might work like a magic trick, but not sure if it would work.

这可能就像一个魔术,但不确定它是否会起作用。

Could you just create another supportive (temporary) column, concatenating all four criteria?

你能创建另一个支持性(临时)列,连接所有四个标准吗?

ZZ_Temp = concatenate (CS; CV; CZ; etc)

ZZ_Temp =连接(CS; CV; CZ;等)

This way, I suppose, you could show/highlight duplicates a lot faster.

这样,我想,你可以更快地显示/突出重复。