SudokuSolver Forum

A forum for Sudoku enthusiasts to share puzzles, techniques and software
It is currently Sat Apr 27, 2024 9:48 pm

All times are UTC




Post new topic Reply to topic  [ 15 posts ]  Go to page Previous  1, 2
Author Message
PostPosted: Sat May 24, 2008 3:08 pm 
Offline
Grand Master
Grand Master
User avatar

Joined: Thu Apr 24, 2008 4:27 pm
Posts: 791
Børge wrote:
enxio27 wrote:
Can it check for dupes through more than one sheet (presumably by highlighting multiple sheets)?

Even if this is possible I would strongly advice against doing so, unless you are capable of remembering how far you have processed X sheets.

What I was wanting to do was to compare two puzzle collection files (imported into two sheets of a worksheet) to find duplicates between them, but I suppose I could just as easily put them (the smaller ones, anyway) into the same worksheet. I have several larger ones, though, that I won't be able to do that with (too many rows).

Børge wrote:
DO NOT check "Copy to another location" and leave "Copy to:" empty. Excel then hides all rows having duplicates. Now select the complete sheet (Ctrl+A) and copy (Ctr+C) and paste it to a new blank one. The hidden rows are not included, i.e. selected and copied

Hmmm. . . That sounds very good! I wasn't aware that leaving the copy option unchecked would hide the duplicates (the "Help" file isn't very clear on that).

Børge wrote:
enxio27 wrote:
Is the FUNCTIONAL limit smaller?
I do not understand your question!

In other words, would a spreadsheet with several worksheets of, say 50,000 lines each, with a couple of 81-character columns in each worksheet, be impossible (or excessively slow) to process for dupes and such? (I've already discovered that Excel 2003 can't handle subtraction of 81-digit numbers. . . ;) )


Top
 Profile  
Reply with quote  
PostPosted: Sat May 24, 2008 4:39 pm 
Offline
Grand Master
Grand Master
User avatar

Joined: Mon Apr 21, 2008 10:32 am
Posts: 868
enxio27 wrote:
What I was wanting to do was to compare two puzzle collection files (imported into two sheets of a worksheet) to find duplicates between them, but I suppose I could just as easily put them (the smaller ones, anyway) into the same worksheet. I have several larger ones, though, that I won't be able to do that with (too many rows).
You obviously have FAR TOO many puzzles (> 500,000) What do you want with that many puzzles? It is more than you can solve and push in one lifetime! I can now mass generate puzzles (>30 per second) solvable with either Naked Singles or Naked/Hidden Singles only. So using this program you could easily add a couple of million puzzles to your collection. :cheesey:

I do not think that inter-sheet removal of duplicates works, but trying is the best answer ... see comment to next quote. If inter-sheet removal of duplicates do not work, you could try Excel 2007, but 1,048,576 rows is probably also too little. :geek:

enxio27 wrote:
Hmmm. . . That sounds very good! I wasn't aware that leaving the copy option unchecked would hide the duplicates (the "Help" file isn't very clear on that).
Creative thinking and trying is often the best way, and faster than asking. ;)

enxio27 wrote:
In other words, would a workbook (NOT spreadsheet) with several worksheets (worksheet = spreadsheet) of, say 50,000 lines each, with a couple of 81-character columns in each worksheet, be impossible (or excessively slow) to process for dupes and such?
I do not have that kind of puzzle collections, so I cannot try this. Please see comment to previous quote.

enxio27 wrote:
(I've already discovered that Excel 2003 can't handle subtraction of 81-digit numbers. . . ;) )
Of course excel can subtract 81-digit numbers, but not with a precision of 80 decimals, which you are probably looking for. :mrgreen: :study:

_________________
Quis custodiet ipsos custodes?
Normal: [D  Y-m-d,  G:i]     PM->email: [D, d M Y H:i:s]


Top
 Profile  
Reply with quote  
PostPosted: Sat May 31, 2008 2:20 am 
Offline
Grand Master
Grand Master
User avatar

Joined: Thu Apr 24, 2008 4:27 pm
Posts: 791
Børge wrote:
Creative thinking and trying is often the best way, and faster than asking. ;) [/quote]
I did try it, but it didn't do anything that I could see, so I figured I was not understanding something correctly. Thanks for your help!


Last edited by enxio27 on Sat May 31, 2008 2:44 am, edited 1 time in total.

Top
 Profile  
Reply with quote  
PostPosted: Sat May 31, 2008 2:32 am 
Offline
Grand Master
Grand Master
User avatar

Joined: Thu Apr 24, 2008 4:27 pm
Posts: 791
Børge wrote:
DO NOT check "Copy to another location" and leave "Copy to:" empty. Excel then hides all rows having duplicates. Now select the complete sheet (Ctrl+A) and copy (Ctr+C) and paste it to a new blank one. The hidden rows are not included, i.e. selected and copied


Perfect! Thanks!

I'll probably wait on trying to look for duplicates through multiple sheets, though.


Top
 Profile  
Reply with quote  
PostPosted: Sat May 31, 2008 2:34 pm 
Offline
Grand Master
Grand Master
User avatar

Joined: Mon Apr 21, 2008 10:32 am
Posts: 868
The Excel macro below should hopefully do everything you want. HERE is the file enxio27.xls with the macro.
The macro works on ALL ordinary worksheets ("Worksheet" objects) in ALL open Workbooks in AN Excel Application Session.
At the bottom of EACH worksheet at least ONE row must be empty, otherwise the macro will throw an uncaught exception.

The macro does the following two things:
  1. Sorts all worksheets. See below for details.
  2. Removes ALL ROWS with duplicates in all Worksheets in all Workbooks in its own Excel session.

Do the following:
  1. Download the workbook and open it.
  2. Hit ALT+F11 to open the VBA editor.
  3. Modify the two global constants "data_column" and "first_data_row" for your data. In your case "data_column" is the column with the Sudoku codes and "first_data_row" is the first row with a Sudoku code, in case you have header rows. In the examples in the rest of this post I will use the default values for "data_column" and "first_data_row". All worksheets must have the same "data_column" and "first_data_row"!!
  4. Close and save the Workbook.
  5. Open the Workbook enxio27.xls again.
  6. Open all other Workbooks with worksheets whose duplicate lines are to be removed. Make a backup copy of these workbooks first.
  7. Hit Ctrl+Q and with your big worksheets let the PC run overnight.
  8. Before using my macro on your real data, test it on a test case first.

THE SORTING:
In all ordinary worksheets ("Worksheet" objects) the area bound by column A + "first_data_row" AND 'rightmost column with data + bottommost row with data' is sorted using the values in column C ("data_column") as keys.

Code:
Option Explicit



Private Const data_column      As String = "C"
Private Const first_data_row   As String = "5"



Public Sub ENXIO_27()

    Dim all_current_Excel_session_data_cells()      As Range
    Dim array_index                                 As Long
    Dim array_index_data_cell_with_smallest_value   As Long
    Dim data_cell                                   As Range
    Dim data_cell_with_smallest_value               As Range
    Dim Excel_session_workbook                      As Workbook
    Dim Excel_session_workbooks_worksheet           As Object
    Dim num_data_cells_with_empty_value             As Long
    Dim num_ordinary_worksheets                     As Long
    Dim sort_area                                   As Range



    ' Count the number of ordinary worksheets ("Worksheet" objects) in all Workbooks in this Excel session.
    For Each Excel_session_workbook In Application.Workbooks

        For Each Excel_session_workbooks_worksheet In Excel_session_workbook.Worksheets
            If TypeName(Excel_session_workbooks_worksheet) = "Worksheet" Then num_ordinary_worksheets = num_ordinary_worksheets + 1
        Next

    Next

    If num_ordinary_worksheets = 0 Then Exit Sub

    ' Set an element in "all_current_Excel_session_data_cells()" to the first "data_cell" in every "Worksheet" in all Workbooks in this Excel session.
    ReDim all_current_Excel_session_data_cells(1 To num_ordinary_worksheets)

    For Each Excel_session_workbook In Application.Workbooks

        For Each Excel_session_workbooks_worksheet In Excel_session_workbook.Worksheets

            If TypeName(Excel_session_workbooks_worksheet) = "Worksheet" Then
                ' Set next free element in "all_current_Excel_session_data_cells()" = "Excel_session_workbooks_worksheet.Range(data_column & first_data_row)"
                array_index = array_index + 1
                Set data_cell = Excel_session_workbooks_worksheet.Range(data_column & first_data_row)
                Set all_current_Excel_session_data_cells(array_index) = data_cell

                ' In "Excel_session_workbooks_worksheet" sort the area bound by Range("A" & first_data_row) AND "data_cell.SpecialCells(xlCellTypeLastCell)"
                ' using the values in "data_column" as keys.
                Set sort_area = Excel_session_workbooks_worksheet.Range(Excel_session_workbooks_worksheet.Range("A" & first_data_row), data_cell.SpecialCells(xlCellTypeLastCell))
                If Not (Application.Intersect(data_cell, sort_area) Is Nothing) Then Call sort_area.Sort(key1:=data_cell, Orientation:=xlSortColumns)
            End If

        Next

    Next

    ' Remove all duplicates all in all Worksheets in all Workbooks in this Excel session.
    Application.ScreenUpdating = False

    Do While True
        num_data_cells_with_empty_value = 0
        array_index_data_cell_with_smallest_value = 1
        Set data_cell_with_smallest_value = all_current_Excel_session_data_cells(1)

        ' Find the "data_cell" with the smallest value of "all_current_Excel_session_data_cells()".
        For array_index = 1 To num_ordinary_worksheets

            If all_current_Excel_session_data_cells(array_index).Value = vbNullString Then
                num_data_cells_with_empty_value = num_data_cells_with_empty_value + 1
            ElseIf data_cell_with_smallest_value.Value = vbNullString Then
                array_index_data_cell_with_smallest_value = array_index
                Set data_cell_with_smallest_value = all_current_Excel_session_data_cells(array_index)
            ElseIf all_current_Excel_session_data_cells(array_index).Value < data_cell_with_smallest_value.Value Then
                array_index_data_cell_with_smallest_value = array_index
                Set data_cell_with_smallest_value = all_current_Excel_session_data_cells(array_index)
            End If

        Next

        ' Test if done.
        If num_data_cells_with_empty_value = num_ordinary_worksheets Then Exit Do

        ' Increase "all_current_Excel_session_data_cells(array_index_data_cell_with_smallest_value)" one Row, so that it does not get deleted and the algorithm advances.
        Set all_current_Excel_session_data_cells(array_index_data_cell_with_smallest_value) = data_cell_with_smallest_value.Offset(RowOffset:=1, ColumnOffset:=0)

        ' Remove all Rows in all Worksheets in all Workbooks in this Excel session whose "current_data_cell.Value = data_cell_with_smallest_value.Value".
        For array_index = 1 To num_ordinary_worksheets

            If all_current_Excel_session_data_cells(array_index).Value = data_cell_with_smallest_value.Value Then
                Set data_cell = all_current_Excel_session_data_cells(array_index)

                If data_cell.Worksheet Is data_cell_with_smallest_value.Worksheet Then
                    Set all_current_Excel_session_data_cells(array_index) = data_cell.Offset(RowOffset:=-1, ColumnOffset:=0)
                Else
                    Set all_current_Excel_session_data_cells(array_index) = data_cell.Offset(RowOffset:=1, ColumnOffset:=0)
                End If

                Call data_cell.Worksheet.Rows(data_cell.Row).Delete
            End If

        Next

    Loop

    Application.ScreenUpdating = True
End Sub

_________________
Quis custodiet ipsos custodes?
Normal: [D  Y-m-d,  G:i]     PM->email: [D, d M Y H:i:s]


Top
 Profile  
Reply with quote  
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 15 posts ]  Go to page Previous  1, 2

All times are UTC


Who is online

Users browsing this forum: No registered users and 25 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
cron
Powered by phpBB® Forum Software © phpBB Group