SudokuSolver Forum

A forum for Sudoku enthusiasts to share puzzles, techniques and software
All times are UTC




Posted: Tue May 20, 2008 2:30 pm
Joined: Thu Apr 24, 2008 4:27 pm
Posts: 791
What is a good way to find duplicates in a file of 81-character puzzle strings, or between two such files? There are too many puzzles for a visual check to be feasible.


Posted: Tue May 20, 2008 7:06 pm
Joined: Mon Apr 21, 2008 6:23 am
Posts: 113
Location: Germany
enxio27 wrote:
What is a good way to find duplicates in a file of 81-character puzzle strings, or between two such files?

If you have access to a Linux (or Unix) installation, and there are not expected to be many duplicates, then you could try:

Code:
cat <filelist> | sort | uniq -d

where <filelist> is a whitespace-delimited list of one or more names of files containing 1 puzzle per line.

This would output a list of duplicates to stdout, which you could then search for (in the original files) and prune manually.
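
The manual pruning step can also be automated. Below is a minimal sketch building on the pipeline above, assuming one puzzle per line and the usual Unix tools plus `mktemp`. The function names (`list_dupes`, `dedupe`, `common`) are made up for illustration and do not come from any package.

```shell
# Hypothetical helper names; assumes one 81-character puzzle per line.

# Print each puzzle that occurs more than once across the given files.
list_dupes() {
    sort "$@" | uniq -d
}

# Print every puzzle exactly once (in sorted order), i.e. prune automatically.
dedupe() {
    sort -u "$@"
}

# Print the puzzles that occur in both file $1 and file $2.
# Each file is de-duplicated first, so repeats inside one file do not count.
common() {
    tmp=$(mktemp) || return 1
    sort -u "$1" > "$tmp"
    sort -u "$2" | comm -12 "$tmp" -
    rm -f "$tmp"
}
```

For example, `dedupe a.txt b.txt > merged.txt` would write a pruned, merged list, and `common a.txt b.txt` would show only the puzzles shared between the two files (the file names here are placeholders). Note that the output is sorted, so the original order of the puzzles is lost.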

If you've only got access to Windows, then you'll have to look for another way.

_________________
Cheers,
Mike


Posted: Tue May 20, 2008 7:25 pm
Joined: Tue Apr 22, 2008 2:07 am
Posts: 107
I might be tempted to read them into Excel and then sort. Scrolling down through the sort, one could spot dupes.


Posted: Tue May 20, 2008 7:27 pm
Joined: Thu Apr 24, 2008 4:27 pm
Posts: 791
mhparker wrote:
If you have access to a Linux (or Unix) installation, and there are not expected to be many duplicates,

There shouldn't be, but. . .

mhparker wrote:
then you could try:

Code:
cat <filelist> | sort | uniq -d

where <filelist> is a whitespace-delimited list of one or more names of files containing 1 puzzle per line.

This would output a list of duplicates to stdout, which you could then search for (in the original files) and prune manually.

Hmmm. . . I've been needing to reinstall Red Hat anyway. . .


Last edited by enxio27 on Tue May 20, 2008 7:30 pm, edited 1 time in total.

Posted: Tue May 20, 2008 7:28 pm
Joined: Thu Apr 24, 2008 4:27 pm
Posts: 791
nj3h wrote:
I might be tempted to read them into Excel and then sort. Scrolling down through the sort, one could spot dupes.

That's the way I've been doing it, but I have too many to do it that way--just takes too long. That's why I was hoping for something more automated.


Last edited by enxio27 on Sat May 24, 2008 2:00 am, edited 1 time in total.

Posted: Tue May 20, 2008 8:06 pm
Joined: Mon Apr 21, 2008 10:32 am
Posts: 868
In Excel you can use an Advanced Filter (Data > Filter > Advanced Filter) to remove duplicates from a sorted or unsorted column:

[Image not preserved]

_________________
Quis custodiet ipsos custodes?
Normal: [D  Y-m-d,  G:i]     PM->email: [D, d M Y H:i:s]


Last edited by Børge on Fri Jun 13, 2008 8:27 pm, edited 1 time in total.

Posted: Sat May 24, 2008 2:03 am
Joined: Thu Apr 24, 2008 4:27 pm
Posts: 791
Børge wrote:
In Excel you can use an Advanced Filter (Data > Filter > Advanced Filter) to remove duplicates from a sorted or unsorted column:

I tried that out, but evidently I'm not understanding how it works. From the Help file, it appears that it makes a listing somewhere of all the unique ones, but I couldn't figure out where Excel puts the listing or what it does with it.


Posted: Sat May 24, 2008 4:45 am
Joined: Wed Apr 23, 2008 5:29 am
Posts: 302
Location: Sydney, Australia
For example, put the column list [1,2,3,9,9,4,2,5,7,3] in cells A1:A10. Now click Data -> Filter -> Advanced Filter. Check both "Copy to another location" and "Unique records only". Put "A1:A10" into the "List range:" entry and "B1:B10" into the "Copy to:" entry. Click "OK". Voilà! Column B should now list each value once.
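
For comparison, the same unique-records result can be sketched at a command line with `awk`, assuming one value per line. The function name `unique_in_order` and the array name `seen` are arbitrary choices for this illustration. Unlike `sort | uniq`, this keeps the first occurrence of each value in its original position.

```shell
# Print each input line the first time it appears; later repeats are dropped.
# `seen` is just an illustrative array name.
unique_in_order() {
    awk '!seen[$0]++' "$@"
}

# The example list from the post, one value per line:
printf '%s\n' 1 2 3 9 9 4 2 5 7 3 | unique_in_order
# prints 1 2 3 9 4 5 7, one per line
```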

_________________
ADYFNC HJPLI BVSM GgK Oa m


Posted: Sat May 24, 2008 12:25 pm
Joined: Thu Apr 24, 2008 4:27 pm
Posts: 791
udosuk wrote:
For example, put the column list [1,2,3,9,9,4,2,5,7,3] in cells A1:A10. Now click Data -> Filter -> Advanced Filter. Check both "Copy to another location" and "Unique records only". Put "A1:A10" into the "List range:" entry and "B1:B10" into the "Copy to:" entry. Click "OK". Voilà!

OK, thanks! Can it check for dupes through more than one sheet (presumably by highlighting multiple sheets)?

I don't see a way, though, to check just one column for duplicates but copy the entire row with the unique record to, say, a second sheet. I want to keep the data in the other columns (puzzle number, for instance) from becoming detached from the column with the puzzle data itself. I had hoped that "Criteria range" could be used to do that, but apparently not.

Is 65536 the maximum number of lines Excel can handle in a worksheet? Is the FUNCTIONAL limit smaller?


Posted: Sat May 24, 2008 12:47 pm
Joined: Mon Apr 21, 2008 10:32 am
Posts: 868
enxio27 wrote:
OK, thanks! Can it check for dupes through more than one sheet (presumably by highlighting multiple sheets)?
Just try it with a test workbook and you will find out.
Even if this is possible, I would strongly advise against doing so, unless you are capable of remembering how far you have processed X sheets. See next answer.

enxio27 wrote:
I don't see a way, though, to check just one column for duplicates but copy the entire row with the unique record to, say, a second sheet. I want to keep the data in the other columns (puzzle number, for instance) from becoming detached from the column with the puzzle data itself.
Do NOT check "Copy to another location", and leave "Copy to:" empty. Excel then hides the duplicate rows in place. Now select the complete sheet (Ctrl+A), copy it (Ctrl+C) and paste it into a new blank sheet. The hidden rows are not selected, so they are not copied.
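
Outside Excel, the same keep-the-whole-row idea can be sketched with `awk`. This assumes the sheet has been exported as comma-separated text with no commas inside any field, and that the puzzle string sits in a known column. The function name `unique_rows` and the column layout are assumptions for illustration only.

```shell
# unique_rows COL [FILE...]: print each row whose COL-th comma-separated
# field has not been seen before; the rest of the row stays attached.
# Assumes simple CSV (no quoted fields containing commas).
unique_rows() {
    col=$1
    shift
    awk -F',' -v col="$col" '!seen[$col]++' "$@"
}
```

With the puzzle number in column 1 and the puzzle string in column 2, `unique_rows 2 sheet1.csv sheet2.csv > unique.csv` would keep the first row seen for each puzzle across both exported sheets (the file names are placeholders).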

enxio27 wrote:
Is 65536 the maximum number of lines Excel can handle in a worksheet?
Yes, for all versions prior to Excel 2007. Excel 2007 can have 1,048,576 rows.

enxio27 wrote:
Is the FUNCTIONAL limit smaller?
I do not understand your question!

_________________
Quis custodiet ipsos custodes?
Normal: [D  Y-m-d,  G:i]     PM->email: [D, d M Y H:i:s]

