Finding duplicate file names with different extensions, not duplicate files.


    I just started working on this issue, so rather than pounding my head any longer I thought I'd give my friends here a chance to help a brother out...

    I am in the midst of re-ripping my CD collection, about 400 CDs. I had done them a couple of years ago in .ogg format. Now I am re-ripping them into .flac format. They are organized in this directory structure: /Music/Artist/Album/Songs. The songs are titled with just the track number (in CD order) and the title, like "10 - Some Song.ogg", and the rest of the info is handled by the directory structure and ID tags.

    Here's the quandary:

    With this many files (all of the above plus several thousand more, in various formats depending on their source), I don't want to go through every sub-directory to weed out every duplicate title manually. The normal utilities, like FSlint, will only tag files as duplicates if they have the same size, which these obviously don't.

    At this point, I'm planning on using find to list all the files *.ogg and *.flac and then using kompare to create a non-unique list and then, finally, deleting the *.ogg dupes.

    Anyone have a more elegant solution?


    #2
    You can use the "uniq" command with some fancy find-fu to do it a little more simply. The -c option on uniq counts the number of occurrences of each line in your data, and -d prints only the lines that occur more than once.

    Something along the lines of:
    Code:
    find . -type f -printf '%f\n' | sed 's/\.[^.]*$//' | sort | uniq -c    # or: uniq -d
    Last edited by tnorris; Feb 21, 2012, 07:47 PM.
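To see the difference between the two uniq options, here is a small sketch on made-up names (the three titles are hypothetical, not taken from the collection). uniq only compares adjacent lines, which is why sort comes first.

```shell
# Hypothetical names, extensions already stripped; one title appears twice.
names='01 - Intro
02 - Song
01 - Intro'

# uniq compares adjacent lines only, so sort first.
# -c prefixes every distinct line with its count.
counts=$(printf '%s\n' "$names" | sort | uniq -c)
# -d prints one copy of each line that occurs more than once.
dups=$(printf '%s\n' "$names" | sort | uniq -d)

printf '%s\n' "$counts"
printf '%s\n' "$dups"
```

With -d the output is already the list of names that exist more than once, so no counting or filtering step is needed afterwards.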



      #3
      Thinking out loud...

      * From the top-level folder
      * find -iname '*.ogg' > musicfilelist1
      * find -iname '*.flac' >> musicfilelist1
      * sed something-that-drops-the-extension < musicfilelist1 | sort -u > musicfilelist2
      * for j in $(< musicfilelist2); do rm -rfv $j.ogg; done

      Not very refined, but maybe a start?



        #4
        BTW here's the sed command to remove the last characters
        Code:
        | sed 's/.\{4\}$//'
        I added this to the -iname line like this

        Code:
        find -iname '*.ogg' |sed 's/.\{4\}$//' >list1
        find -iname '*.flac' |sed 's/.\{5\}$//' >>list1
        This produces the list of filenames without the extensions. So now I want to remove the unique names from the list, then remove the dups from the list, then delete the remaining names with .ogg added back on.

        Unfortunately, the sort -u step is not doing what I need. The list it produces still has every filename in it; it only strips out the extra copy of each dup. Thus all the .ogg files would be deleted rather than just the ones that have corresponding flac files.

        Am I making this too complicated? Is there a simpler solution?

        UPDATE:

        I haven't tried the find command yet, but I used uniq with the -d option along with Steve's suggestions, and that produces the correct list. Now I can just put it into a little script and run it - thanks guys!
        Last edited by oshunluvr; Feb 25, 2012, 12:19 PM.
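Putting the pieces from this post together, a minimal sketch of the list-building step might look like the following. It runs in a throwaway directory made with mktemp, and the artist, album, and track names inside it are assumptions for illustration only. Two substitutions are swapped in here: the sed patterns match the extension itself rather than a fixed character count, and -type f restricts find to regular files.

```shell
# Throwaway sandbox; every path below is hypothetical sample data.
work=$(mktemp -d)
mkdir -p "$work/Yes/90125" "$work/Rush/Signals"
touch "$work/Yes/90125/09 - Hearts.ogg" "$work/Yes/90125/09 - Hearts.flac"
touch "$work/Yes/90125/01 - Owner.ogg"             # ogg only: must not be listed
touch "$work/Rush/Signals/01 - Subdivisions.flac"  # flac only: must not be listed
cd "$work" || exit 1

# Strip each extension so a matching ogg/flac pair becomes two identical
# lines, then keep only the lines that occur more than once.
{
  find . -type f -iname '*.ogg'  | sed 's/\.[Oo][Gg][Gg]$//'
  find . -type f -iname '*.flac' | sed 's/\.[Ff][Ll][Aa][Cc]$//'
} | sort | uniq -d > removelist

cat removelist
```

Because paths (not just file names) are compared, two albums that both contain a track called "01 - Intro" are not mistaken for a pair.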



          #5
          Hmmm, the remove command doesn't work because the filenames all have spaces in them.

          For example, for the name

          ./Yes/90125/09 - Hearts


          The "$j.ogg" output is

          ./Yes/90125/09.ogg



            #6
            What about
            Code:
            rm -rfv '$j.ogg'



              #7
              That results in

              Code:
              rm: cannot remove `$j.ogg': No such file or directory
              However, what does work (just got it) is to change the Internal Field Separator to "newline" before running the rm command. So the new command is

              Code:
              IFS=$'\n' && for j in $(< removelist); do rm -rfvi $j.ogg; done
              Thanks again for the help guys
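Note that an IFS assignment written this way stays in effect for the rest of the shell session. An alternative sketch that leaves IFS alone is to read the list one line at a time and quote the variable. The sandbox directory and file names below are made up for the demonstration, and rm is run with -v only so the result is easy to see.

```shell
# Throwaway sandbox with hypothetical sample files.
work=$(mktemp -d)
mkdir -p "$work/Yes/90125"
touch "$work/Yes/90125/09 - Hearts.ogg" "$work/Yes/90125/09 - Hearts.flac"
cd "$work" || exit 1
printf '%s\n' './Yes/90125/09 - Hearts' > removelist

# IFS= applies to this read only and preserves leading/trailing blanks;
# -r stops read from treating backslashes as escapes.
while IFS= read -r j; do
  # Double quotes keep the name as one argument; -- ends option parsing.
  rm -v -- "$j.ogg"
done < removelist
```

Swapping rm for echo on the first run is a cheap way to check the list before anything is deleted.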



                #8
                Originally posted by oshunluvr
                rm: cannot remove `$j.ogg': No such file or directory
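                Yeah...just realized now that it needed double, not single, quotes; single quotes stop the shell from expanding $j at all. (The for loop would still split the names on spaces without your IFS change, though.)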

                Code:
                rm -rfv "$j.ogg"
                Originally posted by oshunluvr
                However, what does work (just got it) is to change the Internal Field Separator to "newline" before running the rm command.
                Neat discovery.

                I love these little shell tricks. So much more capability than DOS...
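For anyone finding this thread later, here is another sketch that skips the intermediate lists: for each .ogg, test whether a .flac with the same base name sits next to it. This is not what was run above, just a variant that assumes both formats live in the same album directory and use lower-case extensions. It only prints the candidates, and the sample tree is made up.

```shell
# Throwaway sandbox with hypothetical sample files.
work=$(mktemp -d)
mkdir -p "$work/Yes/90125"
touch "$work/Yes/90125/09 - Hearts.ogg" "$work/Yes/90125/09 - Hearts.flac"
touch "$work/Yes/90125/08 - City of Love.ogg"   # no flac yet: must be kept
cd "$work" || exit 1

# For every .ogg, ${f%.ogg} drops the suffix; print the file only if a
# .flac with the same path and base name exists.
candidates=$(find . -type f -name '*.ogg' -exec sh -c '
  for f do
    if [ -e "${f%.ogg}.flac" ]; then printf "%s\n" "$f"; fi
  done
' sh {} +)

printf '%s\n' "$candidates"
```

Passing the file names to sh as arguments, rather than pasting {} into the script text, keeps names with spaces or quotes intact.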
