Photo sorting and finding duplicates - and logical photo storage...

    I used a couple of new (to me) command line tools this week, so I thought I'd introduce them. Feel free to skip ahead to THE TOOLS: if you don't care about the whys or hows.

    BACKGROUND STORY:

    I have a family server and have had for many years. We, the whole family, have been basically just dumping digital photos onto it with not much regard to any sort of organization or duplication. My wife typically copies her photos to her laptop, which I back up to the server at regular intervals. I, before each new outing or family event, clear the desired cameras' photo cards onto the server. My sons will occasionally add their stuff as well, and we often borrowed the memory cards from friends and family members to capture all their shots of events and get-togethers.

    Last year I began the long sloooowwww process of scanning ancient family slides from 1960 to about 1977. There are thousands of them and it's tough to do more than a handful in a day.

    Bottom line: I had built up to 68GB of photos and videos, and viewing them in an enjoyable way was almost impossible. What's the point of even taking photos if you can't retrieve them conveniently for viewing when your Mom comes over?

    FIRST ATTEMPT AT ORGANIZATION:

    Initially, I believed separation was king. I divided all the photos into years and months by their file dates. Made sense at the time. However, during viewing sessions, I would think "Did we go to Paris in July or August of 2008? Was the Alaska cruise 2007 or 2009?" This also left several photos languishing in folders all their own. A single photo taken in May ended up nested three-deep in the file structure - a needless state.

    Clearly, year and month was insufficient. Add to that the fact that a couple of the cameras were not properly date stamped, or the move and copy processes had re-dated the files. Slides have no date at all except the stamp on the cardboard frame showing the month and year they were developed, and scans carry the date of the scan, not the date of the source photo. Plus - many photos were not of the "family" variety, but served utilitarian purposes: photos of cars or house designs we like, objects for later use in decorating, etc.

    In short: I had a big ol' mess.

    THE ORGANIZATION REVISION:
    I settled on keeping the year as the primary sort criterion. I then judged major events as the secondary sort - vacations, weddings, etc. I figured remembering the year we went to Alaska was easy enough to deduce, but the month not. Non-family type photos got their own primary folders: properties, cars, humorous, projects, etc. The remaining day-to-day family snaps were left all bundled together in the primary (year) folder. Additionally, I removed all the videos for later stitching together, editing, and re-sorting (this was 25GB of the total).

    This left me with this basic directory tree:

    /shared/Pictures
        /2007
            /Alaska
            /Thanksgiving

    This folder arrangement seems much more useful. I can jump right to our Paris trip or the Alaska cruise with little delay and re-playing the entire year of 2004 when the baby first came home is easy too.

    This re-alignment also revealed thousands of duplicates that had been double-stored - half of them in the wrong folder or left in a camera folder and not sorted at all - but it also revealed the mis-dating of so many photos. Those dupes were easily discovered if they hadn't been renamed or re-dated. Another issue is that each camera uses its own file naming system. A year might be spread out into six different groups of photos or more - not a very useful viewing order.

    A re-naming scheme was also sorely needed. One that made sense and aided organization. The actual name of a photo isn't as important as having it in the right place and sequence. The photo slide-show software doesn't care what the file name is, just the order.

    NEW NAMES:
    First: the re-naming. The only naming scheme that made sense to me was a date/time label. This would serve to keep the photos in their proper year folder and in the proper sequence in that folder. Here's an example of what I wanted:

    070812_132244.jpg

    It looks totally random until you decipher it. The first segment is YEAR/MONTH/DAY and the second is HOUR/MINUTE/SECOND. So this photo was taken August 12th, 2007 at 1:22 and 44 seconds pm. I don't care that it's not a descriptive name. It is a very useful name for my needs. In theory, this sorts and orders all the photos exactly how I want regardless of source (assuming of course the clocks in the cameras are reasonably correct).
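To make the naming scheme concrete, here is a minimal sketch in plain shell. It assumes the timestamp arrives in the usual EXIF text form (YYYY:MM:DD HH:MM:SS). The photo_name helper is hypothetical and only illustrates the mapping; it is not how exiftool does the job internally.

```shell
# Hypothetical helper: turn an EXIF-style timestamp into a YYMMDD_HHMMSS.jpg name.
photo_name() {
    # $1 is a timestamp such as "2007:08:12 13:22:44"
    date_part=$(printf '%s' "${1%% *}" | tr -d ':' | cut -c3-)   # date, colons removed, century dropped
    time_part=$(printf '%s' "${1##* }" | tr -d ':')              # time, colons removed
    printf '%s_%s.jpg\n' "$date_part" "$time_part"
}

photo_name '2007:08:12 13:22:44'   # prints 070812_132244.jpg
```

Because the fields run from year down to second, a plain alphabetical sort of these names is also a chronological sort (within one century, since the year is only two digits).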

    Here's the cool part: virtually all digital cameras include this information with the jpg in a data segment called exif data. But how to get that data to replace the file name without typing it in? I have thousands of photos!

    CULL THE DUPLICATES:
    The re-naming and changes to the folder scheme revealed about 2000 dupes! Could there be more? Probably.

    THE TOOLS:

    exiftool

    This tool is extremely detailed and I won't go into anything here other than what I'm using it for. I recommend you do your own research to get an inkling of the hundreds of things this tool can do. Start here as I did. Credit for the info below goes to this webpage.

    To rename my photos, the command I used was:

    exiftool '-filename<CreateDate' -d %y%m%d_%H%M%S%%-c.%%le -r -ext jpg /shared/Pictures


    Here's the breakdown of each element:
    • '-filename<CreateDate' means rename the image file using the image's creation date and time.
    • -d means "Set format for date/time values".
    • %y%m%d_ means the first part of the new file name should be composed of the last two digits of the creation-date year, followed by the month and day, both represented by two digits. The underscore _ means put in an underscore after the date part of the file name.
    • %H%M%S means add the hour, minute, and second of the creation time, all represented by two digits.
    • %%-c means that if two images would end up with the same file name, exiftool automatically adds an incremented copy number to give each image a unique name. The doubled %% is required because the whole string passes through the -d date formatter first: %% comes out as a single literal %, leaving %-c for exiftool to act on. The "-" before the "c" isn't really necessary, but it puts a dash before the copy number.
    • .%%le means keep the original file name extension, but make it lower-case if it was originally upper-case, a nice option when cameras insist on using "JPG" instead of "jpg". (If you prefer upper-case extensions, use .%%ue. If you prefer to keep the original case intact, use .%%e.)
    • -ext jpg means only rename files with the "jpg" extension. To rename all image files in the source folder, don't specify any extensions. To include other extensions, add more -ext switches, one -ext for each extension.
    • -r means recurse through all sub-directories below the target folder.
    • /shared/Pictures is the absolute path to the top folder holding all my images to be renamed. Use your own path, of course.
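Before renaming thousands of files for real, a dry run is worth considering. The sketch below swaps the FileName target for exiftool's TestName pseudo-tag, which, as I understand it, reports the names the rename would produce without touching any file. The preview_rename wrapper is just a hypothetical convenience; the options inside it are the same ones broken down above.

```shell
# Hypothetical wrapper: preview the rename without changing anything.
# TestName evaluates the new file names but does not rename the files.
preview_rename() {
    # $1 = top folder to scan, e.g. /shared/Pictures
    exiftool '-testname<CreateDate' -d '%y%m%d_%H%M%S%%-c.%%le' -r -ext jpg "$1"
}

# Usage (uncomment and point it at your own folder):
# preview_rename /shared/Pictures
```

If the preview looks right, running the original command with -filename in place of -testname performs the actual rename.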


    Unless the jpg has missing or malformed exif data, this command will do the renaming. In my case, this renamed about 80% of all my jpgs, uncovered 2000ish dupes, and significantly reduced the remaining work ahead. Eventually, I will be renaming the remaining scans and photos manually. exiftool will also allow you to reset the exif date/time, or even shift it by a fixed amount, so the date/time stamps are correct.
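For a camera whose clock was off by a known amount, the shifting feature can be sketched as below. This assumes exiftool's AllDates shortcut (DateTimeOriginal, CreateDate and ModifyDate together) and its += shift syntax; shift_dates is a hypothetical wrapper name, not an exiftool command.

```shell
# Hypothetical wrapper: move the exif dates forward by a fixed amount.
shift_dates() {
    # $1 = shift as H:MM (e.g. 1:30 for one and a half hours)
    # $2 = file or folder to process
    # -P keeps the file system modification date/time as it was.
    exiftool "-AllDates+=$1" -P "$2"
}

# Usage (uncomment and adjust):
# shift_dates 1:30 /shared/Pictures/2007/Alaska
```

By default exiftool keeps a copy of each changed file with _original appended to the name, so a wrong shift can be undone. Using -= instead of += moves the dates backward.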

    findimagedupes

    This command has only a tenth of the options exiftool has but performs an amazing amount of work. Basically, it resizes, blurs, resizes again, and then compares (all in tmp or memory - no files are harmed) all the given images to one another and then makes a guess as to which ones are duplicates. As you can imagine, this takes some time. I let it run on the entire Pictures folder while I was at work, dumping the results into a text file for later use. I have no idea how long this took, but 10 hours later when I returned it was done:

    findimagedupes -R -- . >alldupes.txt


    I ran this while in my /shared/Pictures folder so all my photos were compared to each other. I then dumped the result file alldupes.txt (1000+ lines) into a spreadsheet so I could sort the results easily. I am now manually comparing the list to remove every last duplicate. The downside of this command is that similar photos are matched as dupes. Based on my file naming scheme, most of these erroneous dupe reports are easily discarded and again I have saved myself days and weeks of manual file comparison.
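The spreadsheet step could also be roughed out on the command line. This is only a sketch under two assumptions: that alldupes.txt holds one duplicate set per line with paths separated by spaces, and that no path contains a space. The likely_true_dupes function is hypothetical. It keeps only the sets whose files all share the same date/time stem once any "-N" copy number and the extension are stripped, which is where the naming scheme helps separate real dupes from merely similar photos.

```shell
# Hypothetical filter: print only the duplicate sets whose file names share one timestamp stem.
likely_true_dupes() {
    # $1 = findimagedupes result file (one set per line, space-separated paths)
    awk '{
        same = 1
        for (i = 1; i <= NF; i++) {
            stem = $i
            sub(/.*\//, "", stem)                 # drop the directory part
            sub(/(-[0-9]+)?\.[^.]*$/, "", stem)   # drop "-N" copy number and extension
            if (i == 1) first = stem
            else if (stem != first) same = 0
        }
        if (NF > 1 && same) print
    }' "$1"
}

# Usage (uncomment after findimagedupes has written its results):
# likely_true_dupes alldupes.txt > same_timestamp.txt
```

Sets that fail this filter are not necessarily wrong matches, so they still deserve a manual look before anything is deleted.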

    THE RESULTS (SO FAR):

    Now I have about 99% of the dupes deleted and about 80% of the photos sensibly sorted and in order. Already, we can sit at the living room TV and enjoy a slide show, and I recovered over 20GB of drive space. I still have more to do - many files had no or bad exif data - but it's on the order of days' worth of work rather than months' or years' worth.

    Good luck with your family photo project! Please post solutions you used if you have had similar problems. I'd love to compare results.
    Last edited by oshunluvr; May 23, 2013, 11:31 AM.


    #2
    Thanks oshunluvr for this. It sure has been a lot of work for you to do that!

    I'm in the process of scanning heaps of old photos my Dad took over the years going back to about 1937. He was an avid photographer and it's going to take quite a while. My brother and I have decided to scan all our old photos into digital format for posterity also. So I was very interested in your solutions and may employ some of them myself. I also have been using the excellent exiftool and can vouch for its powerful array of options for manipulating photos.
    Desktop PC: Intel Core-i5-4670 3.40Ghz, 16Gb Crucial ram, Asus H97-Plus MB, 128Gb Crucial SSD + 2Tb Seagate Barracuda 7200.14 HDD running Kubuntu 18.04 LTS and Kubuntu 14.04 LTS (on SSD).
    Laptop: HP EliteBook 8460p Core-i5-2540M, 4Gb ram, Transcend 120Gb SSD, currently running Deepin 15.8 and Manjaro KDE 18.



      #3
      Thanks, oshunluvr. There must be millions of us facing this task.
      Not very Kubuntu, but very KFN.

      Regards, John Little


        #4
        Thanks oshunluvr. This is exactly what I was looking for.

        We have been struggling to keep our photos organized. The digital photos are stored by year and month. The real challenge for me is the 2500 or so slides that I digitized. Not all of them were clearly marked, so I have approximate dates. In some instances I only have the date the slides were developed. The really old prints date back to the 1920s. At least the old prints were in a photo album that had dates.


          #5
          This worked beautifully with photos from digital cameras and phones. I modified one part of the renaming command. I used "Y" instead of "y" in %y%m%d_. I wanted the image filename to start with a 4 digit year instead of the last two digits of the year. Doing this causes the photos to sort chronologically when I view the thumbnails in a file browser.

          Code:
          exiftool '-filename<CreateDate' -d %Y%m%d_%H%M%S%%-c.%%le -r -ext jpg /home/life0riley/Pictures/sample_test_pics
          I discovered some scanned images that were missing ModifyDate, DateTimeOriginal, and CreateDate.

          Code:
          life0riley@kubuntu1204:~$ exiftool '-filename<CreateDate' -d %Y%m%d_%H%M%S%%-c.%%le -r -ext jpg /home/life0riley/Pictures/"Old Cell Phone Pics"
          Warning: No writable tags found - /home/life0riley/Pictures/Old Cell Phone Pics/1004090921.jpg
          Warning: No writable tags found - /home/life0riley/Pictures/Old Cell Phone Pics/1030091440.jpg
              5 directories scanned
            353 image files updated
              2 image files unchanged
          life0riley@kubuntu1204:~$
          I updated them using the following example. There were only two, so I updated them one at a time.

          Code:
          exiftool -AllDates='2009:06:01 08:35:00' -overwrite_original -P /home/life0riley/Pictures/"Old Cell Phone Pics"/1004090921.jpg
          The "-P" preserves the file modification date/time. After this I ran the rename command again.

          The scanned images will be my biggest challenge. I'll have to update them with the CreateDate. It will take some time to get through them.

          Thanks oshunluvr for posting this and providing the link.


            #6
            I'm glad you guys found this useful. Next will be family video editing!



              #7
              life0riley: I too have the old slides and dating issue - although not quite as old!

              I initially dated them as stamped. My plan is to eventually examine them along with ones missing data and try to get them in order, which is all I really need.
