I have a problem: I’ve put every photo I’ve taken over the past ten years into Lightroom. To make sure I didn’t miss any, I imported everything from my server, everything from my backups, and so on. In short: duplicates galore. I’ve sorted by filename and removed duplicates, and I’ve sorted by capture time and removed duplicates. That took me from 88k to 74k photos, but I still have duplicates. Some have different times because they were posted in my galleries. Some have different sizes because they are thumbnails. Some have different names because they were exported and sent by mail. And some are odder still, or perhaps a combination of all of these. With 74k photos, finding duplicates and deciding which one to keep is hard!

I have made a solution: a little Python script that goes through the Lightroom database, generates a 9×9 thumbnail of each photo, and compares it to the thumbnails of all the other photos. It takes a couple of hours to run on my 74k photos, and it helped me clean out 2k duplicates with only two false positives. That’s not bad!
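To give an idea of what a thumbnail comparison like this can look like, here is a minimal sketch. It is not the script itself. It assumes an image is already available as rows of grayscale values (0 to 255), averages it down to a 9×9 grid, and turns each cell into one digit. The function names, the `LEVELS` setting and the checkerboard test image are made up for illustration.

```python
# Sketch only: an "image" here is a list of rows of grayscale values (0-255).
# Reading real photos from the Lightroom catalog is deliberately left out.

GRID = 9      # thumbnail is GRID x GRID cells, as in the post
LEVELS = 4    # assumption: number of brightness buckets per cell


def thumbnail_signature(pixels, grid=GRID, levels=LEVELS):
    """Shrink a grayscale image to grid x grid cells and encode it as a string."""
    height, width = len(pixels), len(pixels[0])
    if height < grid or width < grid:
        raise ValueError("image is smaller than the thumbnail grid")
    chars = []
    for gy in range(grid):
        # Rows of the source image covered by this thumbnail row.
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            # Columns of the source image covered by this thumbnail cell.
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            mean = sum(block) / len(block)
            # Quantise the mean brightness of the cell to a single digit.
            chars.append(str(min(levels - 1, int(mean * levels / 256))))
    return "".join(chars)


def checkerboard(size, dark=40, light=200, grid=GRID):
    """Hypothetical test image: a grid x grid checkerboard of dark and light tiles."""
    return [
        [light if (x * grid // size + y * grid // size) % 2 == 0 else dark
         for x in range(size)]
        for y in range(size)
    ]


original = checkerboard(90)
# The same picture, slightly brightened (as after a small correction).
brighter = [[value + 10 for value in row] for row in original]
# A different picture: uniformly mid-grey.
flat = [[128] * 90 for _ in range(90)]

sig_original = thumbnail_signature(original)
sig_brighter = thumbnail_signature(brighter)
sig_flat = thumbnail_signature(flat)
```

Two photos count as duplicate candidates when their signature strings are equal. Fewer brightness buckets make the comparison more forgiving, at the price of more false positives.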

It sets each photo’s label to a string representing its thumbnail. You then sort your grid view by label and voilà, you’re good to go. Since it looks at the original files rather than Lightroom’s adjusted versions, it will find duplicates even if they have been corrected or worked on afterwards:



Another example: if you’ve changed the colours around a bit, it will still find the duplicate:
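The label-and-sort idea can also be sketched outside Lightroom. The snippet below assumes the labels have already been computed and are available as a plain dictionary. The filenames and label strings are invented, and how the script writes labels into the catalog is not shown.

```python
from collections import defaultdict


def group_by_label(labels):
    """Return {label: [filenames]} for every label shared by two or more photos."""
    groups = defaultdict(list)
    for filename, label in labels.items():
        groups[label].append(filename)
    # Sort by label, like sorting the grid view, and drop photos with a unique label.
    return {
        label: sorted(names)
        for label, names in sorted(groups.items())
        if len(names) > 1
    }


# Hypothetical labels; real ones would be much longer thumbnail strings.
labels = {
    "IMG_0001.jpg": "3030",
    "gallery/beach.jpg": "3030",
    "IMG_0002.jpg": "1122",
    "mail/export-17.jpg": "1122",
    "IMG_0003.jpg": "2222",
}

candidates = group_by_label(labels)
for label, names in candidates.items():
    print(label, names)
```

Each group is only a list of candidates. Deciding which copy to keep is still a manual step, just as in the grid view.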


So that’s all very nice. :-) Here’s the source code, under a BSD license. Put it in your Lightroom directory together with Lightroom.lrdata. And oh, by the way, if it blows up anything at all, do tell me, but don’t hold me responsible. This hasn’t been tested much. But if you read the code, I think you’ll find it can do only very limited harm.

Challenges ahead:

  • RAW files not supported yet
  • Doesn’t cope too well with copies in different sizes; must find a better solution
  • Faster run times?

I hope it is as useful to you as it was to me.

  8 Responses to “Finding more Lightroom duplicates”

  1. I find Duplicate Finder easier to use! It finds duplicate files based on a byte-by-byte search engine.

    get it here : http://www.ashisoft.com

  2. niklas

    Thanks, but no thanks. I’m looking for duplicates that are visually duplicates but not byte-for-byte duplicates.

  3. Sorry if i sound kind of dumb but how do you run the script?

  4. niklas

    Ron, you can download it and run it through the Python interpreter from the terminal on a Mac or from the command prompt on Windows.

  5. Thanks for tackling something I really wish Lightroom could do by default. I installed Python 3.01 and placed your LabelRoom.py script in the directory as you stated. This is the error I get:

    C:\MyLightRoomDataPath>python LabelRoom.py
    File "LabelRoom.py", line 28
    print "Failed " + filename
    ^
    SyntaxError: invalid syntax

    Sorry, but I’m not even vaguely familiar with Python. Any pointers?

  8. I found dupeGuru to work great for me. It does not work within Lightroom, but you can just point it at the folder with the high-res images and it will find the duplicates. It uses a visual recognition system. It is a bit slow but works great for me. When you are done identifying the photos, you can have dupeGuru move them to the trash. Then, in Lightroom, go to your main folder of high-res images, right-click and select “Synchronize Folder”, and then you can remove those images from your Lightroom Library.

    http://www.hardcoded.net/dupeguru_pe/

© 2012 Niklas Saers' blog Suffusion theme by Sayontan Sinha