I’d consider most of my posts to be intellectually interesting but not very useful. This post is going to be the opposite.

I hate duplicate photos. I’m not sure why, probably because all good programmers are at least a little OCD and I’m a programmer (logical flaw intended).

Now that my kids are growing up, I care a lot more about my photo library than I have in the past. For example, I’ve figured out how to rate photos. But organizing and rating your photos is annoying when you have duplicates. The possibility of duplicates has also made me hesitant to import large numbers of photos, since I’m often not sure if I’ve imported some of them before.

Well, fear no more. I present to you a pair of extremely useful Python libraries[1] that make programmatically interacting with your Photos library a breeze: osxphotos and photoscript.

Why two?

  • osxphotos is extremely full-featured but is basically “read only”. It can inspect almost any part of your photo library, but it can’t modify it.
  • photoscript is less full-featured but, crucially, it can modify your photo library.

Combined, they make a very powerful tool. Without further ado, here is a simple Python script that finds all your duplicate photos (I limited my search to photos contained in the ‘originals’ folder) and places them in an album called “Duplicates”. You need to create that album manually before running the script.

Importantly, this script will not delete any photos. It just places them in a Duplicates album for you to remove manually. To delete photos efficiently from within an album, it’s worth knowing about this one neat trick[2].

import hashlib
import os.path
from collections import defaultdict

import osxphotos
import photoscript

db = os.path.expanduser("~/Pictures/Photos Library.photoslibrary")
photosdb = osxphotos.PhotosDB(db)
# only consider photos whose file lives in the library's 'originals' folder
originals = [p for p in photosdb.photos() if p.path and '/originals/' in p.path]

# this computes a hash of your photo's file such that if two hashes are equal
# then the files are byte-for-byte identical (barring a vanishingly unlikely MD5 collision)
def md5sum(photo):
    with open(photo.path, 'rb') as f:
        return hashlib.md5(f.read()).hexdigest()

# this part takes a while.  Compute all the md5sum hashes and group photos by hash.
by_md5sum = defaultdict(list)
for p in originals:
    by_md5sum[md5sum(p)].append(p)

# every photo in a group of two or more identical files counts as a duplicate,
# so the album will contain all copies, including the one you want to keep
duplicates = [p for group in by_md5sum.values() if len(group) > 1 for p in group]
print(f"found {len(duplicates)} duplicates")

# here is where we start using photoscript
duplicates_album = photoscript.PhotosLibrary().album("Duplicates")

if duplicates_album is not None:
    # here we use the uuid from osxphotos' object to find the photo using photoscript
    duplicate_photos = [photoscript.Photo(p.uuid) for p in duplicates]
    duplicates_album.add(duplicate_photos)
else:
    print("You need to create the Duplicates album first")

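The hashing step in the script above reads every file whole, which is why it takes a while and why big videos can eat memory. Here is a rough sketch of one way to speed it up, using only the standard library: hash in chunks, and skip hashing entirely for files whose size is unique (two files of different sizes can't be identical). The names file_digest, find_duplicate_groups and chunk_size are my own for illustration, not part of osxphotos or photoscript, and the functions work on plain file paths rather than photo objects.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it chunk_size bytes at a time."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def find_duplicate_groups(paths):
    """Return a list of groups; each group is a list of paths with identical contents."""
    # Files with different sizes cannot be identical, so bucket by size first.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    by_key = defaultdict(list)
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue  # unique size: no need to hash this file at all
        for path in same_size:
            by_key[(size, file_digest(path))].append(path)

    # Only groups with two or more members are duplicates.
    return [group for group in by_key.values() if len(group) > 1]
```

To plug this into the script, you would presumably pass in [p.path for p in originals] and keep a dict from path back to the osxphotos photo so you can still look up each uuid. If you only want the extra copies in the album, taking group[1:] from each group leaves one copy of every photo out of it.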

  1. Many thanks to Rhet Turnbull for not only creating and open-sourcing these libraries but also being extremely helpful and gracious when I asked him questions via email.

  2. Hold down the command key after right clicking on a photo (or set of photos) and “remove from album” will become “delete”.