morphic.dupfinder

Duplicate image and video detection via perceptual hashing with optional GPU acceleration.

Accelerator

GPU/CPU backend selection and batch operations.

GPU Accelerator Module

Provides GPU-accelerated operations for image and video processing, falling back automatically in this order: CUDA -> AMD/ROCm -> OpenCL -> CPU multiprocessing.

Accelerates:
  1. Image resizing/preprocessing

  2. DCT computation for perceptual hashing

  3. Hamming distance computation for the similarity matrix
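
The fallback order can be pictured as a loop over candidate backends that stops at the first one whose probe succeeds. The sketch below is illustrative only: the `Backend` enum mirrors AcceleratorType, while `select_backend` and its probe callables are hypothetical stand-ins, not the module's actual detection code.

```python
from enum import Enum
from typing import Callable


class Backend(Enum):
    """Hypothetical mirror of AcceleratorType, in priority order."""
    CUDA = 1
    ROCM = 2
    OPENCL = 3
    CPU = 4


def select_backend(probes: dict[Backend, Callable[[], bool]]) -> Backend:
    """Return the first backend, in priority order, whose probe succeeds."""
    for backend in (Backend.CUDA, Backend.ROCM, Backend.OPENCL):
        probe = probes.get(backend)
        try:
            if probe is not None and probe():
                return backend
        except Exception:
            # A failing probe (missing driver, import error) means "not available".
            continue
    return Backend.CPU  # CPU multiprocessing is the last resort


def _no_cuda() -> bool:
    raise ImportError("no CUDA runtime")


# CUDA probe raises, ROCm has no probe, OpenCL reports success.
chosen = select_backend({Backend.CUDA: _no_cuda, Backend.OPENCL: lambda: True})
print(chosen.name)
```

Treating a probe exception the same as a negative answer keeps one broken driver from blocking the backends further down the chain.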

class morphic.dupfinder.accelerator.AcceleratorType(value)

Bases: Enum

Available acceleration backends.

CUDA = 1
ROCM = 2
OPENCL = 3
CPU = 4

class morphic.dupfinder.accelerator.GPUAccelerator

Bases: object

GPU-accelerated operations with automatic backend detection and fallback.

Priority: CUDA -> ROCm -> OpenCL -> CPU multiprocessing

Return type:

GPUAccelerator

backend: AcceleratorType

property is_gpu_available: bool

Check if any GPU acceleration is available.

get_backend_name()

Get human-readable backend name.

Return type:

str

resize_image_batch(images, target_size)

Resize a batch of images using the best available backend.

Parameters:
  • images

  • target_size

Return type:

list[ndarray]

compute_dct_batch(images)

Compute DCT for a batch of images (used in pHash).

Parameters:

images (list[ndarray])

Return type:

list[ndarray]
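
For readers unfamiliar with the transform behind pHash, here is a minimal pure-Python sketch of a separable 2-D DCT-II. It is not the accelerator's implementation (which operates on ndarray batches); `dct_1d` and `dct_2d` are hypothetical helper names, and the unnormalised form of the transform is an assumption made for brevity.

```python
import math


def dct_1d(values: list[float]) -> list[float]:
    """Unnormalised DCT-II of a 1-D sequence."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(image: list[list[float]]) -> list[list[float]]:
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in image]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major order


# A uniform image puts all of its energy into the DC coefficient (top-left).
flat = [[5.0] * 4 for _ in range(4)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6))
```

Perceptual hashing keeps only the low-frequency corner of this coefficient grid, which is why small changes such as recompression tend to leave the hash mostly intact.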

compute_similarity_matrix(hashes, threshold=0.0)

Compute pairwise similarity matrix for hash arrays.

Parameters:
  • hashes

  • threshold

Return type:

ndarray

batch_hamming_distance(hashes1, hashes2)

Compute Hamming distances between two lists of hex hash strings.

Parameters:
  • hashes1 (list[str])

  • hashes2 (list[str])

Return type:

ndarray

morphic.dupfinder.accelerator.get_accelerator()

Get the global GPU accelerator instance.

Return type:

GPUAccelerator

morphic.dupfinder.accelerator.compute_phash_gpu(images, hash_size=8)

Compute perceptual hashes for images using GPU acceleration.

Parameters:
  • images

  • hash_size

Return type:

list[ndarray]
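
A common way to turn DCT coefficients into a perceptual hash is to keep the low-frequency hash_size x hash_size corner and threshold it against its median. The sketch below follows that convention; whether compute_phash_gpu uses the median, the mean, or excludes the DC term is an assumption, and `phash_bits` is a hypothetical helper that works on nested lists instead of ndarrays.

```python
from statistics import median


def phash_bits(dct_coeffs: list[list[float]], hash_size: int = 8) -> list[int]:
    """Threshold the low-frequency DCT block against its median."""
    block = [row[:hash_size] for row in dct_coeffs[:hash_size]]
    flat = [c for row in block for c in row]
    if len(flat) != hash_size * hash_size:
        raise ValueError("need at least hash_size x hash_size coefficients")
    mid = median(flat)
    return [1 if c > mid else 0 for c in flat]


# Toy 4x4 coefficient grid; only the top-left 2x2 block is used.
coeffs = [
    [9.0, 1.0, 0.1, 0.0],
    [4.0, 2.0, 0.2, 0.0],
    [0.3, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
bits = phash_bits(coeffs, hash_size=2)
print(bits)  # [1, 0, 1, 0]
```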

morphic.dupfinder.accelerator.compute_similarity_matrix_gpu(hashes, hash_size=16)

Compute pairwise similarity matrix for hashes using GPU.

Parameters:
  • hashes

  • hash_size

Return type:

ndarray

Images

Image duplicate detection.

Image Duplicate Finder module.

Detects duplicate images based on content similarity using perceptual hashing.

class morphic.dupfinder.images.ImageInfo(path, width=0, height=0, file_size=0, format='', mode='', phash=None, ahash=None, dhash=None, whash=None)

Bases: object

Stores information about an image file.

Parameters:
  • path (str)

  • width (int)

  • height (int)

  • file_size (int)

  • format (str)

  • mode (str)

  • phash (str | None)

  • ahash (str | None)

  • dhash (str | None)

  • whash (str | None)

path: str
width: int = 0
height: int = 0
file_size: int = 0
format: str = ''
mode: str = ''
phash: str | None = None
ahash: str | None = None
dhash: str | None = None
whash: str | None = None

to_dict()

Convert to dictionary for JSON serialization.

Return type:

dict
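
The field list above suggests a dataclass, although this page does not confirm it. Under that assumption, a to_dict() of this kind can be sketched with `dataclasses.asdict`. `ImageRecord` is a hypothetical stand-in carrying only a few of ImageInfo's documented fields.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ImageRecord:
    """Hypothetical stand-in with a few of ImageInfo's documented fields."""
    path: str
    width: int = 0
    height: int = 0
    file_size: int = 0
    phash: Optional[str] = None

    def to_dict(self) -> dict:
        # asdict() yields plain built-in types, so json.dumps() can handle the result.
        return asdict(self)


record = ImageRecord(path="photos/cat.jpg", width=640, height=480, phash="ff00")
payload = json.dumps(record.to_dict(), sort_keys=True)
print(payload)
```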

class morphic.dupfinder.images.ImageHasher(hash_size=16)

Bases: object

Handles image loading and perceptual hashing.

Parameters:

hash_size (int)

compute_hashes(image_path)

Compute perceptual hashes for an image.

Parameters:

image_path (str)

Return type:

ImageInfo

class morphic.dupfinder.images.ImageDuplicateFinder(similarity_threshold=0.9, hash_size=16, num_workers=4, hash_type='combined', use_gpu=True, batch_size=1000)

Bases: object

Finds duplicate images based on perceptual hash similarity.

Parameters:
  • similarity_threshold (float)

  • hash_size (int)

  • num_workers (int)

  • hash_type (str)

  • use_gpu (bool)

  • batch_size (int)

image_infos: dict[str, ImageInfo]

find_images(folder)

Find all image files in a folder recursively.

Parameters:

folder (str)

Return type:

list[str]
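
A recursive search of this kind can be sketched with `pathlib`. The extension set below is hypothetical, since the formats the real finder accepts are not listed on this page, and `find_image_files` is an illustrative name rather than the library's method.

```python
import tempfile
from pathlib import Path

# Hypothetical extension set; the real list of supported formats is not documented here.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def find_image_files(folder: str) -> list[str]:
    """Recursively collect files under folder whose suffix looks like an image."""
    root = Path(folder)
    return sorted(
        str(p) for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


# Demonstrate on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.JPG").touch()
    (Path(tmp) / "sub" / "b.png").touch()
    (Path(tmp) / "notes.txt").touch()
    found = [Path(p).name for p in find_image_files(tmp)]

print(found)  # ['a.JPG', 'b.png']
```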

process_images(image_files)

Process all images and compute their hashes.

Parameters:

image_files (list[str])

Return type:

dict[str, ImageInfo]

compute_similarity(info1, info2)

Compute similarity between two images based on their hashes.

Parameters:
  • info1 (ImageInfo)

  • info2 (ImageInfo)

Return type:

float

find_duplicates_fast()

Find groups of duplicate images using hash bucketing.

Return type:

list[list[tuple[str, float]]]
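
Hash bucketing avoids comparing every image with every other image: items are first grouped by a cheap key, and only items in the same bucket are compared in detail. The sketch below uses a hash prefix as the key, which is an assumption; the bucketing key used by find_duplicates_fast is not documented here, and `bucket_candidates` is a hypothetical helper.

```python
from collections import defaultdict
from itertools import combinations


def bucket_candidates(hashes: dict[str, str], prefix_len: int = 2) -> list[tuple[str, str]]:
    """Return candidate pairs of paths whose hex hashes share a prefix."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for path, hex_hash in hashes.items():
        buckets[hex_hash[:prefix_len]].append(path)

    pairs: list[tuple[str, str]] = []
    for paths in buckets.values():
        # Only items that landed in the same bucket are paired up.
        pairs.extend(combinations(sorted(paths), 2))
    return sorted(pairs)


candidates = bucket_candidates({
    "a.jpg": "ff00aa11",
    "b.jpg": "ff00aa13",
    "c.jpg": "12345678",
})
print(candidates)  # [('a.jpg', 'b.jpg')]
```

The trade-off of a prefix key is that two near-duplicates whose hashes differ inside the prefix never meet, so a bucketed search can miss pairs that an exhaustive comparison would find.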

find_duplicates()

Find groups of duplicate images.

Return type:

list[list[tuple[str, float]]]
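
The three methods above suggest a find, process, group sequence. The sketch below wires them together using only the documented method names. That process_images must run before find_duplicates is an assumption based on the signatures, and `StubFinder` is a hypothetical stand-in so the example runs without morphic installed; with the real package one would pass an ImageDuplicateFinder instance instead.

```python
def scan_for_duplicates(finder, folder: str):
    """Run the find -> process -> group sequence on a finder-like object."""
    files = finder.find_images(folder)  # list[str] of image paths
    finder.process_images(files)        # computes hashes for every file
    return finder.find_duplicates()     # groups of (path, float) tuples


class StubFinder:
    """Stand-in exposing the same three method names as the documented class."""

    def __init__(self):
        self.seen = []

    def find_images(self, folder):
        return [f"{folder}/a.jpg", f"{folder}/b.jpg"]

    def process_images(self, image_files):
        self.seen = list(image_files)
        return {}

    def find_duplicates(self):
        return [[(path, 1.0) for path in self.seen]]


groups = scan_for_duplicates(StubFinder(), "photos")
print(groups)
```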

Videos

Video duplicate detection.

Video Duplicate Finder module.

Detects duplicate videos based on content similarity using perceptual hashing of extracted frames.

class morphic.dupfinder.videos.VideoInfo(path, duration=0.0, fps=0.0, frame_count=0, width=0, height=0, file_size=0, frame_hashes=<factory>, average_hash=None)

Bases: object

Stores information about a video file.

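Parameters:
  • path (str)

  • duration (float)

  • fps (float)

  • frame_count (int)

  • width (int)

  • height (int)

  • file_size (int)

  • frame_hashes (list[str])

  • average_hash (str | None)

path: str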
duration: float = 0.0
fps: float = 0.0
frame_count: int = 0
width: int = 0
height: int = 0
file_size: int = 0
frame_hashes: list[str]
average_hash: str | None = None

to_dict()

Convert to dictionary for JSON serialization.

Return type:

dict

class morphic.dupfinder.videos.VideoHasher(num_frames=10, hash_size=16)

Bases: object

Handles video frame extraction and perceptual hashing.

Parameters:
  • num_frames (int)

  • hash_size (int)

extract_frames(video_path)

Extract frames from a video at regular intervals.

Parameters:

video_path (str)

Return type:

tuple[list[ndarray], VideoInfo]
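
Sampling "at regular intervals" comes down to choosing evenly spaced frame indices before any decoding happens. The sketch below shows one way to pick them. `sample_frame_indices` is a hypothetical helper, and sampling the midpoint of each segment is an assumption about where within an interval a frame is taken.

```python
def sample_frame_indices(frame_count: int, num_frames: int = 10) -> list[int]:
    """Pick up to num_frames indices spread evenly over a video's frames."""
    if frame_count <= 0 or num_frames <= 0:
        return []
    count = min(num_frames, frame_count)
    step = frame_count / count
    # Take the middle of each of the `count` equal-length segments.
    return [int(step * i + step / 2) for i in range(count)]


indices = sample_frame_indices(frame_count=100, num_frames=10)
print(indices)  # [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
```

Clamping the count to frame_count keeps very short clips from producing repeated indices.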

compute_frame_hash(frame)

Compute perceptual hash for a single frame.

Parameters:

frame (ndarray)

Return type:

str

compute_video_hashes(video_path)

Compute perceptual hashes for a video.

Parameters:

video_path (str)

Return type:

VideoInfo

class morphic.dupfinder.videos.VideoDuplicateFinder(similarity_threshold=0.85, num_frames=10, hash_size=16, num_workers=4, use_gpu=True)

Bases: object

Finds duplicate videos based on perceptual hash similarity.

Parameters:
  • similarity_threshold (float)

  • num_frames (int)

  • hash_size (int)

  • num_workers (int)

  • use_gpu (bool)

video_infos: dict[str, VideoInfo]

find_videos(folder)

Find all video files in a folder recursively.

Parameters:

folder (str)

Return type:

list[str]

process_videos(video_files)

Process all videos and compute their hashes.

Parameters:

video_files (list[str])

Return type:

dict[str, VideoInfo]

compute_similarity(info1, info2)

Compute similarity between two videos.

Parameters:
  • info1 (VideoInfo)

  • info2 (VideoInfo)

Return type:

float
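
One plausible way to score two videos from their frame_hashes is to compare frames position by position and average the per-frame similarity. This is a sketch under that assumption, not the library's documented formula; `hash_similarity` and `video_similarity` are hypothetical names.

```python
def hash_similarity(h1: str, h2: str) -> float:
    """1.0 for identical hex hashes, 0.0 when every bit differs."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have the same length")
    bits = 4 * len(h1)  # each hex digit encodes 4 bits
    differing = bin(int(h1, 16) ^ int(h2, 16)).count("1")
    return 1.0 - differing / bits


def video_similarity(frames1: list[str], frames2: list[str]) -> float:
    """Mean similarity of frame hashes compared position by position."""
    pairs = list(zip(frames1, frames2))
    if not pairs:
        return 0.0
    return sum(hash_similarity(a, b) for a, b in pairs) / len(pairs)


score = video_similarity(["ffff", "0000"], ["ffff", "00ff"])
print(score)  # 0.75
```

Position-by-position comparison assumes both videos were sampled at the same relative offsets, so a clip with a trimmed intro would score lower than its content suggests.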

find_duplicates()

Find groups of duplicate videos.

Return type:

list[list[tuple[str, float]]]
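
Turning pairwise scores into groups is commonly done with a union-find structure: every pair at or above the threshold is merged, and the resulting sets become the duplicate groups. The sketch below assumes that transitive merging is wanted, which this page does not confirm. `group_duplicates` is a hypothetical helper and returns paths only, whereas the documented return type also carries a float per entry.

```python
def group_duplicates(
    scores: dict[tuple[str, str], float], threshold: float = 0.85
) -> list[list[str]]:
    """Merge items into groups whenever a pair scores at or above threshold."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for item in list(parent):
        groups.setdefault(find(item), []).append(item)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


groups = group_duplicates({
    ("a.mp4", "b.mp4"): 0.97,
    ("b.mp4", "c.mp4"): 0.90,
    ("c.mp4", "d.mp4"): 0.40,
})
print(groups)  # [['a.mp4', 'b.mp4', 'c.mp4']]
```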

Scanner

Background scan job management (used by the web API).

Background scan job management for the dupfinder web UI.

Handles running duplicate-detection scans in background threads and converting results into JSON-serializable formats.

class morphic.dupfinder.scanner.ScanJob(id, folder, scan_type, status='pending', progress=0.0, message='', error=None, image_groups=<factory>, video_groups=<factory>, image_infos=<factory>, video_infos=<factory>, total_files_found=0, total_files_processed=0, space_savings=0, started_at=0.0, finished_at=0.0, image_threshold=0.9, video_threshold=0.85)

Bases: object

Represents a running or completed scan job.

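Parameters:
  • id (str)

  • folder (str)

  • scan_type (str)

  • status (str)

  • progress (float)

  • message (str)

  • error (str | None)

  • image_groups (list[list[dict]])

  • video_groups (list[list[dict]])

  • image_infos (dict[str, ImageInfo])

  • video_infos (dict[str, VideoInfo])

  • total_files_found (int)

  • total_files_processed (int)

  • space_savings (int)

  • started_at (float)

  • finished_at (float)

  • image_threshold (float)

  • video_threshold (float)

id: str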
folder: str
scan_type: str
status: str = 'pending'
progress: float = 0.0
message: str = ''
error: str | None = None
image_groups: list[list[dict]]
video_groups: list[list[dict]]
image_infos: dict[str, ImageInfo]
video_infos: dict[str, VideoInfo]
total_files_found: int = 0
total_files_processed: int = 0
space_savings: int = 0
started_at: float = 0.0
finished_at: float = 0.0
image_threshold: float = 0.9
video_threshold: float = 0.85

morphic.dupfinder.scanner.get_job(job_id)

Retrieve a scan job by ID.

Parameters:

job_id (str)

Return type:

ScanJob | None

morphic.dupfinder.scanner.start_job(folder, scan_type, image_threshold=0.9, video_threshold=0.85)

Create and launch a new scan job. Returns the job ID.

Parameters:
  • folder (str)

  • scan_type (str)

  • image_threshold (float)

  • video_threshold (float)

Return type:

str
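
The start_job/get_job pair follows a familiar pattern: register a job record under a fresh ID, run the work in a background thread, and let callers poll the record. The sketch below illustrates that pattern only. `Job`, `start`, and `get` are hypothetical names, and every status string other than 'pending' is an assumption, since the values ScanJob.status takes are not listed here.

```python
import threading
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Job:
    """Hypothetical, trimmed-down counterpart of ScanJob."""
    id: str
    folder: str
    status: str = "pending"
    error: Optional[str] = None


_jobs: dict[str, Job] = {}
_jobs_lock = threading.Lock()


def start(folder: str, work: Callable[[str], None]) -> tuple[str, threading.Thread]:
    """Register a job, run work(folder) in a daemon thread, return (job id, thread)."""
    job = Job(id=uuid.uuid4().hex, folder=folder)
    with _jobs_lock:
        _jobs[job.id] = job

    def run() -> None:
        job.status = "running"
        try:
            work(folder)
            job.status = "completed"
        except Exception as exc:  # record the failure instead of losing it
            job.status = "failed"
            job.error = str(exc)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return job.id, thread


def get(job_id: str) -> Optional[Job]:
    """Look up a job record by ID, or None if it is unknown."""
    with _jobs_lock:
        return _jobs.get(job_id)


job_id, worker = start("photos", lambda folder: None)
worker.join()  # the sketch waits here; a web UI would poll get(job_id) instead
final_status = get(job_id).status
print(final_status)
```

Returning the thread alongside the ID is a convenience for the demonstration; the documented start_job returns only the job ID.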