morphic.dupfinder
Duplicate image and video detection via perceptual hashing with optional
GPU acceleration.
Accelerator
GPU/CPU backend selection and batch operations.
GPU Accelerator Module
Provides GPU-accelerated operations for image/video processing with automatic
fallback through: CUDA -> AMD/ROCm -> OpenCL -> CPU multiprocessing
Accelerates:
1. Image resizing/preprocessing
2. DCT computation for perceptual hashing
3. Hamming distance computation for similarity matrix
-
class morphic.dupfinder.accelerator.AcceleratorType(value)[source]
Bases: Enum
Available acceleration backends.
-
CUDA = 1
-
ROCM = 2
-
OPENCL = 3
-
CPU = 4
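The fallback order stated for this module (CUDA -> ROCm -> OpenCL -> CPU) can be illustrated with a small selection routine. This is a minimal sketch, not the module's detection code: `Backend`, `select_backend`, and the probe callables are hypothetical names, and how the real accelerator probes each backend is not documented here.

```python
from enum import Enum
from typing import Callable


class Backend(Enum):
    """Hypothetical stand-in that mirrors the documented priority order."""
    CUDA = 1
    ROCM = 2
    OPENCL = 3
    CPU = 4


def select_backend(probes: dict[Backend, Callable[[], bool]]) -> Backend:
    """Return the first backend, in priority order, whose probe succeeds."""
    for backend in (Backend.CUDA, Backend.ROCM, Backend.OPENCL):
        probe = probes.get(backend)
        if probe is None:
            continue
        try:
            if probe():
                return backend
        except Exception:
            # A failing probe (missing driver, import error) counts as "not available".
            continue
    # Nothing else worked: fall back to the CPU path.
    return Backend.CPU


def _broken_probe() -> bool:
    raise RuntimeError("driver not found")


# Example: the CUDA probe raises, ROCm reports unavailable, OpenCL is present.
chosen = select_backend({
    Backend.CUDA: _broken_probe,
    Backend.ROCM: lambda: False,
    Backend.OPENCL: lambda: True,
})
print(chosen.name)  # OPENCL
```

Treating an exception from a probe as "unavailable" keeps the chain moving to the next candidate instead of aborting the whole selection.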
-
class morphic.dupfinder.accelerator.GPUAccelerator[source]
Bases: object
GPU-accelerated operations with automatic backend detection and fallback.
Priority: CUDA -> ROCm -> OpenCL -> CPU multiprocessing
- Return type:
GPUAccelerator
-
backend: AcceleratorType
-
property is_gpu_available: bool
Check if any GPU acceleration is available.
-
get_backend_name()[source]
Get human-readable backend name.
- Return type:
str
-
resize_image_batch(images, target_size)[source]
Resize a batch of images using the best available backend.
- Parameters:
images
target_size
- Return type:
list[ndarray]
-
compute_dct_batch(images)[source]
Compute DCT for a batch of images (used in pHash).
- Parameters:
images (list[ndarray])
- Return type:
list[ndarray]
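For readers unfamiliar with the transform behind pHash, the following is a minimal pure-Python sketch of an unnormalized 2-D DCT-II. It assumes nothing about how `compute_dct_batch` is implemented or normalized; `dct_1d` and `dct_2d` are illustrative names, and a real backend would use a vectorized or GPU routine.

```python
import math


def dct_1d(values: list[float]) -> list[float]:
    """Unnormalized DCT-II of a 1-D sequence."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(image: list[list[float]]) -> list[list[float]]:
    """Separable 2-D DCT-II: transform every row, then every column."""
    rows = [dct_1d(row) for row in image]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols[k] holds transformed column k; transpose back to row-major layout.
    return [list(row) for row in zip(*cols)]


# A flat 4x4 image has all of its energy in the DC term.
flat = [[5.0] * 4 for _ in range(4)]
coeffs = dct_2d(flat)
# coeffs[0][0] is 80.0 (the sum of all pixels); every other entry is ~0.
```

The low-frequency coefficients sit in the top-left corner of the result, which is the region perceptual hashing typically keeps.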
-
compute_similarity_matrix(hashes, threshold=0.0)[source]
Compute pairwise similarity matrix for hash arrays.
- Parameters:
hashes
threshold (float)
- Return type:
ndarray
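A pairwise similarity matrix over bit hashes can be sketched as below. Two assumptions are made that the reference does not confirm: similarity is taken as one minus the normalized Hamming distance, and entries below `threshold` are zeroed. The function name and the list-of-bits input format are illustrative only.

```python
def similarity_matrix(hashes: list[list[int]], threshold: float = 0.0) -> list[list[float]]:
    """Symmetric matrix of 1 - (differing bits / total bits) for equal-length bit lists."""
    n = len(hashes)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # every hash is identical to itself
        for j in range(i + 1, n):
            differing = sum(a != b for a, b in zip(hashes[i], hashes[j]))
            similarity = 1.0 - differing / len(hashes[i])
            if similarity >= threshold:
                # Only fill the upper triangle once and mirror it.
                matrix[i][j] = matrix[j][i] = similarity
    return matrix


bit_hashes = [[0, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1]]
matrix = similarity_matrix(bit_hashes, threshold=0.5)
# matrix[0][1] is 0.75; the pairs involving the third hash fall below 0.5 and stay 0.0.
```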
-
batch_hamming_distance(hashes1, hashes2)[source]
Compute Hamming distances between two lists of hex hash strings.
- Parameters:
hashes1 (list[str])
hashes2 (list[str])
- Return type:
ndarray
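The Hamming distance between two hex hash strings is the number of set bits in their XOR. The sketch below assumes the result is a full pairwise grid (one row per entry of the first list); whether `batch_hamming_distance` returns a grid or element-wise distances is not stated here, and the helper names are hypothetical.

```python
def hamming_hex(a: str, b: str) -> int:
    """Count differing bits between two hex-encoded hashes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def pairwise_hamming(hashes1: list[str], hashes2: list[str]) -> list[list[int]]:
    """distances[i][j] is the Hamming distance between hashes1[i] and hashes2[j]."""
    return [[hamming_hex(a, b) for b in hashes2] for a in hashes1]


distances = pairwise_hamming(["ff00", "0f0f"], ["ff00", "00ff"])
print(distances)  # [[0, 16], [8, 8]]
```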
-
morphic.dupfinder.accelerator.get_accelerator()[source]
Get the global GPU accelerator instance.
- Return type:
GPUAccelerator
-
morphic.dupfinder.accelerator.compute_phash_gpu(images, hash_size=8)[source]
Compute perceptual hashes for images using GPU acceleration.
- Parameters:
images
hash_size (int)
- Return type:
list[ndarray]
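One common way to turn DCT coefficients into a perceptual hash is to keep the top-left `hash_size` x `hash_size` block and threshold it against its median. The sketch below follows that formulation and returns a hex string for compactness, whereas `compute_phash_gpu` is documented as returning arrays; `phash_from_dct` is a hypothetical helper, and the module's exact thresholding rule is an assumption.

```python
from statistics import median


def phash_from_dct(coeffs: list[list[float]], hash_size: int = 8) -> str:
    """Threshold the low-frequency DCT block against its median and pack bits as hex."""
    block = [value for row in coeffs[:hash_size] for value in row[:hash_size]]
    mid = median(block)
    bits = "".join("1" if value > mid else "0" for value in block)
    width = (len(bits) + 3) // 4  # hex digits needed to hold every bit
    return format(int(bits, 2), f"0{width}x")


# Tiny 2x2 example: the median of [10, -3, 4, -8] is 0.5, so the bits are 1, 0, 1, 0.
tiny_hash = phash_from_dct([[10.0, -3.0], [4.0, -8.0]], hash_size=2)
print(tiny_hash)  # a
```

Some implementations exclude the DC coefficient when computing the median; this sketch does not.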
-
morphic.dupfinder.accelerator.compute_similarity_matrix_gpu(hashes, hash_size=16)[source]
Compute pairwise similarity matrix for hashes using GPU.
- Parameters:
hashes
hash_size (int)
- Return type:
ndarray
Images
Image duplicate detection.
Image Duplicate Finder module.
Detects duplicate images based on content similarity using perceptual hashing.
-
class morphic.dupfinder.images.ImageInfo(path, width=0, height=0, file_size=0, format='', mode='', phash=None, ahash=None, dhash=None, whash=None)[source]
Bases: object
Stores information about an image file.
- Parameters:
path (str)
width (int)
height (int)
file_size (int)
format (str)
mode (str)
phash (str | None)
ahash (str | None)
dhash (str | None)
whash (str | None)
-
path: str
-
width: int = 0
-
height: int = 0
-
file_size: int = 0
-
format: str = ''
-
mode: str = ''
-
phash: str | None = None
-
ahash: str | None = None
-
dhash: str | None = None
-
whash: str | None = None
-
to_dict()[source]
Convert to dictionary for JSON serialization.
- Return type:
dict
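A record of this shape maps naturally onto a dataclass whose `to_dict` delegates to `dataclasses.asdict`. The sketch below uses a hypothetical, trimmed-down `ImageRecord` rather than the real `ImageInfo`, and it is an assumption that the real class is implemented this way.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ImageRecord:
    """Hypothetical subset of the fields listed for ImageInfo."""
    path: str
    width: int = 0
    height: int = 0
    phash: Optional[str] = None

    def to_dict(self) -> dict:
        # asdict() yields plain dicts and scalars, which json.dumps accepts directly.
        return asdict(self)


record = ImageRecord(path="a.png", width=64, height=48)
payload = json.dumps(record.to_dict(), sort_keys=True)
```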
-
class morphic.dupfinder.images.ImageHasher(hash_size=16)[source]
Bases: object
Handles image loading and perceptual hashing.
- Parameters:
hash_size (int)
-
compute_hashes(image_path)[source]
Compute perceptual hashes for an image.
- Parameters:
image_path (str)
- Return type:
ImageInfo
-
class morphic.dupfinder.images.ImageDuplicateFinder(similarity_threshold=0.9, hash_size=16, num_workers=4, hash_type='combined', use_gpu=True, batch_size=1000)[source]
Bases: object
Finds duplicate images based on perceptual hash similarity.
- Parameters:
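similarity_threshold (float)
hash_size (int)
num_workers (int)
hash_type (str)
use_gpu (bool)
batch_size (int)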
-
image_infos: dict[str, ImageInfo]
-
find_images(folder)[source]
Find all image files in a folder recursively.
- Parameters:
folder (str)
- Return type:
list[str]
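Recursive discovery of image files can be sketched with `pathlib`. The extension set below is an assumption, since the formats `find_images` recognizes are not listed here, and `find_image_files` is an illustrative name.

```python
import tempfile
from pathlib import Path

# Assumed extension set; the real module may recognize more or fewer formats.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}


def find_image_files(folder: str) -> list[str]:
    """Return every file under folder whose suffix looks like an image, sorted by path."""
    root = Path(folder)
    return sorted(
        str(path)
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


# Demonstrate on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.PNG").write_bytes(b"")
    (Path(tmp) / "sub" / "b.jpg").write_bytes(b"")
    (Path(tmp) / "notes.txt").write_bytes(b"")
    found = sorted(Path(p).name for p in find_image_files(tmp))
```

Lower-casing the suffix makes the match case-insensitive, so `a.PNG` is picked up alongside `b.jpg` while `notes.txt` is ignored.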
-
process_images(image_files)[source]
Process all images and compute their hashes.
- Parameters:
image_files (list[str])
- Return type:
dict[str, ImageInfo]
-
compute_similarity(info1, info2)[source]
Compute similarity between two images based on their hashes.
- Parameters:
info1 (ImageInfo)
info2 (ImageInfo)
- Return type:
float
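With several hash types stored per image, one plausible reading of the `'combined'` hash type is to average the per-hash similarities over whichever hashes both images have. That reading is an assumption, as are the helper names below; the sketch takes plain dictionaries instead of `ImageInfo` objects to stay self-contained.

```python
from typing import Optional

HASH_FIELDS = ("phash", "ahash", "dhash", "whash")


def hex_similarity(a: str, b: str) -> float:
    """1.0 for identical hex hashes, 0.0 when every bit differs."""
    total_bits = 4 * max(len(a), len(b))
    differing = bin(int(a, 16) ^ int(b, 16)).count("1")
    return 1.0 - differing / total_bits


def combined_similarity(h1: dict[str, Optional[str]], h2: dict[str, Optional[str]]) -> float:
    """Average similarity over the hash types present on both sides."""
    scores = [
        hex_similarity(h1[name], h2[name])
        for name in HASH_FIELDS
        if h1.get(name) and h2.get(name)
    ]
    return sum(scores) / len(scores) if scores else 0.0


first = {"phash": "ff", "ahash": "0f", "dhash": None}
second = {"phash": "ff", "ahash": "00", "dhash": "aa"}
score = combined_similarity(first, second)
print(score)  # 0.75
```

Here `phash` matches exactly (1.0), `ahash` differs in four of eight bits (0.5), and `dhash` and `whash` are skipped because at least one side lacks them.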
-
find_duplicates_fast()[source]
Find groups of duplicate images using hash bucketing.
- Return type:
list[list[tuple[str, float]]]
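Hash bucketing avoids comparing every pair by first grouping items that share a key and only comparing within a group. The sketch below uses a hash prefix as the key, which is an assumption; the bucketing key used by `find_duplicates_fast` is not documented here, and `bucket_candidates` is a hypothetical name.

```python
from collections import defaultdict


def bucket_candidates(hashes: dict[str, str], prefix_len: int = 4) -> list[tuple[str, str]]:
    """Return candidate pairs of paths whose hashes share the same leading characters."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for path, value in hashes.items():
        buckets[value[:prefix_len]].append(path)

    pairs = []
    for members in buckets.values():
        members.sort()
        # Only items inside the same bucket are ever paired.
        for index, first in enumerate(members):
            for second in members[index + 1:]:
                pairs.append((first, second))
    return sorted(pairs)


candidates = bucket_candidates({"a.jpg": "ffee01", "b.jpg": "ffee03", "c.jpg": "12ab00"})
print(candidates)  # [('a.jpg', 'b.jpg')]
```

The trade-off is recall: two near-duplicates whose hashes differ inside the prefix land in different buckets and are never compared.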
-
find_duplicates()[source]
Find groups of duplicate images.
- Return type:
list[list[tuple[str, float]]]
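Grouping pairs above a threshold into duplicate sets is a connected-components problem, and union-find is one standard way to solve it. The sketch below produces the same `list[list[tuple[str, float]]]` shape as the return type above, but the meaning of the float (here, similarity to the first member of the group) is an assumption, as is the function name.

```python
from typing import Callable


def group_duplicates(
    paths: list[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> list[list[tuple[str, float]]]:
    """Cluster paths whose pairwise similarity reaches threshold (transitively)."""
    parent = {path: path for path in paths}

    def find(path: str) -> str:
        while parent[path] != path:
            parent[path] = parent[parent[path]]  # path halving
            path = parent[path]
        return path

    for index, first in enumerate(paths):
        for second in paths[index + 1:]:
            if similarity(first, second) >= threshold:
                parent[find(second)] = find(first)

    clusters: dict[str, list[str]] = {}
    for path in paths:
        clusters.setdefault(find(path), []).append(path)

    groups = []
    for members in clusters.values():
        if len(members) > 1:
            reference = members[0]
            groups.append(
                [(m, 1.0 if m == reference else similarity(reference, m)) for m in members]
            )
    return groups


scores = {frozenset("ab"): 0.95, frozenset("bc"): 0.92, frozenset("ac"): 0.80}
groups = group_duplicates(
    ["a", "b", "c", "d"],
    lambda x, y: scores.get(frozenset((x, y)), 0.0),
    threshold=0.9,
)
print(groups)  # [[('a', 1.0), ('b', 0.95), ('c', 0.8)]]
```

Because grouping is transitive, `c` joins the group through `b` even though its direct similarity to `a` (0.80) is below the threshold.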
Videos
Video duplicate detection.
Video Duplicate Finder module.
Detects duplicate videos based on content similarity using perceptual hashing
of extracted frames.
-
class morphic.dupfinder.videos.VideoInfo(path, duration=0.0, fps=0.0, frame_count=0, width=0, height=0, file_size=0, frame_hashes=<factory>, average_hash=None)[source]
Bases: object
Stores information about a video file.
- Parameters:
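path (str)
duration (float)
fps (float)
frame_count (int)
width (int)
height (int)
file_size (int)
frame_hashes (list[str])
average_hash (str | None)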
-
path: str
-
duration: float = 0.0
-
fps: float = 0.0
-
frame_count: int = 0
-
width: int = 0
-
height: int = 0
-
file_size: int = 0
-
frame_hashes: list[str]
-
average_hash: str | None = None
-
to_dict()[source]
Convert to dictionary for JSON serialization.
- Return type:
dict
-
class morphic.dupfinder.videos.VideoHasher(num_frames=10, hash_size=16)[source]
Bases: object
Handles video frame extraction and perceptual hashing.
- Parameters:
num_frames (int)
hash_size (int)
-
Extract frames from a video at regular intervals.
- Parameters:
video_path (str)
- Return type:
tuple[list[ndarray], VideoInfo]
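Sampling at regular intervals comes down to choosing evenly spaced frame indices. The sketch below picks the midpoint of each of `num_frames` equal segments; the exact positions the module samples are an assumption, and `sample_frame_indices` is a hypothetical helper that does no video decoding.

```python
def sample_frame_indices(frame_count: int, num_frames: int = 10) -> list[int]:
    """Pick up to num_frames evenly spaced frame indices from a video."""
    if frame_count <= 0 or num_frames <= 0:
        return []
    if num_frames >= frame_count:
        # Short clip: use every frame rather than repeating indices.
        return list(range(frame_count))
    step = frame_count / num_frames
    # Take the midpoint of each segment so the first and last frames
    # (often black or title cards) are avoided.
    return [int(step * (i + 0.5)) for i in range(num_frames)]


print(sample_frame_indices(100, 4))  # [12, 37, 62, 87]
print(sample_frame_indices(3, 10))   # [0, 1, 2]
```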
-
compute_frame_hash(frame)[source]
Compute perceptual hash for a single frame.
- Parameters:
frame (ndarray)
- Return type:
str
-
compute_video_hashes(video_path)[source]
Compute perceptual hashes for a video.
- Parameters:
video_path (str)
- Return type:
VideoInfo
-
class morphic.dupfinder.videos.VideoDuplicateFinder(similarity_threshold=0.85, num_frames=10, hash_size=16, num_workers=4, use_gpu=True)[source]
Bases: object
Finds duplicate videos based on perceptual hash similarity.
- Parameters:
similarity_threshold (float)
num_frames (int)
hash_size (int)
num_workers (int)
use_gpu (bool)
-
video_infos: dict[str, VideoInfo]
-
find_videos(folder)[source]
Find all video files in a folder recursively.
- Parameters:
folder (str)
- Return type:
list[str]
-
process_videos(video_files)[source]
Process all videos and compute their hashes.
- Parameters:
video_files (list[str])
- Return type:
dict[str, VideoInfo]
-
compute_similarity(info1, info2)[source]
Compute similarity between two videos.
- Parameters:
info1 (VideoInfo)
info2 (VideoInfo)
- Return type:
float
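Since each video carries a list of frame hashes, one simple comparison is to pair frames by position and average their similarities. This is a sketch under that assumption; the real `compute_similarity` may align frames differently or also use `average_hash`. The helper names are illustrative.

```python
def frame_similarity(a: str, b: str) -> float:
    """Similarity of two hex frame hashes: 1 - normalized Hamming distance."""
    total_bits = 4 * max(len(a), len(b))
    return 1.0 - bin(int(a, 16) ^ int(b, 16)).count("1") / total_bits


def video_similarity(frames1: list[str], frames2: list[str]) -> float:
    """Average similarity of frames paired by sampling position."""
    pairs = list(zip(frames1, frames2))  # extra frames on the longer side are ignored
    if not pairs:
        return 0.0
    return sum(frame_similarity(a, b) for a, b in pairs) / len(pairs)


clip_score = video_similarity(["ff", "0f"], ["ff", "00"])
print(clip_score)  # 0.75
```

Position-wise pairing assumes both videos were sampled at the same relative positions, so a re-encoded copy aligns well while a trimmed copy may not.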
-
find_duplicates()[source]
Find groups of duplicate videos.
- Return type:
list[list[tuple[str, float]]]
Scanner
Background scan job management (used by the web API).
Background scan job management for the dupfinder web UI.
Handles running duplicate-detection scans in background threads and
converting results into JSON-serializable formats.
-
class morphic.dupfinder.scanner.ScanJob(id, folder, scan_type, status='pending', progress=0.0, message='', error=None, image_groups=<factory>, video_groups=<factory>, image_infos=<factory>, video_infos=<factory>, total_files_found=0, total_files_processed=0, space_savings=0, started_at=0.0, finished_at=0.0, image_threshold=0.9, video_threshold=0.85)[source]
Bases: object
Represents a running or completed scan job.
- Parameters:
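id (str)
folder (str)
scan_type (str)
status (str)
progress (float)
message (str)
error (str | None)
image_groups (list[list[dict]])
video_groups (list[list[dict]])
image_infos (dict[str, ImageInfo])
video_infos (dict[str, VideoInfo])
total_files_found (int)
total_files_processed (int)
space_savings (int)
started_at (float)
finished_at (float)
image_threshold (float)
video_threshold (float)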
-
id: str
-
folder: str
-
scan_type: str
-
status: str = 'pending'
-
progress: float = 0.0
-
message: str = ''
-
error: str | None = None
-
image_groups: list[list[dict]]
-
video_groups: list[list[dict]]
-
image_infos: dict[str, ImageInfo]
-
video_infos: dict[str, VideoInfo]
-
total_files_found: int = 0
-
total_files_processed: int = 0
-
space_savings: int = 0
-
started_at: float = 0.0
-
finished_at: float = 0.0
-
image_threshold: float = 0.9
-
video_threshold: float = 0.85
-
morphic.dupfinder.scanner.get_job(job_id)[source]
Retrieve a scan job by ID.
- Parameters:
job_id (str)
- Return type:
ScanJob | None
-
morphic.dupfinder.scanner.start_job(folder, scan_type, image_threshold=0.9, video_threshold=0.85)[source]
Create and launch a new scan job. Returns the job ID.
- Parameters:
folder (str)
scan_type (str)
image_threshold (float)
video_threshold (float)
- Return type:
str