
Scanner

Please read Details about scanning and watching first.

Scanners are divided into built-in scanners and external scanners. All *.py files in the working directory will be loaded as external scanners.

For each candidate comic file or folder, scanners are executed sequentially in file-name order (regardless of whether they are built-in or external), each completing a different task, like an assembly line.
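As a rough illustration of this ordering, the sketch below merges built-in and external scanner file names and sorts them. It assumes a plain lexicographic sort, which may differ in detail from what ComicLib does internally; 25-myscanner.py is a hypothetical external scanner.

```python
# Sketch only: assumes scanners run in plain lexicographic file-name order.
builtin = ["10-zip.py", "11-archive.py", "20-ccloli.py", "30-importEHdb.py", "40-thumb.py"]
external = ["25-myscanner.py"]  # hypothetical external scanner in the working directory


def execution_order(*groups):
    """Merge all scanner file names and sort them, ignoring where they came from."""
    return sorted(name for group in groups for name in group)


order = execution_order(builtin, external)
print(order)
```

Under this assumption the external scanner runs after the downloader-specific scanners and before 30-importEHdb.py, purely because of its numeric prefix.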

You can find some external scanners in the official repository, or download those written by others. You can also write your own if you know some basic Python.

Built-in scanners

10-zip.py

Treat *.zip files as comic files.

11-archive.py

Treat regular archive files other than zip as comic files. This scanner requires 7-Zip: download it from https://7-zip.org/download.html and make sure 7zzs, 7zz, or 7z is in the working directory or in a directory listed in PATH. 7-Zip builds obtained from other sources may not support rar files.
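To check whether a suitable 7-Zip executable is reachable, something like the following sketch can help. The search order (working directory before PATH, and 7zzs before 7zz before 7z) is an assumption made for illustration, and find_7zip is a hypothetical helper, not part of ComicLib.

```python
import shutil
from pathlib import Path
from typing import Optional

CANDIDATES = ("7zzs", "7zz", "7z")  # executable names listed in the text


def find_7zip(workdir: Path = Path(".")) -> Optional[str]:
    """Return the path of the first 7-Zip executable found, or None."""
    for name in CANDIDATES:
        # Assumed order: look in the working directory first, then in PATH.
        found = shutil.which(name, path=str(workdir.resolve())) or shutil.which(name)
        if found:
            return found
    return None


print(find_7zip() or "7-Zip not found")
```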

20-ccloli.py

Parse comic files downloaded via ccloli/E-Hentai-Downloader.

21-hath.py

Parse comic folders downloaded via Hentai@Home.

22-ehviewer.py

Parse comic folders downloaded via EhViewer.

23-xeHentai.py

Parse comic files downloaded via xeHentai.

30-importEHdb.py

Import the corresponding metadata from the ehentai metadata database. The database file api_dump.sqlite must be downloaded to the working directory before use; otherwise this scanner is skipped.

Environment variable settings:

| Environment variable | Description | Default value |
| --- | --- | --- |
| importEHdb_thumb | Whether to import thumbnails from it (True/False) | True |
| importEHdb_matchtitle | Whether to match based on the title (True/False/exact), exact for exact matching, True for fuzzy matching | True |
| importEHdb_matchtorrent | Whether to match based on the torrent file names (True/False) | True |
| importEHdb_database_URI | URI of the ehentai metadata database | file:api_dump.sqlite?mode=rw |

Matching based on ehentai gid is always enabled.
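The default value of importEHdb_database_URI is an SQLite URI. The sketch below shows how such a URI behaves with Python's sqlite3 module: with mode=rw, SQLite does not create a missing file, so a missing api_dump.sqlite surfaces as an error. Whether 30-importEHdb.py detects the missing database this way is an assumption; open_metadata_db is a hypothetical helper.

```python
import sqlite3
from typing import Optional

DEFAULT_URI = "file:api_dump.sqlite?mode=rw"  # default value listed above


def open_metadata_db(uri: str = DEFAULT_URI) -> Optional[sqlite3.Connection]:
    """Open the database by URI; return None if it cannot be opened."""
    try:
        # uri=True makes sqlite3 honour the ?mode=rw query parameter.
        # With mode=rw, SQLite refuses to create a file that does not exist.
        return sqlite3.connect(uri, uri=True)
    except sqlite3.OperationalError:
        return None


conn = open_metadata_db()
if conn is None:
    print("database missing, the import would be skipped")
else:
    print("database available")
    conn.close()
```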

40-thumb.py

Generate homepage thumbnails from comic files.

Write your own scanners

Since the execution order of scanners is determined by their file names, a number can be added at the beginning of the file name. By convention:

  • Scanners starting with 1 determine whether it is a comic file based on the file type, and perform basic analysis (such as using the file name as the title and counting the number of pages);
  • Scanners starting with 2 parse metadata of files downloaded through specific downloaders;
  • Scanners starting with 3 perform post-processing on the previously obtained metadata;
  • Scanners starting with 4 are responsible for generating thumbnails.

The basic structure of scanners is as follows:

from pathlib import Path
from typing import Union
from pydantic import Field
from pydantic_settings import BaseSettings
# Imports and module-level setup go here.
# This code may be executed multiple times, so avoid side effects such as opening files.

# optional
class Settings(BaseSettings):
    myscanner_settingA: bool = True  # It is recommended to prefix with the scanner name
    myscanner_settingB: Union[bool, str] = Field(default=True, union_mode='left_to_right')
settings = Settings()

class Scanner:
    '''Your docstrings'''

    # optional
    def __init__(self) -> None:
        # One-time initialization goes here; it is executed only once
        pass

    def scan(self, path: Path, id: str, metadata: dict, prev_scanners: list[str]) -> bool:
        # Process each file/directory
        if xxx:  # xxx is a placeholder for your own condition
            ...
            return True
        else:
            return False

Here, Scanner.scan is the method that actually scans each file/directory.

The return value indicates whether the scanner processed the item successfully (e.g. the file/directory was recognized as a valid comic, or metadata was modified). If every scanner returns False, ComicLib considers the file/directory not to be a valid comic and does not save it in the database. The return value is also used as a reference by subsequent scanners.
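To make the structure above concrete, here is a minimal sketch of a hypothetical external scanner (it could be saved as, say, 12-cbz.py) that treats *.cbz files as comic files. The set of image suffixes and the choice to set only title and pagecount are assumptions for illustration, not ComicLib's built-in behaviour.

```python
import zipfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}  # assumed page types


class Scanner:
    '''Hypothetical example: treat *.cbz files as comic files.'''

    def scan(self, path: Path, id: str, metadata: dict, prev_scanners: list[str]) -> bool:
        # Not a .cbz archive: leave metadata untouched and report "not handled".
        if path.suffix.lower() != ".cbz" or not zipfile.is_zipfile(path):
            return False
        with zipfile.ZipFile(path) as archive:
            pages = [name for name in archive.namelist()
                     if Path(name).suffix.lower() in IMAGE_SUFFIXES]
        if not pages:
            return False
        metadata["title"] = path.stem       # use the file name as the title
        metadata["pagecount"] = len(pages)  # count image entries as pages
        return True
```

Returning False for anything that is not a readable .cbz archive leaves the decision to the other scanners in the chain.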

Parameters of Scanner.scan:

  • path: file/directory path for the input comic.
  • id: The unique ID pre-generated by ComicLib, which is a hash of the path relative to CONTENT. The database uses the ID given by metadata['id'] instead; see the description of custom IDs below.
  • metadata: Metadata obtained after processing by the previous scanners. The fields include id, title, subtitle, source, pagecount, tags, categories. The initial values are None or set(), except for id. Scanners write the resulting metadata into this dict.
  • prev_scanners: The names of the scanners that previously returned True.
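The interplay of the return value, metadata, and prev_scanners can be sketched with a small driver loop. This is not ComicLib's actual implementation: run_scanners and the two demo scanners are hypothetical, which fields start as set() is an assumption, and so is the use of file names as the entries of prev_scanners.

```python
from pathlib import Path


def run_scanners(scanners: dict, path: Path, id: str):
    """Run scanners in file-name order; return metadata, or None if none accepted."""
    # Assumed initial values: None or an empty set, except for id.
    metadata = {"id": id, "title": None, "subtitle": None, "source": None,
                "pagecount": None, "tags": set(), "categories": set()}
    prev_scanners: list[str] = []
    for name in sorted(scanners):
        if scanners[name].scan(path, id, metadata, prev_scanners):
            prev_scanners.append(name)  # later scanners can see who succeeded
    # If every scanner returned False, the item is not treated as a comic.
    return metadata if prev_scanners else None


class TitleFromName:
    def scan(self, path, id, metadata, prev_scanners):
        metadata["title"] = path.stem
        return True


class NeedsEarlierSuccess:
    def scan(self, path, id, metadata, prev_scanners):
        # Only act if some earlier scanner already recognized the item.
        if not prev_scanners:
            return False
        metadata["tags"].add("example")
        return True


result = run_scanners(
    {"10-title.py": TitleFromName(), "30-post.py": NeedsEarlierSuccess()},
    Path("comics/demo.zip"),
    "00" + "0" * 38,
)
print(result["title"], result["tags"])
```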

Custom ID (Experimental)

ComicLib first pre-generates a unique ID based on the path. This ID starts with 00 and is passed as the parameter id; initially it is the same as metadata['id']. A scanner can generate a new ID from id, from the metadata['id'] already modified by a previous scanner, and from other information, and write it into metadata['id']. By convention, the first few characters of an ID indicate what kind of ID it is. For example, the built-in scanner uses the prefix EH for the IDs it builds from the ehentai gid. The final metadata['id'] is written to the database as the unique identifier of the comic. An ID must be unique and exactly 40 characters long. Custom IDs do not work for updating metadata during rescanning.
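One possible way to build a custom ID that satisfies the 40-character rule is sketched below. The prefix MY, the helper make_custom_id, and the use of a SHA-1 digest of the title are all assumptions for illustration; this is not how the built-in EH IDs are constructed.

```python
import hashlib

ID_LENGTH = 40  # length required for IDs


def make_custom_id(prefix: str, key: str) -> str:
    """Build a 40-character ID: a short prefix followed by part of a SHA-1 hex digest."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()  # 40 hex characters
    return (prefix + digest)[:ID_LENGTH]


class Scanner:
    '''Hypothetical scanner that derives a custom ID from the title.'''

    def scan(self, path, id, metadata, prev_scanners):
        # Without a title from an earlier scanner there is nothing to derive from.
        if not metadata.get("title"):
            return False
        metadata["id"] = make_custom_id("MY", metadata["title"])
        return True
```

Note that in this sketch two comics with the same title would receive the same ID, so a real scanner would need a key that is actually unique per comic.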