СоНоты

dpsearch-4.53-14072009

The latest DataparkSearch snapshot adds support for the libextractor library.

With this library, DataparkSearch can index keywords from files in the following formats: PDF, PS, OLE2 (DOC, XLS, PPT), OpenOffice (sxw), StarOffice (sdw), DVI, MAN, FLAC, MP3 (ID3v1 and ID3v2), NSF(E) (NES music), SID (C64 music), OGG, WAV, EXIV2, JPEG, GIF, PNG, TIFF, DEB, RPM, TAR(.GZ), ZIP, ELF, S3M (Scream Tracker 3), XM (eXtended Module), IT (Impulse Tracker), FLV, REAL, RIFF (AVI), MPEG, QT and ASF.

The table below maps libextractor keyword types to DataparkSearch section names:

Keyword type Section name
EXTRACTOR_FILENAME Filename
EXTRACTOR_MIMETYPE Mimetype
EXTRACTOR_TITLE Title
EXTRACTOR_AUTHOR Author
EXTRACTOR_ARTIST Artist
EXTRACTOR_DESCRIPTION Description
EXTRACTOR_COMMENT Comment
EXTRACTOR_DATE Date
EXTRACTOR_PUBLISHER Publisher
EXTRACTOR_LANGUAGE Content-Language
EXTRACTOR_ALBUM Album
EXTRACTOR_GENRE Genre
EXTRACTOR_LOCATION Location
EXTRACTOR_VERSIONNUMBER VersionNumber
EXTRACTOR_ORGANIZATION Organization
EXTRACTOR_COPYRIGHT Copyright
EXTRACTOR_SUBJECT Subject
EXTRACTOR_KEYWORDS Meta.Keywords
EXTRACTOR_CONTRIBUTOR Contributor
EXTRACTOR_RESOURCE_TYPE Resource-Type
EXTRACTOR_FORMAT Format
EXTRACTOR_RESOURCE_IDENTIFIER Resource-Idendifier
EXTRACTOR_SOURCE Source
EXTRACTOR_RELATION Relation
EXTRACTOR_COVERAGE Coverage
EXTRACTOR_SOFTWARE Software
EXTRACTOR_DISCLAIMER Disclaimer
EXTRACTOR_WARNING Warning
EXTRACTOR_TRANSLATED Translated
EXTRACTOR_CREATION_DATE Creation-Date
EXTRACTOR_MODIFICATION_DATE Modification-Date
EXTRACTOR_CREATOR Creator
EXTRACTOR_PRODUCER Producer
EXTRACTOR_PAGE_COUNT Page-Count
EXTRACTOR_PAGE_ORIENTATION Page-Orientation
EXTRACTOR_PAPER_SIZE Paper-Size
EXTRACTOR_USED_FONTS Used-Fonts
EXTRACTOR_PAGE_ORDER Page-Order
EXTRACTOR_CREATED_FOR Created-For
EXTRACTOR_MAGNIFICATION Magnification
EXTRACTOR_RELEASE Release
EXTRACTOR_GROUP Group
EXTRACTOR_SIZE Size
EXTRACTOR_SUMMARY Summary
EXTRACTOR_PACKAGER Packager
EXTRACTOR_VENDOR Vendor
EXTRACTOR_LICENSE License
EXTRACTOR_DISTRIBUTION Distribution
EXTRACTOR_BUILDHOST BuildHost
EXTRACTOR_OS OS
EXTRACTOR_DEPENDENCY Dependency
EXTRACTOR_HASH_MD4 Hash-MD4
EXTRACTOR_HASH_MD5 Hash-MD5
EXTRACTOR_HASH_SHA0 Hash-SHA0
EXTRACTOR_HASH_SHA1 Hash-SHA1
EXTRACTOR_HASH_RMD160 Hash-RMD160
EXTRACTOR_RESOLUTION Resolution
EXTRACTOR_CATEGORY Ext.Category
EXTRACTOR_BOOKTITLE BookTitle
EXTRACTOR_PRIORITY Priority
EXTRACTOR_CONFLICTS Conflicts
EXTRACTOR_REPLACES Replaces
EXTRACTOR_PROVIDES Provides
EXTRACTOR_CONDUCTOR Conductor
EXTRACTOR_INTERPRET Interpret
EXTRACTOR_OWNER Owner
EXTRACTOR_LYRICS Lyrics
EXTRACTOR_MEDIA_TYPE Media-Type
EXTRACTOR_CONTACT Contact
EXTRACTOR_THUMBNAIL_DATA Thumbnail-Data
EXTRACTOR_PUBLICATION_DATE Publication-Date
EXTRACTOR_CAMERA_MAKE Camera-Make
EXTRACTOR_CAMERA_MODEL Camera-Model
EXTRACTOR_EXPOSURE Exposure
EXTRACTOR_APERTURE Aperture
EXTRACTOR_EXPOSURE_BIAS Exposure-Bias
EXTRACTOR_FLASH Flash
EXTRACTOR_FLASH_BIAS Flash-Bias
EXTRACTOR_FOCAL_LENGTH Focal-Length
EXTRACTOR_FOCAL_LENGTH_35MM Focal-Length-35MM
EXTRACTOR_ISO_SPEED ISO-Speed
EXTRACTOR_EXPOSURE_MODE Exposure-Mode
EXTRACTOR_METERING_MODE Metering-Mode
EXTRACTOR_MACRO_MODE Macro-Mode
EXTRACTOR_IMAGE_QUALITY Image-Quality
EXTRACTOR_WHITE_BALANCE White-Balance
EXTRACTOR_ORIENTATION Orientation
EXTRACTOR_TEMPLATE Template
EXTRACTOR_SPLIT Split
EXTRACTOR_PRODUCTVERSION ProductVersion
EXTRACTOR_LAST_SAVED_BY Last-Saved-By
EXTRACTOR_LAST_PRINTED Last-Printed
EXTRACTOR_WORD_COUNT Word-Count
EXTRACTOR_CHARACTER_COUNT Character-Count
EXTRACTOR_TOTAL_EDITING_TIME Total-Editing-Time
EXTRACTOR_THUMBNAILS Thumbnails
EXTRACTOR_SECURITY Security
EXTRACTOR_CREATED_BY_SOFTWARE Created-By-Software
EXTRACTOR_MODIFIED_BY_SOFTWARE Modified-By-Software
EXTRACTOR_REVISION_HISTORY Revision-History
EXTRACTOR_LOWERCASE Lowercase
EXTRACTOR_COMPANY Company
EXTRACTOR_GENERATOR Generator
EXTRACTOR_CHARACTER_SET Meta-Charset
EXTRACTOR_LINE_COUNT Line-Count
EXTRACTOR_PARAGRAPH_COUNT Paragraph-Count
EXTRACTOR_EDITING_CYCLES Editing-Cycles
EXTRACTOR_SCALE Scale
EXTRACTOR_MANAGER Manager
EXTRACTOR_MOVIE_DIRECTOR Movie-Director
EXTRACTOR_DURATION Duration
EXTRACTOR_INFORMATION Information
EXTRACTOR_FULL_NAME Full-Name
EXTRACTOR_CHAPTER Chapter
EXTRACTOR_YEAR Year
EXTRACTOR_LINK Link
EXTRACTOR_MUSIC_CD_IDENTIFIER Music-CD-Identifier
EXTRACTOR_PLAY_COUNTER Play-Counter
EXTRACTOR_POPULARITY_METER Popularity-Meter
EXTRACTOR_CONTENT_TYPE Ext.Content-Type
EXTRACTOR_ENCODED_BY Encoded-By
EXTRACTOR_TIME Time
EXTRACTOR_MUSICIAN_CREDITS_LIST Musician-Credits-List
EXTRACTOR_MOOD Mood
EXTRACTOR_FORMAT_VERSION Format-Version
EXTRACTOR_TELEVISION_SYSTEM Television-System
EXTRACTOR_SONG_COUNT Song-Count
EXTRACTOR_STARTING_SONG Strting-Song
EXTRACTOR_HARDWARE_DEPENDENCY Hardware-Dependency
EXTRACTOR_RIPPER Ripper
EXTRACTOR_FILE_SIZE File-Size
EXTRACTOR_TRACK_NUMBER Track-Number
EXTRACTOR_ISRC ISRC
EXTRACTOR_DISC_NUMBER Disc-Number

If a section from the list above is not defined in sections.conf, the value of a keyword of the corresponding type is stored in the "body" section. The same applies to keywords of an unknown type.
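
The fallback rule can be illustrated with a small Python sketch. It is not DataparkSearch code: the function name `resolve_section`, the dictionary `KEYWORD_TO_SECTION` (a short excerpt of the table above), and the exact, case-sensitive name comparison are all assumptions made for illustration. The set `configured_sections` stands in for the section names declared in sections.conf.

```python
# Illustrative excerpt of the keyword-type -> section-name table above.
# The real table is much longer; only a few rows are copied here.
KEYWORD_TO_SECTION = {
    "EXTRACTOR_TITLE": "Title",
    "EXTRACTOR_AUTHOR": "Author",
    "EXTRACTOR_KEYWORDS": "Meta.Keywords",
    "EXTRACTOR_LANGUAGE": "Content-Language",
}


def resolve_section(keyword_type, configured_sections):
    """Return the section that would receive a keyword value.

    Hypothetical helper: it models the rule described in the text, not
    the actual indexer implementation. Names are compared exactly
    (an assumption).
    """
    section = KEYWORD_TO_SECTION.get(keyword_type)
    # Known keyword type whose section is declared: use that section.
    if section is not None and section in configured_sections:
        return section
    # Unknown keyword type, or section not declared: fall back to "body".
    return "body"


if __name__ == "__main__":
    declared = {"body", "Title", "Meta.Keywords"}
    for kw in ("EXTRACTOR_TITLE", "EXTRACTOR_AUTHOR", "EXTRACTOR_SOMETHING_NEW"):
        print(kw, "->", resolve_section(kw, declared))
```

In this sketch only `EXTRACTOR_TITLE` resolves to its own section. `EXTRACTOR_AUTHOR` falls back to "body" because "Author" is not in the declared set, and the made-up type `EXTRACTOR_SOMETHING_NEW` falls back because it is unknown.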