EML to SQLite Indexer Skill (V7.0.0 - Management & Export Edition)
This skill indexes EML email files from a specified directory into an SQLite database and provides a feature-rich web interface for searching and management. It includes automatic deduplication, IP access control, Excel export, and a JSON-formatted scheduled backup and restore system configurable via the web interface.
Features
- Efficient Indexing: Uses MD5 fingerprinting for automatic deduplication, so the same email is never imported twice. Supports processing millions of email records.
- Key Information Extraction: Automatically parses and stores each email's sender, recipient, subject, body content, and sent time.
- Web Query Interface: Provides a Flask-based web interface with:
  - Advanced Search: Keywords (subject/body, case-insensitive), sender (fuzzy matching), recipient (fuzzy matching), and date range filtering.
  - Excel Export: Export search results to an Excel-compatible CSV file, including the original file path.
  - File Deletion: Delete specific emails from both the database and the physical disk (Admin only).
  - Pagination: Optimized for large datasets to prevent browser slowdowns.
- IP Access Control: Configurable whitelist of IP addresses allowed to access the web interface. By default, only localhost and 127.0.0.1 are allowed.
- Web-Configurable Scheduled Backup:
  - Scheduled Mode: Configure via the web interface to set the backup frequency (e.g., every X days) and the specific hour (0-23) for execution.
  - JSON Format: Backups export email data as structured JSON and compress it into a ZIP file named eml-indexer_YYYYMMDD.zip, which is portable across platforms.
  - Automatic Circular Overwrite: The system retains a configured number of backups and deletes the oldest one when the limit is exceeded.
  - Manual Management: One-click download of JSON backup ZIPs and upload of ZIPs for database restoration via the web interface.
- Admin-Exclusive Interface: When accessed from localhost or 127.0.0.1, a "⚙️ System Settings" tab is displayed, providing IP management, backup configuration, and deletion functionality.
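The search filters and pagination listed above could map onto a parameterized SQLite query roughly as follows. This is only a sketch: the table name `emails`, the column names, and the helper `build_search_query` are assumptions for illustration, not names confirmed by the skill's source, and timestamps are assumed to be stored as ISO-style text so that string comparison matches chronological order.

```python
import sqlite3


def build_search_query(keyword=None, sender=None, recipient=None,
                       date_from=None, date_to=None, page=1, page_size=50):
    """Return (sql, params) for a filtered, paginated search (hypothetical schema)."""
    clauses, params = [], []
    if keyword:
        # SQLite's LIKE is case-insensitive for ASCII characters by default.
        clauses.append("(subject LIKE ? OR body LIKE ?)")
        params += [f"%{keyword}%", f"%{keyword}%"]
    if sender:
        clauses.append("sender LIKE ?")        # substring ("fuzzy") match
        params.append(f"%{sender}%")
    if recipient:
        clauses.append("recipient LIKE ?")
        params.append(f"%{recipient}%")
    if date_from:
        clauses.append("sent_time >= ?")
        params.append(date_from)
    if date_to:
        clauses.append("sent_time <= ?")
        params.append(date_to)
    where = ("WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = ("SELECT id, sender, recipient, subject, sent_time FROM emails "
           f"{where} ORDER BY sent_time DESC LIMIT ? OFFSET ?")
    # LIMIT/OFFSET keeps each page small regardless of total table size.
    params += [page_size, (max(page, 1) - 1) * page_size]
    return sql, params


# Demo against an in-memory database with the assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (id INTEGER PRIMARY KEY, sender TEXT, "
             "recipient TEXT, subject TEXT, body TEXT, sent_time TEXT)")
conn.executemany(
    "INSERT INTO emails (sender, recipient, subject, body, sent_time) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("alice@example.com", "bob@example.com", "Invoice March",
         "see attached", "2024-03-01 09:00:00"),
        ("carol@example.com", "bob@example.com", "Lunch",
         "INVOICE reminder", "2023-12-31 12:00:00"),
        ("dave@example.com", "erin@example.com", "Hello",
         "nothing here", "2024-02-01 08:00:00"),
    ],
)
sql, params = build_search_query(keyword="invoice", date_from="2024-01-01")
rows = conn.execute(sql, params).fetchall()
page_sql, page_params = build_search_query(page=3, page_size=10)
```

All user input is passed as bound parameters rather than formatted into the SQL string, which avoids SQL injection from the search form.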
Installation & Deployment
1. Environment Requirements
- Python 3.8+ (latest stable version recommended)
- Recommended OS: Windows, Linux
2. Dependency Installation
Ensure your Python environment has the following packages installed:
pip install -r requirements.txt
Or install manually:
pip install Flask tqdm
3. File Structure
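eml_indexer/
├── app.py              # Web application (Flask), includes the scheduled backup thread
├── indexer.py          # Core EML indexing script
├── requirements.txt    # Python dependency list
├── SKILL.md            # Skill documentation (English)
├── config.json         # Runtime configuration (allowed IPs, backup frequency, retention count)
├── emails.db           # SQLite database file (created after running indexer.py)
├── backups/            # JSON backup directory (created automatically)
├── templates/
│   ├── detail.html     # Email detail page template
│   └── index.html      # Main email search and management page template
└── references/
    └── SKILL-TW.md     # Traditional Chinese version of the skill documentation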
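The runtime configuration file config.json is described as holding the allowed IPs, the backup frequency, and the retention count. A minimal sketch of loading such a file and applying the whitelist might look like this. The key names, the default backup values, and the helpers `load_config`, `is_allowed`, and `is_admin` are all hypothetical; only the default whitelist (localhost and 127.0.0.1) comes from the document.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical key names and backup defaults; only the default whitelist
# (localhost and 127.0.0.1) is stated in the documentation.
DEFAULT_CONFIG = {
    "allowed_ips": ["localhost", "127.0.0.1"],
    "backup_interval_days": 7,
    "backup_hour": 3,
    "backup_retention": 5,
}


def load_config(path):
    """Merge values from a JSON file over the defaults; a missing file means defaults."""
    config = dict(DEFAULT_CONFIG)
    path = Path(path)
    if path.exists():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config


def is_allowed(remote_addr, config):
    """True if the client address is on the configured whitelist."""
    return remote_addr in config["allowed_ips"]


def is_admin(remote_addr):
    """Admin features are only offered to local clients."""
    return remote_addr in ("localhost", "127.0.0.1")


# Demo: a config file that whitelists one extra LAN address.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "config.json"
    cfg_path.write_text(
        json.dumps({"allowed_ips": ["127.0.0.1", "192.168.1.20"]}),
        encoding="utf-8",
    )
    config = load_config(cfg_path)

lan_ok = is_allowed("192.168.1.20", config)
stranger_ok = is_allowed("10.0.0.5", config)
```

In a Flask application, a check like `is_allowed` would typically run in a function registered with `app.before_request`, comparing against `request.remote_addr` and calling `abort(403)` on a mismatch. Whether app.py does exactly this is not stated in the documentation.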
Usage
1. Index EML Emails
Run indexer.py to import EML files from a specified directory into the database. On subsequent runs it automatically skips emails that are already indexed.
python indexer.py <database path (default: emails.db)>
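The skip-on-rerun behaviour can be pictured with the sketch below, which parses one message with Python's standard `email` package and relies on a UNIQUE fingerprint column so that a second import of the same file is ignored. It assumes the MD5 fingerprint is computed over the raw file bytes and uses a hypothetical schema and function name (`index_eml`); the real indexer.py may fingerprint and store messages differently.

```python
import hashlib
import sqlite3
from email import policy
from email.parser import BytesParser

# Hypothetical schema: the UNIQUE constraint on md5 is what enforces deduplication.
SCHEMA = """CREATE TABLE IF NOT EXISTS emails (
    id INTEGER PRIMARY KEY,
    md5 TEXT UNIQUE,
    sender TEXT, recipient TEXT, subject TEXT,
    body TEXT, sent_time TEXT, file_path TEXT)"""


def index_eml(conn, raw_bytes, file_path):
    """Insert one message; return True if it was new, False if a duplicate."""
    fingerprint = hashlib.md5(raw_bytes).hexdigest()  # assumed: hash of raw bytes
    msg = BytesParser(policy=policy.default).parsebytes(raw_bytes)
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    # With policy.default the Date header exposes a parsed datetime (or None).
    sent = getattr(msg["Date"], "datetime", None)
    cur = conn.execute(
        "INSERT OR IGNORE INTO emails "
        "(md5, sender, recipient, subject, body, sent_time, file_path) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (fingerprint, str(msg["From"] or ""), str(msg["To"] or ""),
         str(msg["Subject"] or ""), body,
         sent.isoformat() if sent else None, file_path),
    )
    return cur.rowcount == 1  # 0 rows changed means the fingerprint already existed


# Demo: importing the same bytes twice stores a single row.
raw = (b"From: alice@example.com\r\nTo: bob@example.com\r\n"
       b"Subject: Hello\r\nDate: Mon, 01 Jan 2024 10:00:00 +0000\r\n\r\nHi Bob\r\n")
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
first = index_eml(conn, raw, "inbox/hello.eml")
second = index_eml(conn, raw, "inbox/hello-copy.eml")
stored = conn.execute("SELECT sender, subject, sent_time FROM emails").fetchall()
```

Letting the database enforce uniqueness keeps the check atomic and avoids a separate lookup per file, which matters when importing very large mail archives.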
2. Start Web Query Interface (with Scheduled Backup)
Execute app.py to start the Flask web server. The background scheduled backup thread will also start automatically.
python app.py
After starting, visit http://localhost:5000 in your browser.
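How the background thread decides when to run is not spelled out. One plausible reading of "every X days at a given hour" is sketched below: a backup is due when the current hour equals the configured hour and at least the configured number of days has passed since the last backup. The function names and the polling loop are assumptions, not the actual logic in app.py.

```python
from datetime import date, datetime


def backup_due(now, last_backup_date, interval_days, hour):
    """Return True when a backup should run at `now` (assumed scheduling rule)."""
    if now.hour != hour:
        return False                      # only fire during the configured hour
    if last_backup_date is None:
        return True                       # no backup has been taken yet
    return (now.date() - last_backup_date).days >= interval_days


def scheduler_loop(get_settings, run_backup, stop_event, poll_seconds=60):
    """Poll until `stop_event` (a threading.Event) is set."""
    last_backup_date = None
    while not stop_event.is_set():
        # Re-read the settings each pass so changes made in the web UI take effect.
        interval_days, hour = get_settings()
        now = datetime.now()
        if backup_due(now, last_backup_date, interval_days, hour):
            run_backup()
            last_backup_date = now.date()
        stop_event.wait(poll_seconds)     # sleeps, but wakes early on shutdown


# Example checks for a 7-day interval that fires at 03:00.
due_after_week = backup_due(datetime(2024, 1, 8, 3, 15), date(2024, 1, 1), 7, 3)
due_too_early = backup_due(datetime(2024, 1, 7, 3, 15), date(2024, 1, 1), 7, 3)
due_wrong_hour = backup_due(datetime(2024, 1, 8, 4, 0), date(2024, 1, 1), 7, 3)
due_first_run = backup_due(datetime(2024, 1, 8, 3, 0), None, 7, 3)
```

Such a loop would normally be started once at application startup as a daemon thread (`threading.Thread(target=scheduler_loop, args=(...), daemon=True).start()`), so it exits together with the web server.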
3. Manage System Settings
When accessing the web interface from localhost or 127.0.0.1, click the "⚙️ System Settings" tab. Here you can:
- IP Management: Add or remove allowed IP addresses.
- Backup Settings: Configure "Backup Interval (Days)", "Backup Time (Hour)", and "Number of Backups to Retain".
- Manual Backup: Click "Create and Download Backup (ZIP)" to generate an immediate JSON backup.
- Manual Restore: Upload a JSON backup ZIP file to restore the database.
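The backup and retention behaviour behind these settings might be implemented along the following lines. The sketch dumps a table to JSON, zips it under the documented eml-indexer_YYYYMMDD.zip name, and removes the oldest archives beyond the retention count. The inner file name `emails.json`, the table name, and both function names are assumptions.

```python
import json
import sqlite3
import tempfile
import zipfile
from datetime import date
from pathlib import Path


def create_backup(conn, backup_dir, today=None):
    """Export every row of the (assumed) emails table as JSON inside a dated ZIP."""
    today = today or date.today()
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    cur = conn.execute("SELECT * FROM emails")
    columns = [col[0] for col in cur.description]
    records = [dict(zip(columns, row)) for row in cur]
    target = backup_dir / f"eml-indexer_{today:%Y%m%d}.zip"
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        # "emails.json" is a hypothetical member name.
        zf.writestr("emails.json", json.dumps(records, ensure_ascii=False))
    return target


def prune_backups(backup_dir, keep):
    """Delete the oldest archives so that at most `keep` remain; return removed names."""
    # YYYYMMDD in the file name makes lexicographic order chronological.
    backups = sorted(Path(backup_dir).glob("eml-indexer_*.zip"))
    stale = backups[:-keep] if keep > 0 else backups
    for path in stale:
        path.unlink()
    return [path.name for path in stale]


# Demo: three daily backups with a retention count of two.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emails (id INTEGER PRIMARY KEY, subject TEXT)")
conn.execute("INSERT INTO emails (subject) VALUES ('Hello')")
with tempfile.TemporaryDirectory() as tmp:
    for day in (1, 2, 3):
        latest = create_backup(conn, tmp, today=date(2024, 1, day))
    removed = prune_backups(tmp, keep=2)
    remaining = sorted(p.name for p in Path(tmp).glob("*.zip"))
    with zipfile.ZipFile(latest) as zf:
        restored = json.loads(zf.read("emails.json"))
```

A restore would do the reverse: read the JSON member from the uploaded ZIP and insert the records back into the database.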
Version History
- Added Excel Export functionality (includes file paths).
- Added Physical File Deletion functionality (Admin only).
- Updated documentation to include a Traditional Chinese version in references/SKILL-TW.md.
- Web-configurable scheduled backups (interval in days and specific hour).
- Backup filenames formatted as eml-indexer_YYYYMMDD.zip.
- Integrated scheduled check logic into the background thread.
- SKILL.md updated to English, with author changed to "威廉陳". Original Traditional Chinese SKILL.md moved to references/SKILL-TW.md.
License
MIT License
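Skill name: EML to SQLite Indexer
Detailed description:
EML to SQLite Indexer Skill (V7.0.0 - Management & Export Edition)
This skill indexes EML email files from a specified directory into an SQLite database and provides a feature-rich web interface for searching and management. It includes automatic deduplication, IP access control, Excel export, and a JSON-formatted scheduled backup and restore system configurable via the web interface.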
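Features
- Efficient Indexing: Uses MD5 fingerprinting for automatic deduplication, so the same email is never imported twice. Supports processing millions of email records.
- Key Information Extraction: Automatically parses and stores each email's sender, recipient, subject, body content, and sent time.
- Web Query Interface: Provides a Flask-based web interface with:
  - Advanced Search: Keywords (subject/body, case-insensitive), sender (fuzzy matching), recipient (fuzzy matching), and date range filtering.
  - Excel Export: Export search results to an Excel-compatible CSV file, including the original file path.
  - File Deletion: Delete specific emails from both the database and the physical disk (Admin only).
  - Pagination: Optimized for large datasets to prevent browser slowdowns.
- IP Access Control: Configurable whitelist of IP addresses allowed to access the web interface. By default, only localhost and 127.0.0.1 are allowed.
- Web-Configurable Scheduled Backup:
  - Scheduled Mode: Configure the backup frequency (e.g., every X days) and the specific execution hour (0-23) via the web interface.
  - JSON Format: Backups export email data as structured JSON and compress it into a ZIP file named eml-indexer_YYYYMMDD.zip, which is portable across platforms.
  - Automatic Circular Overwrite: The system retains a configured number of backups and deletes the oldest one when the limit is exceeded.
  - Manual Management: One-click download of JSON backup ZIP files, and upload of ZIP files for database restoration via the web interface.
- Admin-Exclusive Interface: When accessed from localhost or 127.0.0.1, a "⚙️ System Settings" tab is displayed, providing IP management, backup configuration, and deletion functionality.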
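Installation & Deployment
1. Environment Requirements
- Python 3.8+ (latest stable version recommended)
- Recommended OS: Windows, Linux
2. Dependency Installation
Ensure your Python environment has the following packages installed:
pip install -r requirements.txt
Or install manually:
pip install Flask tqdm
3. File Structure
eml_indexer/
├── app.py              # Web application (Flask), includes the scheduled backup thread
├── indexer.py          # Core EML indexing script
├── requirements.txt    # Python dependency list
├── SKILL.md            # Skill documentation (English)
├── config.json         # Runtime configuration (allowed IPs, backup frequency, retention count)
├── emails.db           # SQLite database file (created after running indexer.py)
├── backups/            # JSON backup directory (created automatically)
├── templates/
│   ├── detail.html     # Email detail page template
│   └── index.html      # Main email search and management page template
└── references/
    └── SKILL-TW.md     # Traditional Chinese version of the skill documentation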
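Usage
1. Index EML Emails
Run indexer.py to import EML files from a specified directory into the database. On subsequent runs it automatically skips emails that are already indexed.
python indexer.py <database path (default: emails.db)>
2. Start Web Query Interface (with Scheduled Backup)
Execute app.py to start the Flask web server. The background scheduled backup thread will also start automatically.
python app.py
After starting, visit http://localhost:5000 in your browser.
3. Manage System Settings
When accessing the web interface from localhost or 127.0.0.1, click the "⚙️ System Settings" tab. Here you can:
- IP Management: Add or remove allowed IP addresses.
- Backup Settings: Configure "Backup Interval (Days)", "Backup Time (Hour)", and "Number of Backups to Retain".
- Manual Backup: Click "Create and Download Backup (ZIP)" to generate an immediate JSON backup.
- Manual Restore: Upload a JSON backup ZIP file to restore the database.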
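Version History
- Added Excel Export functionality (includes file paths).
- Added Physical File Deletion functionality (Admin only).
- Updated documentation to include a Traditional Chinese version in references/SKILL-TW.md.
- Web-configurable scheduled backups (interval in days and specific hour).
- Backup filenames formatted as eml-indexer_YYYYMMDD.zip.
- Integrated scheduled check logic into the background thread.
- SKILL.md updated to English, with author changed to "威廉陳". Original Traditional Chinese SKILL.md moved to references/SKILL-TW.md.
License
MIT License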