Automating Zero-Downtime Blue-Green Deployments of Algolia Indices with GitHub Actions

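Our team's first approach to syncing the Algolia index was very direct: at the end of the deployment pipeline, run a PHP script that calls the scout:import command. For a small project with only a few thousand records, that was fine. Once the data grew to millions of records, it turned every release into a nightmare. A full re-index takes tens of minutes, and during that time the production search service sits in an unstable intermediate state: old data has been partly deleted and new data has not been fully written. Users either get no results or get an incomplete mix of old and new data. That is unacceptable in production.

The root cause is that the operation is not atomic. We need a mechanism that builds a fresh, complete index in the background without touching the current production index and then, once the build is done, moves traffic to the new index in a single atomic operation. This is the core idea of blue-green deployment, except that here we apply it to a search index rather than to an application service.

The initial plan looked like this: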

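Create a new index (blue): on each sync, do not clear the existing index; create a new index with a timestamp suffix instead, for example products_20231027_103000.
Full sync in the background: push all data into this new blue index. Meanwhile, all production search traffic keeps hitting the old, stable green index, products.
Atomic switch: once the new index is fully ready, call Algolia's moveIndex API to move it onto the primary name (for example products). Algolia has no separate alias object, so the "alias" in this article is simply the stable index name that the application queries. moveIndex renames the new index to that name and replaces the old contents atomically, so users do not notice the switch.
Clean up old indices: after the switch, keep one or two older versions as backups for fast rollback and delete anything earlier to control cost. Because moveIndex overwrites the destination, an older version only exists if the live index is copied aside (for example with copyIndex) before the switch.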
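
The rename-and-replace behaviour of the switch is easy to misread, so here is a toy in-memory model of it. This is only a sketch: InMemoryIndexStore and its methods are hypothetical names invented for illustration and do not talk to Algolia. It shows that after the move only the stable name remains, and that it serves the new data.

```php
<?php

declare(strict_types=1);

/**
 * Toy in-memory stand-in for a set of search indices (hypothetical, illustration only).
 * It models one behaviour: moving an index renames it and replaces the destination.
 */
final class InMemoryIndexStore
{
    /** @var array<string, array<int, array<string, mixed>>> index name => records */
    private array $indices = [];

    public function save(string $indexName, array $records): void
    {
        $this->indices[$indexName] = $records;
    }

    public function move(string $source, string $destination): void
    {
        if (!array_key_exists($source, $this->indices)) {
            throw new InvalidArgumentException("Unknown index [{$source}]");
        }
        // The destination is replaced in one assignment; the source name disappears.
        $this->indices[$destination] = $this->indices[$source];
        unset($this->indices[$source]);
    }

    /** @return string[] index names in alphabetical order */
    public function names(): array
    {
        $names = array_keys($this->indices);
        sort($names);
        return $names;
    }

    public function records(string $indexName): array
    {
        return $this->indices[$indexName] ?? [];
    }
}

$store = new InMemoryIndexStore();
$store->save('products', [['objectID' => '1', 'name' => 'old']]);
$store->save('products_20231027_103000', [['objectID' => '1', 'name' => 'new']]);

// The switch: the freshly built index takes over the stable name.
$store->move('products_20231027_103000', 'products');

print_r($store->names());
print_r($store->records('products'));
```

The old contents of products are gone after the move, which is why a rollback copy has to be made explicitly before switching.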

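The whole process has to be fully automated and integrated into our existing CI/CD pipeline. The technology choices are clear: the core business logic lives in a PHP application, so the indexing script is written in PHP, and the automation is handled by GitHub Actions.

Step 1: Build a robust PHP indexer

We need a command smarter than scout:import, one that takes on all four responsibilities: create, populate, switch, and clean up. We decided to write a custom Artisan command inside the Laravel framework.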

<?php

namespace App\Console\Commands;

use Algolia\AlgoliaSearch\SearchClient;
use App\Models\Product;
use Illuminate\Console\Command;
use Illuminate\Support\Facades\Log;
use Throwable;

class AlgoliaBlueGreenIndexCommand extends Command
{
    /**
     * The name and signature of the console command.
     *
     * @var string
     */
    protected $signature = 'algolia:reindex 
                            {--model= : The model class to index, e.g., App\\Models\\Product}
                            {--alias= : The Algolia index alias name, e.g., products}
                            {--keep=2 : The number of old indices to keep for rollback}';

    /**
     * The console command description.
     *
     * @var string
     */
    protected $description = 'Performs a zero-downtime re-indexing of a model using a blue-green strategy.';

    private SearchClient $algolia;

    public function __construct()
    {
        parent::__construct();
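        // In production these credentials should be injected through environment variables, never hard-coded.
        // GitHub Actions Secrets supply the values.
        $appId = config('scout.algolia.id');
        // Scout's config/scout.php stores the API key under "secret" (read from ALGOLIA_SECRET).
        $adminKey = config('scout.algolia.secret');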

        if (!$appId || !$adminKey) {
            throw new \InvalidArgumentException('Algolia credentials are not configured.');
        }

        $this->algolia = SearchClient::create($appId, $adminKey);
    }

    /**
     * Execute the console command.
     *
     * @return int
     */
    public function handle(): int
    {
        $modelClass = $this->option('model');
        $aliasName = $this->option('alias');
        $indicesToKeep = (int) $this->option('keep');

        if (!class_exists($modelClass)) {
            $this->error("Model class [{$modelClass}] does not exist.");
            return Command::FAILURE;
        }

        if (empty($aliasName)) {
            $this->error("The --alias option is required.");
            return Command::FAILURE;
        }

        $newIndexName = $aliasName . '_' . now()->format('Ymd_His');
        $this->info("🚀 Starting blue-green re-indexing process for alias [{$aliasName}]");
        $this->line("   - New temporary index: <fg=cyan>{$newIndexName}</fg=cyan>");

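        try {
            // Step 1: initialize and configure the new index
            $newIndex = $this->algolia->initIndex($newIndexName);
            $this->info("   - Creating and configuring new index...");
            
            // Use the settings of the live index (if it exists) as the template
            $sourceSettings = $this->getSettingsFromExistingIndex($aliasName);
            if (!empty($sourceSettings)) {
                $newIndex->setSettings($sourceSettings)->wait();
                $this->info("   - Settings copied from existing index.");
            } else {
                $this->warn("   - No existing index found for alias. Using default settings.");
                // Default index settings could be applied here
            }
            
            // Step 2: push the data into the new index
            $this->info("   - Indexing data from model [{$modelClass}]...");
            
            // Scout's makeAllSearchable() only accepts a chunk size and always writes to the
            // model's own searchableAs() index, so records are pushed to the new index directly.
            // chunkById keeps memory bounded on large tables.
            $modelClass::query()->chunkById(500, function ($models) use ($newIndex) {
                $records = $models->map(fn ($model) => array_merge(
                    $model->toSearchableArray(),
                    ['objectID' => (string) $model->getScoutKey()]
                ))->all();
                $newIndex->saveObjects($records)->wait();
            });

            $this->info("   - ✅ Data indexing complete.");

            // moveIndex overwrites the live index, so copy it aside first
            // if rollback copies are wanted (--keep greater than 0).
            if ($indicesToKeep > 0 && $this->algolia->initIndex($aliasName)->exists()) {
                $backupName = $aliasName . '_backup_' . now()->format('Ymd_His');
                $this->algolia->copyIndex($aliasName, $backupName)->wait();
                $this->info("   - Live index copied to [{$backupName}] for rollback.");
            }

            // Step 3: atomic switch. moveIndex renames the new index to the live name.
            $this->info("   - Atomically moving alias [{$aliasName}] to point to [{$newIndexName}]...");
            $this->algolia->moveIndex($newIndexName, $aliasName)->wait();
            $this->info("   - ✅ Alias switch complete. Search is now served by the new index.");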

            // Step 4: clean up old indices
            $this->info("   - Cleaning up old indices, keeping the last {$indicesToKeep} versions...");
            $this->cleanupOldIndices($aliasName, $indicesToKeep);

            $this->info("🎉 Blue-green re-indexing process finished successfully.");
            return Command::SUCCESS;

        } catch (Throwable $e) {
            // Critical error handling: if any step fails, the temporary index that was created must be removed.
            $this->error("❌ An error occurred during the process: " . $e->getMessage());
            Log::error('AlgoliaBlueGreenIndexCommand failed', [
                'alias' => $aliasName,
                'temp_index' => $newIndexName,
                'exception' => $e,
            ]);

            $this->info("   - Cleaning up failed temporary index [{$newIndexName}]...");
            if ($this->algolia->initIndex($newIndexName)->exists()) {
                $this->algolia->initIndex($newIndexName)->delete();
                $this->warn("   - Temporary index [{$newIndexName}] has been deleted.");
            }
            
            return Command::FAILURE;
        }
    }

    /**
     * Fetch the settings of the index that currently serves the alias name.
     */
    private function getSettingsFromExistingIndex(string $aliasName): array
    {
        try {
            $aliasIndex = $this->algolia->initIndex($aliasName);
            if ($aliasIndex->exists()) {
                return $aliasIndex->getSettings();
            }
        } catch (Throwable $e) {
            // If the alias does not exist or the request fails, fall through and return an empty array
            $this->warn("   - Could not retrieve settings for alias [{$aliasName}]: " . $e->getMessage());
        }
        return [];
    }

    /**
     * Delete old indices, keeping only the given number of most recent versions.
     */
    private function cleanupOldIndices(string $aliasName, int $keepCount): void
    {
        $indices = $this->algolia->listIndices()['items'] ?? [];
        
        $relatedIndices = collect($indices)
            ->filter(fn($index) => str_starts_with($index['name'], $aliasName . '_'))
            ->sortByDesc('createdAt')
            ->values();

        if ($relatedIndices->count() <= $keepCount) {
            $this->info("   - No old indices to clean up (found {$relatedIndices->count()}, keeping {$keepCount}).");
            return;
        }

        $indicesToDelete = $relatedIndices->slice($keepCount);

        if ($indicesToDelete->isEmpty()) {
            return;
        }
        
        $this->warn("   - Found {$indicesToDelete->count()} old indices to delete.");

        foreach ($indicesToDelete as $index) {
            try {
                $this->algolia->initIndex($index['name'])->delete();
                $this->line("     - Deleted: <fg=gray>{$index['name']}</fg=gray>");
            } catch (Throwable $e) {
                $this->error("     - Failed to delete index [{$index['name']}]: " . $e->getMessage());
                Log::warning('Failed to delete old Algolia index', [
                    'index_name' => $index['name'],
                    'exception' => $e,
                ]);
            }
        }
    }
}

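This script is the heart of the whole process. A few key design considerations:

Parameterization: the command is driven by the --model and --alias options, so it can be reused for any model in the project that needs indexing, which keeps it maintainable.
Settings inheritance: the new index automatically copies the settings of the old one (searchable attributes, ranking rules, and so on). This keeps index behaviour consistent and avoids the omissions that come with configuring it by hand each time.
Error handling and rollback: the try-catch block is the lifeline. If the data sync fails, the script catches the exception and deletes the temporary index that was created but not fully populated, so no garbage is left behind. A production-grade script needs this kind of robustness.
Resource cleanup: the logic that removes old indices automatically is essential. Without it, the number of indices in the Algolia account grows with every release, which makes management messy and drives up cost.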
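
The cleanup rule can be isolated as a pure function, which makes it easy to test without calling Algolia. The sketch below is an illustration, not the command's actual code: selectIndicesToDelete and its $protected parameter are hypothetical, and the index names are made up. It assumes each entry carries the name and createdAt fields used in the command above, and it adds one safeguard: a plain prefix match on "products_" would also catch unrelated indices such as a replica, so those can be listed as protected.

```php
<?php

declare(strict_types=1);

/**
 * Pick the versioned indices to delete, keeping the newest $keep.
 *
 * @param array<int, array{name: string, createdAt: string}> $indices
 * @param string[] $protected names that must never be deleted (e.g. replicas)
 * @return string[] names to delete, newest first
 */
function selectIndicesToDelete(array $indices, string $alias, int $keep, array $protected = []): array
{
    $prefix = $alias . '_';

    // Keep only indices that share the prefix and are not explicitly protected.
    $versioned = array_values(array_filter(
        $indices,
        static fn (array $index): bool =>
            strncmp($index['name'], $prefix, strlen($prefix)) === 0
            && !in_array($index['name'], $protected, true)
    ));

    // Timestamps in one consistent ISO 8601 UTC format sort correctly as strings.
    usort(
        $versioned,
        static fn (array $a, array $b): int => strcmp($b['createdAt'], $a['createdAt'])
    );

    return array_column(array_slice($versioned, max(0, $keep)), 'name');
}

$indices = [
    ['name' => 'products', 'createdAt' => '2023-10-01T00:00:00.000Z'],
    ['name' => 'products_price_asc', 'createdAt' => '2023-10-01T00:05:00.000Z'],
    ['name' => 'products_20231025_090000', 'createdAt' => '2023-10-25T09:00:00.000Z'],
    ['name' => 'products_20231026_090000', 'createdAt' => '2023-10-26T09:00:00.000Z'],
    ['name' => 'products_20231027_103000', 'createdAt' => '2023-10-27T10:30:00.000Z'],
];

$toDelete = selectIndicesToDelete($indices, 'products', 2, ['products_price_asc']);
print_r($toDelete);
```

With two versions kept, only the oldest timestamped index is selected; the live index and the protected replica are never candidates.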

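Step 2: Integrate with GitHub Actions

With a reliable script in place, the next step is to automate it. We created a new GitHub Actions workflow file, .github/workflows/algolia-reindex.yml.

The goal is to trigger the indexing process automatically when new code is merged into the main branch. We also need a manual trigger (workflow_dispatch) so that a full sync can be forced when necessary.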

name: Algolia Zero-Downtime Re-index

on:
  push:
    branches:
      - main
    paths:
      - 'app/Models/Product.php' # only trigger when this model or data source changes
      - 'database/migrations/**' # or when the database schema changes
  workflow_dispatch: # allow manual runs

jobs:
  blue-green-reindex:
    name: Run Algolia Blue-Green Indexing
    runs-on: ubuntu-latest
    # Set a timeout so a very large dataset cannot keep the job running indefinitely
    timeout-minutes: 60 

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup PHP
        uses: shivammathur/setup-php@v2
        with:
          php-version: '8.2'
          extensions: dom, curl, libxml, mbstring, zip, pcntl, pdo, sqlite, pdo_sqlite
          coverage: none

      - name: Get Composer Cache Directory
        id: composer-cache
        run: echo "dir=$(composer config cache-files-dir)" >> $GITHUB_OUTPUT

      - name: Cache Composer dependencies
        uses: actions/cache@v4
        with:
          path: ${{ steps.composer-cache.outputs.dir }}
          key: ${{ runner.os }}-composer-${{ hashFiles('**/composer.lock') }}
          restore-keys: |
            ${{ runner.os }}-composer-

      - name: Install Dependencies
        # --no-suggest is dropped here: Composer 2 deprecated it and it no longer has any effect
        run: composer install --prefer-dist --no-progress

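      - name: Run Algolia Blue-Green Re-indexing
        # This is the core step.
        # Sensitive credentials are managed through GitHub Secrets.
        env:
          # Settings Laravel would normally read from a .env file
          APP_KEY: ${{ secrets.APP_KEY }}
          DB_CONNECTION: sqlite
          DB_DATABASE: database/database.sqlite
          # Algolia credentials
          SCOUT_DRIVER: algolia
          ALGOLIA_APP_ID: ${{ secrets.ALGOLIA_APP_ID }}
          ALGOLIA_SECRET: ${{ secrets.ALGOLIA_ADMIN_KEY }} # an Admin API key is required
        run: |
          # In CI we create a temporary sqlite database so the Artisan command can boot.
          # Caution: an empty database yields an empty index. For a real re-index,
          # point the DB_* variables at a read-only replica of the production data.
          touch database/database.sqlite
          php artisan migrate --force

          # Run our custom command
          php artisan algolia:reindex --model="App\\Models\\Product" --alias="products" --keep=2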

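The workflow design also reflects production concerns:

Triggers (on): the trigger is tightly scoped. We do not want every commit to start an expensive index rebuild. It runs automatically only when the Product model (the data structure) or the database migrations (which may affect the data source) change. This is a trade-off between cost and efficiency.
Credential management: ALGOLIA_APP_ID and ALGOLIA_ADMIN_KEY must never be hard-coded in the code or the workflow file. They are configured under Settings > Secrets and variables > Actions in the GitHub repository and injected into the runtime environment through the env block.
Dependency caching: actions/cache speeds up later workflow runs considerably. Composer packages are cached and do not need to be downloaded again unless composer.lock changes.
Environment setup: note that DB_CONNECTION is set to sqlite. In CI we usually do not want to connect to the production database, and a temporary SQLite database is enough for Artisan to boot and load the Eloquent models. It is not enough for a real re-index, though: the command reads every record from the database, so an empty database produces an empty index that would then replace the live one. For an actual run, configure an accessible, read-only replica of the production database here.
Timeout control: timeout-minutes: 60 acts as a fuse. If the indexing process hangs for an unknown reason, the job fails after one hour instead of hanging indefinitely and consuming compute resources.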
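
Since a misconfigured data source would publish an empty or truncated index, a cheap plausibility check before the switch is worth considering. The following is a minimal sketch under stated assumptions: isSafeToSwitch and the 0.9 default ratio are hypothetical, and the two counts are assumed to be obtained elsewhere, for instance from the entries field that listIndices reports for each index.

```php
<?php

declare(strict_types=1);

/**
 * Decide whether a freshly built index is plausible enough to replace the live one.
 * $minRatio is a hypothetical threshold: the new index must hold at least this
 * fraction of the live record count.
 */
function isSafeToSwitch(int $liveCount, int $newCount, float $minRatio = 0.9): bool
{
    if ($newCount <= 0) {
        return false; // never publish an empty index
    }

    if ($liveCount <= 0) {
        return true; // first build: there is nothing to compare against
    }

    return ($newCount / $liveCount) >= $minRatio;
}

var_dump(isSafeToSwitch(1000000, 0));      // empty build, e.g. from an empty CI database
var_dump(isSafeToSwitch(1000000, 980000)); // small drift
var_dump(isSafeToSwitch(1000000, 500000)); // half the records are missing
```

In the command, a failed check could throw before the switch step so that the existing catch block removes the temporary index and the live index stays untouched.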

Visualizing the flow

To make the process easier to follow, the whole lifecycle can be described with a diagram.

graph TD
    subgraph "GitHub Repository"
        A[Push to main branch]
    end

    subgraph "GitHub Actions Runner"
        B(Workflow Triggered) --> C{Setup Environment};
        C --> D[Install PHP & Composer Deps];
        D --> E[Execute `php artisan algolia:reindex`];
    end

    subgraph "Algolia Infrastructure"
        F(Alias: products -> products_v1);
        G(New Index: products_v2);
        H(Atomic Alias Switch);
        I(Alias: products -> products_v2);
        J(Delete Old Index: products_v0);
    end
    
    A --> B;
    
    E -- 1. Create New Index --> G;
    E -- 2. Push Data --> G;
    E -- 3. Move Alias --> H;
    
    F -- Before Switch --> H;
    G -- After Switch --> H;
    
    H --> I;
    
    E -- 4. Cleanup --> J;

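The diagram shows how a code push triggers an automated, multi-stage index update that ends with a seamless switch on the Algolia side.

Pitfalls and limitations

Although this approach solves the core problem of zero-downtime updates, it is not a silver bullet in real projects.

First, cost. During the blue-green window we hold at least two full copies of the data in Algolia. For indices with tens or hundreds of millions of records, that means double the storage cost. The window is short, but for cost-sensitive businesses it is a factor to weigh. The --keep=2 option in the cleanup strategy also needs tuning against the rollback requirements and the budget, since every retained version is another full copy.

Second, very large datasets. Once a single full re-index takes longer than the maximum job run time on GitHub Actions (6 hours on GitHub-hosted runners), or the cost becomes unacceptable, the full-sync model has reached its limit. At that point we have to move to a more complex incremental or near-real-time sync, for example by consuming the database's change data capture (CDC) log (with a tool such as Debezium) and pushing changes to Algolia as they happen. That architecture is considerably more complex to build and operate.

Finally, the data consistency window. Between creating the new index and switching the alias, any new writes to the production database do not make it into the new index. At the moment of the switch, the index is therefore briefly behind the primary database. This lag is usually acceptable, but for scenarios with very strict freshness requirements (financial transaction data, for example) it has to be addressed. One option is to record a timestamp when the full sync starts and, just before switching the alias, pull all changes made since that timestamp.

This blue-green approach, built on GitHub Actions and PHP, gives medium and large projects an Algolia indexing strategy that is reliable, automated, and maintainable. It reduces the management of a complex index lifecycle to a single command and a piece of YAML, so developers can focus on the business instead of worrying about the search service at every release.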
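
The catch-up step described above can be sketched as a filter over rows with a modification timestamp. This is an illustration only: rowsChangedSince is a hypothetical helper, the rows are made-up sample data, and it assumes every row has an updated_at column. In a real command the filter would more likely be a database query, and rows deleted during the window would need separate handling because a deleted row has no updated_at left to compare.

```php
<?php

declare(strict_types=1);

/**
 * Return the rows modified at or after the moment the full sync started.
 *
 * @param array<int, array{id: int, updated_at: string}> $rows
 * @return array<int, array{id: int, updated_at: string}>
 */
function rowsChangedSince(array $rows, DateTimeImmutable $syncStartedAt): array
{
    return array_values(array_filter(
        $rows,
        static fn (array $row): bool =>
            new DateTimeImmutable($row['updated_at']) >= $syncStartedAt
    ));
}

// Recorded right before the full sync begins.
$syncStartedAt = new DateTimeImmutable('2023-10-27T10:30:00+00:00');

$rows = [
    ['id' => 1, 'updated_at' => '2023-10-27T09:00:00+00:00'], // already in the full sync
    ['id' => 2, 'updated_at' => '2023-10-27T10:45:00+00:00'], // changed during the sync
    ['id' => 3, 'updated_at' => '2023-10-27T10:30:00+00:00'], // changed exactly at the start
];

$changed = rowsChangedSince($rows, $syncStartedAt);
echo implode(',', array_column($changed, 'id')), PHP_EOL;
```

The comparison is inclusive on purpose: re-sending a row that was already indexed is harmless, while skipping one that changed at the boundary would leave stale data in the new index.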
