How to Run Batch Processing with a Custom Drush 9 Command in Drupal 8

Batch processing is usually an important aspect of any Drupal project and even more when we need to process huge amounts of data.

The main advantage of using batch is that it allows large amounts of data to be processed into small chunks or page requests that run without any manual intervention. Rather than use a single page load to process lots of data, the batch API allows the data to be processed as number of small page requests.

Using batch processing, we divide a large process into small pieces, each one of them executed as a separate page request, thereby avoiding stress on the server. This means that we can easily process 10,000 items without using up all of the server resources in a single page load.

This helps to ensure that the processing is not interrupted due to PHP timeouts, while users are still able to receive feedback on the progress of the ongoing operations.

Some uses of the batch API in Drupal:

  • Import or migrate data from an external source
  • Clean-up internal data
  • Run an action on several nodes
  • Communicate with an external API

Normally, batch jobs are launched by a form. However, what can we do if we want a *nix crontab to launch them on a regular basis? One of the best solutions is to use an external command like a custom Drush command and launch it from this crontab.

In this post we are going to create a custom Drush 9 command that loads all the nodes of a content type passed in as an argument (page, article ...). Then a batch process will simulate a long operation on each node. Next, we'll see how to run this drush command from crontab.

You can find the code for this module here: https://github.com/KarimBoudjema/Drupal8-ex-batch-with-drush9-command

This is the tree of the module:

tree web/modules/custom/ex_batch_drush9/
web/modules/custom/ex_batch_drush9/
|-- composer.json
|-- drush.services.yml
|-- ex_batch_drush9.info.yml
|-- README.txt
`-- src
    |-- BatchService.php
    `-- Commands
        `-- ExBatchDrush9Commands.php

We are going to proceed in three steps:

  1. Create a class to host our two main callback methods for batch processing (BatchService.php).
  2. Create our custom Drush 9 command to retrieve nodes, to create and process the batch sets (ExBatchDrush9Commands.php).
  3. Create a crontab task to run automatically the Drush command at scheduled times.

 

1. Create a BatchService class for the batch operations

A batch process is made of two main callbacks functions, one for processing each batch and the other for post-processing operations.

So in our class will have two methods, processMyNode() for processing each batch and processMyNodeFinished() to be launched when the batch processing is finished.

It's best practice to store the callback functions in their own file. This keeps them separate from anything else that your module might be doing. In this case I prefer to store them in a class that we can reuse later as a service.

Here is the code of BatchService.php

<?php
namespace Drupal\ex_batch_drush9;
/**
 * Class BatchService.
 */
class BatchService {
  /**
   * Batch process callback.
   *
   * @param int $id
   *   Id of the batch.
   * @param string $operation_details
   *   Details of the operation.
   * @param object $context
   *   Context for operations.
   */
  public function processMyNode($id, $operation_details, &$context) {
    // Simulate long process by waiting 100 microseconds.
    usleep(100);
    // Store some results for post-processing in the 'finished' callback.
    // The contents of 'results' will be available as $results in the
    // 'finished' function (in this example, processMyNodeFinished()).
    $context['results'][] = $id;
    // Optional message displayed under the progressbar.
    $context['message'] = t('Running Batch "@id" @details',
      ['@id' => $id, '@details' => $operation_details]
    );
  }
  /**
   * Batch Finished callback.
   *
   * @param bool $success
   *   Success of the operation.
   * @param array $results
   *   Array of results for post processing.
   * @param array $operations
   *   Array of operations.
   */
  public function processMyNodeFinished($success, array $results, array $operations) {
    $messenger = \Drupal::messenger();
    if ($success) {
      // Here we could do something meaningful with the results.
      // We just display the number of nodes we processed...
      $messenger->addMessage(t('@count results processed.', ['@count' => count($results)]));
    }
    else {
      // An error occurred.
      // $operations contains the operations that remained unprocessed.
      $error_operation = reset($operations);
      $messenger->addMessage(
        t('An error occurred while processing @operation with arguments : @args',
          [
            '@operation' => $error_operation[0],
            '@args' => print_r($error_operation[0], TRUE),
          ]
        )
      );
    }
  }
}

In the processMyNode()method we are going to process each element of our batch. As you can see, in this method we just simulate a long operation with the usleep() PHP function. Here we could load each node or connect with an external API. We also grab some information for post-processing.

In the processMyNodeFinished() method, we display relevant information to the user and we can even save the unprocessed operations for a later process.

2. Create the custom Drush 9 command to launch the batch

This is the most important part of our module. With this Drush command, we'll retrieve the data and fire the batch processing on those data.

Drush commands are now based on classes and the Annotated Command format. This will change the fundamental structure of custom Drush commands. This is great because we can now inject services in our command class and take advantage of all the OO power of Drupal 8.

A Drush command is composed of three files:

drush.services.yml - This is the file where our Drush command definition goes into. This is a Symfony service definition. Do not use your module's regular services.yml as you may have done in Drush 8 or else you will confuse the legacy Drush, which will lead to a PHP error.

You'll can see that in our example we inject two core services in our command class: entity_type.manager to access the nodes to process and logger.factory to log some pre-process and post-process information.

services:
  ex_batch_drush9.commands:
    class: \Drupal\ex_batch_drush9\Commands\ExBatchDrush9Commands
    tags:
      - { name: drush.command }
    arguments: ['@entity_type.manager', '@logger.factory']

composer.json - This is where we declare the location of the Drush command file for each version of Drush by adding the extra.drush.services section to the composer.json file of the implementing module. This is now optional, but will be required for Drush 10.

{
    "name": "org/ex_batch_drush9",
    "description": "This extension provides new commands for Drush.",
    "type": "drupal-drush",
    "authors": [
        {
            "name": "Author name",
            "email": "author@example.com"
        }
    ],
    "require": {
        "php": ">=5.6.0"
    },
    "extra": {
        "drush": {
            "services": {
                "drush.services.yml": "^9"
            }
        }
    }
}

MyModuleCommands.php - (src/Commands/ExBatchDrush9Commands.php in our case) It's in this class that we are going to define the custom Drush commands of our module. This class uses the Annotated method for commands, which means that each command is now a separate function with annotations that define its name, alias, arguments, etc. This class can also be used to define hooks with @hook annotation.

Some of the annotations available for use are:

@command: This annotation is used to define the Drush command. Make sure that you follow Symfony’s module:command structure for all your commands.
@aliases: An alias for your command.
@param: Defines the input parameters. For example, @param: integer $number
@option: Defines the options available for the commands. This should be an associative array where the name of the option is the key and the value could be - false, true, string, InputOption::VALUE_REQUIRED, InputOption::VALUE_OPTIONAL or an empty array.
@default: Defines the default value for options.
@usage: Demonstrates how the command should be used. For example, @usage: mymodule:command --option
@hook: Defines a hook to be fired. The default format is @hook type target, where type determines when the hook is called and target determines where the hook is called.

For a complete list of all the hooks available and their usage, refer to: https://github.com/consolidation/annotated-command

Here is the code for our Drush command.

<?php
namespace Drupal\ex_batch_drush9\Commands;
use Drupal\Core\Entity\EntityTypeManagerInterface;
use Drupal\Core\Logger\LoggerChannelFactoryInterface;
use Drush\Commands\DrushCommands;
/**
 * A Drush commandfile.
 *
 * In addition to this file, you need a drush.services.yml
 * in root of your module, and a composer.json file that provides the name
 * of the services file to use.
 *
 * See these files for an example of injecting Drupal services:
 *   - http://cgit.drupalcode.org/devel/tree/src/Commands/DevelCommands.php
 *   - http://cgit.drupalcode.org/devel/tree/drush.services.yml
 */
class ExBatchDrush9Commands extends DrushCommands {
  /**
   * Entity type service.
   *
   * @var \Drupal\Core\Entity\EntityTypeManagerInterface
   */
  private $entityTypeManager;
  /**
   * Logger service.
   *
   * @var \Drupal\Core\Logger\LoggerChannelFactoryInterface
   */
  private $loggerChannelFactory;
  /**
   * Constructs a new UpdateVideosStatsController object.
   *
   * @param \Drupal\Core\Entity\EntityTypeManagerInterface $entityTypeManager
   *   Entity type service.
   * @param \Drupal\Core\Logger\LoggerChannelFactoryInterface $loggerChannelFactory
   *   Logger service.
   */
  public function __construct(EntityTypeManagerInterface $entityTypeManager, LoggerChannelFactoryInterface $loggerChannelFactory) {
    $this->entityTypeManager = $entityTypeManager;
    $this->loggerChannelFactory = $loggerChannelFactory;
  }
  /**
   * Update Node.
   *
   * @param string $type
   *   Type of node to update
   *   Argument provided to the drush command.
   *
   * @command update:node
   * @aliases update-node
   *
   * @usage update:node foo
   *   foo is the type of node to update
   */
  public function updateNode($type = '') {
    // 1. Log the start of the script.
    $this->loggerChannelFactory->get('ex_batch_drush9')->info('Update nodes batch operations start');
    // Check the type of node given as argument, if not, set article as default.
    if (strlen($type) == 0) {
      $type = 'article';
    }
    // 2. Retrieve all nodes of this type.
    try {
      $storage = $this->entityTypeManager->getStorage('node');
      $query = $storage->getQuery()
        ->condition('type', $type)
        ->condition('status', '1');
      $nids = $query->execute();
    }
    catch (\Exception $e) {
      $this->output()->writeln($e);
      $this->loggerChannelFactory->get('ex_batch_drush9')->warning('Error found @e', ['@e' => $e]);
    }
    // 3. Create the operations array for the batch.
    $operations = [];
    $numOperations = 0;
    $batchId = 1;
    if (!empty($nids)) {
      foreach ($nids as $nid) {
        // Prepare the operation. Here we could do other operations on nodes.
        $this->output()->writeln("Preparing batch: " . $batchId);
        $operations[] = [
          '\Drupal\ex_batch_drush9\BatchService::processMyNode',
          [
            $batchId,
            t('Updating node @nid', ['@nid' => $nid]),
          ],
        ];
        $batchId++;
        $numOperations++;
      }
    }
    else {
      $this->logger()->warning('No nodes of this type @type', ['@type' => $type]);
    }
    // 4. Create the batch.
    $batch = [
      'title' => t('Updating @num node(s)', ['@num' => $numOperations]),
      'operations' => $operations,
      'finished' => '\Drupal\ex_batch_drush9\BatchService::processMyNodeFinished',
    ];
    // 5. Add batch operations as new batch sets.
    batch_set($batch);
    // 6. Process the batch sets.
    drush_backend_batch_process();
    // 6. Show some information.
    $this->logger()->notice("Batch operations end.");
    // 7. Log some information.
    $this->loggerChannelFactory->get('ex_batch_drush9')->info('Update batch operations end.');
  }
}

In this class, we first inject our two core services in the __construct() method: entity_type.manager and logger.factory.

Next, in the updateNode() annotated method we define our command with three annotations:

@param string $type - Defines the input parameter, the content type in our case.
@command update:node - Defines the name of the Drush command. In this case it's "update:node" so we would launch the command with: drush update:node
@aliases update-node - Defines an alias for the command

The main part of this command is the creation of the operations array for our batch processing (See points 3,4,5 and 6). Nothing strange here, we just define our operations array (3 and 4) pointing to the two callback functions that are located in our BatchService.php class.

Once the batch operations are added as new batch sets (5), we process the batch sets with the function drush_backend_batch_process(). This is a drop in replacement for the existing batch_process() function of Drupal. It will process a Drupal batch by spawning multiple Drush processes.

Finally, we show information to the user and log some information for a later use.

That's it! We can now test our brand new Drush 9 custom command!

To do so, just clear the cache with drush cr and launch the command with drush update:node

3. Bonus: Run the Drush command from crontab

As noted before, we want to run this custom command from crontab to perform the update automatically at the scheduled times.

The steps taken may vary based on your server's operating system. With Mac, Linux and Unix servers, we manage scheduled tasks by creating and editing crontabs that execute cron jobs at specified intervals.

1. Open a terminal window on your computer and enter the following command to edit your crontab - this will invoke your default editor (usually a flavor of vi):

crontab -e

2. Enter the following scheduled task with our Drush command (where [docroot_path] is your server's docroot):

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
*/5 * * * * drush --root=[docroot_path] update:node

We need to add an environment path in our cron table first line (cron commands run one after anonther).
This command will run our custom Drush command every five minutes.

3. Restart the cron service with the following command:

systemctl restart cron

4. Check if the command is running
We can check the log of our Drupal application every five minutes since we logged some information with the command itself, but we can also use another way with the following command:

sudo tail -f /var/mail/root

 

Recap.

- We created a class to host our two main callback methods for batch processing (BatchService.php).
- We created a custom Drush 9 command to retrieve nodes, to create and process the batch sets, injecting core services, like entity_type.manager to access the nodes to process and logger.factory to log some pre-process and post-process information.
- We created a crontab task to run automatically the Drush command at scheduled times.

This way, we can now run heavy processes, on a regular basis without putting undue stress on the server.