
Import processes

There are several ways to put data into Magento:
  • Magento’s existing flat-file import

  • The WebAPI, which also supports bulk and asynchronous operations

Why do we need the Pacemaker Import Framework?

Both the existing solutions and the Pacemaker Import Framework have their pros and cons.

Magento’s flat file import

Magento provides an implementation for the import of flat files:
  • The file must be uploaded via the Admin UI

  • A synchronous process validates and persists the data from the uploaded CSV file

Compared to the WebAPI, this approach can process large amounts of data quickly. Still, for this to happen, the web server must be configured not to abort this long-running request and to accept large amounts of data in a POST request.

  • The import is then executed immediately

    • Depending on the server load, this can lead to performance problems, which of course depends on your infrastructure

  • If your indexer mode is set to Update by Schedule, the import also causes a re-indexing of the data, which puts additional load on the server

Magento’s WebAPI

  • The standard WebAPI is not designed to handle large amounts of data

  • The bulk and asynchronous WebAPI can handle large amounts of data, but it is not particularly fast

  • Updating a single product can take 1-3 seconds, depending on the infrastructure

A catalog with 20k products would therefore take roughly between 5 and 8 hours to process, and you may find stores with several million products.
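
As a rough back-of-the-envelope check based on the per-product figures above (illustrative only, not a measurement of any particular infrastructure), the total runtime of a WebAPI-based import grows linearly with catalog size:

```python
# Illustrative arithmetic only; the per-product times are the rough estimates
# quoted above, not measurements.
def import_duration_hours(products: int, seconds_per_product: float) -> float:
    """Total import time in hours at a fixed per-product cost."""
    return products * seconds_per_product / 3600

print(import_duration_hours(20_000, 1.0))      # ~5.6 hours at 1 s per product
print(import_duration_hours(2_000_000, 1.0))   # ~556 hours for a 2M-product catalog
```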

The Pacemaker Import Framework import approach

The basic idea behind the Pacemaker Import Framework is to decouple third-party systems and user interactions from the resource-intensive processes on the Magento side.

However, the import process is not just about getting the data into the database. The pipelines are also capable of:
  • Retrieving, transforming, and persisting data

  • Performing post-import processes

Common Process Design

Import processes are generally designed as follows (a code sketch of this chain follows the list):
  1. Trigger the import process chain
    e.g. periodically (daily), via web service notification, by observing the file system, etc.

  2. Fetch data from the source
    e.g. reading files, calling WebAPIs, etc.

  3. Transform the data into the target format
    e.g. executing your own scripts, using external libraries, etc.

  4. Execute the import
    e.g. running the M2IF library, etc.

  5. Run indexers and invalidate caches
    e.g. using Magento’s APIs
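
The sketch below illustrates this chain of steps in plain Python. All function names are hypothetical and only mirror the list above; they are not part of the Pacemaker Import Framework API.

```python
from typing import Iterable

# Hypothetical step functions mirroring the five stages above (sketch only).
def fetch_source_data() -> Iterable[dict]:
    """Step 2: read files or call WebAPIs to retrieve the raw source data."""
    ...

def transform(rows: Iterable[dict]) -> Iterable[dict]:
    """Step 3: convert the source rows into the target import format."""
    ...

def run_import(rows: Iterable[dict]) -> None:
    """Step 4: hand the prepared data over to the import library (e.g. M2IF)."""
    ...

def reindex_and_invalidate_caches() -> None:
    """Step 5: run Magento's indexers and invalidate the caches."""
    ...

def run_pipeline() -> None:
    """Step 1 triggers this chain, e.g. on a schedule or a file-system event."""
    rows = fetch_source_data()
    run_import(transform(rows))
    reindex_and_invalidate_caches()
```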

By default, the Pacemaker import pipelines provide an observer for the local filesystem, which triggers the pipeline initialization.

The transformation step is intended for customization, since whether the files need to be transformed depends on your data source.

By default, the Pacemaker Import Framework library runs the import, and there are executors for Magento’s indexers and cache invalidation.

Import files observer

We use the techdivision/pacemaker-pipeline-initializer package to trigger the import pipelines once the required files are present in the file system.
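
To illustrate the idea only (this is not the actual techdivision/pacemaker-pipeline-initializer implementation, and the directory, pattern, and callback below are hypothetical), such an observer boils down to watching an import directory and starting a pipeline once a completion marker, such as the .ok files shown in the examples below, appears:

```python
import re
import time
from pathlib import Path

# Hypothetical directory and marker pattern for illustration only.
IMPORT_DIR = Path("/var/pacemaker/import")
OK_FILE = re.compile(r"^(?P<type>[a-z\-]+)-import_(?P<identifier>[0-9a-z\-]*)\.ok$")

def watch(trigger_pipeline, interval_seconds: int = 10) -> None:
    """Poll the import directory and trigger a pipeline for each new .ok marker."""
    seen = set()
    while True:
        for ok_file in IMPORT_DIR.glob("*.ok"):
            match = OK_FILE.match(ok_file.name)
            if match and ok_file not in seen:
                seen.add(ok_file)
                # The .ok file signals that the files of a bunch are in place.
                trigger_pipeline(match.group("type"), match.group("identifier"))
        time.sleep(interval_seconds)
```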

What is a file bunch?

Since the Pacemaker Import Framework is using M2IF, it is possible to split the import data into multiple files.

Because the Pacemaker Import Framework runs the attribute-set, attribute, category, and product imports in one pipeline, a bunch can grow to a large number of files.

All these files need the same identifier in the file name. This identifier is defined in the File Name Pattern configuration by this part of the regular expression: (?P<identifier>[0-9a-z\-]*).

According to the default expression, the file names must follow this pattern:

<IMPORT_TYPE>-import_<BUNCH_IDENTIFIER>_<COUNTER>.<SUFFIX>
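
For illustration, the following regular expression follows this naming scheme; only the identifier group is taken verbatim from the configuration above, while the rest of the expression is an assumption made for the sake of the example:

```python
import re

# Assumed full expression for illustration; only (?P<identifier>...) is quoted
# from the File Name Pattern configuration above.
FILE_NAME_PATTERN = re.compile(
    r"^(?P<type>[a-z\-]+)-import_(?P<identifier>[0-9a-z\-]*)_(?P<counter>\d+)\.(?P<suffix>[a-z]+)$"
)

match = FILE_NAME_PATTERN.match("product-import_20190627-1_01.csv")
print(match.group("type"))        # product
print(match.group("identifier"))  # 20190627-1
print(match.group("counter"))     # 01
print(match.group("suffix"))      # csv
```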

There are example files provided in the Pacemaker Import Framework packages; please refer to Run your first predefined import jobs.

Of course, you can change the expression if necessary; just take care to define an identifier group within the pattern.

Examples

The following files would result in one import pipeline because the identifier is the same for all files.

Also, only the attribute and product import steps would be executed; the attribute-set and category imports would be skipped because no files are given for them.

- attribute-import_20190627_01.csv
- attribute-import_20190627.ok
- product-import_20190627_01.csv
- product-import_20190627_02.csv
- product-import_20190627_03.csv
- product-import_20190627.ok

The following files would result in two import pipelines: the first bunch imports all entities, while the second bunch imports only product data.

- attribute-set-import_20190627-1_01.csv
- attribute-set-import_20190627-1.ok
- attribute-import_20190627-1_01.csv
- attribute-import_20190627-1.ok
- category-import_20190627-1_01.csv
- category-import_20190627-1.ok
- product-import_20190627-1_01.csv
- product-import_20190627-1_02.csv
- product-import_20190627-1_03.csv
- product-import_20190627-1.ok
- product-import_20190627-2_01.csv
- product-import_20190627-2_02.csv
- product-import_20190627-2_03.csv
- product-import_20190627-2.ok
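
A small sketch of the grouping logic (again using an assumed prefix pattern rather than Pacemaker code) shows how the example files above fall into two bunches and therefore two pipelines:

```python
import re
from collections import defaultdict

# Assumed prefix pattern; only the identifier group matches the documented one.
BUNCH_PREFIX = re.compile(r"^(?P<type>[a-z\-]+)-import_(?P<identifier>[0-9a-z\-]*)[._]")

files = [
    "attribute-set-import_20190627-1_01.csv",
    "category-import_20190627-1_01.csv",
    "product-import_20190627-1_01.csv",
    "product-import_20190627-1.ok",
    "product-import_20190627-2_01.csv",
    "product-import_20190627-2.ok",
]

bunches = defaultdict(list)
for name in files:
    match = BUNCH_PREFIX.match(name)
    if match:
        # Files sharing one identifier form a bunch and are imported together.
        bunches[match.group("identifier")].append(name)

print(sorted(bunches))  # ['20190627-1', '20190627-2'] -> two import pipelines
```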