Import processes
There are several ways to put data into Magento:

- Magento's already existing flat-file import
- Using the WebAPI, which also supports bulk and asynchronous operations
Why do we need the Pacemaker Import Framework?
Each of the existing solutions has pros and cons, as does the Pacemaker Import Framework; the following sections compare them.
Magento’s flat file import
Magento provides an implementation for the import of flat files:

- The file must be uploaded via the Admin UI
- The synchronous process validates and persists the data from the uploaded CSV file
- Compared to a WebAPI, this approach can quickly process immense amounts of data. Still, for this to happen, the web server must be configured not to abort this long-running request and to accept large amounts of data via POST.
- The import is then executed immediately
  - Depending on the server load, this can lead to performance problems, which of course depends on your infrastructure
- The import will cause a re-indexing of the data if your indexer mode is on schedule, which will also put load on the server resources
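For orientation, a minimal product CSV for this flat-file import might look like the following. The column names follow Magento's standard product import format, but the exact set of required columns depends on your attribute configuration, so treat this as an illustrative sample:

```csv
sku,product_type,attribute_set_code,name,price,qty
EXAMPLE-001,simple,Default,"Example Product",19.99,100
```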
Magento’s WebAPI
Importing via the WebAPI, on the other hand, is slow: a catalog with 20k products would result in a processing time between 5 and 8 hours, and you may find stores with several million products.
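To illustrate the bulk and asynchronous operations mentioned above, here is a minimal sketch of pushing products through Magento's asynchronous bulk REST API. It assumes Magento 2.3+ with message-queue consumers running; the store URL and credentials are placeholders, and the exact route prefix can vary by version:

```python
import requests

BASE = "https://shop.example.com"  # placeholder store URL

# Obtain an admin token (an integration token works as well).
token = requests.post(
    f"{BASE}/rest/V1/integration/admin/token",
    json={"username": "admin", "password": "secret"},
).json()

# The async bulk endpoint takes an array of payloads, each shaped
# like the body of a synchronous POST /V1/products request.
payload = [
    {
        "product": {
            "sku": f"SKU-{i:05d}",
            "name": f"Example product {i}",
            "attribute_set_id": 4,  # default attribute set
            "price": 19.99,
            "type_id": "simple",
        }
    }
    for i in range(1000)
]

resp = requests.post(
    f"{BASE}/rest/all/async/bulk/V1/products",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
# The response contains a bulk UUID; the actual persistence happens
# asynchronously in the message-queue consumers.
print(resp.json())
```

Even with the bulk endpoint, each item still passes through Magento's service layer one by one, which helps explain the processing times mentioned above.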
The Pacemaker Import Framework import approach
The basic idea behind the Pacemaker Import Framework is to decouple third-party systems and user interactions from the resource-intensive processes on the Magento side.
To handle significant amounts of data, we use the Pacemaker Import Framework, a robust framework built for exactly this purpose. However, the import process is not just about getting the data into the database. The pipelines are also capable of:

- Retrieving, transforming, and persisting data
- Performing post-import processes
Common Process Design
Import processes are mainly designed as follows:

- Trigger the import process chain, e.g. periodically (daily), via web service notification, by observing the file system, etc.
- Fetch data from the source, e.g. by reading files, calling WebAPIs, etc.
- Transform the data into the target format, e.g. by executing your own scripts, using external libraries, etc.
- Execute the import, e.g. by running the M2IF library, etc.
- Run indexers and invalidate caches, e.g. using Magento's APIs
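These five stages can be pictured as a simple chain. The following Python sketch is purely schematic; none of the function names correspond to actual Pacemaker APIs:

```python
from pathlib import Path

def trigger(source_dir: Path) -> bool:
    """Stage 1: start the chain, here by observing the file system."""
    return any(source_dir.glob("*.csv"))

def fetch(source_dir: Path) -> list[Path]:
    """Stage 2: fetch the data from the source, here by reading files."""
    return sorted(source_dir.glob("*.csv"))

def transform(files: list[Path]) -> list[Path]:
    """Stage 3: transform to the target format (the customization hook)."""
    return files  # no-op when the source already delivers the target format

def run_import(files: list[Path]) -> None:
    """Stage 4: execute the import, e.g. by handing the files to M2IF."""
    for f in files:
        print(f"importing {f.name}")

def post_process() -> None:
    """Stage 5: run indexers and invalidate caches via Magento's APIs."""
    print("reindexing and invalidating caches")

source = Path("var/import")  # placeholder watch directory
if trigger(source):
    run_import(transform(fetch(source)))
    post_process()
```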
By default, the Pacemaker import pipelines provide an observer for the local filesystem, which triggers the pipeline initialization.
The transformation step is intended for customization: whether the files need to be transformed depends on your data source.
By default, the M2IF library runs the import, and there are executors for Magento's indexers and cache invalidation.
Import files observer
We use the techdivision/pacemaker-pipeline-initializer package to trigger the import pipelines once the required files are present in the file system.
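Conceptually, such an observer boils down to a watch loop like the one below. This is an illustrative sketch, not the actual techdivision/pacemaker-pipeline-initializer implementation; the directory path and the role of the .ok marker files are assumptions based on the examples further down:

```python
import time
from pathlib import Path

WATCH_DIR = Path("var/pacemaker/import")  # placeholder watch directory

def watch(interval: float = 5.0) -> None:
    """Poll until an .ok file signals a complete bunch, then start a pipeline."""
    seen: set[Path] = set()
    while True:
        for ok_file in WATCH_DIR.glob("*.ok"):
            if ok_file not in seen:
                seen.add(ok_file)
                print(f"bunch '{ok_file.stem}' complete, initializing pipeline")
        time.sleep(interval)
```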
What is a file bunch?
Since the Pacemaker Import Framework builds on M2IF, it is possible to split the data of one import into multiple files. And because the framework runs the attribute-set, attribute, category, and product imports in one pipeline, a bunch can grow to a large number of files.
All these files need the same identifier in the file name. This identifier is defined in the File Name Pattern configuration within this part of the regular expression: (?P<identifier>[0-9a-z\-]*).
According to the default expression, the filenames need to follow this pattern: <IMPORT_TYPE>-import_<BUNCH_IDENTIFIER>_<COUNTER>.<SUFFIX>.
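Applied in code, the identifier group picks the bunch identifier out of such a filename. The full expression below is a reconstruction from the documented fragment and pattern, not the exact configuration value shipped with Pacemaker:

```python
import re

# Reconstructed: <IMPORT_TYPE>-import_<BUNCH_IDENTIFIER>_<COUNTER>.<SUFFIX>
PATTERN = re.compile(
    r"^(?P<type>[a-z\-]+)-import"
    r"_(?P<identifier>[0-9a-z\-]*)"
    r"(?:_(?P<counter>\d+))?"       # .ok marker files carry no counter
    r"\.(?P<suffix>[a-z]+)$"
)

m = PATTERN.match("product-import_20190627-1_02.csv")
print(m.group("type"))        # product
print(m.group("identifier"))  # 20190627-1
print(m.group("counter"))     # 02
```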
There are example files provided in the Pacemaker Import Framework packages; please refer to Run your first predefined import jobs. Of course, you can change the expression if necessary; just take care to define an identifier within the pattern.
Examples
The following files would result in one import pipeline because the identifier is the same for all files. Only the attribute and product import steps would be executed; the attribute-set and category imports would be skipped because no files are given for them.
- attribute-import_20190627_01.csv
- attribute-import_20190627.ok
- product-import_20190627_01.csv
- product-import_20190627_02.csv
- product-import_20190627_03.csv
- product-import_20190627.ok
The following files would result in two import pipelines: the first bunch imports all entities, while the second imports only product data.
- attribute-set-import_20190627-1_01.csv
- attribute-set-import_20190627-1.ok
- attribute-import_20190627-1_01.csv
- attribute-import_20190627-1.ok
- category-import_20190627-1_01.csv
- category-import_20190627-1.ok
- product-import_20190627-1_01.csv
- product-import_20190627-1_02.csv
- product-import_20190627-1_03.csv
- product-import_20190627-1.ok
- product-import_20190627-2_01.csv
- product-import_20190627-2_02.csv
- product-import_20190627-2_03.csv
- product-import_20190627-2.ok
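As a cross-check, grouping these filenames by the identifier group yields exactly the two bunches described above. This sketch reuses the reconstructed pattern from earlier, which may differ from the shipped configuration value:

```python
import re
from collections import defaultdict

PATTERN = re.compile(
    r"^(?P<type>[a-z\-]+)-import_(?P<identifier>[0-9a-z\-]*)"
    r"(?:_(?P<counter>\d+))?\.(?P<suffix>[a-z]+)$"
)

files = [
    "attribute-set-import_20190627-1_01.csv", "attribute-set-import_20190627-1.ok",
    "attribute-import_20190627-1_01.csv", "attribute-import_20190627-1.ok",
    "category-import_20190627-1_01.csv", "category-import_20190627-1.ok",
    "product-import_20190627-1_01.csv", "product-import_20190627-1_02.csv",
    "product-import_20190627-1_03.csv", "product-import_20190627-1.ok",
    "product-import_20190627-2_01.csv", "product-import_20190627-2_02.csv",
    "product-import_20190627-2_03.csv", "product-import_20190627-2.ok",
]

# One pipeline per distinct identifier.
bunches = defaultdict(list)
for name in files:
    match = PATTERN.match(name)
    if match:
        bunches[match.group("identifier")].append(name)

print(sorted(bunches))  # ['20190627-1', '20190627-2'] -> two pipelines
```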