Bundle org.nuxeo.importer.stream
This module defines a producer/consumer pattern and uses the Log features provided by Nuxeo Stream.
Producer/Consumer pattern with automation operations
The Log is used to perform mass import.
It decouples the Extraction/Transformation from the Load (using the ETL terminology).
The extraction and transformation are done by a document message producer with custom logic.
This module comes with a random document generator and a random blob generator that do the same job as the existing random importer.
The load into Nuxeo is done with a generic consumer.
Automation operations are exposed to run producers and consumers.
Two-step import: Generate and Import documents with blobs
- Run random producers of document messages. These messages represent Folder documents and File documents with a blob. The total number of documents created is:
nbThreads * nbDocuments
curl -X POST 'http://localhost:8080/nuxeo/site/automation/StreamImporter.runRandomDocumentProducers' -u Administrator:Administrator -H 'content-type: application/json' \
  -d '{"params":{"nbDocuments": 100, "nbThreads": 5}}'
Params | Default | Description |
nbDocuments | | The number of documents to generate per producer thread |
nbThreads | | The number of concurrent producers to run |
avgBlobSizeKB | | The average blob size for each File document in KB. If set to 0, File documents are created without a blob |
lang | | The locale used for the generated content, can be "fr_FR" or "en_US" |
logName | | The name of the Log |
logSize | | The number of partitions in the Log, which fixes the maximum number of consumer threads |
logBlobInfo | | A Log name containing blob information to use, see the section below for a use case |
- Run consumers of document messages to create Nuxeo documents. The concurrency will match the nbThreads parameter of the previous producers
curl -X POST 'http://localhost:8080/nuxeo/site/automation/StreamImporter.runDocumentConsumers' -u Administrator:Administrator -H 'content-type: application/json' \
  -d '{"params":{"rootFolder": "/default-domain/workspaces"}}'
Params | Default | Description |
rootFolder | | The path of the Nuxeo container in which to import documents; this document must exist |
repositoryName | | The repository name used to import documents |
nbThreads | | The number of concurrent consumers, should not be greater than the number of partitions in the Log |
batchSize | | The consumer commits documents after every batch of this size |
batchThresholdS | | The consumer commits documents if the transaction lasts longer than this threshold |
retryMax | | Number of times a consumer retries the import in case of failure |
retryDelayS | | Delay between retries |
logName | | The name of the Log to tail |
useBulkMode | | Process asynchronous listeners in bulk mode |
blockIndexing | | Do not index created documents with Elasticsearch |
blockAsyncListeners | | Do not process any asynchronous listeners |
blockPostCommitListeners | | Do not process any post commit listeners |
blockDefaultSyncListeners | | Disable some default synchronous listeners: dublincore, mimetype, notification, template, binarymetadata and uid |
4-step import: Generate and Import blobs, then Generate and Import documents
- Run producers of random blob messages
curl -X POST 'http://localhost:8080/nuxeo/site/automation/StreamImporter.runRandomBlobProducers' -u Administrator:Administrator -H 'content-type: application/json' \
  -d '{"params":{"nbBlobs": 100, "nbThreads": 5}}'
Params | Default | Description |
nbBlobs | | The number of blobs to generate per producer thread |
nbThreads | | The number of concurrent producers to run |
avgBlobSizeKB | | The average blob size for each File document in KB |
lang | | The locale used for the generated content, can be "fr_FR" or "en_US" |
logName | | The name of the Log to append blobs to |
logSize | | The number of partitions in the Log, which fixes the maximum number of consumer threads |
- Run consumers of blob messages importing into the Nuxeo binary store, saving blob information into a new Log.
curl -X POST 'http://localhost:8080/nuxeo/site/automation/StreamImporter.runBlobConsumers' -u Administrator:Administrator -H 'content-type: application/json' \
  -d '{"params":{"blobProviderName": "default", "logBlobInfo": "blob-info"}}'
Params | Default | Description |
blobProviderName | | The name of the binary store blob provider |
logName | | The name of the Log that contains the blobs |
logBlobInfo | | The name of the Log to append blob information about imported blobs |
nbThreads | | The number of concurrent consumers, should not be greater than the number of partitions in the Log |
retryMax | | Number of times a consumer retries the import in case of failure |
retryDelayS | | Delay between retries |
- Run producers of random Nuxeo document messages which use the blobs created in step 2
curl -X POST 'http://localhost:8080/nuxeo/site/automation/StreamImporter.runRandomDocumentProducers' -u Administrator:Administrator -H 'content-type: application/json' \
  -d '{"params":{"nbDocuments": 200, "nbThreads": 5, "logBlobInfo": "blob-info"}}'
Same parameters as in the previous runRandomDocumentProducers call; here we set the logBlobInfo parameter.
- Run consumers of document messages
curl -X POST 'http://localhost:8080/nuxeo/site/automation/StreamImporter.runDocumentConsumers' -u Administrator:Administrator -H 'content-type: application/json' \
  -d '{"params":{"rootFolder": "/default-domain/workspaces"}}'
Same parameters as in the previous runDocumentConsumers call.
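The four calls above can be chained in a small script. The following is only a sketch: the operation names and parameters are the ones used in the steps above, while the helper names and the NUXEO_URL, NUXEO_AUTH and DRY_RUN variables are assumptions introduced for illustration.

```shell
# Sketch of the four-step import as shell functions.
# NUXEO_URL, NUXEO_AUTH and DRY_RUN are illustrative names, not importer options.
NUXEO_URL="${NUXEO_URL:-http://localhost:8080/nuxeo}"
NUXEO_AUTH="${NUXEO_AUTH:-Administrator:Administrator}"

# Wrap a JSON object of parameters into the body sent to the operation.
automation_body() {
  printf '{"params":%s}' "$1"
}

# Call one StreamImporter operation; with DRY_RUN=1 only print the request.
run_operation() {
  local operation="$1" params="$2"
  local url="$NUXEO_URL/site/automation/StreamImporter.$operation"
  local body
  body="$(automation_body "$params")"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'POST %s %s\n' "$url" "$body"
  else
    curl -sS --fail -X POST "$url" -u "$NUXEO_AUTH" \
      -H 'content-type: application/json' -d "$body"
  fi
}

# Run the four steps in order, stopping at the first failing call.
four_step_import() {
  run_operation runRandomBlobProducers '{"nbBlobs": 100, "nbThreads": 5}' &&
  run_operation runBlobConsumers '{"blobProviderName": "default", "logBlobInfo": "blob-info"}' &&
  run_operation runRandomDocumentProducers '{"nbDocuments": 200, "nbThreads": 5, "logBlobInfo": "blob-info"}' &&
  run_operation runDocumentConsumers '{"rootFolder": "/default-domain/workspaces"}'
}
```

Running `DRY_RUN=1 four_step_import` prints the four requests without contacting the server, which is a convenient way to review the parameters before a real import.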
Create blobs using existing files
Create a file containing the list of files to import then:
- Generate blob messages corresponding to the files, dispatch the messages into 4 partitions:
curl -X POST 'http://localhost:8080/nuxeo/site/automation/StreamImporter.runFileBlobProducers' -u Administrator:Administrator -H 'content-type: application/json' \
  -d '{"params":{"listFile": "/tmp/my-file-list.txt", "logSize": 4}}'
Params | Default | Description |
listFile | | The path to the listing file |
basePath | '' | The base path to use as prefix of each file listed in the listFile |
nbBlobs | 0 | The number of blobs to generate per producer thread, 0 means all entries, loop on listFile entries if necessary |
nbThreads | | The number of concurrent producers to run |
logName | | The name of the Log to append blobs to |
logSize | | The number of partitions in the Log, which fixes the maximum number of consumer threads |
Then you can use the 3 other steps described in the section above to import blobs with 4 threads and create documents.
Note that the type of document will be adapted to the detected MIME type of the file so that:
- an image file will generate a dedicated image document
- a video file will generate a dedicated video document
- any other type will be translated to a generic file document
Generate random files for testing purposes
For testing purposes it can be handy to generate different files from an existing one. The goal is to generate lots of unique files from a limited set of files.
To do this, first generate blob messages pointing to files (see the previous section) and choose a number of blobs corresponding to the expected number of blobs to import (use a number greater than the number of existing files).
The next step is to add a special option to the blob consumer so that, instead of importing the existing file as is, a watermark is added to the blob before importing it.
- Run consumers of blob messages, adding a watermark to each file and importing it into the Nuxeo binary store, saving blob information into a new Log.
curl -X POST 'http://localhost:8080/nuxeo/site/automation/StreamImporter.runBlobConsumers' -u Administrator:Administrator -H 'content-type: application/json' \
  -d '{"params":{"watermark": "foo"}}'
The additional parameters are:
Params | Default | Description |
watermark | | Ask to add a watermark to the file before importing it, use the provided string if possible |
persistBlobPath | | Use a path if you want to keep the generated files on disk |
blobProviderName | | If blank there is no Nuxeo blob import, this can be useful for import with Gatling/Redis |
Continue with the other steps described above to generate and create documents.
Note that only a few MIME types are supported for watermarking so far:
- text files: insert a unique tag at the beginning of the text.
- image/jpeg: set the EXIF software tag to a unique tag.
- video/mp4: set the title to the unique tag.
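To make the idea concrete for the text case, here is a standalone illustration of deriving several unique files from one source by inserting a tag at the beginning. It is not the importer's implementation: the function names, the tag format and the output file names are all assumptions.

```shell
# Copy a text file, inserting the given tag as its first line.
# Illustration only: the importer's actual tag format may differ.
watermark_text() {
  local src="$1" dest="$2" tag="$3"
  { printf '%s\n' "$tag"; cat "$src"; } > "$dest"
}

# Derive count unique copies of src into out_dir, tagged <prefix>-<n>.
make_unique_copies() {
  local src="$1" out_dir="$2" prefix="$3" count="$4" i=1
  mkdir -p "$out_dir"
  while [ "$i" -le "$count" ]; do
    watermark_text "$src" "$out_dir/copy-$i.txt" "$prefix-$i"
    i=$((i + 1))
  done
}
```

Because every copy starts with a different tag, each one has different content, so a store that deduplicates identical binaries would still see them as distinct files.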
Import documents using the REST API via Gatling/Redis
Instead of doing a mass import that creates documents in batches with the efficient internal API, you can save the document messages into Redis in a form that Gatling simulations can use. This makes it possible to stress the REST API.
To do this, replace the document creation step (step 4) with:
- Run Redis consumers of document messages
curl -X POST 'http://localhost:8080/nuxeo/site/automation/StreamImporter.runRedisDocumentConsumers' -u Administrator:Administrator -H 'content-type: application/json' \
  -d '{"params":{"rootFolder": "/default-domain/workspaces"}}'
Note that Nuxeo must be configured with Redis.
After this, run the following Gatling simulations:
# init the infra, creating a group of test users and a workspace
mvn -nsu gatling:test -Dgatling.simulationClass=org.nuxeo.cap.bench.Sim00Setup -Pbench -DredisDb=0 -Durl=http://localhost:8080/nuxeo
# import the folder structure
mvn -nsu gatling:test -Dgatling.simulationClass=org.nuxeo.cap.bench.Sim10CreateFolders -Pbench -DredisDb=0 -Durl=http://localhost:8080/nuxeo
# import the documents using 8 concurrent users
mvn -nsu gatling:test -Dgatling.simulationClass=org.nuxeo.cap.bench.Sim20CreateDocuments -Pbench -DredisDb=0 -Dusers=8 -Durl=http://localhost:8080/nuxeo
The node running the Gatling simulation must have access to the files to import.
Here is an overview of possible usage to generate mass import and load tests with the stream importer:
Visit nuxeo-jsf-ui-gatling for more information.
To build and run the tests, simply start the Maven build:
mvn clean install
About Nuxeo
Nuxeo dramatically improves how content-based applications are built, managed and deployed, making customers more agile, innovative and successful. Nuxeo provides a next generation, enterprise ready platform for building traditional and cutting-edge content oriented applications. Combining a powerful application development environment with SaaS-based tools and a modular architecture, the Nuxeo Platform and Products provide clear business value to some of the most recognizable brands including Verizon, Electronic Arts, Sharp, FICO, the U.S. Navy, and Boeing. Nuxeo is headquartered in New York and Paris. More information is available at www.nuxeo.com.
Parent Documentation: README.md
Nuxeo Platform Importer
About Nuxeo Platform Importer
The file importer comes as a Java library (with a Nuxeo runtime service) and a sample JAX-RS interface to launch, monitor and abort import jobs. This is an ongoing project, supported by Nuxeo.
How to Build Nuxeo Platform Importer
Build the Nuxeo Platform Importer with Maven:
$ mvn install -Dmaven.test.skip=true
Nuxeo Platform Importer is available as two package add-ons from the Nuxeo Marketplace:
- https://connect.nuxeo.com/nuxeo/site/marketplace/package/nuxeo-platform-importer
- https://connect.nuxeo.com/nuxeo/site/marketplace/package/nuxeo-scan-importer
The documentation for Nuxeo Platform Importer is available in our Documentation Center: http://doc.nuxeo.com/x/gYBVAQ
Reporting Issues
You can follow the developments in the Nuxeo Platform project of our JIRA bug tracker, which includes a Nuxeo Platform Importer component: https://jira.nuxeo.com/browse/NXP/component/10621
You can report issues on: http://answers.nuxeo.com/
Resolution Order
You can influence this order by adding "require" tags in the component declaration, to make sure it is resolved after another component. It will also impact the order in which contributions are registered on their target extension point (see "Registration Order" on contributions).
Maven Artifact
File | nuxeo-importer-stream-2021.54.6.jar |
Group Id | org.nuxeo.ecm.platform |
Artifact Id | nuxeo-importer-stream |
Version | 2021.54.6 |
Manifest-Version: 1.0
Archiver-Version: Plexus Archiver
Created-By: Apache Maven
Built-By: root
Build-Jdk: 11.0.23
Bundle-ManifestVersion: 1
Bundle-Version: 2021.54.6-t20240514-134315
Bundle-SymbolicName: org.nuxeo.importer.stream;singleton:=true
Bundle-Name: Nuxeo Importer Stream
Bundle-Vendor: Nuxeo
Nuxeo-Component: OSGI-INF/operations-contrib.xml