Edge Platform Monitoring
The health monitoring interface tracks the health status of all Gateway Agent components by querying them for diagnostic information.
The health monitoring mechanism is implemented as a shared library (.so) that is loaded by all system components, providing a common health monitoring mechanism for the Gateway Agent platform.
The Configuration Manager uses the health monitoring mechanism to detect which Gateway Agent components have started, based on the periodic reports it receives from them. For more information, see the Configuration Manager's documentation.
The health monitoring mechanism is based on the following assumptions:
Every Gateway Agent component sends its heartbeat messages periodically.
Every Gateway Agent component is obliged to respond to discovery and status requests.
Discovery requests and detailed status requests are broadcast to all Gateway Agent components at once. Every component that is able to respond must do so as soon as possible.
For additional information on health monitoring, refer to the API documentation.
Supported Services
Service | Description |
---|---|
Discovery service | Every Gateway Agent component is required to respond to discovery broadcasts with a predefined JSON packet containing basic identification and status information. The response is a short report about the Gateway Agent component. |
Status service | This service enables obtaining detailed and specific information about a single Gateway Agent component (e.g. Protocol Adapter). The response is a full report about a Gateway Agent component. |
Heartbeat | A periodic message published by every Gateway Agent component. This message contains basic information about the component and its current status. |
Startup messaging | Every Gateway Agent component is required to publish a predefined JSON packet upon every startup or restart. The publication of a startup message marks the end of the component's initialization sequence. |
Shutdown messaging | Every Gateway Agent component is required to publish a predefined JSON packet when the component shutdown is initiated (e.g. on receiving a SIGTERM signal). |
Component Identification
Each instance of a Gateway Agent component is identified by a unique ID. The instance ID is an 8-character hex string calculated as the CRC32 of the absolute path to the configuration file of the component instance.
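As a rough illustration of how such an ID could be derived, the sketch below hashes an absolute path with the standard zlib CRC32 and formats the result as eight uppercase hex digits. The CRC32 variant, the UTF-8 encoding of the path, and the helper names `crc32_hex` and `instance_id` are assumptions made for this example; the platform's own library may differ in those details.

```python
import os
import zlib


def crc32_hex(data: bytes) -> str:
    """Return the CRC32 of data as an 8-character uppercase hex string."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08X")


def instance_id(config_path: str) -> str:
    """Derive an instance ID from the absolute path of a configuration file.

    Assumption: the path is UTF-8 encoded and hashed with the standard
    (zlib) CRC32; the variant actually used by the platform is not stated.
    """
    absolute = os.path.abspath(config_path)
    return crc32_hex(absolute.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical path, used only for illustration.
    print(instance_id("/etc/gwa/analytics.json"))
```

Only the path is hashed, never the file contents, so editing the configuration file leaves the ID untouched while moving the file produces a new ID.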
To discover all Gateway Agent components and their instances running on a platform, a broadcast Discovery Request message is used. Each component must respond with a Discovery Response message containing the unique ID of the component instance.
Here is an example of a discovery request:
mosquitto_pub -h 127.0.0.1 -p 1883 -t "v1/services" -m "short?"
It results in the following discovery response:
v1/services/F26FCC04 {
"id": "F26FCC04",
"name": "gwa-analytics",
"pid": "9431",
"report-type": "SHORT",
"project-config": null,
"project-config-schema": null,
"project-config-reload": null,
"version": "2.1.0",
"build-tag": "b10",
"build-time": "Fri Apr 22 09:41:31 CEST 2018",
"revision": "79df48306c70bff1c8cda4db66c33f604ec0ff9c",
"state": "RUNNING"
}
v1/services/E92C08EA {
"id": "E92C08EA",
"name": "gwa-storage-service",
"pid": "9147",
"report-type": "SHORT",
"project-config": null,
"project-config-schema": null,
"project-config-reload": null,
"version": "2.1.0",
"build-tag": "custom",
"build-time": "Fri Apr 22 09:41:34 CEST 2018",
"revision": "c7d68d82b605e4b90ab294c53aa3d0e6da18f6d8",
"state": "RUNNING"
}
As illustrated above, a discovery message posted to the v1/services topic results in multiple responses from multiple components and their instances. Every response must be published to the v1/services/<custom ID> topic.
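On the receiving side, a client gets each response as a topic and a JSON payload. The following sketch decodes one response and checks that the ID in the topic matches the `id` field of the report, as it does in the examples above. Treating a mismatch as an error is an assumption of this sketch, and `parse_report` is a hypothetical helper, not part of the platform API.

```python
import json

DISCOVERY_TOPIC = "v1/services"


def parse_report(topic: str, payload: str) -> dict:
    """Decode a report and check that it was published on its own topic."""
    prefix = DISCOVERY_TOPIC + "/"
    if not topic.startswith(prefix):
        raise ValueError(f"unexpected topic: {topic}")
    topic_id = topic[len(prefix):]
    report = json.loads(payload)
    # Assumption: the topic suffix and the "id" field always agree.
    if report.get("id") != topic_id:
        raise ValueError(f"topic ID {topic_id} does not match report ID {report.get('id')}")
    return report


if __name__ == "__main__":
    payload = '{"id": "F26FCC04", "name": "gwa-analytics", "report-type": "SHORT", "state": "RUNNING"}'
    report = parse_report("v1/services/F26FCC04", payload)
    print(report["name"], report["state"])
```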
The instance ID is implemented as the CRC32 checksum of the absolute path to the configuration file of the instance. Because the checksum is computed from the path rather than from the file contents, modifying the configuration file does not change the ID, and the instance remains uniquely identified.
Message Flow
All status and health reports sent by the Gateway Agent components are published on the v1/services/<component ID> topic. The type of the given report is defined in the message body.
The following message types are supported:
Message | Description |
---|---|
STARTUP | Sent when a Gateway Agent component is fully initialized and ready to work. |
HEARTBEAT | Sent periodically at a specified time interval. The default heartbeat_interval is 60 seconds, but it is configurable in the configuration file of each Gateway Agent component. |
SHUTDOWN | Sent when a Gateway Agent component starts the shutdown sequence. |
SHORT | Sent on demand, used to discover the topology of the Gateway Agent system. |
FULL | Sent on demand, used to gather full status data of the Gateway Agent system. |
See the Message Examples section for examples of the message types listed above.
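A consumer of these messages could use them to keep a simple liveness table. The sketch below is one possible approach, not the Configuration Manager's actual logic: any report refreshes an instance, a SHUTDOWN report removes it, and an instance is considered stale after a number of missed heartbeats. The `LivenessTracker` class and the tolerance of two intervals are hypothetical; only the 60-second default comes from the table above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

HEARTBEAT_INTERVAL_S = 60.0  # default interval stated in the table above
MISSED_BEATS_ALLOWED = 2     # hypothetical tolerance, not from the source


@dataclass
class LivenessTracker:
    """Track the time of the last report from each component instance."""
    last_seen: Dict[str, float] = field(default_factory=dict)

    def on_report(self, report: dict, now: float) -> None:
        """Update the tracker from a decoded report message."""
        instance = report["id"]
        if report["report-type"] == "SHUTDOWN":
            # The component announced its shutdown; stop tracking it.
            self.last_seen.pop(instance, None)
        else:
            # STARTUP, HEARTBEAT, SHORT and FULL all show the instance is alive.
            self.last_seen[instance] = now

    def stale(self, now: float) -> List[str]:
        """Return the IDs that have been silent longer than the allowed window."""
        limit = HEARTBEAT_INTERVAL_S * MISSED_BEATS_ALLOWED
        return sorted(i for i, t in self.last_seen.items() if now - t > limit)


if __name__ == "__main__":
    tracker = LivenessTracker()
    tracker.on_report({"id": "F26FCC04", "report-type": "STARTUP"}, now=0.0)
    tracker.on_report({"id": "E92C08EA", "report-type": "HEARTBEAT"}, now=100.0)
    # At t=150 s the first instance has been silent for more than 120 s.
    print(tracker.stale(now=150.0))
```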
The discovery and full status requests are both sent to the v1/services topic, with the following request bodies:
- For short discovery:
"short?"
- For full status:
"full?"
Even though both requests are broadcast to all Gateway Agent components, using the full status report for discovery purposes is not recommended. A full status report may contain data that is time-consuming to retrieve, so the response may arrive too late for it to serve as a reliable discovery mechanism.
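The effect of slow responses can be seen in a client that gathers replies for a fixed time window, which is a common way to consume broadcast responses. In the sketch below, `collect_responses` and the use of a `queue.Queue` as the inbox are assumptions for illustration; a real client would fill the queue from its MQTT message callback. Any reply that arrives after the window closes is simply missed, which is why a quick short report suits discovery better than a full one.

```python
import queue
import time


def collect_responses(inbox: queue.Queue, window_s: float) -> list:
    """Return every item that arrives in inbox within window_s seconds."""
    deadline = time.monotonic() + window_s
    responses = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # window closed; later replies are not collected
        try:
            responses.append(inbox.get(timeout=remaining))
        except queue.Empty:
            break  # nothing else arrived before the deadline
    return responses


if __name__ == "__main__":
    inbox = queue.Queue()
    inbox.put({"id": "F26FCC04", "report-type": "SHORT"})
    inbox.put({"id": "E92C08EA", "report-type": "SHORT"})
    print(len(collect_responses(inbox, window_s=0.1)))
```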
All report messages are encoded as JSON objects containing a subset of the following fields:
Field | Description |
---|---|
id | Unique identifier of a Gateway Agent component instance (CRC32 of the absolute path to the configuration file). |
name | Name of the Gateway Agent component. |
pid | PID of the running process. |
report-type | Type of the report message (STARTUP, HEARTBEAT, SHUTDOWN, SHORT, FULL). |
project-config | Path to the project configuration file of the instance; null means the component does not define project level configuration. |
project-config-schema | Path to the project configuration schema file of the component; null means that the component does not define project level configuration. |
project-config-reload | Optional command for triggering the configuration reload; null means the configuration is applied dynamically by the component. |
version | Gateway Agent release version string. |
build-tag | Gateway Agent component build identifier. |
build-time | Gateway Agent component build time. |
revision | Source code revision. |
state | State of the Gateway Agent component (RUNNING, TERMINATING, ZOMBIE). |
signal | Optional field set to signum if the Gateway Agent component is handling a signal and set to 0 if no signals are handled. |
uptime | Number of milliseconds elapsed since the Gateway Agent component startup. |
diagnostics | Custom diagnostics data encoded as a JSON object. The content is specific to the given Gateway Agent component and may contain both simple and complex JSON values. |
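A consumer may want to sanity-check incoming reports against the field list above. The following sketch checks the enumerated values of `report-type` and `state` and the type of `diagnostics`. Treating `id`, `name`, `pid`, `report-type` and `state` as mandatory is an assumption based only on the fact that every example message in this document carries them; `validate_report` is a hypothetical helper.

```python
REPORT_TYPES = {"STARTUP", "HEARTBEAT", "SHUTDOWN", "SHORT", "FULL"}
STATES = {"RUNNING", "TERMINATING", "ZOMBIE"}
# Fields present in every example message in this document; treating them
# as mandatory is an assumption of this sketch.
COMMON_FIELDS = ("id", "name", "pid", "report-type", "state")


def validate_report(report: dict) -> list:
    """Return a list of problems found in a decoded report (empty if none)."""
    problems = []
    for name in COMMON_FIELDS:
        if name not in report:
            problems.append(f"missing field: {name}")
    report_type = report.get("report-type")
    if report_type is not None and report_type not in REPORT_TYPES:
        problems.append(f"unknown report-type: {report_type!r}")
    state = report.get("state")
    if state is not None and state not in STATES:
        problems.append(f"unknown state: {state!r}")
    diagnostics = report.get("diagnostics")
    if diagnostics is not None and not isinstance(diagnostics, dict):
        problems.append("diagnostics must be a JSON object")
    return problems


if __name__ == "__main__":
    heartbeat = {
        "id": "F26FCC04",
        "name": "gwa-analytics",
        "pid": "9431",
        "report-type": "HEARTBEAT",
        "state": "RUNNING",
    }
    print(validate_report(heartbeat))
```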
Message Examples
This section provides examples of messages for all available report types.
Startup Message
A startup message is published when the component initialization is finished:
v1/services/F26FCC04 {
"id": "F26FCC04",
"name": "gwa-analytics",
"pid": "9431",
"report-type": "STARTUP",
"project-config": null,
"project-config-schema": null,
"project-config-reload": null,
"version": "2.1.0",
"build-tag": "custom",
"build-time": "Fri Apr 22 09:41:31 CEST 2018",
"revision": "79df48306c70bff1c8cda4db66c33f604ec0ff9c",
"state": "RUNNING"
}
Heartbeat Message
A heartbeat message is published periodically for each Gateway Agent component:
v1/services/F26FCC04 {
"id": "F26FCC04",
"name": "gwa-analytics",
"pid": "9431",
"report-type": "HEARTBEAT",
"state": "RUNNING"
}
Shutdown Message
A shutdown message is published when the component initiates the shutdown sequence:
v1/services/F26FCC04 {
"id": "F26FCC04",
"name": "gwa-analytics",
"pid": "9431",
"report-type": "SHUTDOWN",
"state": "TERMINATING"
}
Discovery Report Message
A short discovery report is published on demand:
v1/services/F26FCC04 {
"id": "F26FCC04",
"name": "gwa-analytics",
"pid": "24112",
"report-type": "SHORT",
"project-config": null,
"project-config-schema": null,
"project-config-reload": null,
"version": "2.1.0",
"build-tag": "custom",
"build-time": "Fri Apr 22 09:41:31 CEST 2018",
"revision": "79df48306c70bff1c8cda4db66c33f604ec0ff9c",
"state": "RUNNING"
}
The discovery response and startup messages are essentially the same; the only difference is the report-type field.
Full Status Report Message
A full status report is published on demand:
v1/services/F26FCC04 {
"id": "F26FCC04",
"name": "gwa-analytics",
"pid": "24112",
"report-type": "FULL",
"project-config": null,
"project-config-schema": null,
"project-config-reload": null,
"version": "2.1.0",
"build-tag": "custom",
"build-time": "Fri Apr 22 09:41:31 CEST 2018",
"revision": "79df48306c70bff1c8cda4db66c33f604ec0ff9c",
"state": "RUNNING",
"uptime": 40354,
"diagnostics": {
"custom-key-1": "custom-value-1",
"custom-key-2": "custom-value-2"
}
}