Sensor Data & IoT Technologies


The use of sensor information is a central element of IoT technology. There are various types of sensor data, but here we will focus on one-dimensional information that varies over time.

There are two types of IoT approaches: one is to set up an individual sensor for a specific measurement target and analyze its characteristics in detail; the other is to set up multiple sensors for multiple targets and, as described in “Application of Sparse Model to Anomaly Detection”, select specific data from the obtained data to make decisions such as detecting anomalies in specific targets.

As shown in the figure below, the basic steps of the former approach are as follows. First, there is the target natural phenomenon (analog information), which a sensor device observes to acquire information. Next, as described in “Extraction of features from one-dimensional data (audio)”, A/D conversion is performed and a window matched to the characteristics of the target phenomenon is set. The noise is then removed by statistical methods, as described in “Statistical Analysis of Time Series Data”. Finally, these data are judged for similarity against reference patterns generated from training data, as described in “Application of Hidden Markov Models to One-Dimensional Data” and “Application of Dynamic Programming to One-Dimensional Data”.

In addition, as described in “Machine Learning and System Architecture for Data Streams (Time Series Data)”, the judged data can be combined with ontology data to perform real-time inference, or probabilistic inference (Markov Logic Networks) can be performed as described in “Similarity in Global Matching (5) Probabilistic Approach”.

These approaches can be combined with knowledge representations (ontologies) for plants, failure analysis, and enterprises, as described in “Plant Engineering Ontology ISO15926”, “Failure Risk Analysis and Ontology (FEMA, HAZID)”, and “Application of Ontology to Enterprise Data”, to build real-time sensor applications for Industry 4.0, smart cities, smart buildings, etc.

The physical placement of sensors can also be optimized using discrete data optimization methods such as those described in “Submodular Optimization for Sensor Placement“.

In this blog, we will discuss the following topics.

Implementation

  • Preprocessing for IoT

Preprocessing Internet of Things (IoT) data is an important step in shaping the data collected from devices and sensors into a form that can be analyzed and fed to machine learning models and applications. Below we discuss various methods related to IoT data preprocessing.
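Below is a minimal sketch of typical preprocessing steps for one-dimensional sensor data, assuming pandas and a small, hypothetical temperature stream: resampling irregular timestamps onto a regular grid, interpolating short gaps, and masking outliers.

```python
import numpy as np
import pandas as pd

# Hypothetical raw sensor readings with irregular timestamps and a missing value.
raw = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 00:00:01", "2024-01-01 00:00:04",
        "2024-01-01 00:00:05", "2024-01-01 00:00:11"]),
    "temperature": [21.3, 21.4, np.nan, 21.9],
}).set_index("timestamp")

# Resample onto a regular 1-second grid and interpolate short gaps in time.
clean = raw.resample("1s").mean().interpolate(method="time", limit=5)

# Mask readings more than 3 standard deviations from the mean as outliers.
z = (clean - clean.mean()) / clean.std()
clean = clean.mask(z.abs() > 3)
print(clean)
```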

  • Random Forest Ranking Overview, Algorithm and Implementation Examples

Random Forest is a very popular ensemble learning method in the field of machine learning (a method that combines multiple machine learning models to obtain better performance than individual models). This approach combines multiple Decision Trees to build a more powerful model. There are many variations in ranking features using random forests.
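As a concrete illustration, the following sketch ranks features with scikit-learn's RandomForestClassifier on synthetic data (the dataset and parameters are assumptions for the example).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 10 features, of which only 3 are informative.
X, y = make_classification(n_samples=500, n_features=10,
                           n_informative=3, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Rank features by mean impurity decrease across the trees.
ranking = np.argsort(forest.feature_importances_)[::-1]
for idx in ranking:
    print(f"feature {idx}: importance {forest.feature_importances_[idx]:.3f}")
```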

  • Overview of Kalman Filter Smoother and Examples of Algorithms and Implementations

Kalman Filter Smoother, a type of Kalman filtering, is a technique used to improve state estimation of time series data. The method usually models the state of a dynamic system and combines it with observed data for more precise state estimation.
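The sketch below shows one common formulation, a Kalman filter followed by a Rauch-Tung-Striebel (RTS) backward smoothing pass, implemented in plain numpy for a hypothetical constant-velocity model observed through noisy position measurements.

```python
import numpy as np

# Constant-velocity model: state x = [position, velocity], position observed.
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition
H = np.array([[1.0, 0.0]])               # observation matrix
Q = 0.01 * np.eye(2)                     # process noise covariance
R = np.array([[1.0]])                    # observation noise covariance

def kalman_filter(zs):
    x, P = np.zeros(2), np.eye(2)
    xs_f, Ps_f, xs_p, Ps_p = [], [], [], []
    for z in zs:
        x_p, P_p = F @ x, F @ P @ F.T + Q            # predict
        K = P_p @ H.T @ np.linalg.inv(H @ P_p @ H.T + R)
        x = x_p + K @ (np.atleast_1d(z) - H @ x_p)   # update
        P = (np.eye(2) - K @ H) @ P_p
        xs_f.append(x); Ps_f.append(P); xs_p.append(x_p); Ps_p.append(P_p)
    return np.array(xs_f), np.array(Ps_f), np.array(xs_p), np.array(Ps_p)

def rts_smoother(xs_f, Ps_f, xs_p, Ps_p):
    # Backward pass: refine each filtered estimate using future observations.
    xs_s, Ps_s = xs_f.copy(), Ps_f.copy()
    for k in range(len(xs_f) - 2, -1, -1):
        C = Ps_f[k] @ F.T @ np.linalg.inv(Ps_p[k + 1])
        xs_s[k] = xs_f[k] + C @ (xs_s[k + 1] - xs_p[k + 1])
        Ps_s[k] = Ps_f[k] + C @ (Ps_s[k + 1] - Ps_p[k + 1]) @ C.T
    return xs_s, Ps_s

rng = np.random.default_rng(0)
zs = np.arange(50) * 0.5 + rng.normal(0, 1.0, 50)   # noisy positions
xs_s, _ = rts_smoother(*kalman_filter(zs))
print(xs_s[:3])   # smoothed [position, velocity] estimates
```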

Federated Learning is a new approach to training machine learning models that addresses the challenges of privacy protection and efficient model training in distributed data environments. Unlike traditional centralized model training, Federated Learning trains models on the device or client itself and performs distributed learning without sending the raw data to a central server. This section provides an overview of Federated Learning, its various algorithms, and examples of implementations.
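As a toy illustration of the FedAvg idea (not a production framework), the sketch below trains a linear model locally on three simulated clients and lets a "server" average the resulting weights; only the weights, never the raw data, leave a client.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, epochs=5):
    # One client's local training: plain gradient descent on squared error.
    for _ in range(epochs):
        w = w - lr * 2 * X.T @ (X @ w - y) / len(y)
    return w

def fed_avg(clients, w, rounds=10):
    # Each round: clients train locally, the server averages the weights,
    # weighted by each client's sample count.
    for _ in range(rounds):
        sizes = [len(y) for _, y in clients]
        local_ws = [local_sgd(w.copy(), X, y) for X, y in clients]
        w = np.average(local_ws, axis=0, weights=sizes)
    return w

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):                       # three clients with private data
    X = rng.normal(size=(100, 2))
    clients.append((X, X @ true_w + rng.normal(0, 0.1, 100)))
print(fed_avg(clients, w=np.zeros(2)))   # approaches true_w
```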

Automatic machine learning (AutoML) refers to methods and tools for automating the process of designing, training, and optimizing machine learning models. AutoML is particularly useful for users with limited machine learning expertise or those seeking to develop models efficiently, with the following main goals. This section provides an overview of AutoML and examples of various implementations.

Similarity is a concept that describes the degree to which two or more objects or things have common features or properties and are considered similar to each other, and plays an important role in evaluating, classifying, and grouping objects in terms of comparison and relatedness. This section describes the concept of similarity and general calculation methods for various cases.
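Two of the most common similarity measures are sketched below: cosine similarity for real-valued vectors and Jaccard similarity for sets (the example data are arbitrary).

```python
import numpy as np

def cosine_similarity(a, b):
    # Angle-based similarity for vectors, in [-1, 1].
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def jaccard_similarity(s, t):
    # Overlap-based similarity for sets, in [0, 1].
    return len(s & t) / len(s | t)

print(cosine_similarity(np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.1])))
print(jaccard_similarity({"sensor", "iot", "stream"}, {"sensor", "stream", "web"}))
```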

A search algorithm refers to a family of computational methods used to find a target within a problem space. These algorithms have a wide range of applications in a variety of domains, including information retrieval, combinatorial optimization, game playing, route planning, and more. This section describes various search algorithms, their applications, and specific implementations.
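As one representative example, the breadth-first search sketch below finds a shortest path (in number of edges) through a small, hypothetical graph.

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    # BFS visits states in order of distance from the start, so the first
    # path that reaches the goal is a shortest one.
    queue, visited = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(bfs_shortest_path(graph, "A", "E"))   # ['A', 'B', 'D', 'E']
```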

Multi-Objective Search Algorithm (Multi-Objective Optimization Algorithm) is an algorithm for optimizing multiple objective functions simultaneously. Multi-objective optimization aims to find a balanced solution (Pareto optimal solution set) among multiple optimal solutions rather than a single optimal solution, and such problems have been applied to many complex systems and decision-making problems in the real world. This section provides an overview of this multi-objective search algorithm and examples of algorithms and implementations.
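A minimal building block of multi-objective optimization is extracting the Pareto-optimal set from a list of candidate solutions; the sketch below does this for two objectives to be minimized (the candidate values are arbitrary).

```python
import numpy as np

def pareto_front(points):
    # A point is Pareto-optimal (for minimization) if no other point is at
    # least as good in every objective and strictly better in at least one.
    points = np.asarray(points)
    keep = np.ones(len(points), dtype=bool)
    for i, p in enumerate(points):
        if keep[i]:
            dominated = np.all(points <= p, axis=1) & np.any(points < p, axis=1)
            keep[dominated] = False
    return points[keep]

# Two objectives to minimize, e.g. cost vs. latency.
candidates = [(1, 9), (2, 7), (3, 8), (4, 4), (6, 3), (7, 5)]
print(pareto_front(candidates))   # keeps (1,9), (2,7), (4,4), (6,3)
```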

Model Predictive Control (MPC) is a control theory technique that uses a model of the control target to predict future states and outputs, and an online optimization method to calculate optimal control inputs. MPC is used in a variety of industrial and control applications.
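The sketch below is a deliberately simplified receding-horizon controller for a hypothetical double-integrator model, using scipy's general-purpose optimizer rather than a dedicated MPC solver; the model, horizon, and cost weights are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Double integrator: x = [position, velocity], input u = acceleration.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([0.005, 0.1])
H = 15                                  # prediction horizon
target = np.array([1.0, 0.0])

def rollout_cost(u_seq, x0):
    # Simulate the model over the horizon; penalize tracking error and effort.
    x, cost = x0, 0.0
    for u in u_seq:
        x = A @ x + B * u
        cost += np.sum((x - target) ** 2) + 0.01 * u ** 2
    return cost

def mpc_step(x0):
    # Optimize the whole input sequence but apply only the first input.
    res = minimize(rollout_cost, np.zeros(H), args=(x0,),
                   bounds=[(-1.0, 1.0)] * H)
    return res.x[0]

x = np.array([0.0, 0.0])
for _ in range(30):
    x = A @ x + B * mpc_step(x)
print(x)   # should approach the target state [1, 0]
```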

WoT (Web of Things) is a standardized architecture and set of protocols for interconnecting various devices on the Internet and enabling communication and interaction between them. The WoT is intended to extend the Internet of Things (IoT), simplify interactions with devices, and increase interoperability.

This article describes general implementation procedures, libraries, platforms, and concrete examples of WoT implementations in python and C.

A distributed Internet of Things (IoT) system refers to a system in which different devices and sensors communicate with each other, share information, and work together. In this article, we will provide an overview and implementation examples of inter-device communication technology in distributed IoT systems.
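A common pattern for inter-device communication is publish/subscribe over MQTT. The sketch below uses the paho-mqtt package (1.x-style callback API); the broker host and topic name are hypothetical placeholders.

```python
import json
import paho.mqtt.client as mqtt   # assumes the paho-mqtt package (1.x-style API)

BROKER = "broker.example.local"    # hypothetical broker host
TOPIC = "plant/line1/temperature"  # hypothetical topic

def on_connect(client, userdata, flags, rc):
    client.subscribe(TOPIC)        # subscribe once the connection is up

def on_message(client, userdata, msg):
    reading = json.loads(msg.payload)
    print(f"{msg.topic}: {reading['value']} at {reading['ts']}")

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect(BROKER, 1883)

# Publish one reading, then keep processing incoming messages.
client.publish(TOPIC, json.dumps({"value": 21.7, "ts": "2024-01-01T00:00:00Z"}))
client.loop_forever()
```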

Self-Supervised Learning is a type of machine learning and can be considered as a type of supervised learning. While supervised learning uses labeled data to train models, self-supervised learning uses the data itself instead of labels to train models. This section describes various algorithms, applications, and implementations of self-supervised learning.

Sparse modeling is a technique that takes advantage of sparsity in the representation of signals and data. Sparsity refers to the property that non-zero elements in data or signals are limited to a very small portion. The purpose of sparse modeling is to efficiently represent data by utilizing sparsity, and to perform tasks such as noise removal, feature selection, and compression.

This section provides an overview of sparse modeling algorithms such as Lasso, compressed sensing, Ridge regularization, elastic nets, Fused Lasso, group regularization, message-passing algorithms, and dictionary learning, and describes their implementation in various applications such as image processing, natural language processing, recommendation, machine learning, signal processing, and brain science.
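As a small, self-contained example of the sparsity idea, the sketch below fits scikit-learn's Lasso to synthetic data whose true coefficient vector has only three non-zero entries.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Sparse ground truth: only 3 of 50 coefficients are non-zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
true_coef = np.zeros(50)
true_coef[[3, 17, 42]] = [2.0, -1.5, 1.0]
y = X @ true_coef + rng.normal(0, 0.1, 200)

# The L1 penalty drives most estimated coefficients exactly to zero.
model = Lasso(alpha=0.1).fit(X, y)
print("non-zero coefficients:", np.flatnonzero(model.coef_))
```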

The Bandit problem is a type of reinforcement learning problem in which a decision-making agent learns which action to choose in an unknown environment. The goal of this problem is to find a method for selecting the optimal action among multiple actions.

In this section, we provide an overview and implementation of the main algorithms for the bandit problem, including the ε-greedy method, the UCB algorithm, Thompson sampling, softmax selection, the substitution rule method, and the Exp3 algorithm, together with application examples such as online advertisement distribution, drug discovery, stock investment, and clinical trial optimization, and their implementation procedures.
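The simplest of these, the ε-greedy method, fits in a few lines; the sketch below runs it on three hypothetical Bernoulli arms.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.7])   # unknown to the agent
counts, values = np.zeros(3), np.zeros(3)
epsilon = 0.1

for t in range(10_000):
    # Explore with probability epsilon, otherwise exploit the best estimate.
    arm = rng.integers(3) if rng.random() < epsilon else int(np.argmax(values))
    reward = float(rng.random() < true_means[arm])       # Bernoulli reward
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean

print("estimated means:", values.round(3))   # arm 2 is pulled most often
```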

Submodular optimization is a type of combinatorial optimization that solves the problem of maximizing or minimizing a submodular function, a function with specific properties. This section describes various algorithms, their applications, and their implementations for submodular optimization.

Robust Principal Component Analysis (RPCA) is a method for finding a basis in data, and is characterized by its robustness to data containing outliers and noise. This section describes various applications of RPCA and its concrete implementation using Python.
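The sketch below is a deliberately simplified alternating-minimization version of the principal component pursuit decomposition M = L + S (singular-value thresholding for the low-rank part, soft thresholding for the sparse part); it is illustrative only, not the inexact augmented Lagrangian method usually used in practice.

```python
import numpy as np

def rpca(M, lam=None, mu=1.0, n_iter=200):
    # Alternate between a low-rank estimate L (via singular value
    # thresholding) and a sparse outlier estimate S (via soft thresholding).
    lam = lam if lam is not None else 1.0 / np.sqrt(max(M.shape))
    S = np.zeros_like(M)
    for _ in range(n_iter):
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - 1.0 / mu, 0)) @ Vt
        S = np.sign(M - L) * np.maximum(np.abs(M - L) - lam / mu, 0)
    return L, S

# Low-rank data plus a few large outliers.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40))   # rank 3
corrupted = base.copy()
corrupted[rng.integers(0, 50, 30), rng.integers(0, 40, 30)] += 20.0
L, S = rpca(corrupted)
print("relative error of L:", np.linalg.norm(L - base) / np.linalg.norm(base))
```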

Online learning is a method of learning by sequentially updating a model in a situation where data arrives sequentially. Unlike batch learning in ordinary machine learning, this algorithm is characterized by the fact that the model is updated each time new data arrives. This section describes various algorithms and examples of applications of online learning, as well as examples of implementations in python.
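The core of online learning is the per-sample update; the sketch below runs stochastic gradient descent for linear regression over a simulated stream.

```python
import numpy as np

# Online linear regression: update the weights one sample at a time, so the
# model can follow a data stream without storing past samples.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
w, lr = np.zeros(3), 0.01

for t in range(5000):              # each step simulates one arriving sample
    x = rng.normal(size=3)
    y = true_w @ x + rng.normal(0, 0.1)
    y_hat = w @ x                  # predict before updating
    w += lr * (y - y_hat) * x      # stochastic gradient step on squared error

print(w.round(3))                  # close to true_w
```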

Online Prediction is a technique that uses models to make predictions in real time under conditions where data arrive sequentially. Online learning, as described in “Overview of Online Learning, Various Algorithms, Application Examples, and Specific Implementations”, is characterized by models being learned sequentially, while the immediacy of applying the model is not clearly defined; online prediction, in contrast, is characterized by predictions being made, and their results used, immediately upon the arrival of new data.

This section discusses various applications and specific implementation examples of online prediction.

Bayesian Structural Time Series Model (BSTS) is a type of statistical model that models phenomena that change over time and is used for forecasting and causal inference. This section provides an overview of BSTS and its various applications and implementations.

Rust is a programming language developed by Mozilla Research for systems programming, designed with an emphasis on high performance, memory safety, parallelism, and multi-threaded processing. It is also a language focused on bug prevention through strong static type checking at compile time.

This section provides an overview of Rust, its basic syntax, various applications, and concrete implementations.

Raspberry Pi is a single-board computer (SBC), a small computer developed by the Raspberry Pi Foundation in the UK. Its name is said to come from the popular British dessert, raspberry pie.

This section provides an overview of the Raspberry Pi and describes various applications and concrete implementation examples.

Typically, IoT devices are small devices with sensors and actuators, and use wireless communication to collect sensor data and control the actuators. Various communication protocols and technologies are used for wireless IoT control. This section describes examples of IoT implementations using these wireless technologies in various languages.

Technical Topics

Radio waves, the medium of wireless communication, are called “radio waves” or “Hertzian waves” in English, and are sometimes abbreviated as “radio”.

Radio waves rest on Maxwell’s equations, the theory of the electromagnetic field with which James Clerk Maxwell predicted in 1864 (the year of the Kinmon Incident in Japan) that light is an electromagnetic wave. In 1887 (Meiji 20 in Japan), Heinrich Hertz deduced from Maxwell’s equations the existence of electromagnetic waves (radio waves) with a frequency lower than that of light, and demonstrated their existence by devising and building experimental equipment that could generate and detect them.

RFID is an abbreviation for “Radio Frequency Identification,” a technology that uses wireless communications to read identification information on goods, animals, etc. This RFID system mainly consists of three elements: RFID tags, RFID readers, and a central database. RFID is used in various fields such as logistics, agriculture, medicine, and manufacturing. Furthermore, combining RFID technology with AI technology is expected to optimize and streamline business processes.

Wireless technology is most often used as the means of connecting IoT devices to ICT. Its strength is that it can be easily installed anywhere without wiring, but it also has issues to consider, such as noise resistance, limits on the amount of data that can be sent at one time, and securing a power source.

In this article, we will discuss the connection to BLE (Bluetooth Low Energy), one of the short-range wireless communication technologies. BLE is an extension of Bluetooth, and as the name suggests, it is characterized by its ability to communicate at extremely low power.

In this article, I would like to discuss actual BLE communication. First, let us try communicating over BLE using a JavaScript library called bluejelly, as the simplest approach.

bluejelly is a JavaScript wrapper around the Web Bluetooth API, and it works with only three files: an HTML file, bluejelly.js, and style.css. It can be used to connect to various BLE devices by writing HTML files.

In the previous article, I introduced bluejelly.js, the simplest of the libraries for connecting to BLE. In this article, I would like to introduce noble, which runs on node.js and can be combined with the server-side applications I have mentioned so far.

Noble is a JavaScript module that runs on node.js. To use noble, install the module with “npm install noble” and use, for example, the following code to scan for the BLE devices mentioned above.

This article summarizes the results of our investigation into Python BLE control libraries.
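One widely used option is bleak, a cross-platform asyncio-based library; the minimal scan below assumes the bleak package is installed and Bluetooth hardware is available.

```python
import asyncio
from bleak import BleakScanner   # assumes the bleak package is installed

async def main():
    # Scan for nearby BLE devices for 5 seconds and list what was found.
    devices = await BleakScanner.discover(timeout=5.0)
    for d in devices:
        print(d.address, d.name)

asyncio.run(main())
```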

Some business applications need to react asynchronously to external stimuli such as network traffic. Examples include IoT applications that receive and process streaming data, and applications that track company stock prices and other market information.

If these applications are executed using the sequential/synchronous processing of a normal computer, the overhead of synchronizing asynchronous data becomes a bottleneck when the amount of input data increases, making it difficult for the application to achieve effective speed.

This time, asynchronous processing will be performed in the server-side back-end processing language, and these processes will be implemented for the actual application.
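As a language-agnostic illustration of the pattern (here in Python's asyncio rather than a specific back-end framework), the sketch below has several simulated sensors pushing readings into a queue while a consumer processes them as they arrive, without blocking the producers.

```python
import asyncio
import random

async def sensor(queue, name):
    # Simulates an external stimulus: readings arrive at irregular intervals.
    while True:
        await asyncio.sleep(random.uniform(0.1, 0.5))
        await queue.put((name, random.gauss(20.0, 1.0)))

async def consumer(queue):
    # Processes readings as they arrive, independently of the producers.
    while True:
        name, value = await queue.get()
        if abs(value - 20.0) > 2.0:
            print(f"anomalous reading from {name}: {value:.2f}")

async def main():
    queue = asyncio.Queue()
    tasks = [asyncio.create_task(sensor(queue, f"sensor-{i}")) for i in range(3)]
    tasks.append(asyncio.create_task(consumer(queue)))
    await asyncio.sleep(5)                 # let the pipeline run for 5 seconds
    for t in tasks:
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)

asyncio.run(main())
```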

This section describes WoT (Web of Things) technology used in artificial intelligence and IoT technologies. WoT is an abbreviation for Web of Things, defined by the W3C, the Web standards organization, to solve existing IoT issues.

WoT addresses one of the challenges of the IoT, the lack of compatibility (at present, sensors, platforms, and operating systems often work only with certain systems). By providing IoT services and applications through web technologies that are already widely used (HTML, JavaScript, JSON, etc.) and their protocols, it increases the interoperability of devices, allows features such as security and access control to be added at the application level, and enables semantic use of data in combination with Semantic Web technologies. The goal is to enable the creation of a wide variety of services.

Semantic Web (SW) technology treats the entire Web as one huge information database and processes the vast amount of information that exists there efficiently through automatic software processing. SW service technology uses these technologies to modularize Web services and automatically construct services that meet individual needs.

To extend SW to IoT technology, data stream technology with semantic interpretation has been developed.

Smart city projects are a concrete example of the integration of IoT and SW technologies.

The topic of Daniel Metz’s dissertation at the Business & Information Systems Engineering (BISE) Institute at Siegen University is an analysis of the Real Time Enterprise (RTE) concept and supporting technologies over the past decade. Its main objective is to identify shortcomings. Subsequently, leveraging the Event Driven Architecture (EDA) and Complex Event Processing (CEP) paradigms, a reference architecture was developed that overcomes the gaps in temporal and semantic vertical integration across different enterprise levels that are essential to realize the RTE concept. The developed reference architecture has been implemented and validated in a foundry with typical characteristics of SMEs.

In recent years, drones have been put to practical use in various fields such as transportation, agriculture, surveying, and disaster prevention, and they have also become an important technology in geospatial information. Drones are small unmanned aerial vehicles (UAVs) that can be remotely controlled by radio and can fly along set courses. UAVs can approach dangerous areas, such as disaster sites, unmanned, and take detailed pictures of the situation with their onboard cameras.

On the other hand, recent advances in computer vision technology and improvements in computer performance have made it possible to automatically reconstruct the three-dimensional shape of an object from a large number of camera images. These techniques are called SfM (Structure from Motion) and MVS (Multi-View Stereo). By combining this 3D reconstruction technique with images taken by drones, terrain models and orthoimages can be created quickly and easily.

In this article, we describe an implementation of the Kalman filter, one of the applications of the state-space model, in Clojure. The Kalman filter is an infinite impulse response filter used to estimate time-varying quantities (e.g., position and velocity of an object) from discrete observations with errors, and is used in a wide range of engineering fields such as radar and computer vision due to its ease of use. Specific examples of its use include integrating information with errors from device built-in accelerometers and GPS to estimate the ever-changing position of vehicles, as well as in satellite and rocket control.

The Kalman filter is a state-space model with hidden states and observation data generated from them, similar to the hidden Markov model described previously, in which the states are continuous and changes in state variables are statistically described using noise following a Gaussian distribution.

The precision matrix is, by definition, the inverse of the covariance matrix, so we could simply compute the inverse of the sample covariance matrix of the standardized data. However, obtaining a sparse precision matrix this way is not easy: if we manually threshold the matrix, setting entries whose absolute correlation is below some value to zero, the condition of positive definiteness can no longer be guaranteed, and the result cannot be interpreted as a graphical model.
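This is exactly the problem the graphical lasso addresses: it estimates a sparse precision matrix by L1-penalized maximum likelihood while keeping the estimate positive definite. A minimal sketch with scikit-learn, on toy data where only two of three variables are directly coupled:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso
from sklearn.preprocessing import StandardScaler

# Toy data: variables 0 and 1 are directly coupled, variable 2 is independent.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
X = np.column_stack([x0,
                     x0 + 0.3 * rng.normal(size=500),
                     rng.normal(size=500)])

# L1-penalized maximum likelihood: off-diagonal precision entries between
# conditionally independent variables are driven to (near) zero.
model = GraphicalLasso(alpha=0.1).fit(StandardScaler().fit_transform(X))
print(np.round(model.precision_, 2))
```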

Speech analysis is the process of extracting from speech the features necessary for speech recognition. Feature extraction has two major roles. The first is to save computational resources: since handling raw data as it is requires a large amount of computation, feature extraction reduces the necessary information, thereby reducing storage space and speeding up computation. The other is noise elimination: once an application is specified, data fluctuations unrelated to that application become noise. For example, in speech recognition, features representing phonological characteristics are important, while features representing speaker characteristics are noise. Removing the noise improves recognition accuracy.
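As a concrete example of such feature extraction, the sketch below computes MFCCs (mel-frequency cepstral coefficients), a standard compact representation of the spectral envelope, using the librosa package; the file name "speech.wav" is a hypothetical placeholder.

```python
import librosa   # assumes the librosa package and a local audio file

# Load a speech waveform and extract 13 MFCCs per frame: the raw samples are
# reduced to a compact representation of the spectral envelope that keeps
# phonological information and discards much speaker and noise detail.
y, sr = librosa.load("speech.wav", sr=16000)   # "speech.wav" is hypothetical
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
print(mfcc.shape)   # (13, number_of_frames)
```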

One of the objectives of time series data analysis is to find (time-dependent) meaning in the relationships between observation points, that is, in the order and before/after relationships of the data, from records of observed values and observation times.

There are several types of speech recognition depending on the application, and the methods differ accordingly. In terms of the content that can be recognized, there are discrete word recognition and continuous speech recognition, and the latter includes continuous word recognition and sentence recognition. From the user’s point of view, there are two types: specific-speaker recognition and unspecified-speaker recognition.

The method used above to create the speech recognition model is simple and can hardly be called machine learning. In general, there is great variety in human speech: even when people say the same thing, no two utterances are exactly the same. Probability theory is effective in dealing with fluctuations that are difficult to capture with rules; by expressing fluctuations as probability distributions, plausible recognition results can be obtained in most cases. In this article, we describe a method for constructing a robust recognition model from a large amount of data based on this idea.

This world is full of dynamic data, not static data. For example, factories, plants, transportation, the economy, and social networks form a vast amount of dynamic data. In the case of factories and plants, sensors on a typical oil production platform make 10,000 observations per minute, reaching 100,000 observations per minute during peak hours; in the case of mobile data, mobile users in Milan make 20,000 calls/SMS/data connections per minute and 80,000 connections per minute during peak hours; and in the case of social networks, Facebook, as of May 2013, observed 3 million “Likes” per minute.

A submodular function is a concept corresponding to a convex function on discrete variables. The application of convex functional concepts to “discrete” data allows for optimization, which is important in the field of machine learning. These are mainly used in the area of combinatorial optimization, such as sensor placement and graph cutting in computer vision.

    Context awareness enables services and applications to adapt their behaviour to the current situation for the benefit of their users. It is considered a key technology within the IT industry for its potential to provide a significant competitive advantage to service providers and to give substantial differentiation among existing services. Automated learning of contexts will improve the efficiency of Context Aware Services (CAS) development. In this paper we present a system which supports storing, analyzing and exploiting a history of sensor and equipment data collected over time, using data mining techniques and tools. This approach allows us to identify parameters (context dimensions) that are relevant to adapt a service, to identify contexts that need to be distinguished, and finally to identify adaptation models for CAS, such as one which would automatically switch lights on and off when needed. In this paper, we introduce our approach and describe the architecture of the system which implements it. We then present the results obtained when applied to a simple but realistic scenario of a person moving around in her flat. The corresponding dataset was produced by devices such as white-goods equipment, lights and mobile-terminal-based sensors, from which we can retrieve the location, position and posture of their owner. The method is able to detect recurring patterns: all patterns found were relevant for automating the control (switching on/off) of the light in the room where the person is located. We discuss these results further, position our work with respect to work done elsewhere, and conclude with some perspectives.

    Sensor network deployments are a primary source of massive amounts of data about the real world that surrounds us, measuring a wide range of physical properties in real time. However, in large-scale deployments it becomes hard to effectively exploit the data captured by the sensors, since there is no precise information about what devices are available and what properties they measure. Even when metadata is available, users need to know low-level details such as database schemas or names of properties that are specific to a device or platform. Therefore the task of coherently searching, correlating and combining sensor data becomes very challenging. We propose an ontology-based approach, that consists in exposing sensor observations in terms of ontologies enriched with semantic metadata, providing information such as: which sensor recorded what, where, when, and in which conditions. For this, we allow defining virtual semantic streams, whose ontological terms are related to the underlying sensor data schemas through declarative mappings, and can be queried in terms of a high level sensor network ontology.

    Sensor Web researchers are currently investigating middleware to aid in the dynamic discovery, integration and analysis of vast quantities of high quality, but distributed and heterogeneous earth observation data. Key challenges being investigated include dynamic data integration and analysis, service discovery and semantic interoperability. However, few efforts deal with the management of both knowledge and system dynamism. Two emerging technologies that have shown promise in dealing with these issues are ontologies and software agents. This paper introduces the idea and identifies key requirements for a Knowledge Driven Sensor Web and presents our efforts towards developing an associated semantic infrastructure within the Sensor Web Agent Platform.

    Sensor observations are usually offered in relation to a specific purpose, e.g., for reporting fine dust emissions, following strict procedures, and spatio-temporal scales. Consequently, the huge amount of data gathered by today’s public and private sensor networks is most often not reused outside of its initial creation context. Fostering the reusability of observations and derived applications calls for (i) spatial, temporal, and thematic aggregation of measured values, and (ii) easy integration mechanisms with external data sources. In this paper, we investigate how work on sensor observation aggregation can be incorporated into a Linked Data framework focusing on external linkage as well as provenance information. We show that Linked Data adds new aspects to the aggregation problem, e.g., whether external links from one of the original observations can be preserved for the aggregate. The Stimulus-Sensor-Observation (SSO) ontology design pattern is extended by classes and relations necessary to model the aggregation of sensor observations.

    Over the past few years there has been a proliferation in the use of sensors within different applications. The increase in the quantity of sensor data makes it difficult for end users to understand situations within the environments where the sensors are deployed. Thus, there is a need for situation assessment mechanisms upon the sensor networks to assist users to interpret sensor data when making decisions. However, one of the challenges to realize such a mechanism is the need to integrate real-time sensor readings with contextual data sources from legacy systems. This paper tackles the data enrichment problem for sensor data. It builds upon Linked Data principles as a valid basis for a unified enrichment infrastructure and proposes a dynamic enrichment approach that sees enrichment as a process driven by situations of interest. The approach is demonstrated through examples and a proof-of-concept prototype based on an energy management use case.

    Semantic service discovery is necessary to facilitate the potential of service providers (many sensors, different characteristics) to change the sensor configuration in a generic surveillance application without modifications to the application’s business logic. To combine efficiency and flexibility, semantic annotation of sensors and semantic-aware matchmaking components are needed. This short paper gives the reader an understanding of the SOAR component for semantic SWE support and rule-based sensor selection.

    In this short paper, we present an architecture to deploy lightweight Semantic Sensor Networks easily based on widely available Android Devices. This approach essentially relies on deploying a SPARQL endpoint on the device, and federating queries to multiple devices to build Semantic Sensor Network applications.

    The explosion in user-generated content (on the Social Web) published from mobile devices has seen microblog platforms like Twitter grow exponentially. Twitter is a microblogging platform founded in 2006, which by October 2010 had roughly 175m users and as of June 2011 processed 200m posts per day. Twitter data has been utilised to predict/report natural disasters, civil unrest, and media topics. Smartphones and other mobile devices contain an array of sensors but are under-utilised on the Social Web. In this paper, we propose a method for annotating microblog posts with multi-sensor data by representing it with ontologies such as SSN and SIOC. We present an alignment of these ontologies and outline an enhanced Twitter client that would allow users to enter an emergency mode where all or most of the available sensor data would be published as annotations to the user’s post, allowing relief organisations to use any relevant data.

    The challenges of the sensor web have been well documented, and the use of appropriate semantic web technologies promises to offer potential solutions to some of these challenges (for example, how to represent sensor data, integrate it with other data sets, publish it, and reason with the data streams). To date a large amount of work in this area has focused on sensor networks based on“traditional” hardware sensors. In recent years, citizen sensing has became a relatively well-established approach for incorporating humans as sensors within a system. Often facilitated via some mobile platform, citizen sensing may incorporate observational data generated by hardware (e.g. a GPS device) or directly by the human observer. Such human observations can easily be imperfect (e.g. erroneous or fake), and sensor properties that would typically be used to detect and reason about such data, such as measurements of accuracy and sampling rate do not exist. In this paper we discuss our work as part of the Informed Rural Passenger project, in which the passengers themselves are our main source for transport related sensing (such as vehicle occupancy levels, available facilities). We discuss the challenges of incorporating and using such observational data in a real world system, and describe how we are using semantic web technologies, combined with models of provenance to address them.

    This paper introduces the Sensapp platform, a semantic and OGC-based sensor application platform that enables users to register, annotate, search, visualize, and compose OGC-based sensors and services for creating added-value services and applications. Functionalities of Sensapp such as sensor registration, sensor data visualization, and visual composition and generation of executable service compositions are presented through the demo.

    Multi-modal sensor networks are difficult to program and difficult to use dynamically. We show how to use an ontology in the user interface to support end users in describing events of interest arising dynamically in sensor networks, to generate program code for the network devices to collect the necessary data, and to generate alerts when those described events are detected. The ontology is used for semantic optimisation at various points in the processing architecture.

    This paper demonstrates a Semantic Web enabled system for collecting and processing sensor data within a rescue environment. The real-time system collects heterogeneous raw sensor data from rescue robots through a wireless sensor network. The raw sensor data is converted to RDF using the Semantic Sensor Network (SSN) ontology and further processed to generate abstractions used for event detection in emergency scenarios.

    The emergence of dynamic information sources – including sensor networks – has led to large streams of real-time data on the Web. Research studies suggest, these dynamic networks have created more data in the last three years than in the entire history of civilization, and this trend will only increase in the coming years [1]. With this coming data explosion, real-time analytics software must either adapt or die [2]. This paper focuses on the task of integrating and analyzing multiple heterogeneous streams of sensor data with the goal of creating meaningful abstractions, or features. These features are then temporally aggregated into feature streams. We will demonstrate an implemented framework, based on Semantic Web technologies, that creates feature-streams from sensor streams in real-time, and publishes these streams as Linked Data. The generation of feature streams can be accomplished in reasonable time and results in massive data reduction.

    Next generations of spatial information infrastructures call for more dynamic service composition, more sources of information, as well as stronger capabilities for their integration. Sensor networks have been identified as a major data provider for such infrastructures, while Semantic Web technologies have demonstrated their integration capabilities. Most sensor data is stored and accessed using the Observations & Measurements (O&M) standard of the Open Geospatial Consortium (OGC) as data model. However, with the advent of the Semantic Sensor Web, work on an ontological model gained importance within Sensor Web Enablement (SWE). The ongoing paradigm shift to Linked Sensor Data complements this attempt and also adds interlinking as a new challenge. In this demonstration paper, we briefly present a Linked Data model and a RESTful proxy for OGC’s Sensor Observation Service (SOS) to improve integration and inter-linkage of observation data.

    Submodular Optimization and Machine Learning

    A submodular function is a concept that corresponds to a convex function on discrete variables. The application of the convex function concept to “discrete” data enables the processing of optimization, which is important in the field of machine learning. These are mainly used in the area of combinatorial optimization, such as sensor placement and graph cutting in computer vision.

    When considering submodular functions, we consider the combinatorial aspect of machine learning. By “combinatorial” we mean the procedure of “selecting a part of some selectable collection,” and the various computational properties associated with it.

    For example, when estimating the required length of hospitalization for a new patient, a common machine learning approach is to build a regression model that predicts the length of stay from various examination and survey items, using the patient data stored in the hospital, and then apply it to the new patient. Here, the patient’s weight and history of visits are likely to be useful, but the patient’s hobbies and occupation are probably not. If a regression model is built including such unrelated items, the prediction will be influenced by them and the performance of the model will deteriorate.

    To avoid such situations, it is important to select and use only the portion of the data that is likely to be useful, and the submodular function approach provides a systematic way to extract such data, as in the sketch below.
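    A classic instance is greedy maximization of a monotone submodular coverage function, as in sensor placement: pick, at each step, the candidate with the largest marginal gain. The sketch below is a minimal illustration with hypothetical coverage sets; for this class of functions the greedy choice is guaranteed to achieve at least (1 - 1/e) of the optimal coverage.

```python
import numpy as np

def greedy_max_coverage(sets, k):
    # Greedily pick k sets, each time taking the one whose marginal
    # gain (number of newly covered elements) is largest.
    covered, chosen = set(), []
    for _ in range(k):
        gains = [len(s - covered) for s in sets]
        best = int(np.argmax(gains))
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

# Each candidate sensor location covers a set of areas.
candidates = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1, 5}, {7, 8}]
chosen, covered = greedy_max_coverage(candidates, k=3)
print(chosen, covered)   # picks locations [2, 0, 4], covering all 8 areas
```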

    In the following pages of this blog, we describe the theory and implementation of this machine learning approach to submodular optimization.

    Stream Data Technology

    This world is full of dynamic data, not static data. For example, a huge amount of dynamic data is formed in factories, plants, transportation, the economy, and social networks. In the case of factories and plants, a typical sensor on an oil production platform makes 10,000 observations per minute, peaking at 100,000. In the case of mobile data, mobile users in Milan make 20,000 calls/SMS/data connections per minute, reaching 80,000 connections per minute at peak times. In the case of social networks, Facebook observed 3 million likes per minute as of May 2013.

    Use cases where such data appear include questions like “Given that the turbine bearing has started to vibrate in the last 10 minutes, when is it expected to fail?”, “Is there public transportation where the people are?”, and “Who is discussing the top ten topics?”. These are just a few of the many fine-grained issues that arise, and solutions to them are needed.

    In the following pages of this blog, we discuss real-time distributed processing frameworks for handling such stream data, machine learning processing of time series data, and application examples such as smart cities and Industry 4.0 that utilize these frameworks.

    Analysis of Time Series Data

    A time series is a series of values obtained by observing the temporal changes of a phenomenon continuously (or discontinuously at certain intervals), and collecting time series data means assuming that the observed objects or their attributes change in continuous time and recording them at predetermined time intervals. One of the main purposes of time series analysis is then to model the observation target, that is, to read from the time series data a function of continuous time that represents the observation target well.

    In general, these models are represented by probability distributions, such as AR (autoregressive) models, ARCH (autoregressive conditional heteroscedasticity) models, GARCH (generalized ARCH) models, or state-space models.
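    As a minimal example, the sketch below fits an AR(2) model to a simulated series by least squares on lagged values (the parameter values are arbitrary).

```python
import numpy as np

# Simulate an AR(2) process x_t = 0.6*x_{t-1} - 0.3*x_{t-2} + e_t.
rng = np.random.default_rng(0)
n, a1, a2 = 500, 0.6, -0.3
x = np.zeros(n)
for t in range(2, n):
    x[t] = a1 * x[t - 1] + a2 * x[t - 2] + rng.normal(0, 1.0)

# Least-squares fit: regress x_t on its two lagged values.
X = np.column_stack([x[1:-1], x[:-2]])   # columns: x_{t-1}, x_{t-2}
y = x[2:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef.round(3))                      # close to [0.6, -0.3]

# One-step-ahead forecast from the last two observations.
print(coef @ np.array([x[-1], x[-2]]))
```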

    In the following pages of this blog, we discuss a theoretical overview, specific algorithms, and various applications of time series data analysis.

    Anomaly detection and change detection

    In any business setting, it is extremely important to be able to detect changes or signs of anomalies. For example, by detecting changes in sales, we can quickly take the next step, or by detecting signs of abnormalities in a chemical plant in operation, we can prevent serious accidents from occurring. This will be very meaningful when considering digital transformation and artificial intelligence tasks.

    In addition to extracting rules by hand, it is now possible to build practical anomaly and change detection systems by using statistical machine learning techniques. These are general-purpose methods that use the probability distribution p(x) of the possible values of the observed value x to describe the conditions for anomalies and changes as mathematical expressions.
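    As the simplest instance of this idea, Hotelling's T² fits a Gaussian to normal-operation data and scores new observations by their Mahalanobis distance; the sketch below uses synthetic two-dimensional data.

```python
import numpy as np

# Learn mean and covariance from normal-operation data.
rng = np.random.default_rng(0)
train = rng.normal(loc=[0.0, 0.0], scale=[1.0, 0.5], size=(1000, 2))
mean = train.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(train, rowvar=False))

def t2_score(x):
    # Squared Mahalanobis distance from the learned model; large values
    # (e.g. beyond a chi-squared quantile) are flagged as anomalies.
    d = x - mean
    return d @ cov_inv @ d

print(t2_score(np.array([0.1, -0.2])))   # small: consistent with training data
print(t2_score(np.array([4.0, 3.0])))    # large: anomalous
```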

    In the following pages of this blog, we describe various approaches to anomaly and change detection, starting from Hotelling’s T² method and including Bayesian methods, neighborhood methods, mixture distribution models, support vector machines, Gaussian process regression, and sparse structure learning.
