Computing Science Division
Chalmers University of Technology
EDIT building | Floor 6V Office 6476
412 96 Gothenburg, Sweden
hazemto at chalmers.se
I am a tenure-track Assistant Professor at the Computer Science and Engineering Department at Chalmers University of Technology. I lead the lab for Safe and Trustworthy Autonomous Reasoning. Previously, I was a postdoctoral researcher in the EECS Department at UC Berkeley, USA. I obtained my Ph.D. in 2019 from Saarland University, Germany. My research interests are the formal specification, verification, and synthesis of cyber-physical systems. I especially lay the focus on the investigation of quantitative approaches for verifying and explaining the behavior of cyber-physical systems. My research is funded by the Wallenberg AI, Autonomous Systems, and Software Program.
For my full CV click hereWe present ULGEN, a runtime assurance (RTA) framework for programming safe cyber-physical systems (CPS). In ULGEN, a system is implemented as a collection of asynchronous processes executing RTA modules which are generalizations of the well-known Simplex architecture. An RTA module is composed of a set of safe controllers (SCs), designed to guarantee certain safety specifications, and a set of advanced controllers (ACs), optimized for performance, each defined to run under the specific conditions of the operating environment, and a decision module implementing the switching logic between the controllers.A source of complexity in achieving safe CPS is that these systems often involve concurrently interacting components with different execution semantics. To this end, ULGEN allows for the definition of RTA modules with either event-driven or time-driven execution semantics and encapsulates such components into RTA modules. It further provides primitives for implementing priority-based communication between asynchronous processes, which is a necessary feature for task prioritization mechanisms such as contingency plans and interrupt service routines. The framework also provides formal guarantees on the safe execution of RTA modules based on a formal definition of well-formedness. In ULGEN, a well-formed RTA module combines SCs and ACs in a way that guarantees the underlying safety specifications assured by the SCs while delivering the desired performance offered by the ACs.We compare the safety guarantees of ULGEN against other state-of-the-art RTA frameworks and demonstrate its efficacy in implementing safe and performant CPS by presenting an extensive experimental evaluation of five case studies both in a simulation environment and on a real robotic platform.
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
We investigate the role of ensemble methods in learning runtime monitors for operational design domains of autonomous systems. An operational design domain (ODD) of a system captures the conditions under which we can trust the components of the system to maintain its safety. A runtime monitor of an ODD predicts, based on a sequence of monitorable observations, whether the system is about to exit the ODD. For black-box systems, a key challenge in learning an ODD monitor is obtaining a monitor with a high degree of accuracy. While statistical theories such as that of probably approximate learning (PAC) allow us to provide guarantees on the accuracy of a learned ODD monitor up to a certain confidence probability (by bounding the number of needed training examples), practically, there will always remain a chance, that using such a one-shot approach will result in monitors with a high misclassification rate. To address this challenge we consider well-known ensemble learning algorithms and utilize them for learning ODD ensembles. We derive theoretical bounds on the estimated misclassification risk of ensembles, showing that it reduces exponentially with the number of monitors and linearly with the risk of individual monitors. An empirical evaluation of the impact of different ensemble learning methods on a case study from autonomous driving demonstrates the advantage of this approach.
We present a framework for the compositional simulation-based analysis of AI-based autonomous systems for Markovian safety specifications. Our compositional approach allows us to cut down the cost of executing a large number of long-running simulations, by decomposing a simulation-based analysis task into several shorter and more efficient ones. Results obtained from the individual analyses are then stitched together to generate a result for the overall simulation-based task. Our approach is based on a decomposition of scenarios formalized as concurrent hierarchical probabilistic extended state machines that describe sequential and parallel compositions of scenarios. We present two instantiations of our framework for falsification and statistical verification. Using case studies from the autonomous driving domain, we demonstrate the scalability of our compositional approach in comparison to a monolithic analysis approach.
Autonomous systems are increasingly relying on machine learning (ML) components to perform a variety of complex tasks in perception, prediction, and control. To guarantee the safety of ML-based autonomous systems, it is important to capture their operational design domain (ODD), i.e., the conditions under which using the ML components does not endanger the safety of the system. In this paper, we present a framework for learning runtime monitors for ODDs of autonomous systems with black-box ML components. A runtime monitor of an ODD predicts based on a sequence of monitorable observations whether the system is about to exit its ODD. We particularly investigate the learning of optimal monitors based on counterexample-guided refinement and conformance testing. We evaluate our approach on a case study from the domain of autonomous driving.
AI-based autonomous systems are increasingly relying on machine learning (ML) components to perform a variety of complex tasks in perception, prediction, and control. The use of ML components is projected to grow and with it the concern of using these components in systems that operate in safety-critical settings. To guarantee a safe operation of autonomous systems, it is important to run an ML component in its operational design domain (ODD), i.e., the conditions under which using the component does not endanger the safety of the system. Building safe and reliable autonomous systems which may use machine-learning-based components, calls therefore for automated techniques that allow to systematically capture the ODD of systems. In this paper, we present a framework for learning runtime monitors that capture the ODDs of black-box systems. A runtime monitor of an ODD predicts based on a sequence of monitorable observations whether the system is about to exit the ODD. We particularly investigate the learning of optimal monitors based on counterexample-guided refinement and conformance testing. We evaluate the applicability of our approach on a case study from the domain of autonomous driving.
International Symposium on Automated Technology for Verification and Analysis. 2022.
We present a new multi-objective optimization approach for synthesizing interpretations that ''explain'' the behavior of black-box machine learning models. Constructing human-understandable interpretations for black-box models often requires balancing conflicting objectives. A simple interpretation may be easier to understand for humans while being less precise in its predictions vis-a-vis a complex interpretation. Existing methods for synthesizing interpretations use a single objective function and are optimized for a single class of interpretations. In contrast, we provide a more general and multi-objective synthesis framework that allows users to choose (1) the class of syntactic templates from which an interpretation should be synthesized, and (2) quantitative measures on both the correctness and explainability of an interpretation. For a given black-box, our approach yields a set of Pareto-optimal interpretations with respect to the correctness and explainability measures. We show that the underlying multi-objective optimization problem can be solved via a reduction to quantitative constraint solving, such as weighted maximum satisfiability. To demonstrate the benefits of our approach, we have applied it to synthesize interpretations for black-box neural-network classifiers. Our experiments show that there often exists a rich and varied set of choices for interpretations that are missed by existing approaches.
Autonomous systems are increasingly deployed in safety-critical applications and rely more on high-performance AI/ML-based components. Runtime monitors play an important role in raising the level of assurance in AI/ML-based autonomous systems by ensuring that the autonomous system stays safe within its operating environment. In this tutorial, we present VerifAI, an open-source toolkit for the formal design and analysis of systems that include AI/ML components. VerifAI provides features supporting a variety of use cases including formal modeling of the autonomous system and its environment, automatic falsification of system-level specifications as well as other simulation-based verification and testing methods, automated diagnosis of errors, and automatic specification-driven parameter and component synthesis. In particular, we describe the use of VerifAI for generating runtime monitors that capture the safe operational environment of systems with AI/ML components. We illustrate the advantages and applicability of VerifAI in real-life applications using a case study from the domain of autonomous aviation.
We investigate the problem of monitoring partially observable systems with nondeterministic and probabilistic dynamics. In such systems, every state may be associated with a risk, e.g., the probability of an imminent crash. During runtime, we obtain partial information about the system state in form of observations. The monitor uses this information to estimate the risk of the (unobservable) current system state. Our results are threefold.First, we show that extensions of state estimation approaches do not scale due the combination of nondeterminism and probabilities. While exploiting a geometric interpretation of the state estimates improves the practical runtime, this cannot prevent an exponential memory blowup. Second, we present a tractable algorithm based on model checking conditional reachability probabilities. Third, we provide prototypical implementations and manifest the applicability of our algorithms to a range of benchmarks. The results highlight the possibilities and boundaries of our novel algorithms.
33rd International Conference on Computer Aided Verification.
We present an implementation of SOTER, a run-time assurance framework for building safe distributed mobile robotic (DMR) systems, on top of the Robot Operating System (ROS). The safety of DMR systems cannot always be guaranteed at design time, especially when complex, off-the-shelf components are used that cannot be verified easily. SOTER addresses this by providing a language-based approach for run-time assurance for DMR systems. SOTER implements the reactive robotic software using the language P, a domain-specific language designed for implementing asynchronous event-driven systems, along with an integrated run-time assurance system that allows programers to use uncertified components but still provide safety guarantees. We describe an implementation of SOTER for ROS and demonstrated its efficacy using a multi-robot surveillance case study, with multiple run-time assurance modules and show, through rigorous simulation, that SOTER enabled systems ensure safety, even when using unknown and untrusted components.
Hyperproperties are properties that describe the correctness of a system as a relation between multiple executions. Hyperproperties generalize trace properties and include information-flow security requirements, like noninterference, as well as requirements like symmetry, partial observation, robustness, and fault tolerance. We initiate the study of the specification and verification of hyperproperties of Markov decision processes (MDPs). We introduce the temporal logic PHL (Probabilistic Hyper Logic), which extends classic probabilistic logics with quantification over schedulers and traces. PHL can express a wide range of hyperproperties for probabilistic systems, including both classical applications, such as probabilistic noninterference, and novel applications in areas such as robotics and planning. While the model checking problem for PHL is in general undecidable, we provide methods both for proving and for refuting formulas from a fragment of the logic. The fragment includes many probabilistic hyperproperties of interest
18th International Symposium on Automated Technology for Verification and Analysis.
Reactive synthesis transforms a specification of a reactive system, given in a temporal logic, into an implementation. The main advantage of synthesis is that it is automatic. The main disadvantage is that the implementation is usually very difficult to understand. In this paper, we present a new synthesis process that explains the synthesized implementation to the user. The process starts with a simple version of the specification and a corresponding simple implementation. Then, desired properties are added one by one, and the corresponding transformations, repairing the implementation, are explained in terms of counterexample traces. We present SAT-based algorithms for the synthesis of repairs and explanations. The algorithms are evaluated on a range of examples including benchmarks taken from the SYNTCOMP competition.
18th International Symposium on Automated Technology for Verification and Analysis.
Automata over infinite words, also known as omega-automata, play a key role in the verification and synthesis of reactive systems. The spectrum of omega-automata is defined by two characteristics: the acceptance condition (e.g. Büchi or parity) and the determinism (e.g., deterministic or nondeterministic) of an automaton. These characteristics play a crucial role in applications of automata theory. For example, certain acceptance conditions can be handled more efficiently than others by dedicated tools and algorithms. Furthermore, some applications, such as synthesis and probabilistic model checking, require that properties are represented as some type of deterministic omega-automata. However, properties cannot always be represented by automata with the desired acceptance condition and determinism. In this paper we study the problem of approximating linear-time properties by automata in a given class. Our approximation is based on preserving the language up to a user-defined precision given in terms of the size of the finite lasso representation of infinite executions that are preserved. We study the state complexity of different types of approximating automata, and provide constructions for the approximation within different automata classes, for example, for approximating a given automaton by one with a simpler acceptance condition.
17th International Symposium on Automated Technology for Verification and Analysis.
In stream-based runtime monitoring, streams of data, called input streams, which involve data collected from the system at runtime, are translated into new streams of data, called output streams, which define statistical measures and verdicts on the system based on the input data. The advantage of this setup is an easy-to-use and modular way for specifying monitors with rich verdicts, provided with formal guarantees on the complexity of the monitor.
In this tutorial, we give an overview of the different classes of stream specification languages, in particular those with real-time features. With the help of the real-time stream specification language RTLola, we illustrate which features are necessary for the definition of the various types of real-time properties and we discuss how these features need to be implemented in order to guarantee memory efficient and reliable monitors. To demonstrate the expressive power of the different classes of stream specification languages and the complexity of the different features, we use a series of examples based on our experience with monitoring problems from the areas of unmanned aerial systems and telecommunication networks.
An essential part of cyber-physical systems is the online evaluation of real-time data streams. Especially in systems that are intrinsically safety-critical, a dedicated monitoring component inspecting data streams to detect problems at runtime greatly increases the confidence in a safe execution. Such a monitor needs to be based on a specification language capable of expressing complex, high-level properties using only the accessible low-level signals. Moreover, tight constraints on computational resources exacerbate the requirements on the monitor. Thus, several existing approaches to monitoring are not applicable due to their dependence on an operating system. We present an FPGA-based monitoring approach by compiling an RTLola specification into synthesizable VHDL code. RTLola is a stream-based specification language capable of expressing complex real-time properties while providing an upper bound on the execution time and memory requirements. The statically determined memory bound allows for a compilation to an FPGA with a fixed size. An advantage of FPGAs is a simple integration process in existing systems and superb executing time. The compilation results in a highly parallel implementation thanks to the modular nature of RTLola specifications. This further increases the maximal event rate the monitor can handle.
The unrealizability of a specification is often due to the assumption that the behavior of the environment is unrestricted. In this paper, we present algorithms for synthesis in bounded environments, where the environment can only generate input sequences that are ultimately periodic words (lassos) with finite representations of bounded size. We provide automata-theoretic and symbolic approaches for solving this synthesis problem, and also study the synthesis of approximative implementations from unrealizable specifications. Such implementations may violate the specification in general, but are guaranteed to satisfy the specification on at least a specified portion of the bounded-size lassos. We evaluate the algorithms on different arbiter specifications.
31st International Conference on Computer Aided Verification
With ever increasing autonomy of cyber-physical systems, monitoring becomes an integral part for ensuring the safety of the system at runtime. StreamLAB is a monitoring framework with high degree of expressibility and strong correctness guarantees. Specifications are written in RTLola, a stream-based specification language with formal semantics. StreamLAB provides an extensive analysis of the specification, including the computation of memory consumption and run-time guarantees. We demonstrate the applicability of StreamLAB on typical monitoring tasks for cyber-physical systems, such as sensor validation and system health checks.
31st International Conference on Computer Aided Verification
Hyperproperties elevate the traditional view of trace properties form sets of traces to sets of sets of traces and provide a formalism for expressing information-flow policies. For trace properties, algorithms for verification, monitoring, and synthesis are typically based on a representation of the properties as omega-automata. For hyperproperties, a similar, canonical automata-theoretic representation is, so far, missing. This is a serious obstacle for the development of algorithms, because basic constructions, such as learning algorithms, cannot be applied. In this paper, we present a canonical representation for the widely used class of regular k-safety hyperproperties, which includes important polices such as noninterference. We show that a regular k-safety hyperproperty S can be represented by a finite automaton, where each word accepted by the automaton represents a violation of S. The representation provides an automata-theoretic approach to regular k-safety hyperproperties and allows us to compare regular k-safety hyperproperties, simplify them, and learn such hyperproperties. We investigate the problem of constructing automata for regular k-safety hyperproperties in general and from formulas in HyperLTL, and provide complexity bounds for the different translations. We also present a learning algorithm for regular k-safety hyperproperties based on the L* learning algorithm for deterministic finite automata.
Hyperproperties are properties of sets of computation traces. In this paper, we study quantitative hyperproperties, which we define as hyperproperties that express a bound on the number of traces that may appear in a certain relation. For example, quantitative non-interference limits the amount of information about certain secret inputs that is leaked through the observable outputs of a system. Quantitative non-interference thus bounds the number of traces that have the same observable input but different observable output. We study quantitative hyperproperties in the setting of HyperLTL, a temporal logic for hyperproperties. We show that, while quantitative hyperproperties can be expressed in HyperLTL, the running time of the HyperLTL model checking algorithm is, depending on the type of property, exponential or even doubly exponential in the quantitative bound. We improve this complexity with a new model checking algorithm based on model-counting. The new algorithm needs only logarithmic space in the bound and therefore improves, depending on the property, exponentially or even doubly exponentially over the model checking algorithm of HyperLTL. In the worst case, the new algorithm needs polynomial space in the size of the system. Our Max#Sat-based prototype implementation demonstrates, however, that the counting approach is viable on systems with nontrivial quantitative information flow requirements such as a passcode checker.
30st International Conference on Computer Aided Verification
In formal synthesis of reactive systems an implementation of a system is automatically constructed from its formal specification. The great advantage of synthesis is that the resulting implementation is correct by construction; therefore there is no need for manual programming and tedious debugging tasks. Developers remain, nevertheless, hesitant to using automatic synthesis tools and still favor manually writing code. A common argument against synthesis is that the resulting implementation does not always give a clear picture on what decisions were made during the synthesis process. The outcome of synthesis tools is mostly unreadable and hinders the developer from understanding the functionality of the resulting implementation. Many attempts have been made in the last years to make the synthesis process more transparent to users. Either by structuring the outcome of synthesis tools or by providing additional automated support to help users with the specification process. In this paper we discuss the challenges in writing specifications for reactive systems and give a survey on what tools have been developed to guide users in specifying reactive systems and understanding the outcome of synthesis tools.
We determine the complexity of counting models of bounded size of specifications expressed in linear-time temporal logic. Counting word-models is #P-complete, if the bound is given in unary, and as hard as counting accepting runs of nondeterministic polynomial space Turing machines, if the bound is given in binary. Counting tree-models is as hard as counting accepting runs of nondeterministic exponential time Turing machines, if the bound is given in unary. For a binary encoding of the bound, the problem is at least as hard as counting accepting runs of nondeterministic exponential space Turing machines, and not harder than counting accepting runs of nondeterministic doubly-exponential time Turing machines. Finally, counting arbitrary transition systems satisfying a formula is #P-hard and not harder than counting accepting runs of nondeterministic polynomial time Turing machines with a PSPACE oracle, if the bound is given in unary. If the bound is given in binary, then counting arbitrary models is as hard as counting accepting runs of nondeterministic exponential time Turing machines.
Finding models for linear-time properties is a central problem in verification and planning. We study the distribution of linear-time models by investigating the density of linear-time properties over the space of ultimately periodic words. The density of a property over a bound n is the ratio of the number of lasso-shaped words of length n, that satisfy the property, to the total number of lasso-shaped words of length n. We investigate the problem of computing the density for both linear-time properties in general and for the important special case of omega-regular properties. For general linear-time properties, the density is not necessarily convergent and can oscillates indefinitely for certain properties. However, we show that the oscillation is bounded by the growth of the sets of bad- and good-prefix of the property. For omega-regular properties, we show that the density is always convergent and provide a general algorithm for computing the density of omega-regular properties as well as more specialized algorithms for certain sub-classes and their combinations.
15th International Symposium on Automated Technology for Verification and Analysis
We introduce Lola2.0, a stream-based specification language for the precise description of complex security properties in network traffic. The language extends the specification language Lola with two new features: template stream expressions, which allow input data to be carried along the stream, and dynamic stream generation, where new monitors can be invoked during the monitoring process for the monitoring of new subtasks on their own time scale. Lola 2.0 is simple and expressive: it combines the ease-of-use of rule-based specification languages like Snort with the expressiveness of heavy-weight scripting languages or temporal logics previously needed for the description of complex stateful dependencies and statistical measures. Lola 2.0 specifications are monitored by incrementally constructing output streams from input streams, while maintaining a store of partially evaluated expressions. We demonstrate the flexibility and expressivity of Lola 2.0 using a prototype implementation on several practical examples.
We present an analysis technique for temporal specifications of reactive systems that identifies, on the level of individual system outputs over time, which parts of the implementation are determined by the specification, and which parts are still open. This information is represented in the form of a labeled transition system, which we call skeleton. Each state of the skeleton is labeled with a three-valued assignment to the output variables: each output can be true, false, or open, where true or false means that the value must be true or false, respectively, and open means that either value is still possible. We present algorithms for the verification of skeletons and for the learning-based synthesis of skeletons from specifications in linear-time temporal logic (LTL). The algorithm returns a skeleton that satisfies the given LTL specification in time polynomial in the size of the minimal skeleton. Our new analysis technique can be used to recognize and repair specifications that underspecify critical situations. The technique thus complements existing methods for the recognition and repair of overspecifications via the identification of unrealizable cores.
14th International Symposium on Automated Technology for Verification and Analysis
We determine the complexity of counting models of bounded size of specifications expressed in Linear-time Temporal Logic. Counting word-models is #P-complete, if the bound is given in unary, and as hard as counting accepting runs of nondeterministic polynomial space Turing machines, if the bound is given in binary. Counting tree-models is as hard as counting accepting runs of nondeterministic exponential time Turing machines, if the bound is given in unary. For a binary encoding of the bound, the problem is at least as hard as counting accepting runs of nondeterministic exponential space Turing machines. On the other hand, it is not harder than counting accepting runs of nondeterministic doubly-exponential time Turing machines.
34th International Conference on Foundation of Software Technology and Theoretical Computer Science
We investigate the model counting problem for safety specifications expressed in linear-time temporal logic (LTL). Model counting has previously been studied for propositional logic; in planning, for example, propositional model counting is used to compute the plan’s robustness in an incomplete domain. Counting the models of an LTL formula opens up new applications in verification and synthesis. We distinguish word and tree models of an LTL formula. Word models are labeled sequences that satisfy the formula. Counting the number of word models can be used in model checking to determine the number of errors in a system. Tree models are labeled trees where every branch satisfies the formula. Counting the number of tree models can be used in synthesis to determine the number of implementations that satisfy a given formula. We present algorithms for the word and tree model counting problems, and compare these direct constructions to an indirect approach based on encodings into propositional logic.
8th International Conference on Language and Automata Theory and Applications