Planetary-Scale Agent Simulations

Peter Barnes (14-ERD-062)

Abstract

The nation faces many problems that are global in scale, especially in the field of national security, yet lacks the capability of simulating those problems at worldwide scale to help develop the best possible countermeasures. During this project, we developed global-scale models to predict outcomes, evaluate courses of action, and develop new analysis techniques for addressing global problems. To demonstrate the capabilities of our global modeling approach, we developed applications based on discrete-event simulation, which models the operation of a system as a discrete sequence of events in time, with each event occurring at a particular instant and marking a change of state in the system. For this project, we created two new application models: viral evolution during replication and security and policy models for stock markets. This work enhanced the Laboratory’s core competencies in bioscience, bioengineering, high-performance computing, simulation, and data science in support of the mission foci of biological and cyber security, with applications to other global security fields.

Background and Research Objectives

There are no local problems left in the world. Every facet of modern life, especially national security, has global drivers and global impact. Atmospheric contaminants spread worldwide in weeks; goods ship from anywhere to anywhere in days; diseases travel at airplane speed; cyber attacks have global impact in minutes. To address global problems, we need the ability to simulate them at worldwide scale. Many of these emerging security concerns are not defined by a set of governing partial differential equations, but rather by a set of interactions among model entities and agents at discrete points in time and space. These are best modeled with the tools of parallel discrete event simulation.
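As a minimal illustration of the discrete-event idea (a sequential sketch, not code from this project), the following Python fragment keeps pending events in a priority queue ordered by timestamp and advances simulated time from one event to the next. The names Simulator, schedule, run, and arrive are hypothetical and chosen only for this example.

```python
import heapq
import itertools


class Simulator:
    """Minimal sequential discrete-event simulator (illustrative only)."""

    def __init__(self):
        self.now = 0.0
        self._queue = []               # heap of (time, seq, handler, payload)
        self._seq = itertools.count()  # tie-breaker for equal timestamps

    def schedule(self, delay, handler, payload=None):
        """Schedule handler(sim, payload) to run `delay` time units from now."""
        if delay < 0:
            raise ValueError("cannot schedule an event in the past")
        heapq.heappush(
            self._queue, (self.now + delay, next(self._seq), handler, payload)
        )

    def run(self, until=float("inf")):
        """Process events in timestamp order; return how many were executed."""
        processed = 0
        while self._queue and self._queue[0][0] <= until:
            self.now, _, handler, payload = heapq.heappop(self._queue)
            handler(self, payload)  # an event changes state and may schedule more
            processed += 1
        return processed


def arrive(sim, agent):
    """Hypothetical event: an agent arrives at a site, then travels onward."""
    agent["log"].append((sim.now, agent["site"]))
    if agent["hops"] > 0:
        agent["hops"] -= 1
        agent["site"] += 1
        sim.schedule(agent["delay"], arrive, agent)


sim = Simulator()
agent = {"site": 0, "hops": 3, "delay": 2.5, "log": []}
sim.schedule(0.0, arrive, agent)
sim.run()  # agent["log"] now holds one (time, site) pair per arrival
```

Nothing happens between events, so simulated time jumps directly from one timestamp to the next. A parallel simulator distributes the event queue across processors, which is what makes synchronization necessary.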

During this project, we researched, developed, and demonstrated new techniques for parallel discrete event simulation. We had previously demonstrated world-record performance and scalability of these tools using “optimistic” synchronization.¹ However, “conservative” synchronization is much more widely used because of the difficulty of writing model event functions in an optimistic way, which requires being able to roll back all updates to model state caused by any model event. To enable simulations up to planetary scale, we addressed three fundamental research challenges in parallel discrete event simulation: automatic generation of reversing source code, dynamic load balancing, and global synchronization using only one-sided communication primitives. We illustrated these techniques with two application models: (1) viral evolution during replication and (2) security and policy models for stock markets.
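To make the rollback requirement concrete, here is a heavily simplified sketch of optimistic execution for a single logical process, assuming every event carries a hand-written reverse function. It omits anti-messages, global virtual time, and fossil collection, and the names Event, LogicalProcess, add, and double are hypothetical, not part of ROSS or any other simulator.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Event:
    time: float
    forward: Callable[[dict], None]  # applies the event to the model state
    reverse: Callable[[dict], None]  # exactly undoes `forward`


class LogicalProcess:
    """Executes events as they arrive and rolls back when one arrives late."""

    def __init__(self, state):
        self.state = state
        self.processed = []  # executed events, kept in timestamp order
        self.rollbacks = 0

    def receive(self, event):
        # A "straggler" has a timestamp earlier than events already executed.
        # Undo those later events, most recent first.
        undone = []
        while self.processed and self.processed[-1].time > event.time:
            late = self.processed.pop()
            late.reverse(self.state)
            undone.append(late)
        if undone:
            self.rollbacks += 1
        # Execute the new event, then re-execute the undone ones in order.
        for ev in [event] + undone[::-1]:
            ev.forward(self.state)
            self.processed.append(ev)


def add(time, n):
    def fwd(s): s["x"] += n
    def rev(s): s["x"] -= n
    return Event(time, fwd, rev)


def double(time):
    def fwd(s): s["x"] *= 2
    def rev(s): s["x"] //= 2  # exact inverse for integer state
    return Event(time, fwd, rev)


lp = LogicalProcess({"x": 1})
for ev in (add(1.0, 3), double(3.0), add(2.0, 1)):  # the third event is a straggler
    lp.receive(ev)
# lp.state["x"] == 10, the same as executing the events in timestamp order
```

Each forward function needs a matching reverse function that is exactly correct. Writing these pairs by hand for realistic models is the burden that makes optimistic model code hard to write.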

Scientific Approach and Accomplishments

Prior research had demonstrated a proof-of-concept tool, called Backstroke, which used Livermore’s ROSE compiler toolkit to generate rollback functions for every model event in the conservative source code. Our initial plan was to extend this proof of concept to address all features of the C++ language. As we came to understand the problem better, we realized we could use ROSE to create a different transformation: instrument the original, conservative source code to record updates to model state in minimal form. We could then implement rollback simply by restoring the old values to the original memory locations. This became Backstroke 2.0, which can now handle all features of the C++ language, including multi-threading and exceptions.² It’s hard to overstate the significance of this development. Writing model code for optimistic execution used to be completely different from writing for conservative execution, and at least three times as hard. Now, one only has to write the conservative code, and Backstroke 2.0 takes care of the rest completely transparently.
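The record-and-restore idea can be sketched in a few lines. The Python analogue below is only an illustration under stated assumptions: Backstroke 2.0 itself instruments C++ source through ROSE and works on memory locations, whereas this sketch intercepts assignments to a dictionary-like state object. The names LoggedState, mark, rollback, and infect_event are hypothetical.

```python
_MISSING = object()  # sentinel: the key did not exist before the write


class LoggedState:
    """Model state that records the old value before every assignment."""

    def __init__(self, **initial):
        self._data = dict(initial)
        self._log = []  # (key, old_value) pairs, oldest first

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        # The "instrumentation": save the previous value, then write.
        self._log.append((key, self._data.get(key, _MISSING)))
        self._data[key] = value

    def mark(self):
        """Return a position in the log to roll back to later."""
        return len(self._log)

    def rollback(self, mark):
        """Restore old values in reverse order until the log shrinks to `mark`."""
        while len(self._log) > mark:
            key, old = self._log.pop()
            if old is _MISSING:
                del self._data[key]
            else:
                self._data[key] = old

    def snapshot(self):
        return dict(self._data)


def infect_event(state, new_cases):
    """An event written the ordinary, forward-only ('conservative') way."""
    state["infected"] = state["infected"] + new_cases
    state["susceptible"] = state["susceptible"] - new_cases
    state["last_event"] = "infect"


state = LoggedState(infected=1, susceptible=99)
before = state.mark()
infect_event(state, 5)   # optimistic execution
state.rollback(before)   # the event turned out to be premature; undo it
# state.snapshot() == {"infected": 1, "susceptible": 99}
```

The event function contains no rollback logic at all. Everything needed to undo it is captured by the instrumented assignment, which is the property that lets model authors write only forward code.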

The second technical challenge we addressed is dynamic load balancing. Essentially all parallel discrete event simulators today require static partitioning of model objects across the processors executing the simulation. However, in essentially all realistic models the load imbalances are dynamic. For example, the local density of nodes in a mobile networking model changes as nodes congregate and disperse, which has a large impact on model performance. Charm++, an object-oriented parallel runtime system developed by the University of Illinois at Urbana-Champaign, supports asynchronous message passing and dynamic load balancing by migrating model objects.³ To explore the issues in dynamic load balancing for parallel discrete event simulation, we ported a particular simulator, Rensselaer's Optimistic Simulation System (more commonly known as ROSS), to use Charm++ instead of a traditional message passing interface.

Based on the successful port of ROSS to Charm++, we embarked on the first systematic study of dynamic load balancing policy, instrumentation, and metrics ever done for parallel discrete event simulation.⁴ This was just the first step in what promises to be a rich research area.
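The general idea of balancing load by migrating objects can be illustrated with a simple greedy policy. This is a sketch under stated assumptions, not the Charm++ API and not one of the policies examined in the study: it assumes a measured cost per model object and repeatedly moves one object from the most loaded processor to the least loaded one. The function names and the example data are hypothetical.

```python
def processor_loads(placement, load, num_procs):
    """Total measured load on each processor."""
    totals = [0.0] * num_procs
    for obj, proc in placement.items():
        totals[proc] += load[obj]
    return totals


def rebalance(placement, load, num_procs, max_moves=100):
    """Greedily migrate objects from the busiest to the idlest processor.

    placement: dict mapping object -> processor index (updated in place)
    load:      dict mapping object -> measured cost of that object
    Returns the list of (object, source, destination) migrations.
    """
    moves = []
    for _ in range(max_moves):
        totals = processor_loads(placement, load, num_procs)
        src = max(range(num_procs), key=totals.__getitem__)
        dst = min(range(num_procs), key=totals.__getitem__)
        gap = totals[src] - totals[dst]
        # Only objects lighter than the gap reduce the imbalance when moved.
        candidates = [
            o for o, p in placement.items() if p == src and 0 < load[o] < gap
        ]
        if not candidates:
            break
        # Prefer the object that comes closest to evening out the pair.
        obj = min(candidates, key=lambda o: abs(load[o] - gap / 2))
        placement[obj] = dst
        moves.append((obj, src, dst))
    return moves


# Hypothetical snapshot: mobile nodes have congregated in cells on processor 0.
placement = {"cell_a": 0, "cell_b": 0, "cell_c": 0, "cell_d": 1}
load = {"cell_a": 5, "cell_b": 4, "cell_c": 3, "cell_d": 1}
moves = rebalance(placement, load, num_procs=2)
# moves == [("cell_a", 0, 1)]; the per-processor loads change from [12, 1] to [7, 6]
```

A practical policy must also weigh migration cost, communication locality, and, for optimistic simulation, the rollback behavior of each object, which is why instrumentation and metrics are part of the research question.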

To demonstrate the use of parallel discrete event simulation in new application areas, we developed two new complete models. The first model was based on an existing, time-stepped Livermore model for evolution of the Dengue virus during replication.⁵ We ported it to ROSS and dramatically extended it to support many more features and aspects of the biology.⁶ Figure 1.A illustrates the simulation events representing viral infection and replication within a single cell; Figure 1.B illustrates the events representing an extended multi-cellular tissue with viral transport by explicit diffusion between “pixels.” This model presents two novel challenges for parallel discrete event simulation: maintaining a distributed, time-stamped database of virion sequences, and representing diffusion by explicit particle transport events. In both cases our initial implementations have led us to sketch novel approaches for future development. This work will be a key component of a future whole-cell model we are planning with university collaborators.

Figure 1. Viral model components and simulation events representing (A) viral binding, entry, replication, and bursting, releasing mutated virions into the inter-cellular medium within a volume element, and (B) a single virion diffusion between volume elements (in this case on a 2-dimensional grid).
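Diffusion by explicit particle transport events, as in panel (B), can be sketched as follows. This is an assumption-laden illustration rather than the project's model: each virion waits an exponentially distributed time and then hops to a randomly chosen neighboring volume element on a bounded 2-dimensional grid. The function name, its parameters, and the hop rule are hypothetical.

```python
import heapq
import random


def simulate_diffusion(width, height, start, n_virions, hop_rate, t_end, seed=0):
    """Move virions between grid cells with one event per hop.

    Returns (counts, hops): virions per cell at time t_end and the number
    of hop events executed.
    """
    if width * height < 2:
        raise ValueError("grid needs at least two cells")
    rng = random.Random(seed)
    counts = {start: n_virions}
    queue = []  # heap of (hop time, virion id, current cell)
    for vid in range(n_virions):
        heapq.heappush(queue, (rng.expovariate(hop_rate), vid, start))

    hops = 0
    while queue and queue[0][0] <= t_end:
        t, vid, (x, y) = heapq.heappop(queue)
        # Neighbors inside the grid; boundary cells simply have fewer of them.
        neighbors = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < width and 0 <= y + dy < height
        ]
        dest = rng.choice(neighbors)
        counts[(x, y)] -= 1
        counts[dest] = counts.get(dest, 0) + 1
        hops += 1
        # Schedule this virion's next hop from its new cell.
        heapq.heappush(queue, (t + rng.expovariate(hop_rate), vid, dest))
    return counts, hops


counts, hops = simulate_diffusion(
    width=8, height=8, start=(4, 4), n_virions=50, hop_rate=1.0, t_end=10.0
)
# sum(counts.values()) stays 50: hop events move virions but never create or destroy them
```

In a parallel setting, a hop that crosses a partition boundary becomes a message between processors, so the event count grows with the number of particles. That cost is one reason diffusion is a challenging case for this style of simulation.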

For a very different kind of model, in collaboration with the RAND Corporation we constructed a model of stock markets in order to examine policy and security questions of national significance.⁷ Our model includes multiple exchanges, various order types at each exchange, and seven different trading algorithms from the literature. In addition, we represent trading messages and communication latencies explicitly. (See Figure 2.) To our knowledge, this is the first market model that includes so many trader types as well as the complete market communication infrastructure. We received favorable reactions when this model was presented to Treasury and SEC officials, and expect a funded project to result from those interactions.

Figure 2. An example stock market simulation demonstrating latency arbitrage across two exchanges. This model uses two exchanges, two designated market makers, two latency arbitrageurs (LA) with low latency communication to the exchanges and the Securities Information Processor (SIP), and 200 noise traders.
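The role of explicit message latencies can be shown with a toy race between two information paths. This is a deliberately reduced sketch, not the model described above: it assumes a single price move on one exchange, a market maker on the other exchange who requotes only after the slower consolidated feed delivers the news, and one arbitrageur with a faster direct feed. All names, prices, and latencies are hypothetical.

```python
import heapq


def run_latency_race(direct_latency, sip_latency, order_latency,
                     old_price=100.0, new_price=101.0):
    """Return (profit per share, event log) for one price move at time 0.

    direct_latency: delay before the arbitrageur sees the move on exchange A
    sip_latency:    delay before exchange B's market maker updates its quote
    order_latency:  delay for the arbitrageur's order to reach exchange B
    """
    events = []  # heap of (time, seq, kind)
    seq = 0

    def send(time, kind):
        nonlocal seq
        heapq.heappush(events, (time, seq, kind))
        seq += 1

    quote_b = old_price  # exchange B's resting ask, stale until requoted
    profit = 0.0
    log = []

    # The price moves on exchange A at time 0; the news travels on two paths.
    send(direct_latency, "arb_sees_move")
    send(sip_latency, "maker_updates_quote")

    while events:
        now, _, kind = heapq.heappop(events)
        log.append((now, kind))
        if kind == "arb_sees_move":
            send(now + order_latency, "arb_order_arrives")
        elif kind == "maker_updates_quote":
            quote_b = new_price
        elif kind == "arb_order_arrives":
            # Buy at B's current ask, sell at the new price on A.
            profit = new_price - quote_b
    return profit, log


fast_profit, _ = run_latency_race(direct_latency=1.0, sip_latency=5.0, order_latency=1.0)
slow_profit, _ = run_latency_race(direct_latency=4.0, sip_latency=5.0, order_latency=2.0)
# fast_profit == 1.0 (order beats the requote); slow_profit == 0.0 (it arrives too late)
```

The outcome depends only on the ordering of message arrival times, which is why a market model that aims to study latency arbitrage has to represent communication delays as events instead of assuming instantaneous information.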

Impact on Mission

This project supports the Laboratory's strategic focus area in cyber security, space, and intelligence, and core competency in high-performance computing, simulation, and data science by developing a new, scalable discrete-event simulation system that will support the predictive modeling of complex systems critical to national security. This project also supports Lawrence Livermore's bioscience and bioengineering core competency by advancing computational biology tools for outcome prediction, especially using genomic sequences, related protein sequences, and protein structure models to help predict how viruses might evolve.

As a result of this project, we have committed to developing a new optimistic simulator, dubbed xpdes, which is based on Charm++, Backstroke 2.0, and the ns-3 open-source discrete-event network simulator. Because ns-3 is a production-quality simulator, building on it avoids the need to address many limitations of the research-oriented ROSS simulator. It also provides a high-quality, comprehensive communication modeling library; communication modeling is an important application area that will directly benefit from optimistic synchronization and dynamic load balancing. Based on the work accomplished in this project, and our plans for xpdes, we have already received funding for one communications modeling project from a DoD sponsor and are finalizing plans for a second project with a second DoD sponsor.

In addition, as a direct result of our collaborations with the University of Illinois and Rensselaer Polytechnic Institute, the Laboratory has hired three recent Ph.D. graduates.

Conclusion

As noted above, the work accomplished under this project is directly responsible for renewed interest in applying parallel discrete event simulation to problems of national significance in communications modeling, biology, and economic systems. We are just beginning to see the fruits of collaboration and agency sponsorship in these areas.

References

Barnes, Jr., P. D., et al., “Warp speed: Executing time warp on 1,966,080 cores.” Proc. 1st ACM SIGSIM Conf. on Principles of Advanced Discrete Simulation. ACM, New York, NY (2013).
Schordan, M., et al., Automatic generation of reversible C++ code and its performance in a scalable kinetic Monte-Carlo application. ACM SIGSIM PADS, Banff, Alberta, Canada, May 15–18, 2016. LLNL-CONF-681318.
Kalé, L., et al., “Migratable objects + active messages + adaptive runtime = productivity + performance.” (2016). http://charm.cs.illinois.edu/newPapers/12-47/paper.pdf.
Mikida, E., et al., Towards PDES in a message-driven paradigm: A preliminary case study using Charm++. ACM SIGSIM PADS, Banff, Alberta, Canada, May 15–18, 2016. LLNL-CONF-687005.
Kostova, T., “Interplay of node connectivity and epidemic rates in the dynamics of epidemic networks,” J. Diff. Equat. App. 15(4), 415 (2009). LLNL-JRNL-404031. http://dx.doi.org/10.1080/10236190902766835
Yeom, J.-S., et al., Simulating the evolution of RNA viruses, SC16, Salt Lake City, UT, Nov. 14–17, 2016.
Barnes, Jr., P. D., and P. A. Dreyer, Jr., Modeling the security and resilience of a coupled economic system with high-performance computing. Livermore, CA (2016). LLNL-GS-2016-0055.

Publications and Presentations

Barnes, Jr., P. D., and P. A. Dreyer, Jr., Modeling the security and resilience of a coupled economic system with high-performance computing. Livermore, CA (2016). LLNL-GS-2016-0055.
Mikida, E., et al., Towards PDES in a message-driven paradigm: A preliminary case study using Charm++. ACM SIGSIM PADS, Banff, Alberta, Canada, May 15–18, 2016. LLNL-CONF-687005.
Nikolaev, S., et al., “Pushing the envelope in distributed ns-3 simulations: One billion nodes.” WNS3 '15: Proc. 2015 Workshop on ns-3, ACM, New York, NY (2015). LLNL-CONF-667630.
Schordan, M., et al., Automatic generation of reversible C++ code and its performance in a scalable kinetic Monte-Carlo application. ACM SIGSIM PADS, Banff, Alberta, Canada, May 15–18, 2016. LLNL-CONF-681318.
Smith, S. G., et al., Improving per processor memory use of ns-3 to enable large scale simulations. WNS3, Barcelona, Spain, May 13–14, 2015. LLNL-CONF-667822.