Wednesday, November 11, 2020

Benford’s Law and the US 2020 Presidential Election Votes

Benford’s law states that if you take a large set of real-world data whose values span several orders of magnitude and look at the leading digit of each value, you will see significantly more 1s than any other digit.
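To make this concrete, here is a minimal Python sketch (my own illustration, not part of the original post) that compares the observed first-digit frequencies of a data set with the Benford prediction P(d) = log10(1 + 1/d):

import math
from collections import Counter

def benford_expected(d):
    # Benford's law: probability that the leading digit equals d (d = 1..9)
    return math.log10(1 + 1 / d)

def first_digit(x):
    # Scale the absolute value into [1, 10) and take the integer part
    x = abs(x)
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)

def first_digit_frequencies(values):
    counts = Counter(first_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Powers of two span many orders of magnitude and follow Benford's law closely
data = [2 ** n for n in range(1, 200)]
observed = first_digit_frequencies(data)
for d in range(1, 10):
    print(d, f"observed={observed[d]:.3f}", f"expected={benford_expected(d):.3f}")

Running this on data that spans many magnitudes, such as the powers of two used here, reproduces the characteristic distribution with roughly 30% leading 1s.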

As one application, Benford’s law is used to detect fraud in accounting. There, typically the pairs of the first two digits are analyzed and plotted according to their frequency in order to detect anomalies. An anomaly can have different explanations, though.

For example, in the US 2020 presidential election, the proportion of 1s and 2s among the first digits of the vote counts for Mr. Biden is lower than expected, while for the vote counts for Mr. Trump it is slightly higher.


In the video below, Matt Parker analyzes the situation and shows that the more densely populated areas in the US, where a majority of Mr. Biden's votes come from, have precincts of mostly the same size. The condition of having data spanning multiple orders of magnitude is therefore not fulfilled, and we get a distribution of first digits that deviates from the prediction of Benford’s law (as the small simulation below illustrates).
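As a quick illustration of this point (a toy simulation of my own, not taken from the video), vote counts from hypothetical precincts of roughly equal size cluster around a few leading digits, whereas data spanning several orders of magnitude reproduces the Benford distribution:

import random
from collections import Counter

random.seed(1)

def first_digit(x):
    return int(str(int(abs(x)))[0])

def digit_shares(values):
    counts = Counter(first_digit(v) for v in values if v >= 1)
    total = sum(counts.values())
    return {d: round(counts.get(d, 0) / total, 2) for d in range(1, 10)}

# Hypothetical precincts of roughly equal size (~600 voters) with a ~65% vote share
similar = [random.gauss(600, 120) * 0.65 for _ in range(5000)]

# Vote counts spanning several orders of magnitude (log-uniform between 10 and 100000)
wide = [10 ** random.uniform(1, 5) for _ in range(5000)]

print("similar-sized precincts:", digit_shares(similar))
print("multi-magnitude data:   ", digit_shares(wide))

The first run shows most counts starting with 3 or 4 and almost no 1s, while the second run matches Benford's law quite well.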

When looking at the frequency of the last digits, there is an anomaly in the vote data for Mr. Trump: instead of the last digits being roughly uniformly distributed, the lower digits occur much more often. This is due to the fact that a majority of votes for Mr. Trump come from smaller precincts, which favors the smaller numbers.

 

Thus, the deviation of vote counts (from precincts with a standardized size) from Benford’s law is not an indication of voter fraud but rather a phenomenon to be expected.

Further reading:
Deckert, J., Myagkov, M., & Ordeshook, P. (2011). Benford's Law and the Detection of Election Fraud. Political Analysis, 19(3), 245-268. doi:10.1093/pan/mpr014

Wednesday, September 9, 2020

Swarm Intelligence and Cyber-Physical Systems


Swarm Intelligence (SI) is a popular multi-agent framework that was originally inspired by swarm behaviors observed in natural systems, such as ant and bee colonies. In a system designed after swarm intelligence, each agent acts autonomously, reacts to dynamic inputs, and, implicitly or explicitly, works collaboratively with other swarm members without central control. The system as a whole is expected to exhibit global patterns and behaviors.

When is it advantageous to use a swarm approach? The scaling principle depicts a range where a swarm outperforms a linear system of the same size.

Although well-designed swarms can show advantages in adaptability, robustness, and scalability, it must be noted that SI systems have not really found their way from lab demonstrations to real-world applications so far. This is particularly true for embodied SI, where the agents are physical entities, such as in swarm robotics scenarios.

In the paper 

Melanie Schranz, Gianni di Caro, Thomas Schmickl, Wilfried Elmenreich, Farshad Arvin, Ahmet Sekercioglu, and Micha Sende. Swarm Intelligence and Cyber-Physical Systems: Concepts, challenges and future trends. Swarm and Evolutionary Computation, 60, 2020. (doi:10.1016/j.swevo.2020.100762)

we start from these observations, outline different definitions and characterizations, and then discuss present challenges from the perspective of the future use of swarm intelligence. These include application ideas, research topics, and new sources of inspiration from biology, physics, and human cognition. To motivate future applications of swarms, we make use of the notion of cyber-physical systems (CPS). CPSs are a way to encompass a large spectrum of technologies including robotics, the Internet of Things (IoT), Systems on Chip (SoC), embedded systems, and so on. Thereby, we give concrete examples of visionary applications and their challenges, representing the physical embodiment of swarm intelligence in

  • autonomous driving and smart traffic,
  • emergency response,
  • environmental monitoring,
  • electric energy grids,
  • space missions,
  • medical applications,
  • and human networks.

In the future, swarm-based applications will play an important role when there is not enough information to solve the problem in a centralized way, when there are time constraints that do not allow finding an analytical solution, and when the operation needs to be performed in a dynamically changing environment. With the increasing complexity of upcoming applications, this means that SI will be applied to solve a significant share of ubiquitous complex problems.

Monday, July 27, 2020

Swarm Robotic Behaviors in Real-World Applications

Spiderino - a low-cost robot for swarm research and educational purposes
With potential benefits from self-organization (e.g., resilience, scalability, and adaptivity to dynamic environments), the motivation is strong to apply swarm robotics in industrial applications. While several swarm robotics research platforms have been developed for educational and scientific purposes, many industrial applications still rely on centralized control. Moreover, in cases where a multi-robot solution is employed, the principal idea of swarm robotics, namely distributed decision making, is often not implemented. To address this topic, the paper

Melanie Schranz, Micha Sende, Martina Umlauft, and Wilfried Elmenreich. Swarm robotic behaviors and current applications. Frontiers in Robotics and AI, 7(36), 2020. (doi:10.3389/frobt.2020.00036)

The e-puck, a robot designed for education in engineering
provides a collection and categorization of swarm robotic behaviors. Along with this taxonomy, the paper gives a comprehensive overview of research platforms and industrial projects and products, separated into terrestrial, aerial, aquatic, and outer space domains. In a final discussion, the authors identify several open issues, including dependability, emergent characteristics, security and safety, and communication, as hindrances to the implementation of fully distributed autonomous swarm systems.

The paper was published as part of a Research Topic on Designing Self-Organization in the Physical Realm in the Frontiers in Robotics and AI journal.

In another paper in this issue,

Danesh Tarapore, Roderich Groß, and Klaus-Peter Zauner. Sparse robot swarms: Moving swarms to real-world applications. Frontiers in Robotics and AI, 7(83), 2020. (doi:10.3389/frobt.2020.00083)

the authors address a common property of swarms: the underlying assumption that the robots act in close proximity to each other (for example, a few body lengths apart) and typically employ uninterrupted, situated, close-range communication for coordination. Many real-world applications, including environmental monitoring and precision agriculture, however, require scalable groups of robots to act jointly over larger distances (e.g., 1000 body lengths), rendering the use of dense swarms impractical. Using a dense swarm for such applications would be invasive to the environment and unrealistic in terms of mission deployment, maintenance, and post-mission recovery. To address this problem, the paper proposes a sparse swarm concept, which is illustrated via four application scenarios.

Monday, May 11, 2020

Remember the Conferences?

After a couple of weeks in self-isolation due to the global COVID-19 pandemic, we are getting used to having conferences entirely online. To cheer you up, we are posting some impressions from one of our last conference visits that actually took place physically.
Casa Convalescència

We attended WiMob 2019, the 15th International Conference on Wireless and Mobile Computing, Networking and Communications. The event brought together top researchers and practitioners and created a forum for the exchange of experience and knowledge among researchers and developers concerned with wireless and mobile technology.
In addition to presenting our paper at a top conference, we also enjoyed the nice conference venue - the event took place in Barcelona and was organized at the venerable Casa Convalescència. The building is one of the great works of Catalan Modernism; it was declared a Historical Artistic Monument in 1978 and a World Cultural Heritage Site by UNESCO in 1997. The building is part of the historic site of the Hospital de la Santa Creu i Sant Pau.
Plenary at WiMob'19

Being in such an inspiring environment, the conference went great. Martina Umlauft presented our paper "Topology Characterization for Position-based Wireless Network Topology Generators" in front of an interesting crowd with great success. In the paper, we discuss why methods are necessary that characterize network topology based solely on the spatial positions of the nodes on the terrain. Topologies are usually characterized in terms of their network graph, for example by investigating their degree frequency, rank/degree, or hop count distributions. Wireless network simulation, on the other hand, typically does not use network graphs. Instead, in most wireless simulations, nodes are first positioned on the terrain based on some positioning algorithm, and then a radio propagation model is used to determine connectivity dynamically at simulation run-time. We propose several metrics and show how they can be used to evaluate position-based topologies: the nearest neighbor distance distribution, a threshold-based and a probabilistic node degree measure, and an inhomogeneity measure for spatial distributions.
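To give an idea of what such position-based metrics look like, here is a small Python sketch (a simplification of my own, not the code used in the paper) that computes the nearest-neighbor distance distribution and a threshold-based node degree directly from node positions, assuming a simple unit-disk connectivity model:

import math
import random

random.seed(42)

def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def nearest_neighbor_distances(nodes):
    # For every node, the distance to its closest other node
    return [min(distance(n, m) for m in nodes if m is not n) for n in nodes]

def threshold_degrees(nodes, radio_range):
    # Node degree under a unit-disk model: two nodes are connected
    # if their distance is below the (hypothetical) radio range
    return [sum(1 for m in nodes if m is not n and distance(n, m) <= radio_range)
            for n in nodes]

# 100 nodes placed uniformly at random on a 1000 m x 1000 m terrain (illustrative values)
nodes = [(random.uniform(0, 1000), random.uniform(0, 1000)) for _ in range(100)]

nnd = nearest_neighbor_distances(nodes)
deg = threshold_degrees(nodes, radio_range=150)
print("mean nearest-neighbor distance:", round(sum(nnd) / len(nnd), 1))
print("mean node degree:", round(sum(deg) / len(deg), 2))

Distributions of such values can then be compared between different position-based topology generators.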

Please find the presented paper here:

Martina Umlauft and Wilfried Elmenreich. Topology Characterization for Position-based Wireless Network Topology Generators. In Anna Maria Vegni, editor, 15th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob 2019), Piscataway (NJ), October 2019. IEEE.

We hope this little report helped you remember better times. See you hopefully soon at the next real conference ;-)


Wednesday, April 1, 2020

Working from home in times of Corona

To be productive, you need top-of-the-line equipment
With the current numbers of Corona infections ramping up in almost every country, we can expect self-isolation and working from home to continue for an indefinite period. So everybody should probably get familiar with working from home. I will tell you my secret: for successfully working from home you need some top-of-the-line equipment as your working computer. First of all, you need a lot of RAM, at least 128 kB. Forget what your friends say, 64 kB is not enough! Furthermore, you need a fast processor to handle the vast amount of data stored in the big RAM. Therefore, I have not only one, but two processors in my computer, running at up to 4 MHz processing speed. You should also have a dedicated monitor, not a built-in display. In my setup, I can quickly switch between two different resolutions with a button on the monitor, which is relaxing for my eyes.
Writing a program for visualization of Corona cases
But it is not only the hardware that matters, but also the software. With the advanced built-in programming language, I can easily visualize a graph. I quickly wrote a program visualizing the development of COVID-19 cases in Austria from March 1 to March 31. As you can see, there is an exponential development in the number of cases, which is why the graph has to rescale several times.

Results viewed on the high-resolution monitor
You can find my program, including the source code, at the CSDB repository.

If your computer can't run the program, it's probably because you don't have enough RAM or processing power on your system. But remember, today is April 1st, a good day to order a better computer.

Friday, March 6, 2020

Why it is important to share your code and make your paper accessible

In a recently published paper on "Making simulation results reproducible - Survey, guidelines, and examples based on Gradle and Docker" we asked researchers at all career levels about their willingness to share the code of their simulations together with the paper. Somewhat to our surprise, the answers were mostly very positive about sharing.
Still, we are currently far from a situation where every published paper is made accessible and has its code shared with the publication. Apart from the obvious cases where an industrial project might require confidentiality of some details, the most frequent reason is probably laziness, or simply giving the effort of properly refactoring one's code for a publication a low priority. At the moment, not publishing the code is the norm, while providing the code is still the exception.
This needs to change for several reasons:
  • From the perspective of the researcher who reads the publication, having access to the code eases the understanding of the approach and allows building upon the work of others. The frequent argument that the code would be provided on request is mainly lip service: first, it leaves the reader uncertain whether and when they would get the code. Second, the provider of the code might not have it prepared. Imagine the effort to dig up some code you wrote ten years ago and to clean it up so you can pass it on.
  • From the perspective of the researcher who publishes a paper, the chance to get their work read, appreciated and cited is much higher if they provide code and materials with it. Considering the time, money and effort that is put into publishing a paper, the effort of also publishing the code is well justified.
  • From a systems perspective, it is of utmost importance that we support each other. It does not make sense that brilliant minds spend time recreating implementations that have been done already. Reproducing research results is, of course, an important factor in science, but the overall ability to reproduce results and check an approach for errors increases when it is possible to inspect the code.
For the same reasons, it is important to have our papers available online instead of locking them behind a paywall. Even if your university pays for access to some literature databases, there are many potential readers of your work who don't enjoy such a service, be it that their university does not provide such access or that they are working from a different network at the moment they want to read your paper. My recommendation: go for open access! This could be gold open access, where the journal provides open access to your paper on their website; however, this "gold" is usually expensive. Alternatively, several publishers offer a green open access model where you are allowed to keep a pre- or post-print version of your paper online on your private or your institution's webpage. To check whether a particular publisher offers such a policy, look them up at this page about Publisher copyright policies & self-archiving.

Further reading:
Wilfried Elmenreich, Philipp Moll, Sebastian Theuermann, and Mathias Lux. Making simulation results reproducible - Survey, guidelines, and examples based on Gradle and Docker. PeerJ Computer Science, 5(e240):1–27, December 2019. (doi:10.7717/peerj-cs.240)


Friday, February 7, 2020

The Wisdom of Crowds or Can many mediocre measurements produce a single good one?

In the introduction of his excellent book "The Wisdom of Crowds", James Surowiecki tells the story of the British scientist Francis Galton, who went to a country fair in 1906. There, a guessing game took place in which one had to guess the weight of a bull.
800 people purchased a ticket and delivered their estimates on paper. Galton, who was curious about all kinds of things (I already told you he was a scientist, right?), borrowed the tickets afterward and analyzed the results. He was expecting the average of the guesses to be far off, because for each expert in the crowd (like, for example, a butcher) there were surely a couple of inexperienced people. However, to his great surprise, the average guess of 1197 pounds was very close to the real result of 1198 pounds! This story, by the way, also documents the increase in the weight of livestock - today a bull would be around twice as heavy!

If you are interested in further aspects of collective human intelligence, I recommend reading the book:

James Surowiecki, The Wisdom of Crowds, Anchor, 2005.

But I was less interested in people; instead, I was wondering whether this idea can be used for combining sensor measurements.

A couple of years ago I worked on a method for combining measurements from sensors with different accuracy. Translated to the story above, this would mean: if we know who the experts are and who are not, should we even bother to include the results of the latter? Actually, the answer is yes - given that the estimates have low correlation! But unlike in the story above, the best way is to compute a weighted average of the values. The weights are derived from the error variance of the estimates, so a sensor with a high error variance should get a low weight and a sensor with a low error variance should get a high weight.

The resulting formula is surprisingly easy:
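In essence, each measurement is weighted by the inverse of its error variance and the weighted sum is then normalized. Here is a minimal Python sketch of this confidence-weighted averaging idea, with made-up sensor values for illustration (not code from the paper):

def confidence_weighted_average(measurements):
    # measurements: list of (value, error_variance) pairs
    # Each value is weighted by the inverse of its error variance, so less
    # accurate sensors contribute less to the fused result.
    weights = [1.0 / var for _, var in measurements]
    fused_value = sum(w * x for w, (x, _) in zip(weights, measurements)) / sum(weights)
    fused_variance = 1.0 / sum(weights)
    return fused_value, fused_variance

# Hypothetical sensors: an accurate one (variance 1.0) and two mediocre ones (variance 9.0)
print(confidence_weighted_average([(10.2, 1.0), (11.5, 9.0), (9.0, 9.0)]))

In this example, the fused variance of about 0.82 is smaller than the variance of even the best individual sensor, so the mediocre sensors still improve the result.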
The paper explaining the approach in detail and showing how this can be integrated into a sensor network can be found here (link leads to freely accessible PDF):

W. Elmenreich. Fusion of continuous-valued sensor measurements using confidence-weighted averaging. Journal of Vibration and Control, 13(9-10):1303–1312, 2007. (doi:10.1177/1077546307077457)

So, just in case you have to guess the weight of a bull at a country fair, remember this approach :-)