The new law that will guide the future of information processing
The law of small numbers could impact the next generation of tools that deal with data.
Typically, the more samples of data taken, the more accurate the result. In probability, this is called the law of large numbers, and it underlies coin flipping, the casino industry, and even the information processing used for Wi-Fi and cell phones.
Distributing the task of data interpretation, however, creates an unintuitive phenomenon: fewer samples can provide a better result.
This is the new “law of small numbers,” being worked on collaboratively by researchers at the University of Michigan and University of Cambridge, and it could help power the distributed information processing required for future networks of robots, autonomous cars, sensors, and data centers.
“When you process information in a network, from what we call distributed information sources, then there is a correlation penalty to be paid for processing in long blocks of sample data,” S. Sandeep Pradhan, U-M professor of electrical and computer engineering, said. “We call this the law of small numbers.”
This new law applies only to distributed information processing, in which information is gathered by a network of sensors, such as an array of security cameras. The information from all cameras needs to be aggregated at some point, but because the cameras capture similar footage, not all the data needs to be sent on to a central point.
“The goal is to exploit the relationship of the information that’s collected by the distributed sensors. That’s what we call distributed information processing.”
Distributed information processing is becoming more important as industries drown in a deluge of data. By processing raw data on the network nodes rather than at a central point, distributed networks could keep up with demanding tasks such as monitoring thousands of real-time measurements and making decisions based on them for self-driving cars.
“Data processing is an extremely heavy computational task, so we don’t want to do all this data processing in a centralized fashion,” Pradhan said.
One application for the law of small numbers is distributed learning, in which a network must learn something based only on data that is local to its nodes. For example, a network could be asked to classify whether an image contains an animal. The nodes, or sensors, of the network might each see only part of the image, but each must make a decision based on its part. These decisions are then aggregated at a central processing unit, which gives the final decision.
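The flow described above, with local decisions at the nodes and aggregation at a central unit, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the brightness-threshold rule in `local_decision` and the majority vote in `fuse` are hypothetical stand-ins, not the researchers' method, and the sketch does not demonstrate the law of small numbers itself.

```python
def local_decision(patch, threshold=0.5):
    """Hypothetical node rule: vote 'animal' if the patch's mean value exceeds a threshold."""
    return sum(patch) / len(patch) > threshold


def fuse(votes):
    """Assumed aggregation rule at the central unit: strict majority vote (ties give False)."""
    return sum(votes) > len(votes) / 2


# Each node sees only its own part of the image (made-up values for illustration).
patches = [
    [0.9, 0.8, 0.7],
    [0.2, 0.1, 0.4],
    [0.6, 0.7, 0.9],
]

# Nodes decide locally, then only the one-bit decisions travel to the center.
votes = [local_decision(p) for p in patches]
final = fuse(votes)
print(votes, final)
```

In this sketch each node sends a single yes-or-no vote rather than its raw data, which is the kind of saving that keeping processing on the nodes is meant to deliver.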
Given the law of small numbers, the image sensors should avoid using large samples of data in their distributed processing for the most efficient and accurate result.
Working with Pradhan on this problem are Cambridge University Lecturer in Engineering Ramji Venkataramanan and U-M electrical engineering PhD student Mohsen Heidari. Venkataramanan, who was advised by Pradhan while earning his PhD at U-M, specializes in designing efficient algorithms for inference, compression, and communication.
“We now have different skills, as he has branched off in a new direction,” Pradhan said of Venkataramanan. “Collaborating with him and other faculty at both U-M and Cambridge will bring new perspectives and complementary sets of expertise to this problem.”
“We believe the discovery of this new law of information is big, and will have an impact on the next generation of information processing networks,” Pradhan concluded. “We expect new algorithms and protocols to come out of this.”