Structural Communication

Jan 15

Keywords: semantic communication, integrated sensing, channel estimation, pattern recognition.

Introduction

Structural Communication appears in fragmented forms in numerous contexts, but few publications draw the parallels between them or reveal and emphasize its fundamental role. As a result, in the converging fields of information communication and AI, the research efforts, vision and understanding across knowledge domains and modalities are not consolidated, do not enrich each other and are to a degree blind. This post aims to provide a consolidating perspective on online learning, integrated sensing, channel estimation, Level B (semantic) communication, human perception of different modalities, source separation, non-probabilistic AI, natural language understanding, question answering systems and stereo vision. Structural Communication is the concept that allows us to achieve this goal. Comparison with standard communication processes will be used throughout to demonstrate the concept.

The Level B signal was introduced in Shannon and Weaver's work as a concept, without any formalization, and it has been connected to the meaning and semantics of a message. Structure is a more specific term that has many potential realizations and is used here as a replacement for Level B signal. Moreover, at Silent AI most of the observations and properties described below were implemented for a very specific type of mathematical structure, which justifies the use of Structure and Structural Communication as a realistic embodiment of the concepts of Level B communication.

Types of Information

Shannon and Weaver's framework distinguishes Level A and Level B communication (Level C is not considered here). It does not say explicitly that the information transmitted on Level A and the information transmitted on Level B are different types of information. Let’s think about what the difference between structural information and unstructured information could be. The latter is the classical notion of information measured in raw bits, disregarding the underlying semantics. Structural information is a new concept and I want to demonstrate it with an example.

Imagine an art collection containing one billion paintings. Let’s assign an index to each of them in this catalogue. Each number from one to one billion requires only approximately 30 bits for representation. One can point to any of the paintings using just 30 bits. But does this mean that the painting itself occupies 30 bits? Clearly, even with compression applied, each image will occupy on the order of hundreds of kilobytes. This is the difference between a tag, a label or a name associated with a thing and the structural information behind it. The description of all internals of a structure occupies a much larger number of raw bits. Hypothetically, one can define a conversion rate between structural information bits and raw information bits. For example, suppose a family includes one billion different structures, each so complex that it requires 900 bits of description. A 30-bit index then stands for a 900-bit description, a ratio of 900:30, so one semantic bit is worth 30 raw bits of information.
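The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the numbers used in the text; the function name `index_bits` and the variable names are illustrative and not part of any existing system.

```python
def index_bits(catalogue_size):
    """Bits needed to give every item in the catalogue a distinct index."""
    return (catalogue_size - 1).bit_length()

catalogue_size = 1_000_000_000   # one billion paintings, as in the example
description_bits = 900           # hypothetical size of one full description

tag_bits = index_bits(catalogue_size)
ratio = description_bits / tag_bits   # raw bits per "semantic" bit

print(tag_bits, ratio)
```

With these numbers the tag takes 30 bits and the conversion rate is 30 raw bits per semantic bit.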

So, instead of transferring all pixels of the painting, one can just transfer its index in the catalogue. But in order to do this the catalogue has to be shared between the sender and the receiver in advance. This has consequences, and in many communication scenarios the use of a pointer to the actual data is not acceptable at all. These consequences are considered in more detail in the next section.

There is a third type of information that emerges in the communication of structures. It is structure modulation (or deformation). This will be addressed below.

Need for Structure Communication

So, in the above example, we established that we have a choice of transferring a 30-bit tag pointing to a structure or we can transfer 900 bits of the full description of the structure. Let’s consider the consequences. Below are a few scenarios.

Online learning

In order to use the 30-bit tag the sender and the receiver need to share knowledge in advance. This can mean a shared database, a hardcoded procedure or this can mean that pretraining of a system is required (think training of neural nets) in which case training-to-inference data distribution shifts become a problem. Truly intelligent systems are often expected to acquire new knowledge during operation and this is only possible when new structures (new meanings) are communicated and understood by the receiver. What happens when a student learns a new concept in a lecture? The lecturer can spend an entire hour introducing just one concept to the students. That concept can be named by one word. It’s very easy to refer to that concept with one word after it is comprehended. But the concept itself requires an hour of talk and explanation. This is the transmission of structural information happening during the lecture.

When publications talk about more efficient communication that is enabled by semantic communication, what they really mean is that they want some sort of shared knowledge between sender and receiver and they want to navigate that shared knowledge through a domain-specific language relevant for this specific communication session. This is a different process from the process of transmitting the semantics of terms of that domain-specific language itself which is much more costly.

Human perception

Consider listening to music. Seeing note symbols on paper is quite different from experiencing the performance itself. Humans actually need to live through the sound. So transferring the structure itself (in its audio form) is required, not the pointer to the structure. In the same way, looking at a text prompt describing an image is different from looking at the image itself. The visual sense of colors and aesthetics is only available in the second case.

Channel probing

Communication channels often introduce distortions into the transmitted message. The classical approach to cope with them is to transmit a signal with a special structure so that the distortions can be revealed on the receiver side and used to infer the characteristics of the channel. Another example of channel exposure is voice recognition. A human listener, in parallel with recognizing the spoken text, also recognizes the speaker's individual voice. Here, the uttered phonemes are the structures transmitted live (not tags) and their deformation is specific to the speaker and perceived as a voice. Imagine two speakers asked to utter a very simple sound, close to a sine wave. These two speakers would not be distinguishable by this sound because the structure of the sine wave is too simple to probe the channel. We would rather want to listen to an uttered sentence to capture the character of the voice. So, the probe signal for the human articulatory apparatus is a much more complex structure. This view provides a new perspective on integrated sensing, radar signals, etc.

Harsh environment communication

In today’s communication systems, the channel probing signals are interleaved with user data chunks. The identified channel distortions are compensated when user data is decoded. This assumes that channel characteristics do not vary much between the probing signals. When the assumption does not hold, for example in underwater acoustic communication, user data transfer can be problematic. But notice that two people can easily talk in different environments: in a reverberating room, outside, via GSM channels, etc. Humans do not interleave actual phonetic content with probe signals to infer channel characteristics. This means that a form of communication is possible in which the sender streams structures and the receiver identifies the structures and their deformations simultaneously. It is like having an alphabet of probing signals and communicating with them.

Pattern recognition

All sorts of recognition are implied here: speech, voice, visual objects, text meaning, etc. In the AI field recognition is often understood as pattern matching. But by Shannon and Weaver's definition, transmission of meaning and semantics is Level B communication. So pattern matching should be considered as identification of the transmitted structure, which is the receiver’s task in the structural communication process. When reformulated this way, training on a dataset can become avoidable. Structural communication, that is, understanding the principles of how structures propagate through communication channels, is a path to non-probabilistic, generalizable AI. More about structure identification and non-probabilistic AI can be found here: https://www.silenti.ai/blog-research/intelligent-computation-paradigm.

Visual Metaphor

Let’s use a visual metaphor to demonstrate how a structure propagates through a communication channel.

Imagine a communication scenario in which a sender needs to send a certain amount of bytes to a receiver. For simplicity, we ignore the ordering of bytes and consider them a set. The communication channel is chosen in the form of an image. We assume for simplicity that the coordinate system origin is fixed in the center, axes are orthogonal and this is always known to the sender and the receiver. The sender converts each byte into coordinates of a point on this image using a fixed procedure. Thus, a set of bytes gets represented as a 2D point cloud on an image. The receiver then looks at the obtained image, measures coordinates of the points, runs the inverse conversion procedure and obtains the original bytes. See Figure 1.
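A minimal Python sketch of one possible "fixed procedure" follows. Everything specific in it is an assumption made for illustration and not taken from the original: the image is treated as a 16 x 16 grid centered on the origin, the high four bits of a byte select x and the low four bits select y, and the function names are hypothetical.

```python
# Hypothetical byte <-> point mapping on a 16 x 16 grid centered on the origin.
# Points are stored as complex numbers: real part = x, imaginary part = y.

def byte_to_point(b):
    """High nibble selects x, low nibble selects y (an assumed convention)."""
    return complex((b >> 4) - 7.5, (b & 0x0F) - 7.5)

def point_to_byte(p):
    """Inverse mapping: snap the measured coordinates to the nearest grid cell."""
    hi = min(15, max(0, round(p.real + 7.5)))
    lo = min(15, max(0, round(p.imag + 7.5)))
    return (hi << 4) | lo

def encode(data):
    """Sender: a set of bytes becomes a 2D point cloud (ordering is ignored)."""
    return {byte_to_point(b) for b in data}

def decode(points):
    """Receiver: measure every point and run the inverse conversion."""
    return {point_to_byte(p) for p in points}

message = b"structure"
cloud = encode(message)
received = decode(cloud)
print(received == set(message))
```

Over an ideal channel the decoded set equals the set of bytes that was sent; nothing in the point cloud itself says whether the channel was ideal.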

Figure 1. Communication by means of 2D point clouds.

Figure 2. Channel distortions are not identifiable.

Immediately, one can notice a few problems: the sender and the receiver do not synchronize the zoom and the orientation of the coordinate system. In addition, let’s imagine a nonlinear distortion that might happen in the channel in the form of a lens centered at the origin. The receiver, looking at the point cloud, has no way to notice that the data has undergone a distortion. See Figure 2, where a hypothetical received image is shown. Has rotation, scaling or lens distortion occurred during transmission? There are no clues to answer this question. Typically, a sender will run compression and/or optimal encoding schemes to remove all irregularities in its data to maximize channel capacity utilization. This means that the point clouds that are sent are made as uniform as possible. This makes it even harder for the receiver to have any clues about transformations that happened in the channel. This scenario is specifically chosen to demonstrate the impossibility of transmitting raw data with no meaning and no structure in it.
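The silent failure can be reproduced in a toy setting. The sketch below assumes a hypothetical mapping (a byte becomes a point on a 16 x 16 grid centered on the origin, high nibble for x and low nibble for y) and a channel that rotates the image by 90 degrees; neither choice comes from the original text.

```python
# Assumed 16 x 16 grid mapping: a byte becomes a point (x, y), stored as a
# complex number, and the receiver snaps measured points back to the grid.

def byte_to_point(b):
    return complex((b >> 4) - 7.5, (b & 0x0F) - 7.5)

def point_to_byte(p):
    hi = min(15, max(0, round(p.real + 7.5)))
    lo = min(15, max(0, round(p.imag + 7.5)))
    return (hi << 4) | lo

sent = set(b"Hi")
cloud = {byte_to_point(b) for b in sent}

# The channel rotates the whole image by 90 degrees (multiplication by 1j).
rotated = {p * 1j for p in cloud}

# The receiver decodes without complaint: every rotated point is still a
# perfectly valid grid position, so nothing signals that anything went wrong.
decoded = {point_to_byte(p) for p in rotated}
print(bytes(sorted(sent)), bytes(sorted(decoded)))
```

The receiver obtains two valid bytes that differ from the ones sent, and it has no criterion for telling the two cases apart.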

Figure 3. Lens distortion of a uniform point cloud.

Now, let’s add different forms of structure and see how they enable the transmission. First of all, if the sender sends a point cloud with a lot of uniformly distributed points and the receiver looks at this point cloud through a lens, the lens will change the densities of points so that the receiver can try to approximately infer the lens parameters. See Figure 3. Some blind channel estimation techniques exploit this effect. They accumulate a large amount of data and assume the channel characteristics are constant for all this data.

The second strategy is to probe the channel by sending a pilot signal. This is not user data. It is a signal with a special structure, known to the sender and the receiver in advance, that exposes the deformations happening in the channel. A possible example is given in Figure 4. In this case the sender first sends the pilot signal and then sends the data. The hope is that the channel does not change much between the transmissions. The receiver, looking at the deformed pilot signal, can figure out the rotation, scaling and lens parameters.
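A pilot-based estimate can be sketched in Python as follows. The sketch makes simplifying assumptions that are not in the original: the channel applies only rotation and scaling about the origin (the lens term is left out), the receiver knows which received point corresponds to which pilot point, and the pilot shape `PILOT` is an arbitrary hypothetical choice. Points are complex numbers, so rotation plus scaling is a single complex gain.

```python
import cmath

# Hypothetical pilot: an asymmetric set of points known to both sides.
PILOT = [1 + 0j, 2 + 0j, 0 + 1j, -1 - 1j]

def channel(points, scale, angle):
    """Assumed channel: rotation and scaling about the origin (no lens term)."""
    gain = cmath.rect(scale, angle)
    return [gain * p for p in points]

def estimate_gain(received_pilot, pilot=PILOT):
    """Least-squares fit of one complex gain g such that received ~ g * pilot."""
    num = sum(r * p.conjugate() for r, p in zip(received_pilot, pilot))
    den = sum(abs(p) ** 2 for p in pilot)
    return num / den

def compensate(received, gain):
    """Undo the estimated rotation and scaling on the data frame."""
    return [r / gain for r in received]

data = [0.5 + 2.5j, -3.5 + 1.5j, 4.5 - 0.5j]

# Frame 1 carries the pilot, frame 2 carries the data; the channel is assumed
# not to change between the two frames.
gain = estimate_gain(channel(PILOT, scale=1.3, angle=0.4))
recovered = compensate(channel(data, scale=1.3, angle=0.4), gain)

print(round(abs(gain), 3), round(cmath.phase(gain), 3))
```

The magnitude of the estimated gain is the scaling and its phase is the rotation angle in radians. If the channel changed between the two frames, the compensation would silently use stale parameters.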

At this point it is convenient to showcase the third type of information - modulation of structure. For this purpose let’s assume that the channel is ideal and does not introduce any changes to images of point clouds. In this case the sender itself can artificially apply a scaling, rotation and lens distortion to the pilot signal that it sends. It only sends pilots frame by frame each time changing the scaling, rotation and lens parameters. So these parameters themselves can carry the user data. The receiver can identify them, because the structure is chosen such that this is possible.
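Structure modulation can be illustrated with a short Python sketch. It assumes an ideal channel, a hypothetical asymmetric pilot `PILOT`, and a sender that encodes four bits per frame in one of 16 rotation angles; scaling and lens parameters are left unused to keep the example small. None of these choices come from the original.

```python
import cmath
import math

PILOT = [1 + 0j, 2 + 0j, 0 + 1j, -1 - 1j]  # hypothetical known structure
LEVELS = 16                                 # 16 rotation steps = 4 bits per frame

def modulate(symbol):
    """Sender: rotate the pilot by symbol * (360 / LEVELS) degrees."""
    rot = cmath.rect(1.0, 2 * math.pi * symbol / LEVELS)
    return [rot * p for p in PILOT]

def demodulate(frame):
    """Receiver: estimate the rotation of the pilot and quantize it."""
    corr = sum(r * p.conjugate() for r, p in zip(frame, PILOT))
    angle = cmath.phase(corr) % (2 * math.pi)
    return round(angle / (2 * math.pi / LEVELS)) % LEVELS

def send_bytes(data):
    """Split every byte into two 4-bit symbols, one pilot frame per symbol."""
    return [modulate(n) for b in data for n in (b >> 4, b & 0x0F)]

def receive_bytes(frames):
    """Read the rotation of every frame and pair the symbols back into bytes."""
    symbols = [demodulate(f) for f in frames]
    return bytes((hi << 4) | lo for hi, lo in zip(symbols[0::2], symbols[1::2]))

print(receive_bytes(send_bytes(b"pilot")))
```

Every frame contains the same structure; the user data lives entirely in how that structure is deformed.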

Figure 5. Using local structures to enhance channel estimation.

The third strategy is to upgrade each point into a small structure itself. For example, we replace a point with a triple of points that are close to each other and form an equilateral triangle centered on the point we are upgrading. The sender and the receiver agree on the size and orientation of the triangle. The transmitted images thus contain a cloud of triples of points. A possible received image for this scenario is shown in Figure 5. The receiver can now try to identify the triples, infer the local deformation characteristics from the observed lengths of their sides, and get some clues about the change of scale and rotation. By averaging the vertices of a triangle, it can also locate the center, the actual data to extract, more precisely. In the case of distortions that include scaling, rotation and lens, this strategy alone might not enable correct unambiguous decoding and would need to be combined with, e.g., the second strategy. This strategy is analogous to using error-correcting codes and joint channel estimation and decoding techniques.
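The following Python sketch shows what a receiver could read from a single triple. It rests on assumptions that are not in the original: the triple has already been grouped and its first vertex identified, the channel is a global rotation and scaling about the origin, and the triangle size `RADIUS` is an arbitrary hypothetical value.

```python
import cmath
import math

RADIUS = 0.2  # agreed distance from the center to each vertex (hypothetical)
OFFSETS = [cmath.rect(RADIUS, math.pi / 2 + 2 * math.pi * k / 3) for k in range(3)]
SIDE = RADIUS * math.sqrt(3)  # side length of the agreed equilateral triangle

def to_triangle(center):
    """Sender: replace one data point with three vertices around it."""
    return [center + o for o in OFFSETS]

def read_triangle(tri):
    """Receiver: recover center, local scale and local rotation of one triple."""
    center = sum(tri) / 3
    sides = [abs(tri[i] - tri[(i + 1) % 3]) for i in range(3)]
    scale = sum(sides) / (3 * SIDE)
    # Rotation is only known modulo 120 degrees because the triangle maps onto
    # itself after such a turn; this is one reason the strategy can be ambiguous.
    angle = cmath.phase((tri[0] - center) / OFFSETS[0]) % (2 * math.pi / 3)
    return center, scale, angle

point = 2.5 - 1.5j
gain = cmath.rect(1.5, 0.3)  # assumed channel: scale by 1.5, rotate by 0.3 rad
received = [gain * v for v in to_triangle(point)]

center, scale, angle = read_triangle(received)
decoded = center / cmath.rect(scale, angle)  # undo the inferred transform
print(round(scale, 3), round(angle, 3), abs(decoded - point) < 1e-9)
```

Here the data point is recovered without any pilot frame, but only because the rotation stayed below 120 degrees and no lens distortion was applied.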

In a mature communication system all the strategies can be combined to achieve the optimal performance. Notice, however, that in these strategies the pilot structure carries no user data, and the user data carries no (decent) structure. Looking at Figure 6, a human can easily read the text through the channel distortions and simultaneously infer the distortion characteristics. There are no pilots in this image. The text both has the structure and carries user data at once. In other words, this is structural communication.

Figure 6. Text is readable through distortions.

Structural Communication Scheme

Figure 7. Principal diagram of structural communication.

Let’s consider Figure 7, where the principal scheme of structural communication is shown. A sender transmits a structure to a receiver. Distortions can happen in the channel and are to be identified by the receiver simultaneously with the transmitted structure. The sender and the receiver share knowledge about which family of structures is to be transmitted and which family of distortions they expect to happen in the channel. The structure is a Level B signal and it has to be converted to a lower level representation or description, a Level A signal. An example of such a pair is an idea in someone’s mind and its verbal expression in a language. To be transmitted over a medium, the description is transformed into a physical signal that propagates through it. During propagation, noise and distortions can alter the signal. The receiver observes the altered physical signal, demodulates it to an altered Level A description and then identifies the structure and the distortion, that is, the Level B signal.

Apart from communication of structures themselves, there is an option for the sender to deliberately modulate the structure parameters (if the family of structures has them), the description of the structure (Level A) and the final physical signal. The last two are expected to have an effect similar to the effects of distortions in the channel so special care has to be taken to ensure that families of modulations and distortions allow separation of their effects on the receiver side. Another option is to make them separable due to time scales (frequency bands).

For cases where modulation takes place, the receiver should also know the family of modulations and simultaneously identify structure, distortion and modulation.

The mainstream ML approach to cope with distortions is to augment the training set with distorted samples. In contrast, joint channel estimation and decoding, turbo equalization and bundle adjustment in stereo vision are examples of more advanced approaches. They exploit the constraints that the structure enforces on the Level A representation and on the physical signals. They have a precise model of structure and distortion, so they can use a principle: what does not fit the structure comes from distortion, and what does not fit distortion comes from structure. The AI and ML fields need to grow toward the structural communication principles.

Expected Properties of Structural Communication

Semantic communication has not been formally defined yet. Intuitively, it brings certain intelligence into the communication process. Some desirable expected properties that would be enabled by intelligent communication are listed below. Level A signal is denoted as A-signal; Level B signal – as B-signal.

Noise robustness and data integrity awareness. Presence of B-signal in A-signal implies a degree of redundancy, consistency, agreement between parts of A-signal. This can serve the purpose of error checking, error correction, filling gaps in A-signal, checking for “logical” completeness or incompleteness (nonsense) of A-signal.
Multiplexing and source separation. When several A-signals, each carrying a distinct B-signal, are mixed or interleaved because they share the same medium, it is possible to do “source separation”: identify each of the B-signals and trace back its A-signal contribution in the observed mix.
Distortion identification. The presence of a B-level not only helps to cope with random noise or corruption but can also be used to identify distortions of the A-signal. Distortions here differ from random noise in that there is some sort of parametric model that describes the alteration of the A-signal in a deterministic way. Identified distortions can later be compensated or they can carry information themselves.
Variability and invariance. It is expected that the same B-signal allows a degree of freedom and variability of its Level A representation. And, in the opposite direction, A-signal is expected to have some form of essential features, invariants, compressed representation, that captures the B-signal expressed through it.
Intelligent communication. Passing around B-signals is often the goal of communication. It requires synthesis of corresponding A-signal on the transmitter side and recognition of B-signals on the receiver side due to physical limitations of transmission channels (media).

Through the prism of these properties one can analyse algorithms, classifiers, detectors and the intelligent behaviours of ML and AI systems. Let’s consider a few of them. Table 1 contains the summary. It has a column for each of the desired properties. The rows correspond to the selected algorithms and methods.

Table 1. Presence of intelligent communication properties for selected algorithms.

More detailed analysis for each of the algorithms follows below.

Hough transform

It is designed to search for straight lines in images. Level A is represented by pixels and semantic Level B consists of the lines that the method aims to detect. The method organizes a special voting procedure where each pixel from Level A image contributes to a finite possible set of lines (structures from Level B) it could belong to, so that if a line received a critical number of votes at the end it is considered as “detected”. Analysing this method from the perspective of the desirable properties listed above:

Noise robustness – achieved to some degree. Randomly positioned dots would not interfere with well linearly aligned and large conglomerates of pixels.
Integrity awareness – very low. The first problem is that there is no way to define a threshold for the number of votes to satisfy all cases. Second, the method does not differentiate between dashed long lines, a series of dots along a line with large spacings, a continuous segment of line, or multiple segments along one line.
Multiplexing, source separation – present. Many lines on the same image can all be identified.
Distortions robustness – absent. In cases when, for example, a line is deformed due to a camera lens, there is no way to accommodate this distortion into the voting procedure.
Variability and invariance – present. The method detects a line independently of where the dots are exactly located on it, it just needs enough of them to pass the threshold. The method also provides flexibility in B to A projection by allowing variations of pixels to represent any given line.
Intelligent communication – present. One could organize transmission of lines via images with pixels belonging to them.
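The voting procedure analysed above can be sketched in plain Python. This is a simplified illustration and not a production implementation: lines are parametrized as rho = x cos(theta) + y sin(theta), the bin sizes and the threshold are arbitrary assumptions, and the pixel coordinates are made up for the example.

```python
import math
from collections import Counter

def hough_votes(points, n_theta=180, rho_step=1.0):
    """Every point votes for each (theta, rho) line that could pass through it."""
    votes = Counter()
    for x, y in points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            votes[(t, round(rho / rho_step))] += 1
    return votes

# Level A: pixels. Ten of them lie on the horizontal line y = 3, three are noise.
line_pixels = [(x, 3) for x in range(10)]
noise_pixels = [(1, 7), (5, 0), (8, 9)]

votes = hough_votes(line_pixels + noise_pixels)

# Level B: lines whose vote count reaches a hand-picked threshold.
THRESHOLD = 10
detected = {line for line, n in votes.items() if n >= THRESHOLD}

print(votes[(90, 3)], (90, 3) in detected)
```

The bin for theta = 90 degrees and rho = 3 collects all ten votes of the line pixels and none from the noise. In this toy run neighbouring angle bins such as (89, 3) reach the same count, which illustrates why a single vote threshold gives only a weak notion of integrity.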

MIMO beamforming

In the case of multiple antennas used on the transmitter and receiver sides of a wireless communication channel there appears to be an extra ability to have multiple paths of propagation for the signal. The transmitter can send data along a specific path and the receiver can focus its reception on a specific path. This example is about this aspect of MIMO communication – the multiple configurations of beamforming. They are seen here as the possible structures in the transmitted signal. For example, there can be six paths available for communication and this means there are six Level B structures. The content that is transmitted is Level A signal and it does not matter, only the chosen path is relevant. Because MIMO is not about transmitting user data by switching active paths there is a red star * in the table given above saying that this is not an intended usage scenario. Analysing this case from the perspective of the properties:

Noise robustness – present. Focusing on a specific path reduces interference and can filter out noise coming from other paths.
Integrity awareness – present to a low degree. The beamforming procedure of the receiver always outputs a signal. How can it decide between an active or passive state for a beam? There are two options: first, threshold the received signal energy, which is far from integrity awareness; second, decode the received data and base the decision on its content, which is functionality beyond the beam focusing mechanism itself.
Distortions robustness – absent. If the wireless channel conditions change, the beamforming configuration can become outdated. The receiver will not be able to identify distortions so as to reengage with the paths. This is analogous to the Hough Transform example where if a straight line was deformed by a camera lens then it would not appear on the receptive field of detected lines.
Intelligent communication – can be seen as present. The fact of transmission over each of the paths can be seen as a fact of presence of this B-level structure in the input. Although the engagement of different paths could be used as a mechanism for signaling it is not the way MIMO beamforming is used today.
Variability and invariance – can be seen as present. Variability is possible because any signal sent along a chosen path activates this path as a structure. Same applies to invariance, because it would not matter which exact data is transferred, the important thing is only which paths are engaged.
Multiplexing, source separation – present. Multiple paths can be engaged and detected simultaneously.

Beam search, Viterbi decoding

The purpose of these algorithms is to account for the contextual level of information. For example, a system is required to transcribe an audio recording into text. The system could use a word recognizer that takes a segment of audio and produces a list of the most likely words spoken in that segment with the corresponding probabilities. If one relies only on this classifier, the context is not taken into account. This would be Level A communication. Such a recognizer will make many errors in recognizing uttered sentences. If instead Level B is taken into account, the system can evaluate the overall probability of the full sequence of words and find the most probable one. Analysis:

Noise robustness – present. Because full sentences are taken into account, local word errors can be mitigated to some degree.
Integrity awareness – present to a low degree. The method does not have a criterion for meaningfulness of a sentence beyond statistical probabilities.
Distortions robustness – absent. The decisions made during beam search do not provide information back to the word classifier. There is no top-down flow of information. Only bottom-up. So there is no mechanism to guide distortion model identification and run joint optimization.
Variability and invariance – absent. Beam search outputs hypotheses that are most probable but it does not have a notion of Level B structures themselves and comparing or classifying them is completely out of scope. There is a form of invariance in the sense that multiple possible words with probabilities can result in the same final top hypothesis.
Multiplexing, source separation – absent. If one interleaves audio segments of words from multiple sentences, the beam search has no way to disentangle them based on its measure of sentence meaningfulness.
Intelligent communication – absent, not applicable.
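The difference between per-segment recognition and sequence-level decoding can be shown with a small Viterbi sketch in Python. The candidate words, their probabilities and the bigram table are invented for illustration, and unseen word pairs receive an assumed floor probability; a real recognizer would supply its own scores.

```python
import math

def greedy(segments):
    """Level A only: pick the most likely word in every segment independently."""
    return [max(seg, key=seg.get) for seg in segments]

def viterbi(segments, bigram, floor=0.01):
    """Most probable word sequence given per-segment scores and bigram context."""
    # best[w] = (log-probability of the best path ending in w, that path)
    best = {w: (math.log(p), [w]) for w, p in segments[0].items()}
    for seg in segments[1:]:
        step = {}
        for w, p in seg.items():
            score, path = max(
                (s + math.log(bigram.get((prev, w), floor)), pth)
                for prev, (s, pth) in best.items()
            )
            step[w] = (score + math.log(p), path + [w])
        best = step
    return max(best.values())[1]

# Hypothetical recognizer output: candidate words with probabilities per segment.
segments = [
    {"I": 0.9, "eye": 0.1},
    {"sea": 0.6, "see": 0.4},
    {"you": 0.7, "ewe": 0.3},
]
# Hypothetical bigram probabilities; unseen pairs fall back to `floor`.
bigram = {("I", "see"): 0.5, ("see", "you"): 0.5}

print(greedy(segments))
print(viterbi(segments, bigram))
```

The greedy pass returns "I sea you" while the sequence-level pass returns "I see you". Information flows only upward here: the word scores are never revised in the light of the chosen sentence, which matches the absence of a top-down flow noted in the analysis.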

Hyperdimensional computing, vector symbolic architectures

Brief analysis:

Integrity awareness – present to a low degree. This is because the criterion is based purely on the presence of a specific vector in a database of known vectors. There is no internal structure in the hyperdimensional vector that would make it meaningful. In fact, the vectors are generated randomly.
Multiplexing, source separation – present. This is the gist of these techniques: to provide a space and mechanisms that allow multiplexing and source separation.
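Both points of this brief analysis can be seen in a minimal Python sketch of bundling with random bipolar hypervectors. The dimension, the item names and the similarity threshold are assumptions chosen for the example and do not describe any specific published architecture.

```python
import random

D = 2048                 # hypervector dimension (assumed)
rng = random.Random(0)   # fixed seed so the run is repeatable

def random_hv():
    """A random bipolar hypervector; it has no internal structure of its own."""
    return [rng.choice((-1, 1)) for _ in range(D)]

def bundle(vectors):
    """Superpose several hypervectors by elementwise addition."""
    return [sum(column) for column in zip(*vectors)]

def similarity(a, b):
    """Dot product: close to D for a bundled member, close to 0 otherwise."""
    return sum(x * y for x, y in zip(a, b))

# The database of known vectors that integrity checks rely on.
item_memory = {name: random_hv() for name in "abcdefgh"}

# Multiplex three items into one vector of the same dimension.
mix = bundle([item_memory[name] for name in "ace"])

# Source separation: test every known vector against the mix.
recovered = {name for name, hv in item_memory.items() if similarity(mix, hv) > D / 2}
print(sorted(recovered))
```

The three bundled items are separated from the mix, but only by lookup against the database of known vectors. A random vector outside that database carries nothing that would mark it as meaningful or corrupted.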

Error-correcting codes

Brief analysis:

Integrity awareness – present to a certain limit. Ensured due to introduced constraints to the data representation.
Intelligent communication – absent. The structure carried by the codewords is the same for all of them. The chosen error-correcting scheme is the structure and the Level B signal here. The codewords are Level A signals. Imagine a communication protocol in which the scheme of ECC itself is the signal to be transmitted. Then codewords obeying rules of a scheme chosen for transmission play the role of the structure description, the Level A signal carrying the structure. The receiver has to identify the scheme based on several received codewords. Such a scenario would be considered as a structural communication process. In standard ECC, though, this is not what happens.
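The phrase "present to a certain limit" can be made concrete with a Hamming(7,4) code, used here only as a familiar example; the original does not name a specific scheme. The sketch below corrects any single flipped bit, and shows that two flipped bits are silently "corrected" to the wrong data.

```python
def hamming_encode(data):
    """Hamming(7,4): 4 data bits -> 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]

def hamming_decode(word):
    """Return (data bits, 1-based position of the corrected bit, or 0 if none)."""
    c = list(word)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3   # syndrome read as the index of the bad bit
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], pos

data = [1, 0, 1, 1]
codeword = hamming_encode(data)

one_error = list(codeword)
one_error[4] ^= 1            # flip bit 5

two_errors = list(codeword)
two_errors[0] ^= 1           # flip bit 1
two_errors[4] ^= 1           # and bit 5

print(hamming_decode(codeword))
print(hamming_decode(one_error))
print(hamming_decode(two_errors))
```

The constraints are the same for every codeword, so the decoder can check consistency but learns nothing about which structure was sent, in line with the analysis above.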

QR codes

These have several unconnected structural aspects. The grid layout serves the purpose of the pilot signal considered above in the Visual Metaphor section and allows the reader to cope with deformations. The pixel content involves an error-correcting code that ensures data Integrity awareness but is irrelevant for distortion identification. Neither of these aspects achieves all the desired properties, so ECC was chosen to fill in the table above.

Turbo equalization

Brief analysis:

Distortions robustness – somewhat present. In these methods, the process of joint decoding and improving channel estimates works only in combination with original channel estimation due to pilot signals.

SilentAI’s Melody Transmitter and Smart Textures

These are based on our proprietary algorithms. They were built keeping Integrity in mind and therefore satisfy all the desired properties of structural communication. More details about Melody Transmitter can be found here: https://www.silenti.ai/blog-demos/melodies. More details about Smart Textures can be found here: https://www.silenti.ai/blog-demos/textures.

Structures Are Illusions

In order to communicate at Level B one needs to have two levels of communication. If only one level is present then it is always Level A. For example, we use standard communication protocols and transmit an image of a cat and meta-information about the image in the form of the word “cat”. Is this semantic communication? Not in the sense of structural communication introduced here. Raw bits of the image are not different from raw bits of the “cat” string. Both reside in Level A.

If a physical signal is observable, the question of the existence of structure behind it is more entertaining. A Level B signal is always emergent; it is never embodied in an observable the way a Level A signal is. It is an illusory signal that only appears inside a sender and a receiver that are ready for it. The relation between physical fields and matter can be taken as an example. While matter is observable, the fields are virtual, imaginary. Fields control the arrangement of matter, but do they exist or are they abstractions? Note that all interactions known to physics are carried by fields, that is, matter interacts via fields. So communication, if seen as a process of influencing a receiver by a sender, could happen via fields: semantic fields (not yet discovered), structural signals, Level B signals.

A Level B signal manifests itself in the arrangement of pieces of the Level A signal. The arrangement is the set of constraints that a structure imposes on its Level A description.

There is a gap between matter and arrangement of matter, between observables and structure hidden inside observations. They belong to different worlds and different senses.

Sixth Sense

In addition to five well-known senses human beings have another one which is generally not recognized. It is the sense of Integrity, the sense of presence, absence, correctness or corruption of a structure. For example, one reads a paragraph of a text and feels that it does not make sense. This feeling is a meta-knowledge, it is independent of the actual topic of the paragraph. Other paragraphs in other knowledge domains can lead to exactly the same feeling of not making sense. When a structure can be identified in the set of facts it is felt as making sense. When a structure satisfying a set of facts is impossible it is felt as nonsense. When a structure can potentially yet be built but more facts are required it is felt like a desire to ask follow-up questions.

Question answering systems could identify a set of facts implied by a question and complete it with facts from a database such that an unambiguous structure emerges. This would be the answer to the question. Otherwise, what is the formal definition for “question” and “answer”?

The sense of Integrity works for all modalities and most likely exists for multimodal signals. Harmony and beauty in music and art tell us about the presence of Integrity. Conscience is another example. Absence of structure in any modality is perceived as noise and does not hold human attention. Loss of Integrity feels like pain.

More about Integrity and its connection to Structure can be found here: https://www.silenti.ai/blog-research/our-vision.

Purpose of Life for AI Agents

What could be a reason for an AI entity to act? At this stage of AI development, AI agents are reactive. They do not have their own motivation to act. They also do not go through infancy, childhood, adolescence, adulthood and a wise old age. Human beings spend a good share of their lives asking questions and searching for answers. We are machines searching for Integrity. In a sense, we are born incomplete and aim for completeness during our lifetime. What if the same principle is used for AI agents? Let’s embed in them a motivation to act in the form of a task to search for Integrities (structures). At birth, an AI agent is given a bag of facts (this defines its innate personality). It is also given the ability to collect observations (in the form of facts) during its existence, provided by analogs of senses. It also has a hardcoded task: to search for as many structures as it can, to walk around, to collect facts, and to identify facts that compose structures. This continues as long as there are facts that “do not make sense”. Such an agent would have a lifetime of self-motivated activity. This resembles human beings. But the agent can have a controlled policy of when to gather random facts from the outside world, how many, when to execute the search for structures, etc. This can continue forever.

Such an objective for an AI agent could make it safe and not harmful to us, because it cares about assembling rather than disassembling (destroying).

Conclusion

This post touched on a broad range of topics showcasing the central role of Integrity (structure) and drawing analogies across the board.

Structural communication is a process of transferring an Integrity from the mind of a sender to the mind of a receiver by disassembling the Integrity to a description that can fit into a communication channel and then reassembling it at the receiver side. Integrity ensures structural constraints of the description in such a way that channel distortions can be inferred on the receiver side. Transmitted structures and their distortions are two different types of information that can be transmitted simultaneously.

Structural communication gives a unifying view of the hardest questions that puzzle AI researchers and philosophers.

A list of desired properties of an intelligent communication system is given and many algorithms are analysed from their perspective.

We did not touch on the formal definition of Structure. Such a definition has to make the desired properties possible, and this is a separate topic.

The role of structural communication is shown both for engineered systems and for human beings, making it clearer how the former can become more like the latter.

Oleksandr Korostylov
