My former colleague Boris advised me to read the French book Pourquoi la société ne se laisse pas mettre en équations (Why society cannot be put into equations), and I must thank him for the advice 🙂
The author, Pablo Jensen (his personal and Wikipedia pages will tell you more), is a physicist, and his book tries to explain why equations in the social sciences (and even in physics, by the way) may be tricky. Truth is a complicated topic. But whereas there is (some) truth in the natural sciences, which can always be revisited, the concept of truth in the social sciences is even more difficult, simply because human behavior is full of feedback loops: what is true today, not to say yesterday, might be taken into account to modify the future… If you read French, it is really interesting; if not, let’s hope for a translation soon.
As a really nice illustration of truth in physics, Jensen mentions how Galileo struggled with the mechanics of falling bodies [pages 42-5].
Source: Galileo’s notes on motion, Folio 116, Biblioteca Nazionale Centrale, Florence; Istituto e Museo di Storia della Scienza, Florence; Max Planck Institute for the History of Science, Berlin
Galileo never published his data, as he did not understand why they looked wrong. The answer is that a sliding ball does not have the same speed as a rolling one. Rotation absorbs a fraction of the energy, so the rolling ball is slower by a factor of apparently √(5/7), or about 0.84, which was close to Galileo’s apparent mistake.
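
The √(5/7) figure can be checked from energy conservation. Below is a minimal sketch, assuming a uniform solid ball (moment of inertia 2/5·m·r²) that rolls without slipping, compared with an idealized frictionless slide from the same height. The function name and its parameter are illustrative and do not come from Jensen's book or Galileo's notes.

```python
import math


def speed_ratio_rolling_to_sliding(inertia_factor: float = 2.0 / 5.0) -> float:
    """Final speed of a rolling ball divided by that of a sliding one.

    Energy conservation from the same height h gives
        sliding: g*h = 0.5 * v**2
        rolling: g*h = 0.5 * v**2 * (1 + k),  with k = I / (m * r**2)
    so v_rolling / v_sliding = sqrt(1 / (1 + k)).
    The default k = 2/5 is an assumption (uniform solid sphere).
    """
    return math.sqrt(1.0 / (1.0 + inertia_factor))


k = 2.0 / 5.0
ratio = speed_ratio_rolling_to_sliding(k)

# Share of the kinetic energy that ends up in rotation: k / (1 + k) = 2/7.
rotational_share = k / (1.0 + k)

print(f"speed ratio: {ratio:.3f}")
print(f"energy in rotation: {rotational_share:.3f}")
```

Under these assumptions, √(5/7) is the ratio of the two speeds, while the share of energy taken by rotation is 2/7. A ball guided by a groove would have a different effective inertia factor, so the number would change.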
Then Jensen reminds us of how difficult weather forecasting and climate modification are (chapter 4). So when he jumps to social sciences, he is quite convincing about why mathematical modeling may be a very challenging task. On a study that analyzed tweets to predict their success, he writes the following:
Le résultat de leur étude est clair: même si l’on connaît toutes les caractéristiques des messages et des utilisateurs, le succès reste largement imprévisible. Techniquement, seule 20% de la variabilité du succès des différents messages est expliquée par ce modèle, pourtant très complexe, et d’ailleurs incompréhensible, comme cela arrive souvent pour les méthodes utilisant l’apprentissage automatique. Il est intéressant de noter qu’on peut doubler le niveau de prédiction en ajoutant une seule variable supplémentaire. Il s’agit du succès passé de l’utilisateur, de son nombre moyen de retweets jusque-là. […] La vie sociale est intrinsèquement imprédictible, de par les fortes interactions entre les personnes. […] La masse des données permet d’opérationnaliser des vieux dictons comme “qui se ressemble s’assemble”, “dis-moi ce que tu lis, je te dirai qui tu es” et surtout “je suis qui je suis”.
(The result of their study is clear: even if one knows all the characteristics of the messages and the users, success remains largely unpredictable. Technically, only 20% of the variability in the success of the different messages is explained by this model, even though it is very complex, and in fact incomprehensible, as often happens with methods using machine learning. It is interesting to note that the prediction level can be doubled by adding a single additional variable. This is the past success of the user, namely their average number of retweets so far. […] Social life is inherently unpredictable, because of the strong interactions between people. […] The mass of data makes it possible to operationalize old sayings such as “birds of a feather flock together”, “tell me what you read, I will tell you who you are” and especially “I am who I am”.) [Chapter 12, Predict thanks to big data? Pages 150-3]
An even more striking example of the incomprehensible nature of machine learning is about image recognition: the best way to predict the presence of curtains in a room was to identify a foot in a bed, just because most bedrooms had both [Pages 154-5]. Jensen also criticizes the ranking of universities and researchers [Pages 246-53], a topic I had addressed in the past in La Crise et le Modèle Américain. In chapter 20, “Are we social atoms?”, he adds that, for human beings, “for now, we do not know internal characteristics which are both pertinent and stable” [Page 263], without which analyzing human beings as a group becomes problematic.
Already in Chapter 13, Jensen explains that there are four essential factors that make the simulations of society qualitatively more difficult than those of matter: the heterogeneity of humans; the lack of stability of anything; the many relationships to consider, both temporally and spatially; the reflexivity of humans, who react to the patterns of their activity. […] No single factor produces anything on its own […] In the social sciences, a dense network of causal conditions is needed to produce a result. […] There is no guarantee that the consequence of [one factor] will be the same in other situations, where it will be combined with other causal factors. The only possible answer is “it depends”. [Pages 162-4]
To discuss the greater or lesser stability that can be expected from these internal characteristics, we must go back to their origin. For physicists, the answer is clear: the origin of the forces between atoms is to be sought in the interactions between those really stable particles, the nuclei and the electrons. […] The origin of human actions is at the heart of sociology. Its first intuition was to seek the determinants of practices not in the inner minds of people studied by psychology, nor in a universal human nature, but in social influences. [Page 264]
The social world is more like a swirling fluid than neat combinations of bricks. […] Of course this image is too simple, because it neglects the memory, the strong viscosity of the social. But it shows the limits of the static vision of the economy or of social physics, which start from individuals already made, ready to function, to generate social life by association, under the impulse of their independent natures. [Page 270]
Social life does not therefore consist of a series of discrete interactions, as the formula or the simulations suppose. Rather, it must be conceived of as the result of the unfolding of relationships. The distinction between interaction and relationship is crucial, because the latter involves a sustained series of interactions between people who know each other and keep the memory of past exchanges. In a relationship, each interaction is based on past interactions and will in turn influence those that will come. A relationship is not a simple sequence of one-off interactions, but a process of continuous creation, of the relationship and therefore of the people involved. [Page 271]
Jensen does not say that society cannot be analyzed, qualitatively or quantitatively. He gives a subtle analysis of the complexity of society and social behaviors, which we should always remember before accepting as fact the often too simplistic analyses drawn from big data…