Algorithms, archives and evidence of culture

Last week I attended the Australian Society of Archivists Annual Conference. One of my jobs was to talk about the cultural perspectives of technologies. I talked about algorithms, machine learning, and constructions of evidence of culture. I think I scared people.

Below is a video of the extended talk. It goes for just over 10 mins. At the end, I talk about my Mediated recordkeeping model and how it might be useful in exploring these expanding contexts and complexities of culture.

I am keen to explore the role of machine learning in cultural heritage spaces. Who wants to help?

Transcript:

Slide 1:

Image: http://www.npr.org/sections/13.7/2017/03/22/521059752/would-you-become-an-immortal-machine

Hello, I am Dr. Leisa Gibbons from Curtin University. I teach archives and preservation to undergrad and post grad students. In my research, I explore sociotechnical issues, impacts and implications of acquisition and preservation of online content and the role that archivists can, do and might play in the formation of digital cultural heritage.

In this presentation I am going to share with you some intriguing information about algorithms and machine learning I have been collecting over the last year or so, so that I might talk about the nature and purpose of web archiving and how it is possible to understand evidence of culture as it is being valued and formed over spacetime.

Originally, this presentation was designed in PechaKucha style where 20 slides are shown for 20 seconds each. This presentation has 13 slides with the last one being quite a lot longer than 20 seconds.

Slide 2

Image: http://theconversation.com/no-more-playing-games-alphago-ai-to-tackle-some-real-world-challenges-78472

This year Professor Geoff Goodhill, from the University of Queensland, wrote about AlphaGo, an AI program designed to learn to play Go. AlphaGo learns via use of neural networks and extraction of key ideas.

Slide 3

Image: https://www.forbes.com/sites/kalevleetaru/2017/09/16/ai-gaydar-and-how-the-future-of-ai-will-be-exempt-from-ethical-review/#3f3e339e2c09

You’ve probably heard about the algorithm created by Stanford researchers that predicts sexual orientation from photographs of a person’s face? This is also generated with learning neural network technology.

Yet, as Professor Geoff Goodhill mentioned, there is no known way to interrogate the network to directly read out what these key ideas are that help the algorithm make decisions. Instead, researchers can only study its outputs and hope to learn from these.

Slide 4

Image: https://ichef.bbci.co.uk/news/624/cpsprodpb/D8FD/production/_96194555_3c7a28f4-bf98-4df5-96ea-476616b896cd.jpg

A couple of years ago, Vladan Joler and colleagues in Belgrade began investigating the inner workings of Facebook. This image is a flow chart that they created on how our interactions with Facebook create data – which shows how we, as Facebook users, are in fact doing unpaid work for Facebook – so they can sell us stuff.

We all know this of course, but perhaps we think less about what this might mean in 20 or 150 years’ time for data privacy and surveillance, when you consider that the data we give Facebook is used to calculate our ethnic affinity (Facebook’s term), sexual orientation, political affiliation, social class, travel schedule and much more.

Slide 5

Image: https://aeon.co/essays/judge-jury-and-executioner-the-unaccountable-algorithm?curator=MediaREDEF

In 2013, a community of scholars and activists gathered in the US to examine and discuss the social justice impact of algorithmic accountability or #algacc. They raised more questions than answers about the impact of data surveillance and our right to know what and how data collected about us is being used.

Slide 6

Image: https://pbs.twimg.com/media/CkmaUGgXEAAROUu.jpg

UCLA Assistant Professor Safiya Noble writes about algorithms of oppression and how the data they use to learn reinforces existing structures of racism and sexism. Safiya talks about how a Google search she undertook on the search term “black girls” often suggested porn sites and un-moderated discussions about “why black women are so sassy” or “why black women are so angry” – presenting a disturbing portrait of black womanhood in modern society.

Slide 7

Image: http://assets.pewresearch.org/wp-content/uploads/sites/14/2017/02/06143847/PI_2017.02.08_algorithms_0-01.png

Researchers at the Pew Research Center identified seven main themes about the algorithm era.

As part of sharing these concerns they tell a story of how Microsoft engineers recently created a Twitter bot named “Tay” in an attempt to chat with Millennials by responding to their prompts, but within hours it was spouting racist, sexist, Holocaust-denying tweets based on algorithms that had it “learning” how to respond to others based on what was tweeted at it.

Slide 8

Image: https://media.newyorker.com/photos/5931822ff7120e02cf40436a/master/w_649,c_limit/Nijhuis-Big-Data-2.jpg

This year, US Professor Ben Shneiderman proposed that there should be a regulatory body called a National Algorithms Safety Board, which would provide oversight for high-stakes algorithms.

Slide 9

Image: http://www.abc.net.au/news/2017-03-20/algorithms-flowchart-illustration/8360072

In Australia, there are at least 20 separate parts of law that allow the government to give computers the power to make decisions. These are decisions that used to be made by a human and that can have important consequences.

These laws allow for computers to make decisions about social security, taxation, parental leave, superannuation, migration, biosecurity and child support. In every case, some kind of algorithm may be used to make decisions, yet we have no knowledge of how these work.

These are powerful and disturbing stories about the creation and use of data, the role the internet plays and the shaping role that mathematics and computers are playing in our society. This brings me to web archiving.

Slide 10

Image:

One of the most basic tenets of all data science is that data doesn’t exist in a vacuum, it is the result of a massive pipeline of explicit and implicit decisions

yet so much of the output of the data science world proceeds as if data can be cleanly separated from the contexts in which it is created.

Nowhere is this more apparent than the world of web archiving.

Researcher Kalev Leetaru wrote an article for Forbes recently that starts with this paragraph. This was not his first dig at how poorly web archiving is conceptualised and constructed. He started in 2012, talking about how the lack of documentation regarding even the most critical decisions, like inclusion criteria, seed lists and third-party crawl donors, means that we have precious little insight into how these archives were constructed and what biases may be manifest through those myriad decisions.

This is not a new conversation for me either. But algorithms and the rate of change in our virtual spaces and technologies are raising the stakes.

Slide 11

Image:

When it comes to using data to understand the world around us, the most important question revolves around how well that data reflects the phenomena we are attempting to study.

Kalev rightly asks questions about the nature of web archiving. When it comes to using data to understand the world around us, the most important question revolves around how well that data reflects the phenomena we are attempting to study. Do Twitter-based studies of human society truly reflect the dreams and fears of global society or are they systematically biased geographically and demographically? Do the breaking news events surfaced by the Facebook Trending Topics module exclude much of the continent of Africa and is Africa as a whole largely absent from the datasets we use to understand the world? Does the relative dearth of analytic algorithms for languages other than English mean we miss critical trends?

Slide 12

Image: https://specials-images.forbesimg.com/imageserve/52cd2b055def4f42b28d687712caf2aa/960x0.jpg?fit=scale

All this exploration of algorithms and the internet comes back to a question I have been raising for a decade now – what is evidence of culture? And in this question, what is the role of the archivist and the archives in the construction and dissemination of cultural heritage?

If web archives are online cultural heritage, how is their construction being understood and documented? As Kalev points out – does the medium examined define the results?

This raises the question of what web archives are actually evidence of. But how do we interrogate the notion of evidence of culture?

Slide 13

Image: Mediated recordkeeping model

I want to share with you a model I created from research on how to understand the complexity of evidence of culture in online spaces. This model is an attempt to make sense of how and why people interact with recorded information – the purposes, the values, and the nature of memory as it is created, shared, accessed and managed over time in various and complex ways, including in response to technologies, other people and entities, and various mechanisms, systems and tools that help to enable and empower, as well as disempower and make hidden.

I want to share with you the three important areas it represents:

Firstly, memory and evidence as processes are separate but intrinsically linked. The processes of memory-making have a relationship to multiple systems, mechanisms and perspectives involved in establishing evidence.

Secondly, how people create is linked to how they see and identify themselves, what they are interested in, how they identify with various communities, as well as what values they perceive according to various community cultures. Narrative is vital to understanding this as it is a tool that can construct and communicate multiple and simultaneous realities, identify and make sense of the self within groups, community and society, and is imbued with power; of dominant, counter and competing narratives and as a mechanism for memory-making and knowledge preservation.

Thirdly, interaction occurs in conjunction with an understanding of action at various levels, as well as in relation to how people use, value and experience technologies including what technologies afford or do not allow to help people achieve their goals in creating and sharing something of who they are.

This model shows all these points of view to exist simultaneously and in multiples. How an individual understands their identity and work is not necessarily how it is seen by someone else. So when the archivist creates in the creator dimension, by documenting the world, they should be taking into account the varied, diverse and potentially incommensurable complexities that make up this map of how we understand cultural heritage as evidence of culture.

If we see algorithms as part of a continuum of mediated memories, where and how do they fit in? Whose narratives are being told and what do we need to know about mandates to understand their contexts as memory? I don’t have any answers today but this is something I am about to examine.

But what my research into algorithms is beginning to reveal is the deep and complex nature of the relationship we have with data and machines. Recordkeeping is a memory-making process that contributes to evolving values, purposes and interactions over spacetime including memory (as making and remembering), narrative (as personal, sharing and evolving), evidence (as constructions of value and meaning) and technologies (as mediators and facilitators).

Archivists, and I count myself as one, need to consider what this means as to how we understand culture as evidence and heritage as it is being formed. Archivists also need to understand and challenge their role in the system so that they may empower, discover and transform to meet multiple needs over time. Flexibility, adaptability and a need to understand what is being valued and who by as it is being created is essential to any transformation. That includes transformation within ourselves as professionals as well as the transformation of what role archives as constructions of evidence play in society.

Thank you.

Personal Digital Archiving Conference – Interconnectedness

This is my presentation developed for the 2017 Personal Digital Archiving Conference at Stanford University.

I had to shorten it for the conference, but have now recorded it in full. I also adapted it to fit with an online presentation. It is a PREZI so please click using the forward arrow to listen to me explain each screen.

The key idea in my research is how personal memory systems (such as those that exist in how we manage stuff on computers, tablets, mobile phones and in online spaces such as social media) help to form collective memory (this term can include various conceptualisations of ‘collective’ but in this research it is primarily focused on what we might call traditional memory institutions).

In looking to explore the formation of memory systems from personal to collective I examine how value is constructed and contextualised by individuals who create and share digital content. Understanding value at the creator level can provide deeper and richer insight into whose memory is being captured and preserved.

As a final note on terminology, I do not use the terms personal digital archiving, nor personal information management. I prefer to use the terms recordkeeping and memory-making. These latter two terms encompass various aspects of what it means to create and manage information for various purposes, including to remember. I see information management as a form of memory management and control. Recordkeeping provides a way to construct the systems to manage and control. And recordkeeping is not necessarily about producing or managing authentic, reliable records or evidence in the sense of what is usually done by governments and organisations. We all do recordkeeping in some form or another using various tools and processes to do so, some more effective than others. Archiving activities or processes are just another kind of recordkeeping process, regardless of who does them. Recordkeeping is a process where recorded information is managed according to its value. The value could mean retention for an instant or forever (although the latter is highly unlikely in practice, but rather is an intention). Value is assigned or identified at various times. This is what this research was looking to find out more about.

ACIS Conference – digital cultural artefacts – who owns them?

An interesting project from Brett Leavy using gamification to preserve songlines in a digital cultural artefact.

Brett tells us in this conference session that he works in a do-first, ask for forgiveness later practitioner-focussed way. He says the IP belongs essentially to the community (communities), but what about the game? It is a product – a commodity. What if a museum or archive wants to acquire and/or use the game? What about beyond Brett’s lifetime? Admittedly, the technology may not last that long, but the potential for preservation by an institution seems reasonably high.

I found this presentation and project particularly interesting because decisions about how preservation can be conceptualised and carried out can be widely different. The Monash Country Lines project is about a similar topic, but is conceptualised in a different way. Yet it is also about preservation.

Both make me wonder whether the digital artefacts, created from a perspective of cultural heritage and preservation, actually become the archive. The externalisation of stories presents an interesting idea about how it fits into the notion of cultural heritage within the community the stories come from. Listening to Shannon Faulkhead from Monash talk about the Country Lines project at the CIRN Conference a couple of weeks ago, I got the impression that in this project the artefact is part of an ongoing archive (and narrative), not the embodiment of cultural heritage.

In the context of my own work what I am interested in is how these artefacts contribute to evidence – what are they evidence of? Whose evidence are they? One of the most interesting things about them is that they are evidence not only of indigenous stories, but also of use and knowledge of digital technologies. In the context of Brett’s game, it is evidence of the role that games play in current society in relation to learning, for access and to communicate. Brett harnesses the power of the game to present information. Is it preserving it though? What exactly is being preserved? Whose memory is it? It would be great to explore in more detail the construction of this project, as well as the Monash Country Lines project. Not to compare, but to explore how the decisions made in their inception and ongoing activities contribute to a diversity of cultural heritage and how.

Digital archives future?

I pitched an article about digital archive(s) to The Conversation recently, and was rejected – sort of. The idea was good, but apparently my writing had a bit too much jargon. Maybe it does, maybe it doesn’t.* I find it a real challenge to accomplish academic writing for journals and develop conference presentations, plus tweet and blog, and update the various academic websites. The Conversation is part of a strategy to develop a public intellectual profile, but I am just not sure how to write for this medium – I am not a journalist after all. So, whining over, I thought I would share the piece I wrote here and perhaps it might generate its own conversation and readers can give me feedback via the comments. It would be appreciated.

Memory, evidence and people

Technology is everywhere in our lives. It is equally disruptive, transformative and indispensable. We use screens to view, share and transform the past, present and future all at the same time. We explore new ways of seeing, capturing and remembering with apps, games and augmented reality. We are so much at one with our digital technologies they are like extensions of our memory and sense-making functions. Our technology is us – it is part of our evolving social and cultural identity. Researchers have suggested that we already are cyborgs. What does all this radical change mean for archives and memory-making? If the internet is our collective memory, then whose remembering is it?

My recent research on how people used YouTube as a memory-making space revealed how connected people are with the various technologies used to create, upload and share video, and how they also used these same technologies to remember for themselves, for their community and for the wider groups of people who interact with online video. How they used the technology influenced what they wanted from it and how they evolved their online identity to support their narratives. This research outcome has two important implications for collective memory and the future of archives.

Firstly, people co-create with technologies, and that interaction is now part of culture – it is part of the narrative of society. Technology is clearly evidence of us, but without being used, it is only technology. Related to this implication is an understanding that it is the interactions plus the outcomes that tell the story of an evolving society engaged with technologies. Yet, when we think of cultural heritage, archives and collective memory we think of things – artefacts. Objects are collected, described and displayed as evidence of culture, but there are some significant problems with this concept of proof, heritage and value, especially when it comes to co-created interactions and outcomes with technologies. How is the transformative use of technologies going to be remembered – what is evidence of culture?

My research into YouTube identified that archives and other memory institutions create their own evidence of culture by making decisions about what they think is significant as heritage. This practice is based on a history where institutions document events by collecting everything about them (often referred to as special collections), but it has some serious failings that are amplified in a technological era. In the YouTube research I found out that co-creation is not just about making video and uploading it, but is also about making sense of and participating in community decisions and values over time. Consider how people use Twitter and re-tweeting to inspire revolution, as well as socially execute. Gamers whose interest in the game extends beyond playing it modify code to share and play a game of their own devising. There now exists social media that allows people to document and share memories, such as HistoryPin and Collectish, but these play a role equal in importance to that of YouTube and Instagram in capturing and organizing memory.

Essena O’Neill’s recent Instagram revelations and subsequent changes to her account highlight how social media is a space for multiple and changing memories, a documented identity and an evolving narrative. In recent years, there has been a significant research movement in the archival discipline to explore the hidden, marginalized, and absent voices from official records. This work is linked to an already evolving conversation about records, people and power, and how archives have been used to subjugate and make invisible communities of people over the ages. In part, this movement explores the idea that evidence is not within the object, but rather in the stories shared by people and their evolving contexts. Archives and archivists have a role to play in helping people make sense of these contexts, but the traditional role of being selector and custodian of heritage is no longer feasible, nor ethical. Archives and archivists need to be able to facilitate the connections between different ways of experiencing and constructing evidence. This means connecting what is already out there (in archives as well as already online, plus what is on our computers) to help make sense of it over time.

The new kind of digital archive my research hints at enables the creation and management of evidence by providing the technologies and the intellectual framework to allow people, including archivists, to add, manage and link metadata. Metadata is the lifeblood of the archive – it is the description of what happened, who did it, how it was done and why. The archivist is therefore not a selector, nor custodian of cultural heritage, but rather preserves the systems that support wide, diverse and multi-layered understandings of value and evidence. In this distributed, non-custodial archive, anyone can decide to remove something they are responsible for, but the archivist manages the evidence of its contribution to the network of memories – the metadata and the links. The story continues to be told.

The ideas presented in this article are specific to what are referred to as special collections, often managed by libraries but called archives. Archives in the true sense of the word (as understood by archivists), particularly organizational and government archives, do include mechanisms to document and manage context over time, but the lessons of this research and the concept of co-creation are equally applicable. In Australia, we have an entity called the National Archives of Australia, but it only manages federal government records, which are only part of what it means to be the nation of Australia. Archival legislation across Australia does not recognize the notion of co-created records and what it might mean in relation to rights in the records – not just in access, but in how they are created, captured and managed over time.

Archivists should be facilitators of remembering and embrace the complexity of evidence by enabling people to tell their own stories so that the multiple truths that exist in our world can be heard. There has been talk of losing memory because archives are not digitizing materials fast enough, but this is not the most critical problem for archivists. What is critical is moving beyond the models of the archive imbued with power, prestige and control. Digitizing the world’s information is a fabulous idea, but how it will be managed brings a heavy responsibility. Being digital means being more accessible, and having the ability to crunch data and mash content. Is this for everyone? Who makes decisions about descriptions? How will people be able to make sense of evolving contexts? How can all peoples have a right or a say in collective memory? Who controls the archive controls the future. In our technological world the archive is evolving, diverse and beyond the confines of the institution. The role of the archive and the archivist needs to change to face these new challenges.

* My SEO program tells me: The copy scores 41.0 in the Flesch Reading Ease test, which is considered difficult to read. Try to make shorter sentences, using less difficult words to improve readability. Yikes! It must be true!