The Open University's (OU) Open Research Week took place from Monday 24 to Friday 28 March 2025. During this week, we celebrated and promoted open research by showcasing examples of open practices across many disciplines. Scroll through our programme below to view all the webinars.
Watch the videos below to hear what Professor Kevin Shakesheff, Pro-Vice-Chancellor, Research & Innovation, has to say about accountability in research, and what Professor Theo Papaioannou, OU Open Research Lead, has to say about how open research breaks down barriers to access.
At The Open University, we’ve always believed in the power of accessible education and research – open is who we are. Our mission is to remove obstacles to learning, and open research is a natural extension of that ethos.
Open Research Week is important for us because by sharing our approaches and our work openly, we invite discussion and critique, and—most importantly—progress.
It’s important to say that openness doesn’t mean an absence of rigour. On the contrary, open research encourages a higher level of accountability. Transparency in methodologies, in peer review, and in data sharing strengthens the credibility of our findings. It allows for scrutiny, replication, and improvement, making research more robust and trustworthy.
This week, we’ll hear from experts leading the charge in science communication and in opening research processes to publics using citizen science platforms. We’re also very proud to be launching our free, online course on Open Research, and the team behind that will be taking us through what it offers researchers.
There are challenges – of course there are. We have to navigate questions of intellectual property and data privacy. And we have to think about sustainable funding models for open-access publishing. But, in keeping with our mission and our Open Societal Challenges of Sustainability, Living Well and Tackling Inequalities, we have to ensure that open research doesn’t inadvertently exclude those without the resources or infrastructure to participate fully in the research process. Equity must remain at the heart of this openness. It is not enough to open the doors of knowledge; we must make sure that everyone has the means to walk through them.
So, whether you’re an experienced advocate of open research or just starting to explore what it means, I encourage you to take full advantage of this week. Ask questions, share ideas, challenge assumptions, and most importantly, think about how open research can shape the future of your own work.
Here at The Open University, we’ve always embraced the spirit of openness in our teaching and research. Our commitment to open access and to making educational resources available to the widest audience possible reflects our values and mission. But we must ask ourselves: how do we continue to move the conversation about openness forward?
The work we do in advancing open research goes beyond simply making research outputs publicly available. It requires us to rethink the entire research process: from project conception, to how we collect data and share findings, to how we engage with the public and other researchers.
This week gives focus to these conversations, particularly around openness and AI, around citizen science, around open access publishing, and much more.
At its heart, open research is about breaking down the barriers that have traditionally limited access to knowledge and research outputs. The notion that knowledge should be open and accessible to everyone, regardless of their institutional or economic background, is increasingly being seen as the foundation for a more equitable, transparent, and innovative research culture.
My work has focussed on innovation in the global development setting. For decades, academic research has been locked away behind paywalls in journals and books that many people in both the global North and global South simply cannot afford or cannot access. But this model is now changing, and with good reason. By embracing open practices—whether it’s open data, open methodologies, or open peer review—we make the process of scientific inquiry more inclusive. And more inclusive means more innovative.
Much of what we’re sharing this week is part of an ongoing conversation about inclusivity and innovation. Our speakers are sharing examples of open practice and what works for them, because we know from experience that one size doesn’t fit all. I encourage everyone to help continue this conversation with us.
Alice Fleerackers, Assistant Professor of Journalism and Civic Engagement, University of Amsterdam.
Alice discusses how open science practices – like posting preprints, sharing datasets, and publishing OA research – can support the public in accessing, understanding, and using research knowledge. She draws on five years of research into journalists’ engagement with open research outputs, as well as her professional experiences as a journalist, science communicator, and scholar.
Nilam.McGrath 0:03
Alice is Assistant Professor of Journalism and Civic Engagement in the Department of Media Studies at the University of Amsterdam. She is also an award-winning researcher and writer whose work has appeared in The Open Notebook, the Globe and Mail, the National Post, and Nautilus Magazine, among other outlets.
She holds an interdisciplinary PhD with a focus on science journalism, and she studies the intersections of journalism, health, science communication, and scholarly communication.
All of which, of course, is a perfect start for our Open Research Week. Alice's talk is about making research knowledge public, and in particular using open science to support science communication. Now, there will be a chance to post questions to Alice in the Q&A channel, which you'll be able to see on the right of your screen; the button itself is a couple of little speech bubbles at the top or the bottom of your screen, depending on what platform you're using.
There is also a chat channel, which we'll use to share a feedback form at the end of this webinar and links to further events for the rest of the week. Now, we won't be monitoring the chat channel for questions, but I know that Alice will be using it throughout her presentation. That doesn't stop you from sharing links and ideas in the chat, but for specific questions to Alice, please use the Q&A function.
With that, Alice, I shall hand over to you.
Alice Fleerackers 1:33
Thank you for having me. It's a real pleasure to be here today.
Before I start talking at you, I want to briefly hear something from you. All of you are clearly here because you have some interest in making research open: either you're already doing it and you want to improve your practices, or you're interested in starting to make your research open. So I'd like to hear from you, and I'm going to ask you to put your answer to this in the chat, but wait to press enter until I say so.
I'd like to know who your audiences are, who you hope to reach in making your research open, and what practices you're using to make your research open. So just take a few seconds to draft an answer to these questions in the chat, and then hold off until I say to hit enter to share with the group.
OK, take a moment to finish up your answers. And on the count of three, I'll ask you to hit send or enter: one, two, three.
OK, so why do I ask you to reflect on these questions? I hope you can see some of the diversity of the potential publics of making research open.
I ask you to reflect on these because historically there hasn't been very much attention to the actual audiences of open research. Very often the focus has been on making research open to the scholarly community. This is especially true early on in writing about Open Access, which gained attention before open science did. In those early days, Peter Suber wrote that Open Access isn't primarily about bringing access to lay readers; if anything, it focuses on bringing access to professional researchers, whose careers depend on that access. However, really crucially, even in this very early writing about Open Access, he also acknowledged that there is a potential for making research open to benefit that lay public.
Specifically, on the very same page, he writes that there's no need to decide which users are primary and which are secondary: Open Access is about bringing access to everyone with an Internet connection who wants access, regardless of their professions or purposes. And research from my doctoral supervisor, Juan Pablo Alperin, has actually found that as many as 35% of people who access Open Access research do so for personal purposes.
So this suggests that there is a public beyond the expert community who has an interest in using research.
And this idea of broadening access to a public audience has gained a lot of prominence in the recent UNESCO Recommendation on Open Science, which defined open science as a set of principles and practices that aim to make scientific research from all fields accessible to everyone, for the benefit of science and society as a whole.
So, in contrast to that early conceptualization of Open Access, you see here that science and society are being put side by side as the core beneficiaries, the key audiences, of making research openly available.
And what my research group has seen is that during COVID-19, this broader view of open science took up a lot of space and increasing focus within the debate around open science. This is an analysis of some of the key documents relating open science to COVID-19.
We saw this sort of engagement with societal actors come up again and again, often with reference to that UNESCO Recommendation on Open Science, which seems, at least based on this study, to have had quite a big impact on the way we think about making our research open.
However, in another study that we published, we looked at the way open science is discussed in actual policies about open science within Europe and the Americas; so rather than just documents providing arguments about open science and what it is, we asked what it means to put it into practice from a policy perspective. There we see that this broad view of open science is much more of an abstract concept that's still not very much reflected in the practical policies we've put in place at many institutions and organisations, and at the national level. Specifically, our results show that open science policies overwhelmingly focused on making research outputs publicly accessible, neglecting to advance the two aspects of open science that hold the key to achieving an inclusive scientific culture, namely equity, diversity and inclusion, and public participation. Now, I've bolded these two terms, public accessibility and public participation, because they're very often treated as if they're synonymous. But in fact they're very different.
Specifically, making something accessible, making it so that people can find it, see it, read it, does not necessarily make it useful. It doesn't necessarily mean that people can understand the research, can use it to make informed life decisions, or, crucially, contribute to it meaningfully. Can they participate in the research process? Can this wider public really play a role in science? And can science really play a role in their lives?
And Kelly and Audrey have really beautifully conceptualised this as providing material access to research and conceptual access to research. Essentially, making your research free to access provides material access: it means people don't have to get over a paywall to access your work. But while that is an important first step, it is not sufficient for ensuring that they can conceptually access that research, that they can understand and use it.
And this is where science communication, a field that has been dedicated to studying and advancing the ways in which the public can understand, engage with, and participate in research, is so crucial.
And specifically what I want to focus on largely today is the role of science journalism, which is really where my research over the last five years has focused.
And over the course of this presentation, I want to convince you that journalists can be really crucial brokers of open research knowledge; that is, they can play a really crucial role in making research knowledge public. We make this argument in a review that we published in F1000Research quite recently, looking at all the research on how journalists use and communicate about Open Access research, where we find compelling evidence that journalists can help bring open research outputs, specifically papers and preprints, into the public sphere, and that in doing so they can provide not only material but also conceptual access. That is, they might share research findings, or communicate a summary of the findings of research, that people would otherwise not access and certainly wouldn't come across or find. But they are also really skilled at breaking down the jargon-laden, complex, technical nature of research into a language that people can understand, and also, crucially, at communicating the importance, the significance, the impact of that research for the public, for society, for policy. So in this sense, journalists are crucial brokers of research knowledge who can support the open research or open science movement in making research not only materially accessible, but also conceptually accessible.
But that same literature review, as well as my research and my team's research more broadly, has shown that openness brings not only exciting opportunities for science journalism but also really crucial challenges that we as a scholarly community need to address, if we truly want to achieve the inclusive, societally engaged form of openness that recommendations like the UNESCO Recommendation have advocated for.
And I'll demonstrate these opportunities and these challenges through three specific areas.
How journalists use Open Access research, how they engage with open data sets, and finally how they engage with preprints, or un-peer-reviewed research papers that are publicly available before they've been published in a journal.
First, let me briefly turn to journalists and their use of Open Access papers.
In our interviews with journalists, we found really interesting things about how they understand Open Access, why they turn to Open Access journal articles, but also why they avoid these journal articles in some cases.
Specifically, in interviews with journalists who frequently report on research, they've told us that they find Open Access useful on a really practical level because they sometimes struggle to access research papers due to paywalls.
They've also told us that they see it as an ethical must-do: because so much research is publicly funded, it should also be publicly accessible.
Crucially, however, journalists who report on research are extremely diverse, and this appreciation of Open Access was really only true for some journalists in every study we've ever conducted. In addition to journalists who have this deeper understanding of the Open Access movement and understand it as key to the public good, there are many journalists who are actually suspicious of Open Access journals and who equate quality research with research that has been published in top, quote unquote, peer-reviewed journals, many of which have historically been closed and, even where they are open today, require scholars to publish at exorbitant costs that exclude a large part of the scholarly community: places like Nature, Science, Cell, et cetera.
And in more recent research (this is a preprint; it's currently still under review) we found that journalists, like many scholars themselves, often conflate Open Access journal publishing with predatory publishing. So again, we see them relying on top, high-quality journals, placing their trust in them almost uncritically, whereas newer Open Access journals, like many diamond Open Access journals, are treated with more suspicion.
This obviously has really important implications for the diversity of the kind of research that ends up getting media coverage and ends up making it out into the public sphere.
And while these findings are, to me at least, really interesting and have important implications, they're also just the tip of an iceberg, because the review of the research that I showed you earlier suggests that we really know almost nothing about how journalists are engaging with and thinking about Open Access, or what this means for the public and for science. So if anybody needs research ideas, I highly recommend this space: anything you do here will be useful to us.
Beyond Open Access journal articles, I want to speak to how journalists have been using open data sets, which are datasets that are shared publicly so that others can use them, do secondary research on them, scrutinise them, and validate them. Research in this area is even more preliminary, but for a recent piece of journalism that I wrote myself for The Open Notebook, I spoke with seven journalists who use open data sets to tell science stories. These journalists told me that they found the data sets really useful for things like adding additional context. For example, if they were writing a story about inequities in healthcare access, they might find a data set that helps them show which kinds of people are most at risk of being overlooked by the healthcare system. Or they might use quotes from publicly available interview participants to provide a more personal lens to their stories. They also did first-hand investigations of data sets to call attention to problematic aspects of science and to monitor the powerful.
And finally, they used open data to tell impactful science stories, sometimes bringing a more personal lens to otherwise technical coverage of research.
And, very similar to what they said about Open Access research articles, journalists said that they appreciated these open data sets because they were free to access, and thus easy to use and explore.
This let them have a preliminary look at the data available and come up with new story ideas; in some cases, stories that other journalists weren't reporting on, exclusive ideas that helped them stand out from the massive competition in journalism.
But, and this will come up again when I talk about preprints, journalists also really struggle to verify and analyse these data sets, because that typically requires time, resources, and deep expertise that journalists may not have, as well as good metadata that spells out the limitations and use cases of the data.
And fundamentally, journalists said that they needed collaboration from other researchers to be able to do this kind of verification.
Many of them recalled working directly with the researcher to analyse the data and clean the data, but also interviewing researchers about data sets to understand: what is the context this research data applies to? What are the limitations I need to be aware of if I choose to use it in my reporting?
But while this is very promising from the perspective of making data more publicly accessible and understandable, it seems that these journalists are a minority. In fact, when writing that article, it was very challenging to find journalists who could effectively speak to their use of open data.
And in a review that we just published a couple of days ago, we found that, at least within data journalism, which is a very obvious use case for open data sets, there was very limited evidence that journalists were really using these data sets in their work. But, crucially, again, there's very little research on this topic, as we found in our review, and the research we do have is often not done in a way that makes it methodologically easy to know whether a data set being used is openly available or not. Essentially, the data journalism community, at least, has not turned much attention to how these journalists are using open data, and so we need more research in this area to really know how and why journalists can help broker this knowledge.
Finally, I want to turn my attention to journalists and their use of preprints, which are un-peer-reviewed but publicly available research papers. This is probably the area of research where we have the most knowledge, which is largely due to the focus on preprints during the pandemic, when many scientists started to use them to share their research freely and as quickly as possible.
To be able to support the pandemic response in those early months: peer-reviewed research typically takes months to come out (or weeks, because it was a bit expedited during COVID), whereas preprints are almost immediate. And what we found in our research is that journalists also started to adopt preprints in their work at rates not seen before COVID.
Because, like scientists, they appreciated that preprints were free to access, that they were rapidly available, and that, because they were so timely, they were directly relevant to the societal issues they were trying to report on in the heart of this crisis.
And what was interesting about COVID is that these key benefits have arguably been there all the time: preprints have always been free, timely, and, many of them, relevant to society. But COVID, our research suggests, seems to have introduced many journalists to these studies for the first time. You can see this in this sort of scary-looking graph. Media coverage of research articles is that grey line; the share of that coverage focused on preprints leading up to the pandemic was around 2% of articles looking at research. But during COVID, if you look at the blue line, media coverage of preprints about doubled, and with respect to media coverage of COVID-related research, which is that pink line, preprints made up almost 15% of media coverage.
And media coverage gets twice as much engagement as social media posts just sharing the original preprint itself. This study was conducted with just five media outlets, so just five media outlets covering these studies generated double the engagement of sharing the studies in their original form. You can only imagine what kind of social media impact media coverage of research can have when you look beyond just five outlets to the entire news media industry.
If you're a visual person, another way of looking at this is the overlap of the Twitter and Facebook accounts that were sharing preprints and papers in their original form (those pink bubbles) versus sharing the media coverage of those same preprints and papers (the green bubbles). Not only are the green bubbles much larger, they're also distinct: other than that very small overlap, different kinds of audiences seem to be engaging with research shared on social media first-hand versus research shared in this media coverage form, suggesting that there's a potential to reach a larger and different audience when a journalist covers your preprint than when you just share it on social media.
And this is supported by another study we did, which found that media coverage of COVID preprints from one media outlet, The Conversation, was not only shared widely on social media, it was also shared to a large community of individual accounts that went beyond the typical scholarly audience of research shared on social media: people like policymakers, healthcare professionals, and, maybe most excitingly, ordinary citizens.
While this is exciting, it also comes with risks, especially if you remember that journalists are struggling to communicate what a preprint is and what that means for their audiences.
And our recent research suggests that, for the most part, the public has basically no knowledge of what a preprint is. In some cases, people had a vague understanding that it's a draft of some sort, but many people equated preprints with things that were wildly incorrect: things like republished versions of articles, un-fact-checked news stories, or, in perhaps the most concerning answers, plagiarised content.
And crucially, even in cases when journalists do try to help their publics understand what these outputs are, it seems to have limited impact on how well people understand this term.
This is a lot of doom and gloom, and of course these risks really need to be addressed. We need better ways to communicate about open research outputs in ways that help people understand the caveats, the limitations, and the potential.
But I want to bring it back to the life-saving potential of preprints. When we look at that same social media study that I discussed earlier, and we look at the COVID-19 preprints that got the most attention on social media when they were shared through media coverage, many of them are about life-saving and very publicly relevant data: for example, things about the effects of different intervention timing on the spread of COVID-19, things about how COVID spread in hospital rooms, and, oh yes, the controversial hydroxychloroquine usage. And while that one is obviously problematic, many of these preprints were ultimately published in quote unquote top journals, places like Science, Science Advances, Cell, the usual suspects.
So I just threw a tonne of information at you, and if you didn't get it all, that's totally fine.
Because I'm gonna hopefully bring it together with a few key takeaways that I hope I can leave you with before we enter the Q&A.
The first is that open science, or open research as we're talking about today, shares some really important science communication goals: goals of ensuring public access to research and, in recent conceptualizations like the UNESCO Recommendation, public participation in research and societal engagement with science.
However, historically, and as we see from these policy documents and related discourses, there tends to be a focus on providing the public with material access, on making research free, rather than providing conceptual access: making research understandable, useful, and ultimately helpful to the public.
And this is hard for scientists: we have lots of things to do, and we're not all skilled communicators. This is why journalists can be so crucial to providing the conceptual access that's often overlooked in the open scholarship, open research movement. And they can do this by, for example, reporting on Open Access papers, open data sets, and preprints, as I've shown you today.
But to do this, and if you remember one thing from this presentation, it's this slide: journalists really need help from the scholarly community.
They need us to make our research open so they can find it, access it, scrutinise it. But this is just the first step.
They also need us to be really clear, in the research itself and in how we discuss it online, but also in interviews that they might conduct with us, and especially to be clear about the limitations of the study. What really are the weaknesses? Not just the weaknesses we throw in for the peer review process.
And when a journalist asks you to comment on someone's research, whether it's your own or someone else's, it can be helpful to think about this like a mini peer review, rather than just applauding your colleague's accomplishment (which you should do as well).
It's up to you. You have a responsibility to critique, to explain, and to provide the important context for that research that the journalist needs to know if they're going to communicate it to a public audience.
Relatedly, consider that a journalist is going to be paying extra attention to your abstract and, for an open data set, things like the metadata or README file. These are spaces where they can get a sense of what your research output is about, and which they're going to be using in that preliminary step when they decide whether or not to report on it.
In all of this, I think a crucial thing is to shift the way that we think about our research, to consider not only how it might impact our peers, the scholarly community, but also how it might impact the public. In some cases, this might discourage you from posting a preprint, if you think that the risks are really going to damage the public interest. But it could also encourage you to promote your research more broadly: to turn to social media, to reach out to journalists, to ask your press office if they're willing to put out a press release, because you might see that the research you're producing has really important implications for the public.
But most fundamentally, I think it is up to us to change the narrative around what we call good science.
As I've pointed out in several of the studies I discussed earlier, journalists are so reliant on reputation-based metrics that we know are problematic: things like journal prestige, impact factors, citation metrics. And it's up to us to change the way that we talk about what quality research means when we engage with journalists, when we talk about good science in the public sphere. It's only in doing this, and in considering these alternative publics of our work, that we can really move from making research knowledge open to making it truly public. Thank you so much for your time. I look forward to hearing your questions and comments.
Drs Camilla Elphick, Sarah Laurence, Lydia Devenney, and Ailsa Strathie, School of Psychology & Counselling, OU.
The team behind the OU’s free course on Open Research share its content and interactive decision-making tree. The course is designed to help researchers from across disciplines understand the principles of open research, why open research is important, and to recognise good practice in the work of others.
Nilam.McGrath 0:04
Thank you.
Lydia.Devenney 0:06
Thank you for the lovely introduction, Nilam, and good afternoon, everyone. Thank you so much for having us. As Nilam said, my name is Lydia, and I'm joined by my colleagues Camilla Elphick and Ailsa Strathie. Today, Camilla and I are going to talk to you about developing the open research course and the accompanying decision tree, and Ailsa will be managing the chat. Thank you, Ailsa. We will demo various elements of the course and the decision tree throughout the presentation.
If you'd like to try it out yourself today, you can follow the QR code on the screen there, and if you have any trouble, we can pop the link into the chat for you as well. So, to get stuck in now, I'm going to pass you over to Camilla, and she's going to tell you a little bit more about how the course came about and who was involved.
Camilla.Elphick 1:02
Thanks, Lydia. Hi, I'm Camilla. We decided we really wanted to tell you why this course came about. It started with an away day that Sarah Laurence organised for the School of Psychology and Counselling, which we helped her to organise, and at the event we realised that many of our colleagues had an interest in open research but didn't really know what it was.
And I think we've had a sense that quite a few of them felt it didn't apply to qualitative methods.
And there was a sort of vague assumption that nothing could be done about making a completed research project open. That made us realise that understanding around open research was fuzzy at best, and so it was difficult for people to actually take action in their own research. It also made us question whether there were similar issues for researchers from other disciplines.
So we had quite a lot of chats about what we were going to do about this and whether we could make some kind of resource to mitigate this problem. We realised that what people really need is something that's accessible, flexible, targeted, and simple. People are really time-poor, and they don't want to be reading for hours or days on the Internet trying to find things out when they could find something really easily. So we decided that we needed to create a course.
That helped people understand what open research is, why it's important.
And how to recognise good practises in other research so they can make a decision for themselves about the quality of that research and how credible and believable it is?
But done in a very simple way. The other thing we thought people really need is to be able to take practical steps to make their own research more open in a really targeted way. We were really lucky: Rose Capdevila and Nicola Dawson helped us to get funding for this, and with them we managed to gather a really great team across disciplines and the library to help create the resource. We also had a fantastic consultant, Priya Silverstein, and you'll hear more about them in a minute; they led the writing of materials with me and Lydia.
The people we got involved all gave us really fantastic feedback about how we could make the course more relevant to different disciplines and different methods. It was basically designed for postgraduate researchers, but actually we feel it's one of those resources that could be interesting to anybody who's interested in good research practices. So we have the link, which I think has been shared in the chat, and we also have the QR code that was shared. If you want to play around whilst we're talking, that's absolutely fine.
The basic idea is that it's structured and flexible, so you can dip in and out as you wish, and the interactive tool is really the way that people can take action to make things open for themselves. I just wanted to show you one slide here to show how many people were involved in the production of this; the list is actually even more extensive, but you can see that on the left here we are all involved in psychology and counselling, and we've also got people from the library, from social sciences and global studies, and from arts, humanities and physical sciences. So it really was a collaboration between various different disciplines. I think now I'm going to hand you back to Lydia, who was the one who introduced us to Priya in the first place.
Lydia.Devenney 4:46
Thanks, Camilla. So yes, we were extremely fortunate to have Priya as our lead author. I first met Priya back in 2020 when I joined the Nowhere Lab, which is an online lab that they set up during lockdown. Then in 2023, they recorded an interview for us on open research that was situated in one of our psychology and counselling modules, and it was clear from there, really, that the tone and approach fitted perfectly with our style and values. Not only that: Priya's well-known contributions to open research, through publications like 'Easing Into Open Science: A Guide for Graduate Students and Their Advisors', as well as their experience teaching open research across disciplines and career stages, made them an ideal lead author for the course. As you explore the course you'll notice that the writing is engaging, inclusive and easy to follow, and it features examples from a wide range of disciplines as well as clear and relatable explanations, which is great. Priya is currently a researcher with the Psychological Science Accelerator and serves as the President of the Society for the Improvement of Psychological Science. All right, I'm going to pass you back to Camilla.
Camilla.Elphick 6:08
Yes, Lydia.
OK, so I'm going to talk a little bit more. I think what we've established so far is that we have two products: one is a course, and Lydia's going to take you through the basic elements of the course, and the other is an interactive decision-making tool, which I'll talk to you about very briefly at the end.
So for the course: basically it's set out to last eight weeks, but being the OU, it's very flexible. You can do the work whenever you like; it's there online for you to use whenever you want.
We estimate it takes about three hours of study each week. I did a little bit of maths and worked out that's 24 hours, so if you want, you can binge-study it, like watching the whole series of 24 in one go. Basically, it starts from the basics: what is open research and why is it important?
It also shows you how to recognise good practices in other research, so you can judge the credibility and quality of that research.
And it shows you how to get involved in some really innovative open research initiatives.
And also it shows you how you can apply open research to your own discipline and your own method at any stage of a research project.
And being the Open University, it's full of interactives, links and interviews with researchers. So I think the best thing now is to pass over to Priya. This is a video that's included in the course, in which they explain their personal experience of replication and why they became interested in it in the first place. So I'm going to try and press play and hope that it works.
I'll hand back to you now, Lydia, if that's OK.
Lydia.Devenney 13:15
OK, so we've just inserted the QR code there again in case anyone hasn't been able to explore the course yet. The course itself is designed in a really simple way, broken down into three elements. First there's transparency, which refers to the practice of being open and honest about all aspects and details of the research process.
So imagine you've baked a cake and your friend asks you for the recipe. You wouldn't just say 'make a batter, put it in the oven and then decorate it', because that wouldn't be enough information for your friend to reproduce it.
But this is often what happens in research: we don't actually have enough information or transparency to understand exactly how a study was carried out, which is what Priya was just describing in the video. In reality, when you're sharing a recipe, you probably share details about the ingredients, the oven settings for baking, and any steps in between, so that your friend could actually make it themselves.
And that's really all transparency is. In open research, it's about researchers openly sharing the methods, data and results so that anyone can see exactly how something was carried out. Then integrity is the degree of trustworthiness or believability of the research findings. It's a bit like this: you might be tempted to leave out the part in your recipe about how it took twenty-odd attempts before the cake actually rose.
Integrity is about being honest about things like that.
And accessibility is about ensuring that all who are interested in the research are able to consume, evaluate, and interact with it. And it doesn't matter whether you're a researcher, a student, or if you just have a particular interest in the topic and it's sort of like posting the recipe online for everyone to see. And when research is accessible, it breaks down those barriers to the development of new knowledge, which, as researchers, is what we're all about.
Because the information isn't just limited to that one group; when it's accessible, anyone can access it. So those are really the three core principles.
And then after we sort of highlight and describe what they are, we go into a little bit more detail on those. So I'll just take you on to the next screen. So the first one is transparency.
And you can see here there's quite a lot that we cover under the umbrella of transparency.
So the course talks about making methods, data and results open to everyone. It covers the FAIR principles, so ensuring that your data is findable, accessible, interoperable, and reusable. It also covers understanding the types of licensing and repositories you'll need to navigate when sharing your data, and the ways in which you can improve your research so that others will be more likely to be able to reproduce it.
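As an illustration of what the FAIR principles ask for in practice, here is a toy metadata record. The fields and values are hypothetical, not from the course; the point is that a dataset should carry enough information to be findable, accessible, interoperable and reusable.

```python
# Hypothetical dataset record illustrating the FAIR ingredients.
dataset = {
    "identifier": "doi:10.xxxx/example",       # Findable: a persistent identifier
    "title": "Survey responses, 2024 study",   # Findable: rich descriptive metadata
    "access_url": "https://example.org/data",  # Accessible: retrievable via a standard protocol
    "format": "CSV",                           # Interoperable: an open, widely readable format
    "licence": "CC-BY-4.0",                    # Reusable: a clear licence for reuse
    "provenance": "collected via online survey, anonymised",
}

def is_fair_ready(record):
    """Crude check that a record carries the minimum FAIR ingredients."""
    required = {"identifier", "access_url", "format", "licence"}
    return required <= record.keys()

print(is_fair_ready(dataset))
```

This is deliberately simplified; real repositories capture far richer metadata, but the same four ingredients recur.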
There's also an explanation of how to preregister, with links to preregistration forms across different disciplines and different types of analysis and research design. It really takes you through the preregistration questions step by step, which can be really useful, because I think completing your first preregistration form can be a little bit daunting.
It highlights the importance of distinguishing between confirmatory and exploratory analysis as well. I think there's a tendency to assume that it should always be confirmatory analysis, and that you can only preregister confirmatory analysis, but you can practise open research and preregister exploratory analysis too, and we really cover that in the course.
We also talk about reporting guidelines, a sort of set of rules and standards that help researchers present their findings clearly and transparently. Within the course there are different examples, because that sort of checklist for how to report your findings can differ across disciplines, and we've really tried to highlight that and draw attention to some of the different ways those checklists can be presented across different areas of research.
So then the next one we go into in more detail is integrity.
Again, these are some of the topics that we cover within the integrity weeks.
In case anyone isn't aware, in terms of replication: if a study is replicable, you'll be able to conduct the same study again, generate new data, and still get the same results as the original study.
We cover that, and also the replication crisis, where studies can't be replicated.
It goes into various questionable research practices as well, what they are and how to avoid them, like p-hacking, which is exploiting analysis techniques to increase the likelihood of obtaining statistically significant results, as well as selective reporting, where results from research are deliberately not fully or accurately reported.
That's a bit like only sharing the successful recipe, the good cake that worked out, and not talking about the other attempts that weren't successful, or weren't deemed successful in the researcher's eyes. It also covers generalisability, which is about whether the same findings occur in other contexts, and disclosure of conflicts of interest, positionality and the associated biases that we can have.
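To make the p-hacking idea concrete, here is a small simulation sketch (my illustration, not material from the course): when there is no real effect, a single test is falsely 'significant' about 5% of the time, but cherry-picking the best of twenty tests is 'significant' most of the time.

```python
import random

random.seed(1)

def fake_p_value():
    # Under the null hypothesis (no real effect), p-values are uniform on [0, 1].
    return random.random()

def study(n_outcomes):
    # Declare "significance" if the best of n_outcomes tests has p < .05.
    return min(fake_p_value() for _ in range(n_outcomes)) < 0.05

trials = 10_000
honest = sum(study(1) for _ in range(trials)) / trials    # one planned test: ~5% false positives
hacked = sum(study(20) for _ in range(trials)) / trials   # best of 20: ~1 - 0.95**20 ≈ 64%
print(honest, hacked)
```

This is exactly why preregistration and full reporting matter: reporting only the 'winning' test hides the nineteen failed attempts.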
It also covers interpreting findings appropriately, and identifying and staying away from flimsy interpretations.
It covers robustness as well, so the strength and reliability of results, how to interpret that, and how to ensure robustness in your own analysis and results. And then there's multiverse analysis, which is where researchers try to perform all possible analyses on the data, in order to explore which analyses actually show the effects they're interested in and which don't. So there's quite a lot addressed there, and it's done in a really accessible, easy-to-understand way, with really nice examples, such as the recipes, to make it easy to understand and applicable across different disciplines.
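A minimal sketch of the multiverse idea (illustrative only; the data and analysis choices here are made up): enumerate every combination of defensible choices and report the estimate from each 'universe', rather than a single, silently chosen one.

```python
from itertools import product
from statistics import mean

data = [2.0, 2.5, 3.0, 3.5, 40.0]  # toy scores with one extreme value

# Two defensible choices at each of two analysis steps.
outlier_rules = {"keep all": lambda xs: xs,
                 "drop > 10": lambda xs: [x for x in xs if x <= 10]}
transforms = {"raw": lambda x: x,
              "capped at 5": lambda x: min(x, 5.0)}

# One "universe" per combination of choices.
multiverse = {}
for (o_name, o_rule), (t_name, t_fn) in product(outlier_rules.items(),
                                                transforms.items()):
    estimate = mean(t_fn(x) for x in o_rule(data))
    multiverse[(o_name, t_name)] = round(estimate, 2)

for universe, estimate in multiverse.items():
    print(universe, estimate)
```

Seeing the estimate range from 2.75 to 10.2 across universes makes it obvious how much a single reported number can depend on unreported choices.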
The final principle that we cover is accessibility. As you can imagine, it covers things like open access and journal publishing models; we had fantastic and fruitful input from the library on this.
There's a video as well, which is great. It also covers preprints, what they are and how to do them, and academic privilege. This is an interesting one, because there are of course systematic inequalities in academia and research; not everyone has equal access to opportunities, and that might be because of gender or disability. We all have different privileges, and there's a really fantastic academic wheel of privilege included in the course which depicts this. Essentially, we should be striving for diversity at all stages in the research process, and we really get into this in week six of the course, so if you want to learn a bit more about that, it's in week six.
We also cover big team science, which involves large-scale projects where people from across the world conduct the same study and then pool the results together. It's a really good way of addressing generalisability, but again it has its limitations too, and you can read more about this in the course.
So yeah, those are the three main principles and a sort of whistle-stop tour of the different topics covered within them. I'm going to pass you back to Camilla now. Thanks.
Camilla.Elphick 21:48
Thanks, Lydia. So this is really the final slide. You might want to be thinking about questions to ask us afterwards, and we can have a go together at thinking about your own research and how the tool might be able to help you with it. But for now I just wanted to say a little bit about the decision tree itself. The course that Lydia's described so well, with her cake-baking allegory, is about really understanding the principles involved, the practices, and what could be done. What this tool does is allow anyone who uses it to take action for their own research: at whatever stage their research project is at, they can take some steps, if they choose to, to make it more open than it otherwise might be. That's really what this tool is for.
It's for people who've done the course already, or anyone who's interested in doing the course. You'll see that in the course there are some activities involving this tool, where you can learn how to use it, although it is actually very easy, so you probably don't really need that. But anyway, it's part of the course.
This tool will also help people who've done the course to have a little refresher if they've forgotten something, in which case they might want to click on the principles button, which will remind them about accessibility, transparency and integrity, and take them to the part of the course they most want to remind themselves about.
But I think most people will be using this to take action, so by clicking the actions button they can find out what they need to do at the research stage they're at. If they've finished their project, there are still things they can do, and this helps them to find out what those things might be. The other thing about this tool is that it's also open to anybody who hasn't done the course, so you don't need to do the course to use the tool. What it does is take you to the section of the course that's relevant to you, and you can still access quite a lot of the material.
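Purely as an illustration of the routing idea behind such a tool (the stages and actions below are my own hypothetical examples, not the tool's actual content), a decision tree like this can be thought of as a lookup from research stage to suggested open-research actions:

```python
# Hypothetical routing table: research stage -> suggested open-research actions.
DECISION_TREE = {
    "planning":  ["preregister your study",
                  "plan data management and licensing"],
    "analysing": ["document confirmatory vs exploratory analyses",
                  "share your analysis code"],
    "finished":  ["deposit data in a repository",
                  "post a preprint",
                  "publish open access"],
}

def actions_for(stage):
    """Return suggested actions for a stage, with a fallback to the principles overview."""
    return DECISION_TREE.get(stage, ["see the course principles overview"])

print(actions_for("finished"))
```

The real tool additionally links each action back to the relevant course section.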
And I think, again, probably the best way to demonstrate it is to show you. I think a few of you had difficulties with the sound in the previous video; we will put a transcript of that video in the chat in a minute, and I think you've been told that you will get access to the slides and the transcript afterwards. Just so that you know, this video is silent, so don't be worrying about your sound button: it's just a screen recording. So anyway, I'll just press play.
You can see somebody's clicked the actions button, and you can see their mouse as they try to work out what they want to select. If you look at the links, you can see they're taking the person to the correct place in the course to tell them what they need to do.
You can see here that when they've finished their project, there are still things that can be done.
And that's the end. Now, if anybody wants to, you can either try using this QR code or we might pop the link to this page in the chat again, and you can have a go at trying out the decision tree if you want. If you've got any questions about that, or about anything else that we've talked about, please ask. Oh, thank you, Elsa has popped it in the chat.
Yeah. Please ask us questions because we're now done.
Thank you very much.
Dr Paul Piwek, School of Computing & Communications, OU.
This webinar details contributions, over the last 20 years, to open research in Artificial Intelligence (AI) at the OU, and in the context of long-term trends in AI research. The talk is aimed at researchers and practitioners interested in the role of open research in AI.
Paul.Piwek 0:05
Thank you very much, Neelam, for the introduction. So yes, I'm going to talk about open research in artificial intelligence and, as Neelam already mentioned, this is going to be a personal account. I thought it was important to signpost that at the beginning, because there is a lot of AI research going on at the Open University, and this is just a very small, not necessarily even representative, sample drawn from my personal involvement in that research.
What I really hope you will get out of this talk is that, by illustrating a number of these AI projects over the last 25 years, we'll shed a bit of light on the most recent developments in artificial intelligence; in particular, I'll be focusing on the notion of generative AI.
And you'll be in good company if you hadn't heard of generative AI before the end of 2022. This is a graph from Google Trends which shows that there were very few searches for that term until late 2022, when, as some of you may guess, ChatGPT came along. It was released on 30 November 2022, and all of a sudden there was huge interest.
Now, what is generative AI? I won't give you a technical definition immediately; instead I want to show you an example of a particular use of this technology which really excited me, and I hope it will become clear throughout the talk why it was so interesting.
The example I will give you is really from my own practice.
Together with a number of colleagues, we occasionally present our research at conferences and workshops, and a while ago we had to present a poster. The organisers had the brilliant idea of asking us to accompany that with a short video, and we thought we'd do something slightly different: we wanted to turn this into a little conversation between, in this case, the three of us who had worked on that particular project. So we asked ChatGPT to create an engaging dialogue for three persons which summarises the content of this text, where the text was authored by us: basically a description of the work we had done, in this case on one of our modules, TM112.
The key thing here is that the text you see here is fed to the generative AI; that's what we normally call the prompt. What it then did was quite nice, actually: it turned this into a nice short dialogue script. It wasn't yet perfect. For instance, the names had to be changed to match our names, and there were a few very small mistakes, so we had to edit the text a little. But writing the prompt took us a minute, and checking that the output was correct took another minute, so this was very fast. We then took that, and what you see here are a few stills from the video that we created. We still used our own voices, actually, to act out that dialogue script.
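The workflow described here, writing a one-minute prompt around your own text, can be sketched as a small helper that assembles the prompt string. The wording and the helper function are hypothetical, paraphrasing the prompt described above; actually sending it to a generative AI service is left out.

```python
# Hypothetical helper: wrap a source text in a dialogue-generation prompt.
def build_dialogue_prompt(source_text, speakers):
    names = ", ".join(speakers)
    return (f"Create an engaging dialogue for {len(speakers)} persons "
            f"named {names}, which summarises the content of this text:\n\n"
            f"{source_text}")

prompt = build_dialogue_prompt("We evaluated a new activity on module TM112.",
                               ["Paul", "Ann", "Ben"])
print(prompt)
```

As the talk notes, the human steps around this are essential: the content fed in is authored by you, and the generated script is checked and edited by you.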
And I won't play you the video here, but we will later also share the link to this particular video in the chat.
But it can't have been that bad, because that year, at that particular Scholarship of Teaching and Learning conference, we actually won the best poster award: we were voted best poster.
The two things that I want to emphasise here, which I think are interesting and important, are these. On the one hand, and I hope you can see the highlighting there, the content, which we were responsible for as the developers of this little video, was produced by us, and it was also checked by us.
Second, in our presentation we of course acknowledged that we had been making use of ChatGPT, in particular as a way to create this dialogue script.
So that's just a little example of what got me really excited about the most recent technology. But from a research point of view this was also really exciting, because, well...
In 1998, so quite a while ago, I joined a research institute in Brighton, the Information Technology Research Institute, where there were a whole bunch of people working on this notion of flexible information presentation. I had actually just come out of my PhD examination at that point; you can see this is a photo from that time. So what do we mean by flexible information presentation? The idea is that there is a lot of information around, locked up in text, maybe in databases or other forms, and it's not always in the form that's most suitable for sharing with others or communicating. Flexible information presentation would then be technologies that help to turn this into the right form. This could be just plain text, which might sometimes be the best solution, but in other cases it might be text with a particular linguistic style: when I talk about style, it could be formal or less formal text.
The text might also be laid out in a particular way that makes it friendlier for the readers, there could be pictures added, and, last but not least, it could also be turned into a screenplay, a dialogue script like the example I just showed you. So when I joined this particular research institute, one of the first projects I got involved with was the NECA project, the Net Environment for Embodied Emotional Conversational Agents, where we were also working on this idea of generating dialogue scripts.
One of the applications in mind was to produce information about a car, or a set of cars, for a potential car buyer. The idea was that this information would be picked up from a database, and a short conversation, rather than a maybe more boring leaflet, would be produced that presented the car. As a user you could choose the roles of the car buyer and seller, and you could also, for instance, express what particular things you were interested in, whether environmental friendliness or family friendliness.
Once you'd made those selections, a dialogue script would be generated, which was then enacted by this pair of embodied conversational agents.
We then also used that technology in a completely different context with Austrian collaborators. In this case it was embedded into a game environment they had developed, called Espidelberg, set in the student quarter of Vienna. Here you could create your own avatar and send it out to meet other students.
It would then generate these little dialogue scripts and stories, which would be played out as your avatar's adventures in this particular game. So, just to take a step back and look at the ecosystem: this was almost 25 years ago. We started with a database, a very formal, specific and constrained representation of the information that was going to be presented, in this case about cars or about these student interactions. Then an algorithm, a plan-based algorithm written in a precise programming language, would specify how these dialogues were generated and then enacted. The rules were written by us, the researchers, so that involved quite a bit of work, and quite a few of them were also domain-specific: they were specific to the car domain or the student domain.
At this point it wasn't yet really proper open research. Although we were a large consortium of research groups who shared their data and resources amongst each other, we didn't at that point make them available to the wider research community.
So the dialogues were OK, although they were a bit repetitive, probably because this all had to be handcrafted. The system was explainable, in the sense that, as a programmer of the system, you could go back and see why a particular dialogue looked the way it did simply by looking at which rules had been applied. It would also very accurately translate the information in the database, via these rules, into the conversation. Controllability was limited: you could say things like who was going to be the salesperson or the buyer, and we also allowed users to set the emotion levels of the interactors, but it was all a predefined set of options.
So that was sort of the first foray into this idea of automatically generating dialogue scripts.
In 2007, I had the opportunity to work with a number of colleagues at the National Institute of Informatics in Tokyo, and there we proposed that, instead of starting from a database, it would be much better if you could start just from free text. That would be much less constrained, and there would be a lot more you could do with it.
So we developed the T2D system, text-to-dialogue, which maps from text to dialogue, and we had a mapping algorithm rather than the sort of top-down planning programming language. Just to show you what that looks like: if you have a short bit of text like 'To eat Japanese food, use chopsticks', we'd have a system that would figure out that there is a means relation between the two parts of the sentence. It would then be able to translate that into a question and an answer: 'How can I eat Japanese food?', with the answer 'Use chopsticks', and it would also know what the roles of the participants were. So we had a whole bunch of these rules that could be applied to text. You have to imagine that the specific text wouldn't really be part of the rules, so a rule could be applied to any text where a means relation had been detected.
Those were still hand-crafted rules. Some of that involved me actually going through a lot of books on how to write good dialogue scripts and then trying to translate that into more precise, computer-processable rules.
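In the spirit of those hand-crafted T2D rules, a single means-relation rule might look like the following toy sketch (illustrative only, not the actual system): detect a "To <goal>, <means>" pattern and rewrite it as a question-answer pair.

```python
import re

def means_rule(sentence):
    """Toy mapping rule: a means relation becomes a question-answer exchange."""
    m = re.match(r"To (.+?), (.+?)\.?$", sentence.strip())
    if m is None:
        return None  # rule does not apply to this sentence
    goal, means = m.groups()
    return [("Questioner", f"How can I {goal}?"),
            ("Expert", f"{means[0].upper()}{means[1:]}.")]

print(means_rule("To eat Japanese food, use chopsticks."))
```

The real system detected discourse relations far more robustly than a regular expression, but the shape of the rule, relation in, dialogue turns out, is the same.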
It was no longer domain-specific; that was the nice thing about this. You could now apply it to any text and it would give you a dialogue script. At this point it wasn't yet open research; we released it later, but I'll come to that in a minute. The fluency was good, not as repetitive as the previous example. It was still explainable, because we had the specific rules we could look at that triggered a particular generated dialogue script. Accuracy was also reasonably good, but there was no controllability.
So now we come to the penultimate example I want to give you, which is the CODA system. This was a follow-up to the T2D text-to-dialogue work, on coherent dialogue automatically generated from text. Again we had text as input, and we had a mapping algorithm similar to the one we used for T2D, but now the rules were really learned, or extracted, from data.
In particular, we created a parallel corpus of dialogue and monologue text. It's a bit like translation: if you wanted to translate from French to English, you might want bits of text in both English and French that are about the same topic and have the same content. Here we had bits of dialogue, and we tried to align those with bits of monologue that expressed the same information.
And here the notion of open research also starts to come in. We wanted to start from dialogue text written by professional authors, rather than try to make this up ourselves, and here we were lucky that there is something called Project Gutenberg, a library of texts that are no longer in copyright and so can be freely shared. Fortunately, that also included a number of texts on dialogue, in particular expository, non-dramatic dialogue. Here are a number of examples; we had examples from the 19th century all the way to the 20th century. Actually, the Guruvich paper wasn't part of the Gutenberg library, but we asked specific permission from the publishers and the authors for that one.
So what did we do to create this set of data and actually learn the rules?
Here, for instance, you see a short bit from Mark Twain's dialogue between O.M., the old man, and the young man, and you can see on the left-hand side that it was marked up with the sequence of dialogue acts: a yes/no question, followed by an explanation, followed by another yes/no question, followed by a 'no' answer, followed by another explanation. So we added what's called annotation, or markup, to the dialogue. We then also rewrote our own bits of monologue text that expressed roughly the same information, and we added markup here as well: an attribution relation, basically saying that there is a statement attributed to somebody, and here a contrast relation in this particular bit of text. We could then, as for the chopsticks example, create rules which would map instances on the monologue side to the dialogue side: say, an attribution relation to a yes/no question followed by an explanation, or a contrast relation to a yes/no question, followed by a 'no' answer, followed by an explanation.
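The learned mappings can be pictured as a simple rule table (an illustrative sketch, not the actual CODA implementation) from a discourse relation found in the monologue to the sequence of dialogue acts it generates:

```python
# Illustrative rule table: discourse relation -> sequence of dialogue acts,
# as might be extracted from the annotated parallel corpus.
RELATION_TO_DIALOGUE_ACTS = {
    "attribution": ["yes/no question", "explain"],
    "contrast":    ["yes/no question", "no answer", "explain"],
}

def dialogue_acts_for(relation):
    """Return the dialogue-act sequence a relation maps to, or None if unknown."""
    return RELATION_TO_DIALOGUE_ACTS.get(relation)

print(dialogue_acts_for("contrast"))
```

The point of learning these from professionally written dialogue, rather than hand-crafting them, is picked up in the evaluation described below.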
So this again was no longer domain-specific. And in this case we were now actually doing open research, in the sense that we released the data, originally via one of the research societies on language generation. A few years ago, though, I discovered that some of those sites had become obsolete, and we then moved this to ORDO, the Open Research Data Online repository of the Open University. I wanted to include this because, when making your research open, it is really good to keep an eye on exactly where you do this: platforms like ORDO, which is based on something called Figshare, are most likely much more persistent than trying to create your own website and putting things online there, which might go offline or disappear in some way or other. And we actually discovered that another bit of information that we had released about ten years ago was also no longer available.
We literally put that on ORDO the other day. This was the tools and the actual software that we produced for this particular project, which was mainly produced by our research assistant at the time, who is now a team leader at Toshiba Europe, still working in artificial intelligence.
One thing we also then started doing is looking at how good the quality of the generated dialogues is.
In this case, we compared it with the original T2D system, and we got human judgements: we got people to rate these dialogues. We found that there was actually an improvement, so the rules learned from the data from professional writers were, maybe not surprisingly, better than the rules that I had cooked up myself.
It's still explainable, because we still have these rules, which were extracted from the data. It's also accurate, but again there was not much controllability.
Just to give you an example of an application we then developed, because the system had now got to a level where it was actually useful: at the time we were talking, in the context of another project, with people from the Papworth Trust. One of the things they do is provide leaflets for their users, people who are helped with assisted living. So here is a leaflet, for instance, which in the middle bit says 'Everything we do affects the people who use our services. It is important that we listen to those people.' Our system would turn that, and other bits of the leaflet, into a short conversation, in this case: 'Is it important that we listen to the people using our services?' 'It is important that we listen to the people using our services.' And then the question 'Why is it important?' 'Because this will help us understand the changing needs of our service users.'
So in this case we used an off-the-shelf tool from a company called Xtranormal, which now trades under the name Nawmal, to turn it into a video. We created the dialogue script and then we had to add some instructions, basically director's instructions, to turn this into a video: to have a camera film these two characters, who would then act out the script.
We'll share the link for that later in the chat as well, so you can actually see what such a video looks like.
And this was actually
on the Papworth Trust website between 2009 and 2019; in addition to the traditional leaflets, they also included our videos as an alternative way for their service users to learn about the services they provided and the way users can get involved as well.
So one thing I wanted to highlight as well: when we were working on these projects to generate dialogue scripts, one of the things that really was of interest is how you generate the questions, since in a dialogue, one of the main things that distinguishes it from a piece of monologue is that you often have explicit formulation of the questions that are being answered. And so we got
talking with colleagues at the University of Memphis,
in particular the group led by Vasile Rus, and we organised the first Question Generation Shared Task and Evaluation Challenge. So what does the task involve? Basically, as the OU team we organised the generation of questions from single sentences, where you would have an input text:
'The poet Rudyard Kipling lost his only son in the trenches in 1915.' That's an extract from OpenLearn. The task for the participating teams, or the participating systems, was to generate questions of particular types. So for instance, they would be asked to generate a who question: 'Who lost his son in the trenches in 1915?' For a when question, 'When did Rudyard Kipling lose his son?' would be a good example.
Or how many: 'How many sons did Rudyard Kipling have?'
Again, all these questions should be answerable from the original input text.
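The template-filling flavour of this task can be sketched as follows. This is purely an illustrative toy, not any of the participating systems: it assumes the input sentence has already been analysed into a hand-annotated facts tuple, whereas real entries parsed raw text.

```python
# Toy sketch of template-based question generation, in the spirit of the
# single-sentence shared task. Illustrative only: the input is a pre-analysed
# tuple of facts, not raw text.

def generate_questions(subject, verb_past, verb_base, obj, when=None):
    """Fill who/when question templates answerable from the input facts."""
    questions = {
        "who": f"Who {verb_past} {obj}" + (f" {when}" if when else "") + "?"
    }
    if when:
        # 'did' takes the base form of the verb, hence the two verb arguments
        questions["when"] = f"When did {subject} {verb_base} {obj}?"
    return questions

qs = generate_questions(
    subject="Rudyard Kipling",
    verb_past="lost",
    verb_base="lose",
    obj="his only son",
    when="in the trenches in 1915",
)
for qtype, q in qs.items():
    print(f"{qtype}: {q}")
```

Even this toy shows why the task is hard: the templates need the right verb form for each question type, and anything beyond simple declarative input breaks them, which is why the shared-task systems used full syntactic or semantic analysis.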
So we had then, for this challenge
in 2010, four teams from all across the world who participated.
And I wanted to include a quote from one of the participants, Xuchen Yao, who actually produced one of the
top performing systems.
He also emphasises the academic environment; I think again, the openness here, in this instance, is about these researchers participating in a shared task (so everybody does the same task) and then also sharing the results of that and comparing the systems with each other, to find out what are the techniques and technologies that are most successful.
Xuchen, who is now CEO of one of the current AI companies,
emphasises that question generation is still very important, and its applications in education and healthcare are profound, as asking the right question remains the first step towards success.
I won't delve further into the particular results with regards to the systems, but I wanted to highlight another interesting observation at the time, and this had to do with the quality of the input data. So keep in mind, the input data was a bit of text,
and the systems then had to generate a question for that text.
And what we found is this. We had three sets of texts: texts from OpenLearn, so The Open University's
course materials, or extracts from course materials, that are freely available;
Wikipedia; and also Yahoo Answers. Yahoo Answers is something where people, like on, say, an Open University forum, can just ask questions and get answers. And of course there is a difference in the
degree of editing that happens with these different types of text: as you go from OpenLearn to Yahoo Answers, there is a lot less editing. OpenLearn obviously has a whole set of editors and people who try to make sure that the text is correct and also grammatically and otherwise fluent. What we found is that the more edited the text is, the more questions can be generated from it: for OpenLearn it was 65%, which then decreased to
57% for Yahoo Answers, and that was even after filtering a lot of the information from the other answers.
And the same happened for the question quality. Again, I won't go into details and the actual numbers here (higher numbers mean worse questions, lower numbers mean better questions), but the general trend was that the more edited, more quality-controlled text led to better outputs. So good data in results in better outputs coming out.
I think that's a lesson that, in the sphere of AI research, everybody is still learning. So we now move to today's work: systems like ChatGPT. The input is still typically text, but you can also input various other things, so there's a step forward in the variety of inputs that can be dealt with, although most of it is still actually dealt with as text, even if you put it in as data.
We've got as the algorithm a neural network with attention, the so-called transformer.
And that's trained again on data, and I'll try to very quickly
exemplify the difference from the previous approaches. In the previous approaches we had these rules, and we extracted them from about 700 turns of parallel monologue and dialogue; that's about 5,000 to 10,000 words in total.
Now, with ChatGPT, we're looking at a trillion words, collected from all over the Internet, including Wikipedia but also book collections, et cetera. And the idea then is that we have a neural network that's trained on that information.
And in this case, this neural network has 175 billion parameters, so it's several orders of magnitude bigger than what we were doing, and also uses a lot more energy in terms of the training.
And so the idea here with these systems like ChatGPT is that
you start with a network and you give it these bits of text, but you blank out the last bit. You then transform this into a set of numbers, which is processed by the neural network. It then predicts the word that should be in the blanked-out space. In this case, it might have predicted 'place', while actually the correct answer is 'city'.
And what it would then do is recognise that that's incorrect, and it will update the parameters, or the connections, in its network, so next time it will get this particular example right, or at least it will have a better shot at that.
And in the training you do this lots of times: you take another bit of text, you again blank out the last bit, you feed it into the network, and you keep training it in that way. This is the so-called pre-training.
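The predict-and-update loop described above can be sketched minimally as follows, with one big caveat: this toy uses a bigram count table in place of the transformer, and real pre-training adjusts billions of parameters by gradient descent rather than incrementing frequency counts. It only illustrates the shape of the loop: blank out the last word, predict it, compare with the truth, update the model.

```python
# Toy illustration of the pre-training loop: predict the blanked-out last
# word, compare with the truth, update the model. A bigram count table
# stands in (very loosely) for the 175-billion-parameter network.

from collections import defaultdict

counts = defaultdict(lambda: defaultdict(int))  # counts[prev][next] = frequency

def predict(prev):
    """Predict the most frequent continuation seen so far (None if unseen)."""
    nxt = counts[prev]
    return max(nxt, key=nxt.get) if nxt else None

def train_step(sentence):
    words = sentence.split()
    prev, target = words[-2], words[-1]  # blank out the last word
    guess = predict(prev)                # model's prediction for the blank
    counts[prev][target] += 1            # update so next time we do better
    return guess, target

train_step("she moved to a big city")          # first time: no prediction yet
guess, target = train_step("he lives in a big city")
print(guess, target)                            # now it predicts 'city'
```

After the first example the model has seen 'big' followed by 'city' once, so on the second example its guess for the blank after 'big' is already correct; scaled up over a trillion words, this is the intuition behind the fluency of the resulting models.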
And then there is further training based on human input: in terms of whether a particular text is appropriate, whether it's safe, but also, for instance, if you give it a question, then it should produce as the next thing an answer to that question rather than a further elaboration on the question. Some of you on LinkedIn might have come across that yourself as well; there is actually still a lot of human input being asked for. I got one of these invitations recently:
'Please help us with your expertise by rating relevance and factuality of math-specific text produced by AI models.' So there's still quite a bit of human input there as well.
So the fluency of these models is very good, and maybe that's not surprising, because the model has sort of taken the whole Internet and somehow represented it in its 175 billion parameters, to understand what words follow what other words, and how that can be done in a way that's fluent and coherent.
So is it explainable? No, not really. As I showed in our old example, we had these rules where you could actually see, OK, this is what is going on: there is a 'means' relation, which is turned into a question and an answer about that 'means' relation. Whereas now we've got a bit of text, we do some calculations in this neural network, and we get another bit of text, and it's very difficult for us as humans to exactly understand what's going on there.
In terms of accuracy,
it's OK, but, as we all know, even if you look at the most recent generative AI systems,
DeepSeek, Gemini, et cetera, and if you look at a recent paper from last month on medical hallucinations, one of its findings is still that non-trivial levels of hallucination persist. When we talk about hallucination, that means information that's factually false; or, in our scenario, if we give it a bit of text
and we ask it to turn that into a dialogue script, ideally the dialogue script should express exactly the information that was in the original text, but what might happen is that it introduces additional information that it might have found somewhere else, that's actually not in the original text, and that might actually be incorrect or even inconsistent with the original text.
Controllability, though, is high, in the sense that you can ask it to produce text in a particular style.
You can also, like we showed, ask it to turn a text into a dialogue script, or
into a piece of text in a different language, or in a version of the language that is more dated, for instance.
So in terms of open research: ChatGPT wasn't very open at all.
If we talk about openness, there are a number of things to consider.
The data was not available: what was actually the training data? We don't have a precise record of that. The source code is not available, in terms of how the system was actually trained. And then the model, so the actual weights, the parameters, are also not public.
And so that's one extreme. There are, however, also intermediate cases. For instance, if you look at Meta's Llama, another generative AI: again, the data's not really available, and the source code is not available,
but the model is available. One advantage of that is that if you have these weights, then you can also use something like Ollama, which allows you to run models, and not just the ones from Meta, on your local machine. From a privacy point of view that's a great advantage, because it means you don't have to send your data over to whoever is providing the generative AI. More recently, DeepSeek, the Chinese
addition to generative AI: DeepSeek R1 was a lot in the news. They did not make the data available, some of the source code is available, and the model is also available. And then completely on the other end, we've got Hugging Face's SmolLM; Hugging Face, I think, is a French company. They make their data, source code and model available, and this is also one of the higher-performing models, actually.
So,
to conclude:
why do open research, and why do we want to make the data, code and weights available? Well, there was actually a paper by
the late Adam Kilgarriff, a colleague at the time in Brighton, who wrote this paper called 'Googleology is bad science', and I just want to introduce a variation on that: GPTology is bad science. And why is it bad science? Well, the systems are not examinable, so it's never quite clear what is causing what, which aspects of the system are causing particularly good or bad outputs.
There's only limited reproducibility.
And maybe another point to keep in mind is that, for these particular systems, the agenda is really controlled by commercial rather than research interests.
And even going beyond research itself, there's also some evidence that these systems are actually less safe. There was recently,
the other month, a paper by researchers at Anthropic called 'Auditing language models for hidden objectives'. They had a red team, the sort of opposing team, train a model with hidden objectives, like rating recipes with chocolate more highly,
even if chocolate is not really appropriate for those particular ingredients. And they then had blue teams that try to find out what the hidden objectives were that the model,
or the generative AI, was trying to obey. And they found that, well, they had a number of teams who tried to figure this out: the teams who had access to the training data were able to figure out what was going on; the ones who didn't, weren't. So that's a final note on the safety. And I really want to conclude here with a quote from one of the MSc students at the time, Brendan Wyse,
who contributed to the Question Generation Shared Task and Evaluation Campaign.
And I think it really nicely emphasises this idea that research is really a community activity.
He found the open research community very welcoming. And I think it is really important, in the sense that the underlying theme, which I hope also came across in my talk, is that research is not something that you just do on your own: you work with others and you build on the results of others. And open research is really crucial for that.
And so before I completely conclude, I also wanted to acknowledge, of course, all the colleagues that throughout the years I've been collaborating with, and who've actually contributed to a lot of the ideas and the work that I've shown you here. So thank you for your attention. I think it's time for questions now, Elaine.
Drs Beck Pitt and Irina Rets, Senior Research Fellows at the Institute of Educational Technology, OU.
Beck and Irina share five open practices that have helped support the co-development of Open Educational Resources (OER) on the digital energy transition. Using the example of the European Union funded Every1 project, which engages with diverse stakeholders across seven European countries, the examples detail how you can make your project more open, collaborative and transparent.
Beck.Pitt 0:03
Perfect. Hello everybody.
It's so great to see you today, and thanks so much for joining us. Happy Open Research Week! My name is Beck Pitt, I'm with Irina Rets, my colleague, and we're going to be talking about five open practices to support effective capacity building. As Neelam mentioned, we're going to be talking about this within the context of the Every1 project. And thanks so much to Neelam and Emily as well for their support and for inviting us to participate. It's really good to be here.
So what we're going to do today is start off with a brief look at the Every1 project, the aims, what we're doing, just to put this in a bit of context, and then we're going to talk through five different open practices. We're going to talk about things like open licensing, but also about openness in a broader sense, so things like transparency and inclusion, and the different types of activities and ways that we're doing things within the project to support that and enable Every1
to meet its goals. We're then going to round it out by bringing all those strands together and talking a little bit about ways to make your project more open, transparent and inclusive.
So let's get started, and let me tell you a little bit more about the Every1 project. The Every1 project is a three-and-a-half-year project, funded by Horizon Europe, and we're on the home straight now: the project will be ending in April 2026. But over the past couple of years, and currently, what we're doing is really focused on enabling and supporting everyone to participate in the digital energy transition.
So the use of things like digital technologies
in the production and consumption of energy, raising awareness of that, and exploring some of the issues around it, for example climate change, energy security and so on. We're working with 10 other project partners from across Europe, so there are 11 organisations in total involved in this work, and that includes universities like ourselves, The Open University, but also non-profits,
consultancies, membership organisations and SMEs. So a real mix of different people and organisations involved in this exciting
project.
So as you can see here on the right-hand side, we've got a little diagram of what's happening within the project. The OU is particularly involved in knowledge gap identification, so really understanding where there are gaps within the current offer of learning materials related to digital energy and the digital energy transition; creating learning pathways, so that people can navigate
and work their way through different learning
and training materials for their own needs; and then also our involvement and input into the actual development of the learning materials themselves. And that's going to be the focus of today's discussion. You can find out a little bit more by heading over to the Every1 website, via the QR code there on the left-hand side, if you'd like to find out more about the project and some of the wonderful work that our colleagues are doing.
But what we're going to
talk about today
is the Every1 learning materials in particular. So as I indicated a moment ago, we're raising awareness and looking to support and develop understandings around the use of digital technologies in relation to the production and consumption of energy, the digital energy transition, and engaging in different ways with people
across Europe to do that. So we're working with a wide range of stakeholders as part of this project.
I'll talk a little bit about that in a moment, but we're also really focused on those hard-to-reach, underserved and marginalised groups, and how we can really engage with people and support their engagement and involvement in the digital energy transition as well.
So, as was indicated on the previous slide, we've been working
to try and understand where we should focus our efforts in terms of creating new learning materials, listening to our stakeholders and external organisations,
the people that we're working with as part of the project, to identify knowledge gaps. We're producing around 80 English-language learning materials, in a number of different formats, about 15 in total. So this is everything from online courses to online games to offline learning materials like case studies, information and engagement packages. We've got some secondary school materials,
we've got our Digital Energy Essentials online courses, which are on our OpenLearn Create platform,
and we've got a MOOC. So there are all kinds of things in the mix there, which is very exciting, and many of these resources are being translated into different languages as well.
Obviously these are aimed at different audiences and levels; we're aiming to reach citizens and stakeholders across Europe, so we really need to make sure that we're addressing different people's needs there.
Co-creation is central to the development of these, so we're really working with
colleagues to understand their needs, involving them in the testing of those learning materials and, as we'll talk about a little as we progress through this presentation, involving them as well in the development of the learning materials and their content. And all of our learning materials are openly licensed as well. I'll come on and talk a little bit about that
shortly, but our learning materials are available under a CC BY-SA (Attribution-ShareAlike) licence.
You can find out more about the learning materials (we've got 11 that are currently available, with obviously more coming soon) via the QR code here on the right-hand side.
So just to move on now: let me introduce the first practice that I wanted to touch on today, which is agile sprints. The development of our learning materials in the Every1 project is structured across
a two-year period, and in order to structure that process of developing learning materials, we've got five cycles that span this two-year period. This approach provides a clear structure and specific points for us to engage
both with people within the consortium around the development of those learning materials, and also to give a very clear indication of when certain things will take place, such as testing
and showcasing learning materials, which really supports our communication with external stakeholders as well. So we have this structure across the period where we're producing the learning materials, with some flex within that to accommodate different things. Within that, we're also using sprint methodologies to create these learning materials.
These have been used really successfully on other projects and initiatives,
both ones here that I've been involved in and externally as well; they've been used really successfully to create OER, open educational resources, or openly licensed content in other contexts. In particular, in places like British Columbia, open textbook development has quite often used sprint methodologies to rapidly create content
and bring people together to do that. So really it's about that
time-bound, structured approach that we're using within the project: it enables people to develop content quickly, enables us to engage with colleagues, has this iterative and flexible approach, and enables us to work together and really bring people together. We'll talk a little bit more about that
shortly.
I'll now hand over to Irina. Thanks.
Irina.Rets 8:37
Thank you, Beck. So, to prepare for Open Research Week, we conducted reflective interviews with consortium partners, to think together and identify what kind of capacity building practices we've had so far in the project, and how they have affected the partners and the ecosystems so far.
So once we identified
the sprint methodology as a capacity building practice,
we also asked partners: what impact has it had on you so far? And partners said, well,
having this abundance of materials we need to develop (over 80, as Beck mentioned) was alarming in the beginning, and no one was really sure how to approach this.
And the cycles were not in the proposal, so it's something that we introduced to the project. But dividing the development of those 80 materials (each of them also needs to be tested, and they need to be translated into multiple languages)
into short, time-bound cycles made the process manageable
and allowed continuous improvement, because we incorporate feedback from the ecosystems following each cycle.
This practice also helped to engage partners, especially those not directly responsible for the final outputs, and clarified responsibilities and timelines. And it also had a cascading effect on the ecosystems: they also benefited,
in that they reviewed content in stages instead of all at once, so it also made the process manageable for them. And this iterative process kept them engaged and ensured that, with the next cycles, we also meet the evolving needs of the ecosystems. Of course, it's not all rosy; there have been some challenges along the way. The key challenge with the agile sprints is working against very tight deadlines.
It seems that we have, you know, two years to do this work, but if you take into consideration that every material needs to be tested with ecosystems, that this feedback needs to be incorporated, and that materials need to be translated,
it doesn't seem that long anymore.
So introducing flexibility into the cycles helped to a certain degree, but balancing flexibility with a structured
pipeline is a challenge.
But yeah, considering the benefits, and the fact that it made the process manageable, it was an impactful capacity building practice, I would say.
Thank you, Beck. Yeah, we can go to the next practice.
Beck.Pitt 11:32
Thanks so much, Irina. So I wanted to move on now and talk a little bit about the way in which we worked together within
the consortium to produce the Every1 learning materials. So we're using this sprint approach, this rapid development approach, but in order to move forward and create learning materials together in the way that's described, we also need to ensure that we're all on the same page.
So one of the things that we did, in addition to regular meetings and conversations around this, is to really talk with people and get agreement around the expert author teams who would be involved in producing different learning materials. For example, with our online short courses, there are twelve of these in total, on different topics, and obviously we're bringing together people with lots of different fantastic expertise.
And we're looking at who would be best placed, getting people to
align themselves with different
course focuses, so we can produce these learning materials. One of the ways that we would get that consensus is by bringing people together for a kick-off meeting prior to the sprint work commencing. And this really helps get that collective agreement and create shared understanding around activities and timelines, as well as building consensus
and building the team who'll be working together. Because obviously we've got 11 project partners all working together, and some people have not necessarily worked together before, so it's about getting that team building in action there. It also enables us to have an open discussion of any issues or concerns, which is really important for this work, and we did that in a very supportive way, which I think is key as well.
So: providing examples of different learning materials, of what things actually look like; templates to support people's development of the content; some work with people around personas; and training on open licensing, because that was something that was new to many people in the consortium as well. All of these things played in, so that everybody has got the opportunity to
share their expertise and understandings of different
topics.
Irina.Rets 14:08
And also, in our reflective interviews, partners highlighted how we all have different ways of doing things, but one partner explaining the topic and sharing their expertise helped the consortium to adopt similar methods and work more as a team.
So, as Beck mentioned,
we created a unified approach to working with the ecosystems, communication and project deliverables, and also, although indirectly, this alignment improved the quality of the learning materials.
And yeah, we leave a better impression as a team on the ecosystems, I think. So these are all the benefits of using this practice, for the ecosystems as well as the consortium.
The challenge of doing this is the additional workload that running this training creates for everyone involved, and I have led on quite a few of these trainings.
But,
as Beck mentioned earlier,
having a substantial amount of time allocated to this work package in the proposal probably helped to level up people's expectations to a certain degree.
And yeah, you know, we should think of leaving some time for committing to these online trainings.
So yeah, it has been an impactful practice,
but it does come with the challenge of additional workload as well.
Beck.Pitt 15:48
Thanks so much, Irina.
That's great. So let's move on to open licensing now; I'm conscious of time as well. I wanted to talk a little bit about open licences and the way that we've used these in the project. As I mentioned at the start, the project uses the CC BY-SA licence, and this is really important for what we're doing; I'll come on and explain a little bit more about this in a moment. So,
in case
people aren't familiar with Creative Commons licensing: you can find out more via the QR code below, but basically these licences are 'some rights reserved'. They give users clear, upfront permissions for the reuse of content, at no cost and in specified ways, without needing to seek permission from the creator. So we're applying a licence that will enable people to reuse our learning materials with attribution.
And if they would like to make changes to a learning material
and adapt it for their own needs, they only need to acknowledge us and apply the same licence as the original resource. The thinking behind this was that we're looking to really support localisation of materials. Obviously, working within a context with diverse
settings across Europe, we can include lots of wonderful examples in the work that we produce, but we can't necessarily cover everything. So giving people the option to change
the examples and rework the content for their own needs is really important; and again, for translation and for updating learning materials in really fast-moving terrain, it is really important.
There's also the support, as I mentioned, that's needed for colleagues to develop or deepen their understanding of open licensing. Colleagues were quite new to this topic, so providing resources and activities is really key to help people understand more about open licensing.
We obviously had a discussion around this licence type as well, and chose not to, for example, use one of the non-commercial licences, so
that people would be able to use Every1 resources within commercial contexts. And then finally, part of the project is to deliver resources and activities to support external reuse of our materials as well, and we're providing lots of support for people to do that and to raise awareness of the fact that these resources are openly licensed.
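When reusing material with attribution, a common convention is Creative Commons' recommended "TASL" pattern (Title, Author, Source, Licence). A minimal sketch of a helper that composes such a statement follows; the resource title, author string and source URL used in the example are hypothetical placeholders, not a real Every1 material.

```python
# Illustrative helper for building a reuse-attribution line following the
# Creative Commons "TASL" convention (Title, Author, Source, Licence).
# The example resource details are hypothetical placeholders.

def attribution(title, author, source_url, licence="CC BY-SA 4.0",
                licence_url="https://creativecommons.org/licenses/by-sa/4.0/"):
    """Compose a single-line attribution statement for a reused resource."""
    return (f'"{title}" by {author} ({source_url}) '
            f"is licensed under {licence} ({licence_url})")

line = attribution(
    title="Digital Energy Basics",           # hypothetical title
    author="The Every1 project consortium",  # hypothetical author string
    source_url="https://example.org/digital-energy-basics",
)
print(line)
```

Under CC BY-SA, an adapter would add a note on the changes made and release the adaptation under the same licence, which is exactly the share-alike condition described above.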
Irina.Rets 18:31
And in our reflective interviews, one partner noted how using open licensing from the start
allows the learning resources that we create to be used immediately upon release, as ecosystems hold their own information events and they're happy to have other trusted sources.
With other types of licences there's sometimes an embargo: you know, for a couple of years you cannot use certain material. But that's not the case with openly licensed materials.
And there was another quote which I thought was quite powerful, around the fact that the project is funded by EU taxpayers' money, and publicly funded outputs should not be locked behind a paywall. So having open licences does have these ethical implications as well, and
it creates a world that's more just for everyone involved.
Then there is another capacity-building practice that we championed within the project, and that's storytelling.
So what is storytelling?
If you go to the next slide.
Well, as with many approaches in education these days, it's learner-centred teaching, so the learner is at the centre of the materials.
And we do this by developing a narrative or journey that will resonate with different learners.
So we involve reflective questions, we use real-world examples, and we share our own stories of energy digitalisation.
Since we develop different types of materials in the project, this particular approach was used with short interactive presentations about different aspects of energy digitalisation, and we almost built them as if we were writing a novel. So there is an introduction, an exposition, where we introduce the aims of the material and set out the motivation for writing it.
Then there is a rising action, where we touch upon a conflicting question; we use a reflective question for that. Quite a few of the materials we are designing address questions that are quite topical, where there is conflicting evidence online. One such material, for example, is about busting energy myths, which I found really useful. There is a myth, for instance, about whether leaving devices plugged in substantially uses energy even when they're off, and if you search online you sometimes find conflicting answers to this question.
So using a reflective question, we directly ask: do you think that uses a lot of energy at home? Then, as the material develops, we answer the question. So there is a climax and then a conclusion. With some topics, we try to write materials almost as if we were writing a novel, and this creates a more engaging experience, both for the people writing the materials and for the people who will be learning from them. This approach helps us reflect different audiences and motivations, but it's also an opportunity to draw on consortium expertise and understanding, as it involves real-world examples.
Interestingly, one partner noted in the reflective interviews how using storytelling in materials writing helped her think about how she'll share her own research. In research dissemination, she now also tries to talk about her own research as if she were telling a story. So the practice had a personal impact on that person.
I guess the challenge of using this approach, and that's what we heard from some partners, is that it doesn't work with all topics, especially the most technical ones. But it does work for quite a range of topics, so that was a useful practice.
Beck.Pitt 22:58
Thanks so much, Irina. I'm conscious of time, because I know we need to leave a little bit of time for questions, so we're just going to touch on the final practice here: 'You said, we did'. This was really connected to the continuous engagement that we have with our stakeholders. Once we'd begun testing the learning materials and getting feedback on the things we'd been producing, what we wanted to do was share that feedback with people and show them what we were doing as next steps. 'You said, we did' became a regular item in the showcases that we hold at regular points throughout the production of our learning materials, and these showcases now also include an overview of the feedback received and our response to it. So that's really about building relationships, as it says on the slide: providing support and transparency, and building trust and accountability.
Irina.Rets 23:59
One partner noted how they've seen in other projects that ecosystems or stakeholders would contribute insights at the start of a three-year project but then hear nothing until the end. This practice helped keep ecosystems involved, and it also showed them directly that we hear them, and that their feedback matters and has been addressed. The challenge, of course, is that not all feedback is actionable, particularly when ecosystems want local examples from one particular context, whereas our materials need to appeal to a wide European audience. But there was a lot of value in using this practice to keep our ecosystems engaged, and also, within the project consortium, to reinforce a culture of responsiveness and accountability.
Beck.Pitt 24:57
Thanks so much, Irina. So, just to finish now, to draw those threads of discussion together and present back these ways you might want to consider making your project, or the work you're doing, more open, collaborative and transparent. Really it's about that provision of support, as I hope has come out through the discussion; we touched on some of the challenges of that, for example around time. Building in that support and leading by example gives a really good foundation for building relationships and trust, both within the consortium and externally. Flexibility and iterativeness matter too: having structure, but keeping it flexible, enables you to respond appropriately to things that come up as the development, in this case of learning materials, progresses. Being open to discussion and listening is key, and you're obviously learning all the time from colleagues as part of that process. Showcase expertise, and champion the people and organisations involved in the work that's happening. Build openness in from the start: we're in Open Research Week at the moment, and that's really a key thing when you're thinking about open research, but for projects too it's important to think about how you're going to do things, to be open in your practices, and to consider the kind of licensing you might use for outputs. And finally, share progress and updates with stakeholders, set expectations, and have a meaningful dialogue with people; that really supports and contributes to all of the above. We're slightly over time, but hopefully there's still room for some Q&A. Thanks so much for joining us. Thank you.
Professor Elton Barker, OU, Dr Rainer Simon, former technical director of Pelagios, and Dr Sarah Middle, Archaeology Data Service and OU Visiting Fellow.
The Pelagios Network has allowed cultural heritage practitioners to link historical information about places online in a simple way; tools and a community have formed around this citizen science platform. This webinar is aimed at researchers, cultural heritage practitioners and policy makers interested in making data more discoverable and reusable using FAIR principles.
Elton.Barker 0:03
Thanks so much to the open research team. I think this is a great initiative, and we're really proud and privileged to be part of it. Rainer, could you go to the first slide? This shows what we're going to be talking about today: an open research digital infrastructure called Pelagios. I'm going to speak first and give a brief history of Pelagios in ten minutes, so that's going to be quite a challenge, from project to community. Rainer will then take over to focus on one aspect of the work within the Pelagios network, which is building open research tools that anybody can use, and he's going to focus on one of those, called Recogito Studio. Then Sarah will finish things off, again with a ten-minute slot, focusing on one example, one use case of Recogito Studio in one particular community of cultural heritage. So, first of all, some background. Rainer, next slide please.
And the one after that. That's just me, and this is where we're going to start. This is my world, the world of Homer. You're looking at the famous catalogue of ships in the epic poem the Iliad by Homer, where we see, on a map, all the contingents that Homer mentions fighting the battle for Troy. This is my research: my research is in ancient Greek literature, Homer being a primary case study. And your question might be: why? What am I doing here? Why am I talking about ancient Greek material when this is supposed to be about open research and digital infrastructure? It's a very good question. My interest is really in spatial forms, forms of place, within ancient Greek literature. To help me with this investigation, I've been using digital maps and digital technology to do the mapping and to explore these representations of space and place. And this interest in place, in identifying places in documents and disambiguating them so as to be able to map and explore them, can help us contribute to web technology and to open research more generally: specifically, linked open data. Rainer, next slide please. As soon as you talk about the web, I think it's always good to have a slide acknowledging the founder of the web, Tim Berners-Lee. This is a useful quotation about what we are going to be talking about today, which is linked data.
Linked data is simply about using the web to create typed links between data from different sources. These may be as diverse as databases maintained by two organisations in different geographical locations, or simply heterogeneous systems within one organisation that historically have not easily interoperated at the data level.
Linked data relies on two technologies that are fundamental to the web: uniform resource identifiers (URIs), which are what we're going to focus on, and the hypertext transfer protocol, HTTP. All this sounds rather complicated, all a bit technical, but essentially HTTP is what we use all the time with web addresses. So that's the basic technology, the basic infrastructure of the web, and the interest I had in places can, interestingly, help develop this form of linked data. So that's the theory. Let's see it in practice, to bring it to life and make sense of it. Rainer, next slide, please.
This is linked data in practice, and there are two parts of the screen I want you to focus on here. The first is this single gazetteer, Pleiades. Pleiades is a gazetteer, essentially a database of place information to do with the historical world, the ancient world. This was a resource I was familiar with; I got familiar with it through my interest in ancient places. What Pleiades does, as a digital gazetteer, is hold information about places, and it dedicates a web page, a resource, to each particular place. The web page we're looking at here is ancient Athens, or Athenae, in Pleiades, and it has all the information to do with that particular ancient place, including the ability to map it, other resources, et cetera. Crucially, it has what Tim Berners-Lee and his colleagues were calling a uniform resource identifier, a URI: essentially a code number, a social security number for a place.
The social security number can be seen here: the one in Pleiades for Athens is 579885. That is the character string. Computers are great with numbers; they're not great with words. So rather than getting the computer to think about 'Athenae' and use that as a means of disambiguating, of telling the difference between different Athenses... In another context, apart from when I'm working on ancient world studies, I might be interested in Athens, Georgia, the home of REM, for example. Well, Pleiades, and digital gazetteers more generally, can help us disambiguate between these different entities by use of a number, which is what computers are really great with. And just by way of illustration, here is another one for Sparta, another ancient world place, and you see the two mentions in the text with a sticky note attached to them. Essentially, that's what Pelagios is all about. Rainer, next slide, please.
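The disambiguation idea can be sketched in a few lines of Python. The Pleiades URI for ancient Athens is the one quoted in the talk; the GeoNames identifier for Athens, Georgia is illustrative only, and the lookup itself is a toy stand-in for a real gazetteer service.

```python
# Toy gazetteer: the same surface form "Athens" maps to distinct
# identifiers, so software can tell the places apart by URI, not by name.
GAZETTEER = {
    ("Athens", "ancient Greece"): "https://pleiades.stoa.org/places/579885",
    ("Athens", "Georgia, USA"): "https://sws.geonames.org/4180386/",  # illustrative ID
}

def disambiguate(name, context):
    """Resolve an ambiguous place name to a stable URI, given a context."""
    return GAZETTEER.get((name, context))

uri = disambiguate("Athens", "ancient Greece")
```

Because the two entries resolve to different URIs, a computer comparing identifiers (rather than strings) never confuses the two places.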
Pelagios is a way of linking data through annotation: putting sticky notes, if you like, onto a web document with these gazetteer URIs, these special social security numbers for individual places. By virtue of doing that, you're then able to connect different kinds of resources, as Tim Berners-Lee and his colleagues were describing: different kinds of documents such as texts, databases, museum objects, et cetera. Next slide, Rainer.
When Pelagios started, it was a project funded by an agency called Jisc, and it was a proof of concept: can we use this really simple idea of web annotation, putting a sticky note on a web document, to start joining together ancient world resources? So here we have one knowledge community, Pleiades, representing a specialised community concerned with geographical places and all the information to do with those places. And you have another community, which is more my world, full of ancient world resources such as the Perseus classical library, the German archaeological database, and a database called Nomisma dedicated to ancient coinage. The places within Pleiades have connections to each other, and the ancient world resources have connections to each other; then Pelagios comes along, using this idea of web annotation and these sticky notes in documents, to start joining the two knowledge communities together. You can start finding interesting connections between places by virtue of the documents that refer to them, or you can start exploring connections between documents (and by documents I mean texts and maps and databases) by virtue of the places mentioned in them. Next slide, Rainer.
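As a rough sketch, with made-up document URLs, annotations shaped loosely on the W3C Web Annotation model (a body pointing at a gazetteer URI, a target pointing at a document) show how two resources become connected purely because their sticky notes share a place URI:

```python
from collections import defaultdict

# Minimal annotations in the spirit of the W3C Web Annotation model:
# the body is the gazetteer URI (the "sticky note"), the target a document.
annotations = [
    {"body": "https://pleiades.stoa.org/places/579885",  # Athens
     "target": "https://example.org/texts/iliad-catalogue"},
    {"body": "https://pleiades.stoa.org/places/579885",  # Athens again
     "target": "https://example.org/coins/nomisma-record-42"},
    {"body": "https://pleiades.stoa.org/places/570685",  # a different place (illustrative ID)
     "target": "https://example.org/texts/iliad-catalogue"},
]

# Group documents by the place they mention: any two documents filed
# under the same URI are linked, which is the connection Pelagios surfaces.
by_place = defaultdict(set)
for anno in annotations:
    by_place[anno["body"]].add(anno["target"])

connected = by_place["https://pleiades.stoa.org/places/579885"]
```

Here a text and a coin database record are joined with no shared schema at all, only a shared identifier.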
That was the proof of concept. Then, while Pelagios was still a project, we got some additional money from the Andrew W. Mellon Foundation in the States to see if we could scale that up. What would be the challenges if we moved beyond my comfort zone, the ancient world rather narrowly defined as the Greco-Roman world, where we are well supplied with digital resources such as a classical library and archaeological databases and, crucially, a gazetteer dedicated to ancient world places? What would happen if you moved beyond that world into other knowledge traditions, the mediaeval world, for example, where you've got early Christian, maritime, Islamic and Chinese documents? What would happen, and what would be the challenges? Next slide, Rainer. So that was the funding we got from the Andrew W. Mellon Foundation, and one of the critical things we felt we needed to do was to start developing tools so that anybody could produce linked data annotations for themselves and publish their documents in a linked data format, so that their documents could be findable, accessible, interoperable and reusable: the FAIR principles, which I'm sure you are all aware of. So here's one of the tools we developed, the crucial one, which enables anybody, including myself (and I'm not a technical specialist), to do linked data for themselves. This is a text I'm familiar with, and here is a character string that I'm asserting is a place. Once I've made that assertion, once I've highlighted the character string and said 'this is a place', a gazetteer pop-up comes up and I make the alignment: I match this place in the document to the authority file in Pleiades, or in some other gazetteer that has uniform resource identifiers. By doing that, I'm producing linked data. If you press a button, a pop-up appears, and this is the scary bit that I don't need to know about; this is, if you like, what's happening under the hood, and I just want to drive the car. The pointy brackets give you, essentially, the linked data format you're producing, so you can then publish your information in a way that other people can find and make use of. Rainer, next slide. So that was the production end of things, producing linked data. We also wanted to produce a showcase, so that people can get a sense of: well, why bother? What is this linked data stuff? What can I do that I couldn't do before?
So there's this other tool we developed, called Peripleo, and at this point I should say that my role on the project was largely coming up with the names. Pelagios is actually ancient Greek for 'of the sea', playing with the idea of the web as a sea of data: the internet is a sea of data, and Pelagios is essentially that sea of data, connecting things up just as the sea did in the ancient world. And Peripleo means 'voyaging around', so you voyage around that sea of data. This is a way of starting to explore linked datasets, and you do that by virtue of the fact that they reference the same places. Here we have a focus on Athens, and you can then start to explore the ancient world resources, or resources from other knowledge traditions, that have some information to do with that place, all on a handy map visualisation. We're going to come back to both of those tools in a minute, once I've stopped talking. Rainer, next slide please.
And fundamentally, my role in the project, apart from coming up with these ancient Greek names that still have resonance today, was helping to develop a community. That's really one of the key points I want to leave you with: linked data isn't just about linking documents. It's not just a technical thing; it's also a social thing. That's why we talk about Pelagios essentially as a social network, as a community. This was still part of the Andrew W. Mellon Foundation phase, this move towards community, and here you have the different communities that were envisioned as part of the project: an ancient Greek special interest group (that's what SIG stands for), a gazetteer special interest group, et cetera. Rainer, last slide for me.
And this has now become formalised as an open association. So Pelagios started as a project: it was funded by Jisc, it was funded by the AHRC, the Arts and Humanities Research Council, and it was funded by the Andrew W. Mellon Foundation up until 2019. Since then, we've existed independently of funding as an open association of equal and interdependent partners. Anybody can join us: individual researchers, research groups, projects, institutions. We are agnostic. You join us and become part of this conversation about how to develop a socio-technical infrastructure for linking data online, for making it really easy to discover connections between things on the web, particularly for historical information: historical places, for example, or historical time periods, or historical people. Crucial among all that has been the move towards trying to sustain digital tools, so that they're not just developed within an individual project, with the problem that when the project finishes and the money dries up, the tools simply degrade and become useless to people; or the problem of endlessly reinventing things, because to get project money you need to be able to sell the idea that you're developing something new. The Pelagios network community has started to move towards a social infrastructure through which we can sustain tools, this idea of distributed resilience: we can sustain tools through the partners who are part of the Pelagios network. That's what I want to leave you with, and I'll now hand over to Rainer to give you one example of one of the tools that we're helping to sustain.
Rainer Simon 14:27
Thank you, Elton. As Elton said, I think the social aspect is certainly the most important part of Pelagios. But of course we are dealing with this technological underpinning, the pointy brackets, as Elton put it, and to make this actually usable by the community we need to build tools. As Elton said, that's the challenge, because tools require upkeep and maintenance, and you need to develop them further. We have built various tools in Pelagios, but what I want to talk about specifically today is one tool which I think was particularly successful and has had a long life. We actually started in 2014 with an internal prototype that was used by the team in the Mellon-funded round of the work, and then in 2016 we made our first public release on the web: you could go there, create an account, and annotate your own documents with those social security numbers. Again, it was Elton's job to give the tool a name. The tool is called Recogito, which is about rethinking and reconsidering: you go through a text, you read it, you make your assumptions about it, and you make annotations. And Recogito has been successful; I would say that just because it's still around after such a long time, that's already a success. All the software we built is open source, which means it's essentially community-based at the core: people can come in, use it for free, set it up for themselves, hack it and modify it. In practice that takes a little bit of effort, so it's more a question of getting involved and talking to us, the main maintainers of the system. We are the owners of the system only in the sense that we happened to fall into the role of those who developed it and are now curating it; everybody really is welcome to contribute.
The origins of Recogito are on this slide. So we started in 2014 and had funding from the Mellon Foundation, very generous funding I should say, until 2019. After that the funding was over, but we still managed to keep the tool running, again through the community: for example, the research institute where I was working volunteered to host a public instance of the tool for free for the community. These are the kinds of things that keep a tool alive.
The tool is focused on semantic geo-annotation, which is just another word for what Elton has just explained, and this has been its main use case. As I said, it's been successful. What you can see here in the background is that the original tool's website would show you some statistics on current activity. We ended up with 15 million annotations, which is a little bit of a cheat, because that number also includes automatic annotations that weren't created by users themselves. But I think there were at least two million annotations that people actually created manually, so there was some real engagement and some real use, for research and for teaching. At any given point we had around one to two thousand edits every 24 hours, and people were using it in bursts, usually when there was a class activity or a particularly engaged researcher working on it. So there was always some constant activity.
We also got feedback about the tool being easy to use, which, as the person who designed the user interface, is maybe what made me happiest. I think we managed to create an environment that was fairly approachable, especially for new users. And this is my favourite image here: it's from a crowdsourcing community workshop in Mexico, where Recogito was used in a public setting, with people of all ages annotating historical maps from the regions of Central America.
So, as I said, in 2019 the money ran out. The tool kept going, but it got old, shaky and a little bit brittle. Again, the community eventually picked it up. Just two years ago we started a collaboration with the University of Bonn in Germany, who have been a great supporter. They essentially said: we like Recogito, but it's old, we want some extra features, and we're willing to contribute. So they funded a new round of Recogito development, which is a complete redesign. It's a completely new tool, which was necessary after all that time of idleness; after having the tool lie around for six years, you really need to start from scratch. It also gave us a chance, thanks to the generosity of the University of Bonn, to rethink the whole architecture: to keep the features we were happy with, but also to improve the things we weren't so happy with.
Things that we kept include, for example, support for the same document formats. You can still upload plain text, and you can upload TEI, a document format for structured text that is popular in the digital humanities. TEI lets you encode semantics: whether a document is a letter or a play, and if it's a play, the different roles and who's saying what; if it's a letter, the different parts of the letter. So it gives you more semantics, and it's specifically geared towards the digital humanities, the kind of audience we're working with in Recogito Studio. But the new tool also adds a range of new features. Most importantly, and most excitingly I think, it now supports real-time collaboration. It looks a bit like Google Docs: if you're in a document you can share it with other people, and when they come in you see their cursors, you can see how they select annotations, they have different colours, all the kinds of playful things you know from Google Docs.
Another exciting feature, I think, is PDF support. Maybe that's not so prominent a need in the digital humanities, but I think it opens up a wider use case for education more generally, because in many cases, no matter what discipline you're working in, you might want to discuss research papers with your students, for example, and those will almost always be PDFs. So it's really useful to have support for uploading a PDF and doing collaborative annotation in Recogito Studio as well.
Maybe I'll briefly mention the plugin framework. That's a bit of a technical term, but it really means there is a core Recogito Studio platform which doesn't do much; maybe you can just take notes. It doesn't even include the geo-tagging that was shown earlier. But there is a mechanism for extending it, so communities that have special needs, like geo-tagging, can write an extension for Recogito Studio. The geo-tagging extension was, of course, the first extension we built for the new Recogito Studio, to emulate the old system. But that's not the end: there might be new extensions coming, for art history, for example, where people want to use different kinds of tagging systems. These are all things that can be added to the new Recogito Studio platform.
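Recogito Studio's real plugin API isn't covered in the talk, so the following is a purely hypothetical Python sketch of the general pattern being described: a minimal core that knows nothing about geo-tagging, extended by a registered plugin. All names here are invented.

```python
# A generic extension registry: the "core" only creates bare annotations;
# extensions register themselves and enrich the annotation afterwards.
EXTENSIONS = {}

def register(name):
    """Decorator that files an extension under a name."""
    def wrap(fn):
        EXTENSIONS[name] = fn
        return fn
    return wrap

@register("geotag")
def geotag(annotation):
    """Hypothetical geo-tagging extension: attach a gazetteer URI."""
    annotation["body"] = "https://pleiades.stoa.org/places/579885"
    return annotation

def annotate(text_span, extension):
    """Core behaviour plus whichever extension the community installed."""
    anno = {"target": text_span}
    return EXTENSIONS[extension](anno)

result = annotate("Athens", "geotag")
```

An art-history community could register a different extension under another name without the core changing at all, which is the design benefit being described.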
And without further ado, I just want to show you a few impressions of the user interface. It looks a little bit cleaner, I believe, but other than that it looks pretty similar to the old system. You still have this idea of being able to select text to get a pop-up that allows you to add comments and tags. Here's an example of the PDF interface: you can see an ordinary display of a research paper, and you have the same abilities to mark and select text. What you can also see here is that we now have a sidebar which lists the annotations, either chronologically or in the order of the text. That's a different view, which is more useful especially if you want to look into a text that's already heavily annotated by other people; it gives you a better overview.
And this is the image annotation interface; you haven't seen that yet. The old Recogito had a similar one, but it wasn't as advanced. This gives you the same features: you can create shapes and add notes to them. It supports multi-page documents, as you can see here, so there's a full manuscript with multiple pages and you can work with that.
And, just briefly, the new tool is also a lot better than the old Recogito Studio when it comes to managing classroom work; it's really focused on the educational use case, I would say. The way it works is that you set up a project: you upload your content and invite users, and then within the project you can give out assignments. You can say, OK, here's one document, and I want these three students to work on it, and these three students should work on a different document. As a teacher, you then have a way to jump into each assignment, take a look at what students are doing, help them out, and give comments to specific students. Every assignment is essentially a layer on the document, and you can either look at one layer in isolation or look at several layers in combination; one group of students can look at another group's work, if they opt into that. So it's a flexible system for dealing with different layers of annotation on the same document.
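The project, assignment and layer structure described here could be modelled along the following lines. This is a speculative sketch, not Recogito Studio's actual schema; all names and the sample data are invented.

```python
from dataclasses import dataclass, field

@dataclass
class Assignment:
    """One assignment = one annotation layer on a document."""
    document: str
    students: list
    annotations: list = field(default_factory=list)

@dataclass
class Project:
    documents: list
    assignments: list = field(default_factory=list)

    def merged_layers(self, document):
        """View several layers in combination: collect the annotations
        from every assignment attached to the given document."""
        return [a for asg in self.assignments if asg.document == document
                for a in asg.annotations]

project = Project(documents=["letter.tei"])
group1 = Assignment("letter.tei", ["ana", "ben"], annotations=["place: Athens"])
group2 = Assignment("letter.tei", ["cai"], annotations=["person: Homer"])
project.assignments += [group1, group2]
```

Viewing a single layer in isolation is then just reading one `Assignment.annotations` list, while `merged_layers` corresponds to the combined view a teacher might use.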
That almost brings me to the end of my part. It's open source software, so if you are interested in trying it out, please do get in touch. There is a demo instance that people can try out, and there is also a Pelagios instance: if you are a Pelagios network member, you have access to a dedicated Recogito Studio instance where you can upload your own content and run your own projects. And I think with that, I'll hand over to Sarah, who will talk about a specific use case.
Sarah.Middle1 25:02
Thanks very much, Rainer. I'm Sarah Middle, and I'm going to be talking about an Open Societal Challenges project that I worked on, which involved applying Recogito Studio and another Pelagios tool called Peripleo to cultural heritage data. Next slide please, Rainer.
For those of you who aren't familiar with it, Open Societal Challenges, or OSC, is an Open University initiative that facilitates collaboration with organisations outside the university to tackle real-world challenges. In our OSC project we evaluated Recogito Studio for annotating a museum's collections data, with the aim of drawing out narratives and visualising object itineraries, and also of facilitating this process for museum professionals who have limited technical experience. Next slide please, Rainer.
And our focus for the digital storytelling around these objects was this idea of the object itinerary. An object itinerary consists of a sequence of events including creation, use, alteration, movement and acquisition, and might ultimately lead to its decay or destruction. We might also look back to the formation of materials from which the object was constructed, or look forward to future receptions or remediations.
Each of these different events in the object's life occurs in a given place at a given time, and it often involves the actions of a person or organisation.
And as such, these events and relationships should be amenable to representation using the linked open data technologies that Elton talked about earlier. Next slide please, Rainer.
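As a concrete illustration of that idea, here is a minimal sketch of one itinerary event expressed as subject-predicate-object triples. The URIs and property names are invented for the example; they are not the actual Linked Art or CIDOC CRM terms used in the project.

```python
# Illustrative sketch only: one step of an object itinerary modelled as
# subject-predicate-object triples, in the spirit of linked open data.
# All identifiers and property names below are made up for the example.

triples = [
    ("object:sextant-42", "took_part_in", "event:acquisition-1"),
    ("event:acquisition-1", "type", "Acquisition"),
    ("event:acquisition-1", "took_place_at", "place:edinburgh"),
    ("event:acquisition-1", "timespan", "1878"),
    ("event:acquisition-1", "carried_out_by", "actor:nms"),
]

def places_on_itinerary(obj, triples):
    """Follow object -> events -> places to recover the itinerary stops."""
    events = {o for s, p, o in triples if s == obj and p == "took_part_in"}
    return sorted(o for s, p, o in triples
                  if s in events and p == "took_place_at")

print(places_on_itinerary("object:sextant-42", triples))
```

Because every event carries its own place, time and actor, queries like this can walk the whole itinerary in either direction.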
And the data set we used was a sample of collections data from National Museums Scotland. This comprised metadata about nearly 400 navigational instruments of different types, predominantly from the 19th and early 20th centuries. They include sextants, quadrants, compasses and astrolabes, many of which played a crucial role in exploration and survey expeditions. Some have links to imperial expansion and colonialism, and therefore have many stories to tell.
My work focused on a subset of these objects, although the methods we used should be more broadly applicable to the wider data set.
While the collections management system used by NMS facilitates the representation of much of the fundamental information about these objects as structured data, a lot of these richer stories are contained in unstructured fields containing, for example, research notes, descriptions, and previous exhibition labels. So it's these unstructured fields that I focused on for the annotation work. Next slide please, Rainer.
To annotate the NMS data I used a data model based on the one produced by the Linked Art initiative for representing cultural heritage data. So I annotated the text by adding tags based on entities and relationships from that data model, as well as some bespoke tags for concepts that were more specific to our data set, such as voyage and war. In doing so, I found Recogito Studio flexible, configurable and easy to use, and I really liked the collaborative features as well as the different ways of viewing your annotations.
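To make the tagging idea concrete, here is a simplified, hypothetical annotation record: a text span carrying one tag from the data model and one bespoke tag. It mirrors the general shape of stand-off annotation, not Recogito Studio's actual export schema.

```python
# A simplified, invented annotation record: a text span tagged with an
# entity type from the data model plus a bespoke concept tag. The shape
# is illustrative only, not Recogito Studio's real export format.
annotation = {
    "target": {"quote": "used on the 1872 Challenger expedition",
               "start": 118, "end": 156},
    "bodies": [
        {"purpose": "tagging", "value": "Event"},    # from the data model
        {"purpose": "tagging", "value": "Voyage"},   # bespoke tag
    ],
}

# Collect just the tag values attached to this span.
tags = [b["value"] for b in annotation["bodies"] if b["purpose"] == "tagging"]
print(tags)
```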
One issue that we found is that Recogito Studio currently lacks the named entity recognition that was available in its predecessor, which would make the annotation process a lot more scalable. At the moment all the annotation needs to be manual, rather than automatically recognising, for example, the names of places that are in the text.
I also missed the functionality from the old Recogito of creating relationships between annotations. This might have allowed better alignment with the properties in the data model as well as the classes.
It would also be great to be able to link entities to authority files: for example, linking people in the text via VIAF, the Virtual International Authority File, or object types to the AAT, the Getty's Art and Architecture Thesaurus. But Recogito Studio's modular structure that Rainer mentioned does enable different developers to contribute plug-ins, which should eventually extend its functionality beyond that of its predecessor.
Next slide please, Rainer.
So as well as evaluating Recogito Studio ourselves, our colleagues Maria and Sarah, who are based in IET, conducted a small-scale user evaluation with researchers and cultural heritage professionals. Findings were generally positive, in that participants found Recogito Studio easy to use overall, and they found the geotagger functionality that we developed in this project particularly helpful.
There were, however, several minor comments about aspects such as navigation and iconography that we fed back to Rainer and the other developers, and like us, participants also suggested that automatic tagging would be a really useful feature to incorporate in future.
Next slide please, Rainer.
And so to follow on from the annotation component, we wanted to produce a more interactive visualisation to illustrate the itineraries of the objects whose data we'd annotated. For that we chose to use another openly available Pelagios tool, Peripleo, which Elton mentioned earlier.
Peripleo comprises a web-based spatial visualisation that facilitates discovery of digitised objects from multiple sources based on their annotated places.
Following the British Library's Locating a National Collection project, Peripleo is now openly available as a GitHub repository that can be cloned and customised for the data set of the user's choosing and then deployed on GitHub Pages. Alongside this repository they've developed an easy-to-follow tutorial, which is great for interested users with limited coding experience, like me.
To work properly, Peripleo requires JSON-LD data that is compliant with the Linked Places format, which also started life in Pelagios. As the geotagger data from Recogito Studio can only be exported as GeoJSON, this meant that I had to make some manual transformations to the data to ensure that it would be compliant. Next slide please.
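The kind of transformation involved can be sketched roughly like this: wrapping plain GeoJSON features into Linked Places-style features with an `@id` and a title. The field names follow the Linked Places format only in outline, the context URL and base URI are placeholders, and a real pipeline would also carry over names, types and "when" timespans.

```python
import json

# Toy GeoJSON export: one geotagged place with an invented name.
geojson = {"type": "FeatureCollection", "features": [
    {"type": "Feature",
     "properties": {"name": "Leith"},
     "geometry": {"type": "Point", "coordinates": [-3.17, 55.98]}},
]}

def to_linked_places(fc, base="https://example.org/place/"):
    """Wrap GeoJSON features in a Linked Places-style FeatureCollection."""
    out = {
        "type": "FeatureCollection",
        "@context": "https://example.org/linked-places-context.jsonld",  # placeholder URL
        "features": [],
    }
    for i, f in enumerate(fc["features"]):
        out["features"].append({
            "@id": f"{base}{i}",  # Linked Places expects each place to have a URI
            "type": "Feature",
            "properties": {"title": f["properties"]["name"]},
            "geometry": f["geometry"],
        })
    return out

lp = to_linked_places(geojson)
print(json.dumps(lp["features"][0]["properties"]))
```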
So while these transformations do result in a map visualisation in Peripleo, the exported geotag data from Recogito Studio lacked the contextual information about annotations and objects, so the initial visualisation ended up just being a collection of points on a map. Next slide please, Rainer.
To make it a bit more interactive, I manually updated the data file to include links to the relevant objects in the NMS online catalogue, as well as giving the context of the geotagged text. And to emphasise the connection between the map locations and the museum objects, I added an object-identifier property to the data set and configured the visualisation to display these identifiers as a facet when the filter icon is clicked, so that you can, for example, just view the locations that are attached to a particular object. Next slide please.
And so my Peripleo visualisation was obviously based on a really small data set of only nine objects, but the potential of Peripleo becomes much more apparent when you can see how it's been used to visualise larger data sets.
And I'm going to give some examples here. This first one was produced by the Locating a National Collection project, in a particular initiative called Heritage for All. The project team collected location information for items of historical interest relating to a local area, including buildings, monuments, find spots of coins and literary references from famous novels, and they visualised these as points on a map of the local area using Peripleo, with the aim of engaging the general public in their local history.
These particular maps show the distribution of these places of interest in both East London and Exeter, and it's quite interesting seeing the different distribution patterns in each area. It also demonstrates how openly available tools like Peripleo, even if they started life as research tools, might be used for public engagement, as well as for applications like local history or tourism. Next slide please, Rainer.
This second example is called Mapping Antiquity, and it was produced by the Fitzwilliam Museum at the University of Cambridge as part of their Linking Islands of Data project.
Catalogue data from their antiquities collection was converted to linked open data, with the find spots of the objects visualised on this map. What we found interesting about this particular map was that there seems to be a particularly large assemblage of objects found at Torksey in Lincolnshire, so this seems to be a particularly well documented place in the Fitzwilliam Museum collection. This visualisation could therefore be a starting point for further research relating to places like this, where many points converge, or conversely, to identify places that are currently less well represented in the museum's collection. Next slide please, Rainer.
And so if you'd like to find out more about this project, because obviously this has been quite a whistle-stop tour, we've produced various openly available resources. There are three blog posts for the OSC blog that have been published during the past few weeks; the most recent one is hot off the press from Monday.
We've also got an article in the Journal of Open Humanities Data, which provides an enhanced explanation of the data set that we've published under a CC BY licence. This data set comprises the data model, the annotation CSVs and the JSON that we exported from Recogito Studio, as well as the enhanced JSON-LD file that we used to visualise the data in Peripleo. Next slide please.
And then for the next steps in terms of this work, we'd like to improve the pipeline between Recogito Studio and Peripleo. We want to make it easier to take data from Recogito Studio and use it to create a more informative visualisation in Peripleo, without a lot of manual editing or coding knowledge. We've made some suggestions to Rainer and the other developers about what that might look like from a user perspective.
We'd also like to look at developing workflows for how the annotations, along with the more structured components of the collections data, might be converted to RDF using existing freely available tools, so that it's fully available in that linked data format. We've also been looking at named entity recognition: automatically recognising particular places, people or object types, for example, in a data set and annotating them with the corresponding type from a particular vocabulary.
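As a toy stand-in for real named entity recognition (which would use a trained model), a naive gazetteer lookup shows the basic idea of matching known terms and assigning each a type; the gazetteer and sentence here are invented.

```python
import re

# A naive gazetteer lookup, sketched as a stand-in for real NER.
# The gazetteer entries and their types are invented for the example.
GAZETTEER = {"Edinburgh": "place", "Torksey": "place",
             "sextant": "object_type", "astrolabe": "object_type"}

def tag_entities(text):
    """Return (surface form, type, character offset) for each gazetteer hit."""
    hits = []
    for term, etype in GAZETTEER.items():
        for m in re.finditer(re.escape(term), text):
            hits.append((term, etype, m.start()))
    return sorted(hits, key=lambda h: h[2])  # report hits in reading order

print(tag_entities("A sextant acquired in Edinburgh."))
```

Real NER would generalise beyond a fixed word list, which is exactly why it matters for scaling the annotation work.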
And so hopefully, with the new modular structure of Recogito Studio, somebody will be able to secure funding for this type of work to take place in future. And indeed, these conversations are already starting to happen on other projects.
Additionally, having communicated the feedback from our user evaluation and from our own experiences to the developers, we're hoping that future funding will allow these issues to be resolved and for future user evaluation to take place.
But the next project that follows on from here, which Elton is also leading, is a project with the Campus and Community Collective, who are a community group interested in digitally curating their local heritage. They're going to be using Recogito Studio with an extension of the data model developed for our OSC project, alongside another tool called Memory Mapper, which was developed at University College London. We're looking forward to seeing the results of that project. Next slide please, Rainer.
OK. So thank you very much for listening. We're happy to answer any questions that you might have.
Janice Ansine, Senior Project Manager, Citizen Science, and Mike Dodds, iSpot Curator.
iSpot is an award-winning OU citizen science platform for biodiversity. This webinar shares highlights from iSpot's rich biological recording data set, online community engagement and activity posting observations and species identifications. Examples of teaching are also shared, as well as contributions to research and society, in the UK and globally.
Janice.Ansine 0:04
Hello everybody, apologies for the initial hiccup there; you know what Teams is like. As Neil said, I'm Janice Ansine, Senior Project Manager, Citizen Science, and I'm happy to talk to you today about iSpot. I'll be doing this with my colleague Mike Dodd. We're going to go through the slides and then we can have some discussions.
So, iSpot is a citizen science platform that was developed and led by The Open University some years ago. But one of the things I wanted to start with is a little bit of discussion about what citizen science is. Citizen science has been going on for a number of years; it's not new. They even say Charles Darwin was one of the early citizen scientists, but it became more popularised with more activity relating to research, and to how we can get volunteers and anybody else involved in research linked with professional scientists. The definition actually went into the Oxford English Dictionary around 2014, and it has evolved into a massive area of practice that we've been involved with here at the OU. Now, the interesting thing for this session is that we're looking at it in the context of open research.
And open research offers a wide space for citizen science.
And the synergy here is this: while open research, through open science, is about how science is done and shared, citizen science is about how people can participate and how they can do science. That's what makes this synergy so exciting. We're going to talk about how iSpot displays this in different ways.
So some of the connections I want to establish here are around how citizen science links with open research, teaching, learning and engagement. You all know what the OU is and what the OU does: we focus on global reach in higher education, and we're known for that. One of the things that has happened over the years is that citizen science has been integrated through what we do in terms of emerging technologies and so on, particularly in STEM, where we are based.
In terms of the research, the area is wide-ranging. We do biological recording in our context, methods for collecting and analysing data, and technologies for citizen observatories, so building in new things like AI, for example, and citizen inquiry, which is research around how people get involved in what they do. Importantly, in terms of our core business, there is what it contributes to teaching and learning, and iSpot is quite bespoke to that; we'll talk a little bit more about it in a bit. The other link we have with citizen science at the OU is how it connects with practical science: for example, citizen science was integrated quite early on in the OpenScience Laboratory when that was being developed, and there's what citizen science can do for fieldwork, in terms of getting people outdoors and helping them to build their skill sets. On the other side of things, there's engagement and outreach, and citizen science is an amazing tool to facilitate this, in terms of getting people involved in different ways and giving them a hands-on approach, or in terms of outreach, reaching them through the different collaborations we do with the BBC, as well as through regular media and comms.
So, a little bit of a snapshot of the early days at the OU, before the use of the term citizen science, so to speak. The OU involved students in this type of research, bringing in that component of students going out there collecting data and this feeding into research. This is a snapshot of something that was done in the 1970s to get students to collect data on pollution, around sulphur dioxide, and it was actually linked with an early module. So where are we now? That was the 1970s. Where are we now?
We have iSpot. iSpot was developed, the start of it was around 2007, when we got this fantastic pot of funding from the Big Lottery Fund under the wider OPAL initiative, which actually ran from 2007 to 2019, and which was basically a lot of money, around £12 million, to get people engaged in exploring nature. So that was fantastic. And one of the key things we had at the OU was to create iSpot. The thing about iSpot is it was basically an online space where we could have people sharing nature. We wanted a little bit more: we knew that people were interested in nature, we knew that from our experience of the BBC programmes and so on. So iSpot was developed to sort of move people off the sofa and get them outdoors. So what were the aims?
To lower barriers to identification, to build identification skills, to make nature accessible, and to support a new generation of naturalists, because of course people in this area really want to get young people out there doing more and contributing to biological data recording. We wanted something that was free and easy to use with integrated features and tools, to build an online community: not just experts but also novices, those who just want to learn something or want to do more.
And the key thing was we wanted a technology where people could upload an observation, a photo, and have the community gather together to identify it. So it was achieved: we started development when we got the grant in 2007 and we launched to the public in June 2009, and there's a snapshot. Some of the successes that came out of this: for example, one of the early discoveries was by a little six-year-old girl who found a moth on her windowsill. Her dad knew about iSpot, posted it on iSpot, and it was a rare moth. It hit the press; it was amazing. We couldn't have had better press than that: the discovery happened in October, we launched in June, and iSpot really took off.
A year or so after that, we won the Wildscreen Panda Award, which was amazing to win. And as we developed the platform further and sought further support, we received lovely words from people like Sir David Attenborough about what the power of iSpot can do. So, a little bit more about what the platform is.
The key thing about iSpot is how it's evolved, because it has evolved over the years, and we're going to talk a little bit more about some aspects of what we do with data in a moment, which Mike will cover.
First, I just want to give you a snapshot, a whistle-stop tour, of how the platform works. The key thing about iSpot is we have these four synergies of what you can do with iSpot, to help guide people on their journey through using the platform. So you can explore, and you can browse the thousands and thousands of species spotted so far in the observations; you can see the carousels there with the different observations. It's split at the moment into a couple of different communities: you have global, you have UK and Ireland, and you have Chile and southern Africa, because of some collaborations we had early on. The UK and Ireland community is quite active, and the global space is quite active as well. The key thing is you can record: you can submit observations, and you can try to identify them or work with the community to identify them. Another key thing is identification, and for that we have a number of different tools and resources that can help. We have species browsers, and we integrate actual species listings from the various repositories globally and in the UK.
A key thing about this is how you build your reputation as you identify, and this is quite important because it's quite unique to iSpot, in terms of how you build your reputation in these different areas. Your reputation filters into the different groups; you can see the little icons there for the different groups. Then you post an observation and it appears within the platform. You have to register to post: anyone can browse and explore, but you have to register to actually post an observation and engage with the community. Other tools include the forums, where you can talk with others and share as well. Another key thing for us is the learning that happens, and we have tools integrated to facilitate that: you can collaborate and participate using projects, or you can participate using the iSpot quizzes as well. So this is just some of what you can do with iSpot.
I'll hand over to Mike.
Michael.Dodd 9:00
OK, so I'm going to look at iSpot as a social network. Here on the right-hand side we've got a map, so what I'm going to talk about will be illustrated by maps quite a lot.
On the map, the dots are observations, and the lines connect those observations to a particular person, in this case myself. It shows how I have interacted with particular observations which were made by other people, so they're not my observations. If there's a yellow dot, then I've made comments on those observations; if there's a blue dot, then I've agreed; and if there's a red dot, then I have made the identification myself. So it's showing how I'm interacting with large numbers of people all over the UK, and some of those lines are going into other parts of the world, so there's lots of interaction beyond just the UK.
If you move on to the next one.
So you can extend that. Here I've shown all of the observations and all of the people who have interacted with those observations, but only the identifications; the previous picture was for comments and agreements as well, but these are just identifications. The blue dots are the observations, and the red lines link those observations to the people who have given the likely identification for them. That's all of the observations, and you can see it's a global system, with large numbers in southern Africa and in northern Europe. OK, if we move on to the next one. This is the same data, but I've split it down
by certain groups of organisms. Here we're just looking at fungi and birds, and I'm focusing in more on the UK.
If you look at the map on the left, this is the fungi, and you can see there are a limited number of nodes there. The nodes are people: these are the people who are providing the identifications, and they're providing identifications mainly within the British Isles; there are not many lines going out of the British Isles.
So that's a fairly tightly knit network of people, and there are not very many people providing large numbers of identifications. If you compare that with the birds on the right-hand side, this looks a bit more messy. There are probably more people providing identifications, and the identifications are not just within the British Isles; they are spread out around the globe, so those lines disappear off into other places. That's indicating both that people from other places are identifying birds in Britain, and that people in Britain are identifying birds in other parts of the world.
People seem to think that they can identify birds more globally: there are not that many bird species on the whole planet, and some of the bird watchers have a pretty good idea of what many of them are.
With the fungi, those experts tend to stick to their own country, or are a bit frightened to identify things outside their own area, because there are hundreds of times more fungi than birds, and once you go outside your local area you might be finding something completely different.
Yeah. If you move on to the next one.
So, if we're looking at ways to ensure accurate identifications, one of the things is to look at the agreements and who has agreed. Janice mentioned earlier that the likely identification is arrived at by a reputation system, but we also look at agreements.
So here these are the observations in the British Isles, and for each observation I've shown how many people agree with it. You can see that some don't have many agreements, or even none, while others have loads of agreements. So it's quite variable, and this often depends on which group of organisms it is. If you go on to the next one, we'll look at that in more detail.
So here we are: this is the OU campus, if you can recognise it, just a piece of it. I haven't got a scale on here, but I can tell you that the darker the colour, the more reputation those observations have. Each cross is an observation, so a darker colour means more reputation. All of these are past the threshold to be "likely", but even within that, the darker colours are more likely. If you go on to the next one.
I've then chopped the data again, again this is the OU campus, by those different groups of organisms: the amphibians, birds, fish or whatever. So we can see across the OU campus we've got a whole range of different things that have been identified, plants and birds and a whole variety of other things. I'm now going to bring all that information together. If you go on to the next one, Janice. So here we have.
Two graphs. On the left, this is global: I've pulled together all that data and looked at the average number of agreements for each observation, divided by the different groups of organisms. For birds we generally have lots of agreements, whereas for fish or fungi we have rather few agreements. To me this suggests that there are not many people who know about fungi, and that they're difficult to identify compared with birds, certainly.
The reputation graph is a little bit different: that has plants as being very high, and birds not quite so high. It's reputation that the system actually uses to give the likely identification, and it's saying approximately the same thing.
But here I'm slightly worried that there are a couple of experts who are incredibly active: one person in invertebrates and one or two people in plants. They were giving a huge number of identifications. I mean, they knew what they were doing, but that made the average reputation higher. So it's just showing slightly different aspects of the same kind of thing. We can move on to the next one.
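The general idea of a reputation-weighted vote can be sketched like this. To be clear, this is a toy model, not iSpot's actual algorithm: each proposed identification scores the reputation of its proposer plus that of everyone who agreed, and the top scorer above a threshold becomes the "likely" identification. All the names and numbers are invented.

```python
# Toy reputation-weighted vote, NOT iSpot's real algorithm.
# Reputation scores and users are invented for the example.
reputation = {"ann": 5, "ben": 2, "cat": 1, "dev": 1}

# Two competing identifications proposed for the same observation.
proposals = {
    "Fly agaric":  {"proposer": "ann", "agreed": ["cat"]},
    "Panther cap": {"proposer": "ben", "agreed": ["dev"]},
}

def likely_identification(proposals, reputation, threshold=3):
    """Pick the identification with the highest reputation-weighted support."""
    def score(p):
        return reputation[p["proposer"]] + sum(reputation[a] for a in p["agreed"])
    best = max(proposals, key=lambda name: score(proposals[name]))
    return best if score(proposals[best]) >= threshold else None

print(likely_identification(proposals, reputation))
```

The point of weighting by reputation rather than counting raw agreements is exactly what the two graphs show: a handful of very active experts can dominate the score even where agreements are sparse.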
OK. So we're still looking at the community, and this is looking at other things that the community have been doing.
They don't only make observations, give identifications, agree, comment and use the forum; they also like to help with data cleaning and with the general improvement of the platform. So they do a whole load of things, projects on different things. At the moment, one of the people is doing something on habitats: when you make an observation you've always been able to note down the habitat, but this person is trying to make sure that people know how to describe the habitat, or how to select the habitat the best way they can.
OK. So if we move on to the next one.
So I have plotted the observations broken down by habitat, and I've zoomed in on an area. The background uses a Google aerial image, and at the bottom of the picture is the sea, the darker area. This is a piece of Orkney.
The yellow dots here are the marine habitat, so the user has selected marine and put the location on, and you can see that the yellow dots are in the sea. So they're right. The green dots in the middle are gardens and parks; there is a particular user who lives there, and that's their garden where the green dots are. Behind the green dots there are some black dots, and that's their house, so those are indoors. If you look up towards the right-hand side, the darker area of the image indicates a bit of heathland, and there are some pink dots there, which matches up with heathland. There's also a line of white dots, and I've actually been to this particular location, so I can tell you that the line of white dots is the track down to the house: it's basically grass, a grassy track. And there's a line of brownish dots, which are a small piece of woodland. So it does seem to match up pretty well, but there are occasional dots which don't match up. For example, in the field at the bottom there's a black dot, and that's not indoors, so obviously that dot is a little bit away from where it should be. This gives us a way of checking some of the things easily; we check things in a whole variety of ways, and this is one of them.
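The sort of automated check these maps enable can be sketched as follows: flag records whose tagged habitat contradicts their location. Here a crude ray-casting point-in-polygon test against a made-up rectangular "land" polygon flags a non-marine record that falls in the sea; real coastline data would of course be far more complex.

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test; poly is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Count edge crossings of a horizontal ray from (x, y).
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

# Made-up coastline: a simple square of "land" in toy coordinates.
land = [(0, 0), (10, 0), (10, 10), (0, 10)]

records = [
    {"habitat": "marine", "x": 12, "y": 5},   # in the sea: consistent
    {"habitat": "indoors", "x": 15, "y": 5},  # in the sea: suspicious
]

# Flag any non-marine record that sits outside the land polygon.
suspect = [r for r in records
           if r["habitat"] != "marine" and not point_in_polygon(r["x"], r["y"], land)]
print([r["habitat"] for r in suspect])
```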
So if you move on to the next one.
This is looking at, well, we've zoomed out a bit and gone down to the south of the country; this is the Isle of Wight, and to the top right-hand side of the picture is the New Forest. If we look again we've got the yellow dots, which are coastal in this case, I've changed the scale a bit, and they are, as you can see, around the coast.
The dark green dots are woodland, and you can see in the New Forest area there are lots of dark green dots, which matches up. If you know the New Forest, it is not only woodland; there's a lot of heathland there as well, and if you look at that area there are lots of purpley dots, which matches up with heathland, so it all seems to fit quite nicely. If you look in the area in the blue rectangle, there are a lot of sort of white diamond areas which match up with gardens and parks; that area is Southampton, and down from Southampton is Portsmouth, so the whitish background is urban area. And the urban area is the area which has the gardens and parks, which is what you expect. So this is indicating that the locations are generally fairly accurate, but again we've got a few of those annoying dots in the sea which shouldn't be there: there are a few woodland ones which are sitting in the English Channel, which shouldn't necessarily be there. If we move on to the next one.
OK, so in my view this is the most important slide in the set: this is the use of iSpot open data in research papers.
The data from iSpot goes on to the National Biodiversity Network and from there on to GBIF, which is the Global Biodiversity Information Facility. The top data set there, 140,000 records, has only been on there fairly recently and it's already had 72 citations in scientific papers. The other data sets have been on there a lot longer and they've got a lot more citations.
So far only quite a small chunk of the iSpot data is on GBIF, but already we've got a lot of citations, so a lot of people have used our data in writing scientific papers.
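GBIF exposes a public REST occurrence-search API at api.gbif.org, so data sets like these can be queried programmatically. This sketch only builds the query URL; the datasetKey below is a made-up placeholder, not iSpot's real GBIF dataset key, and no request is actually sent.

```python
from urllib.parse import urlencode

def gbif_occurrence_url(dataset_key, limit=20):
    """Build a GBIF occurrence-search URL filtered to one dataset."""
    params = urlencode({"datasetKey": dataset_key, "limit": limit})
    return f"https://api.gbif.org/v1/occurrence/search?{params}"

# Placeholder key, not iSpot's actual GBIF dataset identifier.
url = gbif_occurrence_url("00000000-0000-0000-0000-000000000000")
print(url)
# A real pipeline would fetch this with urllib.request and read the
# JSON response's "results" list of occurrence records.
```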
I think we'll move on to the next one.
Janice.Ansine 23:07
Yeah, I think that's it for now. Thanks, Mike. So we've explored the beginnings of iSpot, what iSpot does and what you can do with it, and we've looked at some of the data, which Mike went through, and the functionality. Now I'm going to talk a little bit about what we contribute through research projects. This is a very horrible slide, but it demonstrates some of the projects that we have been involved in through our citizen science and artificial intelligence research group.
Michael.Dodd 23:08
No, you take over. Yeah, OK.
Janice.Ansine 23:35
Some of them focus on iSpot or link to iSpot; others link to another platform called Treezilla, which is part of the work that I manage.
But what we're going to talk about here are two projects: Cos4Cloud, which is a European project that we had, and DECIDE. I'm just going to take you through those very quickly, so you can see how iSpot contributes through these types of initiatives.
So Cos4Cloud was a European project that ran for four years, up to February 2022. The main aim of it was to see how we could map citizen science into the European Open Science Cloud through citizen observatories like iSpot. The project involved nine of what we call COs, including iSpot; you can see the names of the ones that were involved, and they covered different areas. So iSpot covered all biodiversity, Pl@ntNet dealt only with plants, and they were covering different regions and areas all across Europe, as well as Colombia: CanAir.io, for example, dealt with air pollution.
The main thing about it was that we were able to create a range of tools and resources, and not just create them: we were able to co-design them with our platform communities, and we got the iSpot community involved in that, and we were also able to integrate them into the platforms, demonstrate and test them. So 13 different services were created, and we were able to integrate two into iSpot. We're quite happy to say that these integrate AI: we still have use of the Pl@ntNet API in iSpot, while the FASTCAT-Cloud one was a test one which has since been removed from the platform, but it was quite successful in its time as well. The key with this is that these are user-centred, they're innovative, and we were able to contribute to this process of developing these tools, enabling citizen science to support things in this way across Europe.
Now, the key thing about this also is that, since then, the OU has observer status in the
European Open Science Cloud Association, and that's quite an important achievement. Cos4Cloud contributed to this, along with another European project with alliances to citizen science, called ESCAPE.
So DECIDE was a slightly different project, focused on the UK and funded through UKRI, and we collaborated with a range of partners that you can see here.
And it was led jointly by the UK Centre for Ecology & Hydrology, and basically it was focused on targeting recording through adaptive sampling. iSpot, again, was one of the platforms used, where people could actually contribute through their recording.
They contributed to the development of the tool, because they were consulted through different processes, and then they were able to actually go out, do actual sampling and test the tool. That was a really amazing way to co-develop this tool with the different communities, including the iSpot community. So what does all of this lead to? What it all connects with is engagement. Engagement is really important in how iSpot operates.
So we develop the tools and the features that we just mentioned
through the community, and particularly our expert community and the biological recording groups and societies that are part of us. They help us not only to make the identifications and the agreements, but to keep the community running, and we engage with them in turn. We also have wider UK face-to-face engagement; this was more under the OPAL initiative, less so now, but we still do a little bit of that.
And we do things within and beyond the university, with schools and so on. But we also do this, quite importantly, within the OU, and I'm going to talk about that shortly. Very quickly, in terms of publications, we have a wide impact in terms of the types of things that we're linked to. We're a partner of the UK State of Nature report, and that's really important: it comes out every couple of years and looks at what is happening with nature across the UK.
iSpot has also been mentioned in various policy documents as a tool to enable that engagement with nature.
And we are also producing different papers and so on, highlighting some of the things that Mike talked about, and other areas as well. So what are we doing in teaching and learning? As I said earlier, iSpot's main aim, of course, was to help people build their skills and learn, and through that we've had lots of data coming out. But one of the key things we have been doing over the years is integrating iSpot into OU modules. This is one example of a module that we've been involved in since 2014.
In S295, Biology of Survival, the students do two activities using iSpot. One is a bioblitz, which is basically getting out there and counting different species at a set time on a particular day; they also do a study of pollination using iSpot. The use of iSpot has been commended in terms of what the students get out of it and the benefit this added element brings to their learning. That's been established, and we're now in
discussions with the replacement for S295, which is currently in its last run. The other thing we managed to do a couple of years ago was to develop a bespoke course on OpenLearn about citizen science and global biodiversity that features the use of iSpot. That has been developed as a badged open course, and it's available for anybody to take.
So what do we know? We know that iSpot helps people develop skills, and one of the things we're trying to look at is whether it really helps people learn. We did a little bit of analysis some years ago which showed that, for just over 400 observers, by the time they got to their 50th observation they were able to identify for themselves. It shows a remarkable thing: through reputation, data and identification, people are learning and developing their skills, and it's been amazing to see that evolve over the years.
So, the impact: what have we been able to do?
As I said, iSpot will be 16 years old this year. It's been a lot of work to keep it going: lots of changes, lots of adaptations, but it's still going, with lots of collaborations, and over the years we've had over 3,000,000 user visits.
The key thing here is the amount of data and information that's been created. We have 1.8 million images, for example, covering over 43,000 species from over 180 countries. We have almost 900,000 observations and millions of identifications. And the key thing is the community: we have over 83,000 registered users, and what they do is add identifications and contribute to what iSpot is able to do.
Over the years, we've been able to engage with over 150,000 people. As I mentioned, we have the schemes and the scientists who support us, and we're used by thousands of other institutions as well as our own OU students, and we feature in different reports and have impact in different ways.
So, almost at the wrap-up now. The thing about iSpot, as we mentioned, is that connection with open research, which also links with teaching, learning and engagement. This is how iSpot has been able to evolve and continue over the years, with some challenges, yes, but also some successes, as you can see from what we've put forward today.
So these are the core areas we want to continue developing and collaborating with others on in the future. For example, on the open research and data side there is the development of new innovation: we're really keen to do more and to look at what more we can do with the data, the data classification, the analysis and so on. We're also keen, on the learning design side, to ask what more we can do to build interactivity as a social network, what new tools and features we can add, and what more we can contribute
to teaching and education in terms of the resources and content we create, and of course to have wider impact. You've seen some examples of what we have done so far, but we do want to do more and continue in that space.
So it's an open call: if anybody out there is interested in working with us, do get in touch. This is the Citizen Science and Artificial Intelligence research group, a group that works across KMi and other parts of the OU, and we have a number of different initiatives and projects that have been funded so far. But we're keen to develop more, so do get in touch and
talk to us, or raise a question now. Thank you.
Professor Christothea Herodotou, Institute of Educational Technology, OU.
This webinar shares ways technology can help scientists open up processes of doing research with the wider public, such as data collection and analysis. It details how a technological solution, nQuire, is helping scientists achieve this objective through example projects.
Christothea.Herodotou [She/Her] 0:03
My talk today is about opening research processes to the public, and, as mentioned already, I'm going to talk
more about the nQuire platform.
A little bit of background before we move into the tool.
More than 25 years ago, there was the ambition of involving volunteers in research,
not as subjects, but mainly as active participants in the research process.
And by this there was the intention of having people decide on what research to do, commission research, interpret findings and disseminate findings. Since then, we've seen various ways the general public can be involved in research, and we've seen those activities under the umbrella of citizen science. So, for those who are not familiar,
a very brief definition of citizen science:
it is research that is undertaken by members of the public, most of the time in collaboration with universities and academics. It can take different forms and have different objectives; it can be simple or more complex, but the main characteristic is that you have ordinary people, people who are not researchers,
involved in the research process.
In terms of how people can take part in citizen science,
we have seen over the years several taxonomies, and I have on my next slide two of them, just to give you a sense of how engaged people can become with research activities. On the left-hand side is a taxonomy with four levels.
The first level is, let's say, a basic engagement with research, and here you may be asking people to, for example, place sensors in their gardens to capture pollution.
It doesn't require any cognitive engagement; it is more about supporting scientists in their activities. In level two, we have citizens as data collectors and data analysts, and actually most citizen science projects at the moment focus on level two. Levels three and four are a bit more demanding in terms of people's engagement with research. In level three, we have people
participating in defining a research problem and also collecting data.
And maybe the most advanced level of involvement is what people call extreme citizen science. We call it citizen inquiry, and I'm going to say more about it in my next slides. Here there is 100% collaboration between scientists and the public, from defining the problem, devising research questions and deciding how to collect data, to collecting and analysing data.
On the right-hand side is another way of organising,
let's say, the different types of citizen science projects, and I would just draw your attention to the co-created type of project: here you can see all of the boxes being black. This means that the public is involved in all the stages of the scientific process, while collaborative and contributory projects involve people in only a few of these steps.
So what I'm going to explain here is the conceptual basis, let's say, of the nQuire platform.
A few years ago we published an edited collection on citizen inquiry. Citizen inquiry is the concept we use to stress the importance of engaging the public actively in the process of research, and by actively we refer to all of the stages of the scientific process, from the definition of a problem, to deciding how to collect the data, to how to analyse, interpret and disseminate it.
We paid special attention here to
shifting the focus from scientists to members of the public, and we saw the public as being active in defining their own research agenda.
So you could see a problem starting with people, based on their needs and requirements, and then them designing and developing the whole research process.
Recently, we revised the concept of citizen inquiry,
noting that the term "citizen" may not be inclusive enough. So in one of the publications we had in 2021, we talk about community-led citizen inquiry. One reason was that the term "citizen" is a bit contested in the US, where there are requirements for someone to be a citizen; we also wanted to put emphasis on collective action and on communities.
And also we wanted to encourage research that can bring solutions to problems that people are facing
within their communities. You may be wondering here, OK, how are we going to achieve participation of communities in citizen science projects? What we have here is three streams of work that we think can actually enable participation of communities in research. The first one is a technological solution: it's nQuire, and I'm going to talk more about it. The second one is a close collaboration with scientists, because we acknowledge that the general public
may not have the skills or knowledge to do original research; therefore they need to work in collaboration with scientists to achieve the best outcomes for science and for themselves. And the third one: we see this collaboration being beneficial not only for the scientists, but also for the communities and the people involved in research. So we try to create
research activities that provide learning to participants and can be seen as a lifelong learning experience.
And we do this with a number of features on nQuire. For example, we ask people to do things, so they learn by doing, and we give them personalised feedback depending on the type of study someone designs. We also open up some of the studies to the public, so you can actually see what other people are saying and learn about other perspectives and opinions. And we visualise data the moment it is collected, so you get feedback right after you take part
in a study.
So the next part of my presentation is all about nQuire. For those who are not familiar, nQuire is a website; you can access it at nquire.org.uk.
It's a space where you can design a study, pilot your study, collect data, visualise your data and share findings, while the study runs or when the study has ended.
The reason we designed nQuire, the motivation, was to help people with no scientific or research background learn how research is done and how science is produced, and we achieve this by allowing them to take part in studies and learn through the process, and also to set up and manage their own studies.
If I zoom out of the focus on citizen science, I would say that the bigger purpose here is to help people start thinking critically and scientifically.
And we know how important critical thinking is in the era we live in, given how AI and generative AI developments are shaping our realities. So we think that engaging people in research could be a way to develop their critical thinking skills.
Now, when you visit nQuire, on the home page you can see three calls to action: build, take part and learn. So you can build your study, take part in studies of others, and learn through the process. And there are details as to how you can design a study,
how you can test and improve it, get feedback and make your design better, how you can launch your study and collect data, and how you can analyse and share your findings with your participants on the platform.
What I want to stress here is that all of the studies that are live on nQuire have gone through ethical approval, either from the OU's ethics body or other ethics bodies affiliated with the organisations that are setting up the studies.
So all of the activities are done and treated strictly in terms of ethical standards. In terms of data security, we follow the Open University's GDPR protocols and also the privacy and electronic communications regulations of the European Commission.
What I have here is a number of screenshots showing the process
you go through in order to set up a study. We have a tool called the authoring tool, and this is the tool that enables people to set up their own studies.
The first stage is called "start", and here you need to put the name of your mission, the question behind your study, and describe the mission briefing. In the mission briefing, we ask each study to describe the benefits to science and, most importantly, the benefits participants will have from taking part in the study.
So they need to be explicit as to what I will learn as a participant if I take part in a study. Then the second step is "build", and here you are given a range of different question types you can use to design your study. We have basic, or let's say traditional, types of questions, like multiple choice or yes/no questions, but you also have more innovative ways of collecting data, I would say.
For example, you can ask people to upload an image as a response to a question, or to collect data using the sensors of their devices, like sound or light data, and upload a graph of the recording to the platform. The latest development here is the Q-sort; it is somewhere in the middle, I don't know if you can see it. This is an activity led by a colleague of mine
who has been working with Parliament for the last two years.
There was a need in Parliament to prioritise the topics they would focus their committees on in the future, so we developed this tool as a means for people to sort and prioritise topics as high priority and low priority. So there is a way of voting, let's say, through the platform and through prioritisation. The next step is confirming consent: there is always a consent form people
need to read and agree to before they start contributing their data,
and the consent form says that you need to be 16 years old or older to take part in nQuire.
Having said that, we have had studies with younger people, and the way these work is that to take part you need to do it with a guardian or a teacher. Also, the consent form can be customised based on the requirements of the ethics body you are getting approvals from.
The last step, and probably the one that falls to the scientists or researchers behind a study on the nQuire platform, is the piloting: you improve your study and you are ready to launch it. Before it's launched, it goes through a process of approvals; I don't have a slide for this. The study comes to us, we go through all of the questions, all of the answers, all of the information shared, and we check them in terms of
appropriate content, language,
and appropriate design of questions. We give feedback to the researchers, and when they have made changes and are ready, we make the mission live on the platform. I say "mission" because we tend to call the studies on nQuire missions.
Another question I often get is what type of missions, or studies, are hosted on nQuire. So I took a screenshot of the Explore page of nQuire here.
You can see that we have studies on literacy, biodiversity, ageing, teaching activities, trees,
antibiotic resistance, even novels; actually, any study can be hosted on nQuire. And recently we created
themes and subjects, so people can filter the existing studies and find the ones they want to look at. So the themes we currently have are community, quality, learning, sustainability and well-being,
and in terms of subjects we have education, environment, literature, medicine, psychology and technology. But we could expand those themes and subjects in the future if we have topics that do not fit under these categories.
One other decision you need to make when you design your study is whether the study, or mission, is going to be confidential or social, and there is a big difference between these two types of missions. Confidential missions are more like the traditional research studies or surveys we do
with humans: all of the answers are confidential, and cannot be read or accessed by anyone beyond the person who designs the study.
I'll give you an example of that in the next slide. Social missions are quite innovative, though, because whatever you post there becomes open to anyone who visits the platform to read, like or comment on. So, in a way, you can see what others are saying about the topic of the study.
Social missions also have some visualisation features. The first is that you can see the data on a map, so you can see where responses are coming from in terms of location. You can also get summary graphs of closed questions: for example, if you have a multiple choice question with five options, you can see
which is the most popular, because you can get a graph for that specific question. Again, I will give you an example of a mission to show what this looks like.
So the first example I've got here is the OU Pollinator Watch. This was designed during COVID by OU academics, and it had nearly 8,000 responses. The idea here was to ask people to take pictures of pollinators in their gardens or local parks.
Through the process, we wanted people to learn more about pollinators and to try to preserve and support them through the actions they take.
And it was quite successful, because it also encouraged people to get out of their homes and engage with nature while being in isolation.
So it was one of the successful missions on nQuire in terms of number of contributions.
Another example is the Garden Watch mission. This happened a few years ago and was the most successful one in terms of number of contributions: we reached more than 200,000 responses. It was a study in collaboration with the British Trust for Ornithology.
And here the idea was to go out into your garden, become a detective, find the mammals, worms and birds that live in your garden, and record them on nQuire.
What was interesting was the findings of the study, because in a way they give ideas of how people could encourage more living species in their backyards. For example, we found that 40% of gardens do not have a log pile,
and if they had one, it would encourage wildlife. So it's an action people could take to support biodiversity, alongside some other things here, like leaving grass to grow or feeding hedgehogs. So you can see here how the findings can actually turn into action and improve biodiversity in this case.
Now, an example of a social mission. This is called the Noise Map mission. The idea here was to capture the level of noise
where you work, or in the classroom or school; and again, the moment you have this kind of data, you could use it to influence policy or take actions to improve the level of noise where you are. This is a social mission, so you can see the contributions on a map, showing where the recordings of noise took place, and on the right-hand side you can see the actual recordings uploaded by
different participants. In addition to this, I mentioned the data visualisation.
So these graphs come from another study on mental health, which was social and open, and here you can see, for example, what young people think about the different interventions for supporting their mental health, which of them are more popular and which are less popular. You can also see people's suggestions of how youth could be supported at school regarding their mental health.
So any type of closed question, I would say, can
be associated with a graph that anyone can access while the study is running on nQuire.
Here I have an overview of the organisations and people who create the studies on nQuire. We have mainly universities and academics interested in designing studies. We have also had some successful collaborations with non-academic organisations like the Mental Health Foundation, The Young Foundation, the BBC and the Royal Meteorological Society.
And recently we got a grant on mental health, and we are working with three non-profit organisations in Northern Ireland to support mental health for young people; as part of the activities, we're going to use nQuire to design some studies with youth. In terms of individuals, so far we have only had two cases: two people, who happened to have PhDs, wanted to design a study, and they actually designed it and made it live on the platform.
We would like to encourage more individuals and more organisations to use nQuire, so we can actually help those who do not have the skills to develop them through the platform.
One of our latest developments is a version of nQuire for students. This is part of a Horizon-funded project that started three years ago.
It is focused on teaching design thinking to students, and for those who are familiar with design thinking,
some phases of it relate to understanding human needs. So we developed a version of nQuire that could be used to teach students how to design a study; we call it nQuire for Students. It has the basic functionality of nQuire plus additional features. For example, it's password protected, and it can be accessed only by teachers and students from one school. So if
students are coming from different schools, they cannot see the data of students and teachers
coming from other schools; they can only see their own school's data. Also, as with the basic nQuire, they can design, manage and pilot studies, and they can collect data from other students in their school.
Teachers can review studies, give feedback and help students improve their designs through nQuire. There's also a classroom management system here that helps teachers create student accounts and manage their classes. We hope that nQuire will develop students' research skills. We are actually analysing data at the moment, and we are trying in particular to see the impact of using nQuire on critical
thinking and the development of research skills.
It can be used independently or as part of a design thinking project.
So the last part of my talk today has to do with
what people say about nQuire after they've used it. In 2022, we ran a study with 150 participants of nQuire; the majority of them had taken part in one mission, one study only. We sent them a survey and asked them to tell us what helps or inhibits participation on nQuire,
and by participation we mean
either creating their own studies or taking part in studies of others.
What we found is that there are diverse motivations for people to take part in studies on nQuire. The main one is to contribute to research and science; then there is the belief that helping with science is important. Some of them had an interest in the topic of the study they took part in; some others wanted to learn more about the topic.
And a few of them said that they would like to experience what it is like to take part in a citizen science project.
Now, when we asked them what they learned from taking part on nQuire, about half of them said "I know a little bit more about the topic of the mission", and 8% said "I know a lot more about it". So one learning benefit, as described by participants, is increased awareness about the topic of the mission.
Another is a desire to learn more about the topic after taking part in a study, and you can see on the right-hand side some actual quotes from the participants.
Some of them said that after taking part in the study they changed some of their everyday habits. One study was about ambient sounds while working,
focusing on the sounds of nature in the forest.
It made that participant feel more productive and calm. And the last one is about taking action to support biodiversity, which I would say is maybe the most difficult learning benefit to achieve:
if originally the objective is to achieve increased knowledge and skills, motivating people to take action is maybe the most difficult aspect of learning. And we had some participants saying that
"it led me to read up on helping pollinators and provide habitat in my garden".
So you can see here that there was an actual impact following participation in this study about pollinators.
And this question asked them whether they would create their own study in the future, and it was quite interesting because, as you can see, 70 or even 80% said it was unlikely they would create their own study.
They explained this by
referencing factors like lack of time to design a study and analyse data;
that they don't have the knowledge and skills to do so; that they need support and don't have it; and that they didn't know this was possible, that it was possible for them to design a study. Maybe the most striking to me was that "creating a project is not for the general public": there was this perception that those who should be creating projects are scientists, not members of the public,
because of the widespread perception that universities and academics are
the only ones that should be designing projects. So there was a question of "why should I?"
So the lessons we learned from nQuire, following those studies, are, first, that there are free tools available that can support individuals and communities to design their own personally relevant studies.
There are learning benefits from taking part in studies, but the important factor here is to design the study in a way that promotes learning.
So you need to think what is the objective for science, but also what is the objective for the public to take part in the study.
The third point here is that all of us need to raise awareness of the importance of helping people create their own studies. And of course there are benefits here in terms of improving decision making, addressing challenges in society,
and developing ownership of any solutions we propose. So there are lots of benefits
to working with people to design studies, or to helping them design their own studies.
The last lesson I wanted to stress here is that there is a need for universities to open their doors to communities to facilitate citizen science activities, and the reason is that communities need support to design a study,
test a study or complete a study. So the way of doing this is through close collaboration between communities and universities.
And on this point, we interviewed 14 academics from the OU who are actually doing research with people (the study was published two years ago). Some of them focus on citizen science specifically; some others are just engaging people in research. And we asked them how we could possibly democratise our research practices through citizen science.
And they raised the need for a centre or a facility with five functions that could actually help them work closely with communities and develop networks and collaborations that can empower people and communities to take part in research. They talked here about a learning and networking function, a research function, facilitation, funding allocation and innovation in research. So they would like to see a centre
promoting or enabling these kinds of functions, as a means of enabling collaborations with communities, in the form of the public being involved in all the stages of scientific research.
So this was actually my last slide. I want to thank you for listening and of course I'm more than happy to answer any questions.
Professor Leon Wainwright, Editor in Chief, and Alice Sanger, Managing Editor of The Open Arts Journal.
This webinar details Leon and Alice’s experiences of producing a peer-reviewed, open access journal.
Alice.Sanger 0:03
Good, good morning. Thank you so much for joining us to hear about the Open Arts journal.
Leon and I have been working on the Journal for a good number of years since 2011.
We're both, as Neelam said, from the art history department. My PhD is in art history; I'm an associate lecturer and honorary associate in the department, and managing editor of the journal. As Neelam said, Leon is the editor in chief and, if I may say, the visionary of the journal.
And we both feel very privileged to be invited to participate in Open Research Week. What a brilliant initiative. Many thanks to the research leads, Claire Taylor and Shafter, for nominating us, and to Neelam and the team.
I should say at the beginning that the journal has benefited from the support of FASS and other funders, and Leon will say more about that in due course.
I want to begin by referring to what we feel is the importance of the Open Arts Journal in terms of its role in disseminating new and groundbreaking research in the arts, and what our ethos of openness means. So, first and foremost,
we're committed to providing genuinely Open Access content. Leon has provided the link to the journal for you.
We're open to a very wide range of content. In the arts, we publish contributions from academics, artists, architects, museum professionals and others, and as such we challenge the tendency towards over-specialisation that can compromise the arts and humanities. Some of our issues do focus, very constructively and productively, on a specific time and place.
But we're open to exploring creative and critical questions that stand over and above a particular historical period of interest, and we're seeking ways to press beyond this, engaging more widely and benefiting from approaches other than the historical. Moreover, we're very proud that working on the journal, with colleagues both within and outside the OU, continues to be for us extremely enjoyable and inclusive. Over to you, Leon.
Leon.Wainwright 2:44
Thank you, Alice. Yes, it continues to be constructive and enjoyable too. To give a bit of background on how that's done, I'll mention that our business case is really to focus on the sustainability of Open Access publishing and to maintain Open Access approaches in every way. I'll say a bit more about what that means for us.
When we started out with the journal, we were quite aware of the fact that many of the dominant platforms for academic publishing that are free for their readers operate by drawing fees from their contributors, and that seemed to us an arrangement that frustrated, if not subverted, the opportunities presented by digital distribution of academic scholarship for free and Open Access use of published material, as it were, on demand. So what we've done with the journal is avoid, and continue to avoid, those arrangements and, most importantly, strive to retain academic control and academic ownership of the journal. Now, there's an upside and a downside to that.
The downside, obviously, is the greater intensity of our roles in the journal. Alice and I coordinate a team of journal staff and guest editors; we regularly undertake business planning, assign employment contracts, deal with long-term academic planning, maintain an editorial board, and deal with the legal safeguarding of publishing. Most of those things would probably rest with a publisher, but here they're in our hands.
The upside, though, is that the way we do all of that can reflect our passion for developing research and, in general, for the Open Access publishing agenda. I believe that through the journal we show how Open Access can serve arts and humanities research across academic disciplines but also across stakeholder groups, which is what Alice mentioned, in many cases reaching beyond the higher education sector.
On funding for the journal: I've put some notes on recent funding towards the bottom of the slide. Seed funding came from the Open University all those years ago, 13 or 14 years ago, and then the journal served as a vehicle for generating income to sustain its costs, including the staffing costs of this activity. Funding in the past five years, which I've listed on the slide, has come from a range of sources: some from the FASS faculty of the Open University, from the Design group for issue 9, from the Baron Thyssen Centre in Classical Studies at the university, and also externally from the AHRC, the Arts and Humanities Research Council, for issue 13. Some of the journal articles have also been repurposed,
as mentioned, in curriculum production, so we've seen material serving a dual purpose in our teaching mission at the OU, and we know that certain articles are used for teaching at other universities. We've also seen the material being published in hard copy: through a partnership with Manchester University Press, and with funding from the Leverhulme Trust, the publication of two stand-alone books.
So what this has done is produce material that's high quality and, as Alice said, peer reviewed and obviously freely available. We've listed on this slide the issues that we've covered: obviously a wide thematic scope, and scope for new research, that we've disseminated with the journal.
We find contributors have found this really appealing. I've also really enjoyed being able to stretch our legs with this. There are things we can do with Open Access publishing that you probably couldn't do with conventional publishing, and those have to do in part with our use of images, the generous way in which each of these issues has been illustrated, and the way the images accompany the essays. Alice, I think, wanted to make a brief comment on that before we move to the next slide.
Alice.Sanger 8:31
Yeah.
Yeah. So I just wanted to say that often our contributors are very surprised that we don't set a maximum number of images per essay or per issue; we're open to having as many images as authors need. And there's flexibility, bearing in mind we're working with an A4 PDF, to be able to scale within that. One of our issues has an image gallery, and we publish visual essays as well. On occasions where that's relevant and images aren't simply illustrative, we use them as evidence for developing arguments. The last point is that we do require authors to cover any costs related to images, copyright and so on, but they then have access to the image rights expertise at the OU. Thanks, Leon.
Leon.Wainwright 9:33
Thank you, Alice. We can take up any discussion about images afterwards; it's a big topic, if anyone's interested.
What we'd like to do, just for the rest of the presentation, is focus in on two of the themed issues. First of all, issue #5. This was a themed issue that resulted from a collaboration that we've had with colleagues at Leiden University in the Netherlands and with the Tropenmuseum, the Royal Tropical Institute,
and an arts charity, the Institute of International Visual Arts, or Iniva, in London. What we did was hold events at the Tropenmuseum in Amsterdam and an event in London at Iniva, with artists and curators from all over the English- and Dutch-speaking Caribbean, as well as the global diasporas of those communities. Essentially, we managed to bring together some very good material from artists, curators, policy makers, scholars and so on for this issue, and it resulted in fact in a book with Manchester University Press.
The project behind the issue also gave rise to quite similarly conceived events and collaborations in the Caribbean itself, which was a nice outcome, and it lent support to, I would say, marginalised and excluded individuals in the arts by producing materials that they could go on to use in their studies or in their future careers.
For the second themed issue, I'll hand back to Alice to talk about the latest issue of the journal.
Alice.Sanger 11:43
Yeah. So this one is our current most recent issue from last summer.
'Dwelling on the Everyday: Houses, Ghosts, Ellipses'. It's a collaboration with Helen Hills,
Professor Emerita at the University of York, and it has its origins in two workshops, funded in part by FASS, from July 2022. That's often the case with our issues: they originate in a conference panel, a symposium or a workshop. The core idea for this one was the artist's house, the allure of visiting an artist's or writer's home, and its presentation as a house museum. Helen and I share an interest in shrines and sacred relics in early modernity, and that gave us a way into thinking about the aura of the artist's house: the way that possessions of celebrated people are fetishised and become kinds of secular relics. We wanted to challenge the great genius narratives conventionally portrayed in artists' houses, which collapse house, work and artist into one.
So by dwelling on the everyday, we were able to cast the net quite wide and include studies of other homes: of non-artists, of people who are not famous, of people who are marginalised or invisibilised. We were interested in the politics of erasure and in people overlooked, as well as asking questions about social class, gender, sexuality and race, alongside this theme of the sacred, sacrality, matter and materiality.
So in the end we have ten essays and quite a substantial introduction, and, as I say, very diverse content. That's the model that works for us: a themed issue, tested and proven through a conference. A guest editor submits an application, which goes to the board, and then the process begins. In this case I was one of the guest editors, but the process is then supported by the editors, consultant editors and so on. On that note, we very much welcome proposals for themed issues from prospective guest editors.
Leon.
Leon.Wainwright 14:30
Thank you, Alice. Yes, I think what attracts contributors to the journal is the visibility of the journal and the lengths that we go to to maintain a social media presence and, more particularly, to ensure the online searchability and visibility of the content. As well as that, I think we have tried to set an example for collaborative and accessibly published research and to maintain relationships with our contributors, which we can do at the scale of this journal. Having those lasting relationships with authors and contributors means that the journal content circulates, gets recommended, and is cited.
This is really the last slide.
A summary, really, in response to that question about how Open Access publishing can support and expand public-facing research in the arts and humanities. For us, I think above all, Open Access is a means of community building, and we want that community to be as wide as possible, as widely ranging in its interests. The content we produce is significant in quality and originality. The keywords, I think, for the Open Arts Journal are that it's cross-cutting, it's interdisciplinary and it's international, and it offers a really excellent fit for the Open Access medium and the way that we go about doing that.
As such, I should say, the Open Arts Journal continues to flourish; it's growing. We're at the moment involved in the production of two new future issues, which happen to be led by Open University researchers from a wide range of disciplines, including music, English and art history. The slides are there.
Dominique Walker and Paul Clarke, Scottish Universities Press, and Dr Richard Marsden, School of Arts and Humanities, OU.
Scottish Universities Press (SUP) is a fully open access, not-for-profit press managed by 19 academic libraries, including the Open University. This webinar details the open access book publishing landscape and provides some background to SUP and the relationship with the OU. It also covers the editorial workflow and process of publishing an open access book, and concludes with the perspective of an academic who is due to publish with SUP. The talk is aimed at anyone interested in open access book publishing.
Dominique Walker (Staff) 0:03
Are the slides OK on the screen still? Great. So hi everybody, I'm Dominique and I'm the publishing officer for Scottish Universities Press, or SUP, as I'll refer to it in the presentation. Today I'll start off by providing some background to SUP, the reasons for setting up an Open Access press, and a bit about the Open Access books publishing landscape more generally. Then I'll hand over to Paul, our commissioning editor, who will go through the process of publishing an Open Access book, our future plans and
upcoming books, and then lastly Dr Richard Marsden is going to provide his perspective as an author publishing with us and his personal thoughts on Open Access.
So, as you can see on the slide, we're a library-led press, and you can see here the 19 institutions that are involved. SUP has been developed through the Scottish Confederation of University and Research Libraries, or SCURL, a membership body that's been supporting collaborative initiatives across Scotland's academic libraries for over 30 years. The Open University is also a member of SCURL, which is why the OU has been involved in setting up the press from the start. We've got a really wide variety of institutions involved, ranging from very large, research-intensive universities to smaller, more teaching-focused or specialist institutions. However, despite all of these differences, we've made sure that each member, regardless of its size, has a completely equal voice in all of the decision making around the press.
So a sense of shared ownership across our institutions is really key to our development. Why did a group of libraries decide to start their own press? Academic libraries have been very heavily involved in the transition to Open Access for research over the years. For example, we manage our institutional repositories, we're very involved in the implementation of different institutional and funder Open Access policies, and we are often supporting our researchers to publish their research Open Access.
To date, this has mainly focused on journals and articles. However, one of the key drivers for starting the press was the move by funders to also require Open Access for books. For example, the new UKRI Open Access policy for books came into force in January last year, and the REF will also require Open Access for books from 2029 onwards. So with the advent of Open Access, institutions, funders and libraries are now not only paying to access the research but also covering the costs of publishing Open Access through things like article processing charges, book processing charges and transformative agreements.
And we've seen huge price increases from publishers alongside this. Across SCURL member libraries, we spend around £30 million a year on access to e-resources for our staff and students, and the upward trajectory of the costs, and the lack of clarity around the price increases from publishers, was a really strong prompt for some collective action from SCURL. Our view was that digital delivery had radically changed so much, bringing savings and efficiencies in many areas of work. However, publishing seemed to be on a different track, and we were really curious to probe the true cost of publishing and ultimately find out whether we could reduce those costs by bringing it in house. We also wondered whether an in-house solution would benefit from a spirit of collegiality, with authors and publishers working in partnership, since we're all working for and within HEIs. Ebooks are a particular issue for us, with incredibly high costs for libraries.
Often, publishers impose very strict limits on how many users can access a title at the same time.
And they're usually very locked down, so only a small portion can be printed or copied. Interestingly, worldwide revenue of academic publishers stands at over $19 billion a year, and 50% of that is attributed to just five companies. You may have seen that Springer Nature just announced a €500 million profit for 2024. So when you consider how much public money is going through the system, and how much work by our researchers is sustaining the system, it makes you think that things could be done a little differently.
So some questions started to emerge. Can publishing be done differently? Can we better support our academics to find a clearer and more cost-effective way to publish Open Access? We decided to find out. Before I look in more detail at SUP itself, it's important to look at some of the reasons for publishing Open Access books in general. In the last slide I mentioned compliance with funder policies such as UKRI's as a specific driver,
And these policies are helping ensure public access to publicly funded research.
However, the reasons for publishing Open Access go well beyond compliance, and I've taken some of these points from the Open Access Books Toolkit, an incredibly useful resource aimed at researchers; the link is in the slide and I really encourage you to have a look. Publishing Open Access means your book or chapter can be read, downloaded, reviewed, shared, reused and cited at no cost, without any barriers for readers. It's immediately available to everybody, wherever they are in the world, and there are no limits on printing and copying like you get with some ebooks.
It's a much better experience for readers, and one of the biggest benefits is reach: your work is available to a much wider and more diverse audience. For example, it can be accessed by those outside of academia, practitioners, policy makers, those in further education, industry and developing countries, and this links into your research having a much bigger impact. It reaches those who need it and can reuse it. Open Access creates more possibilities for readers to engage with and improve upon your research as well; for example, chapters from Open Access books can easily be extracted for use in course packs. You can also retain the copyright of your work when you publish Open Access, and that gives you much greater control. At SUP we use Creative Commons licences, which describe how others can share, reuse and build upon your research while ensuring that credit is still given to you as the creator. I'm sure Rich is going to talk later on about the personal benefits of higher usage and higher citation rates, but I do recommend having a look at that Open Access Books Toolkit if you're interested in the wider landscape.
And SUP is not alone in trying to bring publishing back into the institution. There is a new wave of institutional publishers who are leading the way in offering alternative publishing venues to authors and working together to help drive change in scholarly communications. This sector is growing at such a pace that in 2023 a group of these presses came together to form the Open Institutional Publishing Association, or OIPA. That's a new community of practice looking at sharing experiences, training opportunities, and advocacy and promotion for all of the Open Access publishing that's done at institutions. It includes new university presses such as UCL Press and White Rose University Press, as well as publishing and hosting services run by libraries and individual journals run by researchers. Despite having different publishing models, we are all united by our shared values of openness, community, collaboration, engagement and support. And of course, we're all not-for-profit.
OIPA's co-chair recently published an article in the Times Higher Education about why institutional publishing matters, saying that in championing principles and ethics in publishing over revenue generation and profits, in putting values above value, we look to build a fair, equitable, accessible, inclusive and diverse publishing ecosystem.
Sorry, I just saw something in the chat there about technical issues that distracted me. So now to look specifically at SUP. I already mentioned that the idea for the press began with the identification of a shared challenge across Scottish institutions: how to provide a clearer, more cost-effective route for our researchers to make their work Open Access. We're really keen to explore alternative approaches to academic publishing with the needs of the community at the centre, and we're constantly asking ourselves what we can do differently.
Ultimately, we'd like to contribute to the wider global efforts to create a fairer and more equitable academic publishing model. And lastly, we wanted to create a shared solution, pooling our resources and working together rather than having each institution start up its own press. It also allows smaller institutions, which do not have the resources to start their own publishing initiative, to become involved too.
So way back in 2019, SCURL commissioned some research to test the proof of concept for a collaborative universities press. The report was really favourable towards the prospect, and discussions began on how we could take forward the findings. In 2020, a partnership between the different institutions was formed; then in 2021 the project plan was developed and the management board of the press was formed. 2022 was the year when things really got going: we formed the editorial board and developed our workflows, our financial model and all of our technical infrastructure,
Which allowed us to launch our first call for book proposals in early 2023. The rest of 2023 then focused on our editorial processes, working on our first books, as well as developing our future plans for scaling up, which we'll discuss later on.
So the remit from the start has been really clear and focused: we wanted to start with one type of publication, focus on the high quality of the research, and have completely open dissemination. The scope of publishing initially focuses on monographs and edited collections, on any subject, by academics from UK HEIs. We cover all subject areas that receive submissions, though we're conscious that arts and humanities and social sciences are likely to be the most involved in publishing books, and this has proved to be the case so far with our proposals. All books are published as Open Access ebooks, with Creative Commons licences, on the SUP platform, and print copies are available to purchase at a reasonable price too, because we understand that having a print option is still important for books. Authors also earn royalties on any print copies sold.
It was really important for us to prove to our academics that we take research integrity seriously and that we publish high-quality content, so all of our proposals and manuscripts go through a rigorous peer review process managed by Paul and our editorial board; Paul will talk about this in a moment. We also provide a full publishing service to our academics, including everything from copy editing and typesetting through to marketing, discoverability and reporting on usage,
And also securely archiving the books to ensure that they'll always be available.
So here you can see a bit more about the structure of the press. SUP is a SCURL project. We have our management board, with a representative from each member library; they meet quarterly and make decisions on the overall direction of the press, and Richard Nurse from the Open University Library is the OU representative on the management board. We're also working with some existing SCURL working groups: the Copyright and Legal Group, for example, is helping us develop author guidance for the use of third-party materials such as images in Open Access books.
As we know, this can be an area of confusion, and it's these types of in-kind contribution that are really central to the collaborative model, being library-led as a clear alternative to the profit-making publishing models.
Then we have our editorial board, which is formed of 14 academic colleagues from across our member institutions. They also meet quarterly and focus on the review and quality of our content. And lastly, we have our peer review network of academics who are interested in conducting reviews for our content; again, Paul will talk about them in a minute.
I'll look quickly at our progress to date. The first significant milestone for us was the recruitment of our editorial board, who quickly developed our peer review process. We then put out our first call for content in early 2023, and as a brand new press without an existing reputation we were really not sure what to expect, so we were very happy to receive several book proposals from this initial call. That first round of proposals then progressed through the peer review workflows, and our first books were accepted for publication in May 2023. We're also now established as a community interest company, which embeds the not-for-profit status in our operating model going forwards; that also allowed us to issue our first book contracts. Our first manuscripts were delivered in January 2024 and moved into the production and marketing workflows, and we published our first book in October. Then, after initially only accepting book proposals from our member institutions, we extended to accept proposals from researchers at any UK university from November 2024, and from April 2025 we'll also open a call for textbook proposals.
So we're really keen to better understand the costs associated with publishing and to scope out the potential for savings associated with collaboration, and we're really keen to be transparent with our costs too. SUP works on a subsidised model, so member libraries cover the fixed costs via a subscription, and as I mentioned we've got lots of in-kind contributions from across our network. But in order to cover the title-specific publishing costs, like copy editing and typesetting, we have a small production charge, and
we split the cost into three bands, as you can see on the slide: from £3,500 up to £5,500 for our member libraries, which includes the Open University, and from £5,500 up to £7,500 for the rest of the UK. That price difference reflects members paying subscriptions and all their in-kind contributions. Due to the subsidised model, we are able to offer much lower charges than commercial publishers, whose book processing charges range from £8,000 to well over £15,000 a book; we've even heard of a £33,000 fee from one publisher. We also sit well under the £10,000 maximum funding level set by UKRI for the books it funds. We do recommend checking with the team at your institution who deals with Open Access for more information on local funding options. So I'll hand over to Paul now; he's going to cover our books, our publishing process and our future plans.
Scottish Universities Press Editorial 14:23
Thanks very much, Dominique. So this is our first book: Conversations with Tim Ingold: Anthropology, Education and Life, which we published during Open Access Week in October, and it's available to download for free now. It's a really accessible introduction to the work of a major leading anthropologist, presented as a series of very lively interviews conducted by three anthropologists at the University of Glasgow over a period of two years. It explores his key contributions to anthropology and to other disciplines as well. So far it's been downloaded over 1,000 times, and print copies are selling very well, so we're still seeing healthy demand for the print books alongside the Open Access ebooks.
Our launch plans were coordinated across SUP members through in-person events and displays across participating libraries, including a stall with the authors at the University of Glasgow.
That's a great way for us to present our books to a wide range of potential readers, and it's a real selling point for the press, harnessing the power of our network. The launch webinar was also a success, with around 230 people registered to attend. This was an informal fireside chat, a format that will form part of our ongoing offer to authors for future book launches as well. A follow-up Q&A blog post is available on our website too; again, that's something we'll aim to do for each book.
Our second book is called Digital Editing and Publishing in the 21st Century. It's edited by James O'Sullivan from UCC, and one of his co-editors is, again, from the University of Glasgow. It's a 20-chapter edited collection looking at the current state and future of digital editions, due to be published on the 29th of April. We're planning a launch webinar for early May, and we have several contributors lined up to take part.
If you're interested in that, do please keep an eye on our website and our social media channels for further details. Our third book is The Vow of Stability by Richard Irvine from the University of St Andrews. That's our second anthropology title: the author spent many years living in a Benedictine monastery, and the book engages with the everyday dynamics of life in a closed community. It publishes on the 20th of May, and we'll have a launch webinar for that one on the 22nd of May; details should be available soon.
Now I'll move on to the actual editorial process, looking at how researchers can submit a proposal and what happens to it. As a new university press, we wanted to ensure that we had an editorial process that compared with other academic publishers and ensured the same high level of quality.
This slide provides a general overview of the different stages, starting with the submission of the proposal, and we have a minimum of two single blind peer reviews.
And then, ultimately, the project is approved for contract by the editorial board at its quarterly meetings. We aim for the whole process to take 12 to 14 weeks. Obviously this timeline depends on a number of factors and can take longer, depending on reviewer availability, what the peer review reports actually say, whether any significant revisions are required to the proposal, and the timing of the editorial board meetings themselves.
The next couple of slides have a little more detail about the process. Authors can download the book proposal form from the SUP website; it's a Word document and hopefully quite straightforward. Once that's complete, it's emailed to me as the commissioning editor, and when I have the form I'll conduct a very brief eligibility check. As Dominique mentioned, previously we were only open to submissions from SCURL HEIs, which included the Open University, and we now welcome submissions from anybody affiliated with a UK institution. I'll also check that the proposal is complete, and I might have some feedback and work with the author to improve the proposal before sending it out to reviewers. It's never too early to get in touch with a commissioning editor, and I'm happy to work on proposals with people in draft form and work towards the final version of the proposal form.
So once everything is in place, the proposal passes to external peer review by two reviewers, and we aim to complete that in around four to six weeks. Once we have both reports back from the reviewers, we send them on to the author, who is then asked to prepare a response to the reviewers' comments and, if necessary, provide a revised proposal.
Everything then goes to the editorial board members for their input at the next editorial board meeting. They can either accept the book based on the reports and the author's response, or they can request a further review or further revisions. There is a possibility that they may reject the proposal at that stage, although that's rather unlikely. We require a majority of the eight members to proceed, and for accepted proposals we then move on to the contract stage, where we agree all the important details such as the number of words and images.
We also agree the delivery date for the complete manuscript before we issue the contract to the author for signing. Once the full manuscript comes in, which might be months or even years after the contract is signed, we send the whole manuscript out for another round of peer review. It's usually sent to one of the proposal reviewers, and we ask for feedback within eight weeks. The main aim of this clearance read is to make sure that the manuscript has delivered what was promised in the proposal.
The author is then given some time to make final changes to the manuscript based on this feedback, and if significant changes are required, we might send the manuscript out again for another round of peer review. But ideally we will just agree a date for the final delivery, and once we have the final version of the manuscript, along with the image files and the supporting documentation, the book goes into production.
I just want to mention the SUP peer reviewer network here: academics and researchers can sign up to join on the SUP website. We then add their details to a list of potential reviewers, and we use this list to source suitable reviewers for proposals that come in. We now have 132 academics signed up, so it's a great pool to draw from.
It also helps encourage academics to get involved with the press, so if you are interested, please do give that a look.
We're really keen to provide a supportive atmosphere for our authors, so in our review guidelines we ask for professional and respectful communication in the review reports, and they should provide constructive criticism that can help authors improve their study. Ethics are really important to us, so we follow the Committee on Publication Ethics (COPE) guidelines, which include specific guidelines for reviewers, and we try to ensure impartiality in our reviews: our guidance asks reviewers to declare any potential conflicts of interest that we might not already be aware of. We also offer an honorarium to our reviewers: £75 for proposal reviews and £150 for manuscript reviews.
It's difficult to reflect the true cost and value of providing a peer review, but we wanted to make sure that we were doing what we could to recognise the value of our reviewers and the time that they give up to help us.
So once the author has provided the final manuscript and it's been checked, we hand the manuscript over into production. Another important point for us in putting together the SUP process was to ensure that our books are produced to a professional standard, so the manuscript is copy-edited, typeset and proofread by our production partners, and it goes back to the author at various stages for checking, and then indexing.
So we produce print copies in both paperback and hardback, and ebooks in PDF and EPUB formats.
This entire process can take around four to five months, depending on the level of work required. The book is then published Open Access on the SUP platform, and print copies can be purchased from the usual vendors, including Waterstones and Amazon.
A really important part of the publication process is marketing the book, and most importantly for an OA press, making sure that the title is easily discoverable online. We provide authors with a marketing questionnaire, which helps us build a tailored marketing plan to get the book out to the right audience and ensure that the research has the impact the author is looking for. So we ask: who needs to read the book? Can it go beyond the academic market? Could it be of use, as Dominique mentioned earlier, to practitioners, policy makers or industry figures who wouldn't usually be able to access the research? That's one of the benefits of publishing OA.
This slide has some examples of our marketing activity. The key difference between us and most publishers is that we're a partnership of 19 different institutions, so we can use our network of member libraries and their own university comms channels to promote our books. We're always looking at ways for all of our members to promote SUP titles, regardless of the author's affiliation.
Once published, we have a variety of marketing materials available as well, and we'll also assist in arranging a launch event with the author.
Academic titles, as I mentioned, also need to be easily discoverable and accessible. So we're working with the University of St Andrews cataloguing team to create high-quality library catalogue records, and our books will be available on the various platforms we have listed here, such as JSTOR, Project MUSE and Google Books; we manage that through our partnership with the Thoth Open Metadata programme. We'll also work with the author to make sure that we're on all the right platforms for their particular discipline.
So, looking to the future: we mentioned that our second book is publishing in April and our third in May, and we have 11 further books in the pipeline up to 2029. Our forthcoming books cover a variety of subjects, including ethnography, archaeology, video game design and the history of sport, and we're also looking at setting up our first book series.
From inception, we've aimed to incrementally develop our publishing offer to include the main types of academic publication. This started with monographs, then edited collections and handbooks; the next phase of development for SUP will be to publish textbooks. There will be an open call for textbook proposals soon, and the next step after that will be for us to start publishing Open Access journals.
And ultimately we want to maximise the benefits of the model that we've created: to explore the freedoms of our not-for-profit status, to surface new voices, to work with ECRs, and hopefully to publish some more experimental content. So we're always exploring different ways to do that.
And I mentioned 11 more forthcoming books, and one of those is Richard's book on academic conduct and integrity, which is due to publish next year.
I'll now hand you over to Richard, and he can tell you about his experience so far of publishing with SUP.
Richard.Marsden 27:27
Fab, thank you very much Paul and Dominique. I'll be brief, because I'll leave some time for questions. So the book that I'm publishing with SUP, excuse me, I've got a frog in my throat, is, as it says on screen, Academic Conduct and Integrity: Research and Practice in Higher Education. It's actually an edited collection, and I think the next slide talks about what it contains, so I'll wait a moment before I share that. But the reason for doing this is that, obviously, there's been a rise in cheating websites, and more recently the rise in AI means that academic conduct and academic integrity around assessment and research in HE is a more challenging space than it has been for a long time.
Am I able to move these slides? I don't know who's got control. Next slide please.
Scottish Universities Press Editorial 28:22
You should be able to take control, but I can move you along.
Richard.Marsden 28:26
Oh, it's not letting me, Paul. So it's an edited collection, and the names of my fellow editors were on the previous slide. It's a book of two halves: the first half is a series of six chapters on various themes related to the overarching topic of the book, and the second part is about 25 practice-based case studies from right across the sector, right across Europe and beyond. It's a kind of unusual format, and the reason I'm mentioning that is because it's one that a lot of presses would perhaps have baulked at or found problematic. Next slide please, Paul.
So the book was commissioned in autumn 2024. I'll just share these dates to show you how quick this is, because what I'm coming round to is how great it is to work with Scottish Universities Press; the book itself is kind of irrelevant to that. It's the delivery mechanism for me to wax lyrical about my colleagues here. We came up with the idea in spring 2024 and went through the proposals process; the proposal was peer reviewed as Paul described; it was commissioned in autumn 2024; and it's scheduled for publication in spring 2026. The experience of working with Scottish Universities Press has been really positive, much more so than the experiences that I and my fellow editors have had with a lot of traditional presses.
We've got multiple editors and multiple case studies, and in my experience, as I mentioned, that can be a challenge to negotiate bureaucratically with other presses. Absolutely not the case with Scottish Universities Press: very flexible, very supportive. They'd never done a book with case studies before, so they created a process to cope with that just for us, and they did it very, very quickly.
There's flexibility around the illustrations too: we're not being asked to pin things down in their entirety at the outset.
The costs are very reasonable, as Dominique has described; the contractual documentation is very easy to deal with compared to some experiences I've had; and the timescales are all very quick, it all clips along at a really nice pace. Again, that's a real change from a lot of presses: the bureaucratic premium is very low, and the support you get is very high. Next slide please, Paul.
So the reason we wanted to publish this particular work Open Access is that it seems very important for people working in HE, for teachers and academic-related colleagues, and for HE leaders, to be able to access this kind of research, which is in part cutting-edge reflections on what's happening right now at a moment of real change for academic integrity in higher education.
And secondly, it's those case studies about what people are actually doing in response to that moment of change. So we thought it was really important to share that as widely as possible. It's worth noting that the Handbook of Academic Integrity, which I think is published by Routledge, costs £400. Now, that's a very comprehensive work, and we're not trying to rival it, but it's indicative of the prohibitive nature of some of these costs in terms of colleagues across the sector being able to access these kinds of works.
So, just to finish, my reflections on Open Access as a whole. Obviously it's an absolute good, isn't it: widening access to research and scholarship, in this case for teaching and learning; it's a boost in terms of equity and inclusion, all those good apple-pie things that we all stand for. It's also good for the individual. My first book, which I published back in 2014, cost about £100, and do you know what, hardly anybody bought it. It was quite boring as well, let's be fair, but I maintain it was mainly the price that put people off, even though I chose a lovely yellow cover for it. So if you're publishing Open Access, more people are going to read your stuff, which you want on a personal level, because otherwise why did you bother writing it? But it's also good in terms of being able to demonstrate impact and generate citations: REF stuff, that KPI stuff that a lot of academics get assessed on. And particularly for scholarship of teaching and learning, you're doing a service to your colleagues across the sector, and I think that's an important aspect. The flip side, of course, is that Open Access publishing has come a long, long way, but it's only part of the way there; there are still challenges, and I raise these because I think we need to acknowledge that there's more to do and we need to keep pushing on it.
So the worry is that, although the costs are low with Scottish Universities Press, they still exist and somebody's got to stump up for them. Is it the individual or the institution? And if it's the institution, there's a finite pool of money available: where are they going to put it? Are they going to roll the dice on something that's a little bit more speculative, which could be amazing or could not be? Or are they going to just back the same old favourites, the same old horses? What that raises is implications for diversity, representation, equity, and potentially for quality, because the fact that people or teams have produced good-quality research in the past doesn't mean they necessarily will continue to do so, in a way that goes beyond the people who are coming up. So there's a kind of generational challenge, I think, that Open Access publishing potentially raises, and we need to think about it. My main message, though, is that Paul and Dominique are super lovely, and it's been a really painless process.
So if you're in a position to work with them, I very much encourage you to do so.
Dominique Walker (Staff) 35:09
Thank you, Rich, that was very lovely; it brought a tear to my eye.
So lastly, on this slide, just some of the ways that you can get in touch with us and find out a bit more. Our main e-mail address is on there, but Paul's editorial e-mail address can also be used for any proposal enquiries, or if you just want to chat about a book.
Our website is a great place to start too: it's got the proposal form, our author guidance, and also our FAQs and news posts, and it's where you can sign up to our mailing list. I recommend you sign up to the mailing list to keep up to date with what we're up to, and our social media channels are there too, if you want to give us a follow. So that's it; thanks very much, and we're happy to take some questions.
Professor Petr Knoth, Founder of COnnecting REpositories (CORE), and Dr David Pride, Research Associate, CORE.
This webinar details developments across three AI projects.
Petr.Knoth 0:05
Thank you so much for the introduction; I really appreciate being here, and it's an absolute pleasure. Before I get started: when we talk about open research, this is a day when we celebrate research, and I'm afraid many of you will be aware that, with the difficult financial situation at many of our campuses, research is, as I and many of us perceive it, under threat.
And I'd like to mention something that I have seen happening a lot at the moment, which is a false narrative that I think is being widely spread around many of our campuses, about something called underfunded research. What I perceive is this: when, for instance, I create a project which is externally funded by, let's say, UKRI, and I put a research associate on it, then the cost of the research associate might be, say, £60K, the project with overheads might cost £150K, and the grant might be, say, £120K. This would be perceived by many of our universities as £30K lost, basically, to research. And that is a completely false narrative, because it doesn't account for the innovation potential that research brings to our campuses and to teaching, and it doesn't account for the REF returns, which are an actual cash return of money that comes back to our campuses. So before I even start, I wanted to fight this false narrative, which I think is very, very bad for all the research being done across lots of UK universities, and I think we need to fight it with facts: research is important, it brings innovation to universities and to teaching, and it's absolutely key to the mission of our universities. And with that I'd like to continue, and I do apologise for this digression, but our institute is currently under threat exactly due to this narrative.
So my name is Petr Knoth. I'm here with David, and I will also be presenting the work of one of my research associates, Dr Suchetha Nambanoor Kunnath, who has done one of the pieces of work that we will cover today.
I will introduce CORE, tell you a little bit about why it's important and why we really care about machine access to research content, and then I will talk about the three projects which were already mentioned. Towards the end I will talk about opportunities and challenges, and the part about CORE-GPT will kindly be presented by David. Let me tell you a little bit about our research group. It's called Big Scientific Data and Analytics, but many people know us as CORE, or COnnecting REpositories. CORE is well known for being one of the largest, if not the largest, comprehensive databases of Open Access research papers. We have 42 million full-text research papers directly hosted in CORE, and we provide access to metadata records from across the globe, with about 360 million metadata records today. Our team is really interested in not just indexing this content, but also in developing services that improve the discoverability, accessibility and reusability of this content. We work with universities, and we are also supported by some commercial partners who use research content to do very important things for society, such as fact-checking, which I would really like to highlight as a very important application where research content plays an important role.
So overall, the key topic of our group is providing seamless access to open research for both humans and machines. Very often people understand the part about why humans need access to research papers, but with the sheer volume of scientific content, we need to understand that humans will not be able to read everything, and therefore we will rely more and more on technology to help us read, access and effectively find the relevant research content that is important to us at a particular point in time. And with the advancement of artificial intelligence, I believe many of you will be aware of why machines being able to read research content is very important.
What's the mission of CORE? The mission of CORE is to create a comprehensive index of all open research around the world. We have lots of users on the website, and we want to do this not in isolation: we are absolutely committed to doing it together with the community. We are a not-for-profit scholarly infrastructure that is completely dedicated to the Open Access mission, and we are also adopters of the Principles of Open Scholarly Infrastructure, the so-called POSI principles. We are here to serve the global network, and a lot of the services that we build on top of CORE we build together with the network of repositories, so we often apply things like co-creation methodologies.
And with that, I think I've given a brief introduction, and I'd like to hand over to David, who will deliver the first part of the talk.
David.Pride 6:02
Hi everybody, fantastic to be here. Let me just make sure that I've got everything set up and you can hear me. My name is David Pride; I'm one of the researchers here at CORE. I've been here nine and a half years now, which is kind of scary: I was Petr's PhD student originally, and I'm now a postdoc researcher working on CORE. One of the projects we've worked on recently is this idea for CORE-GPT, and I'm sure everyone on the call is aware of the GPT models and the massive transformation that happened a couple of years ago. The work of CORE-GPT is essentially taking those large language models and combining them with the content we have in CORE, which at the moment runs to about 40 million full-text papers. First slide please, Petr.
So what we did at the start of this: we tested this when we first looked at the project a while ago. This was GPT-3.5 and GPT-4; we asked 500 different questions and asked the model to cite the sources for its answers. At the point we did the study, it very, very confidently gave an awful lot of wrong answers. In fact, you can see that many of the citations were fictional and others were made up. The conflated ones are the really interesting ones, because that's where the model would take a real journal title or a real paper title, but it wouldn't be what the answer had actually referred to; it was still a complete false narrative. These are, of course, hallucinations, and they're common. But that's not a criticism of generative models as such; citing accurately is just not what they're designed for. I have a slide that could show you 20 different examples in the legal profession, in the medical profession and in the scholarly domain where people have relied on citations generated by GPT, and these have turned out to be entirely false and inaccurate. As I say, with GPT-4 and now GPT-4.5 they are getting considerably better, but they are far from perfect. So a couple of years ago a technique was introduced that I worked on with Matteo Cancellieri, the lead developer of CORE: it's called a RAG framework. Next slide please, Petr. We put this together into what we call CORE-GPT, and it's a Q&A platform.
It uses an LLM in a couple of different stages to query the 40-million-plus documents that we have within CORE. The way this works, next slide, thank you, is a two-stage process. We take a user's question, which is a naturally phrased question, and the first thing we do is turn it into an Elasticsearch query; Elasticsearch is just the way search works in the background in CORE. From that we return the five most relevant papers for that question.
Then comes the bit where the RAG, retrieval-augmented generation, comes in: you take the content of those papers and you provide the model with those papers, with boundaries, and you say: using only the content that has been provided, please now generate the answer, and within the answer give me the citations to the papers from which the answer is drawn. The effect of that is twofold. First, you know the papers are real, and you know the answer you're being given is grounded in scientific research. Second, the model is carefully framed to answer only on the provided content: if the question is unanswerable with the provided content, the model will say so. It will tell you that it does not have enough information to provide an answer on the content given; it won't make up some completely random answer. Next slide please.
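The two-stage pattern described here can be sketched in a few lines of Python. This is an illustrative stand-in, not CORE-GPT's actual code: the keyword scorer plays the role of the Elasticsearch query over CORE, and `answer()` plays the role of the LLM prompted to use only the provided content; all names are hypothetical.

```python
# Minimal sketch of the two-stage retrieval-then-constrained-generation flow.
from dataclasses import dataclass

@dataclass
class Paper:
    title: str
    text: str

CORPUS = [
    Paper("Alloy embrittlement", "changing the composition of metal alloys alters their mechanical properties"),
    Paper("Bird migration", "seasonal movement of birds between hemispheres"),
]

def retrieve(question: str, corpus: list, k: int = 5) -> list:
    """Stage 1: rank papers by naive keyword overlap (stand-in for Elasticsearch)."""
    q_terms = set(question.lower().split())
    scored = [(len(q_terms & set(p.text.lower().split())), p) for p in corpus]
    relevant = [(s, p) for s, p in scored if s > 0]
    relevant.sort(key=lambda sp: sp[0], reverse=True)
    return [p for _, p in relevant[:k]]

def answer(question: str, corpus: list) -> str:
    """Stage 2: 'generate' only from retrieved content, citing the source papers."""
    papers = retrieve(question, corpus)
    if not papers:
        # Constrained behaviour: refuse rather than hallucinate an answer.
        return "I do not have enough information to answer this question."
    citations = "; ".join(p.title for p in papers)
    return f"Answer grounded in the retrieved papers. Sources: {citations}"
```

With this toy corpus, a question about alloy composition comes back citing the alloy paper, while an off-topic question triggers the refusal message rather than a made-up answer, which is the key behavioural difference from an unconstrained LLM.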
And so here we have an example: what is the effect of changing the composition of metal alloys on their mechanical properties? You see the opening sentence from CORE-GPT, and then it goes: for instance, in the 15-something alloy, exposure to intermediate temperatures can cause embrittlement through martensitic decomposition. I even had to check that that was real, because I'm not a chemist. Etcetera. And you can see there's your answer, and underneath are the five papers that the answer is drawn from. And if we go to the next slide.
Ah, I left this one out, my apologies; go back. I'll talk about the next slide, I skipped one there: that was just an example showing the paper the answer was drawn from. The other thing this does: if you've used language models at any point in the last year or so, you'll almost undoubtedly have seen, and again this is improving, the answer phrased with: I'm very sorry, but as a large language model, my knowledge cut-off is X. Originally this date would have been November 2022, and then September 2023 when the models were updated; there are even examples of research papers that have included this statement, which is pretty poor copy-editing, if anyone comes across that. One of the advantages of CORE-GPT is that you do not need to retrain the model to get up-to-date answers, because the answers are drawn from CORE, and CORE is constantly aggregating content from twelve and a half thousand repositories. Once the content is in CORE, it can be used by CORE-GPT to answer questions.
Next slide please. I've got to move on quickly because I'm very tight on time here.
This was tested with three different evaluators across three different rankings. We asked them to rank the answers on comprehensiveness, trustworthiness and utility: how well-rounded is the answer, do you believe the answer, and how useful is the answer to you? You can see that overall, and in particular in the hard sciences, the answers are very good; it wasn't so good in some domains.
And the next slide please.
We've got a quick example of the results here. We can see that the answer-quality scores are very high in some domains, and not as high in others, but still not bad at all. One of the interesting notes about this graph is that trust is the yellow score. Because of the nature of the system, people tend to trust the answers, because they know they're based on scientific, peer-reviewed literature. That means that even when the answers were deemed less comprehensive or less useful, the trustworthiness of the answer remained significantly high, which is a real positive for the system. Next slide, and then I think I'll wrap up quickly after this one.
Citation relevance: how relevant the citations are. This is really a test of CORE, because it measures how relevant the papers we return are in terms of answering your question, and how well we rank them. If your ranking algorithm works well, then logically the first paper returned should be the most relevant, then the second, the third, the fourth, the fifth, etcetera. And you can see by the mean scores here that that's exactly how it does work.
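One way to make "ranking works" concrete is a rank correlation between result position and relevance score: if relevance falls monotonically with position, the Spearman correlation is -1. The sketch below is illustrative only; the relevance numbers are made up, not the evaluation's actual data.

```python
# Illustrative check that relevance decreases with rank position.

def spearman(xs, ys):
    """Spearman rank correlation, assuming no tied values."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0.0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

positions = [1, 2, 3, 4, 5]                 # rank of each returned paper
mean_relevance = [4.6, 4.1, 3.8, 3.2, 2.9]  # hypothetical evaluator scores

# A perfectly monotone decrease gives a correlation of (approximately) -1.
rho = spearman(positions, mean_relevance)
```

In practice one would use `scipy.stats.spearmanr`, which also handles ties; the hand-rolled version here just keeps the sketch dependency-free.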
So you get cited answers, really nice trustworthy answers, with far fewer hallucinations than you get from a standard LLM. And the last slide is the wrap-up.
That one's fine, but if we can skip to the next one, because otherwise I'm going to run out of time. Thank you, sorry Petr. So we have CORE-GPT. GPT models are expensive, so we're currently testing things like Llama and some of the other open models, such as Mistral 7B, because CORE is currently used by 30 million people a month, and to be frank, if we rolled this out to every one of those people, we'd single-handedly bankrupt the Open University, and I don't think they'd let us do that. So we're working on ways to bring this to a wider audience. We're also working on what we can do with it: having a conversational agent, and making it part of the recommender and the API system. The recommender is already used by the Open University within ORO, the Open University's repository, and the CORE engine is behind that.
And yeah, it's a fantastic, really exciting piece of work; I just wish we had more time to do these kinds of things. I hope that was interesting. I'm open for any questions at the end, and at this point I should probably hand back to Petr, because I'm just about out of time.
Petr.Knoth 14:56
Thank you so much, David. What I will attempt to do now is tell you a little bit about the next project, which is about automating the classification of research outputs into the United Nations Sustainable Development Goals. In this case I will use the slides very loosely, and I'll attempt to give you a little bit of a demo towards the end of this section; then there will be one more section about the reproducibility of research. This project is a one-year project which has just finished, and it has built a dashboard for universities to understand the contributions they are making towards the UN Sustainable Development Goals. Why is this important?
Before I get there, I actually want to mention something about what David talked about. For those of you who know a little bit of deep learning, you will know that the GPT large language models are so-called decoder models; then there are encoder-decoder models; and then there are models which are just encoders. In this case I'm going to be using encoder models, because they are incredibly efficient at classification tasks.
Why is SDG classification important? Why does it actually matter? Society needs to make progress towards the United Nations Sustainable Development Goals: 17 goals have been formulated, and in order to make progress, I believe it's very important to be able to monitor it. If we can't measure a particular variable, then it's very hard for us to demonstrate progress and to work towards improving it. And what I understand is that many universities, not just the Open University, are working a little bit in the dark here, because we produce a lot of research, but we don't necessarily know how that research contributes to the specific Sustainable Development Goals. We could, of course, label all research papers with regard to the Sustainable Development Goals, so that we can understand where we contribute; maybe different universities contribute towards different goals. And if we know how to find those papers, then we can also better communicate what universities are doing to make progress.
Manually sort of labelling papers is incredibly tedious and therefore we wanted to develop an AI model that would do this labelling very, very quickly, yet still accurately. So at the Open University we also have something called the open societal challenges platform. This is this is a really nice programme, basically within the university where we are sort of supporting where where we're supporting basic projects which.
Are supposed to contribute towards the society and we are feeding these labels to the.
To to this to this platform basically, but perhaps if you you know if you are from another university.
That part is not at this point so important. What I want to show you is basically that we take research outputs basically from our institutional repository called the Open Research Online, you know and there are, but we take basically outputs from any other repository in the world. We then run it through, we are pass it through an AISDG classification tool, which is based on this encoder encoder architecture.
We get multiple, one actually doesn't have to be always one. It could be 0 up to North United Nations sustainable development goals labels and then we feed it into a new dashboard tool. Any institution in the UK can get access to this dashboard tool. If you are interested in that, this is how the dashboard tool looks like and I'm going to give you a little bit of a demo. As I said before.
It effectively gives your university an overview of which papers you have under which goal, so you can gauge the scope, make decisions, and so forth.
There were many technical challenges in developing this model. Some stem from the fact that previous models for automatic classification into Sustainable Development Goals are based only on keywords, and keywords are incredibly imprecise. There is, for instance, a model which I think is part of the Scopus package that does not use machine learning, and therefore its quality is not so good.
We really wanted a supervised model. We also realised that some existing models assign at most one label per output: you give them a paper and you get back a maximum of one label. That wasn't sufficient for us either, because we thought it was important to recognise that some papers can contribute to more than one Sustainable Development Goal. So we wanted something which is called
a multi-label classification model.
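To make the multi-label idea concrete, here is a minimal sketch. The logits and goal names are invented for illustration (only four of the seventeen goals are shown): a classification head on top of an encoder emits one score per goal, and each score goes through an independent sigmoid, so a paper can receive zero, one, or several SDG labels — unlike a softmax single-label classifier.

```python
import math

# Hypothetical per-goal logits for one paper; the values are made up.
SDG_NAMES = ["No poverty", "Zero hunger",
             "Good health and well-being", "Quality education"]
logits = [-2.1, -0.4, 1.8, 3.0]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Multi-label: each goal gets an independent probability, then a
# threshold decides which labels to keep (0.5 here, also an assumption).
probs = [sigmoid(z) for z in logits]
labels = [name for name, p in zip(SDG_NAMES, probs) if p >= 0.5]
print(labels)  # the two goals whose probability cleared the threshold
```

Because the sigmoids are independent, lowering or raising the threshold directly trades recall against precision, which is also where per-label confidence scores come from.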
And because in CORE we have 300-million-plus scientific records, we needed something that operates incredibly quickly. We could not afford to run everything through a large language model even if we wanted to; that would be prohibitively expensive. We wanted something really fast, and that's why we went for this encoder architecture. Our model is based on something called Sentence-BERT.
For those of you who have a technical background: Sentence-BERT is an encoder architecture which is really still state-of-the-art. It projects the text you supply to it into a multidimensional vector space, where papers which talk about similar things end up close to each other. We fine-tuned this Sentence-BERT model using a new dataset that we created by manually labelling 500 papers from the Open Research Online repository. That labelling was actually quite challenging, but I don't think I have the time to talk about it, so I'm going to skip ahead to the last slide — but before I go there, I'm just going to attempt to share the tool with you.
Yes — hopefully I'm sharing the tool now. This is how it looks in the CORE dashboard for the Open University. You can see that in the Open Research Online repository there are about 50,000 or 60,000 records, something like that. Not every paper is necessarily linked to an SDG, but many are: we automatically labelled 25,000 papers with SDG labels, so just under 50% of our papers relate to at least one Sustainable Development Goal. You can then click on a specific goal — for instance, I selected good health and well-being — and see how it compares to all of the outputs. I can compare multiple things; as you would probably not be surprised to hear, the Open University does a lot of research on quality education.
I can also look at the overall SDG distribution to see the profile of our university: we do a lot on quality education, partnerships for the goals, and good health and well-being. I can then look at specific papers. For instance, I can say I'm interested in this particular paper and see that it has been labelled with two Sustainable Development Goals, and for each of them we generate something called a confidence score.
Sadly, labelling against the Sustainable Development Goals is quite a difficult process which is to some extent subjective: humans agree on these labels only about 60% of the time. So we thought it was very important to be transparent and show the confidence scores for these labels. From there you can, of course, go directly to the repository — if I click, I will be redirected to our repository, Open Research Online, to see exactly that paper — or I can open it in CORE.
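The roughly 60% figure above is human inter-annotator agreement. A toy illustration, with two hypothetical annotators and five invented papers, shows how simple percent agreement is computed:

```python
# Two hypothetical annotators assigning one SDG label to five papers.
# The labels here are invented purely for illustration.
annotator_a = ["SDG4", "SDG3", "SDG7", "SDG4", "SDG13"]
annotator_b = ["SDG4", "SDG3", "SDG4", "SDG4", "SDG7"]

# Percent agreement: fraction of papers where the two labels match.
agreement = sum(x == y for x, y in zip(annotator_a, annotator_b)) / len(annotator_a)
print(agreement)  # 3 matches out of 5 -> 0.6
```

When even human labellers disagree this often, a model cannot meaningfully exceed that ceiling, which is why exposing per-label confidence scores matters.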
So that's the tool. I will go back to the slides — I'll try to do it quickly. Yes, OK, I'm back.
The key contributions here are that we've created a new machine-learning model for Sustainable Development Goals labelling. We are currently writing a research paper in which we compare this model with other models, and we have seen quite a significant boost in performance over existing ones — but I won't say more, because the paper is coming out soon.
We have embedded the model into a dashboard which other institutions can use, and the data can also be made available over an API if someone is interested.
With that, I'd like to move to the last part: a project very closely related to the reproducibility of research. Let me give you an intro first, and then I'll tell you how the project is funded. According to a study by Baker from 2016, more than 70% of researchers
have tried and failed to reproduce another scientist's experiment — many of us have been there, failing to replicate something other people have done
and get the same results. What is even more staggering is that over 50% of researchers have failed to reproduce even their own experiments, and the majority of the researchers surveyed in that very well-known 2016 study said they consider there to be a considerable reproducibility crisis within science.
So what can we do about this? There are many things, but when those researchers were asked, many of them said that not even being able to find the source code for a study effectively prevents them from reproducing its experiments. This is the problem that the SoFAIR project is trying to — I don't want to say solve, but contribute to solving, because there are many things that need to be done here.
One of the many issues hindering reproducibility, and the discoverability of open research software, is that the existence of the software itself often remains hidden within the manuscript of a research paper. If we think about what libraries and researchers tend to do: we are usually very good at putting effort into writing research papers, curating them, depositing them into repositories, and attributing metadata to them. There has been a lot of discussion about the fact that we need to do exactly the same for research data, and I think we are moving towards that. But there is a third asset — research software — which requires the same level of curation and care, and sadly I feel the research community has not been taking care of research software in a particularly good way.
So we need to start treating research software as first-class bibliographic records; once we do that, we will be able to link every research paper to the research software that was used within it. SoFAIR is a two-year CHIST-ERA-funded international EU project.
The Open University is the coordinator of this project — which actually happens to be myself.
We collaborate with Inria in France, with the Brno University of Technology, with IBL PAN in Poland, and also with Europe PMC in the UK — so we have four countries. There is one thing I would like you to remember about SoFAIR, and that's the workflow I want to show you and tell you about right now.
What normally happens is that there is a particular workflow showing how research manuscripts are handled in repositories, and that workflow can be adapted to take care of research software in a similar way — that's what we are attempting to do here. So, normally, as an author works on producing a research study, they usually have some software, which they typically deposit, in many commits, to a code repository such as GitHub. Then, at some point, the author writes the research paper and deposits the manuscript into one of the many open-access or institutional repositories, and in that paper they will typically put maybe one sentence
where they say: by the way, for this study I used this software.
Sometimes they put in the URL to GitHub; not always, but sometimes.
That's the first step. Once the paper is in a repository, we can do one thing that helps facilitate better curation: CORE aggregates, or indexes, that paper, and we can use a machine-learning model to find that specific mention of the use of the software.
What can this look like? The sentence can look like this, for instance. This is the research paper, and I can see a sentence which says "the Advion ChipSoft X software was used to capture an optical image of the prepared slide on a flatbed scanner prior to analysis". What we would really like is for the AI language model to extract that Advion is the publisher or creator of the software, and that ChipSoft X is actually a piece of software.
Ideally, we would also like to get a link to that software somewhere.
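As a toy stand-in for that extraction step — the real system uses a trained machine-learning model, not a pattern match — here is roughly what pulling a software name out of such a sentence might look like:

```python
import re

sentence = ("The Advion ChipSoft X software was used to capture an optical "
            "image of the prepared slide on a flatbed scanner prior to analysis.")

# Crude heuristic for illustration only: capture the capitalised phrase
# between "The" and "software". A real extractor is an ML model and is
# far more robust to how authors actually phrase such mentions.
m = re.search(r"The\s+((?:[A-Z][\w-]*\s+)+)software", sentence)
software_name = m.group(1).strip()
print(software_name)  # "Advion ChipSoft X"
```

This also shows why keywords and patterns alone are too imprecise: authors rarely phrase software mentions so conveniently, which is exactly what motivates a learned model.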
Once we have that, we represent this information in the CORE repository dashboard, which is typically used by repository managers. What we want these repository managers to do is simply to authorise us, as CORE, to push notifications about new research software to their own repositories —
so that we can send a repository the information that its academics have created new software which should be archived.
You can probably imagine that when CORE discovers that a new piece of software has been created by an academic from a particular institution, we can send a notification to that institution's repository. At that point the institution can send an email back to the author
which effectively says something along the following lines: we found that you might be the author of this new software; we have extracted the metadata automatically for you, and we would like you to confirm that this is the case — because of course AI can make mistakes, and we don't want to propagate mistakes into archives.
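Conceptually, such a notification might carry a payload like the following. To be clear, every field name and value here is invented for illustration; this is not the actual SoFAIR or CORE notification schema:

```python
import json

# Hypothetical payload CORE might push to an institutional repository;
# all fields below are made up for illustration, not a real schema.
notification = {
    "type": "software-mention-detected",
    "paper_title": "Example paper title",
    "software_name": "Advion ChipSoft X",
    "code_url": "https://github.com/example/repo",  # placeholder URL
    "confidence": 0.91,
    "requested_action": "ask-author-to-confirm-metadata",
}
message = json.dumps(notification, indent=2)
print(message)
```

The point is that the repository, not CORE, contacts the author: a structured machine-readable message arrives at the institution, and a trusted human-readable email goes out from there.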
You might ask: why would CORE not send the notification directly to the author?
Well, for multiple reasons. First of all, if I as an academic get an email from my own institution, I really trust it and will probably respond — especially if my institution asks me to do something; if it becomes institutional policy, people will probably do it. And of course
an institution will always have the email address of the author, which is another reason: CORE doesn't
always have the email addresses of the authors of scientific papers, even though we might have email addresses for corresponding authors.
Once it has been established that the research software was created by a particular author — and the author gets something out of this too, because they will now be recognised and credited as the author of that research software — there can be an automatic
notification to an archive called Software Heritage,
which archives the source code from GitHub or another source-code repository. There are many code repositories; GitHub is of course one of the most widely used, but there are probably 5,000-plus code repositories like it.
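One nice property of Software Heritage is that archived content gets an intrinsic, content-derived identifier: for a single file, it is a SHA-1 computed the way git hashes a blob, wrapped in a `swh:1:cnt:` prefix. A minimal sketch of that content-level case (the full SWHID specification also covers directories, revisions, releases, and snapshots):

```python
import hashlib

def swhid_for_content(data: bytes) -> str:
    # Git-style blob hashing: SHA-1 over a "blob <length>\0" header
    # followed by the raw bytes, then wrapped in the SWHID prefix.
    header = b"blob %d\x00" % len(data)
    digest = hashlib.sha1(header + data).hexdigest()
    return "swh:1:cnt:" + digest

swhid = swhid_for_content(b"print('hello')\n")
print(swhid)
```

Because the identifier is derived from the bytes themselves, anyone can recompute and verify it independently of where the file is hosted — useful for reproducibility claims.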
I'm quickly moving towards the end of the talk, so let me conclude this part. From my perspective, identifying and archiving software assets is really, really important — we should be doing it. It matters because if we cannot make research reproducible, this will decrease society's perception of, and trust in, research. So if we want researchers to be trusted, we should do as good a job as we can
of making research as reproducible as we can.
And doing it, of course, shouldn't make the process particularly onerous.
It is valuable, in my opinion, to make use of AI in this space to help us with these administrative tasks, which would otherwise be very difficult to do — because putting another burden on academics is difficult.
I'd like to generalise a little. You have now seen three talks — or rather three projects — that the group has been involved in, and I'd like us to understand that there are many challenges for the use of AI in academic research; some of them I have listed here.
There are also many opportunities for AI in academic research. Think about the kinds of processes researchers go through: research typically starts with some sort of literature review; then researchers are involved in hypothesis generation; then they probably create experimental code and run experiments; then they interpret the data; then they draft manuscripts, which are submitted to peer review;
and the research process will then involve some sort of assessment and a process of dissemination. There are probably other processes I have missed, but this shows a little of the life cycle of research.
If you think carefully about this, you will probably be able to imagine that every single step can these days be facilitated by AI. I'm not suggesting that any step should be replaced by AI; I'm saying we can imagine AI being used to support the researcher in every single step here. As a result, I believe research processes will change dramatically over the next ten years.
Hopefully they will change for the better, and hopefully we will live in a period of almost exponential research discoveries — there is good hope that many new things will be discovered over the coming years thanks to AI. That's all from me. Before I finish, I'd like to thank the members of the CORE team; without them, we would not be able to do the research I have presented here, or to run CORE. I'd also like to thank the CORE members —
the institutions that kindly donate resources to the CORE team so that we are able to run CORE — we really, really appreciate this.
Yes. And on that note, I would really like to finish and hand back over to the organisers. Thank you so much.