Thomas Drugeon’s research while affiliated with Institut national de l'audiovisuel and other places


Publications (4)


Archiving Social Media: The Case of Twitter
  • Chapter

July 2021 · 178 Reads · 8 Citations

Jérôme Thièvre · Thomas Drugeon

Around the world, billions of people use social media such as Twitter and Facebook every day to find, discuss and share information. Social media, which has turned people from content readers into publishers, is not only an important data source for social science researchers but also a “must archive” object for web archivists, to be preserved for future generations. In recent years, various communities have discussed the need to archive social media and debated the issues involved. Social media data can be archived in different ways: with traditional web crawlers, through application programming interfaces (APIs), or by purchasing data from official company firehoses. The first two methods struggle to capture the dynamic and volatile nature of social media, and APIs additionally impose severe restrictions. These issues affect the completeness of collections, which in some cases contain only a sample of the whole. In this chapter, we present these methods and discuss their challenges in detail, using Twitter as a case study to better understand social media archiving, from gathering data to long-term preservation.



A technical approach for the French web legal deposit

January 2005 · 32 Reads · 7 Citations

In this paper, we present the technical approach we developed at the National Audiovisual Institute (INA) for the forthcoming extension of the French legal deposit law to web content. The paper covers the crawling and storage aspects of the project. Because INA is in charge of archiving websites related to the media and audiovisual (AV) content, focused crawling strategies are used to discover and then archive relevant websites. We describe a two-level crawling architecture that carries out focused and continuous crawling policies, using the website as the unit. Issues raised by harvesting the internals of a website and those of managing the global crawl are thus handled independently, allowing finer control and more comprehensive archiving. The lower layer of the storage model used to archive and preserve web content relies on a custom distributed file system that uses the same networking solutions as the crawling system. Specific issues raised by streamed AV content are taken into account in both the crawling and storage components.
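The two-level idea summarized in this abstract can be illustrated with a small, self-contained Python sketch. It is not INA's crawler: the in-memory `WEB` link graph, the `is_relevant` focus criterion, and the function names `crawl_site` and `global_crawl` are hypothetical stand-ins, and continuous recrawling and storage are omitted. The upper level manages a frontier of whole websites and applies the focused policy; the lower level harvests the pages inside one site and hands any links to other sites back up.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory web: page URL -> outgoing links (stand-in for HTTP fetches).
WEB = {
    "http://tv.example/": ["http://tv.example/news", "http://radio.example/"],
    "http://tv.example/news": ["http://tv.example/", "http://shop.example/"],
    "http://radio.example/": ["http://radio.example/live"],
    "http://radio.example/live": [],
    "http://shop.example/": ["http://shop.example/cart"],
    "http://shop.example/cart": [],
}


def site_of(url):
    """The website (scheme + host) is the unit handled by the upper level."""
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.netloc}/"


def is_relevant(site):
    """Hypothetical focus criterion: keep only media/AV-looking sites."""
    return any(word in site for word in ("tv", "radio"))


def crawl_site(site):
    """Lower level: harvest every page inside one site.

    Returns the harvested pages and the set of external sites discovered.
    """
    pages, external = {}, set()
    queue, seen = deque([site]), {site}
    while queue:
        url = queue.popleft()
        links = WEB.get(url, [])
        pages[url] = links  # a real system would store the fetched content here
        for link in links:
            if site_of(link) != site:
                external.add(site_of(link))  # hand off to the upper level
            elif link not in seen:
                seen.add(link)
                queue.append(link)
    return pages, external


def global_crawl(seeds):
    """Upper level: manage the frontier of sites and apply the focused policy."""
    archive, frontier, known = {}, deque(seeds), set(seeds)
    while frontier:
        site = frontier.popleft()
        if not is_relevant(site):
            continue  # focused crawling: irrelevant sites are never harvested
        pages, external = crawl_site(site)
        archive[site] = pages
        for other in external - known:
            known.add(other)
            frontier.append(other)
    return archive


if __name__ == "__main__":
    archive = global_crawl(["http://tv.example/"])
    for site, pages in archive.items():
        print(site, sorted(pages))
```

Keeping the two loops separate mirrors the point made in the abstract: per-site harvesting logic and global scheduling can evolve independently, so the site-level crawler never needs to know about the focus policy.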


Citations (2)


... Social media archives are typically created by web crawlers or other automated tools that capture content from social media platforms such as Twitter, Facebook, and Instagram (Anderson, 2020; Pehlivan et al., 2021). These archives capture and preserve social media content, including posts, comments and other interactions, and are used by researchers studying topics such as social movements, political campaigns, public opinion as well as journalists and other media professionals to track breaking news and monitor public sentiment. ...

Indigenous Research and Data Management in Electronic Archives: A Framework for African Indigenous Communities
Reference: Archiving Social Media: The Case of Twitter
  • Citing Chapter
  • July 2021

... The resulting shape of a corpus of Web archives depends on both the crawling mechanism and the storage technique [14]. In what follows, we refer to the specific INA's crawler and DAFF file format (Drugeon, 2005). At the INA, in order to archive an entire Web site, the crawler has to frequently collect all its constituent pages provided they evolve since their last harvesting (Lobbe, 2018). ...

A technical approach for the French web legal deposit
  • Citing Article
  • January 2005