What is the Extract Summit 2022 coding contest?

Calling all data lovers and web data enthusiasts! We’re excited to be back in person at the Extract Summit 2022—and what better way to get things rolling than with a coding contest?

This is the time to let your web scraping skills shine. Crawl and scrape all expected items from a given website using Scrapy Cloud. The first to succeed claims bragging rights and an exclusive grand prize.

Don’t fret if you don’t win—lucky participants will get some cool prizes too!

How to participate

Scrapy Cloud Account

Register for a Scrapy Cloud account if you do not already have one. A forever-free plan is available.

Join Scrapy Cloud

Scrapy Discord Access

Join the Scrapy Discord. The target website’s URL will be revealed through Discord.

Join Scrapy Discord

Register for the contest

Fill out the form to register. We need this information to confirm that you are correctly enrolled in the contest.

Register Now

Contest Outline

The contest is free to join and open to everyone.

Make sure that you’ve registered your Scrapy Cloud account and have Scrapy Discord access. You’ll need to submit your registration form as well.

Once these are in place, you’re ready to take part in the contest when it launches on 28th September.

Here’s how the contest will go:

  1. The URL of the target website will be revealed on Discord. There will be a specification of the item fields that need to be extracted.
  2. You must write a spider that extracts all items with the specified fields and run it in Scrapy Cloud.
  3. Once the Scrapy Cloud job finishes, you must submit the job ID to a bot in the Scrapy Discord server.
  4. The bot will let you know if you have managed to extract all items with complete data.
  5. If the check fails, update your code and try again with a new Scrapy Cloud job. The bot accepts unlimited job submissions for the duration of the contest.
  6. To win, be the first to submit a job that successfully extracts all expected data.
  7. The website does not ban clients, so you will not need a proxy. However, crawling the website and extracting item data will not be straightforward, so do not expect a working spider on your first run.
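
Since every failed submission costs time, it may help to check your items locally before sending a job ID to the bot. The sketch below is one possible pre-check, not the bot's actual validation: it flags items where a required field is absent or blank. The field names in `REQUIRED_FIELDS` and the helper names are hypothetical; the real field list comes from the specification revealed on Discord.

```python
import json

# Hypothetical field names; the real list comes from the contest specification.
REQUIRED_FIELDS = ("name", "price", "url")


def find_incomplete(items, required_fields=REQUIRED_FIELDS):
    """Return (index, missing_fields) pairs for items lacking required data."""
    problems = []
    for index, item in enumerate(items):
        # Treat absent keys, None, and blank strings as missing.
        missing = [
            field for field in required_fields
            if item.get(field) is None or str(item.get(field)).strip() == ""
        ]
        if missing:
            problems.append((index, missing))
    return problems


def load_json_lines(path):
    """Read one JSON object per line, e.g. a JSON Lines export of a local run."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


if __name__ == "__main__":
    sample = [
        {"name": "Widget", "price": "9.99", "url": "https://example.com/widget"},
        {"name": "Gadget", "price": "", "url": "https://example.com/gadget"},
        {"name": "Gizmo", "url": "https://example.com/gizmo"},
    ]
    for index, missing in find_incomplete(sample):
        print(f"item {index} is missing: {', '.join(missing)}")
```

A check like this only catches incomplete items. It cannot tell you whether you found all expected items, which only the bot can confirm.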

Prepare for the Coding Contest

For a few days before the contest, we will open a test website so you can practice and prepare a code base. Once the actual contest starts, point your code at the contest website and adjust your crawl and extraction logic accordingly.
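
One way to make that switch quick is to keep everything site-specific in a single place. The sketch below assumes a hypothetical `SiteProfile` structure; the URLs and CSS selectors are placeholders, since the real ones are not known in advance.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SiteProfile:
    """Everything that differs between the practice site and the contest site."""
    start_url: str
    item_selector: str  # CSS selector matching one item on a listing page
    fields: dict        # field name -> CSS selector relative to the item


# Placeholder values for the practice site.
PRACTICE = SiteProfile(
    start_url="https://practice.example.com/",
    item_selector="div.item",
    fields={"name": "h2::text", "price": "span.price::text"},
)

# On contest day, derive a new profile instead of editing spider code.
CONTEST = replace(PRACTICE, start_url="https://contest.example.com/")

if __name__ == "__main__":
    print(CONTEST.start_url)
```

The contest website will likely need different selectors and crawl logic as well, so treat the profile as a starting point, not a guarantee that only the URL changes.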

Good luck, and may the data be ever in your favour!

Join the Contest