How to Build a 2channel Bot: Implementation, Part 1

How to Build a 2channel Bot: Preparation - GIOの日記
How to Build a 2channel Bot: Design - GIOの日記

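Recalling the required features

1. Get the URL of the ニュー速VIP board from the board list
2. Get information on every thread on ニュー速VIP
3. Get every image URL from every thread
4. Download every image
5. Save thread information so that the same image is not downloaded twice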

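Decide on the API first

Settling the API first helps when you want to work test-first, and it also helps you write clean code. Let's start with something like a skeleton implementation.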

module Bot2ch
  class Menu
    def get_board(subdir)
    end
  end
  class Board
    def get_threads
    end
  end
  class Thread
    def get_images
    end
  end
  class NormalImageDownloader
    def download
    end
  end
  class App
    def execute(subdir)
      menu = Menu.new
      board = menu.get_board(subdir)
      threads = board.get_threads
      threads.each do |thread|
        images = thread.get_images
        images.each do |image|
          image.download
        end
      end
    end
  end
end
Bot2ch::App.new.execute('news4vip')

Now let's flesh out the skeleton, one feature at a time.

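1. Get the URL of the ニュー速VIP board from the board list

Board host names change frequently, so let's look up the URL by subdirectory, which is the part least likely to change. The subdirectory for ニュー速VIP is news4vip.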

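require 'open-uri'
class Menu
  def initialize
    @bbsmenu = 'http://menu.2ch.net/bbsmenu.html'
  end

  # Returns a Board for the first link whose path ends in subdir, or nil if none is found.
  def get_board(subdir)
    # [^\s>]+ keeps the match inside a single href value
    reg = Regexp.new("href=([^\\s>]+/#{Regexp.escape(subdir)})", Regexp::IGNORECASE)
    # URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs
    URI.open(@bbsmenu) do |f|
      f.each_line do |line|
        return Board.new($1) if line =~ reg
      end
    end
    nil
  end
end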
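
To try the lookup without hitting the network, the matching step can be pulled out and run against sample text. This is only a sketch: the two menu lines and their host names are made up for illustration, `find_board_url` is a hypothetical helper that is not part of the bot, and the unquoted `HREF=` style is an assumption about what bbsmenu.html looks like.

```ruby
# Hypothetical sample lines in the style of bbsmenu.html; the hosts are made up.
SAMPLE_MENU = <<~HTML
  <A HREF=http://host1.example/news/>Board A</A><br>
  <A HREF=http://host2.example/news4vip/>Board B</A><br>
HTML

# Returns the URL whose last path segment is subdir, or nil if nothing matches.
# The lookahead makes sure "news" does not match the longer "news4vip".
def find_board_url(menu_html, subdir)
  pattern = %r{href=([^\s>]+/#{Regexp.escape(subdir)})(?=[/>\s])}i
  menu_html.each_line do |line|
    m = line.match(pattern)
    return m[1] if m
  end
  nil
end

puts find_board_url(SAMPLE_MENU, 'news4vip')  # prints http://host2.example/news4vip
```

Returning nil when nothing matches lets the caller notice a missing board early instead of failing later with an unrelated error.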

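2. Get information on every thread on ニュー速VIP

Thread information comes from <board URL>/subject.txt. Each line has the form "DAT file name<>title".
Note: all 2ch data is encoded in Shift_JIS, so convert it to whatever suits your environment.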

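require 'kconv' # provides String#toutf8
class Board
  def initialize(url)
    @url = url
    @subject = "#{url}/subject.txt"
  end

  def get_threads
    threads = []
    URI.open(@subject) do |f|
      lines = f.read.toutf8
      # String#each is gone since Ruby 1.9; each_line iterates line by line
      lines.each_line do |line|
        dat, title = line.split('<>')
        next if title.nil? # skip blank or malformed lines
        threads << Thread.new("#{@url}/dat/#{dat}", title)
      end
    end
    threads
  end
end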
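
The parsing step can also be checked offline. The sketch below makes a few assumptions: the two sample lines are invented, `parse_subject` is a hypothetical helper, and it uses Ruby's built-in String#encode rather than kconv's toutf8. Windows-31J is Ruby's name for the Microsoft variant of Shift_JIS; treating 2ch data as that variant is my assumption, not something stated above.

```ruby
# Hypothetical subject.txt content. A real file arrives as Shift_JIS bytes,
# so the sample is encoded to Windows-31J and marked as raw bytes to simulate that.
SAMPLE_SUBJECT = "1234567890.dat<>最初のスレッド\n1234567891.dat<>次のスレッド\n"
                 .encode('Windows-31J').force_encoding('BINARY')

# Converts the raw bytes to UTF-8 and splits each line into DAT name and title.
def parse_subject(raw_bytes)
  text = raw_bytes.dup.force_encoding('Windows-31J')
                  .encode('UTF-8', invalid: :replace, undef: :replace)
  text.each_line.each_with_object([]) do |line, threads|
    dat, title = line.chomp.split('<>', 2)
    next if title.nil? # skip blank or malformed lines
    threads << { dat: dat, title: title }
  end
end

parse_subject(SAMPLE_SUBJECT).each { |t| puts "#{t[:dat]} => #{t[:title]}" }
```

The invalid: and undef: options replace bytes that cannot be converted, so one bad character does not abort the whole board.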

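3. Get every image URL from every thread

Next we pull the image URLs out of each DAT file. Every DAT line is split into fields by <>, and the fourth field holds the body of the post, so that is where we look for image URLs.
We also want a design that can handle a variety of image uploaders.
The idea is that for every URL-like string, the downloader class that supports it does the download.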

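require 'kconv' # provides String#toutf8
# Meant to live inside module Bot2ch as in the skeleton;
# at top level this would reopen Ruby's built-in Thread class.
class Thread
  attr_accessor :title

  def initialize(url, title)
    @dat = url
    @title = title.strip
  end

  def get_images
    images = []
    downloaders = [NormalImageDownloader]
    URI.open(@dat) do |f|
      lines = f.read.toutf8
      lines.each_line do |line|
        contents = line.split('<>')[3] # fourth field: the post body
        # Match from "//" onward and prepend "http:" to each hit
        while contents =~ /\/\/[-_.!~*\'()a-zA-Z0-9;\/?:\@&=+\$,%#]+/i
          url = "http:#{$&}"
          contents = $' # keep scanning the text after this match
          image_downloader = downloaders.find { |d| d.match(url) }
          next unless image_downloader
          images << image_downloader.new(url)
        end
      end
    end
    images
  end

  def dat_no
    File.basename(@dat, '.dat')
  end
end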
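
Here is a small offline sketch of the extraction step, using String#scan in place of the while loop. The DAT line is invented, and `extract_urls` is a hypothetical helper. Only the position of the post body (fourth field) comes from the description above; the other field names in the sample are my assumption. Because the pattern starts at "//", a side effect is that a link written as "ttp://..." is picked up too, and every result is rewritten to start with "http:".

```ruby
# Same character class as the pattern used in Thread#get_images.
URL_BODY = %r{//[-_.!~*'()a-zA-Z0-9;/?:@&=+$,%#]+}

# Returns every URL-like string found in the fourth field of a DAT line.
def extract_urls(dat_line)
  body = dat_line.split('<>')[3]
  return [] if body.nil? # malformed line: no post body
  body.scan(URL_BODY).map { |rest| "http:#{rest}" }
end

# Hypothetical DAT line (assumed layout: name<>mail<>date<>body<>title).
line = "nanashi<>sage<>2009/01/01 00:00:00<>see ttp://img.example/a.jpg " \
       "and http://img.example/b.png <br> bye<>some title\n"

p extract_urls(line)
# prints ["http://img.example/a.jpg", "http://img.example/b.png"]
```

Filtering by downloader is left out here on purpose; this step only collects candidates, and deciding which ones are images is the job of the downloader classes.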

4. Download every image

For now, let's implement a downloader for ordinary images: a class that can download any URL whose extension is jpg.

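class NormalImageDownloader
  def initialize(url)
    @url = url
  end

  def download(save_to)
    puts "download: #{@url}"
    # Fetch first so that a failed request does not leave an empty file behind
    URI.open(@url) do |img|
      File.open(save_to, 'wb') do |f|
        f.write img.read
      end
    end
  end

  def self.match(url)
    # Escape the dot and anchor at the end of the string
    url =~ /\.jpg\z/i
  end
end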
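
To show how more uploaders could plug in, here is a sketch of the dispatch with two classes. Everything in it is illustrative: `JpgDownloader` stands in for NormalImageDownloader, `ExampleUploaderDownloader` and the uploader.example host are invented, and neither class downloads anything.

```ruby
# Stand-in for NormalImageDownloader: claims URLs that end in ".jpg".
class JpgDownloader
  attr_reader :url

  def initialize(url)
    @url = url
  end

  def self.match(url)
    url.match?(/\.jpg\z/i)
  end
end

# Hypothetical downloader for an imaginary uploader site.
class ExampleUploaderDownloader
  attr_reader :url

  def initialize(url)
    @url = url
  end

  def self.match(url)
    url.start_with?('http://uploader.example/')
  end
end

# Order matters: the first class whose match returns true wins.
DOWNLOADERS = [ExampleUploaderDownloader, JpgDownloader].freeze

# Returns a downloader instance for the URL, or nil if no class supports it.
def downloader_for(url)
  klass = DOWNLOADERS.find { |d| d.match(url) }
  klass && klass.new(url)
end

p downloader_for('http://img.example/x.JPG').class         # JpgDownloader
p downloader_for('http://uploader.example/view/1').class   # ExampleUploaderDownloader
p downloader_for('http://img.example/xjpg')                # nil
```

The last call shows why the dot in the pattern needs escaping: with an unescaped dot, "xjpg" would count as a jpg URL.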

Tying it together

With every piece implemented, we adjust the App class so that it all fits together.

require 'fileutils'
class App
  def execute(subdir)
    image_root_dir = File.dirname(__FILE__) + '/images'
    menu = Menu.new
    board = menu.get_board(subdir)
    threads = board.get_threads
    puts "total: #{threads.length} threads"
    threads.each do |thread|
      images = thread.get_images
      next if images.empty?
      parent_dir = "#{image_root_dir}/#{thread.dat_no}"
      # mkdir_p also creates the images root; File.exists? was removed in Ruby 3.2
      FileUtils.mkdir_p(parent_dir)
      puts "#{thread.title}: #{images.length} pics"
      images.each_with_index do |image, index|
        begin
          image.download("#{parent_dir}/#{index}.jpg")
        rescue StandardError => e
          puts "skip: #{e.message}" # report the failure instead of hiding it
          next
        end
        sleep(0.2)
      end
    end
  end
end

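As shown above, I changed the code so that images are saved with sequential numbers per thread, following the directory layout decided in the previous post.
The code up to this point is here.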
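
The save path can be sketched on its own, assuming the layout is <root>/<DAT number>/<index>.jpg as in the App class. `image_path` is a hypothetical helper, the DAT URL is made up, and a temporary directory stands in for the real images folder.

```ruby
require 'fileutils'
require 'tmpdir'

# Builds the save path for the index-th image of a thread.
def image_path(root, dat_url, index)
  dat_no = File.basename(dat_url, '.dat') # "1234567890.dat" -> "1234567890"
  File.join(root, dat_no, "#{index}.jpg")
end

Dir.mktmpdir do |root|
  path = image_path(root, 'http://host.example/news4vip/dat/1234567890.dat', 0)
  FileUtils.mkdir_p(File.dirname(path)) # creates missing parent directories too
  File.binwrite(path, 'dummy bytes')    # placeholder for real image data
  puts path.sub(root, '<root>')         # prints <root>/1234567890/0.jpg
end
```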

Running it

ruby bot2ch.rb

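Having written this far, I realized that I forgot to implement feature 5.
As it stands, the same images are downloaded again on every run, so that needs fixing.
I'm tired, so I'll get to it tomorrow or so.
By the way, I haven't let it run to completion. If you run it, it should download images at a ferocious rate.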
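
Until then, here is one possible way to avoid repeat downloads. It is not the approach planned for feature 5, which saves thread information; this sketch remembers image URLs instead. `DownloadHistory` and the history.txt file name are hypothetical, and a temporary directory is used so the example leaves nothing behind.

```ruby
require 'set'
require 'tmpdir'

# Hypothetical helper: remembers which URLs were already fetched,
# one URL per line in a plain text file.
class DownloadHistory
  def initialize(path)
    @path = path
    @seen = File.exist?(path) ? Set.new(File.readlines(path, chomp: true)) : Set.new
  end

  def seen?(url)
    @seen.include?(url)
  end

  # Records the URL and appends it to the file.
  # Returns false if the URL was already known, true otherwise.
  def record(url)
    return false unless @seen.add?(url)
    File.open(@path, 'a') { |f| f.puts(url) }
    true
  end
end

Dir.mktmpdir do |dir|
  file = File.join(dir, 'history.txt')
  DownloadHistory.new(file).record('http://img.example/a.jpg')

  reloaded = DownloadHistory.new(file) # simulates the next run of the bot
  puts reloaded.seen?('http://img.example/a.jpg') # prints true
  puts reloaded.seen?('http://img.example/b.jpg') # prints false
end
```

In the App loop, the bot would skip an image when seen? is true and call record after a successful download.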

That's it for Implementation, Part 1!