Java web crawler code

Published: 2022-03-06 06:47:54

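❶ Looking for Java code for a web crawler

Do you have a specific Tieba forum in mind, or do you mean Tieba forums in general?

❷ Begging for Java web crawler code

I can't write one myself, but I know someone who certainly can: my classmate, QQ 820215725. This is not an ad.

❸ Java web crawler

The source code is as follows: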
package com.cellstrain.icell.util;

import java.io.*;
import java.net.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * A simple crawler in Java: writes every https link found on one page to a file.
 */
public class Robot {
    public static void main(String[] args) {
        // String regex = "http://[\\w+\\.?/?]+\\.[A-Za-z]+";
        String regex = "https://[\\w+\\.?/?]+\\.[A-Za-z]+"; // URL-matching rule
        Pattern p = Pattern.compile(regex);
        try {
            // Placeholder for the address to crawl; the original post crawled a biology website
            URL url = new URL("<site URL>");
            URLConnection urlconn = url.openConnection();
            // try-with-resources closes both streams even if opening or reading fails,
            // so no null check is needed in a finally block.
            // The extracted links are written to SiteURL.txt on drive D.
            try (PrintWriter pw = new PrintWriter(new FileWriter("D:/SiteURL.txt"), true);
                 BufferedReader br = new BufferedReader(
                         new InputStreamReader(urlconn.getInputStream()))) {
                String buf;
                while ((buf = br.readLine()) != null) {
                    Matcher bufM = p.matcher(buf);
                    while (bufM.find()) {
                        pw.println(bufM.group());
                    }
                }
            }
            System.out.println("Crawl succeeded ^_^");
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
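
To see what the URL-matching rule in the answer above actually captures, it can be run against a few sample strings without any network access. This is only a sketch: the class name UrlRegexDemo, the method findUrls, and the sample strings are made up for illustration; only the regular expression itself comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlRegexDemo {
    // Same rule as in the answer: "https://", then one or more of
    // word characters, '+', '.', '?' or '/', ending in a dot plus letters.
    private static final Pattern URL_PATTERN =
            Pattern.compile("https://[\\w+\\.?/?]+\\.[A-Za-z]+");

    // Returns every substring of the line that the rule matches, in order.
    public static List<String> findUrls(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = URL_PATTERN.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample inputs, chosen to show where the rule stops matching.
        String[] samples = {
            "<a href=\"https://example.com/docs/index.html\">Docs</a>",
            "https://example.com/a?id=1",
            "https://my-site.com/page"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + findUrls(s));
        }
    }
}
```

The character class has no '-', '=' or '&', so a hostname containing a hyphen is not matched at all and a URL with a query string is cut back to the last dot-plus-letters the rule can reach. A crawler that needs complete links would probably want a broader pattern or an HTML parser.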

❹ Looking for the source code of a Java web crawler

package com.heaton.bot;

import com.heaton.bot.*;
import java.net.*;

/**
 * The SpiderWorker class performs the actual work of
 * spidering pages. It is implemented as a thread
 * that is created by the spider class.
 *
 * Copyright 2001-2003 by Jeff Heaton (http://www.jeffheaton.com)
 *
 * @author Jeff Heaton
 * @version 1.2
 */
public class SpiderWorker extends Thread {

    /**
     * The URL that this spider worker
     * should be downloading.
     */
    protected String target;

    /**
     * The owner of this spider worker class,
     * should always be a Spider object.
     * This is the class that this spider
     * worker will send its data to.
     */
    protected Spider owner;

    /**
     * Indicates if the spider is busy or not.
     * true = busy
     * false = idle
     */
    protected boolean busy;

    /**
     * A descendant of the HTTP object that
     * this class should be using for HTTP
     * communication. This is usually the
     * HTTPSocket class.
     */
    protected HTTP http;

    /**
     * Constructs a spider worker object.
     *
     * @param owner The owner of this object, usually
     * a Spider object.
     * @param http
     */
    public SpiderWorker(Spider owner, HTTP http) {
        this.http = http;
        this.owner = owner;
    }

    /**
     * Returns true or false to indicate if
     * the spider is busy or idle.
     *
     * @return true = busy
     * false = idle
     */
    public boolean isBusy() {

<- Source: http://www.diybl.com/course/3_program/java/javajs/200797/69988.html

❺ Need code and a workflow for a web crawler written in Java, urgent

import java.awt.*;
import java.awt.event.*;
import java.io.*;
import java.net.*;
import java.util.*;
import java.util.regex.*;
import javax.swing.*;
import javax.swing.table.*;

// A web crawler (note: "crawling" here means the same as fetching or scraping)
public class SearchCrawler extends JFrame {
    // Choices for the maximum number of URLs to crawl
    private static final String[] MAX_URLS = {"50", "100", "500", "1000"};

    // Cache of robots.txt disallow lists
    private HashMap disallowListCache = new HashMap();

    // Search GUI controls
    private JTextField startTextField;
    private JComboBox maxComboBox;
    private JCheckBox limitCheckBox;
    private JTextField logTextField;
    private JTextField searchTextField;
    private JCheckBox caseCheckBox;
    private JButton searchButton;

    // Search status GUI controls
    private JLabel crawlingLabel2;
    private JLabel crawledLabel2;
    private JLabel toCrawlLabel2;
    private JProgressBar progressBar;
    private JLabel matchesLabel2;

    // Table listing the search matches
    private JTable table;

    // Flag indicating whether the crawler is currently crawling
    private boolean crawling;

    // Writer for the matches log file
    private PrintWriter logFileWriter;

    // Constructor for the search crawler
    public SearchCrawler() {
        // Set the application title
        setTitle("Search Crawler");
        // Set the window size
        setSize(600, 600);

        // Handle window closing events
        addWindowListener(new WindowAdapter() {
            public void windowClosing(WindowEvent e) {
                actionExit();
            }
        });

        // Set up the File menu
        JMenuBar menuBar = new JMenuBar();
        JMenu fileMenu = new JMenu("File");
        fileMenu.setMnemonic(KeyEvent.VK_F);
        JMenuItem fileExitMenuItem = new JMenuItem("Exit", KeyEvent.VK_X);
        fileExitMenuItem.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                actionExit();
            }
        });
        fileMenu.add(fileExitMenuItem);
        menuBar.add(fileMenu);
        setJMenuBar(menuBar);

❻ Offering 200 points for Java web crawler source code

http://search.gougou.com/search?search=%E7%BD%91%E7%BB%9C%E7%88%AC%E8%99%AB&id=2

❼ Java source code implementing a web crawler

// Java crawler demo

import java.io.File;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Scanner;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DownMM {
    public static void main(String[] args) throws Exception {
        // out is the output directory; note that it must end with a backslash
        String out = "D:\\JSP\\pic\\java\\";
        try {
            File f = new File(out);
            if (!f.exists()) {
                f.mkdirs();
            }
        } catch (Exception e) {
            System.out.println("no");
        }

        String url = "http://www.mzitu.com/share/comment-page-";
        Pattern reg = Pattern.compile("<img src=\"(.*?)\"");
        for (int j = 0, i = 1; i <= 10; i++) {
            URL uu = new URL(url + i);
            URLConnection conn = uu.openConnection();
            conn.setRequestProperty("User-Agent",
                    "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko");
            // The "\\A" delimiter makes the Scanner return the whole response as one token
            try (Scanner sc = new Scanner(conn.getInputStream())) {
                Matcher m = reg.matcher(sc.useDelimiter("\\A").next());
                while (m.find()) {
                    // The method name is missing in the original post; Files.copy is assumed here
                    try (InputStream in = new URL(m.group(1)).openStream()) {
                        Files.copy(in, Paths.get(out + UUID.randomUUID() + ".jpg"));
                    }
                    System.out.println("Downloaded: " + j++);
                }
            }
        }
    }
}

❽ Java source code implementing a web crawler

Give me your email address. I've seen you asking for several days now.

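❾ How can a web crawler's page parsing be implemented in Java?

A crawler works by fetching a web page's content and then parsing it. What varies is how pages are fetched and how the content is parsed.
The simple approach is to send GET/POST requests with HttpClient, take the response, and pull out what you need with substring operations or regular expressions.
Alternatively, use ready-made libraries such as Jsoup or crawler4j, which make scraping more convenient.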
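
The fetch-then-parse approach described above might look roughly like the following. This is a minimal sketch under several assumptions: it uses the JDK's built-in java.net.http.HttpClient (Java 11+) rather than whichever HttpClient library the answer had in mind, the class name SimplePageParser and its methods are invented for illustration, and the href pattern is a deliberately simple one that will miss unquoted attributes and other edge cases.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimplePageParser {
    // Matches href="..." or href='...' and captures the attribute value.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    // Step 1: fetch the page body with a GET request.
    public static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    // Step 2: parse the HTML text and collect every href value, in document order.
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java SimplePageParser <url>");
            return;
        }
        for (String link : extractLinks(fetch(args[0]))) {
            System.out.println(link);
        }
    }
}
```

Keeping fetching and parsing in separate methods means the parsing step can be tried on a literal HTML string without touching the network. For real pages, a parser library such as Jsoup is likely to be more robust than a regular expression.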

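❿ Looking for source code for a web crawler written in Java

I don't know what you want it for. There are many kinds of web crawlers, so you'll need to be more specific before I can write one for you. I have a tool I wrote myself for scraping QVOD videos. Tell me which site you want to scrape and I'll write one for you.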
