
Thread: Why web data extraction service?

  1. #1
    webmining, Junior Member
    Join Date
    Jun 2008
    Posts
    6
    Ranking
    0

    Why web data extraction service?


    Without extraction tools
    Tools are needed to manage all available information including the Web, subscription services, and internal data stores. Without an extraction tool (a product specifically designed to find, organize, and output the data you want), you have very poor choices for getting information. Your choices are:
    Use search engines: Search engines help find some Web information, but they do not pinpoint information, cannot fill out the Web forms they encounter to get you the information you need, are perpetually behind in indexing content, and at best can only go two or three levels deep into a Web site. They also cannot search file directories on your network.
    Manually surf the Web and file directories: This option is labor-intensive, tedious, costly, error-prone, and very time consuming. Humans have to read the content of each page to see if it matches their criteria, whereas a computer simply matches patterns, which is much faster.
    Create custom programming: Custom programming is costly, can be buggy, requires maintenance, and takes time to develop. In addition, the programs must be constantly updated because the location of information changes frequently.
    Inefficient methods mean the information analyst spends time finding, collecting, and aggregating data instead of analyzing it and gaining a competitive edge. They also affect the application programmer, who has to spend time developing extraction tools instead of tools for the core business.
    New solutions improve productivity
    Extraction tools using a concise notation to define precise navigation and extraction rules greatly reduce the time spent on systematic collection efforts. Tools that support a variety of format options provide a single development platform for all collection needs regardless of electronic information source.
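To make the idea of "a concise notation to define extraction rules" concrete, here is a minimal sketch in Python. It is not how any particular product works: the rule format (a field name mapped to a regular expression with one capturing group), the `RULES` table, the `extract` helper, and the sample page are all assumptions made up for illustration. Real tools generally use a proper HTML parser rather than regular expressions.

```python
import re

# Hypothetical rule set: field name -> regex with exactly one capturing group.
RULES = {
    "title": r"<h1[^>]*>(.*?)</h1>",
    "price": r'<span class="price">\s*\$?([\d,]+\.\d{2})\s*</span>',
}


def extract(html, rules):
    """Apply each rule to the page text and return the first match per field."""
    record = {}
    for field, pattern in rules.items():
        match = re.search(pattern, html, flags=re.IGNORECASE | re.DOTALL)
        # A field whose rule does not match is reported as None.
        record[field] = match.group(1).strip() if match else None
    return record


# Made-up page used only to exercise the rules above.
SAMPLE_PAGE = """
<html><body>
  <h1>Widget Pro 3000</h1>
  <span class="price"> $1,249.99 </span>
</body></html>
"""

if __name__ == "__main__":
    print(extract(SAMPLE_PAGE, RULES))
```

The point of the sketch is that the rules live in a small table separate from the code that applies them, so adding a field or adapting to a changed page means editing one line of the table rather than rewriting a program.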
    Early software tools for “Web harvesting” and unstructured data mining began to attract the attention of information professionals. These products did a reasonable job of finding and extracting Web information for intelligence gathering purposes, but this was not enough. Organizations needed to reach the “deep Web” and other electronic information sources, which requires capabilities beyond simple Web content clipping.
    A new generation of information extraction tools is markedly improving productivity for information analysts and application developers.
    Uses for extraction tools
    The most popular applications for information extraction tools remain competitive intelligence gathering and market research, but there are some new applications emerging as organizations learn how to better use the functionality in the new generation of tools.
    Deep Web price gathering: The explosion of e-tailing, e-business, and e-government makes a plethora of competitive pricing information available on Web sites and government information portals. Unfortunately, price lists are difficult to extract without selecting product categories or filling out Web forms. Also, some prices are buried deep in .pdf documents. Automated forms completion and automated downloading are necessary features to retrieve prices from the deep Web.
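As a rough illustration of what "automated forms completion" involves, the sketch below builds the POST request a browser would send when a user picks a product category in a search form. Everything specific here is an assumption: the URL, the field names (`category`, `page`, `show`), and the `build_price_query` helper are hypothetical, and the request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, for illustration only.
FORM_URL = "https://example.com/catalog/search"


def build_price_query(category, page=1):
    """Build (but do not send) the form submission for one category page."""
    # Hypothetical field names; a real form defines its own.
    fields = {"category": category, "page": str(page), "show": "prices"}
    body = urlencode(fields).encode("ascii")
    return Request(
        FORM_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_price_query("power tools", page=2)
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```

Passing the returned object to `urllib.request.urlopen` would actually submit the form; looping over categories and page numbers is what lets a tool reach price lists that a search engine crawler never sees.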
    Primary research: Message boards, e-pinion sites, and other Web forums provide a wealth of public opinion and user experience information on consumer products, air travel, test drives, experimental drugs, etc. While much of this information can be found with a search engine, features like simultaneous board crawling, selective content extraction, task scheduling, and custom output reformatting are only available with extraction tools.
    Content aggregation for information portals: Content is exploding and available from Web and non-Web sources. Extraction tools can crawl the Web, internal information sources, and subscription services to automatically populate portals with pertinent content such as competitive information, news, and financial data.
    Supporting CRM systems: The Web is a valuable source of external data to selectively populate a data warehouse or a CRM database. To date, most organizations have focused on aggregating internal data for their data warehouses and CRM systems. Now, however, some organizations are realizing the value of adding external data as well. In the book Web Farming for the Data Warehouse from Morgan Kaufmann Publishers, Dr. Richard Hackathorn writes, “It is the synergism of external market information with internal customer data that creates the greatest business benefit.”
    Scientific research: Scientific information on a given topic (such as a gene sequence) is available on multiple Web sites and subscription services. An effective extraction tool can automate the location and extraction of this information and aggregate it into a single presentation format or portal. This saves scientific researchers countless hours of searching, reading, copying, and pasting.
    Business activity monitoring: Extraction tools can continuously monitor dynamically changing information sources to provide real time alerts and to populate information portals and dashboards.
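The monitoring idea above can be sketched as a comparison of content fingerprints between runs. This is only one simple way to detect change, assumed here for illustration; the `fingerprint` and `detect_changes` helpers and the source names are hypothetical, and the page text is passed in directly rather than fetched.

```python
import hashlib


def fingerprint(content):
    """Return a SHA-256 hex digest of the extracted text."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_changes(previous, snapshots):
    """Compare current snapshots against stored fingerprints.

    previous:  dict mapping source name -> fingerprint from the last run
    snapshots: dict mapping source name -> text extracted in this run
    Returns (list of changed source names, updated fingerprint dict).
    A source seen for the first time counts as changed.
    """
    changed = []
    updated = dict(previous)
    for source, text in snapshots.items():
        digest = fingerprint(text)
        if previous.get(source) != digest:
            changed.append(source)
        updated[source] = digest
    return changed, updated


if __name__ == "__main__":
    seen = {}
    # First run: every source is new, so every source is flagged.
    changed, seen = detect_changes(seen, {"pricing": "Widget $10", "news": "No news"})
    print(changed)
    # Second run: only the source whose text differs is flagged.
    changed, seen = detect_changes(seen, {"pricing": "Widget $12", "news": "No news"})
    print(changed)
```

Running a check like this on a schedule and sending an alert whenever the changed list is non-empty gives a basic version of the real time alerting described above.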
    by knowlesys
    Last edited by prashant32; 09-11-2008 at 02:22 PM.
