Hello, I need an HTML correction bot. We are scraping data from a website, and the HTML we get has a lot of wrong formatting and styles that we would like to eliminate. The job of the bot is to remove those HTML formatting errors and styles.

I will give the bot a CSV or Excel file with 2 columns: Description and Short Description.

JOB 1: The bot should visit http://www.tinymce.com/tryit/full.php, press the HTML button, and clear the content. It should then insert the HTML from cell A1 into the HTML Source Editor, press Update, and close the window. Then it should re-open the HTML editor and copy the contents. (We do this step because TinyMCE automatically corrects the wrong formatting once this is done.) Now we have well-formatted HTML. This corrected HTML is then inserted into A1 of another sheet. Job 1 is done, and half the work is complete.

JOB 2: The bot should clear the unnecessary styles and classes from the corrected HTML.
Example: <table style="width: 269px; border-collapse: collapse;" border="0" cellspacing="0" cellpadding="0">
If there is <table style="...">, <table class="...">, or <table WHATEVER>, the bot should delete the style and everything else inside the tag and leave just <table>.

Example: <colgroup><col style="width: 202pt; mso-width-source: userset; mso-width-alt: 8608;" width="269" /></colgroup>
If there is a <colgroup>, the bot should delete everything from <colgroup> through </colgroup>.

Example: <tr style="height: 30pt; mso-height-source: userset;">
Again, remove the style, class, or whatever else is there and leave just <tr>.

Example: <td class="xl27" style="width: 202pt; height: 30pt; background-color: transparent; border: #ece9d8;" width="269" height="40">
Again, remove the style, class, or whatever else is there and leave just <td>.

Example: <span style="font-family: Times New Roman;">
Again, remove the style, class, or whatever else is there and leave just <span>.

Once the styles and classes are removed, that will be the final HTML output. Can this be done?
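To make the JOB 2 rules concrete, here is a rough Python sketch of how the cleanup could work. It is only an illustration, not a finished bot: the names StyleStripper, clean_html, STRIP_ATTRS and DROP_ELEMENTS are made up for this example, and it assumes that only the tags listed in the examples (table, tr, td, span) lose their attributes while every other tag is passed through unchanged. It uses only Python's standard html.parser module and does not cover JOB 1 (the TinyMCE step).

```python
from html.parser import HTMLParser

# Assumption: only the tags named in the examples lose their attributes.
STRIP_ATTRS = {"table", "tr", "td", "span"}
# Elements that are deleted together with everything inside them.
DROP_ELEMENTS = {"colgroup"}


class StyleStripper(HTMLParser):
    """Rebuilds the HTML while dropping attributes and <colgroup> blocks."""

    def __init__(self):
        # Keep entities such as &amp; untouched instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_ELEMENTS:
            self.skip_depth += 1
        elif not self.skip_depth:
            # Bare tag for listed elements, original text for the rest.
            self.out.append(f"<{tag}>" if tag in STRIP_ATTRS
                            else self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <col ... /> or <br />.
        if tag in DROP_ELEMENTS or self.skip_depth:
            return
        self.out.append(f"<{tag} />" if tag in STRIP_ATTRS
                        else self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in DROP_ELEMENTS:
            if self.skip_depth:
                self.skip_depth -= 1
        elif not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skip_depth:
            self.out.append(f"<!--{data}-->")


def clean_html(html):
    """Return html with the JOB 2 rules applied."""
    parser = StyleStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    sample = ('<table style="width: 269px;" border="0">'
              '<colgroup><col width="269" /></colgroup>'
              '<tr style="height: 30pt;"><td class="xl27">'
              '<span style="font-family: Times New Roman;">Hello</span>'
              '</td></tr></table>')
    print(clean_html(sample))
```

Something like clean_html could be called once per cell of the Description and Short Description columns after JOB 1 has produced the corrected HTML. Adding more tag names to STRIP_ATTRS or DROP_ELEMENTS would extend the same rules to other elements.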