By taking a look at Web pages, we may expect to discover that some patterns of semantics are encoded using very few HTML building blocks, let us say "combinators" (HTML parts); this may be due to the lack of abstraction capabilities inherent to HTML alone. We compared this situation to the Noisy-Channel model in a previous post, where we presented some interesting figures and data illustrating the claim. Let us continue our journey by showing further instances of this phenomenon, whose formal analysis is crucial for the kind of intelligent refactoring tools we have been aiming to introduce by means of this series of posts. In other words, let us get to know other forms of "HTML noise". As a word of warning, we recall that the data comes from the particular sample of crawled pages we described before.
For this post, we are experimenting with tables that are potentially used as page layouts or page structure. For those kinds of tables, we want to study the table shape, or page surface, not the specific content; we may think of this as a way to filter potential candidates for further, deeper semantic analysis. (We briefly recall that our sample contains 819 pages and, roughly speaking, about 5000 table instances.)
The exercise is simple: we postulate an intuitive definition of a table as surface and see how well it is supported by the data in our sample.
Let us try a shallow analysis by classifying a table as a page-layout candidate if its container is the page body tag, possibly followed by a chain of div tags (assuming such div tags are intended as organizers or formatters of the table), and it has at least two rows and at least two columns (two columns being the most interesting case, we consider it the base).
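To make the definition concrete, here is a minimal sketch of such a classifier in Java, using the jsoup HTML parser; the class and method names are our own, as is the reading that every row needs at least two cells:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class LayoutCandidates {

    // Rows that belong to the table itself (not to nested tables):
    // direct <tr> children, plus <tr>s inside direct <thead>/<tbody>/<tfoot>.
    static List<Element> directRows(Element table) {
        List<Element> rows = new ArrayList<>();
        for (Element child : table.children()) {
            String t = child.tagName();
            if (t.equals("tr")) {
                rows.add(child);
            } else if (t.equals("thead") || t.equals("tbody") || t.equals("tfoot")) {
                for (Element tr : child.children()) {
                    if (tr.tagName().equals("tr")) rows.add(tr);
                }
            }
        }
        return rows;
    }

    // Layout candidate: ancestors up to <body> are only <div> wrappers,
    // and the table has at least 2 rows, each with at least 2 cells.
    static boolean isLayoutCandidate(Element table) {
        Element p = table.parent();
        while (p != null && p.tagName().equals("div")) {
            p = p.parent(); // skip the chain of div organizers/formatters
        }
        if (p == null || !p.tagName().equals("body")) return false;

        List<Element> rows = directRows(table);
        if (rows.size() < 2) return false;
        for (Element row : rows) {
            int cells = 0;
            for (Element c : row.children()) {
                if (c.tagName().equals("td") || c.tagName().equals("th")) cells++;
            }
            if (cells < 2) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String html = "<body><div><table>"
                + "<tr><td>a</td><td>b</td></tr>"
                + "<tr><td>c</td><td>d</td></tr>"
                + "</table></div></body>";
        Document doc = Jsoup.parse(html);
        for (Element table : doc.select("table")) {
            System.out.println(isLayoutCandidate(table)); // prints: true
        }
    }
}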
Such a pattern definition sounds reasonable in appearance; however, we will see that its empirical support is not as high as one might expect, at least in our sample.
We find 261 such candidates; they represent 31% of all pages, which is a quite interesting amount; however, it is unexpectedly small, because one might guess there should be at least one per page. Among these 261, we have 83 where the table hangs directly from the body tag (32% of the candidates; 10% of the whole sample). As a matter of fact, these 83 tables present irregular patterns, although we often find 2 columns (65%), with high variance. For instance, we may find a pattern of the form 6.2.2.2.2.2.2.2, where we use our convention of showing a table of n rows as a sequence of n numbers, each being the number of columns of the corresponding row (in the example: 8 rows, the first of them with 6 columns, the rest having 2). Even worse, we find the irregular pattern 2.2.7.2.7.7.6.5.5.4.4.5.2.3.2.7.2.7. And talking about irregularity, take a look at this interesting one: 19.2.7.4.6.2.2.2.2.2.2.2.2.2.2.5.7.2.2.2.2.4.4.2, whatever it means.
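The row-by-row shape can be computed in the same spirit; here is a sketch, reusing the directRows helper from the previous snippet:

// Shape signature: one number per row, each being that row's cell count,
// e.g. "6.2.2.2.2.2.2.2" for the 8-row table mentioned above.
static String shape(Element table) {
    StringBuilder sig = new StringBuilder();
    for (Element row : directRows(table)) {
        int cells = 0;
        for (Element c : row.children()) {
            if (c.tagName().equals("td") || c.tagName().equals("th")) cells++;
        }
        if (sig.length() > 0) sig.append('.');
        sig.append(cells);
    }
    return sig.toString();
}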
With this simple analysis, we may learn that, perhaps, some intuitive definitions do not occur as frequently as we might expect in our sample. Actually, after looking in detail at some of the irregular cases, a sound conclusion might be that we first need to pre-classify some parts of the page before applying general patterns like the one we tried directly. In other words, we see that some noise needs to be filtered out for this kind of pattern.
In a forthcoming post, we will continue studying these kinds of patterns and their support.
Last week, a developer from a company that is evaluating a trial version of the Visual Basic Upgrade Companion sent us an email asking whether they should also use the Microsoft Visual Basic 6.0 Upgrade Assessment Tool and the Code Advisor. Perhaps someone else has a similar question, so I thought it might be a good idea to share our response here.
First of all, let's remember that we are talking about three separate --and different-- tools:
- Visual Basic Upgrade Companion (VBUC): this is ArtinSoft’s Visual Basic 6.0 to VB.NET/C# migration tool. Basically, you use this tool to convert your VB6 code to .NET.
- Microsoft Visual Basic 6.0 Upgrade Assessment Tool: this tool was written for Microsoft by ArtinSoft, and can be downloaded free of charge from http://www.microsoft.com/downloads/details.aspx?FamilyID=10c491a2-fc67-4509-bc10-60c5c039a272&DisplayLang=en. The purpose of this tool is to generate a detailed report on the characteristics of your VB6 code, giving you an idea of the size and complexity of the code from a migration standpoint. The tool itself does not make any modification or conversion to the source code.
- Code Advisor: this tool is also provided by Microsoft, free of charge, and can be downloaded from http://www.microsoft.com/downloads/details.aspx?familyid=a656371a-b5c0-4d40-b015-0caa02634fae&displaylang=en. The Code Advisor analyzes your VB6 source code and looks for particular migration issues within the code. Each issue is marked with a code comment that suggests how to modify the VB6 code to avoid the problem.
The purposes of the Microsoft Visual Basic 6.0 Upgrade Assessment Tool and the Code Advisor are different, so it is recommended that you use both of them. However, it is important to note that the Code Advisor was designed for users who plan to migrate with the Visual Basic Upgrade Wizard (the conversion tool that comes with Visual Studio .NET), and since VBUC has greater migration coverage, some of the issues flagged by the Code Advisor will be fixed automatically by VBUC. For a detailed discussion of those issues, please refer to my article "Visual Basic Upgrade Companion vs. Code Advisor": http://www.artinsoft.com/VB-Upgrade-Companion-vs-CodeAdvisor.aspx
Yesterday, one of the attendees of the Virtualization events asked a question that I thought would be worthwhile to share:
For a simple .NET application like this, would we need different applications when running on 64 vs. 32 bit hosts?
Before answering, please allow me to elaborate on where the question is going. Virtual Server has a COM API that allows it to be managed by applications and scripts. Virtual Server R2 SP1 Beta 2 (phew) comes in two flavors: 32-bit and 64-bit. The attendee wondered if you could manipulate a 64-bit instance of Virtual Server from a 32-bit application (or vice versa).
OK, now that the question is (hopefully) a bit clearer, the answer is no: you do not need a different version of your application to access Virtual Server, regardless of its bit architecture. Why? Virtual Server's COM API is exposed by an out-of-process COM server, which means that everything is done by means of RPC. When two applications communicate with each other by means of RPC, the first commandment of 64-bit is not broken (thou shalt not run 32-bit and 64-bit code within the same process space).
Riddle me this: how many licenses of Windows Server Enterprise Edition would you need if you are planning on running 20 virtual machines on a server that has 2 processors? Very easy: you would need only 5 licenses. Too tough? How about this one: what would be the price difference, if you were running 50 machines with Windows Server 2003 on a virtualization server with 2 processors, between running the host with Windows Server Enterprise Edition and with Windows Server Datacenter Edition? Very easy: running Datacenter Edition would be $25,580 cheaper.
It definitely is tempting to say that I can pull this info right off the top of my head, but that would be a big, big lie. The secret lies in the sweet web application Microsoft has published. It is called the Windows Server Virtualization Calculator, and without a doubt it will clear up a lot of doubts and show you the best way to go (in terms of licensing) when consolidating your data center. Enjoy!
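For the curious, the first answer presumably falls out of the Windows Server 2003 R2 licensing terms of the time, under which one Enterprise Edition license covered up to 4 running virtual instances:

20 virtual machines / 4 virtual instances per license = 5 Enterprise Edition licenses

The second answer also depends on the list prices the calculator uses, so there is no shortcut around the tool for that one.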
Have you ever seen the exit rows in an airplane? They offer more legroom than coach and, after business or first, they are the best seats on the plane. The bad news is that these seats cannot be reserved by just anyone, at least not on American Airlines: they are reserved for travelers who have some kind of status, such as Platinum or Gold. This means that if you do not have status, you cannot choose them online (the seats will show up as unavailable). But fear not, for I have found a workaround that applies in some cases.
Say that you have no status at all with American Airlines, but you are traveling with a colleague or friend who does. Before purchasing the ticket, you must tell your travel agent to place both tickets within the same record locator. The person with the high status will be able to select these exit rows for you, and you will fly a lot more comfortably without having to have high status yourself.
Be warned that if 2 or more people are on the same American itinerary and one of them selects an upgrade to business, everyone on the itinerary will have an upgrade request filed. If they do not have enough upgrade stickers, the consequences can be quite bad, such as losing the pre-selected exit row and having to fly (if lucky) in the worst seat on the plane :S
During a migration project, the issues that your team faces will tend to become repetitive. Because of this, it is important to have mechanisms that allow team members to share the knowledge they have gained in the process of migrating the application. This way, you are less likely to have a team member struggling to fix a migration issue that someone else on the team already knows how to solve.
An ideal way of sharing team knowledge during the migration process is the creation of a Project Knowledge Base, where team members can post the solutions that they have applied to previous migration issues. With a good knowledge base, developers will be able to make searches and retrieve information that will help them fix the issues that they are facing, possibly increasing team productivity and reducing costs.
To be effective, your project knowledge base needs to have the following characteristics:
- Easy access: team members should easily retrieve information as well as add new items to the knowledge base.
- Search capability: of course, you don’t want team members navigating the knowledge base for hours to find a solution to a problem.
- Periodic backup: place the knowledge base on a server that is being backed up regularly. In a later project, the information stored may be useful again.
It is common to implement these knowledge bases using a Wiki engine. More information on Wiki can be obtained at http://www.wiki.org/wiki.cgi?WhatIsWiki.
Also, some examples of popular wiki sites are Wikipedia (http://www.wikipedia.org/) and Memory Alpha (http://memory-alpha.org/en/wiki/Portal:Main), this last one is a personal favorite :)
Yesterday we moved to the new offices in Curridabat. For (almost) everyone the commute is now shorter, and the truth is the offices are very nice, so joy reigns. It matters little that the building is half-finished. The new lunch spot is great, and we are about to sign up en masse at a gym with a pool and everything. The team room has lots of walls, which satisfies my appetite for post-its. If I remember, tomorrow I will bring the camera and start posting pictures of the workspace.
At the sprint #1 retrospective, a team member made a comment I found interesting: even though we had never done Scrum before, his feeling was that the meetings were a bit chaotic and that the ScrumMaster (i.e. me) had to impose order. My original intention was to start the project by being permissive (stretchable timeboxing, unmoderated opinions from external members, somewhat fuzzy roles), but I realized the trick is precisely to start out orthodox. One of the first points on which I decided to be inflexible, from the very start of sprint #2, is timeboxing: meetings were starting late and often ran far too long. Being more of a visual guy than anything else, I decided to communicate the idea of timeboxing as explicitly as possible. The main thing, of course, was the change in my own attitude, but these two critters helped quite a bit:
- The piggy bank ("el chanchito"): a cardboard box with a little hole in its lid. Whoever arrives late to a daily meeting or the retrospective pays according to a table posted on the wall:
- 0' < t < 5': 200 colones ($0.40)
- 5' ≤ t < 10': 500 colones ($1)
- 10' ≤ t: 1000 colones ($2)
- The toad ("el sapo"): a charming little box with 4 possible timers; we basically only use the 15' and 60' ones. Everyone finds it cute, and it has also proven very effective.
Note: so far we have collected about 7000 colones ($14); the idea is to use the money to buy snacks to nibble on during meetings.
I finally decided to give the aborted sprint a number, so the one that just ended was simply #3. Friday's review went rather well... or at least much better than I expected. Unlike the first sprint review, this time there were plenty of demos and not so much philosophical discussion. I think it helped a lot that we made clear we would be strict about timeboxing, plus the simple fact that we had already held a review before, one which in my opinion had gone rather badly.
A few hours before the review, MC and MR told me we had to prepare a presentation (i.e. a PPT, or at least that is what I understood) introducing what had been done during the sprint. I replied that it was not advisable to invest more than 1 hour in total preparing the review, and that, moreover, the Product Owner was the one who had chosen the user stories to be developed, so it was not worth explaining to him what he already knew well. The counter-response was that people who knew little about the project would attend the meeting (i.e. future team members and a company executive, as well as LC). My counter-counter-response, perhaps a bit harsh, was that it was not the team's responsibility to make up for the fact that not all stakeholders had done their homework. The c-c-c-response made a lot of sense: "they will think we do poor work." Continuing in dialogue form:
- The review is not about looking good; it is about getting feedback.
- But what good is feedback from someone who does not understand what they are seeing?
- Good point, but let's not paper over the holes. If they need to know and they don't, let it show.
Still, the conversation left a bitter taste in my mouth... Who is supposed to bring the stakeholders up to speed? And what happens when those stakeholders are about to join the team?
Maybe I'm wrong, but after 8 years in web development (4 years of classic ASP, 2 years transitioning to the .NET world, and the last year doing heavy development on ASP.NET 2.0), I think I can offer a good opinion on the best online resources for ASP.NET development.
Some time ago I was thinking about giving credit to the great work of Mads Kristensen and his .NET SLAVE blog, to me the best blog around the blogosphere when it comes to ASP.NET development. But I have been kind of lazy and never did so; today I read a blog post from HIM asking about some stuff, which you should read here.
After playing around with all the free online resources for studying coding techniques and styles (forums, tutorials, blogs, and those Starter "piece of s***" kits), I can easily say that Mads' blog is the best ASP.NET blog around. Why? Because if you look around and read a lot of ASP.NET and related-technology blogs and forums, you can find good code, but NEVER, believe me, NEVER the complete solution, or not a quality one. And to top it off, Mads' "KISS" approach makes his blog articles just about perfect. I understand that people shouldn't have to give away everything they know; it is everybody's own decision whether to share or not.
His solutions are small, concise, and in most cases ready for deployment; best of all, HE SHARES real solutions for real problems in real scenarios. His code snippets are pieces of gold when you have enough criteria to judge them. I don't want to sound biased; I don't know Mads personally, but I bet you he is a great person. Why? Because people who SHARE KNOWLEDGE, and I'm not talking about trivial knowledge but real knowledge, are great people. I invite you to read his blog every day, and if you can donate when you find something useful, I encourage you to do so (I should do that too). Read all the posts Mads wrote; I guarantee you will be amazed by all that valuable ASP.NET and C# material.
I will put together a list of resources and blogs I read every day that keep me on track with the latest news and trends relevant to an ASP.NET developer, but right now I just feel the need to give Mads something small in return for his great knowledge.
As I said, Mads' code snippets and opinions rock, and here are my favorite ones.
Latest: http://madskristensen.dk/blog/Search+Engine+Positioner.aspx
Search Engine Positioner: I saw this yesterday and it is now used by our marketing department, a very valuable tool for SEO (search engine optimization). Mads, if you read this, here is my request: proxy settings, to use 3rd-party proxies. This would be very useful when doing SEO from outside the US, because search engines return results depending on your IP's country, so if you do search engine marketing for a country other than your own (in my case, Costa Rica), that would be very valuable.
Some other favorites:
And many more. If you put together all the code Mads provides, you can build a great software library for a small general-purpose web shop.
Thanks for everything, Mads. Keep sharing, keep rocking!
Visit the .NET SLAVE blog now!
Microsoft and ArtinSoft published the Upgrade Guide for Visual Basic 6 to .NET. This guide has been re-purposed as a list of FAQs, easy to search, that lets programmers and managers find out about the best practices when planning a migration project from VB to Visual Basic .NET 2005.
The first two chapters are out; more will come in the next weeks.
"The purpose of these pages is to provide a comprehensive FAQ for the Upgrading Visual Basic 6.0 to Visual Basic .NET and Visual Basic 2005 guide. This VB migration material was developed jointly by Microsoft and ArtinSoft, a company with vast experience in Visual Basic conversions and the developer of the Visual Basic Upgrade Wizard, the Visual Basic 6.0 Upgrade Assessment Tool, the Visual Basic Upgrade Companion and the ASP to ASP Migration Assistant, among other software migration products."
Link to Upgrading VB6 to .NET – migration guide FAQ
The other day I heard a joke about project managers told by John Valera, one of the Project Management professors at Costa Rica's Universidad Nacional, so I wanted to share it in this space. I’m not really good at telling jokes, but here it goes…
There was a big project which had three key team members: a software architect, a QA leader and a project manager. These three guys used to go together for a walk after lunch, to relax and talk about the project. One day, they came across an old lamp and when they picked it up, a genie appeared and said:
- "You have awakened me. I'm supposed to grant you three wishes, but since there are three of you, I will grant one wish to each of you."
First, the QA leader said:
- "I wish to flee to some place where I can have all the money that I want, and spend it on whatever I want!" Suddenly, he disappeared and became a rich man in Las Vegas.
Then came the software architect:
- "I wish to flee to some place where I don't have to worry about anything, and I can have all the fun in the world!" Suddenly, he disappeared and found himself walking on the beautiful beaches of Rio de Janeiro.
At the end, it was the project manager’s turn. With no need for extra thinking, he just said:
- “I wish to have those two guys back at work by 2:00 PM!!!!” :)
Sample Code to Print in Java
import java.io.ByteArrayInputStream;

import javax.print.Doc;
import javax.print.DocFlavor;
import javax.print.DocPrintJob;
import javax.print.PrintService;
import javax.print.PrintServiceLookup;
import javax.print.SimpleDoc;
import javax.print.attribute.HashPrintRequestAttributeSet;
import javax.print.attribute.PrintRequestAttributeSet;

public class Class3 {

    static String textToPrint = "Richard North Patterson's masterful portrayals of law and politics at the apex of power have made him one of our most important\n" +
        "writers of popular fiction. Combining a compelling narrative, exhaustive research, and a sophisticated grasp of contemporary\n" +
        "society, his bestselling novels bring explosive social problems to vivid life through characters who are richly imagined and\n" +
        "intensely real. Now in Balance of Power Patterson confronts one of America's most inflammatory issues-the terrible toll of gun\n" +
        "violence.\n\n" +
        "President Kerry Kilcannon and his fiancée, television journalist Lara Costello, have at last decided to marry. But their wedding\n" +
        "is followed by a massacre of innocents in a lethal burst of gunfire, challenging their marriage and his presidency in ways so shattering\n" +
        "and indelibly personal that Kilcannon vows to eradicate gun violence and crush the most powerful lobby in Washington-the Sons of\n" +
        "the Second Amendment (SSA).\n\n" +
        "Allied with the President's most determined rival, the resourceful and relentless Senate Majority Leader Frank Fasano, the SSA\n" +
        "declares all-out war on Kerry Kilcannon, deploying its arsenal of money, intimidation, and secret dealings to eviscerate Kilcannon's\n" +
        "crusade and, it hopes, destroy his presidency. This ignites a high-stakes game of politics and legal maneuvering in the Senate,\n" +
        "the courtroom, and across the country, which the charismatic but untested young President is determined to win at any cost. But in\n" +
        "the incendiary clash over gun violence and gun rights, the cost to both Kilcannons may be even higher than he imagined.\n\n" +
        "And others in the crossfire may also pay the price: the idealistic lawyer who has taken on the gun industry; the embattled CEO\n" +
        "of America's leading gun maker; the war-hero senator caught between conflicting ambitions; the female senator whose career is at\n" +
        "risk; and the grief-stricken young woman fighting to emerge from the shadow of her sister, the First Lady.\n\n" +
        "The insidious ways money corrodes democracy and corrupts elected officials . . . the visceral debate between gun-rights and\n" +
        "gun-control advocates . . . the bitter legal conflict between gun companies and the victims of gun violence . . . a\n" +
        "ratings-driven media that both manipulates and is manipulated - Richard North Patterson weaves these engrossing themes into an\n" +
        "epic novel that moves us with its force, passion, and authority.";

    public static void main(String[] args) {
        // Treat the data as a raw byte stream and let the service auto-sense the format
        DocFlavor flavor = DocFlavor.INPUT_STREAM.AUTOSENSE;
        PrintRequestAttributeSet aset = new HashPrintRequestAttributeSet();

        /* Locate the print services that can handle this flavor */
        PrintService[] pservices = PrintServiceLookup.lookupPrintServices(flavor, aset);
        if (pservices.length == 0) {
            System.err.println("No suitable print service found.");
            return;
        }

        /* Create a print job for the chosen service; index 0 picks the first
           service found, so adjust it to select a different printer */
        DocPrintJob pj = pservices[0].createPrintJob();
        try {
            /* Wrap the text in an input stream and build a Doc around it;
               SimpleDoc reads from the stream when the job is printed */
            ByteArrayInputStream fis = new ByteArrayInputStream(textToPrint.getBytes());
            Doc doc = new SimpleDoc(fis, flavor, null);

            /* Print the doc with the requested attributes */
            pj.print(doc, aset);
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
TIP: If you are just testing, create a dummy printer: go to Add Printers, select FILE for the port, and choose Generic as the manufacturer with the Text Only model.
Today I decided to test out the Volume Shadow Copy Service (VSS) support in Virtual Server 2005 R2. In theory, as I mentioned in an earlier post, with VSS Virtual Server can create a consistent "snapshot" of a running virtual machine, so that other applications, such as backup clients, can use that snapshot without interrupting the virtual machine itself.
The only VSS-aware backup application I had installed was Windows' very own NTBackup. So, I enabled VSS on the volumes, ran NTBackup, and proceeded to make a backup of my virtual machine. Everything started out OK, until NTBackup got stuck with the message "Waiting to retry shadow copy…". Following my standard error-solving checklist, I checked the Event Viewer, and I found this message logged:
Volume Shadow Copy Service error: Error calling a routine on the Shadow Copy Provider {f5dbcc43-b847-494e-8083-f030501da611}. Routine details BeginPrepareSnapshot({f5dbcc43-b847-494e-8083-f030501da611},\\?\Volume{0cb1b616-8ea6-11db-88de-806e6f6e6963}\) [hr = 0x80070002].
We use Acronis' imaging solution for deploying our servers, and it turns out that Acronis' VSS provider has an issue with Microsoft's VSS provider. Apparently the issue is well known and is documented in two forum posts. It is solved in the latest version of Acronis' products, but I didn't really have time to perform an upgrade (and Acronis' products are notoriously stubborn when you try to uninstall them). So, I applied the solution suggested in one of the forum posts and unregistered Acronis' VSS provider using the command:
regsvr32 /u \windows\system32\snapapivss.dll
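(Should you want Acronis' provider back later, running the same command without the /u switch should presumably re-register the DLL.)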
After that, the backup went through without problems. Opening the log once the backup completed showed that all files from the virtual machine were backed up successfully.
This was done without turning the virtual machine off, taking advantage of the VSS functionality in Virtual Server 2005 R2 SP1 Beta. I performed the same operation on a Windows XP box, disabling NTBackup’s VSS support, and the backup predictably failed.
Here’s some information on VSS: Volume Shadow Copy Service (VSS)
We continue our regular series of posts on refactoring Web pages based on semantic approaches; we invite interested new readers to take a look at the previous contributions to get a general picture of our intentions.
In this particular and brief post, we just want to present and describe some simple but interesting empirical data related to the structural (syntactic) content of a sample of pages we have been analyzing during the last few days. The results are part of a white paper we are currently preparing; it will be available at this site shortly.
We may remember from our first
post that we may want to recover semantics from structure given particular clues
and patterns we usually may come across when analyzing pages. The approach is
simpler to describe than to put into practice: Once semantics could be somehow detected,
refactoring steps can be applied on some places at the page and, by doing so, some
expected benefits can be gained.
However, syntactic structure is the result of encoding specific semantics and intentions on a web page using HTML elements and functionality; the HTML language is (expressively speaking) rather limited (placing too much emphasis on presentation issues, for instance), and some common programming "bad practices" increase the complexity of recovering semantics mainly from syntactic content as input. HTML being quite declarative, such complexity can make the discovery problem quite challenging in a pragmatic context, indeed. That is our more general goal; however, we do not want to go that far in this post. We just want to keep this perspective in mind and give the reader some insight and data to think about. We will elaborate more on recovery in forthcoming posts.
As is usual in the NLP field, it is interesting to use the so-called Noisy-Channel model as a point of reference and analogy. We may think of the initial semantics as the input message to the channel (the programmer); the web page is the output message. The programmer uses syntactic rules to encode semantics during coding, adding more or less noise. Different encodings normally exist; the noise can be greater when too much structure is used to express some piece of the message.
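For reference, this is the standard noisy-channel decoding rule from NLP that the analogy borrows; here $s$ stands for the intended semantics and $p$ for the observed page (the notation is ours):

$$\hat{s} = \arg\max_{s} P(s \mid p) = \arg\max_{s} P(p \mid s)\,P(s)$$

Recovering semantics then amounts to decoding: $P(p \mid s)$ models how the programmer noisily encodes semantics into HTML, and $P(s)$ is a prior over plausible semantics.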
A typical example of noisy encoding is the use of tables to handle style, presentation, or layout purposes beyond the hypothetically primary intention of that element: simply to be an arrangement of data. Complex software maintenance and sometimes lower performance can be consequences of too much noise, among other matters.
Let us take a look at some data concerning questions like: how much noise is in a page? What kind of noise? What kinds of regular encodings can be found?
As a warning, we do not claim any statistical significance, because our sample is clearly too small and was based on biased selection criteria. Our results are, in general, very preliminary. However, we feel they are sound and believable, and in some way consistent with the noisy-channel model.
Our "corpus" consists of 834 pages, which were crawled starting, for convenience, at a given root page in Costa Rica, namely http://www.casapres.go.cr/. The crawl size was bounded by a predetermined maximum of 1000 nodes to visit; we never took more than 50 of the paths referenced in a page, and we preferred visiting homepages to avoid traps.
Let us look at a descriptive profile of the data. Owing to current limitations of the publishing tool, we are not presenting charts to complement the raw numbers.
Just 108 kinds of tags were detected, with 523,016 instances of them in the corpus. That means, very roughly, 6 kinds of tags per page and 627 instances per page. We feel this suggests the use of the same tags to say probably different things (we note that many pages are homepages by choice).
The top 10 tags are: pure text, a, td, tr, br, div, li, img, p, and font (by absolute frequency). Together, text, a (anchor), and img account for more than 60% of all instances. Hence roughly 60% of page content is some form of data.
We notice that table accounts for 1% and td for 8.5% of all instances, against 42% for text and 15% for anchors. On average, we have roughly 7 tables per page, 54 tds per page, and 6 tds per table.
Likewise, we saw just 198 kinds of attributes, with 545,585 attribute instances. The most popular include href, shape, colspan, rowspan, class, width, clear, and height, which is relatively consistent with the observed tag frequencies (e.g. href for anchors; colspan and rowspan for tds).
We pay special attention to tables in the following lines. Our corpus has 5501 tables. It is worth mentioning that 65% of them are children of a td; in other words, nested in another table. Such a high proportion of nesting suggests complexity in table design. We see that 77% of the data (text, a, img) in the sample is dominated by tds (most of the data is table-dominated). In the case of anchors, 33% of them are td-dominated, which may suggest tables being used as navigation bars or similar semantic devices in an apparently very interesting proportion.
We decided to explore semantic patterns on tables a little more exactly. For instance, we chose tables of nx1 dimension (n rows, 1 column), which are good candidates for navigation bars. A simple analysis shows that 618 tables (11%) have such a shape. Their internals can differ, though, which is quite interesting. For instance, we see a 5x1 table where every td is an anchor. We denote this by a sequence of 1s and 0s, where 1 means the corresponding td contains an anchor (a link to some URL): in this case, '1.1.1.1.1' is the sequence. But another table of the same 5x1 size presents the pattern '1.0.1.0.1'. This same pattern occurs several times, for instance in a 50x1 table. Another case is '0.0.0.0.1.1.1.1.1.0', maybe suggesting that some links are not available. We mention that 212 patterns are 1x1, which would be a kind of navigation button. We will present a more elaborate analysis of these table patterns in the following post.
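As an illustration, here is a sketch of how such anchor sequences could be computed, in the spirit of the earlier jsoup-based snippets (anchorPattern is our name, and directRows is the helper from those snippets that collects the rows belonging to the table itself):

// '1' if the cell contains at least one anchor, '0' otherwise;
// a 5x1 table of links yields "1.1.1.1.1", alternating links "1.0.1.0.1".
static String anchorPattern(Element table) {
    StringBuilder sig = new StringBuilder();
    for (Element row : directRows(table)) {
        for (Element cell : row.children()) {
            String t = cell.tagName();
            if (!t.equals("td") && !t.equals("th")) continue;
            if (sig.length() > 0) sig.append('.');
            sig.append(cell.select("a").isEmpty() ? '0' : '1');
        }
    }
    return sig.toString();
}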
To finish, we notice that 875 tables (16%) are not regular: some rows have different sizes. Some of them are very unusual, like this 28x8 table, where each number in the following sequence denotes the number of tds in the corresponding row: 4.4.4.6.8.8.7.2.8.4.4.6.6.6.6.5.4.5.5.5.5.5.5.5.5.5.5.1.
Noisy, isn't it?
End users are sometimes ignored when planning a migration project. Traditional software development methodologies often lack an appropriate level of involvement from the end user, and this can limit end-user satisfaction with the final product. Before you begin a migration, it is important to understand the needs of the users of the original application: after all, they are the ones who will use the migrated application in their everyday activities. Be sure to gather the following information about the end users' perception of the original application:
- Features that the users dislike: sometimes the users consider that certain features of the original application are not suited to their needs, or should be improved. If this is the case, you will be migrating something the users don't like, so you can expect the same disapproval when you finish the migration. Because of this, it's a good idea to make the necessary improvements after you reach Functional Equivalence on the target platform. In certain cases, rewriting those particular features or modules can be a good option too.
- Features that the users depend on: in several applications, you will find that there are features the users can't live without, where even the slightest change in functionality could cause a problem. For example, in a data-entry form designed for fast-typing users entering lots of information, something as simple as changing the TabOrder of the form controls could be disastrous.
Of course, this list is not exhaustive, so be sure to involve the end users from the beginning of the project and gather enough information from them. Whenever possible, make their needs part of the requirements for the migration or the post-migration phases.
This Friday marks the end of our second sprint, which started somewhat abruptly two Mondays ago. What happened? It was the Tuesday of the first week of what was originally the second sprint, when a timid email from QM showed up in our Outlook, reporting that a new release of PR had just shipped. What is PR? A product that at first seemed to be a fierce competitor; then we changed course with a good differentiator... and suddenly they come out with a version that looks like a copy of our Product Backlog! Emergency, shouting, tears, and the crisis that doubles as an opportunity. Wednesday: a rushed meeting with ZF and AC. We shuffled alternative after alternative and decided to meet again on Thursday, which also turned into pure debate. Past noon, we decided to press the red button: abnormal sprint termination. We used Friday to prepare some User Stories together, and the following Monday, off we went. Let's see what the 2/2 Review has in store for us...
All of you are probably aware that you can download MSDN Pre-Configured Virtual Machine Images, and of the configurations you can get with the VHD Test Drive. There is another option, though, if you want to evaluate a Windows Server 2003 R2 installation by itself, on a virtual machine or as a host for Virtual Server 2005 R2: you can get a 180-day evaluation of Windows Server 2003 R2 from the trial software page over at Microsoft. This makes it easier to evaluate the performance of the server product, for virtualization or for any other task you may be considering it for.
Link: Windows Server 2003 R2: How to Get Trial Software
There are many, many alternatives out there that will assist you in migrating a physical machine to a virtual one; heck, even NT Backup can be used to accomplish this. The supported procedure recommended for this is to use ADS to create an image of the source machine and then dump it onto a virtual machine. I am currently testing this procedure and, trust me, it is not a straightforward one.
Given the choice, I would recommend any other approach when carrying out a P2V migration. VMware recently released their migration utility, which allows you to move physical machines to virtual ones. It even goes the extra mile and imports virtual machines from other solutions, such as Microsoft's Virtual Server.
This is perfect for VMware users, but what if you want to carry out a P2V migration to the Virtual Server format? Well, you can still do it by using VMware's tool and then using this utility to convert from the VMware format to the Virtual Server format. Not the cleanest solution, but I guess this is a perfect example in which the ends justify the means ;)
As I said, I will not tell much, even if I tell a lot. Let us call the project we have embarked on CC. The "we" is already, like reality itself, complex to describe: yours truly plays the role of ScrumMaster, ZF is the Product Owner, and the Team consists, for now and only for now, of AD, MC, and MR. We are working on getting at least 5 more people. There is also AC, who participates on the Product Owner's side, although he cannot devote much time to the project. And LC, who will join a separate team that will take on research tasks. And QM, who knows a ton about the domain. And SL, who handles marketing matters. But as far as I am concerned, all of these last ones are just stakeholders. Important, crucial, but still chickens.
The goal is to have a beta "as soon as possible". The idea is to ship version 1.0 around mid-year. As I believe I mentioned before, what we have in our hands is a mass-market product. Shrink-wrapped software, as they say, even if the image evokes canned goods or peaches in syrup. The workplace for now is the Sabana Norte offices, although we will shortly be moving to the east side, near Curridabat. For now we have something quite similar to a "team room" (belligerently also known as a "war room"), of which I hope to post some photos soon. To track product backlog items and tasks we are basically using post-its and slips of paper. We are trying out 2-week iterations. The Daily Meeting happens every day at 1:30 PM and is taking about 8 minutes or so. The Sprint Planning Meeting is timeboxed at 2 hours (1 hour with the Product Owner and 1 hour for the team alone), the Sprint Review Meeting at 1 hour, and the Retrospective also at 1 hour. For late arrivals to the Daily Meeting we have a little box we use as a piggy bank, which we christened "el chanchito". I sent someone out to buy a really neat stopwatch for the meetings.
(sigh)... I think with that data you can start to get your bearings...